LLM


ARTICLE · DEV.to AI · 4/16/2026

Complete Guide to AI-Powered Zero-Day Vulnerability Discovery — Claude Opus 4.6's 500+ Zero-Days and the Security Paradigm Shift

This article analyzes how Claude Opus 4.6 discovered over 500 zero-day vulnerabilities, including a 23-year-old Linux kernel bug, transforming LLMs into autonomous security research agents. It explores the technical mechanisms and DevSecOps implications of this AI-driven vulnerability discovery.

zero-day cybersecurity AI vulnerability discovery

ARTICLE · DEV.to AI · 6d ago

Bypassing the "Multimodal Tax": How I Cut Voice AI Costs and Secured Biometric Privacy

This article details a method to reduce costs and improve privacy for voice-enabled AI agents by decoupling raw audio processing from LLM logic. It argues that sending raw microphone data directly to multimodal APIs is both expensive and privacy-invasive, and proposes an alternative architecture, exemplified by LangForge.

privacy security Cost Optimization LLM

ARTICLE · DEV.to AI · 5/6/2026

Structured Context Before AI: The Rule That Made My Legacy Analysis Tool Useful

AI-assisted coding tools often fail because they are asked to think too early without sufficient structured context. To get useful AI output, especially with legacy codebases, it's crucial to provide comprehensive context beyond a single file.

software development Context AI legacy systems

ARTICLE · DEV.to AI · 5/8/2026

Web4.0 Is Coming

This article explores how rare experienced "LLM integration developers" are, even as AI emerges as what it calls a revolutionary computing platform. Because LLM development has risen so quickly, companies struggle to find engineers with end-to-end AI development experience.

hiring future-of-work AI development LLM

CASE · DEV.to AI · 18d ago

Our agent burned through $40 in 3 minutes. Here’s how we got it to $1.

An AI agent for incident response initially incurred high costs, burning $40 in 3 minutes due to excessive use of a large language model. By redesigning the architecture with dynamic routing and context retention, the team reduced inference costs by 65%.

inference costs Architecture Cost Optimization AI Agents

ARTICLE · DEV.to AI · 5/3/2026

I wrote a custom CUDA inference engine to run Qwen3.5-27B on $130 mining cards

A developer built a custom CUDA inference engine to run the Qwen3.5-27B large language model on low-cost, repurposed mining graphics cards. The project shows how far hardware-specific optimization can go in making a 27B-parameter model usable on inexpensive hardware.

CUDA Optimization inference hardware
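As a rough illustration of why a 27B model is hard to fit on cheap cards, the sketch below estimates weight storage at 16, 8, and 4 bits per weight. It assumes about 27 × 10^9 parameters (read off the model name) and counts weights only; KV cache, activations, and quantization-scale overhead are ignored, and the summary does not say how much memory the cards have. The helper name `weight_gib` is hypothetical.

```python
# Rough weight-only memory estimate for a dense model at several bit widths.
# Assumption: ~27e9 parameters (from the "27B" in the model name).
# Ignores KV cache, activations, and per-group quantization scales.

PARAMS = 27e9

def weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Return weight storage in GiB (2**30 bytes) for the given bit width."""
    return num_params * bits_per_weight / 8 / 2**30

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_gib(PARAMS, bits):6.1f} GiB")
```

At 16 bits the weights alone come to roughly 50 GiB, versus roughly 13 GiB at 4 bits, so bit width largely decides how many low-memory cards the model has to be split across.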

RESEARCH · DEV.to AI · 5/8/2026

Physics‑based adaptation slashes edge LLM energy

QEIL v2 targets edge-LLM energy efficiency by replacing static heuristics with a physics-derived energy model and simulated annealing. By adapting resource allocation according to semiconductor physics, the system substantially cuts inference energy and reportedly improves performance.

Optimization Edge AI Energy Efficiency resource management
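Simulated annealing itself is a standard search technique; the toy sketch below shows the general idea applied to choosing a frequency level per workload segment. It is NOT QEIL v2's energy model. It assumes a simplified CMOS-style cost in which energy per unit of work scales with f² (switching energy per cycle goes as C·V², and voltage is taken to scale roughly with frequency) and time scales with work/f. All constants (`LEVELS`, `WORK`, `LATENCY_BUDGET`, the cooling schedule) are hypothetical.

```python
import math
import random

# Hypothetical toy problem (NOT QEIL v2's model): pick a frequency level for
# each workload segment to minimize energy under a latency budget.
LEVELS = [0.8, 1.2, 1.6, 2.0]     # hypothetical frequency levels (GHz)
WORK = [3.0, 1.0, 4.0, 2.0, 5.0]  # hypothetical work per segment (arbitrary units)
LATENCY_BUDGET = 12.0             # hypothetical budget (arbitrary time units)

def energy(assign):
    # Simplified CMOS intuition: energy per unit of work ~ V^2, with V roughly ~ f.
    return sum(w * LEVELS[i] ** 2 for w, i in zip(WORK, assign))

def latency(assign):
    # Time per segment = work / frequency.
    return sum(w / LEVELS[i] for w, i in zip(WORK, assign))

def anneal(steps=5000, t0=5.0, cooling=0.999, seed=0):
    rng = random.Random(seed)
    current = [len(LEVELS) - 1] * len(WORK)  # start at max frequency (feasible)
    best, temp = list(current), t0
    for _ in range(steps):
        temp *= cooling                       # geometric cooling schedule
        candidate = list(current)
        candidate[rng.randrange(len(WORK))] = rng.randrange(len(LEVELS))
        if latency(candidate) > LATENCY_BUDGET:
            continue                          # reject moves that break the budget
        delta = energy(candidate) - energy(current)
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if energy(current) < energy(best):
                best = list(current)
    return best

best = anneal()
print(best, round(energy(best), 2), round(latency(best), 2))
```

Hard-rejecting infeasible moves keeps every visited state within the budget. Early on, the high temperature lets the search accept some energy-increasing moves to escape local minima; as it cools, it becomes nearly greedy.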

ARTICLE · DEV.to AI · 5/8/2026

Stop Rereading Your Documents. Let the AI Study Them Once.

This article highlights the inefficiency of naive RAG workflows, which repeatedly re-synthesize answers from static knowledge, adding cost and producing inconsistent answers. It advocates compiling knowledge at ingest time, a pattern proposed by Andrej Karpathy (llm-wiki.md) in which an LLM reads each document once to build structured wiki pages. Zenii reportedly implements this pattern out of the box.

RAG AI workflow knowledge management Information Retrieval

RESEARCH · arXiv CS.CL · 4/13/2026

SynDocDis: A Metadata-Driven Framework for Generating Synthetic Physician Discussions Using Large Language Models

SynDocDis is a novel framework that utilizes Large Language Models and de-identified case metadata to generate clinically accurate synthetic physician-to-physician dialogues. This approach addresses the scarcity of real discussion data due to privacy concerns, aiming to enrich AI agents with valuable clinical knowledge.

synthetic data Medical Dialogue Generation privacy healthcare AI

NEWS · DEV.to AI · 4/26/2026

DeepSeek V4 Pro Just Dropped — Here's What Changed for AI Agents

DeepSeek V4 Pro launched on April 24, 2026, featuring 1.6T parameters, 1M token context, and dual Think/Non-Think modes with an MIT license. It is optimized for AI agent workloads, offering improved multi-step planning and more reliable function calling at a competitive price compared to Claude Sonnet 4.6 and GPT-4o.

deepseek-v4-pro performance AI Agents Pricing

ARTICLE · DEV.to AI · 15d ago

Most people starting with local LLMs jump straight to 4-bit quantization because it's fast and uses

This article compares 16-bit, 8-bit, and 4-bit LLM quantization and finds that 4-bit, while faster, noticeably degrades quality on reasoning and math tasks. It argues that the real trade-off is set by how much precision the task needs: for precision-demanding tasks, 8-bit is the better choice, with minimal quality loss and only a slight speed reduction. Quantization should therefore be chosen based on the task as well as the hardware, not on hardware alone.

inference speed model performance quantization hardware
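Much of the quality gap between 8-bit and 4-bit comes from rounding error: symmetric quantization gives 255 signed levels at 8 bits but only 15 at 4 bits. The toy sketch below quantizes a random Gaussian "weight tensor" with a single absmax scale per tensor and compares reconstruction error. The tensor, seed, and helper names are hypothetical, and production 4-bit formats (for example, ones with per-group scales) narrow the gap considerably, so this illustrates the direction of the effect rather than measuring it.

```python
import random

def quantize_dequantize(values, bits):
    """Symmetric per-tensor absmax quantization to signed ints, then back to floats."""
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax  # one scale for the whole tensor
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def mean_abs_error(values, bits):
    """Average absolute difference between original and reconstructed values."""
    restored = quantize_dequantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)

rng = random.Random(0)
weights = [rng.gauss(0.0, 0.02) for _ in range(10_000)]  # toy "weight tensor"

for bits in (8, 4):
    print(f"{bits}-bit mean |error|: {mean_abs_error(weights, bits):.6f}")
```

The 4-bit error comes out far larger than the 8-bit error, roughly tracking the step size, which grows by 127/7 (about 18×) when going from 8 to 4 bits.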

ARTICLE · DEV.to AI · 5/4/2026

Why Your Vector Index Returns Five Copies of the Same Doc

The article describes a common RAG failure mode in which the vector index returns multiple copies of the same document chunk, filling the LLM's context window with redundant text. This keeps the LLM from seeing diverse information and giving nuanced answers; the proposed fix is hash-based deduplication before ranking and Maximal Marginal Relevance (MMR) re-ranking.

RAG vector search AI Information Retrieval
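The summary names two fixes: hash-based deduplication before ranking, and Maximal Marginal Relevance (MMR), which scores each candidate as λ·relevance − (1 − λ)·(maximum similarity to chunks already selected). The sketch below is an illustration under assumptions, not the article's code: it hashes whitespace- and case-normalized chunk text with SHA-256, uses cosine similarity on tiny hand-made vectors, and sets `lam=0.7`. The chunk dicts and function names are hypothetical.

```python
import hashlib
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dedupe_by_hash(chunks):
    """Keep only the first chunk for each hash of its normalized text."""
    seen, unique = set(), []
    for chunk in chunks:
        normalized = " ".join(chunk["text"].split()).lower()
        key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(chunk)
    return unique

def mmr(query_vec, chunks, k, lam=0.7):
    """Greedy MMR: trade query relevance against similarity to picks so far."""
    selected, candidates = [], list(chunks)

    def score(chunk):
        relevance = cosine(query_vec, chunk["vec"])
        redundancy = max((cosine(chunk["vec"], s["vec"]) for s in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(range(len(candidates)), key=lambda i: score(candidates[i]))
        selected.append(candidates.pop(best))
    return selected

# Hypothetical search hits: a1, a2, a3 are the same chunk with cosmetic differences.
hits = [
    {"id": "a1", "text": "Refunds are issued within 14 days.", "vec": [0.9, 0.1, 0.0]},
    {"id": "a2", "text": "Refunds are issued  within 14 days.", "vec": [0.9, 0.1, 0.0]},
    {"id": "a3", "text": "refunds are issued within 14 days.", "vec": [0.9, 0.1, 0.0]},
    {"id": "b1", "text": "Shipping takes 3-5 business days.", "vec": [0.6, 0.0, 0.8]},
    {"id": "c1", "text": "Contact support via the help page.", "vec": [0.5, 0.5, 0.5]},
]
query = [1.0, 0.2, 0.1]
print([c["id"] for c in mmr(query, hits, k=2)])
print([c["id"] for c in mmr(query, dedupe_by_hash(hits), k=2)])
```

In this toy run, calling `mmr` on the raw `hits` with `k=2` returns `a1` and its copy `a2`, because at `lam=0.7` the duplicate's relevance outweighs the redundancy penalty. Deduplicating first frees the second slot for `c1`.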

DOC · AWS Machine Learning Blog · 19d ago

Break the context window barrier with Amazon Bedrock AgentCore

This post shows how to implement Recursive Language Models (RLM) using the Amazon Bedrock AgentCore Code Interpreter and the Strands Agents SDK. It covers processing documents of effectively unlimited length and using the AgentCore Code Interpreter as persistent memory for iterative analysis.

Agentcore RLM learning Amazon Bedrock

RESEARCH · arXiv CS.LG · 4/22/2026

Easy Samples Are All You Need: Self-Evolving LLMs via Data-Efficient Reinforcement Learning

This research introduces EasyRL, a novel data-efficient reinforcement learning approach for self-evolving LLMs, designed to overcome high annotation costs and performance issues in existing methods. Inspired by cognitive learning theory, EasyRL integrates knowledge transfer from easy labeled data with a progressive divide-and-conquer strategy for difficult unlabeled data.

Data efficiency reinforcement learning machine learning LLM

CASE · DEV.to AI · 4/15/2026

How to write a macOS window manager

The article describes using an AI agent to build a custom macOS window manager, a "Discrete Window Manager" tailored to an individual workflow. The AI generated all of the code; the human's role was mainly code review and testing.

software development Window Manager macOS AI Agents

RESEARCH · arXiv CS.AI · 5/6/2026

Programmatic Context Augmentation for LLM-based Symbolic Regression

This paper introduces a novel LLM-based evolutionary search framework for symbolic regression, addressing the limitations of existing methods that rely solely on scalar evaluation metrics. It incorporates programmatic context augmentation to enable code-based data analysis and richer information extraction, aiming to improve the discovery of mathematical expressions.

data analysis Symbolic Regression AI research LLM

RESEARCH · arXiv CS.CL · 5/7/2026

MedFabric and EtHER: A Data-Centric Framework for Word-Level Fabrication Generation and Detection in Medical LLMs

This paper introduces MedFabric, a data-centric pipeline that generates realistic, word-level fabrications for medical LLMs, addressing shortcomings in existing datasets. It also presents EtHER, a modular word-level fabrication detector that combines several techniques to improve factual evaluation.

hallucination data-centric AI Healthcare AI safety

RESEARCH · arXiv CS.AI · 21d ago

Operationalizing Document AI: A Microservice Architecture for OCR and LLM Pipelines in Production

This paper presents a microservice architecture for operationalizing document understanding pipelines that combine OCR and Large Language Models for structured field extraction at production scale. It details key design decisions such as asynchronous processing and independent scaling, and notes that OCR dominates end-to-end latency.

microservices production Document AI OCR

RESEARCH · arXiv CS.CL · 14d ago

CroCo: Cross-Lingual Contrastive Preference Tuning on Self-Generations

This work introduces CroCo, a method for cross-lingual contrastive preference tuning on self-generated responses from LLMs, demonstrating effective transfer across 14 languages without language-specific preference annotations. An English-trained reward model yields useful rankings across most languages, improving existing models and preventing catastrophic forgetting, provided on-policy data is used.

research machine learning NLP multilingual

RESEARCH · arXiv CS.LG · 8d ago

BitsMoE: Efficient Spectral Energy-Guided Bit Allocation for MoE LLM Quantization

BitsMoE proposes a spectral-energy-guided bit-allocation framework for quantizing Mixture-of-Experts (MoE) large language models. It addresses memory-intensive deployment by decomposing MoE layers and using expert-specific spectral factors for fine-grained, activation-aware mixed-precision quantization.

MoE models deep learning AI optimization quantization