LLMs

722 items

ARTICLE · DEV.to AI · 4/17/2026

I Run 14 AI Agents 24/7 on a 16GB MacBook — Here's What Broke First

The author runs 14 AI agents 24/7 on a 16GB MacBook, challenging the consensus that powerful hardware is essential for serious AI workloads. These agents, which orchestrate a real business, are managed in waves with only 1-3 executing simultaneously to maintain persistent state.

AI orchestration LLMs Local AI hardware
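The summary says the agents run in waves, with only one to three executing at a time, but it does not describe the scheduler. A minimal sketch, assuming a fixed wave size of three and a hypothetical `run_agent` coroutine that sleeps instead of calling a model, could batch agents with `asyncio.gather` and record the peak number running at once:

```python
import asyncio

WAVE_SIZE = 3  # assumption: the summary only says 1-3 agents execute at once


async def run_agent(name: str, tracker: dict) -> str:
    """Hypothetical agent step: sleeps instead of doing real model calls."""
    tracker["active"] += 1
    tracker["peak"] = max(tracker["peak"], tracker["active"])
    try:
        await asyncio.sleep(0.01)  # stand-in for the agent's actual work
        return f"{name}: done"
    finally:
        tracker["active"] -= 1


async def run_in_waves(names: list[str], wave_size: int = WAVE_SIZE) -> tuple[list[str], int]:
    """Run agents in consecutive waves; a wave starts only after the previous one finishes."""
    tracker = {"active": 0, "peak": 0}
    results: list[str] = []
    for start in range(0, len(names), wave_size):
        wave = names[start:start + wave_size]
        results.extend(await asyncio.gather(*(run_agent(n, tracker) for n in wave)))
    return results, tracker["peak"]


if __name__ == "__main__":
    agents = [f"agent-{i:02d}" for i in range(14)]
    results, peak = asyncio.run(run_in_waves(agents))
    print(len(results), "agents finished; peak concurrency:", peak)
```

With 14 agents this runs waves of 3, 3, 3, 3 and 2. Discrete waves are simpler to reason about than a rolling `asyncio.Semaphore`, at the cost of idling while the slowest agent in a wave finishes.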

ARTICLE · DEV.to AI · 22d ago

AI Coding Tools Need Better Boundaries, Not Better Prompts

AI coding tools excel at rapid prototyping but can degrade long-term maintainability due to insufficient boundaries and conventions. Instead of complex prompts, methodologies like Spec-Driven Development (SDD) are crucial to define contracts and validate specifications before implementation, treating LLMs as mere implementation engines.

LLMs spec-driven development code generation software engineering

ARTICLE · DEV.to AI · 26d ago

Your OpenClaw Bill Is Bleeding Tokens. Here’s What We Measured — and How to Fix It.

This article discusses the issue of high token consumption in LLM agent stacks like OpenClaw, driven by memory bloat and compaction loss. It proposes solutions to reduce token spend by approximately 32% without sacrificing agent intelligence, emphasizing a retrieval-first approach.

LLMs memory management cost reduction token optimization

ARTICLE · DEV.to AI · 5/2/2026

I Built a Benchmark for the Failures Generic LLM Evaluations Miss

The author highlights that generic LLM benchmarks fail to capture critical 'judgment failures' in real-world workflows, such as over-claiming or mishandling pricing. They developed a new benchmark to specifically measure these complex behavioral errors that typical evaluations miss.

LLMs AI limitations benchmarking AI evaluation

RESEARCH · arXiv CS.CL · 4/7/2026

Self-Execution Simulation Improves Coding Models

This work shows that code LLMs can be trained to simulate program execution step by step, improving their performance on competitive programming. The approach combines supervised fine-tuning with reinforcement learning, enabling the models to verify their own solutions and correct them iteratively.

LLMs reinforcement learning code generation program execution simulation

RESEARCH · arXiv CS.CL · 19d ago

When Cases Get Rare: A Retrieval Benchmark for Off-Guideline Clinical Question Answering

This paper introduces OGCaReBench, a new retrieval-focused benchmark aimed at evaluating LLMs' ability to answer clinical questions that go beyond typical medical guidelines. It addresses the gap where most medical LLMs are trained on common, guideline-focused knowledge, while real-world care often involves rare cases not covered by guidelines.

LLMs benchmarking case reports medical AI

RESEARCH · arXiv CS.LG · 23d ago

Reducing the Safety Tax in LLM Safety Alignment with On-Policy Self-Distillation

This paper introduces on-policy self-distillation (OPSA) to reduce the "safety tax" in LLM safety alignment. OPSA addresses the distributional mismatch of off-policy training by having the model generate its own rollouts and receive dense per-token KL supervision from a frozen teacher.

LLMs machine learning alignment AI safety
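The summary mentions dense per-token KL supervision from a frozen teacher on the model's own rollouts, without giving the loss. A rough sketch, assuming a reverse KL, KL(student || teacher), computed at every position of a rollout the student sampled itself, is shown below. The KL direction and the input layout (one list of vocabulary logits per token position) are assumptions, and the rollout sampling step is omitted.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def kl(p: list[float], q: list[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def per_token_kl(student_logits: list[list[float]],
                 teacher_logits: list[list[float]]) -> list[float]:
    """Assumed loss: KL(student || teacher) at each token of a student-generated rollout.

    In on-policy training the tokens come from the student's own sampling;
    the frozen teacher only scores those same positions.
    """
    return [kl(softmax(s), softmax(t)) for s, t in zip(student_logits, teacher_logits)]
```

Because the tokens are sampled by the student, every position gets its own supervision signal on the distribution the student actually produces, which is the mismatch the summary says off-policy training suffers from.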

RESEARCH · arXiv CS.LG · 16d ago

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

This research proposes that LLM reasoning is a dynamic decoding state rather than a static property, observable through early-stage entropy dynamics during generation. Tasks that benefit from Chain-of-Thought show a consistent reduction in entropy, which the authors interpret as a phase transition into a structured reasoning regime.

AI models LLMs Chain-of-Thought Reasoning
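To make "early-stage entropy dynamics" concrete, one can record the Shannon entropy of each next-token distribution during decoding and fit a slope over the first few steps; a negative slope corresponds to the entropy reduction described above. The window size and the least-squares slope are illustrative choices, not the paper's method.

```python
import math


def token_entropy(probs: list[float]) -> float:
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_trace(step_probs: list[list[float]]) -> list[float]:
    """Entropy at each decoding step."""
    return [token_entropy(p) for p in step_probs]


def early_slope(trace: list[float], window: int = 8) -> float:
    """Least-squares slope of entropy over the first `window` steps (assumed statistic)."""
    ys = trace[:window]
    n = len(ys)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```

Fed the per-step distributions from a model's generation, a clearly negative `early_slope` would flag a run whose entropy is collapsing toward a more structured regime.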

RESEARCH · arXiv CS.CL · 16d ago

When AI Takes Sides on Questions of Faith: Persistent Asymmetries in AI-Mediated Faith Guidance

Large language models (LLMs) show consistent asymmetries when advising on religious conversion, favoring conversion to some faiths, such as Catholicism, the Baháʼí Faith, and Sikhism, while subtly discouraging others, such as atheism and becoming a Jehovah's Witness. These biases vary by model and provider, with Grok 4.20 showing the strongest asymmetries, as identified through an LLM-as-a-judge framework.

LLMs Religion faith AI ethics

RESEARCH · arXiv CS.CL · 6d ago

A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models

A large-scale empirical study assesses the robustness of linguistic signals for characterizing AI-generated text. The analysis shows that classifiers based solely on linguistic features reliably distinguish AI-generated from human-written text, highlighting lexical richness as a robust indicator.

robustness LLMs AI-generated text text detection
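The study's exact feature set is not listed here. A common baseline for lexical richness is the type-token ratio (distinct words divided by total words); the sketch below assumes a simple lowercase, letters-and-apostrophes tokenizer, which is an illustrative choice.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct word types divided by total word tokens (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)
```

Raw type-token ratio falls as texts get longer, so comparing documents of different lengths usually calls for a length-normalized variant or fixed-size windows.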

ARTICLE · DEV.to AI · 4/16/2026

The Real Cost of Compute: Why AI Agents Are Rethinking Their Economics in 2026

In 2026, the prohibitive cost of running large language models for autonomous AI agents is forcing enterprises to rethink AI economics. Many are finding that smaller, specialized models offer better cost-effectiveness and performance than state-of-the-art LLMs for real-world tasks.

LLMs AI economics Enterprise AI compute costs

RESEARCH · arXiv CS.CL · 8d ago

Toward Robust In-Context Learning: Leveraging Out-of-distribution Proxies for Target Inaccessible Demonstration Retrieval

This paper introduces DOPA, a demonstration search framework for robust in-context learning with Large Language Models (LLMs). DOPA uses an OOD proxy to approximate inaccessible target domains and a Mahalanobis distance-based global diversity constraint for demonstration retrieval.

LLMs learning machine learning in-context learning
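The summary names a Mahalanobis distance-based global diversity constraint but not the selection procedure. The sketch below assumes demonstrations are already embedded as vectors and uses a greedy max-min rule: each new pick maximizes its smallest Mahalanobis distance to the demonstrations already chosen. The greedy rule, the ridge term, and starting from index 0 (for example, the most relevant candidate) are assumptions, not DOPA's algorithm.

```python
import numpy as np


def mahalanobis(x: np.ndarray, y: np.ndarray, cov_inv: np.ndarray) -> float:
    """Mahalanobis distance between two vectors given an inverse covariance matrix."""
    d = x - y
    return float(np.sqrt(d @ cov_inv @ d))


def greedy_diverse_select(candidates: np.ndarray, k: int, ridge: float = 1e-6) -> list[int]:
    """Greedily pick k candidate rows, each maximizing its min Mahalanobis distance to the picks so far."""
    # Covariance of the candidate pool; the small ridge keeps it invertible.
    cov = np.cov(candidates, rowvar=False) + ridge * np.eye(candidates.shape[1])
    cov_inv = np.linalg.inv(cov)
    selected = [0]  # assumption: start from the first (e.g. most relevant) candidate
    while len(selected) < min(k, len(candidates)):
        best, best_score = None, -1.0
        for i in range(len(candidates)):
            if i in selected:
                continue
            score = min(mahalanobis(candidates[i], candidates[j], cov_inv) for j in selected)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

Using Mahalanobis rather than Euclidean distance accounts for correlated, unevenly scaled embedding dimensions, so near-duplicate demonstrations are penalized even when the raw coordinates differ.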

RESEARCH · arXiv CS.AI · 6d ago

SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models

This paper introduces SMAC-Talk, a natural language extension of the StarCraft Multi-Agent Challenge, designed to evaluate LLM-based agents in cooperative multi-agent environments. It features a natural language communication channel to probe agent coordination and trust, including scenarios with deceptive communicators.

LLMs Natural Language Processing StarCraft multi-agent systems

RESEARCH · arXiv CS.LG · 12d ago

Molecular Lead Optimization via Agentic Tool Planning

This paper introduces TRACE, a trajectory-aware, LLM-reasoning agent for molecular lead optimization, addressing the limitation of one-step molecular optimization. It formulates tool selection as a sequential decision-making problem over action trajectories, crucial for transforming early hit compounds into viable drug candidates. TRACE aims to improve ADMET-related properties through subtle structural refinement while preserving key molecular substructures.

LLMs Molecular Optimization AI in chemistry drug discovery

ARTICLE · DEV.to AI · 25d ago

Word Embeddings Explained: The Math Behind AI, LLMs, and Chatbots

This article explains the concept of word embeddings, which represent words as vectors in a high-dimensional space. It details the key mathematical operations behind their functionality, such as distance, similarity, and dot product, illustrating them with numerical examples.

chatbots LLMs learning AI
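The three operations named in the summary (dot product, similarity, and distance) can be shown on tiny vectors. The 3-dimensional "embeddings" below are made-up numbers for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math


def dot(u: list[float], v: list[float]) -> float:
    """Sum of element-wise products."""
    return sum(a * b for a, b in zip(u, v))


def norm(u: list[float]) -> float:
    """Euclidean length of a vector."""
    return math.sqrt(dot(u, u))


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Dot product of the two vectors after scaling each to unit length."""
    return dot(u, v) / (norm(u) * norm(v))


def euclidean_distance(u: list[float], v: list[float]) -> float:
    """Straight-line distance between two points in embedding space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy embeddings (hypothetical values, chosen so related words point the same way).
king = [0.8, 0.6, 0.1]
queen = [0.7, 0.7, 0.2]
apple = [0.1, 0.2, 0.9]

if __name__ == "__main__":
    print("cos(king, queen) =", round(cosine_similarity(king, queen), 3))
    print("cos(king, apple) =", round(cosine_similarity(king, apple), 3))
```

Here "king" and "queen" get a cosine similarity close to 1 and a small distance, while "king" and "apple" are far apart on both measures.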

ARTICLE · DEV.to AI · 22d ago

Looking for a Founding Engineer / Technical Partner (AI Agent + Fintech Rails)

A startup founder is seeking a senior full-stack technical lead to join as a founding engineer and own the product architecture. The role involves building an AI agent that autonomously ingests and parses legal contracts, extracting deliverables and payment schedules for use in a fintech product.

hiring LLMs FinTech Startup

ARTICLE · DEV.to AI · 5/8/2026

AI Slop Is a Commitment Problem

The article discusses how "AI slop," plausible content generated effortlessly by AI, is harming online communities. It argues that the ability to quickly generate large volumes of text has undermined the value of effort as a proxy for legitimacy and knowledge.

LLMs online-communities digital legitimacy content quality

ARTICLE · DEV.to AI · 4/8/2026

Why Skillware is the Next Evolution for Autonomous Agents

Skillware is introduced as a new Python framework for AI agents, aimed at overcoming the limitations of prompt-based approaches when executing complex business logic. It packages intelligence and capabilities as installable units, defining complex behaviors modularly for greater enterprise reliability.

LLMs frameworks Python Enterprise AI

DOC · DEV.to AI · 28d ago

Build a Medical Chart Coding Pipeline with Daimon, Claude, and Neo4j

Daimon, a Go sidecar, simplifies LLM application development by automating infrastructure like JSON schemas and integration with vector stores and graph databases. It automatically generates LLM tools from configuration, demonstrated by building a medical chart coding pipeline.

LLMs Claude application development Neo4j

ARTICLE · DEV.to AI · 4/18/2026

Why Our LLM-Powered Data Analytics Pipeline in R Broke Down at Scale

This article recounts the breakdown of an LLM-powered R data analytics pipeline that performed well in a proof-of-concept but utterly failed at scale. The story aims to warn and educate about the challenges of integrating large language models into R data workflows in production.

scalability LLMs R programming Production issues