LLM Agents

35 items

RESEARCH · arXiv CS.AI · 4/27/2026

Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results

This work introduces an agentic reproduction system that uses LLMs to replicate social science research results, given only a paper's methods description and original data. Evaluating different agents and LLMs across 48 papers, it finds that published results can largely be recovered, though performance varies and failures are traceable to agent errors.

scientific methods social science research LLM Agents Reproducibility

RESEARCH · arXiv CS.AI · 4/20/2026

The World Leaks the Future: Harness Evolution for Future Prediction Agents

This research addresses the challenge of future prediction using LLM agents, where evidence evolves and useful supervision arrives only after an event is resolved. It introduces "internal feedback" derived from revisiting predictions over time and proposes "Milkyway", a self-evolving agent system that updates a persistent state to enhance prediction accuracy.

LLM Agents future prediction self-evolving agents Agent systems

RESEARCH · arXiv CS.LG · 25d ago

EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

EvolveMem introduces a self-evolving memory architecture for LLM agents that allows both stored knowledge and retrieval mechanisms to co-evolve. It optimizes its configuration autonomously using an LLM-powered diagnosis module, leading to a closed-loop AutoResearch process.

LLM Agents AutoResearch self-evolving systems memory architecture

RESEARCH · arXiv CS.AI · 28d ago

SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents

SkillLens is a hierarchical skill-evolution framework for LLM agents that organizes and reuses skills at mixed granularity. It allows agents to directly reuse compatible subskills while adapting only locally mismatched parts, optimizing cost-efficiency and relevance.

Skill reuse LLM Agents AI frameworks Natural Language Processing

RESEARCH · arXiv CS.AI · 29d ago

From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms

Large Language Model (LLM)-based agents have reshaped artificial intelligence, yet research on memory mechanisms remains fragmented. This survey proposes a novel evolutionary framework for LLM agent memory mechanisms, formalizing the development process into three stages: Storage, Reflection, and Experience.

Evolutionary framework LLM Agents research Memory mechanisms

RESEARCH · arXiv CS.AI · 20d ago

Trustworthy Agent Network: Trust in Agent Networks Must Be Baked In, Not Bolted On

The rise of autonomous LLM-based agents forming Agent-to-Agent (A2A) networks introduces systemic vulnerabilities despite improved task performance. This paper argues that trustworthiness in A2A networks must be architected from the outset, rather than retrofitted, to mitigate risks like adversarial composition and cascading failures.

LLM Agents trustworthiness security agent networks

RESEARCH · arXiv CS.AI · 8d ago

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

This paper disentangles two self-evolving LLM agent capabilities: harness-updating (producing useful updates) and harness-benefit (gaining from these updates). The analysis reveals that harness-updating is surprisingly consistent across models of different base capabilities, suggesting that even less capable models can produce useful updates.

AI capabilities LLM Agents machine learning self-evolution

ARTICLE · DEV.to AI · 4/16/2026

AI Financial Agents Hallucinating With Real Money: How to Build Brokerage-Grade Guardrails

Autonomous LLM agents in finance pose significant risks, as hallucinations can lead to real money losses and regulatory scrutiny. AI orchestration layers must be treated as Tier-1 infrastructure with brokerage-grade guardrails, integrating them into the control environment from day one.

LLM Agents Financial services risk management AI safety

RESEARCH · arXiv CS.AI · 4/6/2026

Aligning Progress and Feasibility: A Neuro-Symbolic Dual Memory Framework for Long-Horizon LLM Agents

The title suggests research on a neuro-symbolic dual memory framework for LLM agents, aimed at aligning progress and feasibility in long-horizon tasks. It addresses improving AI agents' ability to plan and execute complex actions over time.

memory architectures LLMs LLM Agents Neuro-Symbolic

RESEARCH · arXiv CS.AI · 4/6/2026

AIVV: Neuro-Symbolic LLM Agent-Integrated Verification and Validation for Trustworthy Autonomous Systems

This title describes research focused on the verification and validation of trustworthy autonomous systems, using a neuro-symbolic approach integrated with LLM agents. The goal is to ensure the robustness and safety of advanced AI systems.

LLM Agents Autonomous systems Verification and Validation trustworthy AI

RESEARCH · arXiv CS.AI · 21d ago

ANNEAL: Adapting LLM Agents via Governed Symbolic Patch Learning

ANNEAL is a neuro-symbolic agent that repairs recurring LLM agent failures via governed symbolic edits of a process knowledge graph. It localizes the responsible operator, synthesizes a typed patch, and validates it with symbolic guardrails and canary testing before committing the change.

LLM Agents Knowledge Graphs error recovery AI Governance

ARTICLE · DEV.to AI · 4/14/2026

Qwen Models for Hermes Agent — Open-Source Agent Workflows

Qwen3's Apache 2.0 license enables flexible Hermes Agent workflows, permitting fine-tuning, private deployment, and commercial use under the license's permissive terms. The Qwen3 lineup, run locally via Ollama, supports a range of agent use cases on modest hardware and at zero API cost.

Apache 2.0 LLM Agents Hermes Agent open-source AI

NEWS · DEV.to AI · 4/12/2026

LLM Agent Workflows: Local AI Support, Prompt Tooling, & Claude Code API Costs

This roundup covers practical developments in LLM applications: local AI agents for customer support, prompt engineering tools, and Claude Code API costs. It outlines a vision for fully offline, private LLM-based customer support agents on platforms like WhatsApp and Telegram, with an emphasis on data privacy.

prompt-engineering LLM Agents data privacy Local AI

ARTICLE · DEV.to AI · 5/2/2026

Stuck in the Birch Log Blues 🪵😩

This post describes how an AI agent, Kiwi-chan, got stuck in a failure loop while trying to gather birch logs, despite code repair attempts by an LLM, Qwen. The episode highlights the agent's difficulty with self-correction: it kept focusing on immediate fixes instead of recognizing the need to explore.

LLM Agents AI debugging AI failure

RESEARCH · arXiv CS.AI · 4/6/2026

Let's Have a Conversation: Designing and Evaluating LLM Agents for Interactive Optimization

This item covers the design and evaluation of LLM agents for interactive optimization. It explores methods for building conversational AI systems and measuring their effectiveness.

Interactive Optimization LLM Agents evaluation AI design