
LLM Agents

35 items

RESEARCH · arXiv CS.CL · 4/20/2026

PolicyBank: Evolving Policy Understanding for LLM Agents

PolicyBank proposes a novel memory mechanism that lets LLM agents iteratively refine their understanding of organizational policies, addressing ambiguities and gaps through feedback. Unlike existing systems, which treat policies as immutable ground truth, it allows agents to evolve their interpretation. The paper also introduces a systematic testbed for alignment failures.
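
The summary does not describe PolicyBank's actual data structures. As a rough illustration of the general idea it names (policy text held fixed while the agent's interpretation is revised from feedback), here is a minimal Python sketch. `PolicyEntry`, `PolicyMemory`, `refine`, and the refund example are hypothetical and not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PolicyEntry:
    clause: str          # policy text as written; never modified
    interpretation: str  # the agent's current reading; may be revised
    history: list[str] = field(default_factory=list)  # superseded readings


class PolicyMemory:
    """Hypothetical store: fixed clause text, revisable interpretations."""

    def __init__(self) -> None:
        self.entries: dict[str, PolicyEntry] = {}

    def add(self, policy_id: str, clause: str, interpretation: str) -> None:
        self.entries[policy_id] = PolicyEntry(clause, interpretation)

    def refine(self, policy_id: str, corrected: str) -> bool:
        """Apply feedback; return True if the interpretation changed."""
        entry = self.entries[policy_id]
        if corrected == entry.interpretation:
            return False
        entry.history.append(entry.interpretation)  # keep the old reading for auditing
        entry.interpretation = corrected
        return True

    def render(self) -> str:
        """Format current readings for inclusion in an agent prompt."""
        return "\n".join(
            f"[{pid}] {e.clause}\n  current reading: {e.interpretation}"
            for pid, e in self.entries.items()
        )


mem = PolicyMemory()
mem.add("refunds", "Refunds are allowed within 30 days.", "30 days from the order date")
mem.refine("refunds", "30 days from the delivery date")  # e.g. after a reviewer flags an error
print(mem.render())
```

The design choice here is that the clause itself is never rewritten; only the agent's reading changes, and earlier readings are kept so a revision can be audited or rolled back.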

RESEARCH · arXiv CS.AI · 5/4/2026

Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents

This research challenges the assumption that tool-augmented reasoning always improves LLM performance. It shows that tool use can underperform native chain-of-thought (CoT) reasoning because of a "tool-use tax" imposed by the tool-calling protocol, especially in the presence of semantic noise. The authors propose a Factorized Intervention Framework to analyze this effect and introduce G-STEP as a partial mitigation for protocol-induced errors.

RESEARCH · arXiv CS.AI · 4/15/2026

The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break

This research examines why LLM agents break down on long-horizon tasks, which require extended, interdependent sequences of actions. It introduces HORIZON, a cross-domain diagnostic benchmark for systematically constructing such tasks and analyzing failure behaviors, evaluates state-of-the-art agents on it, and proposes an LLM-as-a-Judge pipeline for scalable failure attribution.

RESEARCH · arXiv CS.AI · 5/9/2026

From History to State: Constant-Context Skill Learning for LLM Agents

This paper proposes constant-context skill learning, a novel framework that helps LLM agents handle recurring workflows more efficiently. It addresses privacy, cost, and capability challenges by learning reusable procedures in task-family modules and conditioning inference on a compact state block. Its effectiveness is demonstrated on benchmarks including ALFWorld, WebShop, and SciWorld.
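
The title and summary describe conditioning inference on a compact state block instead of an ever-growing interaction history. The following Python sketch illustrates only that general idea; `StateBlock`, its fields, the eight-fact cap, `build_prompt`, and the "clean-and-place" skill string are hypothetical assumptions, not the paper's design.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StateBlock:
    """Hypothetical compact task state; field names are assumptions."""
    goal: str
    steps_done: int = 0
    facts: dict[str, str] = field(default_factory=dict)
    max_facts: int = 8  # caps the number of facts so the block stays bounded

    def update(self, key: str, value: str) -> None:
        """Record one step's result, evicting the earliest-inserted facts past the cap."""
        self.facts[key] = value
        while len(self.facts) > self.max_facts:
            del self.facts[next(iter(self.facts))]  # dicts preserve insertion order
        self.steps_done += 1

    def render(self) -> str:
        lines = [f"GOAL: {self.goal}", f"STEPS DONE: {self.steps_done}"]
        lines += [f"{k}: {v}" for k, v in self.facts.items()]
        return "\n".join(lines)


def build_prompt(skill: str, state: StateBlock, observation: str) -> str:
    """Reusable procedure + compact state + latest observation; no raw history."""
    return f"{skill}\n\n{state.render()}\n\nOBSERVATION: {observation}"


state = StateBlock(goal="put a clean mug in the cabinet")
for step in range(100):
    state.update(f"obs_{step}", f"result of step {step}")
prompt = build_prompt("SKILL: clean-and-place (hypothetical module)", state, "You see a cabinet.")
print(len(prompt.splitlines()))  # 14, the same for any number of steps >= 8
```

Because the prompt is assembled from a fixed skill text, a capped state block, and only the latest observation, its line count stays constant as the episode grows, whereas replaying the full history would grow linearly with the number of steps.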
