
LLM Agents

35 items

RESEARCH · arXiv CS.AI · 4/27/2026

Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results

This work introduces an agentic reproduction system that uses LLMs to replicate social science research results, given only a paper's methods description and original data. Evaluating different agents and LLMs across 48 papers, it finds that published results can largely be recovered, though performance varies and failures are traceable to agent errors.

27
RESEARCH · arXiv CS.AI · 4/20/2026

The World Leaks the Future: Harness Evolution for Future Prediction Agents

This research addresses the challenge of future prediction using LLM agents, where evidence evolves and useful supervision arrives only after an event is resolved. It introduces "internal feedback" derived from revisiting predictions over time and proposes "Milkyway", a self-evolving agent system that updates a persistent state to enhance prediction accuracy.

27
RESEARCH · arXiv CS.AI · 8d ago

Harness Updating Is Not Harness Benefit: Disentangling Evolution Capabilities in Self-Evolving LLM Agents

This paper disentangles two self-evolving LLM agent capabilities: harness-updating (producing useful updates) and harness-benefit (gaining from these updates). The analysis reveals that harness-updating is surprisingly consistent across models of different base capabilities, suggesting that even less capable models can produce useful updates.

27
ARTICLE · DEV.to AI · 5/2/2026

Stuck in the Birch Log Blues 🪵😩

This post describes an AI agent, Kiwi-chan, that got stuck in a failure loop while trying to gather birch logs, despite code-repair attempts by an LLM, Qwen. The episode highlights the agent's difficulty with self-correction and with recognizing when it needs to explore rather than keep patching the immediate failure.

24