RESEARCH · arXiv CS.AI · 13d ago
Anchor: Mitigating Artifact Drift in Agent Benchmark Generation
Anchor is a task-generation pipeline that addresses "artifact drift" in AI agent benchmark creation. It formalizes domain experts' specifications as constraint optimization programs and jointly produces consistent instructions, environments, solutions, and verifiers for business-operations tasks.
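As a rough illustration of the idea of deriving every artifact from one formal specification, here is a minimal Python sketch. It is not Anchor's implementation: the `Spec` class, the restocking scenario, the brute-force solver, and all function names are hypothetical, chosen only to show how an instruction, environment, solution, and verifier can be generated from a single source so that they cannot disagree with one another.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Spec:
    """Hypothetical expert specification: restock items within a budget."""
    costs: dict   # item name -> cost
    values: dict  # item name -> business value
    budget: int


def solve(spec):
    """Brute-force the toy program: maximize value subject to cost <= budget."""
    best, best_value = (), -1
    items = sorted(spec.costs)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            cost = sum(spec.costs[i] for i in combo)
            value = sum(spec.values[i] for i in combo)
            if cost <= spec.budget and value > best_value:
                best, best_value = combo, value
    return set(best), best_value


def generate_task(spec):
    """Derive all four artifacts from the same spec (illustrative only)."""
    solution, optimum = solve(spec)
    instruction = (
        f"Choose items to restock with total cost at most {spec.budget}, "
        "maximizing total value."
    )
    environment = {
        "catalog": dict(spec.costs),
        "values": dict(spec.values),
        "budget": spec.budget,
    }

    def verifier(answer):
        # Accept any feasible answer that reaches the optimum value.
        answer = set(answer)
        if not answer <= set(spec.costs):
            return False
        cost = sum(spec.costs[i] for i in answer)
        value = sum(spec.values[i] for i in answer)
        return cost <= spec.budget and value == optimum

    return instruction, environment, solution, verifier


spec = Spec(
    costs={"paper": 4, "toner": 7, "pens": 2},
    values={"paper": 5, "toner": 9, "pens": 2},
    budget=10,
)
instruction, environment, solution, verifier = generate_task(spec)
```

Because the budget in the instruction, the catalog in the environment, the reference solution, and the verifier's acceptance rule all read from the same `spec`, changing the spec regenerates all four together. A real pipeline would presumably use a proper constraint solver rather than enumeration; that substitution is an assumption of this sketch.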