Agentic Frameworks for Reasoning Tasks: An Empirical Study

arXiv cs.AI · April 21, 2026

This empirical study evaluates 22 agentic frameworks on three reasoning benchmarks (BBH, GSM8K, and ARC), comparing their performance, efficiency, and practical suitability. Of the 22 frameworks, 19 completed all tasks, and 12 showed stable performance: accuracy of 74.6-75.9%, execution times of 4-6 seconds, and costs of 0.14-0.18 cents per task.
