performance tracking — AI articles, news & research

ARTICLE · DEV.to AI · 10d ago

I tracked Claude Code and Codex pass-rates for 95 days — what "getting dumber" actually looks like

This article tracks daily SWE-Bench-Pro pass rates for Claude Code and Codex over 95 days, using the data to debunk the "getting dumber" myth. It reports an 11-percentage-point gain in task completion from Opus 4.6 (54%) to Opus 4.7 (65%), a material improvement between model versions.

AI models · performance tracking · SWE-Bench-Pro · Claude Code