ARTICLE · DEV.to AI · 10d ago
I tracked Claude Code and Codex pass-rates for 95 days — what "getting dumber" actually looks like
This article tracks daily SWE-Bench-Pro pass rates for Claude Code and Codex over 95 days and uses that data to test the claim that the models are "getting dumber." It reports an 11-percentage-point improvement in task completion from Opus 4.6 (54%) to Opus 4.7 (65%), which points to a material gain rather than a decline.
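The gap between the two pass rates is stated in percentage points, which is easy to confuse with a relative percentage change. A minimal sketch of the arithmetic, using only the two figures quoted in the summary (the function and variable names are hypothetical, chosen for illustration):

```python
def percentage_point_change(before_pct: float, after_pct: float) -> float:
    """Absolute difference between two rates, in percentage points."""
    return after_pct - before_pct


def relative_change_pct(before_pct: float, after_pct: float) -> float:
    """Change expressed as a percentage of the starting rate."""
    return (after_pct - before_pct) / before_pct * 100.0


# Pass rates as quoted in the summary (assumed to be whole-number percentages).
opus_4_6 = 54.0
opus_4_7 = 65.0

pp = percentage_point_change(opus_4_6, opus_4_7)
rel = relative_change_pct(opus_4_6, opus_4_7)

print(f"{pp:.0f} percentage points, {rel:.1f}% relative to the starting rate")
```

The same jump therefore reads as 11 percentage points in absolute terms, or roughly a fifth more tasks completed relative to the earlier rate.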