GPT-5.4 Fails Client-Ready Test: 0% Pass Rate in Banking Benchmark

DEV.to AI · April 26, 2026

A new benchmark, BankerToolBench, found that top AI models such as GPT-5.4 and Claude Opus 4.6 failed to produce client-ready work on tasks typical of a junior investment banker. Although GPT-5.4 led the models tested, it still failed nearly half of the benchmark's criteria, which points to significant limitations in complex professional applications.

AI limitations · Financial services professional tasks · Benchmarking · Generative AI
