
LABBench2 Benchmark Shows AI Biology Agents Struggle with Real-World Tasks

DEV.to AI · April 15, 2026

Researchers have released LABBench2, a 1,900-task benchmark for AI in biology, revealing that current models perform 26-46% worse on realistic tasks. This exposes a critical gap between AI's theoretical knowledge and its ability to carry out practical scientific work.

LABBench2, AI limitations, scientific AI agents, AI in biology, benchmarking AI
