LABBench2 — AI articles, news & research

Article · DEV.to AI · 4/15/2026

LABBench2 Benchmark Shows AI Biology Agents Struggle with Real-World Tasks

Researchers introduced LABBench2, a new 1,900-task benchmark for AI in biology, showing current models perform 26-46% worse on realistic tasks versus simplified ones. This exposes a critical gap between AI's theoretical understanding and its ability to perform practical scientific work.

LABBench2 · AI limitations · scientific AI agents · AI in biology