RESEARCH · arXiv CS.AI · 4/14/2026
LABBench2: An Improved Benchmark for AI Systems Performing Biology Research
LABBench2 is a benchmark for evaluating AI systems that perform biology research, and it succeeds the original LAB-Bench. It comprises nearly 1,900 tasks and aims to measure real-world capability on useful scientific tasks, going beyond basic knowledge and reasoning.