LABBench2: An Improved Benchmark for AI Systems Performing Biology Research
LABBench2 is an improved benchmark for evaluating AI systems that perform biology research. It evolves from the original LAB-Bench and comprises nearly 1,900 tasks. Rather than testing only basic knowledge and reasoning, it aims to measure real-world capability on useful scientific tasks.
