
WorldBench: Top MLLM Scores 64% on Visually Diverse Benchmark

DEV.to AI · June 8, 2026

WorldBench, a new multimodal benchmark from MIT researchers, evaluates 15 multimodal large language models (MLLMs) on visually diverse images. The top model scores only 64.0% accuracy, which points to fundamental gaps in visual understanding. To expose these shortcomings, the benchmark prioritizes diversity of visual content over diversity of task types.
