RESEARCH · DEV.to AI · 1d ago
WorldBench: Top MLLM Scores 64% on Visually Diverse Benchmark
WorldBench, a new multimodal benchmark from MIT researchers, evaluates 15 multimodal large language models (MLLMs) on visually diverse images. The top model scores only 64.0% accuracy, which points to fundamental gaps in visual understanding. The benchmark prioritizes diversity of visual content over variety of task types in order to expose these shortcomings.