Do Vision-Language-Models show human-like logical problem-solving capability in point and click puzzle games?

arXiv cs.AI · May 13, 2026

This paper introduces VLATIM, a benchmark that evaluates whether Vision-Language Models (VLMs) show human-like logical problem-solving in point-and-click physics puzzle games. It reports a significant gap between reasoning and execution in large proprietary models when they attempt puzzles from The Incredible Machine 2.

puzzle games Vision-Language Models interactive AI Benchmarking AI Reasoning
