
PRISM: Perception Reasoning Interleaved for Sequential Decision Making

arXiv CS.AI · May 9, 2026

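PRISM is a framework that couples perception, handled by a vision-language model (VLM), with decision making, handled by a large language model (LLM), through a dynamic question-answer pipeline. Rather than accepting a single fixed scene description, the LLM actively questions the VLM and refines its output, so that scene understanding is driven by the task at hand. The approach is reported to significantly outperform existing image-based models on benchmarks such as ALFWorld and Room-to-Room.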
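
The question-answer loop described above can be illustrated with a minimal sketch. It is not PRISM's implementation: the `decide` function, the `ACT:` reply convention, the `max_rounds` budget, and the toy `toy_vlm` and `toy_llm` stand-ins are all hypothetical names introduced here, and a real system would call actual VLM and LLM models in their place.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces, assumed for illustration only.
QA = Tuple[str, str]                    # (question, answer)
AskVLM = Callable[[str, str], str]      # (image_id, question) -> answer
AskLLM = Callable[[str, List[QA]], str]  # (task, history) -> question or "ACT: <action>"

ACT_PREFIX = "ACT:"


def decide(task: str, image_id: str, ask_vlm: AskVLM, ask_llm: AskLLM,
           max_rounds: int = 3) -> Tuple[str, List[QA]]:
    """Let the LLM question the VLM until it commits to an action."""
    history: List[QA] = []
    for _ in range(max_rounds):
        reply = ask_llm(task, history)
        if reply.startswith(ACT_PREFIX):
            # The LLM has enough scene information and chooses an action.
            return reply[len(ACT_PREFIX):].strip(), history
        # Otherwise the reply is a question; the VLM answers it from the image.
        history.append((reply, ask_vlm(image_id, reply)))
    # Question budget exhausted: ask once more and take the reply as the action.
    reply = ask_llm(task, history)
    if reply.startswith(ACT_PREFIX):
        reply = reply[len(ACT_PREFIX):]
    return reply.strip(), history


def toy_vlm(image_id: str, question: str) -> str:
    """Stand-in for a VLM: answers from a fixed lookup table."""
    facts = {"Where is the mug?": "The mug is on the desk."}
    return facts.get(question, "I cannot tell.")


def toy_llm(task: str, history: List[QA]) -> str:
    """Stand-in for an LLM: asks one question, then acts."""
    if not history:
        return "Where is the mug?"
    return "ACT: go to desk"


action, history = decide("find the mug", "frame_0", toy_vlm, toy_llm)
print(action)
print(history)
```

In this toy run the LLM stand-in asks a single question before acting. The point of the sketch is only the control flow: the decision model determines which perceptual details are requested, instead of receiving one fixed caption up front.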
