RESEARCH
PRISM: Perception Reasoning Interleaved for Sequential Decision Making
arXiv CS.AI · May 9, 2026
PRISM is a framework that couples a vision-language model (VLM) for perception with a large language model (LLM) for decision making through a dynamic question-answer pipeline, in which the LLM actively queries and refines the VLM's output to build a task-driven understanding of the scene. The authors report that this approach significantly outperforms existing image-based models on benchmarks such as ALFWorld and Room-to-Room.
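The question-answer loop described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the `decide` function, the `must_act` flag, the `max_rounds` budget, and the toy stand-ins for the VLM and LLM are all assumptions made here for illustration. It only shows the general shape of an LLM asking a VLM follow-up questions before committing to an action.

```python
from typing import Callable, List, Tuple

QA = Tuple[str, str]  # (question, answer)

# Hypothetical interfaces, not taken from the paper:
# the VLM answers a text question about an image; the LLM either asks a
# follow-up question or commits to an action, given the dialogue so far.
VLM = Callable[[str, str], str]                         # (image, question) -> answer
LLM = Callable[[str, List[QA], bool], Tuple[str, str]]  # (task, history, must_act) -> (kind, text)


def decide(task: str, image: str, vlm: VLM, llm: LLM,
           max_rounds: int = 3) -> Tuple[str, List[QA]]:
    """Interleave LLM questions and VLM answers until the LLM acts."""
    history: List[QA] = []
    for _ in range(max_rounds):
        kind, text = llm(task, history, False)
        if kind == "act":
            return text, history
        # The LLM asked a question: let the VLM answer it from the image.
        history.append((text, vlm(image, text)))
    # Question budget exhausted: require the LLM to commit to an action.
    _, action = llm(task, history, True)
    return action, history


# Toy stand-ins so the sketch runs without any model (assumed behavior).
def toy_vlm(image: str, question: str) -> str:
    facts = {"Where is the mug?": "on the counter", "Is the mug clean?": "no"}
    return facts.get(question, "unknown")


def toy_llm(task: str, history: List[QA], must_act: bool) -> Tuple[str, str]:
    questions = ["Where is the mug?", "Is the mug clean?"]
    if must_act or len(history) >= len(questions):
        return "act", "go to counter"
    return "ask", questions[len(history)]


action, history = decide("clean the mug", "frame_0", toy_vlm, toy_llm)
print(action, history)
```

In a real system the two callables would wrap model calls; the loop itself only fixes who asks, who answers, and when the questioning stops.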