RESEARCH
MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes
arXiv cs.CL · May 11, 2026
This paper introduces MIST, a synthetic multi-turn, voice-driven code generation dataset for IoT devices. The authors identify a significant performance gap between open- and closed-weight multimodal LLMs on this dataset, indicating substantial room for improvement.