
MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes

arXiv CS.CL · May 11, 2026

This paper introduces MIST, a synthetic multi-turn, voice-driven code generation dataset for IoT devices. The authors identify a significant performance gap between open- and closed-weight multimodal LLMs on this dataset, indicating substantial room for improvement.
