RESEARCH · arXiv CS.CL · 29d ago
MIST: Multimodal Interactive Speech-based Tool-calling Conversational Assistants for Smart Homes
This paper introduces MIST, a synthetic multi-turn, voice-driven code generation dataset for IoT devices. The authors identify a significant performance gap between open- and closed-weight multimodal LLMs on this dataset, indicating substantial room for improvement.