RESEARCH27
Your Multimodal Speech Model Says I Have a Face for Radio
arXiv CS.CL · June 1, 2026
This paper presents the first bias evaluation of multimodal speech recognition, revealing significant quality-of-service differences in mWhisper-Flamingo and Gemini models depending on speakers' self-declared gender and ethnicity. The findings make it a priority for developers to evaluate, fix, and communicate such biases.