RESEARCH · arXiv CS.CL · 8d ago
Your Multimodal Speech Model Says I Have a Face for Radio
This paper presents the first bias evaluation of multimodal speech recognition, revealing significant quality-of-service differences by self-declared gender and ethnicity across mWhisper-Flamingo and Gemini models. The findings make it a priority for developers to evaluate, fix, and communicate such biases.