← heapsort

Your Multimodal Speech Model Says I Have a Face for Radio

arXiv CS.CL · June 1, 2026

This paper presents the first bias evaluation of multimodal speech recognition, revealing significant quality-of-service differences by speakers' self-declared gender and ethnicity across mWhisper-Flamingo and Gemini models. The findings make it a priority for developers to evaluate, fix, and communicate such biases.
