Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs
This paper investigates how Audio LLMs fail when transcribing English-Mandarin code-switched speech, identifying failure modes such as omitting one of the languages and translating a segment instead of transcribing it. Applying Direct Preference Optimization (DPO) aligns the models to preserve mixed-language content, which significantly reduces the Mixed Error Rate (MER).
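To make the two technical terms concrete, the following is a minimal sketch, not the paper's implementation. It assumes MER is computed in the common way for English-Mandarin data (edit distance over a mixed token sequence, with one token per Mandarin character and one per English word), and it uses the standard DPO objective on sequence log-probabilities. The function names, the tokenization regex, the `beta` default, and the example sentences are all illustrative assumptions. The summary does not state how preference pairs are built, so the sketch assumes the preferred transcript keeps both languages and the rejected one translates or drops one of them.

```python
import math
import re

# Assumption: one token per CJK character, one token per run of Latin
# letters, digits, or apostrophes. Punctuation and spaces are ignored.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def mixed_tokens(text):
    """Split text into Mandarin characters and lowercased English words."""
    return [tok.lower() for tok in _TOKEN.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(reference, hypothesis):
    """Edit distance over mixed tokens, normalized by reference length."""
    ref = mixed_tokens(reference)
    if not ref:
        raise ValueError("reference contains no tokens")
    return edit_distance(ref, mixed_tokens(hypothesis)) / len(ref)


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Arguments are sequence log-probabilities under the policy being trained
    and under the frozen reference model. Returns -log(sigmoid(margin)).
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical example: the hypothesis translates the English words.
reference = "我想 order 一杯 coffee"
translated = "我想点一杯咖啡"
print(mixed_error_rate(reference, translated))  # 0.5
```

In this example the reference has six mixed tokens, and the translated hypothesis needs two substitutions and one insertion, which gives an MER of 0.5. A transcript that keeps "order" and "coffee" in English scores 0. The DPO loss falls as the policy assigns relatively more probability to the preferred (code-switch-preserving) transcript than the reference model does.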