
Speech Recognition

18 items

RESEARCH · arXiv CS.CL · 4/10/2026

Contextual Earnings-22: A Speech Recognition Benchmark with Custom Vocabulary in the Wild

Although accuracy on academic speech-to-text benchmarks has plateaued, industrial applications demand better recognition of rare and contextual vocabulary. This paper introduces Contextual Earnings-22, a new dataset and benchmark intended to advance research and reveal progress in contextual speech recognition with custom vocabulary.

29
RESEARCH · arXiv CS.CL · 5/1/2026

Selective Augmentation: Improving Universal Automatic Phonetic Transcription via G2P Bootstrapping

This research proposes Selective Augmentation, a bootstrapping method that improves universal automatic phonetic transcription (APT) by selectively transferring linguistic distinctions, addressing the scarcity of high-quality training data. Demonstrated on the MultIPA model, the approach improved plosive voicing accuracy by 17.6% and introduced aspiration recognition using data augmented from a helper language such as Hindi.

28
ARTICLE · DEV.to AI · 4/15/2026

Local Voice Controlled AI Agent

This article describes a self-built, locally run voice-controlled AI agent that acts directly on the user's machine rather than only conversing. It can create files, generate code, open applications, and browse websites, narrowing the gap between a spoken intent and its execution on the computer.

27
RESEARCH · arXiv CS.CL · 14d ago

Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs

This paper investigates failures in Audio LLMs when transcribing English-Mandarin code-switching speech, identifying issues like language omission and translation. Applying Direct Preference Optimization (DPO) aligns models to preserve mixed-language content, leading to significant reductions in Mixed Error Rate (MER).

27