ASR

11 items

RESEARCH · Hugging Face Blog · 22h ago

Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

This post benchmarks frontier Automatic Speech Recognition (ASR) systems and voice agents on code-switched speech from bilingual customers, evaluating how well they understand and respond to utterances that mix languages.

Code-Switching Voice Agents benchmarking Bilingual Speech

NEWS ↑ trending · Reddit r/LocalLLaMA · 4/12/2026

mtmd: qwen3 audio support (qwen3-omni and qwen3-asr)

llama.cpp's multimodal library (mtmd) now supports audio input for two Qwen3 model families: `qwen3-omni-moe` (multimodal, with vision and audio input) and `qwen3-asr` (automatic speech recognition). GGUF models for Qwen3-Omni (30B variants) and Qwen3-ASR (1.7B and 0.6B) are available on Hugging Face for community use.

multimodal AI audio GGUF Qwen3


ARTICLE ↑ trending · Reddit r/MachineLearning · 4/10/2026

Building a chatbot with ASR [P]

A developer is looking for the best ASR approach for integrating speech-to-text into a chatbot. Budget and security constraints push them toward self-hosted models such as Whisper rather than external APIs, and they ask for insights on the trade-offs between local models and APIs, performance, and ease of deployment for an MVP launch.

self-hosted AI Whisper Chatbot Speech-to-Text
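One way to keep the local-versus-API decision reversible for an MVP is to hide speech-to-text behind a small interface, so a self-hosted Whisper backend and a hosted API can be swapped without touching the chatbot logic. The sketch below assumes the `openai-whisper` package (`whisper.load_model`, `model.transcribe`) for the local backend; the `Transcriber`, `LocalWhisperTranscriber`, and `VoiceChatbot` names are hypothetical, and no API backend is shown.

```python
from typing import Callable, Protocol


class Transcriber(Protocol):
    """Any speech-to-text backend: self-hosted model or external API client."""

    def transcribe(self, audio_path: str) -> str: ...


class LocalWhisperTranscriber:
    """Self-hosted backend built on the openai-whisper package (needs ffmpeg)."""

    def __init__(self, model_name: str = "base") -> None:
        import whisper  # imported lazily so the rest of the sketch runs without it

        self._model = whisper.load_model(model_name)

    def transcribe(self, audio_path: str) -> str:
        # transcribe() returns a dict; the full transcript is under "text".
        return self._model.transcribe(audio_path)["text"].strip()


class VoiceChatbot:
    """Chatbot that depends only on the Transcriber interface."""

    def __init__(self, transcriber: Transcriber, reply: Callable[[str], str]) -> None:
        self._transcriber = transcriber
        self._reply = reply

    def handle_voice(self, audio_path: str) -> str:
        text = self._transcriber.transcribe(audio_path)
        if not text.strip():
            # Empty transcripts (silence, noise) get a fallback instead of a reply call.
            return "Sorry, I didn't catch that. Could you repeat?"
        return self._reply(text)
```

In tests or early demos, a stub transcriber can stand in for either backend; moving to a hosted API later means writing one more class with the same `transcribe` method.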

DOC · Hugging Face Blog · 6d ago

How to Fine-Tune Nemotron 3.5 ASR for Your Language, Domain, or Accent

A guide to fine-tuning the Nemotron 3.5 Automatic Speech Recognition (ASR) model so that it performs better on a specific language, domain, or accent.

learning Nemotron 3.5 AI ASR
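Fine-tuning data for NVIDIA's NeMo-based ASR models is commonly described in a JSON-lines manifest, one utterance per line with `audio_filepath`, `duration` (seconds), and `text` fields. Assuming the Nemotron 3.5 guide follows this NeMo convention (check the guide for the exact fields and any text normalization it expects), a minimal manifest writer using only the standard library might look like this; `write_manifest` is a hypothetical helper.

```python
import json
from typing import Iterable, Tuple

# (audio path, duration in seconds, reference transcript)
Utterance = Tuple[str, float, str]


def write_manifest(utterances: Iterable[Utterance], manifest_path: str) -> int:
    """Write a NeMo-style JSON-lines manifest; returns the number of lines written."""
    written = 0
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_filepath, duration, text in utterances:
            text = " ".join(text.split())  # collapse stray whitespace
            if duration <= 0 or not text:
                continue  # skip empty or broken entries rather than train on them
            record = {
                "audio_filepath": str(audio_filepath),
                "duration": float(duration),
                "text": text,
            }
            # ensure_ascii=False keeps non-Latin scripts readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```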

ARTICLE · DEV.to AI · 4/19/2026

The Unit Economics of Speech-to-Text Just Collapsed

The article argues that the unit economics of speech-to-text have collapsed: efficient models such as Distil-Whisper can run locally on CPUs at near-zero marginal cost, yet cloud ASR pricing remains high. Projects such as whisper.cpp make capable inference feasible without expensive cloud GPUs, challenging existing service models.

Open Source AI cloud computing Speech-to-Text unit economics
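The argument reduces to simple arithmetic: the marginal compute cost of one transcribed audio hour is the machine's hourly cost times the fraction of a machine-hour that audio consumes (real-time factor divided by concurrent streams), compared with a per-minute cloud price times 60. The sketch below makes that comparison explicit; all inputs are hypothetical placeholders, not figures from the article, and it ignores fixed costs such as engineering time and idle capacity.

```python
from dataclasses import dataclass


@dataclass
class LocalDeployment:
    """Hypothetical self-hosted setup; plug in your own measurements."""

    hourly_machine_cost: float  # cost of one machine-hour (any currency)
    real_time_factor: float  # processing time / audio duration, per stream (0.1 = 10x real time)
    concurrent_streams: int = 1  # streams the machine sustains at that real-time factor


def local_cost_per_audio_hour(d: LocalDeployment) -> float:
    """Marginal compute cost of transcribing one hour of audio locally."""
    if d.real_time_factor <= 0 or d.concurrent_streams < 1:
        raise ValueError("real_time_factor must be > 0 and concurrent_streams >= 1")
    # One audio hour ties up one stream for real_time_factor hours;
    # the machine-hour is shared across all concurrent streams.
    machine_hours = d.real_time_factor / d.concurrent_streams
    return d.hourly_machine_cost * machine_hours


def cloud_cost_per_audio_hour(price_per_minute: float) -> float:
    """Per-minute API pricing converted to a per-audio-hour cost."""
    return price_per_minute * 60
```

For example, a hypothetical $0.10/hour CPU instance running four streams at a real-time factor of 0.2 comes to about $0.005 per audio hour, versus $0.60 for a hypothetical API priced at $0.01 per minute.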

RESEARCH · arXiv CS.CL · 5/6/2026

The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail

This paper introduces a self-contained TTS-STT flywheel to close the gap in niche-domain Indic ASR where commercial and open-source systems fail. It synthesizes entity-dense audio to significantly improve the Entity-Hit-Rate on challenging datasets for languages like Telugu.

Indic languages Machine Learning TTS ASR
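The summary does not spell out how Entity-Hit-Rate is computed. One plausible reading, assumed here rather than taken from the paper, is the fraction of reference entities (names, places, drug names and so on) that appear verbatim in the ASR transcript after light normalization. `entity_hit_rate` is a hypothetical helper illustrating that definition.

```python
import string
from typing import Iterable, List, Tuple


def _normalize(text: str) -> str:
    """Lowercase, split on whitespace, and strip surrounding ASCII punctuation.

    Splitting on whitespace keeps Indic combining vowel signs attached to their words.
    """
    tokens = (tok.strip(string.punctuation) for tok in text.lower().split())
    return " ".join(tok for tok in tokens if tok)


def entity_hit_rate(samples: Iterable[Tuple[str, List[str]]]) -> float:
    """samples: (ASR hypothesis, reference entities in that utterance) pairs."""
    hits = total = 0
    for hypothesis, entities in samples:
        # Pad with spaces so "ram" does not match inside "program".
        padded = f" {_normalize(hypothesis)} "
        for entity in entities:
            norm = _normalize(entity)
            if not norm:
                continue  # ignore entities that are pure punctuation
            total += 1
            if f" {norm} " in padded:
                hits += 1
    return hits / total if total else 0.0
```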

DOC · DEV.to AI · 4/18/2026

Transcription Glossary: 25+ Terms You Need to Know

This glossary defines more than 25 essential transcription and speech recognition terms, such as word error rate (WER) and speaker diarization, demystifying jargon from speech science, machine learning, and audio engineering for users of AI tools.

glossary audio-engineering Machine Learning ASR
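Word error rate counts the word substitutions (S), deletions (D), and insertions (I) needed to turn the hypothesis into the reference, divided by the number of reference words (N): WER = (S + D + I) / N. A minimal dynamic-programming sketch follows; it applies no normalization (casing, punctuation, numerals), which real evaluations usually standardize first and which can shift WER noticeably.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,  # insertion: extra word in hypothesis
                prev[j - 1] + (r != h),  # substitution (or match at no cost)
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count, WER can exceed 100% when a system hallucinates many extra words.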

RESEARCH · arXiv CS.CL · 4/16/2026

A Proactive EMR Assistant for Doctor-Patient Dialogue: Streaming ASR, Belief Stabilization, and Preliminary Controlled Evaluation

This paper introduces a proactive EMR assistant for doctor-patient dialogue, designed to overcome limitations of passive systems by integrating streaming ASR, belief stabilization, and action planning. The system was evaluated in a preliminary controlled setting, achieving an F1 of 0.84 and Recall@5 of 0.87.

Natural Language Processing ASR healthcare AI medical AI
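The summary reports F1 and Recall@5 without defining the evaluation protocol. Under the standard definitions, assumed here, Recall@k is the share of relevant items that appear in a system's top-k suggestions, and F1 is the harmonic mean of precision and recall. The helpers below are illustrative, not the paper's evaluation code.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of relevant items found among the first k ranked suggestions."""
    if not relevant:
        raise ValueError("relevant must be non-empty")
    return len(set(ranked[:k]) & relevant) / len(relevant)


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall from raw counts."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)
```

Per-query Recall@k values are usually averaged across queries to produce a single reported figure.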

RESEARCH · arXiv CS.CL · 21d ago

Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

This paper introduces a new benchmark for evaluating commercial Automatic Speech Recognition (ASR) systems on code-switching speech. It assesses five ASR providers across four language pairs, including Arabic-English, Persian-English, and German-English, selecting test data with a two-stage pipeline.

Code-Switching benchmarking ASR multilingual
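The summary does not describe the two-stage selection pipeline. As an illustration of the kind of signal such a pipeline might use (an assumption, not the paper's method), the widely used Code-Mixing Index (CMI) scores how mixed an utterance is from per-token language tags: CMI = 100 × (1 - max_i w_i / (n - u)), where n is the token count, u the number of language-independent tokens, and w_i the number of tokens in language i. The tags are assumed to come from some word-level language identifier; `code_mixing_index` and `switch_points` are hypothetical helpers.

```python
from collections import Counter
from typing import Sequence


def code_mixing_index(tags: Sequence[str], neutral: str = "other") -> float:
    """CMI in [0, 100): 0 for monolingual text, higher for more even mixing."""
    lang_counts = Counter(t for t in tags if t != neutral)
    tagged = sum(lang_counts.values())  # n - u: tokens with a language tag
    if tagged == 0:
        return 0.0  # only language-independent tokens (or empty input)
    return 100.0 * (1 - max(lang_counts.values()) / tagged)


def switch_points(tags: Sequence[str], neutral: str = "other") -> int:
    """Number of language changes between consecutive language-tagged tokens."""
    langs = [t for t in tags if t != neutral]
    return sum(1 for prev, cur in zip(langs, langs[1:]) if prev != cur)
```

A selection stage could, for instance, keep only utterances whose CMI or switch count exceeds a threshold, so the benchmark is not dominated by nearly monolingual clips.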

CASE · Together AI Blog · 12d ago

How Together AI built the world’s fastest speech-to-text stack

Together AI describes how it built what it reports as the fastest speech-to-text stack on Artificial Analysis. The key was treating ASR as a full-path systems problem rather than solely a GPU inference problem.

AI systems Speech-to-Text Together AI ASR

RESEARCH · Hugging Face Blog · 5/6/2026

Adding Benchmaxxer Repellant to the Open ASR Leaderboard

This post announces the addition of Benchmaxxer Repellant to the Open ASR Leaderboard, a change intended to make evaluations of automatic speech recognition systems more robust and fair.

AI models evaluation benchmarking ASR