TTS

14 items

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/15/2026

[P] Added 8 Indian languages to Chatterbox TTS via LoRA — 1.4% of parameters, no phoneme engineering

A project successfully added eight Indian languages (Telugu, Kannada, Bengali, Tamil, Malayalam, Marathi, Gujarati, and Hindi) to the Chatterbox-Multilingual TTS model using LoRA adapters and tokenizer extension. This approach trained only 1.4% of the model's parameters, avoiding the complex phoneme engineering typically required for each language.

Multilingual AI Chatterbox TTS LoRA
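
To give a feel for why LoRA trains only a small share of a model, here is a minimal sketch of the parameter arithmetic. The layer shapes and the rank below are hypothetical values chosen for illustration, not the Chatterbox configuration, and the sketch ignores any parameters added by tokenizer extension, so it is not expected to reproduce the post's 1.4% figure.

```python
def lora_trainable_fraction(layer_shapes, rank):
    """Return the share of parameters that LoRA adapters add and train.

    layer_shapes: list of (d_in, d_out) for each adapted linear layer.
    rank: LoRA rank r; each adapter adds two matrices, (d_in x r) and (r x d_out).
    Only the adapted layers are counted; other frozen weights are ignored.
    """
    base = sum(d_in * d_out for d_in, d_out in layer_shapes)   # frozen weights
    lora = sum(rank * (d_in + d_out) for d_in, d_out in layer_shapes)  # trained
    return lora / (base + lora)


# Hypothetical model: 24 blocks, 4 square projections of width 1024 per block.
shapes = [(1024, 1024)] * (24 * 4)
fraction = lora_trainable_fraction(shapes, rank=8)
print(f"trainable share: {fraction:.2%}")
```

Under these assumed shapes the trainable share lands in the low single-digit percent range, the same order of magnitude the post reports.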

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/22/2026

Qwen3 TTS is seriously underrated - I got it running locally in real-time and it's one of the most expressive open TTS models I've tried

The author revisited an old real-time, local ASR->LLM->TTS pipeline project and was pleasantly surprised by Qwen3 TTS. After significant experimentation, they managed to get Qwen3 TTS working reliably for local streaming, praising its expressiveness and suitable architecture.

Open Source Qwen3 TTS real-time local inference

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/8/2026

New TTS Model: VoxCPM2

VoxCPM2 is a new text-to-speech (TTS) model that offers three speech generation modes: voice design, controllable cloning, and ultimate cloning. It achieves state-of-the-art results on major TTS benchmarks, making it a robust tool for speech synthesis and for reproducing vocal nuance.

Voice Cloning machine learning Speech Generation TTS

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/10/2026

making my own ai waifu app that can teach me any language.

A developer built an AI 'waifu' app for language learning using Gemma-4, Omnivoice TTS, and 3D modeling. The app, which includes features such as voice and video calls, impressed its creator with Gemma-4's ability to follow prompts without censorship.

App Development 3D modeling TTS AI

ARTICLE · DEV.to AI · 4/15/2026

Choosing the Right Voice: A Technical Comparison of Pocket Studio Models

The article compares three distinct Text-to-Speech (TTS) engines within Pocket Studio (Pocket TTS, XTTS-v2, and Qwen3-TTS) that run locally on a CPU. It details their trade-offs in terms of speed, multi-language support, and voice quality to help users select the appropriate model for their project requirements.

model comparison TTS Local AI CPU Inference

RESEARCH · arXiv CS.CL · 5/6/2026

The TTS-STT Flywheel: Synthetic Entity-Dense Audio Closes the Indic ASR Gap Where Commercial and Open-Source Systems Fail

This paper introduces a self-contained TTS-STT flywheel to close the gap in niche-domain Indic ASR where commercial and open-source systems fail. It synthesizes entity-dense audio to significantly improve the Entity-Hit-Rate on challenging datasets for languages like Telugu.

Indic languages machine learning TTS ASR

ARTICLE · DEV.to AI · 4/15/2026

How to prompt Gemini 3.1's new text to speech model

Gemini 3.1 Flash TTS is a new text-to-speech model that lets users direct the audio performance precisely through prompting. This article offers tips on guiding the model with context such as audio profiles, scene descriptions, and tags that control delivery.

AI models Prompting Gemini 3.1 Flash TTS TTS

DOC · DEV.to AI · 5/3/2026

🐱 Kitten TTS — A Lightweight Text-to-Speech Model with Live GUI

Kitten TTS is a lightweight text-to-speech model. It features a live graphical user interface.

AI models speech synthesis TTS GUI

DOC · DEV.to AI · 5/2/2026

Gemini 3.1: Native TTS for Easier, More Powerful Summary Reading

Google has released Gemini 3.1 Flash TTS, a native text-to-speech model that simplifies audio output. This article details how to upgrade a LINE Bot's TTS function to use this new version, overcoming complexities and limitations of previous implementations.

Gemini API TTS AI development

ARTICLE · DEV.to AI · 4/11/2026

I Built an Easy-to-Use Local TTS with Google Colab Support

This post introduces an easy-to-use local text-to-speech (TTS) tool with Google Colab support. The project aims to simplify building AI applications, automation, and accessibility features without requiring complex setups or powerful hardware.

Google Colab AI TTS Development

DOC · DEV.to AI · 4/18/2026

Build a Voice OTP System: Phone-Based Two-Factor Authentication in 10 Minutes

This tutorial shows how to build a Voice OTP system, presented as a more secure alternative to SMS-based two-factor authentication. An AI voice reads the one-time code aloud over a phone call, which avoids SMS vulnerabilities such as SIM-swapping and SS7 attacks.

OTP two-factor authentication security AI voice
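
As a rough illustration of the code-generation half of such a system, the sketch below creates a numeric one-time code and turns it into text for a voice to read digit by digit. The function names and the spoken wording are hypothetical and are not taken from the tutorial. The telephony and TTS calls are left out because they depend on the provider. Whether commas produce audible pauses also depends on the TTS engine used.

```python
import secrets


def generate_otp(num_digits=6):
    """Return a random numeric one-time code as a string, keeping leading zeros."""
    return "".join(secrets.choice("0123456789") for _ in range(num_digits))


def otp_to_speech_text(code, repeats=2):
    """Build the text a voice would read, spelling the code one digit at a time."""
    if not code.isdigit():
        raise ValueError("code must contain only digits")
    spoken = ", ".join(code)  # "407" becomes "4, 0, 7"
    sentence = f"Your verification code is: {spoken}."
    # Repeat the sentence so the listener gets a second chance to note the code.
    return " ".join([sentence] * repeats)


code = generate_otp()
print(otp_to_speech_text(code))
```

Using the `secrets` module rather than `random` keeps the code unpredictable, which matters for anything used as an authentication factor.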

ARTICLE · DEV.to AI · 4/10/2026

Free Kokoro TTS API: Open-Source Voice Synthesis with No Monthly Fee

This post presents the free Kokoro TTS API, an open-source speech synthesis alternative that removes the need for accounts, API keys, or the monthly fees charged by other services. It provides practical examples in `curl` and Python, highlighting how easy and fast it is to generate high-quality audio.

Open Source Kokoro API TTS

NEWS · Google DeepMind Blog · 4/15/2026

Gemini 3.1 Flash TTS: the next generation of expressive AI speech

Gemini 3.1 Flash TTS is a new audio model featuring granular audio tags. The tags give precise control over how the AI speech is delivered, leading to more expressive audio generation.

expressive AI Gemini TTS AI speech

NEWS · Qwen Blog · 6/27/2025

Time to Speak Some Dialects, Qwen-TTS!

The latest Qwen-TTS update, trained on millions of hours of speech, offers human-level naturalness and expressiveness, automatically adjusting prosody and emotion. It now supports generating 3 Chinese dialects (Beijing dialect, Shanghainese, and Sichuanese) and 7 Chinese-English bilingual voices through the Qwen API.

Qwen-TTS Chinese Dialects AI API