Multilingual AI

27 items

RESEARCH · arXiv CS.CL · 1d ago

Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning

This research introduces PolyFact, a multilingual factual QA dataset, to address cross-lingual factual inconsistency in LLMs. It finds that reinforcement learning via GRPO consistently improves cross-lingual factual recall and generalization compared to supervised fine-tuning.

Multilingual AI LLMs reinforcement learning machine learning

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/15/2026

[P] Added 8 Indian languages to Chatterbox TTS via LoRA - 1.4% of parameters, no phoneme engineering

A project successfully added eight Indian languages (Telugu, Kannada, Bengali, Tamil, Malayalam, Marathi, Gujarati, and Hindi) to the Chatterbox-Multilingual TTS model using LoRA adapters and tokenizer extension. This approach trained only 1.4% of the model's parameters, avoiding the complex phoneme engineering typically required for each language.

Multilingual AI Chatterbox TTS LoRA

ARTICLE · DEV.to AI · 2d ago

Day 49: The Unseen Layers of Building Health AI for 22+ Indian Languages

Current LLMs like GPT-4 struggle with nuanced medical queries in Indian languages due to a fundamental bias in their English-heavy training data. GoDavaii aims to bridge this gap by developing advanced Health AI for over 22 Indian languages, focusing on making medical knowledge contextually relevant and accessible across diverse linguistic backgrounds.

Multilingual AI India AI bias Health AI

RESEARCH · arXiv CS.CL · 4/16/2026

A Multi-Model Approach to English-Bangla Sentiment Classification of Government Mobile Banking App Reviews

This study classifies sentiment in English and Bangla reviews of Bangladeshi government mobile banking apps, using a hybrid labeling approach for 5,652 reviews. It found that traditional machine learning models like Random Forest and Linear SVM significantly outperformed fine-tuned XLM-RoBERTa for this specific task.

Multilingual AI machine learning natural language processing sentiment analysis

ARTICLE · DEV.to AI · 3d ago

Day 48 of GoDavaii: Building Health AI for 22 Indian Languages - Why It's Harder Than You Think

The article details the challenges of building health AI that truly understands the nuances of India's 22 official languages, exemplified by the complexity of interpreting a simple phrase. On Day 48 since launch, GoDavaii is tackling immense linguistic complexities to create an AI that goes beyond English-first solutions.

Multilingual AI India natural language processing Health AI

RESEARCH · arXiv CS.CL · 4/20/2026

Think Multilingual, Not Harder: A Data-Efficient Framework for Teaching Reasoning Models to Code-Switch

This research introduces a data-efficient fine-tuning framework to teach large language models to effectively code-switch for reasoning tasks. It identifies beneficial code-switched behaviors, moving beyond treating code-switching as an error, through systematic analysis of diverse reasoning traces.

Multilingual AI Code-Switching Reasoning large language models

ARTICLE · DEV.to AI · 4/19/2026

Why multilingual OCR fails even after expanding the character set

Many OCR teams assume that expanding the character set automatically improves recognition, but this article argues that this is an oversimplification. Successful multilingual OCR depends on training with data that reflects actual glyph shapes, font variations, language distribution, and document layouts.

Multilingual AI AI development challenges OCR

RESEARCH · arXiv CS.CL · 4/14/2026

Toward Generalized Cross-Lingual Hateful Language Detection with Web-Scale Data and Ensemble LLM Annotations

This research explores improving cross-lingual hate speech detection by leveraging large-scale unlabelled web data and LLM-based synthetic annotations. It shows that continued pre-training of BERT models on web data and fine-tuning with synthetic labels generated by an ensemble of LLMs significantly boosts performance, especially in low-resource settings.

Multilingual AI pre-training ensemble learning Hate Speech Detection

ARTICLE · DEV.to AI · 4/23/2026

ERNIE Image Review: Open-Source Text-to-Image for Posters, Comics, and Bilingual Visuals

Baidu's ERNIE Image is an open-source text-to-image model focused on generating high-quality visuals with readable in-image text and Chinese-English bilingual support. It excels in structured compositions like poster layouts and comic scenes, proving useful for diverse creative workflows.

Multilingual AI Text-to-image open-source AI image generation

ARTICLE · DEV.to AI · 5/4/2026

The Aunty Test - what Hindi-speaking patients see when they ask Health AI in their own language

Many Health AI systems are English-first, which leads to failures when patients ask questions in native languages such as Hindi. GoDavaii addresses this gap by reasoning natively in 22 Indian languages to provide accurate medical information.

AI applications language models Multilingual AI healthcare AI

RESEARCH · arXiv CS.CL · 20d ago

Prompting language influences diagnostic reasoning and accuracy of large language models

This research evaluated the impact of prompting language on the diagnostic reasoning and accuracy of large language models (LLMs) in clinical settings. Four out of five models performed better in English, highlighting the uncertainty regarding LLM reliability across different languages.

Multilingual AI LLMs clinical decision support Diagnostic Accuracy

ARTICLE · DEV.to AI · 28d ago

The Aunty Test - what Malayalam-speaking patients see when they ask Health AI in their own language

This article highlights the failure of English-first Health AI to accurately understand and respond to medical queries in non-English languages like Malayalam. It introduces GoDavaii as an AI capable of reasoning natively in 22 Indian languages, addressing a critical gap in healthcare accessibility for a billion non-English speakers.

Multilingual AI global accessibility language barrier Healthcare

ARTICLE · DEV.to AI · 5/2/2026

The Aunty Test - what Bengali-speaking patients see when they ask Health AI in their own language

This article exposes the limitations of English-first Health AI, which fails to provide accurate advice for queries in languages like Bengali because of poor translation layers. It presents GoDavaii as an AI that reasons natively in 22 Indian languages, offering superior localized medical assistance.

Multilingual AI Healthcare localization

ARTICLE · DEV.to AI · 16d ago

How Google I/O 2026 Inspired Me to Start Building a Telugu Jarvis AI

Inspired by Google I/O 2026, the author plans to develop a Telugu-first AI assistant. This initiative aims to make AI more accessible to students in India who prefer learning and communicating in regional languages, fostering faster learning and confidence.

AI accessibility Multilingual AI India learning

ARTICLE · DEV.to AI · 5/2/2026

The Aunty Test - what Marathi-speaking patients see when they ask Health AI in their own language

This article highlights how most English-first Health AI systems fail to understand and respond accurately to medical queries in local languages like Marathi. It emphasizes the need for AI that reasons natively in multiple languages, rather than relying on translation layers or thin localized veneers, to provide effective healthcare guidance.

language models Multilingual AI AI bias healthcare AI

DOC · DEV.to AI · 4/24/2026

Build a Multilingual AI Voice Bot: Auto-Detect and Respond in the Caller's Language

This guide details how to build a multilingual AI voice bot that automatically detects and responds in the caller's language. It covers the three technology layers required for a natural conversational experience: speech-to-text (STT), a large language model (LLM), and text-to-speech (TTS).

language detection Multilingual AI AI voice bot Speech-to-Text

ARTICLE · DEV.to AI · 5/7/2026

The Aunty Test - what Marathi-speaking patients see when they ask Health AI in their own language

This article discusses how English-first Health AI struggles to process queries in native languages like Marathi, leading to inaccurate responses. It emphasizes the need for AI that can reason natively in multiple languages for effective healthcare support.

Multilingual AI language barrier localization Health AI

ARTICLE · Hugging Face Blog · 4/17/2026

Building a Fast Multilingual OCR Model with Synthetic Data

This post describes building a fast multilingual optical character recognition (OCR) model, using synthetic data for model training and optimization.

synthetic data Multilingual AI machine learning OCR

RESEARCH · arXiv CS.CL · 5/1/2026

Cross-Lingual Response Consistency in Large Language Models: An ILR-Informed Evaluation of Claude Across Six Languages

This paper introduces an ILR-informed framework to evaluate Claude (Sonnet 4.6) for cross-lingual response consistency across six languages. It analyzes responses to semantically equivalent prompts using quantitative metrics and expert ILR qualitative assessment, revealing language-specific variations like response length differences and surface divergence in creative clusters.

Multilingual AI LLMs AI evaluation

RESEARCH · arXiv CS.CL · 26d ago

Mitigating Cross-Lingual Cultural Inconsistencies in LLMs via Consensus-Driven Preference Optimisation

Multilingual large language models (MLLMs) often exhibit inconsistent behavior regarding cultural identity when the prompt's language changes. Researchers introduce a new metric, Singleton Fleiss's κ_S, and a consensus-driven alignment framework, C-3PO, to mitigate these cross-lingual cultural inconsistencies, achieving significant improvements.

Multilingual AI LLMs AI alignment Cultural Bias