
Translation

38 items

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/14/2026

We benchmarked TranslateGemma against 5 other LLMs on subtitle translation across 6 languages. At first glance the numbers told a clean story, but then human QA added a chapter. [D]

A benchmark study evaluates six large language models (LLMs), including TranslateGemma-12b, on translating English subtitles into six languages. The models were ranked using reference-free quality estimation (QE) metrics and a custom combined metric called TQI; TranslateGemma-12b ranked first overall.

70
RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/14/2026

We benchmarked TranslateGemma-12b against 5 frontier LLMs on subtitle translation - it won across the board, with one significant catch

A study benchmarked TranslateGemma-12b against five frontier LLMs on subtitle translation for six language pairs and found that the task-specific model consistently outperformed the general-purpose models. The initial numbers indicated a clear win, but human QA added a significant catch, which the full report details.

42
ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/21/2026

An actual example of "If you dont run it, you dont own it" and Gemma 4 beats both Chat GPT and Gemini Chat

The author describes using several AI models (GPT-OSS 120B, Qwen 3 Max, ChatGPT 4o) to translate a Chinese novel, highlighting problems with name consistency and unexpected censorship. ChatGPT 4o was initially the best for accuracy and translation quality, but some models degraded or began filtering content over time.

35
ARTICLE · DEV.to AI · 18d ago

How I use an LLM as a translation judge

The author uses an LLM-based system, GEMBA-MQM v2, to automate translation quality evaluation: it classifies errors by type and severity, mimicking a human linguist's review. Although its output correlates highly with human annotations, the scores are noisy, so multiple passes are needed to reduce their variability.

27