MQM — AI articles, news & research

ARTICLE · DEV.to AI · 19d ago

How I use an LLM as a translation judge

The author uses an LLM-based judge, GEMBA-MQM v2, to automate translation quality evaluation: it classifies each error by type and severity, mimicking a human linguist's review. Although its output correlates highly with human annotations, scores vary from run to run, so multiple passes are needed to reduce the noise.
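As a rough illustration of why several passes help, here is a minimal Python sketch that turns typed, severity-labeled errors into a penalty and averages it across passes. The severity weights (minor 1, major 5, critical 25) follow a common MQM convention and are an assumption here, as are the function names and the sample annotations; the summary does not say how the author actually aggregates passes.

```python
from statistics import mean, stdev

# Assumed severity weights (a common MQM convention); not taken from the article.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 25}


def mqm_penalty(errors):
    """Sum severity weights over a list of (category, severity) annotations."""
    return sum(SEVERITY_WEIGHTS[severity] for _category, severity in errors)


def aggregate_passes(passes):
    """Average the penalty over several judge passes and report the spread."""
    penalties = [mqm_penalty(errors) for errors in passes]
    spread = stdev(penalties) if len(penalties) > 1 else 0.0
    return mean(penalties), spread


# Hypothetical annotations from three judge passes over the same segment.
passes = [
    [("accuracy/mistranslation", "major"), ("fluency/grammar", "minor")],
    [("accuracy/mistranslation", "major")],
    [
        ("accuracy/mistranslation", "major"),
        ("style/awkward", "minor"),
        ("fluency/grammar", "minor"),
    ],
]

avg, spread = aggregate_passes(passes)
print(f"mean penalty: {avg}, spread across passes: {spread}")
```

In this made-up example every pass agrees on the major error, while the minor errors come and go; averaging smooths that out, and the spread gives a quick signal of how much a single pass could be trusted.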

Translation · MQM · benchmarking · quality evaluation