Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech
arXiv CS.CL · April 24, 2026
This paper introduces Hierarchical Policy Optimization (HPO) for Simultaneous Speech Translation (SST) with large language models (LLMs). It targets two challenges: high computational cost and imperfect supervised fine-tuning data. HPO uses a hierarchical reward to balance translation quality against latency, and the authors report substantial improvements in COMET and MetricX scores.
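The summary does not say how the hierarchical reward is constructed. As a purely illustrative sketch, assume "hierarchical" means quality is evaluated first and latency only counts once a quality floor is met. The `Hypothesis` type, the `hierarchical_reward` function, the quality scale, and the `quality_floor`, `max_lag`, and `latency_weight` values are all hypothetical; this is not the paper's formulation.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    quality: float  # hypothetical quality score, assumed to lie in [0, 1]
    lag: float      # hypothetical latency value in seconds (lower is better)


def hierarchical_reward(h: Hypothesis, quality_floor: float = 0.7,
                        max_lag: float = 4.0, latency_weight: float = 0.5) -> float:
    """Hypothetical two-level reward in which quality gates latency.

    Level 1: below the quality floor, the reward is quality alone, so a
    fast but poor translation earns no latency credit.
    Level 2: at or above the floor, a latency bonus in [0, latency_weight]
    is added, and it grows as lag shrinks.
    """
    if h.quality < quality_floor:
        return h.quality
    # Clamp lag into [0, max_lag], then map it to a bonus in [0, 1].
    clamped = min(max(h.lag, 0.0), max_lag)
    latency_bonus = 1.0 - clamped / max_lag
    return h.quality + latency_weight * latency_bonus


if __name__ == "__main__":
    candidates = [Hypothesis(0.60, 0.5), Hypothesis(0.80, 2.0), Hypothesis(0.80, 4.0)]
    # Rank candidate outputs from highest to lowest reward.
    for h in sorted(candidates, key=hierarchical_reward, reverse=True):
        print(h, round(hierarchical_reward(h), 3))
```

Because the latency bonus is never negative and only applies at or above the floor, any output that clears the quality floor outranks any output that does not, however low the latter's lag. Latency only breaks ties among acceptable translations.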