
TinyJudge: Unverifiable Constraint Alignment via Lightweight Specialist Ensembles

arXiv CS.CL · June 9, 2026

The paper introduces TinyJudge, a framework that uses an ensemble of specialized tiny language models (0.6B parameters) to provide lightweight, high-precision rewards for soft, unverifiable constraints in LLM instruction following. The approach targets two bottlenecks of traditional LLM-as-a-judge methods for constraint alignment: reward hacking and high computational overhead.
