Model Alignment

3 items

RESEARCH · arXiv CS.CL · 20h ago

TinyJudge: Unverifiable Constraint Alignment via Lightweight Specialist Ensembles

The paper introduces TinyJudge, a framework that uses an ensemble of specialized tiny language models (0.6B parameters) to provide lightweight, high-precision rewards for soft, unverifiable constraints in LLM instruction following. The approach targets two bottlenecks of traditional LLM-as-a-judge methods for constraint alignment: reward hacking and high computational overhead.
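
To make the ensemble idea concrete, here is a minimal sketch of how per-constraint specialist scores could be combined into one reward. The `Judge` type, the averaging rule, and the two toy judges are assumptions made for illustration; the summary does not say how TinyJudge aggregates its specialists, and the toy functions stand in for the 0.6B models rather than reproducing them.

```python
from typing import Callable, Dict

# Hypothetical specialist interface: (instruction, response) -> score in [0, 1].
Judge = Callable[[str, str], float]


def ensemble_reward(judges: Dict[str, Judge], instruction: str, response: str) -> float:
    """Average the scores of all specialist judges (assumed aggregation rule)."""
    if not judges:
        raise ValueError("at least one judge is required")
    scores = [judge(instruction, response) for judge in judges.values()]
    return sum(scores) / len(scores)


# Toy stand-ins for tiny specialist models; each checks one soft constraint.
def tone_judge(instruction: str, response: str) -> float:
    """Return 1.0 if the response contains a politeness marker, else 0.0."""
    return 1.0 if "please" in response.lower() else 0.0


def brevity_judge(instruction: str, response: str) -> float:
    """Return 1.0 if the response has at most 20 whitespace-separated words."""
    return 1.0 if len(response.split()) <= 20 else 0.0


judges = {"tone": tone_judge, "brevity": brevity_judge}
reward = ensemble_reward(judges, "Be polite and brief.", "Please find the report attached.")
```

Averaging is only one option; taking the minimum score would instead penalize a response that fails any single constraint.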

Tiny Models · Model Alignment · LLMs · Reinforcement Learning

RESEARCH · arXiv CS.LG · 4/14/2026

Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model

This research investigates Deliberative Alignment in LLMs, a method designed to improve safety by distilling reasoning capabilities from stronger models. It uncovers an alignment gap between teacher and student models: student models can retain unsafe behaviors from the base model even after learning advanced reasoning patterns. The paper proposes a Best-of-N (BoN) sampling method to address this gap at inference time.
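
For readers unfamiliar with Best-of-N sampling, the generic form can be sketched as follows: draw N candidate responses and keep the one a scoring function rates highest. This is the textbook version only. The `generate` and `score` callables are hypothetical placeholders, and the paper's specific scoring, which the title suggests involves attributing unsafe behavior to the base model, is not reproduced here.

```python
import random
from typing import Callable, List


def best_of_n(
    generate: Callable[[str, random.Random], str],
    score: Callable[[str], float],
    prompt: str,
    n: int,
    seed: int = 0,
) -> str:
    """Sample n candidates for the prompt and return the highest-scoring one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    candidates: List[str] = [generate(prompt, rng) for _ in range(n)]
    # max() returns the first candidate with the top score if there are ties.
    return max(candidates, key=score)


# Toy generator and scorer, for illustration only (not real models).
def toy_generate(prompt: str, rng: random.Random) -> str:
    return rng.choice(["I cannot help with that.", "Sure, here is how.", "Let me think."])


def toy_safety_score(response: str) -> float:
    return 1.0 if "cannot" in response else 0.0


chosen = best_of_n(toy_generate, toy_safety_score, "some prompt", n=8)
```

The cost grows linearly with N, since every candidate must be generated and scored before one is selected.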

Model Alignment · LLMs · Deliberative Alignment · Reasoning

RESEARCH · arXiv CS.LG · 4/27/2026

Mochi: Aligning Pre-training and Inference for Efficient Graph Foundation Models via Meta-Learning

Mochi is a Graph Foundation Model that improves efficiency and task unification through a meta-learning-based training framework. It pre-trains on few-shot episodes that directly mirror downstream evaluation, addressing limitations of traditional reconstruction-based pre-training while achieving competitive performance.
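
The few-shot episodes mentioned above can be illustrated with a standard N-way K-shot sampler over labeled nodes. This is a generic meta-learning sketch, not Mochi's actual pipeline: the function name, its parameters, and the flat label list are assumptions chosen for the example.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Example = Tuple[int, Hashable]  # (node index, class label)


def sample_episode(
    labels: Sequence[Hashable],
    n_way: int,
    k_shot: int,
    q_query: int,
    rng: random.Random,
) -> Tuple[List[Example], List[Example]]:
    """Sample one N-way K-shot episode as disjoint support and query sets."""
    by_class: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough examples can fill both support and query sets.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + q_query]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with k_shot + q_query examples")

    support: List[Example] = []
    query: List[Example] = []
    for cls in rng.sample(eligible, n_way):
        picked = rng.sample(by_class[cls], k_shot + q_query)
        support += [(i, cls) for i in picked[:k_shot]]
        query += [(i, cls) for i in picked[k_shot:]]
    return support, query


node_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
support_set, query_set = sample_episode(node_labels, n_way=2, k_shot=1, q_query=2, rng=random.Random(0))
```

Training on episodes shaped like this means the model sees the same support/query structure during pre-training that it later faces at evaluation time.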

Meta-Learning · Model Alignment · Graph Neural Networks · Foundation Models