
Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization

arXiv CS.LG · April 16, 2026

This paper introduces STOMP, an offline reinforcement learning algorithm for multi-objective optimization that uses smooth Tchebysheff scalarization. Linear scalarization cannot recover the non-convex regions of a Pareto front; STOMP addresses this limitation, which matters when aligning large language models and in other real-world applications with conflicting rewards.
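To illustrate why the choice of scalarization matters, the sketch below compares linear scalarization with the general smooth Tchebysheff form, mu * log(sum_i exp(w_i * (f_i - z_i) / mu)), on three hand-picked candidates. It is not STOMP itself: the candidate points, the weights, the ideal point, the value of `mu`, and the loss-minimization convention (rewards would be negated) are all assumptions chosen for illustration, and the paper's exact formulation may differ.

```python
import math

def linear_scalarization(losses, weights):
    # Weighted sum of objectives (lower is better).
    return sum(w * f for w, f in zip(weights, losses))

def smooth_tchebysheff(losses, weights, ideal, mu=0.1):
    # Smooth approximation of max_i w_i * (f_i - z_i) via log-sum-exp.
    # mu > 0 controls smoothness; as mu -> 0 this approaches the hard max.
    terms = [w * (f - z) / mu for w, f, z in zip(weights, losses, ideal)]
    m = max(terms)  # subtract the max for numerical stability
    return mu * (m + math.log(sum(math.exp(t - m) for t in terms)))

# Hypothetical candidates in a two-objective loss space. C is Pareto-optimal
# but sits in a non-convex part of the front (above the segment from A to B).
candidates = {"A": (0.0, 1.0), "B": (1.0, 0.0), "C": (0.6, 0.6)}
weights = (0.5, 0.5)
ideal = (0.0, 0.0)  # assumed ideal (best achievable) value per objective

linear_pick = min(
    candidates, key=lambda k: linear_scalarization(candidates[k], weights)
)
stch_pick = min(
    candidates, key=lambda k: smooth_tchebysheff(candidates[k], weights, ideal)
)

print("linear scalarization selects:", linear_pick)
print("smooth Tchebysheff selects:", stch_pick)
```

With these toy numbers, no choice of non-negative weights lets the linear objective select C, because its weighted sum is always 0.6 while A or B achieves at most 0.5. The smooth Tchebysheff objective, which penalizes the worst weighted objective, can select it.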
