Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization
arXiv CS.LG · April 16, 2026
This paper introduces STOMP, an offline reinforcement learning algorithm for multi-objective optimization based on smooth Tchebysheff scalarization. It addresses a known limitation of linear scalarization, which cannot recover the non-convex regions of a Pareto front. That limitation matters when aligning large language models, and in other real-world applications, where the rewards conflict.
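The gap between the two scalarizations can be illustrated with a toy example. The sketch below is not STOMP. It only compares a weighted sum against a log-sum-exp smoothing of the Tchebysheff function, which is one common way to define a smooth Tchebysheff scalarization and is assumed here rather than taken from the paper. The candidate reward vectors, the ideal point, the weights, and the smoothing parameter `mu` are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical reward vectors for three candidate policies (higher is better).
# C is Pareto-optimal but sits in a non-convex region of the front:
# it lies below the straight line joining A and B.
candidates = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.4, 0.4)}
ideal = (1.0, 1.0)  # assumed ideal point: best achievable value per objective


def linear_scalarization(rewards, weights):
    """Weighted sum of rewards; larger is better."""
    return sum(w * r for w, r in zip(weights, rewards))


def smooth_tchebysheff(rewards, weights, ideal, mu=0.1):
    """Log-sum-exp smoothing of max_i w_i * (ideal_i - reward_i); smaller is better.

    As mu approaches 0 this approaches the ordinary Tchebysheff value.
    """
    terms = [w * (z - r) / mu for w, r, z in zip(weights, rewards, ideal)]
    shift = max(terms)  # subtract the max for numerical stability
    return mu * (shift + math.log(sum(math.exp(t - shift) for t in terms)))


def best_linear(weights):
    return max(candidates, key=lambda k: linear_scalarization(candidates[k], weights))


def best_smooth_tchebysheff(weights, mu=0.1):
    return min(candidates, key=lambda k: smooth_tchebysheff(candidates[k], weights, ideal, mu))


# Sweep the weight on the first objective from 0 to 1 in steps of 0.05.
linear_winners = {best_linear((i / 20, 1 - i / 20)) for i in range(21)}
stch_winner = best_smooth_tchebysheff((0.5, 0.5))

print("linear winners over all weights:", sorted(linear_winners))
print("smooth Tchebysheff winner at equal weights:", stch_winner)
```

In this toy setting the weighted sum only ever selects A or B, because C scores 0.4 under any weights that sum to one while the better of A and B scores at least 0.5. The smooth Tchebysheff value at equal weights is lowest for C, so the balanced trade-off becomes reachable.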