RESEARCH · arXiv CS.LG · 4/16/2026
Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization
This paper introduces STOMP, an offline reinforcement learning algorithm for multi-objective optimization based on smooth Tchebysheff scalarization. It addresses a known limitation of linear scalarization, which cannot recover the non-convex regions of a Pareto front. Recovering those regions matters when aligning large language models, and in other real-world applications, where reward signals conflict.
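The limitation of linear scalarization can be seen in a small toy example. The sketch below is not taken from the paper and is not the STOMP algorithm. It only contrasts a weighted sum with the general smooth Tchebysheff form, mu * log(sum_i exp(w_i * (z_i - r_i) / mu)), written here for reward maximization. The three candidate policies, their reward vectors, the ideal point `ideal`, and the smoothing parameter `mu` are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical reward vectors (objective 1, objective 2) for three candidate
# policies. "C" is Pareto-optimal but lies in a non-convex (concave) region:
# it falls below the straight line connecting "A" and "B".
CANDIDATES = {"A": (1.0, 0.0), "B": (0.0, 1.0), "C": (0.4, 0.4)}


def linear_score(rewards, weights):
    """Weighted sum of rewards (higher is better)."""
    return sum(w * r for w, r in zip(weights, rewards))


def smooth_tchebysheff_loss(rewards, weights, ideal, mu=0.1):
    """Smooth Tchebysheff loss (lower is better).

    A log-sum-exp relaxation of max_i w_i * (ideal_i - reward_i).
    `ideal` and `mu` are illustrative assumptions, not values from the paper.
    """
    terms = [w * (z - r) / mu for w, r, z in zip(weights, rewards, ideal)]
    m = max(terms)  # subtract the max for numerical stability
    return mu * (m + math.log(sum(math.exp(t - m) for t in terms)))


def best_linear(weights):
    """Candidate with the highest weighted-sum score."""
    return max(CANDIDATES, key=lambda k: linear_score(CANDIDATES[k], weights))


def best_stch(weights, ideal=(1.0, 1.0), mu=0.1):
    """Candidate with the lowest smooth Tchebysheff loss."""
    return min(
        CANDIDATES,
        key=lambda k: smooth_tchebysheff_loss(CANDIDATES[k], weights, ideal, mu),
    )


if __name__ == "__main__":
    # Sweep the weight on objective 1 from 0 to 1 in steps of 0.1.
    sweep = [(i / 10, 1 - i / 10) for i in range(11)]
    print("linear picks:", sorted({best_linear(w) for w in sweep}))
    print("smooth Tchebysheff pick at (0.5, 0.5):", best_stch((0.5, 0.5)))
```

In this toy setting, no weight vector makes the weighted sum prefer C, because the better of A and B always scores at least 0.5 while C scores 0.4. The smooth Tchebysheff loss with equal weights selects C, because it penalizes the worst-served objective rather than the average.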