
Distributional Reinforcement Learning via the Cramér Distance

arXiv CS.LG·May 12, 2026

This paper introduces the Cramér-based Distributional Soft Actor-Critic (C-DSAC) algorithm, which applies Soft Actor-Critic within a distributional reinforcement learning framework by minimizing the squared Cramér distance. In the reported experiments, C-DSAC outperforms baseline SAC and other distributional methods, particularly in high-complexity environments, an advantage attributed to its confidence-driven Q-value updates.
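
For readers unfamiliar with the quantity being minimized, the squared Cramér distance between two distributions is the integral of the squared difference of their cumulative distribution functions. The sketch below computes it for two categorical distributions on a shared, sorted set of atoms. This is only an illustration of the distance itself, not the paper's implementation: the function name `squared_cramer_distance` and the fixed-atom categorical representation are assumptions made for the example.

```python
from itertools import accumulate


def squared_cramer_distance(p, q, atoms):
    """Squared Cramér distance between two categorical distributions.

    p, q  : probabilities over the same atoms (each should sum to 1)
    atoms : sorted support points shared by both distributions

    Hypothetical helper for illustration; the paper's parameterization
    of the return distribution may differ.
    """
    if not (len(p) == len(q) == len(atoms)):
        raise ValueError("p, q and atoms must have the same length")

    # Cumulative distribution functions evaluated at each atom.
    cdf_p = list(accumulate(p))
    cdf_q = list(accumulate(q))

    # Both CDFs are step functions that are constant between atoms,
    # so the integral reduces to a sum of (CDF gap)^2 * interval width.
    total = 0.0
    for i in range(len(atoms) - 1):
        width = atoms[i + 1] - atoms[i]
        total += (cdf_p[i] - cdf_q[i]) ** 2 * width
    return total


if __name__ == "__main__":
    atoms = [0.0, 1.0, 2.0]
    target = [0.0, 0.0, 1.0]     # all mass on the largest return
    predicted = [1.0, 0.0, 0.0]  # all mass on the smallest return
    print(squared_cramer_distance(predicted, target, atoms))
```

Unlike a KL-based loss, this distance depends on how far apart the atoms are, so placing probability mass near the target return is penalized less than placing it far away.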
