Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?
arXiv CS.LG · May 29, 2026
This paper investigates the mechanistic origins of catastrophic forgetting in large language models (LLMs) by comparing reinforcement learning (RL) with supervised fine-tuning (SFT). It finds that RL preserves a model's internal computational circuits more effectively than SFT, which disrupts them to a greater degree, and that this preservation mitigates the forgetting of prior capabilities.