From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents
arXiv cs.AI · April 23, 2026
This paper presents a conformal interpretability framework for understanding how concepts evolve over time inside LLM agents. It combines step-wise reward modeling with conformal prediction to label internal representations with statistical backing and to identify latent directions linked to success, failure, or reasoning drift.
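The summary does not spell out the paper's procedure, so the following is only a minimal sketch of the general idea, assuming standard split conformal prediction over step-wise reward scores and a simple difference-of-means direction as a stand-in for the latent-direction step. All function names and the "success"/"failure" label set are hypothetical and are not taken from the paper.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration
    nonconformity scores (standard split conformal prediction)."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # 1-indexed rank of the quantile
    if k > n:
        # Too few calibration points for this alpha: keep every label.
        return float("inf")
    return sorted(scores)[k - 1]


def prediction_set(p_success, qhat):
    """Labels whose nonconformity (1 - probability of that label) is at
    most qhat. p_success is an assumed step-wise reward-model output."""
    probs = {"success": p_success, "failure": 1.0 - p_success}
    return {label for label, p in probs.items() if 1.0 - p <= qhat}


def mean_difference_direction(success_vecs, failure_vecs):
    """Difference between mean hidden states of the two labeled groups.
    This is an assumed, simple choice of latent direction."""
    dim = len(success_vecs[0])

    def mean(vecs):
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

    return [a - b for a, b in zip(mean(success_vecs), mean(failure_vecs))]
```

Under these assumptions, a step would be labeled only when its prediction set contains exactly one label; steps with an empty or two-label set would be treated as ambiguous and left out when estimating the direction.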