
From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents

arXiv CS.AI · April 23, 2026

This paper presents a conformal interpretability framework for understanding how temporal concepts evolve inside LLM agents. It combines step-wise reward modeling with conformal prediction to assign statistically calibrated labels to the agent's internal representations. It then uses these labels to identify latent directions associated with success, failure, or reasoning drift.
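The summary does not give the procedure, so what follows is only a minimal sketch under our own assumptions. It is not the paper's method. We assume a hypothetical step-wise reward model that outputs a success probability for each agent step. Standard split conformal prediction, with nonconformity score 1 − p(true label), turns those probabilities into label sets. A step gets a label only when its set is a singleton, and is marked "uncertain" otherwise. A difference of class means over the labeled hidden states stands in for a "latent direction". All function names (`conformal_threshold`, `label_step`, `concept_direction`) are illustrative.

```python
import math
from typing import List, Sequence, Set


def conformal_threshold(cal_probs: Sequence[float], cal_labels: Sequence[int], alpha: float) -> float:
    """Split-conformal quantile of nonconformity scores s = 1 - p(true label).

    Assumption: cal_probs[i] is a (hypothetical) step-wise reward model's
    probability that step i leads to success; cal_labels[i] is 1 (success) or 0 (failure).
    """
    scores = sorted(1.0 - p if y == 1 else p for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the finite-sample conformal quantile
    if k > n:
        return math.inf  # too few calibration points: keep every label
    return scores[k - 1]


def prediction_set(p_success: float, q_hat: float) -> Set[int]:
    """All labels whose nonconformity score does not exceed the threshold."""
    labels: Set[int] = set()
    if 1.0 - p_success <= q_hat:  # score if the true label were 1 (success)
        labels.add(1)
    if p_success <= q_hat:  # score if the true label were 0 (failure)
        labels.add(0)
    return labels


def label_step(p_success: float, q_hat: float) -> str:
    """Label a step only when the conformal set is a singleton."""
    s = prediction_set(p_success, q_hat)
    if s == {1}:
        return "success"
    if s == {0}:
        return "failure"
    return "uncertain"  # empty or two-label set


def _mean(vectors: List[List[float]]) -> List[float]:
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]


def concept_direction(hidden: List[List[float]], labels: List[str]) -> List[float]:
    """Unit vector from the mean 'failure' hidden state to the mean 'success' one."""
    pos = [h for h, lab in zip(hidden, labels) if lab == "success"]
    neg = [h for h, lab in zip(hidden, labels) if lab == "failure"]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure example")
    diff = [a - b for a, b in zip(_mean(pos), _mean(neg))]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("class means coincide; no direction")
    return [x / norm for x in diff]


if __name__ == "__main__":
    # Toy calibration data (made up for illustration).
    cal_probs = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.95, 0.05, 0.6, 0.4]
    cal_labels = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
    q = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
    for p in (0.9, 0.15, 0.55):
        print(p, label_step(p, q))
```

Projecting a new hidden state onto the returned direction with a dot product gives a scalar that could be tracked across steps as a rough drift signal. Abstaining on non-singleton sets is what keeps the labels calibrated. The choice of a difference-of-means direction, rather than a trained linear probe, is our simplification.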
