
From Actions to Understanding: Conformal Interpretability of Temporal Concepts in LLM Agents

arXiv cs.AI · April 23, 2026

This paper presents a conformal interpretability framework for understanding how temporal concepts evolve inside LLM agents. It combines step-wise reward modeling with conformal prediction to statistically label internal representations and to identify latent directions linked to success, failure, or reasoning drift.
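
For readers unfamiliar with conformal prediction, the labeling idea can be illustrated with a generic split conformal sketch. This is not the paper's implementation: the function names, the binary success/failure setup, and the assumption that a step-wise reward model outputs a success probability per step are all hypothetical choices made for illustration.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split conformal threshold from a held-out calibration set.

    cal_probs  -- hypothetical reward-model outputs, p(success) per step
    cal_labels -- known outcomes per step: 1 = success, 0 = failure
    alpha      -- target miscoverage rate (0.1 means ~90% coverage)
    """
    # Nonconformity score: 1 minus the probability assigned to the true label.
    scores = sorted(
        1.0 - (p if y == 1 else 1.0 - p)
        for p, y in zip(cal_probs, cal_labels)
    )
    n = len(scores)
    # Finite-sample corrected rank of the quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: every label is admitted.
        return float("inf")
    return scores[k - 1]


def label_step(p_success, threshold):
    """Return the set of labels whose nonconformity score is within threshold."""
    labels = set()
    if 1.0 - p_success <= threshold:
        labels.add("success")
    if p_success <= threshold:  # score for "failure" is 1 - (1 - p) = p
        labels.add("failure")
    return labels


# Toy calibration data (made up for this sketch).
cal_probs = [0.9, 0.8, 0.2, 0.1, 0.7, 0.3, 0.95, 0.05, 0.6]
cal_labels = [1, 1, 0, 0, 1, 0, 1, 0, 1]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.2)

print(label_step(0.95, threshold))  # a confidently labeled step
print(label_step(0.5, threshold))   # an uncertain step
```

A step that receives exactly one label can be treated as confidently labeled. An empty or two-label set marks a step the reward model cannot place at the chosen error rate. Under these assumptions, only the confidently labeled steps would be used to look for latent directions in the agent's internal representations.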

LLM Agents, AI Interpretability, Conformal Prediction
