RESEARCH · arXiv CS.CL · 4/20/2026
Applied Explainability for Large Language Models: A Comparative Study
This paper presents a comparative study of three explainability techniques (Integrated Gradients, Attention Rollout, and SHAP) on a fine-tuned DistilBERT model for sentiment classification. The study concludes that gradient-based attribution provides more stable and intuitive explanations, while attention-based methods are computationally efficient but less aligned with prediction-relevant features.
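To make the gradient-based method concrete, here is a minimal sketch of Integrated Gradients on a toy logistic model. It is not the paper's setup: the model, weights, input, and all-zeros baseline are hypothetical stand-ins chosen so the example runs with NumPy alone, and the function names are illustrative. The method averages the model's gradient along a straight path from the baseline to the input and scales it by the input difference.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy stand-in for a classifier: logistic regression over input features."""
    return sigmoid(x @ w + b)

def model_grad(x, w, b):
    """Analytic gradient of the toy model's output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum."""
    # Interpolation coefficients at the midpoint of each of the `steps` intervals.
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight line from the baseline to the input.
    path = baseline + alphas[:, None] * (x - baseline)
    # Average the input gradient over the path, then scale by the input difference.
    avg_grad = np.stack([model_grad(p, w, b) for p in path]).mean(axis=0)
    return (x - baseline) * avg_grad

# Hypothetical weights and input, chosen only for illustration.
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([1.0, -0.5, 2.0])
baseline = np.zeros_like(x)  # assumption: an all-zeros reference input

attributions = integrated_gradients(x, baseline, w, b)

# Completeness check: attributions should sum to f(x) - f(baseline).
completeness_gap = abs(
    attributions.sum() - (model(x, w, b) - model(baseline, w, b))
)
print(attributions, completeness_gap)
```

The completeness check at the end is a useful sanity test for any Integrated Gradients implementation: if the attributions do not sum closely to the change in model output, the number of interpolation steps is probably too small. For a transformer such as DistilBERT, the gradient would come from automatic differentiation over token embeddings rather than from a closed-form expression.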