
Applied Explainability for Large Language Models: A Comparative Study

arXiv cs.CL · April 20, 2026

This paper presents a comparative study of three explainability techniques (Integrated Gradients, Attention Rollout, and SHAP) applied to a DistilBERT model fine-tuned for sentiment classification. It concludes that gradient-based attribution yields more stable and intuitive explanations, whereas attention-based methods are computationally cheaper but less aligned with the features that drive the model's predictions.
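The abstract names the compared techniques without showing how they produce attributions. The following is a rough illustration only, not the paper's code, model, or data. It applies Integrated Gradients to a hypothetical logistic toy model (`toy_model`, with a hand-written gradient `toy_grad`) that stands in for DistilBERT. It also implements Attention Rollout over small hand-made, head-averaged attention matrices, using the common 0.5 attention / 0.5 identity mix to account for residual connections. All names, weights, and matrices are assumptions chosen for illustration. SHAP is not sketched.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def toy_model(x, w, b):
    """Hypothetical stand-in classifier: p(positive) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def toy_grad(x, w, b):
    """Analytic gradient of toy_model with respect to each input feature."""
    p = toy_model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann approximation of IG along the straight path baseline -> x."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(toy_grad(point, w, b)):
            total[i] += g
    # Scale the averaged path gradient by the input-baseline difference.
    return [(xi - bi) * t / steps for xi, bi, t in zip(x, baseline, total)]

def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def attention_rollout(layers):
    """Attention Rollout over per-layer, head-averaged attention matrices.

    Each layer's attention is mixed with the identity (residual connection),
    rows are re-normalized, and the result is multiplied onto the running
    product so that later layers end up on the left.
    """
    n = len(layers[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
        mixed = [[v / sum(row) for v in row] for row in mixed]
        rollout = matmul(mixed, rollout)
    return rollout

if __name__ == "__main__":
    w, b = [1.5, -2.0, 0.5], 0.1
    x, baseline = [1.0, 0.5, 2.0], [0.0, 0.0, 0.0]
    ig = integrated_gradients(x, baseline, w, b)
    # Completeness check: the two printed values should nearly match.
    print(sum(ig), toy_model(x, w, b) - toy_model(baseline, w, b))

    layers = [[[0.7, 0.3], [0.4, 0.6]], [[0.5, 0.5], [0.2, 0.8]]]
    print(attention_rollout(layers))
```

Integrated Gradients satisfies completeness: the attributions sum to the difference in model output between the input and the baseline, which the first printed pair approximately reproduces. Each rollout row remains a probability distribution over input tokens because every per-layer matrix is row-stochastic. The sketch also shows why rollout is cheap: it reuses attention weights the model already computes, whereas Integrated Gradients needs one gradient evaluation per integration step.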

Comparative Study · Natural Language Processing · Explainable AI · Large Language Models
