
ABLE: Representing and Mapping LLMs via Attribution-Based Large-model Embedding

arXiv cs.CL · June 9, 2026

ABLE (Attribution-Based Large-model Embedding) is a framework for representing large language models in an interpretability space by means of attribution-based embeddings. To make systematic model comparison easier, it aggregates gradient-based feature attributions into a representation that captures each model's input-sensitivity patterns.
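The summary does not say how attributions are computed or aggregated, so the following is only a minimal sketch of the general idea and not the paper's method. It assumes a toy model f(x) = sigmoid(w . x), gradient-times-input as the attribution rule, mean absolute attribution over a shared probe set as the aggregation, and cosine similarity for comparison. All function names, weights, and probe values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_x_input(weights, x):
    """Gradient-times-input attribution for the toy model f(x) = sigmoid(w . x)."""
    s = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
    # df/dx_i = s * (1 - s) * w_i, so attribution_i = x_i * df/dx_i
    return [xi * s * (1.0 - s) * w for w, xi in zip(weights, x)]


def attribution_embedding(weights, probes):
    """Mean absolute attribution per feature over the probes, L2-normalized."""
    totals = [0.0] * len(weights)
    for x in probes:
        for i, a in enumerate(grad_x_input(weights, x)):
            totals[i] += abs(a)
    mean = [t / len(probes) for t in totals]
    norm = math.sqrt(sum(m * m for m in mean)) or 1.0
    return [m / norm for m in mean]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


# Hypothetical shared probe inputs and three toy "models" (weight vectors).
probes = [[1.0, 0.5, -0.2], [0.3, -1.0, 0.8], [-0.7, 0.2, 1.0]]
model_a = [2.0, 0.1, 0.1]   # mostly sensitive to feature 0
model_b = [1.8, 0.2, 0.1]   # similar sensitivity profile to model_a
model_c = [0.1, 0.1, 2.0]   # mostly sensitive to feature 2

emb_a = attribution_embedding(model_a, probes)
emb_b = attribution_embedding(model_b, probes)
emb_c = attribution_embedding(model_c, probes)

sim_ab = cosine(emb_a, emb_b)
sim_ac = cosine(emb_a, emb_c)
print(f"sim(a, b) = {sim_ab:.3f}, sim(a, c) = {sim_ac:.3f}")
```

In this toy setting, models with similar sensitivity profiles end up closer in the embedding space than models that rely on different features. A real LLM would need automatic differentiation and a much larger probe set.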

LLMs · model representation · security · model comparison · interpretability
