RESEARCH · arXiv cs.LG · 11d ago
One Mask to Rule Them All: On Hidden Facts after Editing and How to Find Them
This paper investigates the internal mechanisms of knowledge-editing methods such as ROME and MEMIT. It finds that diverse edits share a common functional structure that relies on a specific subset of the edited weights. A binary mask over these weights reverses most of the edits' changes by eliminating overattention in later layers, which shows that this mechanism is necessary for an edit to succeed.
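To make the masking idea concrete, here is a minimal sketch under stated assumptions: the mask acts elementwise on a single edited weight matrix, and masking an entry restores its pre-edit value. The names `apply_revert_mask`, `frobenius`, and `remaining_edit_fraction`, and the norm-based diagnostic, are hypothetical illustrations rather than the paper's method or code. The paper's mask may be defined at a different granularity.

```python
import math
from typing import List

Matrix = List[List[float]]


def apply_revert_mask(w_orig: Matrix, w_edit: Matrix, mask: List[List[int]]) -> Matrix:
    """Revert masked entries of an edited weight matrix to their pre-edit values.

    Assumed (hypothetical) formulation, applied elementwise:
        W_masked = W_edit - M * (W_edit - W_orig)
    Entries with M == 1 return to W_orig; entries with M == 0 keep the edit.
    """
    if not (len(w_orig) == len(w_edit) == len(mask)):
        raise ValueError("row count mismatch")
    out: Matrix = []
    for row_o, row_e, row_m in zip(w_orig, w_edit, mask):
        if not (len(row_o) == len(row_e) == len(row_m)):
            raise ValueError("column count mismatch")
        out.append([e - m * (e - o) for o, e, m in zip(row_o, row_e, row_m)])
    return out


def frobenius(a: Matrix, b: Matrix) -> float:
    """Frobenius norm of the elementwise difference a - b."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


def remaining_edit_fraction(w_orig: Matrix, w_edit: Matrix, w_masked: Matrix) -> float:
    """Share of the edit's weight change (by Frobenius norm) that survives masking.

    Hypothetical diagnostic: 0.0 means the mask fully reverted the edit.
    """
    total = frobenius(w_edit, w_orig)
    if total == 0.0:
        return 0.0
    return frobenius(w_masked, w_orig) / total


# Toy example: the edit changed two entries; the mask reverts only one of them.
w_orig = [[1.0, 0.0], [0.0, 1.0]]
w_edit = [[1.0, 0.5], [0.2, 1.0]]
mask = [[0, 1], [0, 0]]
w_masked = apply_revert_mask(w_orig, w_edit, mask)
print(w_masked)  # [[1.0, 0.0], [0.2, 1.0]]
print(remaining_edit_fraction(w_orig, w_edit, w_masked))
```

Writing the mask as a selector over the edit delta keeps the two extremes easy to read: an all-zero mask leaves the edited model untouched, and an all-ones mask recovers the original weights exactly.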