RESEARCH
Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives
arXiv cs.CL · May 5, 2026
This paper introduces a perplexity-based method for revealing the finetuning objectives of large language models, particularly "model organisms": models deliberately finetuned to exhibit a specific behavior. The method exploits finetuned models' tendency to overgeneralize their training objective. It generates completions and ranks them by differences in perplexity, which surfaces the finetuning goal without any prior assumption about what that goal is.
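A minimal sketch of the ranking step, under stated assumptions: the summary does not give the paper's exact scoring rule, so this assumes each completion is scored by its log-perplexity under a base (pre-finetuning) model minus its log-perplexity under the finetuned model, computed from per-token log-probabilities. The function names, the toy completions, and their log-probabilities are all hypothetical, and the generation step (sampling completions from the finetuned model) is omitted. A high score means the finetuned model finds the text far more likely than the base model does, which is the kind of overgeneralization signal that could point at the finetuning objective.

```python
import math
from typing import Dict, List, Tuple


def mean_nll(token_logprobs: List[float]) -> float:
    """Mean negative log-likelihood per token (natural log); equals log-perplexity."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    return math.exp(mean_nll(token_logprobs))


def rank_by_perplexity_difference(
    completions: Dict[str, Tuple[List[float], List[float]]],
) -> List[Tuple[str, float]]:
    """Rank completions by log-ppl(base) - log-ppl(finetuned), highest first.

    Hypothetical scoring rule (an assumption, not the paper's stated formula).
    `completions` maps text -> (base_logprobs, finetuned_logprobs), each a list
    of per-token natural-log probabilities for that text.
    """
    scored = [
        (text, mean_nll(base_lp) - mean_nll(ft_lp))
        for text, (base_lp, ft_lp) in completions.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy, made-up log-probabilities for two hypothetical completions.
toy = {
    "A completion echoing the hidden objective.": ([-3.0, -2.5, -3.5], [-0.5, -0.4, -0.6]),
    "An ordinary, unrelated completion.": ([-1.0, -1.5, -0.5], [-1.2, -1.6, -0.8]),
}

if __name__ == "__main__":
    for text, score in rank_by_perplexity_difference(toy):
        print(f"{score:+.2f}  {text}")
```

Normalizing per token (rather than summing log-probabilities) keeps short and long completions comparable; in practice the two models would usually share a tokenizer so the per-token scores line up.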