membership inference

1 item

RESEARCH · arXiv CS.CL · 13d ago

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

This paper presents itself as the first unified survey of Pretraining Data Exposure (PDE) in Large Language Models (LLMs), covering both data contamination and membership inference. It formalizes PDE, reviews attack and defense methods, and highlights open challenges in ensuring evaluation integrity and protecting privacy.
