data leakage

2 items

RESEARCH · arXiv CS.LG · 8d ago

NumLeak: Public Numeric Benchmarks as Latent Labels in Foundation Models

This paper introduces NumLeak, a framework for measuring memorized recall in foundation models using public numeric benchmarks. It finds that top-tier LLMs recall financial and economic data with high fidelity, suggesting that evaluations on such data may measure memorization rather than genuine out-of-sample skill.

LLM memorization Foundation Models data leakage Benchmarking

ARTICLE · DEV.to AI · 4/15/2026

A Complete Guide to Securing AI-Generated Code: From Pre-LLM Sanitization to AI-Native SAST (2026)

This article analyzes the security risks of AI coding assistants such as GitHub Copilot along two lines: generated code that contains security flaws, and the exposure of sensitive data (API keys, PII) when developers paste their code into AI tools. It notes that while most security teams address the former, few have a plan for the data leakage inherent in the latter.

data leakage code security Software Development Security AI coding assistants