← heapsort-ai

predictive models

2 items

RESEARCHarXiv CS.CL·19d ago

Data Scaling as Progressive Coverage of a Predictive Contribution Spectrum

This research investigates whether real-data scaling laws are governed by a progressive coverage of a latent predictive contribution spectrum, rather than solely by token-frequency. Using a suffix-automaton and a global-KL predictive contribution spectrum, the study finds a strong correlation between the spectrum's tail slope and the data-scaling exponent of GPT learners, showing that effective truncation rank scales logarithmically.

29