RESEARCH · arXiv CS.CL · 22d ago
Language Acquisition Device in Large Language Models
This paper proposes pre-pretraining inspired by the Language Acquisition Device (LAD): training on MP-STRUCT, a formal language that reflects natural-language structures, to improve the data efficiency of Large Language Models. A brief pre-pretraining phase on MP-STRUCT matches strong formal-language baselines in token efficiency and imparts human-like resistance to structurally implausible languages.