language models

RESEARCH · arXiv CS.CL · 4d ago

Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning

This paper introduces a hybrid pre-training objective for text encoders that combines a JEPA-style latent-space prediction loss with the standard Masked Language Modelling (MLM) objective. The combined objective aims to anchor representations in deeper semantic structure rather than in surface-form token identity alone, and the paper reports significantly more uniform embeddings.

30
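
A joint objective like the one summarised above can be sketched as a weighted sum of two terms. The sketch below is illustrative only: the mean-squared-error form of the latent term, the weight `lam`, and all function names are assumptions, not details taken from the paper.

```python
import math

def mlm_loss(logits, targets):
    """Mean cross-entropy over masked positions.

    logits: one list of vocabulary scores per masked position.
    targets: the gold token id for each masked position.
    """
    total = 0.0
    for scores, gold in zip(logits, targets):
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[gold]
    return total / len(targets)

def latent_loss(predicted, target):
    """Mean squared error between predicted and target latent vectors (assumed form)."""
    n = sum(len(p) for p in predicted)
    sq = sum((a - b) ** 2 for p, t in zip(predicted, target) for a, b in zip(p, t))
    return sq / n

def joint_loss(logits, targets, predicted, target_latents, lam=1.0):
    """Token-level reconstruction plus a latent-prediction term weighted by lam (hypothetical)."""
    return mlm_loss(logits, targets) + lam * latent_loss(predicted, target_latents)
```

Setting `lam=0` recovers plain MLM in this sketch, which makes it easy to compare the two regimes on the same batch.
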
RESEARCH · arXiv CS.CL · 19d ago

Data Scaling as Progressive Coverage of a Predictive Contribution Spectrum

This paper investigates whether real-data scaling laws are governed by progressive coverage of a latent predictive contribution spectrum rather than by token frequency alone. Using a suffix automaton and a global-KL predictive contribution spectrum, the study finds a strong correlation between the spectrum's tail slope and the data-scaling exponent of GPT learners, and reports that the effective truncation rank scales logarithmically.

29
RESEARCH · arXiv CS.CL · 4/13/2026

Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models

This paper describes a critical vulnerability in diffusion-based language models (dLLMs): their safety alignment relies on monotonic denoising schedules and can be bypassed easily. By re-masking refusal tokens and injecting an affirmative prefix, the authors achieve high attack success rates against prominent dLLMs, which points to a structural flaw rather than a tuning issue.

29
RESEARCH · arXiv CS.CL · 22d ago

Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance

This paper analyses neural activation patterns across six large language model (LLM) architectures on twelve cognitive task categories. The findings point to fundamental differences in how encoder and decoder architectures process these tasks: mathematical reasoning consistently produces the highest attention entropy, and decoder models exhibit significantly higher sparsity.

29
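
The two quantities named in the summary above, attention entropy and activation sparsity, can be illustrated with common textbook definitions. The paper's exact definitions are not given here, so the forms below (Shannon entropy of one attention row, fraction of near-zero activations) and the threshold `eps` are assumptions.

```python
import math

def attention_entropy(weights):
    """Shannon entropy in nats of one attention row; weights are assumed to sum to 1."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def sparsity(activations, eps=1e-6):
    """Fraction of activations whose magnitude is at most eps (hypothetical threshold)."""
    return sum(1 for a in activations if abs(a) <= eps) / len(activations)
```

Under these definitions, attention spread evenly over n tokens has entropy log(n), while attention concentrated on a single token has entropy 0.
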
RESEARCH · arXiv CS.LG · 15d ago

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

This study finds that small instruction-tuned language models (LMs) using Chain-of-Thought (CoT) for arithmetic often rely on a positional shortcut: they copy whichever number occupies the trailing position before the answer delimiter. The shortcut dominates even when the intermediate reasoning is correct, so the final answer can be wrong despite a valid derivation.

29
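
One way to picture the shortcut described above is to check whether a model's stated answer matches the last number that appears before the answer delimiter. This is a minimal sketch, not the paper's method: the delimiter string `"Answer:"`, the number pattern, and both function names are assumptions.

```python
import re

# Matches integers and simple decimals, optionally negative.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def trailing_number(cot_text, delimiter="Answer:"):
    """Return the last number appearing before the delimiter, or None."""
    reasoning, sep, _ = cot_text.partition(delimiter)
    if not sep:
        return None  # no delimiter found
    numbers = NUMBER.findall(reasoning)
    return numbers[-1] if numbers else None

def copies_trailing_number(cot_text, delimiter="Answer:"):
    """True when the stated answer equals the trailing number in the reasoning."""
    _, sep, answer_part = cot_text.partition(delimiter)
    answer = NUMBER.search(answer_part) if sep else None
    trailing = trailing_number(cot_text, delimiter)
    return answer is not None and trailing is not None and answer.group() == trailing
```

A match on its own does not show that copying occurred, since a correct derivation often ends on the right number. The informative cases are those where the trailing number differs from the computed result.
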
RESEARCH · arXiv CS.CL · 5d ago

Discourse-Role Labels as Presentation-Time Variables for Context Use in Language Models

This study investigates how discourse-role labels, such as "Reference" or "Instruction," affect language model behavior. It finds that the adoption rate of misleading information can shift by 56 to 84 percentage points depending on the label: labels like "Instruction" increase adoption, while "Example" consistently suppresses it.

28