machine learning

790 items

RESEARCH · arXiv CS.CL · 4/17/2026

How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data

This research proposes TESSY, a Teacher-Student Cooperation Data Synthesis framework, to address performance drops when fine-tuning reasoning models with teacher-generated data. TESSY enables the generation of synthetic sequences that inherit advanced reasoning from the teacher while maintaining stylistic consistency with the student model's distribution.

RESEARCH · arXiv CS.LG · 4/16/2026

The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior

This research investigates the 'grokking' phenomenon in transformers, finding that the long delay to generalization in arithmetic models stems from a decoder bottleneck. The encoder acquires relevant structural knowledge early, but the decoder struggles to access it, a hypothesis supported by causal interventions like transplanting encoders.

RESEARCH · arXiv CS.CL · 4/16/2026

Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling

This paper argues that the primary bottleneck in scaling multimodal large language models (MLLMs) is the knowledge density of the training data, not the task format. It demonstrates that task-specific supervision such as VQA adds little semantic information beyond what image captions already provide, and that increasing knowledge density yields consistent performance improvements.

RESEARCH · arXiv CS.LG · 4/27/2026

When Quotes Crumble: Detecting Transient Mechanical Liquidity Erosion in Limit Order Books

This research introduces a method for detecting transient liquidity erosion ("crumbling quotes") in electronic limit order books, distinguishing mechanical liquidity withdrawal from informational repricing. Using the ABIDES multi-agent simulator to provide ground truth, the authors develop a neural model that significantly outperforms rule-based baselines at identifying crumbling events across diverse market conditions.

RESEARCH · arXiv CS.CL · 4/27/2026

Knowledge-driven Augmentation and Retrieval for Integrative Temporal Adaptation

KARITA (Knowledge-driven Augmentation and Retrieval for Integrative Temporal Adaptation) is a system that addresses temporal shift, the gap that arises when AI models are trained on historical data but deployed on future data. It integrates knowledge-driven augmentation and retrieval to capture diverse shifts and improve temporal adaptation across multiple domains.

RESEARCH · arXiv CS.LG · 4/20/2026

The Spectral Geometry of Thought: Phase Transitions, Instruction Reversal, Token-Level Dynamics, and Perfect Correctness Prediction in How Transformers Reason

This paper reports spectral phase transitions in the hidden activation spaces of large language models during reasoning versus factual recall. A systematic spectral analysis across 11 models and 5 architecture families identifies seven core phenomena, including reasoning spectral compression and instruction-tuning spectral reversal.

RESEARCH · arXiv CS.LG · 20d ago

Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models

The paper proposes a neural framework to estimate pairwise conditional mutual information (MI) directly from the hidden states of pretrained masked diffusion models (MDMs). This method captures dependency structures and enables MI-guided parallel decoding, showing utility in Sudoku and protein sequence generation by recovering known structural constraints.
