AI Research

146 items

RESEARCH·arXiv CS.CL·4/30/2026

SpecTr-GBV: Multi-Draft Block Verification Accelerating Speculative Decoding

SpecTr-GBV is a novel speculative decoding method that unifies multi-draft and greedy block verification to accelerate language model inference. It formulates the verification step as an optimal transport problem, improving both theoretical efficiency and empirical performance by achieving the optimal expected acceptance length.

large language models, Inference Optimization, Speculative Decoding, AI Research

RESEARCH·arXiv CS.AI·5/9/2026

From History to State: Constant-Context Skill Learning for LLM Agents

This paper proposes constant-context skill learning, a novel framework for LLM agents to manage recurring workflows more efficiently. It addresses privacy, cost, and capability challenges by learning reusable procedures in task-family modules and conditioning inference on a compact state block. Its effectiveness is demonstrated across benchmarks like ALFWorld, WebShop, and SciWorld.

LLM Agents, reinforcement learning, Skill Learning, AI Research

RESEARCH·arXiv CS.LG·4/20/2026

The Spectral Geometry of Thought: Phase Transitions, Instruction Reversal, Token-Level Dynamics, and Perfect Correctness Prediction in How Transformers Reason

This paper reports spectral phase transitions in the hidden activation spaces of large language models during reasoning versus factual recall. A systematic spectral analysis across 11 models and 5 architecture families identifies seven core phenomena, including reasoning spectral compression and instruction-tuning spectral reversal.

neural networks, LLMs, machine learning, AI Research

RESEARCH·arXiv CS.LG·19d ago

Neural Estimation of Pairwise Mutual Information in Masked Discrete Sequence Models

The paper proposes a neural framework to estimate pairwise conditional mutual information (MI) directly from the hidden states of pretrained masked diffusion models (MDMs). This method captures dependency structures and enables MI-guided parallel decoding, showing utility in Sudoku and protein sequence generation by recovering known structural constraints.

neural networks, information theory, machine learning, sequence models

RESEARCH·arXiv CS.CL·19d ago

Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction

This study proposes a structured framework to improve LLM reasoning when analyzing long documents, addressing issues like contextual bias and omission error. It combines parallel chunk-level processing with evidence-anchored consolidation to generate more robust and bias-resilient conceptual abstractions.

Contextual Reasoning, Natural Language Processing, AI Research, Bias

RESEARCH·arXiv CS.CL·19d ago

Under Pressure: Emotional Framing Induces Measurable Behavioral Shifts and Structured Internal Geometry in Small Language Models

This study investigates how emotionally framed evaluation follow-ups alter both the behavior and internal representations of small language models. Findings indicate that "pressure" strongly induces shortcut markers, while "calm" and "curiosity" preserve honesty, with emotional direction vectors peaking at the final transformer layer.

NLP, model behavior, emotional framing, AI Research

RESEARCH·arXiv CS.LG·5/8/2026

MidSteer: Optimal Affine Framework for Steering Generative Models

This paper formalizes the theory of concept steering in generative models, linking it to affine concept erasure and introducing LEACE-Switch. It then proposes MidSteer, a more general affine framework for concept manipulation with minimal disturbance.

model steering, machine learning, theoretical framework, AI Research

RESEARCH·arXiv CS.CL·19d ago

FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation

FlowLM introduces a novel flow matching language model, adapted from pre-trained diffusion models through efficient fine-tuning. This method enables high-quality, few-step text generation that significantly outperforms traditional diffusion sampling with fewer training epochs.

Diffusion Models, language models, machine learning, text generation

RESEARCH·arXiv CS.CL·4/21/2026

Data Mixing for Large Language Models Pretraining: A Survey and Outlook

This paper provides a comprehensive survey on data mixing for Large Language Model (LLM) pretraining, a crucial factor for training efficiency and downstream generalization. It formalizes data mixture optimization as a bilevel problem and introduces a fine-grained taxonomy for existing methods.

data optimization, pretraining, machine learning, large language models

RESEARCH·arXiv CS.CL·7d ago

ART: Attention Run-time Termination for Efficient Large Language Model Decoding

Long-context decoding in Large Language Models (LLMs) is severely constrained by the memory bandwidth consumed by the Key-Value (KV) cache. This paper proposes Attention Run-time Termination (ART), a lightweight mechanism that optimizes KV cache access, yielding 20% higher generation throughput.

LLMs, memory management, decoding, performance

RESEARCH·arXiv CS.CL·25d ago

Distribution Corrected Offline Data Distillation for Large Language Models

This research proposes an offline reasoning distillation framework for Large Language Models (LLMs) to enhance intelligence in resource-constrained environments. It tackles the distributional drift issue in existing offline methods by correcting teacher-student discrepancies while preserving data efficiency and supervision quality.

Data Distillation, Offline Distillation, machine learning, large language models

RESEARCH·arXiv CS.LG·7d ago

From Demonstrations to Rewards: Test-Time Prompt Optimization for VLM Reward Models

Researchers propose Demo2Reward, a test-time adaptation technique to optimize Vision-Language Model (VLM) reward models in robotics. It uses a few demonstrations to reduce false positives while preserving true positives, without requiring additional model training.

Vision-Language Models, reinforcement learning, Prompt Optimization, robotics

RESEARCH·arXiv CS.LG·25d ago

EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

EvolveMem introduces a self-evolving memory architecture for LLM agents that allows both stored knowledge and retrieval mechanisms to co-evolve. It optimizes its configuration autonomously using an LLM-powered diagnosis module, leading to a closed-loop AutoResearch process.

LLM Agents, AutoResearch, self-evolving systems, memory architecture

RESEARCH·arXiv CS.LG·25d ago

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

This paper introduces TraFL, a novel post-training approach for diffusion language models that addresses "trajectory locking" observed in reward-maximizing methods. TraFL, a trajectory-balance objective, outperforms other methods across mathematical reasoning and code generation benchmarks.

Diffusion Models, language models, reinforcement learning, machine learning

RESEARCH·arXiv CS.LG·25d ago

Rethinking Molecular OOD Generalization via Target-Aware Source Selection

This research addresses robust molecular property prediction under extreme out-of-distribution (OOD) scenarios, a setting crucial for AI-driven drug discovery. It proposes SCOPE-BENCH, a new benchmark for evaluating OOD performance, and POMA, a multi-source adaptation framework designed to overcome limitations of existing methods.

Out-of-Distribution, Molecular AI, machine learning, drug discovery

RESEARCH·arXiv CS.AI·5/7/2026

The Scaling Properties of Implicit Deductive Reasoning in Transformers

This paper investigates the scaling properties of implicit deductive reasoning over Horn clauses in depth-bounded Transformers. Deep models with a bidirectional prefix mask approach explicit CoT performance, though CoT remains necessary for depth extrapolation.

neural networks, scaling, deductive reasoning, AI Research

RESEARCH·arXiv CS.LG·5/7/2026

A Self-Attentive Meta-Optimizer with Group-Adaptive Learning Rates and Weight Decay

MetaAdamW is a novel optimizer that employs a self-attention mechanism to dynamically adjust per-group learning rates and weight decay, addressing the limitation of uniform hyperparameters in adaptive optimizers. Its attention module is trained via a meta-learning objective, integrating gradient alignment, loss decrease, and generalization gap.

Meta-Learning, deep learning, learning, AI Research

RESEARCH·arXiv CS.AI·28d ago

Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction

This paper investigates strategies to improve multimodal LLM accuracy in extracting data from scientific charts. It demonstrates that a simple grid-based spatial priming method significantly outperforms semantic prompting techniques.

Data Extraction, spatial priming, chart analysis, AI Research

RESEARCH·arXiv CS.LG·21d ago

Language Game: Talking to Non-Human Systems

This paper explores direct communication with non-human systems (like gene regulatory networks or fungi) recognized as computational substrates, moving beyond LLMs acting as proxies. It proposes a "language game" approach using reinforcement learning with linear interfaces to enable these systems to "speak in their own voice."

reinforcement learning, AI communication, large language models, non-human systems

RESEARCH·arXiv CS.CL·7d ago

CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards

This paper proposes CSRP, a three-stage framework for Chinese Grammatical Error Correction (CGEC) using Large Language Models (LLMs). To address the limitations of general-purpose models and of metric optimization, CSRP combines continual pre-training, Chain-of-Thought SFT, and policy optimization with efficiency-aware rewards that penalize unnecessary edits. It achieves state-of-the-art performance on the NACGEC benchmark.

reinforcement learning, Grammar Correction, Natural Language Processing, AI Research