AI Research

146 items

RESEARCH · arXiv CS.CL · 7d ago

SENSE: Semantic Embedding Navigation with Soft-gated Evaluation for Retrieval-based Speculative Decoding

This paper proposes SENSE (Semantic Embedding Navigation with Soft-gated Evaluation) to enhance Retrieval-based Speculative Decoding (RSD) for LLMs. SENSE addresses RSD's rigid lexical dependencies by using robust semantic alignment and a soft-gated evaluation module to validate semantic equivalence.

LLMs NLP Inference Optimization Speculative Decoding
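The "rigid lexical dependencies" mentioned in the summary above can be illustrated with a minimal sketch of exact-match retrieval drafting. This is not SENSE itself, whose semantic alignment and soft-gated evaluation are not specified in the summary. It only shows the baseline behaviour the paper is said to address. The function names, the toy datastore, and the use of a fixed token list in place of a real target model are all assumptions made for illustration.

```python
def retrieve_draft(context, datastore, suffix_len=2, draft_len=3):
    """Return the tokens that followed the most recent exact match
    of the context suffix in the datastore (hypothetical helper)."""
    suffix = context[-suffix_len:]
    # Scan from the end so the most recent occurrence wins.
    for start in range(len(datastore) - suffix_len, -1, -1):
        if datastore[start:start + suffix_len] == suffix:
            end = start + suffix_len
            return datastore[end:end + draft_len]
    return []  # no lexical match, so nothing can be drafted


def accept_prefix(draft, target_tokens):
    """Keep the longest draft prefix that agrees with the target
    model's tokens (given here as a plain list, an assumption)."""
    accepted = []
    for drafted, expected in zip(draft, target_tokens):
        if drafted != expected:
            break
        accepted.append(drafted)
    return accepted


datastore = "the cat sat on the mat and the cat sat on the sofa".split()

# Exact suffix match: a draft is found and partially accepted.
draft = retrieve_draft("I saw the cat".split(), datastore)
accepted = accept_prefix(draft, ["sat", "on", "a"])

# A paraphrase ("kitten") shares the meaning but not the tokens,
# so exact-match retrieval returns no draft at all.
paraphrase_draft = retrieve_draft("I saw the kitten".split(), datastore)
```

In this toy setup the paraphrased context yields an empty draft even though a semantically equivalent continuation exists in the datastore, which is the kind of brittleness a semantic matching scheme would aim to relax.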

RESEARCH · arXiv CS.LG · 13d ago

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

This paper introduces GEM (Geometric Entropy Mixing), a novel framework for LLM data curation that reformulates the problem as a variational one on the hypersphere. GEM optimizes data composition for LLM pre-training, overcoming categorization flaws and discovering balanced semantic structures.

machine learning Geometric Entropy Mixing data curation AI Research

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/16/2026

Camera-ready paranoia [D]

A user expresses "camera-ready paranoia" after submitting their paper to CVPRW, fearing rejection due to potential errors despite having used a PDF validation tool and the correct template. They are seeking confirmation on when the paper will be placed in the proceedings, noting its current status as "In production".

academic submission research publishing computer vision AI Research

RESEARCH · arXiv CS.CL · 4/6/2026

PolyJarvis: LLM Agent for Autonomous Polymer MD Simulations

PolyJarvis is an LLM agent that automates molecular dynamics simulations of polymers to predict properties from natural-language input, using the RadonPy platform. The system carries out tasks from monomer construction through property calculation, showing accurate predictions of density and elastic moduli for polymers such as aPS and PMMA.

Autonomous Simulation LLM Agent Molecular Dynamics Polymer Science

RESEARCH · DEV.to AI · 18d ago

Hugging Face: New Research Highlights Value of Specialized AI Models

Hugging Face published research by Dharma AI on May 22, 2026, highlighting that specialized AI models can outperform larger, general-purpose models in specific tasks. The study suggests a strategic shift in AI procurement, emphasizing task-specific performance and efficiency.

specialized AI models Hugging Face AI procurement large language models

RESEARCH · arXiv CS.CL · 5d ago

Discourse-Role Labels as Presentation-Time Variables for Context Use in Language Models

This study investigates the effect of discourse-role labels, such as "Reference" or "Instruction," on language model behavior. It reveals that the adoption rate of misleading information can shift significantly (56-84 percentage points) depending on the label, with labels like "Instruction" increasing adoption and "Example" consistently suppressing it.

language models Context NLP model behavior

RESEARCH · DEV.to AI · 4/17/2026

Logical Neural Networks

Logical Neural Networks represent a research area that seeks to integrate symbolic reasoning with neural network pattern recognition. This field explores how to combine explicit knowledge representation and logical inference with the learning capabilities of connectionist models.

neural networks machine learning logic AI Research

RESEARCH · arXiv CS.CL · 4/20/2026

Brain Score Tracks Shared Properties of Languages: Evidence from Many Natural Languages and Structured Sequences

This research investigates the similarity between language models' processing and human language processing using the Brain Score framework. Findings suggest LMs trained on diverse natural languages and even structured data (human genome, Python) show similar Brain Score performance, indicating the metric captures the ability to extract common structure.

language models fMRI Neuroscience AI Research

RESEARCH · arXiv CS.CL · 5/4/2026

Timing is Everything: Temporal Scaffolding of Semantic Surprise in Humor

This research proposes the Dual Prediction Violation (DPV) framework to explain humor, emphasizing the interplay between content and timing. Analyzing 828 Chinese stand-up performances, it shows that temporal features, particularly peak semantic violations and systematic pauses, predict audience appreciation significantly better than semantic incongruity alone.

cognitive science humor stand-up comedy AI Research

RESEARCH · arXiv CS.LG · 21d ago

A Structural Threshold in Decision Capacity Governs Collapse in Self-Play Reinforcement Learning

This paper shows that a threshold in decision capacity governs collapse in self-play reinforcement learning agents under asymmetric rule perturbations. Eliminating all positive-reach contingent decisions causes rapid convergence to a deterministic exploitation attractor, while preserving even a single such decision prevents this collapse.

Decision Making reinforcement learning learning game theory

RESEARCH · arXiv CS.CL · 5d ago

When Retrieval Doesn't Help: A Large-Scale Study of Biomedical RAG

A large-scale study re-evaluates Retrieval-Augmented Generation (RAG) in medical question answering, finding only small and inconsistent improvements over no-retrieval baselines. It suggests that the choice of the backbone model is more critical than retrieval methods, and the main bottleneck is the model's ability to effectively use retrieved evidence.

RAG Medical Question Answering Biomedical AI large language models

RESEARCH · arXiv CS.CL · 5d ago

SaliMory: Orchestrating Cognitive Memory for Conversational Agents

SaliMory is a framework that trains a single language model to manage cognitively structured memory for conversational agents, addressing issues with existing memory expansion and reinforcement learning methods. It does so through a hierarchical stage-wise process reward and reward-decomposed contrastive refinement, significantly improving accuracy and personalization while reducing memory-attributed failures.

language models memory management AI Research Conversational AI

RESEARCH · arXiv CS.CL · 19d ago

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

Large language models struggle with complex long-context reasoning tasks despite supporting extensive inputs. ProxyCoT is a novel training framework designed to transfer reasoning capabilities from short proxy contexts to full long contexts, outperforming strong baselines.

machine learning Natural Language Processing Reasoning large language models

RESEARCH · arXiv CS.CL · 27d ago

Instructions shape Production of Language, not Processing

This research paper explores a production-centered mechanism in language models, revealing an asymmetry between language processing and production. It shows that instructions significantly shape information in output tokens, but not in sample tokens, correlating strongly with model behavior.

language models cognitive science NLP AI Research

RESEARCH · arXiv CS.LG · 27d ago

Steering Without Breaking: Mechanistically Informed Interventions for Discrete Diffusion Language Models

This paper investigates the limitations of uniform interventions in discrete diffusion language models (DLMs), demonstrating that they degrade controlled generation quality. The authors find that different attributes commit at distinct stages of the denoising process and propose an adaptive scheduler to concentrate interventions efficiently.

Diffusion Models language models Controlled Generation text generation

RESEARCH · arXiv CS.CL · 12d ago

From AR to Diffusion: Efficiently Adapting Large Language Models with Strictly Causal and Elastic Horizons

FLUID is a new framework designed to efficiently adapt Autoregressive (AR) backbones to the diffusion paradigm for parallel text generation. It enables initialization from GPT-style models and introduces an entropy-driven mechanism called Elastic Horizons, achieving state-of-the-art performance with significantly reduced training costs.

Diffusion Models text generation large language models Autoregressive Models

RESEARCH · arXiv CS.LG · 22d ago

TeamTR: Trust-Region Fine-Tuning for Multi-Agent LLM Coordination

This paper introduces TeamTR, a trust-region framework for fine-tuning multi-agent LLM systems, addressing structural failures in sequential fine-tuning. It proves that stale-occupancy evaluation incurs a penalty that grows quadratically with the number of agents, and it improves performance by 7.1% on average.

Multi-agent LLMs LLM coordination Trust-region method Fine-tuning

RESEARCH · arXiv CS.LG · 15d ago

Reading Calibrated Uncertainty from Language Model Trajectories

This research paper proposes a new method to quantify uncertainty in language models by tracing the cumulative path of per-layer MLP updates. Using eleven scale-invariant geometric features extracted from that path, a sparse linear probe is shown to outperform maximum softmax probability at estimating uncertainty, especially when the baseline is miscalibrated.

language models deep learning Uncertainty Quantification model calibration
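For readers unfamiliar with the baseline named in the summary above, maximum softmax probability can be sketched in a few lines. This shows only the standard baseline, not the paper's trajectory-based probe. The logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_softmax_probability(logits):
    """Confidence score: the largest class probability.
    Lower values are read as higher uncertainty."""
    return max(softmax(logits))


# Illustrative logits (assumed, not taken from the paper).
confident = max_softmax_probability([8.0, 1.0, 0.5])  # one class dominates
uncertain = max_softmax_probability([1.0, 1.0, 1.0])  # uniform over 3 classes
```

Because this score depends directly on the scale of the logits, a model whose probabilities are miscalibrated yields a distorted confidence score, which is the situation in which the summary reports the largest gains for the probe.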

RESEARCH · arXiv CS.CL · 15d ago

RAS: Reflection-Augmented Scaling with In-Context Learning for Executable Cypher Query Generation

This paper introduces Reflection-Augmented Scaling (RAS) for executable Cypher query generation, leveraging prior execution feedback through in-context learning. RAS reduces the Query Execution Error Rate by 41-50%, significantly outperforming Independent Scaling.

language models graph databases query generation in-context learning

RESEARCH · arXiv CS.CL · 4/20/2026

DALM: A Domain-Algebraic Language Model via Three-Phase Structured Generation

DALM (Domain-Algebraic Language Model) is proposed to address knowledge interference in LLMs by replacing unconstrained generation with structured denoising over a domain lattice. It uses a three-phase generation path (domain, relation, concept uncertainty) under algebraic constraints, requiring a domain lattice, relation typing, and fiber partition to prevent cross-domain contamination.

language models machine learning Natural Language Processing AI Research