language models

103 items

RESEARCH · arXiv CS.LG · 5/5/2026

StyleShield: Exposing the Fragility of AIGC Detectors through Continuous Controllable Style Transfer

The paper introduces StyleShield, a novel flow matching framework for conditional text style transfer that exposes the fragility of AI-generated content (AIGC) detectors. It operates in continuous token embedding space to blur the statistical boundary between human and AI writing, challenging the reliability of current detection services.

language models AI detection security style transfer

RESEARCH · arXiv CS.CL · 5/5/2026

H-Probes: Extracting Hierarchical Structures From Latent Representations of Language Models

This paper introduces H-probes, linear probes designed to extract hierarchical structure, specifically depth and pairwise distance, from the latent representations of large language models. The research shows that these probes robustly find low-dimensional subspaces crucial for performance on synthetic tree traversal tasks, and that they generalize well both in-domain and out-of-domain.

language models hierarchical reasoning representation learning AI Research

RESEARCH · arXiv CS.LG · 4/9/2026

$S^3$: Stratified Scaling Search for Test-Time in Diffusion Language Models

The work proposes $S^3$ (Stratified Scaling Search), a verifier-guided search method for improving generation quality in diffusion language models at inference time. It reallocates compute across the denoising process, selectively evaluating and resampling promising candidates to favor higher-quality outputs.

Diffusion Models search algorithms language models inference

RESEARCH · arXiv CS.CL · 4/13/2026

EMA Is Not All You Need: Mapping the Boundary Between Structure and Content in Recurrent Context

This research explores Exponential Moving Average (EMA) traces as a minimal recurrent context to delineate the capabilities and limitations of fixed-coefficient accumulation in sequence models. It demonstrates that EMA traces excel at encoding temporal structure, matching advanced models on structural tasks, yet fundamentally fail to capture token identity, resulting in significantly reduced performance for language modeling.

language models Recurrent Context Temporal Structure sequence models
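
As a rough illustration of the fixed-coefficient accumulation described in the EMA item above, the sketch below computes a scalar EMA trace with the textbook update h_t = alpha * h_{t-1} + (1 - alpha) * x_t. The function name `ema_trace`, the zero initial state, and the scalar inputs are assumptions made for illustration; the summary does not specify the paper's exact parameterization.

```python
def ema_trace(xs, alpha):
    """Return the running EMA of xs with a fixed decay coefficient alpha.

    Assumed (textbook) update: h_t = alpha * h_{t-1} + (1 - alpha) * x_t,
    starting from h_0 = 0. Names and initial state are illustrative only.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    h = 0.0
    trace = []
    for x in xs:
        # The coefficient is fixed: it does not depend on the input x.
        h = alpha * h + (1.0 - alpha) * x
        trace.append(h)
    return trace


# A single impulse at t = 0 decays geometrically, so the trace value
# reflects how long ago the event happened.
impulse = ema_trace([1.0, 0.0, 0.0], alpha=0.5)
print(impulse)  # [0.5, 0.25, 0.125]
```

Because the coefficient is the same at every step, the trace is a geometrically weighted sum of past inputs, which is one way to see why such a state tracks recency and timing.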

RESEARCH · arXiv CS.LG · 5/1/2026

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

This research investigates the training-time mechanisms of refusal in safety-aligned language models, comparing supervised fine-tuning (SFT) with R2D2-style dynamic adversarial fine-tuning. The findings show that R2D2 initially achieves strong refusal on HarmBench, but the vulnerability partially reopens later in training, while SFT remains consistently less robust.

language models model robustness Fine-tuning Adversarial Training

RESEARCH · arXiv CS.CL · 5/1/2026

CL-bench Life: Can Language Models Learn from Real-Life Context?

CL-bench Life is a new human-curated benchmark designed to assess whether frontier language models can effectively learn from complex, messy real-life contexts. It comprises 405 context-task pairs to test models' ability to reason over personal and social experiences.

context-learning language models Benchmarks

RESEARCH · arXiv CS.CL · 4/16/2026

KMMMU: Evaluation of Massive Multi-discipline Multimodal Understanding in Korean Language and Context

KMMMU is a new native Korean benchmark for evaluating multimodal understanding in Korean cultural and institutional settings, featuring 3,466 questions from native exams. The study shows that current AI models achieve only 42.05% accuracy on the full set, with significant failures in culturally and discipline-specific problems.

language models multimodal AI evaluation Benchmarking

RESEARCH · arXiv CS.AI · 4/27/2026

Math Takes Two: A test for emergent mathematical reasoning in communication

This paper proposes Math Takes Two, a new benchmark designed to assess the emergence of mathematical reasoning in language models through communication. It tests whether two agents, without prior mathematical knowledge, can develop a shared symbolic protocol to solve a visually grounded task where a numerical system facilitates extrapolation.

language models mathematical reasoning AI communication Benchmarks

RESEARCH · arXiv CS.CL · 4/8/2026

Document Optimization for Black-Box Retrieval via Reinforcement Learning

This research paper proposes a new approach to document optimization that transforms documents to align better with retrieval systems via reinforcement learning (GRPO), using ranking improvements as the reward. The method, which applies to black-box retrievers, showed gains on code retrieval and visual document retrieval tasks.

language models Vision-Language Models reinforcement learning document optimization

RESEARCH · arXiv CS.CL · 5/8/2026

Chainwash: Multi-Step Rewriting Attacks on Diffusion Language Model Watermarks

This paper investigates multi-step rewriting attacks on diffusion language model watermarks, which are used to verify AI text authorship. The findings show that watermarked texts can have their detection compromised after multiple rewrites by other language models, even those unaware of the watermark key.

Diffusion Models language models AI watermarking security

RESEARCH · arXiv CS.CL · 20d ago

FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation

FlowLM introduces a novel flow matching language model, adapted from pre-trained diffusion models through efficient fine-tuning. This method enables high-quality, few-step text generation that significantly outperforms traditional diffusion sampling with fewer training epochs.

Diffusion Models language models machine learning text generation

RESEARCH · arXiv CS.LG · 26d ago

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

This paper introduces TraFL, a novel post-training approach for diffusion language models that addresses "trajectory locking" observed in reward-maximizing methods. TraFL, a trajectory-balance objective, outperforms other methods across mathematical reasoning and code generation benchmarks.

Diffusion Models language models reinforcement learning machine learning

RESEARCH · arXiv CS.AI · 5/7/2026

Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games

Agent Island is a new multiagent simulation environment for language models, serving as a dynamic benchmark designed to mitigate saturation and contamination. Models like openai/gpt-5.5 are ranked based on their performance in games involving cooperation, conflict, and persuasion.

language models Benchmarking multiagent games AI

RESEARCH · arXiv CS.AI · 8d ago

Grokers: Bottom-Up Inductive Comprehension and Write-Time Intelligence over Typed Knowledge Graphs

Grokers is an architecture for building persistent, structured comprehension of typed knowledge graphs through bottom-up inductive traversal. Unlike RAG, it shifts intelligence to write time: autonomous Groker agents use language models to analyze and enrich attributes once, so that all future queries can reuse the results at zero cost.

language models AI architecture Knowledge Graphs Data Comprehension

RESEARCH · arXiv CS.LG · 18d ago

Teaching Language Models to Forecast Research Success Through Comparative Idea Evaluation

This paper explores training language models to forecast the empirical success of research ideas by evaluating pairs of ideas against objective outcomes. Supervised fine-tuning (SFT) significantly boosts performance beyond GPT-5, and reinforcement learning with verifiable rewards (RLVR) can train models to discover interpretable reasoning paths for this forecasting task.

language models research evaluation machine learning AI forecasting

RESEARCH · arXiv CS.AI · 29d ago

CoCoDA: Co-evolving Compositional DAG for Tool-Augmented Agents

CoCoDA proposes a framework for tool-augmented language models, utilizing a co-evolving compositional code DAG to manage and retrieve tools efficiently. This approach addresses challenges in scaling tool libraries by encoding typed, compositional structures and pruning candidates through symbolic signature unification.

language models Tool-Augmented Agents Compositional AI AI

RESEARCH · arXiv CS.CL · 23d ago

Always Learning, Always Mixing: Efficient and Simple Data Mixing All The Time

This research introduces OP-Mix, a novel algorithm for efficient data mixing throughout the entire lifecycle of language model training. It addresses the challenge of combining diverse data sources for pretraining, continual learning, and adaptation, proposing a unified online decision-making solution.

language models learning data mixing machine learning

RESEARCH · arXiv CS.AI · 27d ago

DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

DisaBench introduces a participatory evaluation framework to assess disability-related harms in large language models, addressing the inadequacy of general-purpose safety benchmarks. It features a co-created taxonomy of twelve harm categories, a methodology pairing benign and adversarial prompts, and a dataset with human-annotated labels, revealing subtle harms often missed by standard evaluations.

language models Benchmarking AI ethics disability harms

RESEARCH · arXiv CS.CL · 28d ago

HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model

Hebatron is a Hebrew-specialized open-weight large language model built on NVIDIA's Nemotron-3 Mixture-of-Experts (MoE) architecture. It achieves a 73.8% Hebrew reasoning average, outperforming competitors and offering significantly higher inference throughput by activating fewer parameters per pass.

language models NVIDIA AI Hebrew AI Mixture of Experts

RESEARCH · arXiv CS.AI · 8d ago

Emergent Collaborative Deliberation in Multi-Model AI Systems: A BFT-Derived Protocol for Epistemic Synthesis

This paper introduces the Consilium Protocol, derived from Byzantine Fault Tolerance (BFT), for structured multi-model AI deliberation that treats inter-model disagreement as an epistemic signal. The study reports that cognitive personas determine epistemic behavior and that RLHF alignment training creates measurable epistemic blind spots.

language models multi-model AI Epistemic synthesis Consilium Protocol