language models

103 items

RESEARCH · arXiv CS.CL · 1d ago

Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation

This paper introduces the On-Policy Diffusion Language Model (OPDLM) for transforming autoregressive models (ARLMs) into diffusion language models (DLMs). It addresses issues like knowledge loss and train-inference mismatch by employing On-Policy Distillation (OPD).

Diffusion Models language models AI models machine learning

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/17/2026

Ternary Bonsai: Top intelligence at 1.58 bits

Prism ML announced Ternary Bonsai, a new family of 1.58-bit language models designed to balance strict memory constraints with high accuracy. These models, available in 8B, 4B, and 1.7B sizes, achieve a 9x smaller memory footprint than 16-bit models while outperforming most peers.

Model Compression language models Efficient AI
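The "1.58 bits" in the title matches the information content of a weight restricted to three values, since log2(3) is roughly 1.585. The short sketch below works through that arithmetic. The helper `weight_gib` and the parameter count passed to it are illustrative assumptions, not figures from the announcement, and the calculation ignores any per-group scales or layers kept at higher precision.

```python
import math

# A ternary weight takes one of three values (-1, 0, +1),
# so it carries log2(3) bits of information.
bits_per_ternary_weight = math.log2(3)

# Ideal compression ratio against 16-bit weights, ignoring overhead
# such as scaling factors or unquantized layers (an assumption here).
ideal_ratio = 16 / bits_per_ternary_weight


def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Hypothetical helper: storage for the weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    n = 8e9  # nominal "8B" parameter count, used only for illustration
    print(f"bits per weight: {bits_per_ternary_weight:.3f}")
    print(f"ideal ratio vs 16-bit: {ideal_ratio:.2f}x")
    print(f"16-bit weights:  {weight_gib(n, 16):.2f} GiB")
    print(f"ternary weights: {weight_gib(n, bits_per_ternary_weight):.2f} GiB")
```

The ideal ratio comes out slightly above 10x, so the 9x figure quoted in the summary would be consistent with some storage overhead, although the summary does not say where the difference comes from.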

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 27d ago

sensenova/SenseNova-U1-A3B-MoT · Hugging Face

SenseNova U1 is a new series of native multimodal models that unifies multimodal understanding, reasoning, and generation within a monolithic architecture. These innovative models natively think and act across language and vision, marking a fundamental paradigm shift in multimodal AI.

language models multimodal AI unified architecture SenseNova

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/21/2026

Building my own Diffusion Language Model from scratch was easier than I thought [P]

The author built a diffusion language model from scratch to better understand complex concepts, without the help of AI-generated code. They trained the 7.5M parameter model on the tiny Shakespeare dataset and shared the code on GitHub.

Diffusion Models language models personal-project machine learning

RESEARCH · arXiv CS.CL · 1d ago

How Language Models Fail: Token-Level Signatures of Committed and Persistent Reasoning Failures

Failures in language model reasoning emerge through distinct processes that leave identifiable token-level signatures. These failures are characterized as "committed failure" or "persistent uncertainty", and understanding these signatures helps distinguish failing from successful completions across various configurations.

language models research Reasoning AI failures

RESEARCH · arXiv CS.CL · 4/22/2026

Remask, Don't Replace: Token-to-Mask Refinement in Masked Diffusion Language Models

This paper proposes a novel technique, Token-to-Mask (T2M) remasking, to refine masked diffusion language models like LLaDA2.1. The method addresses the shortcomings of Token-to-Token (T2T) editing by resetting suspect tokens to a mask state, enabling more accurate re-prediction.

Diffusion Models language models error correction Natural Language Processing

RESEARCH · arXiv CS.CL · 14d ago

AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue

This paper introduces AERIC, a novel transfer-oriented hidden-state approach for anticipatory same-pass monitoring of implicit harmful dialogue in language models. It aims to detect potential risks early enough to prevent the exposure of harmful continuations.

harmful dialogue language models security AI safety

ARTICLE · KDNuggets · 4d ago

A Deep Dive into Calibration of Language Models: Platt Scaling, Isotonic Regression, Temperature Scaling

This article explores three post-hoc methods (Platt Scaling, Isotonic Regression, and Temperature Scaling) for improving the calibration of language models. These techniques aim to reduce the gap between a model's predicted confidence and its actual accuracy.

language models Calibration learning machine learning
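Of the three methods named in the summary, temperature scaling is the simplest to illustrate: all logits are divided by a single scalar T that is chosen on held-out data to minimize negative log-likelihood. The sketch below is a minimal pure-Python version under stated assumptions. It is not code from the article; the function names, the grid search used in place of a gradient-based optimizer, and the toy validation data are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature (numerically stable)."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    total = 0.0
    for logits, y in zip(logit_rows, labels):
        total -= math.log(softmax(logits, temperature)[y])
    return total / len(labels)


def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimizes held-out NLL.

    A grid search stands in for the usual gradient-based fit (an assumption
    made to keep the sketch dependency-free).
    """
    if grid is None:
        grid = [0.25 + 0.05 * i for i in range(96)]  # 0.25 to 5.0
    return min(grid, key=lambda t: nll(logit_rows, labels, t))


# Toy held-out set: the model is about 98% confident in class 0 every time,
# but is right only 3 times out of 4, so it is overconfident.
val_logits = [[4.0, 0.0], [4.0, 0.0], [4.0, 0.0], [4.0, 0.0]]
val_labels = [0, 0, 0, 1]

if __name__ == "__main__":
    T = fit_temperature(val_logits, val_labels)
    print("fitted temperature:", round(T, 2))
    print("confidence before:", round(softmax(val_logits[0])[0], 3))
    print("confidence after: ", round(softmax(val_logits[0], T)[0], 3))
```

Because every logit is divided by the same positive constant, the predicted class never changes; only the confidence moves, here toward the observed 75% accuracy.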

RESEARCH · DEV.to AI · 4/13/2026

TALM: Tool Augmented Language Models

TALM (Tool Augmented Language Models) focuses on integrating external tools with large language models to augment their capabilities. This approach allows LLMs to perform complex tasks more effectively by leveraging specialized functions and real-world interactions.

language models LLMs NLP Tool Augmentation

RESEARCH · arXiv CS.CL · 4d ago

Generic Triple-Latent Compression with Gated Associative Retrieval

This research introduces generic triple-latent sequence models, which use a running token state and compressed pair-memory to capture higher-order token interactions. These models improve on a Transformer baseline on language-model benchmarks. A retrieval extension further improves recall but is slower.

language models latent models sequence models associative retrieval

RESEARCH · arXiv CS.CL · 4d ago

Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning

This paper introduces a hybrid pre-training objective for text encoders that combines a JEPA-style latent-space prediction loss with a standard Masked Language Modelling (MLM) objective. The approach aims to encourage representations anchored to deeper semantic structure rather than surface-form token identity, and it yields significantly more uniform embeddings.

language models deep learning self-supervised learning machine learning

RESEARCH · arXiv CS.CL · 19d ago

Data Scaling as Progressive Coverage of a Predictive Contribution Spectrum

This research investigates whether real-data scaling laws are governed by progressive coverage of a latent predictive contribution spectrum, rather than solely by token frequency. Using a suffix automaton and a global-KL predictive contribution spectrum, the study finds a strong correlation between the spectrum's tail slope and the data-scaling exponent of GPT learners, and shows that the effective truncation rank scales logarithmically.

language models data scaling machine learning predictive models

RESEARCH · arXiv CS.CL · 4/13/2026

Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models

This paper reveals a critical vulnerability in diffusion-based language models (dLLMs) where their safety alignment, based on monotonic denoising schedules, can be easily bypassed. By re-masking refusal tokens and injecting an affirmative prefix, researchers achieved high attack success rates against prominent dLLMs, exposing a structural flaw.

Diffusion Models language models vulnerability Exploitation

RESEARCH · arXiv CS.AI · 20d ago

Learn-by-Wire Training Control Governance: Bounded Autonomous Training Under Stress for Stability and Efficiency

This paper introduces Learn-by-Wire Guard (LBW-Guard), a bounded autonomous training-control governance layer for language models. It aims to improve training stability and efficiency by observing telemetry and applying bounded control, and is reported to significantly reduce final perplexity.

language models deep learning AI training model stability

RESEARCH · arXiv CS.CL · 4/24/2026

GRISP: Guided Recurrent IRI Selection over SPARQL Skeletons

GRISP is a novel SPARQL-based question-answering method over knowledge graphs, leveraging a fine-tuned small language model (SLM). It generates SPARQL query skeletons from natural language questions and iteratively refines them by selecting knowledge graph items, achieving state-of-the-art results on Wikidata and Freebase benchmarks.

language models Knowledge Graphs SPARQL Question Answering

RESEARCH · arXiv CS.AI · 29d ago

When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment

This research introduces a "finite-answer preference stabilization" method to determine when a language model's answer preference becomes stable before its final output. It shows that this stabilization often occurs before the answer is parseable, with a significant lead time.

language models cognitive science machine learning NLP

RESEARCH · arXiv CS.CL · 22d ago

Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance

This paper presents a comprehensive analysis of neural activation patterns across six distinct large language model (LLM) architectures, examining their performance on twelve cognitive task categories. The findings reveal fundamental differences in how encoder and decoder architectures process diverse cognitive tasks, with mathematical reasoning consistently producing the highest attention entropy and decoder models exhibiting significantly higher sparsity.

neural networks language models cognitive science Model Analysis

RESEARCH · arXiv CS.LG · 15d ago

The Readout Shortcut: Positional Number Copying Dominates Arithmetic CoT Readout in Small Language Models

This study finds that small instruction-tuned language models (LMs) using Chain-of-Thought (CoT) for arithmetic often rely on a positional shortcut: they copy whichever number occupies the trailing position before the answer delimiter. The shortcut dominates even when the intermediate reasoning is correct, which significantly affects answer accuracy.

language models CoT Prompting Arithmetic

RESEARCH · arXiv CS.CL · 5d ago

Discourse-Role Labels as Presentation-Time Variables for Context Use in Language Models

This study investigates the effect of discourse-role labels, such as "Reference" or "Instruction," on language model behavior. It reveals that the adoption rate of misleading information can shift significantly (56-84 percentage points) depending on the label, with labels like "Instruction" increasing adoption and "Example" consistently suppressing it.

language models Context NLP model behavior

ARTICLE · DEV.to AI · 18d ago

Playing with Words at the National Library of Sweden -- Making a Swedish BERT

The article discusses the process of creating a BERT model for the Swedish language, a project developed at the National Library of Sweden. The aim is to enhance natural language processing for Swedish.

language models BERT NLP National Library