large language models

262 items

RESEARCH·arXiv CS.AI·22d ago

ICRL: Learning to Internalize Self-Critique with Reinforcement Learning

ICRL is a framework that trains large language model agents to internalize self-critique, so that feedback is converted into unassisted problem-solving ability. It jointly trains a solver and a critic from a shared backbone and rewards the critic for actionable feedback, which fosters iterative self-improvement.

reinforcement learning learning self-critique large language models

RESEARCH·arXiv CS.LG·26d ago

Multi-Rollout On-Policy Distillation via Peer Successes and Failures

The paper introduces Multi-Rollout On-Policy Distillation (MOPD), a framework that uses a student's local rollout group to construct more informative teacher signals for post-training large language models. MOPD conditions the teacher on both successful and failed peer rollouts, leveraging successes for valid reasoning patterns and failures for avoiding plausible mistakes.

distillation reinforcement learning AI training machine learning

RESEARCH·arXiv CS.CL·26d ago

TimelineReasoner: Advancing Timeline Summarization with Large Reasoning Models

TimelineReasoner is a novel framework that leverages Large Reasoning Models (LRMs) to advance timeline summarization, moving beyond passive Large Language Model (LLM) generation. It employs a two-stage, reasoning-driven process—Global Cognition and Detail Exploration—to actively extract and refine structured timelines from unstructured online news content.

timeline-summarization Natural Language Processing Reasoning large language models

RESEARCH·arXiv CS.CL·27d ago

Decomposing Evolutionary Mixture-of-LoRA Architectures: The Routing Lever, the Lifecycle Penalty, and a Substrate-Conditional Boundary

This paper decomposes an evolutionary Mixture-of-LoRA system, examining factors such as router rewrite, per-domain evaluation, and an adaptation lifecycle. Results indicate that the router rewrite is solely responsible for the balanced log-PPL improvement observed.

neural networks machine learning large language models LoRA

RESEARCH·arXiv CS.LG·27d ago

LEAP: Unlocking dLLM Parallelism via Lookahead Early-Convergence Token Detection

Diffusion language models (dLLMs) are limited in how far they can parallelize decoding because their confidence thresholds are overly conservative. This paper introduces LEAP, a training-free, plug-and-play method that detects early-converging tokens, which raises dLLM parallelism and accelerates decoding.

Diffusion Models Parallel Computing AI large language models

RESEARCH·arXiv CS.AI·27d ago

Rethinking LLMOps for Fraud and AML: Building a Compliance-Grade LLM Serving Stack

This research paper proposes a specialized LLMOps stack for fraud detection and anti-money laundering (AML) compliance, workloads whose serving requirements differ from those of generic chat. The stack combines several techniques to handle evidence-rich, schema-constrained prompts efficiently and to deliver compliance-grade performance with self-hosted open-weight LLMs.

LLMOps security AML Compliance fraud detection

ARTICLE·DEV.to AI·4/15/2026

GPT-6 just merged ChatGPT, Codex, and a browser into one agent.

According to the article, OpenAI's new GPT-6 unifies chat, code generation, and web browsing into a single agent, built on a new base model and a dual-tier reasoning architecture. The article also describes a usable 2M-token context window, which improves the model's utility for complex tasks like IoT telemetry without extensive data chunking.

OpenAI GPT-6 Context window large language models

RESEARCH·arXiv CS.CL·18d ago

Probabilistic Attribution For Large Language Models

This work uses the conditional probabilities computed by LLMs to situate them within the mathematical theory of stochastic processes. It presents a model-agnostic probabilistic token attribution measure, using Bayes' rule to capture the model's internal representation of the distribution over token sequences.

AI Theory Token Attribution Probabilistic Attribution Stochastic Processes
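
For orientation, the two textbook identities this summary relies on can be written out. This is only a sketch of the standard chain rule and Bayes' rule for an autoregressive model, not the paper's attribution measure, which the summary does not spell out. The indices s and t are illustrative, and the marginals are assumed to come from summing the joint distribution over the remaining positions.

```latex
% Chain rule: the conditional probabilities computed by an LLM
% define a joint distribution over token sequences.
p(x_1, \dots, x_n) = \prod_{t=1}^{n} p(x_t \mid x_{<t})

% Bayes' rule relating an earlier token x_s to a later token x_t (s < t):
p(x_s \mid x_t) = \frac{p(x_t \mid x_s)\, p(x_s)}{p(x_t)}
```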

RESEARCH·arXiv CS.LG·11d ago

Continuity and Ordinality Matter: Constraining Time Series Tokens for Effective Time Series Analysis with Large Language Models

This paper introduces COM (Continuity and Ordinality Matter), a strategy that integrates geometric constraints into both the initialization and training stages of token-based time series large language models (TS-LLMs). The research demonstrates that preserving continuity and ordinality in time series token embeddings significantly improves model performance and generalizability.

machine learning Tokenization large language models Time Series Analysis

RESEARCH·arXiv CS.CL·14d ago

Improving the Completeness and Comparability of Segment Disclosures: A Large Language Model Approach

This study introduces a large language model-based framework to extract and preserve both reportable and nested segment disclosures directly from Form 10-K filings. It also incorporates a retrieval-augmented system to enhance comparability across multiple filings.

Financial Reporting Segment Disclosures Form 10-K Data Extraction

RESEARCH·arXiv CS.CL·14d ago

TriVAL: A Tri-Validation Framework for Faithful Automatic Optimization Modeling

TriVAL is a novel tri-validation framework designed to enhance the accuracy of automatic optimization modeling by addressing the lack of explicit validation in current methods. It implements a construct-validate-revise loop across semantic specification, mathematical formulation, and code generation stages to mitigate errors and improve overall modeling fidelity.

AI accuracy validation framework optimization modeling operations research

RESEARCH·arXiv CS.AI·14d ago

Confidence Calibration in Large Language Models

This study investigates confidence calibration in Large Language Models (LLMs) across diverse tasks, finding that current LLMs are overconfident on difficult tests and underconfident on easy ones. The researchers developed LifeEval, a new test to evaluate model calibration across varying levels of difficulty.

Confidence Calibration Overconfidence machine learning large language models
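
As a rough illustration of what over- and underconfidence mean in the item above, the sketch below computes a binned expected calibration error (ECE), one common calibration metric. It is not the LifeEval procedure, which the summary does not describe. The function name, the 10-bin default, and the toy confidence values are assumptions made for the example.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - mean confidence| over bins.

    confidences: floats in [0, 1], the model's stated confidence per answer.
    correct: booleans, whether each answer was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to an equal-width confidence bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Toy data (hypothetical): high confidence but low accuracy on hard items,
# modest confidence but perfect accuracy on easy items.
hard = expected_calibration_error([0.9] * 4, [True, False, False, False])
easy = expected_calibration_error([0.6] * 4, [True, True, True, True])
```

In the toy data, the hard set is overconfident (confidence above accuracy) and the easy set is underconfident (confidence below accuracy). ECE reports only the size of the gap, so the direction has to be read from the sign of confidence minus accuracy.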

RESEARCH·arXiv CS.CL·14d ago

Raon-Speech Technical Report

Raon-Speech is a 9B-parameter speech language model (SpeechLM) for English and Korean speech understanding, answering, and generation, with strong overall results reported across 42 benchmarks. It converts a pre-trained LLM into a SpeechLM through staged training while preserving the LLM's text capabilities.

multimodal AI Benchmarking Natural Language Processing large language models

RESEARCH·arXiv CS.AI·6d ago

ChatHealthAI: Aligning Electronic Health Record Representations with Large Language Models for Grounded Clinical Reasoning

ChatHealthAI proposes a multimodal framework to align structured electronic health record (EHR) representations with large language models (LLMs). This integration enables clinically grounded natural-language reasoning and accurate patient prediction, bridging the gap between predictive EHR models and interpretable LLM reasoning.

Clinical Reasoning multimodal AI Electronic Health Records large language models

RESEARCH·arXiv CS.AI·15d ago

PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning

This research paper introduces 'PathCal', investigating the distinct functional roles and timing of reflection markers in Large Reasoning Language Models' Chain-of-Thought trajectories. It reveals that markers like 'wait' or 'but' differ significantly in their impact on accuracy and generation length, challenging previous coarse-grained approaches.

Natural Language Processing Chain-of-Thought Reasoning large language models

RESEARCH·arXiv CS.CL·8d ago

Configurable Reward Model for Balanced Safety Alignment

This paper introduces the Configurable Safety Reward Model (CSRM) to address the challenge of aligning LLMs with heterogeneous and rapidly evolving safety requirements. CSRM is jointly optimized for calibrated safety compliance and reward modeling, which substantially improves generalization to previously unseen safety configurations and achieves state-of-the-art performance on benchmarks.

Generalization machine learning large language models Reward Models

RESEARCH·arXiv CS.AI·8d ago

PhyDrawGen: Physically Grounded Diagram Generation from Natural Language

PhyDrawGen is a neuro-symbolic pipeline designed to generate physically accurate physics diagrams from natural language text, outperforming existing models in adhering to physical laws. It leverages a large language model for scene graph extraction and a deterministic solver to enforce physical and geometric constraints.

Diagram Generation Physics AI large language models

RESEARCH·arXiv CS.CL·8d ago

When English Rewrites Local Knowledge: Global Narrative Dominance in Large Language Models

This research paper investigates global narrative dominance in Large Language Models (LLMs), where local cultural knowledge is often overshadowed by global narratives. It introduces the CulturalNB dataset for Bengali cultural contexts and demonstrates that questions asked in English tend to increase global substitution and institutional framing, reducing local perspective coverage.

Dataset Cross-lingual Cultural Bias Natural Language Processing

RESEARCH·arXiv CS.CL·15d ago

Evaluating Large Language Models in a Complex Hidden Role Game

This research quantifies the deceptive potential of Large Language Models (LLMs) in the social deduction game Secret Hitler, introducing novel metrics and an open-source framework. The study benchmarks LLMs against rule-based algorithms and human games, revealing a gap between conversational ability and strategic depth, and showing that reasoning-enhancement techniques can worsen performance for fascist roles.

Game AI Benchmarking deception large language models

RESEARCH·arXiv CS.CL·12d ago

EvoSpec: Evolving Speculative Decoding via Real-Time Vocabulary and Parameter Adaptation

EvoSpec introduces a framework for real-time evolution of draft models in speculative decoding for Large Language Models, addressing the bottleneck of large vocabulary sizes. It uses dynamic vocabulary and parameter adaptation, employing a context-aware mechanism and a lightweight online alignment strategy to improve acceptance rates and minimize distributional gaps.

Optimization machine learning large language models AI inference
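
To make "acceptance rate" and "distributional gap" in the item above concrete, here is a minimal sketch of the standard speculative sampling accept/reject rule. It is not EvoSpec's adaptation mechanism, which the summary does not detail. The function names and the two-token toy vocabulary are assumptions made for the example, and both distributions are assumed to be dictionaries over the same vocabulary.

```python
import random


def expected_acceptance_rate(p_target, q_draft):
    """Probability that a draft token is accepted: the sum of min(p, q) over tokens.

    This equals 1 minus the total variation distance between the two
    distributions, so a smaller distributional gap means more accepted tokens.
    """
    return sum(min(p_target[t], q_draft.get(t, 0.0)) for t in p_target)


def accept_or_resample(token, p_target, q_draft, rng):
    """Standard speculative sampling step for one drafted token.

    Accept the draft token with probability min(1, p/q). On rejection,
    resample from the residual distribution proportional to max(0, p - q).
    Returns (token, accepted).
    """
    ratio = p_target.get(token, 0.0) / q_draft[token]
    if rng.random() < min(1.0, ratio):
        return token, True

    # Rejection implies p(token) < q(token), so some residual mass is positive.
    residual = {t: max(0.0, p_target[t] - q_draft.get(t, 0.0)) for t in p_target}
    tokens = list(residual)
    weights = [residual[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0], False


# Toy distributions (hypothetical): the draft model over-predicts token "a".
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}
rate = expected_acceptance_rate(p, q)
```

The accept/reject rule leaves the output distribution equal to the target model's, so a better-aligned draft model changes speed rather than output quality.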