Reasoning

57 items

RESEARCH · arXiv CS.CL · 26d ago

TimelineReasoner: Advancing Timeline Summarization with Large Reasoning Models

TimelineReasoner is a novel framework that leverages Large Reasoning Models (LRMs) to advance timeline summarization, moving beyond passive Large Language Model (LLM) generation. It employs a two-stage, reasoning-driven process—Global Cognition and Detail Exploration—to actively extract and refine structured timelines from unstructured online news content.

timeline-summarization · Natural Language Processing · Reasoning · large language models

RESEARCH · arXiv CS.CL · 20d ago

Diagnosing Multi-step Reasoning Failures in Black-box LLMs via Stepwise Confidence Attribution

This paper introduces Stepwise Confidence Attribution (SCA), a framework for closed-source LLMs that diagnoses multi-step reasoning failures by assigning step-level confidence. SCA applies the Information Bottleneck principle, flagging deviations from consensus structures as potential errors, and proposes two complementary methods: NIBS and GIBS.

LLMs · information bottleneck · Reasoning · confidence estimation

RESEARCH · arXiv CS.AI · 15d ago

PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning

This research paper introduces 'PathCal', investigating the distinct functional roles and timing of reflection markers in Large Reasoning Language Models' Chain-of-Thought trajectories. It reveals that markers like 'wait' or 'but' differ significantly in their impact on accuracy and generation length, challenging previous coarse-grained approaches.

Natural Language Processing · Chain-of-Thought · Reasoning · large language models

RESEARCH · arXiv CS.CL · 8d ago

Can LLM Teams Play What? Where? When?

This research explores how team-based interactions improve Large Language Model (LLM) performance on complex reasoning tasks, specifically in the quiz game What? Where? When?. It demonstrates that team strategies yield significant accuracy gains, with the best teams approaching human performance.

LLMs · team strategies · Benchmarking · Reasoning

RESEARCH · arXiv CS.AI · 14d ago

How Much Thinking is Enough? Quantifying and Understanding Redundancy in LLM Reasoning

This paper quantifies and explains redundancy in large language model (LLM) reasoning, formalizing the concept and measuring it at scale. The research reveals that between 61% and 93% of LLM thought steps are unnecessary, impacting latency, GPU time, and energy consumption.

efficiency · Benchmarking · Reasoning · redundancy

RESEARCH · arXiv CS.CL · 6d ago

Adaptive Latent Agentic Reasoning

This research introduces Adaptive Latent Agentic Reasoning (ALAR), a dual-mode framework designed to enhance the efficiency of LLM agents. ALAR uses compact latent reasoning for routine tasks and escalates to explicit chain-of-thought when deeper deliberation is required, leading to comparable or better task accuracy with substantial efficiency gains.

LLMs · machine learning · efficiency · Reasoning

RESEARCH · arXiv CS.LG · 13d ago

ARBITER: Reasoning Trajectory Basins and Majority Vote Failures in Test-Time Sampling

When language models use test-time sampling and majority vote, reasoning trajectories concentrate into non-independent

language models · Model Evaluation · Reasoning · AI Research

RESEARCH · Hugging Face Blog · 4/15/2026

Inside VAKRA: Reasoning, Tool Use, and Failure Modes of Agents

This content delves into VAKRA, an AI agent system, examining its reasoning processes, how it utilizes tools, and the various modes in which it can fail. It provides insights into the operational characteristics and limitations of advanced AI agents.

failure modes · VAKRA · Reasoning · tool use

RESEARCH · arXiv CS.AI · 4/9/2026

SELFDOUBT: Uncertainty Quantification for Reasoning LLMs via the Hedge-to-Verify Ratio

This paper proposes SELFDOUBT, a single-pass framework for quantifying uncertainty in reasoning LLMs, particularly for proprietary APIs. It uses the Hedge-to-Verify Ratio (HVR) to identify uncertainty and self-assessment markers directly from the reasoning trace, outperforming costly sampling-based methods.

LLMs · Model Evaluation · Uncertainty Quantification · Reasoning

RESEARCH · arXiv CS.AI · 4/30/2026

Auto-Relational Reasoning

Researchers propose a novel theoretical framework for automated relational reasoning, integrating Machine Learning with rigid reasoning to surpass the limitations of current large models. The resulting system demonstrates high performance on IQ problems, achieving a 98.03% solving rate without prior knowledge.

neural networks · machine learning · Reasoning · problem-solving

RESEARCH · arXiv CS.AI · 4/23/2026

The Tool-Overuse Illusion: Why Does LLM Prefer External Tools over Internal Knowledge?

This paper reveals the pervasive phenomenon of "tool overuse" in LLMs, where models unnecessarily use external tools. It identifies a "knowledge epistemic illusion" and proposes a direct preference optimization-based strategy that reduces tool usage by 82.8% while improving accuracy.

LLMs · Knowledge Representation · Reasoning · model behavior

RESEARCH · arXiv CS.CL · 5/6/2026

Evaluating Reasoning Models for Queries with Presuppositions

This research evaluates how large reasoning models handle user queries containing factually inaccurate presuppositions. It finds that while reasoning models show a slight improvement over non-reasoning models, they still fail to challenge a significant fraction of false assumptions.

presuppositions · AI models · LLMs · evaluation

RESEARCH · arXiv CS.CL · 4/15/2026

Filtered Reasoning Score: Evaluating Reasoning Quality on a Model's Most-Confident Traces

This research introduces the "Filtered Reasoning Score," a novel metric designed to assess the quality of reasoning in AI models. It specifically focuses on evaluating the reasoning evident in a model's most confident outputs or traces.

AI metrics · machine learning · Reasoning · AI evaluation

RESEARCH · arXiv CS.LG · 4/24/2026

The Path Not Taken: Duality in Reasoning about Program Execution

The title suggests an exploration of duality in reasoning about program execution, indicating a deep analysis of alternative approaches. It likely delves into formal and logical methods for understanding how programs operate.

formal methods · Reasoning · Program execution · Duality

ARTICLE · DEV.to AI · 4/12/2026

We Hit 99.1% on the LOCOMO Benchmark. Here's How.

A team achieved 99.1% on the LOCOMO benchmark, which assesses AI agents' multi-hop reasoning with stored memories. This breakthrough was attributed to removing a single premise rather than developing a complex new model.

Memory Systems · Benchmarking · Reasoning · AI

NEWS · Together AI Blog · 3/18/2026

Together AI expands fine-tuning service with tool calling, reasoning, and vision support

Together AI has expanded its fine-tuning service with native support for tool calling, reasoning, and vision-language models. The enhancements also include 100B+ model training, up to 6x higher throughput, and job cost and ETA estimates.

Vision-Language Models · tool-calling · Reasoning · Together AI

NEWS · DEV.to AI · 4/24/2026

DeepSeek V4 Revolutionizes AI with a 1 Million Token Context and World-Class Reasoning

DeepSeek V4 is revolutionizing AI by introducing a 1 million token context window and world-class reasoning capabilities. The announcement details the key points, with a more in-depth analysis available in the full article.

DeepSeek · AI models · Context window · Reasoning