Reasoning

57 items

RESEARCH · arXiv CS.AI · 20h ago

Why Limit the Residual Stream to Layers and Not Tokens? Persistent Memory for Continuous Latent Reasoning

Large language models (LLMs) face a limitation called the 'concept bottleneck,' where they lose critical facts in deep latent reasoning. This paper proposes AGCLR (Adaptive Gated Continuous Latent Reasoning) to address this by augmenting CoCoNuT with a Gated Concept Stream for persistent memory.

machine learning Latent Reasoning Reasoning AI Research
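The summary does not spell out how the Gated Concept Stream works. As a rough illustration only, the sketch below shows a generic gated memory update of the kind such a stream might use. A sigmoid gate decides, per dimension, how much of a persistent memory vector to keep and how much to overwrite with the current latent thought. The gate form, the `gated_update` name, and the toy weights are assumptions, not AGCLR's actual design.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_update(memory, latent, w_mem, w_lat, bias):
    """Hypothetical per-dimension gated memory update (not AGCLR's published rule).

    gate_i = sigmoid(w_mem[i] * memory[i] + w_lat[i] * latent[i] + bias[i])
    new_i  = gate_i * memory[i] + (1 - gate_i) * latent[i]
    A gate near 1 keeps the stored fact; a gate near 0 overwrites it.
    """
    new_memory = []
    for m, h, wm, wl, b in zip(memory, latent, w_mem, w_lat, bias):
        g = sigmoid(wm * m + wl * h + b)
        new_memory.append(g * m + (1.0 - g) * h)
    return new_memory

# Toy run: a large positive bias makes dimension 0 "sticky", so the fact stored
# there survives several latent steps; a large negative bias lets dimension 1
# simply track the most recent latent value.
memory = [1.0, 0.0]
for step_latent in ([0.0, 0.5], [0.0, -0.5], [0.0, 0.9]):
    memory = gated_update(memory, step_latent,
                          w_mem=[0.0, 0.0], w_lat=[0.0, 0.0], bias=[10.0, -10.0])
print([round(v, 3) for v in memory])
```

The design point is that the gate, not the depth of the reasoning chain, controls how long a fact persists.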

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/8/2026

Meta new reasoning model Muse Spark

The post announces the launch of Muse Spark, a new AI reasoning model developed by Meta. The model promises to advance reasoning capabilities in artificial intelligence.

Muse Spark Reasoning AI model Meta

RESEARCH · arXiv CS.CL · 1d ago

How Language Models Fail: Token-Level Signatures of Committed and Persistent Reasoning Failures

Failures in language model reasoning emerge through distinct processes that leave identifiable token-level signatures. The paper characterizes two such failure modes, "committed failure" and "persistent uncertainty," and understanding these signatures helps distinguish failing from successful completions across various configurations.

language models research Reasoning AI failures

RESEARCH · DEV.to AI · 14d ago

Meta-Stanford Survey: Code as Agent Harness Improves AI Reasoning

A survey from Meta, Stanford, and Illinois suggests that AI agents perform better when code functions as their main working layer, a concept termed an "agent harness". This approach shifts AI's focus from mere text prediction to executable reasoning, enhancing its ability to handle complex tasks and minimize errors.

agent harness LLMs code Reasoning

RESEARCH · arXiv CS.LG · 4/13/2026

Robust Reasoning Benchmark

This study proposes a new perturbation pipeline to evaluate the robustness of LLM reasoning, applying it to the AIME 2024 dataset. While frontier models show resilience, open-weight models suffer catastrophic accuracy drops, exposing structural fragility and potential issues with working memory or mechanical parsing.

robustness LLMs Model Evaluation Reasoning

RESEARCH · arXiv CS.CL · 6d ago

Adaptive Latent Agentic Reasoning

This research introduces Adaptive Latent Agentic Reasoning (ALAR), a dual-mode framework designed to enhance the efficiency of LLM agents. ALAR uses compact latent reasoning for routine tasks and escalates to explicit chain-of-thought when deeper deliberation is required, leading to comparable or better task accuracy with substantial efficiency gains.

LLMs machine learning efficiency Reasoning
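The summary gives ALAR's control flow (latent by default, explicit chain-of-thought when needed) but not its escalation criterion. The minimal sketch below assumes a hypothetical confidence score, one minus the normalized entropy of the latent mode's answer distribution, and an arbitrary threshold of 0.7. `route_step`, the threshold, and the confidence measure are illustrative assumptions, not ALAR's published rule.

```python
import math
from typing import Sequence

def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a distribution, scaled to [0, 1] by log(len(probs))."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def route_step(latent_answer_probs: Sequence[float], threshold: float = 0.7) -> str:
    """Hypothetical router: stay in cheap latent mode when confident,
    otherwise escalate to explicit chain-of-thought."""
    confidence = 1.0 - normalized_entropy(latent_answer_probs)
    return "latent" if confidence >= threshold else "explicit_cot"

# A peaked distribution stays latent; a near-uniform one escalates.
print(route_step([0.97, 0.01, 0.01, 0.01]))
print(route_step([0.3, 0.25, 0.25, 0.2]))
```

The efficiency gain in a scheme like this comes from how rarely the threshold is crossed: routine steps never pay for an explicit reasoning trace.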

RESEARCH · arXiv CS.CL · 4/20/2026

Think Multilingual, Not Harder: A Data-Efficient Framework for Teaching Reasoning Models to Code-Switch

This research introduces a data-efficient fine-tuning framework that teaches large language models to code-switch effectively on reasoning tasks. Rather than treating code-switching as an error, it systematically analyzes diverse reasoning traces to identify which code-switching behaviors are beneficial.

Multilingual AI Code-Switching Reasoning large language models

RESEARCH · DEV.to AI · 4/22/2026

What VAKRA Reveals About Why Agents Actually Fail

VAKRA, a new benchmark from IBM Research, reveals that AI agents fail in predictable, structural ways by mapping fracture points between reasoning, tool selection, and execution. It decomposes agent failure into six specific categories, moving beyond traditional binary task completion evaluations to uncover common weaknesses.

failure analysis Model Evaluation Benchmarking Reasoning

RESEARCH · arXiv CS.CL · 4/24/2026

AITP: Traffic Accident Responsibility Allocation via Multimodal Large Language Models

AITP is introduced as a multimodal large language model designed for traffic accident responsibility allocation, enhancing reasoning through Multimodal Chain-of-Thought and integrating legal knowledge via Retrieval-Augmented Generation. The research also presents DecaTARA, a comprehensive decathlon-style benchmark with 67,941 annotated videos and 195,821 question-answer pairs.

multimodal AI Reasoning Benchmarks large language models

RESEARCH · DEV.to AI · 4/20/2026

O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

O1-Pruner introduces length-harmonizing fine-tuning, a method for pruning the reasoning of O1-like models: it shortens their long chains of thought while aiming to preserve accuracy, reducing inference overhead.

Pruning Reasoning Fine-tuning model optimization

RESEARCH · arXiv CS.AI · 5d ago

Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal

This paper argues that reducing disagreement in multi-agent systems is insufficient for value-laden tasks and proposes a knowledge-representation layer. The layer abstracts reasoning traces and agent decisions into symbolic disagreement states, distinguishes four types of disagreement, and is applied to content moderation.

Disagreement Knowledge Representation Reasoning content moderation

RESEARCH · arXiv CS.CL · 4/9/2026

The Stepwise Informativeness Assumption: Why are Entropy Dynamics and Reasoning Correlated in LLMs?

This paper investigates the correlation between internal entropy dynamics and correct reasoning in large language models (LLMs), which remains an open puzzle. It proposes the Stepwise Informativeness Assumption (SIA). The SIA holds that models reason correctly by accumulating information relevant to the answer through informative prefixes, a process reinforced by standard training methods.

information theory LLMs machine learning Reasoning

RESEARCH · arXiv CS.AI · 5/4/2026

Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents

This research challenges the assumption that tool-augmented reasoning always improves LLM performance. It shows that tool use can underperform native CoT because of a "tool-use tax" imposed by the tool-calling protocol, especially in the presence of semantic noise. The authors propose a Factorized Intervention Framework to analyze this effect and introduce G-STEP as a partial mitigation for protocol-induced errors.

LLM Agents Reasoning AI performance tool use

RESEARCH · arXiv CS.CL · 19d ago

Long-Context Reasoning Through Proxy-Based Chain-of-Thought Tuning

Large language models struggle with complex long-context reasoning tasks despite supporting extensive inputs. ProxyCoT is a novel training framework designed to transfer reasoning capabilities from short proxy contexts to full long contexts, outperforming strong baselines.

machine learning natural language processing Reasoning large language models

RESEARCH · arXiv CS.AI · 6d ago

Visual Graph Scaffolds for Structural Reasoning in Large Language Models

This research explores using visual graph scaffolds to organize reasoning in Large Language Models (LLMs), inspired by human mind maps. Experiments on multi-hop question answering reveal that visual graph guidance significantly improves reasoning efficiency and answer quality compared to flattened text representations.

LLMs Graph Structures Reasoning artificial intelligence

RESEARCH · arXiv CS.CL · 4/10/2026

Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs

This paper proposes a reasoning-based refinement framework that uses LLMs as semantic judges to validate and restructure the output of unsupervised text clustering algorithms. The framework includes coherence verification, redundancy adjudication, and label grounding, with the aim of improving cluster quality without labeled data.

LLMs Text Clustering Reasoning semantic analysis

RESEARCH · arXiv CS.LG · 15d ago

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

This research proposes that LLM reasoning is a dynamic decoding state rather than a static property, and that it is observable through early-stage entropy dynamics during generation. Tasks that benefit from Chain-of-Thought exhibit consistent entropy reduction, which the authors interpret as a phase transition into a structured reasoning regime.

AI models LLMs Chain-of-Thought Reasoning
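The sketch below makes "early-stage entropy dynamics" concrete. It computes the Shannon entropy of each decoding step's next-token distribution and fits a least-squares slope over the first few steps. A clearly negative slope is one crude reading of "consistent entropy reduction." The window size, the slope test, and the function names are assumptions for illustration, not the paper's procedure.

```python
import math
from typing import List, Sequence

def step_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def early_entropy_slope(step_dists: Sequence[Sequence[float]], window: int = 4) -> float:
    """Least-squares slope of per-step entropy over the first `window` steps."""
    ys: List[float] = [step_entropy(d) for d in step_dists[:window]]
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two decoding steps")
    xs = list(range(n))
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# Toy trace: the distributions sharpen step by step, so entropy falls.
trace = [
    [0.25, 0.25, 0.25, 0.25],
    [0.4, 0.3, 0.2, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.97, 0.01, 0.01, 0.01],
]
print(early_entropy_slope(trace) < 0)
```

In practice the distributions would come from a model's softmax outputs at each generated token. The toy lists here only stand in for them.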

RESEARCH · arXiv CS.AI · 4/6/2026

Compositional Neuro-Symbolic Reasoning

The title refers to research on compositional neuro-symbolic reasoning, an advanced area of artificial intelligence. The field explores integrating neural networks with symbolic systems to enable more robust and structured reasoning.

Compositionality Reasoning Neuro-symbolic AI

RESEARCH · arXiv CS.LG · 4/15/2026

How Transformers Learn to Plan via Multi-Token Prediction

This paper investigates how multi-token prediction (MTP) enables Transformers to learn to plan, outperforming standard next-token prediction (NTP). Empirically, MTP consistently improves performance on reasoning tasks. Theoretically, it induces a two-stage reverse reasoning process via gradient decoupling.

Next-token prediction Planning Multi-Token Prediction Reasoning
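Multi-token prediction changes the training target: instead of predicting only the next token, each position also predicts several tokens further ahead. The sketch below shows one way such targets and an averaged loss could be assembled. It assumes k prediction heads and takes as given the probability each head assigns to the correct token. `mtp_targets` and `mtp_loss` are illustrative names, not the paper's code.

```python
import math
from typing import List, Sequence

def mtp_targets(tokens: Sequence[int], k: int) -> List[List[int]]:
    """For each position t, return the next k tokens x[t+1..t+k].

    Positions too close to the end to have k future tokens are dropped,
    so standard next-token prediction is the special case k = 1.
    """
    return [list(tokens[t + 1 : t + 1 + k]) for t in range(len(tokens) - k)]

def mtp_loss(correct_token_probs: Sequence[Sequence[float]]) -> float:
    """Average negative log-likelihood over positions and heads.

    correct_token_probs[t][j] is the probability head j assigned at position t
    to the true token x[t+1+j].
    """
    nll = [-math.log(p) for row in correct_token_probs for p in row]
    return sum(nll) / len(nll)

print(mtp_targets([10, 11, 12, 13, 14], k=2))
```

Averaging over heads is only one choice. Per-head weighting is an equally plausible variant that the summary does not specify.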

ARTICLE · Google for Developers (YouTube) · 19d ago

Building agents with real-world reasoning

This content explores the methodologies and challenges involved in developing AI agents capable of robust real-world reasoning. It delves into the techniques required to enable agents to interact effectively with complex, dynamic environments.

agent development Reasoning real-world AI AI agents