← heapsort-ai

Reasoning

57 items

RESEARCHarXiv CS.LG·4/13/2026

Robust Reasoning Benchmark

This study proposes a new perturbation pipeline to evaluate the robustness of LLM reasoning, applying it to the AIME 2024 dataset. While frontier models show resilience, open-weight models suffer catastrophic accuracy drops, exposing structural fragility and potential issues with working memory or mechanical parsing.

30
RESEARCHarXiv CS.CL·6d ago

Adaptive Latent Agentic Reasoning

This research introduces Adaptive Latent Agentic Reasoning (ALAR), a dual-mode framework designed to enhance the efficiency of LLM agents. ALAR uses compact latent reasoning for routine tasks and escalates to explicit chain-of-thought when deeper deliberation is required, leading to comparable or better task accuracy with substantial efficiency gains.

29
RESEARCHarXiv CS.CL·4/24/2026

AITP: Traffic Accident Responsibility Allocation via Multimodal Large Language Models

AITP is introduced as a multimodal large language model designed for traffic accident responsibility allocation, enhancing reasoning through Multimodal Chain-of-Thought and integrating legal knowledge via Retrieval-Augmented Generation. The research also presents DecaTARA, a comprehensive decathlon-style benchmark with 67,941 annotated videos and 195,821 question-answer pairs.

29
RESEARCHarXiv CS.AI·5d ago

Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal

This paper argues that reducing disagreement in multi-agent systems is insufficient for value-laden tasks, proposing a knowledge-representation layer. This layer abstracts reasoning traces and agent decisions into symbolic disagreement states, distinguishing four types, with application in content moderation.

28
RESEARCHarXiv CS.CL·4/9/2026

The Stepwise Informativeness Assumption: Why are Entropy Dynamics and Reasoning Correlated in LLMs?

Este artigo investiga a correlação entre a dinâmica interna de entropia e o raciocínio correto em Large Language Models (LLMs), um enigma ainda sem solução. Propõe a Hipótese de Informatividade Gradual (SIA), que afirma que os modelos raciocinam corretamente ao acumular informações relevantes sobre a resposta por meio de prefixos informativos, um processo reforçado por métodos de treinamento padrão.

28
RESEARCHarXiv CS.AI·5/4/2026

Are Tools All We Need? Unveiling the Tool-Use Tax in LLM Agents

This research challenges the assumption that tool-augmented reasoning always improves LLM performance, showing that it can underperform native CoT due to a "tool-use tax" from the tool-calling protocol, especially with semantic noise. A Factorized Intervention Framework is proposed to analyze this, and G-STEP is introduced as a partial mitigation for protocol-induced errors.

28
RESEARCHarXiv CS.CL·4/10/2026

Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs

Este artigo propõe uma estrutura de refinamento baseada em raciocínio que utiliza LLMs como juízes semânticos para validar e reestruturar os resultados de algoritmos de agrupamento de texto não supervisionados. A estrutura inclui verificação de coerência, adjudicação de redundância e fundamentação de rótulos, visando melhorar a qualidade dos clusters sem dados rotulados.

28