Reasoning

57 items

RESEARCH · arXiv CS.AI · 4/22/2026

From Natural Language to Executable Narsese: A Neuro-Symbolic Benchmark and Pipeline for Reasoning with NARS

This paper introduces a neuro-symbolic framework for translating natural-language reasoning problems into executable Narsese, leveraging first-order logic. It presents NARS-Reasoning-v0.1, a new benchmark featuring reasoning problems with corresponding formal representations and truth labels for evaluating reasoning capabilities.
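
In NARS, every Narsese judgment carries a truth value made of a frequency and a confidence, and inference rules combine those values. The following is a minimal sketch, assuming the standard Non-Axiomatic Logic (NAL) deduction truth function (f = f1·f2, c = f1·f2·c1·c2). It chains two inheritance judgments, `<robin --> bird>` and `<bird --> animal>`. The `Judgment` class and `deduce` helper are hypothetical illustrations. They are not part of the paper's pipeline or of any NARS implementation's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """Hypothetical container for a Narsese inheritance judgment <S --> P> with a NAL truth value."""
    subject: str
    predicate: str
    frequency: float   # f in [0, 1]: proportion of positive evidence
    confidence: float  # c in (0, 1): how stable f is expected to be under new evidence

    def narsese(self) -> str:
        # Render in Narsese judgment syntax: <S --> P>. %f;c%
        return f"<{self.subject} --> {self.predicate}>. %{self.frequency:.2f};{self.confidence:.2f}%"


def deduce(premise_sm: Judgment, premise_mp: Judgment) -> Judgment:
    """NAL deduction: from <S --> M> and <M --> P>, derive <S --> P>."""
    if premise_sm.predicate != premise_mp.subject:
        raise ValueError("premises do not share a middle term")
    f = premise_sm.frequency * premise_mp.frequency             # f1 * f2
    c = f * premise_sm.confidence * premise_mp.confidence       # f1 * f2 * c1 * c2
    return Judgment(premise_sm.subject, premise_mp.predicate, f, c)


robin_bird = Judgment("robin", "bird", 1.0, 0.9)
bird_animal = Judgment("bird", "animal", 1.0, 0.9)
conclusion = deduce(robin_bird, bird_animal)
print(conclusion.narsese())  # <robin --> animal>. %1.00;0.81%
```

Confidence drops from 0.9 to 0.81 after a single step. This is how NAL records that a chained conclusion rests on weaker evidence than either premise.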

ARTICLE · DEV.to AI · 19d ago

Apple Paper Argues LLMs Show 'Illusion of Thinking'

An Apple paper titled "The Illusion of Thinking" argues that large language models (LLMs) lack genuine reasoning and rely instead on sophisticated statistical pattern matching. The study, led by Mehrdad Farajtabar, challenges reasoning claims made for models such as GPT-4 and Claude, highlighting failures on formal reasoning tasks that require compositionality.

RESEARCH · arXiv CS.LG · 4/15/2026

When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation

This paper investigates how enhanced reasoning in language models can harm the fidelity of behavioral simulations, particularly when the goal is to sample boundedly rational behavior rather than solve a strategic problem. The authors identify a "solver-sampler mismatch" where LLMs over-optimize, collapsing compromise-oriented behavior and leading to diversity without fidelity in outcomes.

RESEARCH · arXiv CS.LG · 4/14/2026

Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model

This research investigates Deliberative Alignment in LLMs, a method that aims to improve safety by distilling reasoning capabilities from stronger models. It uncovers an alignment gap between teacher and student models: student models can retain unsafe behaviors from their base model despite learning advanced reasoning patterns. To address this, the paper proposes an inference-time Best-of-N (BoN) sampling method that attributes unsafe behavior to the base model.
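
Best-of-N (BoN) sampling draws N candidate responses and keeps the one a scorer ranks best. This summary does not describe the paper's exact scoring criterion. The sketch below therefore assumes a hypothetical `unsafe_score` function (lower is safer) and a stand-in `sample` generator. It only illustrates the generic selection loop, not the paper's attribution-based scorer.

```python
import random
from typing import Callable


def best_of_n(
    sample: Callable[[random.Random], str],
    unsafe_score: Callable[[str], float],
    n: int,
    seed: int = 0,
) -> str:
    """Draw n candidates and keep the one the (hypothetical) scorer rates least unsafe."""
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    candidates = [sample(rng) for _ in range(n)]
    # min() returns the first lowest-scoring candidate, so ties resolve deterministically
    return min(candidates, key=unsafe_score)


# Toy stand-ins for a model and a safety scorer (both hypothetical).
RESPONSES = ["I can't help with that.", "Here is a safe alternative.", "UNSAFE: detailed exploit steps"]


def toy_sample(rng: random.Random) -> str:
    return rng.choice(RESPONSES)


def toy_unsafe_score(text: str) -> float:
    return 1.0 if text.startswith("UNSAFE") else 0.0


print(best_of_n(toy_sample, toy_unsafe_score, n=8))
```

BoN trades extra inference compute (N generations plus N scorer calls) for a better chance that at least one candidate passes the scorer. It does not change the model's weights.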

RESEARCH · arXiv CS.AI · 5/9/2026

BALAR : A Bayesian Agentic Loop for Active Reasoning

This paper introduces BALAR (Bayesian Agentic Loop for Active Reasoning), a task-agnostic outer-loop algorithm enabling structured multi-turn interaction between an LLM agent and a user. BALAR maintains a structured belief over latent states, selects clarifying questions by maximizing expected mutual information, and significantly outperforms baselines across diverse reasoning benchmarks.
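
Choosing a clarifying question by expected mutual information means picking the question whose answer is expected to reduce uncertainty about the latent state the most: I(S; A) = H(S) − Σ_a P(a) H(S | A = a). The following is a minimal sketch that assumes a discrete belief over hypothetical latent states and a known likelihood table P(answer | state) for each candidate question. This summary does not specify BALAR's actual belief representation or answer model, and all names here are illustrative.

```python
import math

Belief = dict[str, float]                 # state -> P(state)
Likelihood = dict[str, dict[str, float]]  # state -> answer -> P(answer | state)


def entropy(probs) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)


def expected_information_gain(belief: Belief, likelihood: Likelihood) -> float:
    """I(S; A) = H(S) - sum_a P(a) * H(S | A = a) for one candidate question, in bits."""
    answers = {a for per_state in likelihood.values() for a in per_state}
    expected_posterior_h = 0.0
    for a in answers:
        joint = [belief[s] * likelihood[s].get(a, 0.0) for s in belief]
        p_a = sum(joint)
        if p_a > 0:
            expected_posterior_h += p_a * entropy(pj / p_a for pj in joint)
    return entropy(belief.values()) - expected_posterior_h


def choose_question(belief: Belief, questions: dict[str, Likelihood]) -> str:
    # Ask the question with the largest expected reduction in uncertainty.
    return max(questions, key=lambda q: expected_information_gain(belief, questions[q]))


def update(belief: Belief, likelihood: Likelihood, answer: str) -> Belief:
    """Bayes rule: posterior over latent states after observing the user's answer."""
    unnorm = {s: p * likelihood[s].get(answer, 0.0) for s, p in belief.items()}
    z = sum(unnorm.values())
    if z == 0:
        raise ValueError("answer has zero probability under the current belief")
    return {s: v / z for s, v in unnorm.items()}


belief = {"python": 0.5, "rust": 0.5}
questions = {
    "Which language do you prefer?": {"python": {"python": 1.0}, "rust": {"rust": 1.0}},
    "Is this for a script?": {"python": {"yes": 0.6, "no": 0.4}, "rust": {"yes": 0.4, "no": 0.6}},
}
best = choose_question(belief, questions)
print(best)  # Which language do you prefer?
```

The direct question resolves the 1-bit uncertainty completely, while the indirect one gains only about 0.03 bits, so the loop asks the direct question first and then updates its belief with the answer.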

RESEARCH · arXiv CS.LG · 4/27/2026

Universal Transformers Need Memory: Depth-State Trade-offs in Adaptive Recursive Reasoning

This research investigates whether Universal Transformers with Adaptive Computation Time (ACT) need learned memory tokens as a computational scratchpad, using the combinatorial reasoning benchmark Sudoku-Extreme. It finds that memory tokens are empirically necessary for non-trivial performance, identifies a sharp lower threshold on the number of memory tokens, and documents a common router-initialization trap.
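
For context, in the original ACT formulation a recurrent block decides for itself how many steps to run. Each step emits a halting probability h_n, and computation stops once the cumulative probability reaches 1 − ε (or a step cap). The final step receives the remaining mass R, and the output is the probability-weighted average of the per-step states. The following is a minimal scalar sketch of that halting rule, assuming a hypothetical `step` function that returns a new state and a halting logit. It does not model the paper's memory tokens or router.

```python
import math
from typing import Callable


def act_loop(
    state: float,
    step: Callable[[float], tuple[float, float]],
    epsilon: float = 0.01,
    max_steps: int = 10,
) -> tuple[float, int, float]:
    """Run `step` until cumulative halting probability reaches 1 - epsilon.

    `step` (hypothetical) maps a state to (new_state, halting_logit).
    Returns (weighted_state, steps_taken, ponder_cost), where ponder_cost = N + R.
    """
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    cumulative = 0.0  # sum of halting probabilities of the steps before this one
    weighted = 0.0    # running probability-weighted state
    for n in range(1, max_steps + 1):
        state, logit = step(state)
        h = 1.0 / (1.0 + math.exp(-logit))  # sigmoid halting probability for step n
        if cumulative + h >= 1.0 - epsilon or n == max_steps:
            remainder = 1.0 - cumulative     # last step gets the leftover mass, so weights sum to 1
            weighted += remainder * state
            return weighted, n, n + remainder
        cumulative += h
        weighted += h * state
    raise AssertionError("unreachable: the loop always returns by max_steps")


# Each toy step increments the state and emits logit 0 (halting probability 0.5).
print(act_loop(0.0, lambda s: (s + 1.0, 0.0)))  # (1.5, 2, 2.5)
```

The ponder cost N + R is what ACT adds to the training loss to discourage unnecessarily long computation.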

RESEARCH · arXiv CS.LG · 4/9/2026

RAGEN-2: Reasoning Collapse in Agentic RL

This study introduces the concept of "template collapse," a failure mode in multi-turn LLM agents in which the response becomes agnostic to the input even while entropy remains stable. It proposes mutual information (MI) as a better metric than entropy for diagnosing reasoning quality, because MI correlates more strongly with final performance.
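
The gap between the two metrics can be illustrated with plug-in estimates over discrete (input, response) pairs. A template-collapsed agent can keep the same response entropy H(R) while the mutual information I(X; R) between inputs and responses drops to zero, because the response no longer depends on the input. The following is a minimal sketch that assumes responses have already been bucketed into discrete categories. The paper's actual MI estimator may differ, and the toy data is hypothetical.

```python
import math
from collections import Counter


def entropy(samples) -> float:
    """Plug-in Shannon entropy (bits) of the empirical distribution of `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def mutual_information(pairs) -> float:
    """Plug-in estimate of I(X; R) = H(X) + H(R) - H(X, R) in bits."""
    xs = [x for x, _ in pairs]
    rs = [r for _, r in pairs]
    return entropy(xs) + entropy(rs) - entropy(pairs)


# Healthy agent: the response depends on which maze it sees.
healthy = [("maze_a", "go_left"), ("maze_b", "go_right"), ("maze_a", "go_left"), ("maze_b", "go_right")]
# Collapsed agent: the same two templates appear regardless of the input.
collapsed = [("maze_a", "go_left"), ("maze_b", "go_left"), ("maze_a", "go_right"), ("maze_b", "go_right")]

for name, pairs in (("healthy", healthy), ("collapsed", collapsed)):
    h_r = entropy([r for _, r in pairs])
    print(f"{name}: H(R)={h_r:.2f} bits, I(X;R)={mutual_information(pairs):.2f} bits")
# healthy: H(R)=1.00 bits, I(X;R)=1.00 bits
# collapsed: H(R)=1.00 bits, I(X;R)=0.00 bits
```

Both agents have identical response entropy, so an entropy monitor cannot tell them apart. Only the input-response dependence exposes the collapse.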

RESEARCH · arXiv CS.AI · 4/30/2026

Grounding vs. Compositionality: On the Non-Complementarity of Reasoning in Neuro-Symbolic Systems

This work challenges the assumption that compositional reasoning emerges as a byproduct of symbol grounding in neuro-symbolic AI. It introduces the iLTN architecture, demonstrating that models trained solely on a grounding objective fail to generalize, while joint training on perceptual grounding and multi-step reasoning is crucial.

RESEARCH · arXiv CS.CL · 5/7/2026

Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning

This research introduces Adaptive Power-Mean Policy Optimization (APMPO) to improve Large Language Model (LLM) reasoning capabilities within Reinforcement Learning with Verifiable Rewards (RLVR). APMPO combines a generalized power-mean objective and feedback-adaptive clipping to enhance learning dynamics and performance, addressing limitations of static optimization schemes.
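
The generalized power mean M_p(x) = ((1/n) Σ x_i^p)^(1/p) interpolates between familiar aggregates. At p = 1 it is the arithmetic mean, as p → 0 it approaches the geometric mean, and at p = −1 it is the harmonic mean. This summary does not describe how APMPO applies the power mean to its objective or how p adapts during training. The sketch below only shows the standard power-mean function; the name `power_mean` and the sample values are hypothetical.

```python
import math


def power_mean(values: list[float], p: float) -> float:
    """Generalized mean M_p(x) = (mean of x_i**p) ** (1/p) for positive x_i; p = 0 gives the geometric mean."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("power mean needs a non-empty list of positive values")
    n = len(values)
    if p == 0:
        # Limit as p -> 0: exponential of the mean log value
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


sample = [0.8, 1.0, 1.25]  # illustrative positive values
for p in (-1.0, 0.0, 1.0, 2.0):
    print(f"p={p:+.0f}: M_p={power_mean(sample, p):.4f}")
```

Because M_p is non-decreasing in p, tuning p shifts the aggregate between emphasizing the smallest values (low p) and the largest values (high p). This is one plausible reason to make p adaptive rather than fixed.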

RESEARCH · arXiv CS.CL · 5/7/2026

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs

FREIA is a novel reinforcement learning algorithm designed to enhance LLMs for unsupervised reasoning, addressing the lack of adaptability in existing methods. It employs Free Energy-Driven Reward (FER) to balance consensus and exploration, and Adaptive Advantage Shaping (AAS) to adjust learning signals. FREIA outperforms unsupervised baselines across various reasoning tasks, particularly in mathematical reasoning.
