Chain-of-Thought

10 items

RESEARCH·arXiv CS.AI·4/14/2026

OOWM: Structuring Embodied Reasoning and Planning via Object-Oriented Programmatic World Modeling

Object-Oriented World Modeling (OOWM) is a novel framework addressing the limitations of Chain-of-Thought prompting in embodied tasks. It structures embodied reasoning and robotic planning by redefining the world model as an explicit symbolic tuple and leveraging software engineering formalisms like UML.

Robotic Planning LLMs Chain-of-Thought Embodied Reasoning

RESEARCH·arXiv CS.LG·4/6/2026

From Broad Exploration to Stable Synthesis: Entropy-Guided Optimization for Autoregressive Image Generation

The paper analyzes the interplay between Chain-of-Thought (CoT) and Reinforcement Learning (RL) in text-to-image (T2I) generation through a systematic entropy-based analysis. It finds that lower entropy in both the image tokens and the textual CoT correlates with better image quality, and proposes Entropy-Guided Group Relative Policy Optimization (EG-GRPO), a strategy that optimizes based on uncertainty.

Optimization deep learning reinforcement learning Text-to-Image Generation

RESEARCH·arXiv CS.AI·4/20/2026

LLM Reasoning Is Latent, Not the Chain of Thought

This position paper argues that large language model (LLM) reasoning should be studied as latent-state trajectory formation rather than faithful surface chain-of-thought (CoT). It formalizes three competing hypotheses regarding the primary object of reasoning, impacting claims about faithfulness, interpretability, and benchmarks.

Chain-of-Thought interpretability AI Reasoning large language models

RESEARCH·arXiv CS.AI·4/13/2026

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks

Sequence-Level PPO (SPPO) addresses the limitations of standard token-level PPO in long-horizon LLM reasoning tasks by reformulating the process as a Sequence-Level Contextual Bandit problem. This approach uses a decoupled scalar value function to derive low-variance advantage signals, offering improved sample efficiency and stability without the high computational overhead of critic-free alternatives.

LLMs reasoning tasks reinforcement learning PPO

ARTICLE·DEV.to AI·4/13/2026

AI Agent Black Boxes Have Two Layers — Technical Limits and Business Incentives

The article explores how Chain-of-Thought (CoT) has evolved from an external prompt engineering technique into an internal reasoning capability of advanced AI models. Research indicates that applying external CoT to these models is now ineffective, as the reasoning process has been internalized.

prompt engineering Chain-of-Thought AI Reasoning AI

RESEARCH·arXiv CS.LG·15d ago

When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions

This research proposes that LLM reasoning is a dynamic decoding state, not a static property, observable through early-stage entropy dynamics during generation. Tasks that benefit from Chain-of-Thought exhibit consistent entropy reduction, interpreted as a phase transition to a structured reasoning regime.

AI models LLMs Chain-of-Thought Reasoning

RESEARCH·arXiv CS.CL·4/10/2026

Decompose, Look, and Reason: Reinforced Latent Reasoning for VLMs

This paper proposes DLR, a reinforced latent reasoning framework for Vision-Language Models (VLMs) that improves complex visual reasoning by overcoming the information loss of textual CoT. It dynamically decomposes queries, extracts visual latents, and deduces answers, offering greater interpretability and outperforming baselines on vision-centric benchmarks.

Vision-Language Models visual reasoning Reinforced Latent Reasoning Chain-of-Thought

RESEARCH·arXiv CS.CL·4/8/2026

TDA-RC: Task-Driven Alignment for Knowledge-Based Reasoning Chains in Large Language Models

This paper proposes a topology-based method for optimizing reasoning chains in LLMs, aiming to overcome logical gaps and high costs. It quantifies structural features of CoT, ToT, and GoT using persistent homology to improve the CoT paradigm.

LLMs Chain-of-Thought Reasoning Tree-of-Thoughts

RESEARCH·arXiv CS.AI·15d ago

PathCal: State-Aware Reflection-Marker Calibration for Efficient Reasoning

This research paper introduces 'PathCal', investigating the distinct functional roles and timing of reflection markers in Large Reasoning Language Models' Chain-of-Thought trajectories. It reveals that markers like 'wait' or 'but' differ significantly in their impact on accuracy and generation length, challenging previous coarse-grained approaches.

natural language processing Chain-of-Thought Reasoning large language models

RESEARCH·arXiv CS.CL·28d ago

Sanity Checks for Long-Form Hallucination Detection

This research paper introduces a controlled-invariance methodology for hallucination detection in large language models. Using oracle tests such as Force and Remove, it investigates whether detection methods evaluate reasoning or merely surface correlates of the final answer.

hallucination detection Chain-of-Thought large language models LLM evaluation