mathematical reasoning

14 items

RESEARCH · arXiv CS.LG · 1d ago

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

This research proposes "program-of-layers (PoLar)" for LLMs, enabling dynamic skipping or looping of pretrained layers during inference to achieve better or equivalent accuracy with shorter execution paths. A lightweight prediction network learns to generate these customized programs, demonstrating improved performance on mathematical reasoning benchmarks.

neural networks mathematical reasoning inference LLMs
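
To make the skip-or-loop idea concrete, here is a minimal sketch that treats a "program" as a list of layer indices: leaving an index out skips that layer, and repeating it loops the layer. The toy `layers`, the `run_program` helper, and the scalar hidden state are illustrative assumptions, not the paper's implementation, and the prediction network that would choose the program is not modeled.

```python
from typing import Callable, List, Sequence

Layer = Callable[[float], float]


def run_program(layers: Sequence[Layer], program: Sequence[int], x: float) -> float:
    """Apply layers in the order given by `program` (a list of layer indices).

    Omitting an index skips that layer; repeating an index loops it.
    """
    for idx in program:
        x = layers[idx](x)
    return x


# Toy stand-ins for pretrained layers (hypothetical, for illustration only).
layers: List[Layer] = [lambda h: h + 1.0, lambda h: h * 2.0, lambda h: h - 3.0]

standard = run_program(layers, [0, 1, 2], 1.0)   # every layer once, in order
skipped = run_program(layers, [0, 2], 1.0)       # layer 1 skipped
looped = run_program(layers, [0, 1, 1, 2], 1.0)  # layer 1 applied twice
```

In a real model each layer would be a transformer block acting on a tensor of hidden states, and the program would be predicted per input rather than written by hand.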

RESEARCH · arXiv CS.AI · 1d ago

CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions

This paper introduces CrowdMath, a dataset of 164 expert-annotated progress chains from the MIT PRIMES–Art of Problem Solving CrowdMath program. It aims to evaluate large language models on collaborative open-problem solving in mathematical research, diverging from benchmarks focused on final answers or complete proofs.

mathematical reasoning LLMs datasets Benchmarks

RESEARCH · arXiv CS.AI · 5d ago

Characterizing initial human-AI proof formalization workflows

This paper investigates how people use AI tools in the formalization of mathematical proofs, a long-standing challenge in verifying mathematical arguments. Through a mixed-methods analysis, the study explores user preferences and challenges in AI integration, with a general desire for assistance that preserves high-level human control.

mathematical reasoning AI Systems human-AI interaction proof formalization

RESEARCH · arXiv CS.CL · 4/13/2026

Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

This study evaluates the performance of prompting strategies (chain-of-thought and zero-shot) in extended reasoning LLMs like Grok-4.1, varying the sampling temperature across 39 challenging mathematical problems. It found that zero-shot prompting peaks at moderate temperatures, while chain-of-thought performs best at temperature extremes, significantly increasing the benefit of extended reasoning.

mathematical reasoning LLMs Prompting Temperature
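
For readers unfamiliar with the sampling temperature varied in this study, the following is a minimal sketch of the standard temperature-scaled softmax. The function name and the example logits are illustrative assumptions; the study's own sampling setup is not described in this summary.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature.

    Lower temperatures sharpen the distribution; higher ones flatten it.
    """
    if temperature <= 0:
        # T = 0 is usually handled separately as greedy (argmax) decoding.
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


example_logits = [2.0, 1.0, 0.0]  # hypothetical next-token logits
cold = softmax_with_temperature(example_logits, 0.5)
neutral = softmax_with_temperature(example_logits, 1.0)
hot = softmax_with_temperature(example_logits, 2.0)
```

The top token's probability is highest in `cold` and lowest in `hot`, which is why temperature changes how often a model departs from its most likely continuation.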

RESEARCH · arXiv CS.CL · 4/16/2026

Mathematical Reasoning Enhanced LLM for Formula Derivation: A Case Study on Fiber NLI Modelling

This research introduces a mathematical reasoning-enhanced generative AI approach for deriving optical communication formulas, specifically for fiber nonlinear interference modelling. By guiding an LLM with structured prompts, the study successfully reconstructed known expressions and derived a novel approximation, demonstrating both physical consistency and practical accuracy.

mathematical reasoning LLMs Scientific Discovery Generative AI

RESEARCH · DEV.to AI · 22d ago

Solving Math Word Problems by Combining Language Models With Symbolic Solvers

This research explores a novel approach to solving math word problems by integrating the power of language models with the precision of symbolic solvers. The method aims to leverage both natural language understanding and formal mathematical reasoning to achieve robust solutions.

mathematical reasoning Symbolic AI natural language processing problem-solving
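
The division of labor described here can be sketched as two stages: a language model turns the word problem into equations, and an exact solver computes the answer. In the sketch below the language-model stage is stubbed out with a hand-written `extracted` dictionary, and `solve_2x2` is a hypothetical helper using Cramer's rule; neither is taken from the article.

```python
from fractions import Fraction
from typing import Tuple


def solve_2x2(a1: int, b1: int, c1: int,
              a2: int, b2: int, c2: int) -> Tuple[Fraction, Fraction]:
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 exactly via Cramer's rule."""
    det = Fraction(a1 * b2 - a2 * b1)
    if det == 0:
        raise ValueError("system has no unique solution")
    x = Fraction(c1 * b2 - c2 * b1) / det
    y = Fraction(a1 * c2 - a2 * c1) / det
    return x, y


# Word problem: "10 fruits in total; apples cost 2, oranges cost 3, total 24."
# Hypothetical structured output a language model might produce for it:
extracted = {"eq1": (1, 1, 10), "eq2": (2, 3, 24)}

# The symbolic stage does the arithmetic exactly, with no rounding.
apples, oranges = solve_2x2(*extracted["eq1"], *extracted["eq2"])
```

Keeping the arithmetic in the solver means a translation error by the model is the only remaining source of wrong answers in this toy setup.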

RESEARCH · arXiv CS.AI · 21d ago

LinAlg-Bench: A Forensic Benchmark Revealing Structural Failure Modes in LLM Mathematical Reasoning

LinAlg-Bench is a new diagnostic benchmark evaluating 10 frontier large language models (LLMs) on structured linear algebra computation, revealing structural failure modes. It assesses LLM performance across a dimensional gradient of matrices, classifying failures into ten primary error types and identifying a behavioral threshold at 4x4 matrices.

mathematical reasoning Benchmarking linear algebra AI evaluation
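
As an illustration of how answers on structured linear algebra tasks can be graded exactly, the sketch below computes a determinant with rational arithmetic and compares it to a model's answer. The choice of determinants as the task, and the names `exact_det` and `is_correct`, are assumptions for illustration; the benchmark's actual tasks and grader are not specified in this summary.

```python
from fractions import Fraction
from typing import List


def exact_det(matrix: List[List[int]]) -> Fraction:
    """Determinant via Gaussian elimination over exact fractions."""
    n = len(matrix)
    a = [[Fraction(v) for v in row] for row in matrix]
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def is_correct(matrix: List[List[int]], model_answer: int) -> bool:
    """Grade a model's integer answer against the exact determinant."""
    return exact_det(matrix) == model_answer
```

Because the reference value is exact, the same grader works unchanged as the matrix dimension grows, which is what a dimensional gradient requires.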

RESEARCH · arXiv CS.CL · 4/30/2026

MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese

This paper introduces MATH-PT, a novel dataset of 1,729 mathematical problems in European and Brazilian Portuguese, to address the linguistic bias in LLM mathematical reasoning evaluations. The benchmark reveals that frontier reasoning models achieve strong performance in multiple-choice questions but their performance decreases for open-ended questions.

Dataset mathematical reasoning LLMs Benchmarking

RESEARCH · arXiv CS.AI · 4/27/2026

Math Takes Two: A test for emergent mathematical reasoning in communication

This paper proposes Math Takes Two, a new benchmark designed to assess the emergence of mathematical reasoning in language models through communication. It tests whether two agents, without prior mathematical knowledge, can develop a shared symbolic protocol to solve a visually grounded task where a numerical system facilitates extrapolation.

language models mathematical reasoning AI communication Benchmarks

RESEARCH · arXiv CS.AI · 15d ago

RMA: an Agentic System for Research-Level Mathematical Problems

Research Math Agents (RMA) is an agentic framework designed for automated reasoning on complex research-level mathematical problems, distinguishing itself from prior work on competition math or formal theorem proving. RMA employs specialized modules and coordinated agents that collaboratively generate, refine, and verify candidate proofs through a multi-role, multi-round workflow, utilizing a shared structured memory.

mathematical reasoning proof verification Automated reasoning Research Methods

RESEARCH · arXiv CS.AI · 12d ago

LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

LaneRoPE is a novel technique designed to enhance parallel Large Language Model (LLM) generation by enabling coordination and collaboration among multiple sequences at test time. It achieves this through an inter-sequence attention mask and a RoPE extension that injects positional information, demonstrating promising results on mathematical reasoning tasks.

mathematical reasoning attention mechanisms Positional Encoding Parallel Processing

RESEARCH · arXiv CS.CL · 4/7/2026

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

This research addresses the loss of diversity in LLM co-evolution systems, in which one model generates problems and another solves them, a collapse that undermines autonomous curriculum learning. To counter it, the paper introduces "vocabulary dropout", a random mask that maintains diversity, leading to improved solver performance on mathematical reasoning.

mathematical reasoning diversity Co-evolution self-play
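
A random vocabulary mask of this kind can be sketched as follows: each token's logit is dropped (set to negative infinity) with some probability before sampling, so the generator cannot rely on the same tokens every time. The function name, the `drop_prob` parameter, and the keep-at-least-one fallback are assumptions for illustration, not the paper's exact procedure.

```python
import math
import random
from typing import List


def vocabulary_dropout(logits: List[float], drop_prob: float,
                       rng: random.Random) -> List[float]:
    """Mask each token's logit to -inf independently with probability drop_prob."""
    masked = [(-math.inf if rng.random() < drop_prob else z) for z in logits]
    if all(v == -math.inf for v in masked):
        # Assumed safeguard: never mask the entire vocabulary.
        keep = rng.randrange(len(logits))
        masked[keep] = logits[keep]
    return masked


rng = random.Random(0)  # fixed seed so the sketch is reproducible
example_logits = [1.0, 2.0, 3.0, 4.0]  # hypothetical logits over a 4-token vocabulary
masked_logits = vocabulary_dropout(example_logits, 0.5, rng)
```

Masked tokens receive zero probability after softmax, so sampling is forced onto the surviving part of the vocabulary.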

RESEARCH · arXiv CS.LG · 4/6/2026

LLM Reasoning with Process Rewards for Outcome-Guided Steps

This paper presents PROGRS, a framework for improving mathematical reasoning in LLMs that combines process reward models (PRMs) with prioritization of final-answer correctness. It targets the problem that PRMs can reward fluent intermediate reasoning that nevertheless leads to incorrect answers, and it optimizes learning with better-aligned feedback.

mathematical reasoning Process Rewards reinforcement learning AI
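
One simple way to keep process rewards subordinate to the final outcome is to let step scores contribute only when the answer is correct. The sketch below shows that gating idea; it is an assumed illustration of the general principle, not the PROGRS formulation, and `combined_reward` and `process_weight` are hypothetical names.

```python
from typing import List


def combined_reward(step_scores: List[float], answer_correct: bool,
                    process_weight: float = 0.5) -> float:
    """Outcome-gated reward: step scores only add a bonus to correct answers.

    step_scores are assumed to be PRM scores in [0, 1], one per reasoning step.
    """
    if not 0.0 <= process_weight < 1.0:
        raise ValueError("process_weight must be in [0, 1)")
    mean_process = sum(step_scores) / len(step_scores) if step_scores else 0.0
    outcome = 1.0 if answer_correct else 0.0
    # Fluent steps that end in a wrong answer earn nothing.
    return outcome * (1.0 + process_weight * mean_process)


fluent_but_wrong = combined_reward([0.9, 0.95, 0.9], answer_correct=False)
clumsy_but_right = combined_reward([0.2, 0.3, 0.4], answer_correct=True)
```

Under this gating a correct solution with weak step scores always outranks an incorrect one with high step scores, which is the failure mode the summary describes.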

RESEARCH · Qwen Blog · 1/13/2025

Towards Effective Process Supervision in Mathematical Reasoning

Large language models (LLMs) have made notable advances in mathematical reasoning, but they can still make calculation or logic errors. Even when the final answers are correct, LLMs can produce plausible but flawed reasoning steps, undermining the reliability of their reasoning process.

mathematical reasoning LLMs Process Supervision AI limitations