mathematical reasoning

14 items

RESEARCH · arXiv CS.CL · 4/13/2026

Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

This study evaluates the performance of prompting strategies (chain-of-thought and zero-shot) in extended reasoning LLMs like Grok-4.1, varying the sampling temperature across 39 challenging mathematical problems. It found that zero-shot prompting peaks at moderate temperatures, while chain-of-thought performs best at temperature extremes, significantly increasing the benefit of extended reasoning.

30
RESEARCH · arXiv CS.CL · 4/16/2026

Mathematical Reasoning Enhanced LLM for Formula Derivation: A Case Study on Fiber NLI Modelling

This research introduces a mathematical reasoning-enhanced generative AI approach for deriving optical communication formulas, specifically for fiber nonlinear interference modelling. By guiding an LLM with structured prompts, the study successfully reconstructed known expressions and derived a novel approximation, demonstrating both physical consistency and practical accuracy.

29
RESEARCH · arXiv CS.AI · 21d ago

LinAlg-Bench: A Forensic Benchmark Revealing Structural Failure Modes in LLM Mathematical Reasoning

LinAlg-Bench is a new diagnostic benchmark evaluating 10 frontier large language models (LLMs) on structured linear algebra computation, revealing structural failure modes. It assesses LLM performance across a dimensional gradient of matrices, classifying failures into ten primary error types and identifying a behavioral threshold at 4x4 matrices.

28
RESEARCH · arXiv CS.CL · 4/30/2026

MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese

This paper introduces MATH-PT, a novel dataset of 1,729 mathematical problems in European and Brazilian Portuguese, to address the linguistic bias in LLM mathematical reasoning evaluations. The benchmark shows that frontier reasoning models perform strongly on multiple-choice questions but worse on open-ended ones.

27
RESEARCH · arXiv CS.AI · 4/27/2026

Math Takes Two: A test for emergent mathematical reasoning in communication

This paper proposes Math Takes Two, a new benchmark designed to assess the emergence of mathematical reasoning in language models through communication. It tests whether two agents, without prior mathematical knowledge, can develop a shared symbolic protocol to solve a visually grounded task where a numerical system facilitates extrapolation.

27
RESEARCH · arXiv CS.AI · 15d ago

RMA: an Agentic System for Research-Level Mathematical Problems

Research Math Agents (RMA) is an agentic framework designed for automated reasoning on complex research-level mathematical problems, distinguishing itself from prior work on competition math or formal theorem proving. RMA employs specialized modules and coordinated agents that collaboratively generate, refine, and verify candidate proofs through a multi-role, multi-round workflow, utilizing a shared structured memory.

27
RESEARCH · arXiv CS.AI · 12d ago

LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

LaneRoPE is a novel technique designed to enhance parallel Large Language Model (LLM) generation by enabling coordination and collaboration among multiple sequences at test time. It achieves this through an inter-sequence attention mask and a RoPE extension that injects positional information, demonstrating promising results on mathematical reasoning tasks.

27
RESEARCH · arXiv CS.CL · 4/7/2026

Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution

This research addresses the loss of diversity in LLM co-evolution systems, in which one model generates problems and another solves them, a collapse that undermines autonomous curriculum learning. To counter it, the authors introduce 'vocabulary dropout', a random mask that preserves diversity, and report improved solver performance on mathematical reasoning.

27
RESEARCH · arXiv CS.LG · 4/6/2026

LLM Reasoning with Process Rewards for Outcome-Guided Steps

This paper presents PROGRS, a framework for improving mathematical reasoning in LLMs that combines process reward models (PRMs) with a priority on final-answer correctness. It targets the problem of PRMs rewarding fluent intermediate reasoning that nonetheless leads to incorrect answers, optimizing learning with better-aligned feedback.

27