
Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

arXiv cs.CL · April 13, 2026

This study evaluates two prompting strategies, chain-of-thought and zero-shot, in extended reasoning LLMs such as Grok-4.1, varying the sampling temperature across 39 challenging mathematical problems. It finds that zero-shot prompting peaks at moderate temperatures, whereas chain-of-thought prompting performs best at the temperature extremes, and it reports a significant increase in the benefit of extended reasoning.
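The experimental design described above, every prompting strategy crossed with every temperature setting over a fixed problem set, can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the prompt templates, the temperature grid, the exact-match scoring, and the names `accuracy_grid`, `best_temperature`, and `stub_model` are all assumptions made for the example, and the stub stands in for a real model call.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt templates; the study's actual wording is not given here.
PROMPTS: Dict[str, str] = {
    "zero_shot": "{question}",
    "chain_of_thought": "{question}\nLet's think step by step.",
}

# Illustrative temperature grid (an assumption, not the study's settings).
TEMPERATURES: List[float] = [0.0, 0.5, 1.0]

Problem = Tuple[str, str]            # (question, reference answer)
Model = Callable[[str, float], str]  # (prompt, temperature) -> answer text


def accuracy_grid(
    problems: List[Problem],
    model: Model,
    prompts: Dict[str, str] = PROMPTS,
    temperatures: List[float] = TEMPERATURES,
) -> Dict[Tuple[str, float], float]:
    """Return exact-match accuracy for every (strategy, temperature) pair."""
    grid: Dict[Tuple[str, float], float] = {}
    for strategy, template in prompts.items():
        for temp in temperatures:
            correct = sum(
                model(template.format(question=q), temp).strip() == ref
                for q, ref in problems
            )
            grid[(strategy, temp)] = correct / len(problems)
    return grid


def best_temperature(grid: Dict[Tuple[str, float], float], strategy: str) -> float:
    """Temperature with the highest accuracy for one strategy (first on ties)."""
    scores = {t: acc for (s, t), acc in grid.items() if s == strategy}
    return max(scores, key=scores.get)


def stub_model(prompt: str, temperature: float) -> str:
    # Toy stand-in that always answers "4"; replace with a real API call.
    return "4"


toy_problems = [("2 + 2 = ?", "4"), ("3 + 3 = ?", "6")]
grid = accuracy_grid(toy_problems, stub_model)
```

With a real model behind `model`, comparing `best_temperature(grid, "zero_shot")` against `best_temperature(grid, "chain_of_thought")` is one way to check whether the two strategies peak at different temperatures. In practice each setting would need several samples per problem, since a single draw at nonzero temperature is noisy.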

mathematical reasoning · LLMs · Prompting · Temperature · Extended Reasoning
