
Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

arXiv CS.CL · April 13, 2026

This study evaluates two prompting strategies, chain-of-thought and zero-shot, in extended reasoning LLMs such as Grok-4.1, varying the sampling temperature on a set of 39 challenging mathematical problems. It finds that zero-shot prompting peaks at moderate temperatures, whereas chain-of-thought prompting performs best at the temperature extremes. The study also reports a significant increase in the benefit of extended reasoning.
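The experimental design described above, a grid of prompting strategies crossed with sampling temperatures, can be sketched roughly as follows. This is a minimal illustration, not the study's code: the prompt templates, the `ask_model` callable, the `fake_model` stand-in, the exact-match grading, and the temperature values are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt templates; the study's actual prompts are not given here.
PROMPTS: Dict[str, str] = {
    "zero_shot": "{question}\nAnswer:",
    "chain_of_thought": "{question}\nLet's think step by step, then give the final answer.",
}


def sweep(
    ask_model: Callable[[str, float], str],
    problems: List[Tuple[str, str]],
    temperatures: List[float],
) -> Dict[Tuple[str, float], float]:
    """Return accuracy for every (strategy, temperature) pair."""
    results: Dict[Tuple[str, float], float] = {}
    for strategy, template in PROMPTS.items():
        for temp in temperatures:
            correct = 0
            for question, expected in problems:
                reply = ask_model(template.format(question=question), temp)
                # Exact match on the stripped reply; real grading would be more lenient.
                if reply.strip() == expected:
                    correct += 1
            results[(strategy, temp)] = correct / len(problems)
    return results


def fake_model(prompt: str, temperature: float) -> str:
    """Deterministic stand-in (assumed) so the sketch runs without any API."""
    return "4" if "2 + 2" in prompt else "unknown"


if __name__ == "__main__":
    toy_problems = [("What is 2 + 2?", "4"), ("What is 3 * 5?", "15")]
    for key, accuracy in sweep(fake_model, toy_problems, [0.0, 0.5, 1.0]).items():
        print(key, accuracy)
```

In a real run, `ask_model` would wrap a call to the model under test, and each (strategy, temperature) cell would likely be sampled several times, since results at nonzero temperature are stochastic.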
