Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models
arXiv CS.CL · April 13, 2026
This study evaluates two prompting strategies, chain-of-thought and zero-shot, in extended reasoning LLMs such as Grok-4.1, varying the sampling temperature across 39 challenging mathematical problems. It finds that zero-shot prompting peaks at moderate temperatures, while chain-of-thought prompting performs best at the temperature extremes, significantly increasing the benefit of extended reasoning.