Temperature

1 item

RESEARCH · arXiv cs.CL · 4/13/2026

Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

This study evaluates two prompting strategies (chain-of-thought and zero-shot) in extended reasoning LLMs such as Grok-4.1, varying the sampling temperature across 39 challenging mathematical problems. It finds that zero-shot prompting peaks at moderate temperatures, while chain-of-thought performs best at the temperature extremes, and it reports a significant increase in the benefit of extended reasoning.
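For readers unfamiliar with the knob being varied, the following is a minimal sketch of how sampling temperature reshapes a next-token distribution. It is not taken from the paper: the function name `softmax_with_temperature`, the toy logits, and the temperature grid are all illustrative assumptions, and treating temperature 0 as greedy decoding is a common convention rather than something the summary states.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        # Assumed convention: temperature 0 means greedy decoding,
        # so all probability mass goes to the highest-scoring token.
        top = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate tokens (not data from the study).
toy_logits = [2.0, 1.0, 0.5]

# Hypothetical temperature grid, chosen only to show the trend.
for t in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring token, and higher temperatures flatten the distribution, which is the axis along which the study compares the two prompting strategies.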
