
Peer review


RESEARCH · arXiv CS.CL · 4/17/2026

Decoupling Scores and Text: The Politeness Principle in Peer Review

This study examines how hard peer review feedback is to interpret by comparing how well numerical scores and review text predict acceptance. Score-based models are significantly more accurate (91%) than text-based models (81%, even with LLMs), indicating that review text is a considerably less reliable signal of the final decision.

RESEARCH · arXiv CS.AI · 5/6/2026

Stop Automating Peer Review Without Rigorous Evaluation

This paper argues against using current AI systems for peer review, identifying two critical issues: a "hivemind effect" that reduces the diversity of perspectives, and AI review scores that are trivially gameable by rewriting the paper. An empirical comparison of human and AI-generated reviews shows that AI reviewers respond to stylistic changes rather than scientific merit, which makes non-gameability and review diversity prerequisites for any automation.
