RESEARCH

Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

arXiv CS.CL·May 7, 2026

This study investigates how four Large Language Models (ChatGPT, Grok, Gemini, and Copilot) hallucinate when generating academic content, using 80 prompts across four categories. It introduces a novel weighted metric, the Hallucination Index (HI), to measure factual accuracy and reference validity.

academic writing · AI quality · Model Evaluation · hallucinations · LLM
