
Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues

arXiv cs.CL · May 4, 2026

New research addresses a gap in how cultural reasoning in LLMs is evaluated by introducing ArabCulture-Dialogue, a culturally grounded conversational dataset covering 13 Arabic-speaking countries. Experiments indicate that models perform worse on cultural reasoning, translation, and generation tasks in dialectal settings than in Modern Standard Arabic (MSA).

LLMs · Arabic dialects · cultural reasoning · benchmarking · datasets
