Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues

arXiv cs.CL · May 4, 2026

New research addresses a gap in how cultural reasoning in LLMs is evaluated. It introduces ArabCulture-Dialogue, a culturally grounded conversational dataset that covers 13 Arabic-speaking countries in both Modern Standard Arabic (MSA) and dialectal Arabic. Experiments indicate that models perform worse on cultural reasoning, translation, and generation tasks in dialectal settings than in MSA.

Read original β†—