RESEARCH27
MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media
arXiv CS.CLΒ·May 11, 2026
MultiSoc-4D is a new Bengali social media dataset benchmark designed to diagnose LLM behavior in closed-set annotation. The research identifies "instruction-induced label collapse," a phenomenon where LLMs systematically prefer fallback labels, leading to under-detection of minority categories.
Read original β