
Data Annotation


RESEARCH · arXiv CS.CL · 29d ago

MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media

MultiSoc-4D is a new benchmark built on Bengali social media data and designed to diagnose how LLMs behave in closed-set annotation. The research identifies "instruction-induced label collapse," a phenomenon in which LLMs systematically prefer fallback labels, leading to under-detection of minority categories.
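To make the idea of label collapse concrete, here is a minimal sketch of one way it could be measured: compare how often a fallback label appears in the model's predictions with how often it appears in the gold labels, and look at per-label recall. The function name `collapse_report`, the label names, and the toy data are all hypothetical illustrations; they are not taken from MultiSoc-4D, and the paper may define its diagnostics differently.

```python
from collections import Counter


def collapse_report(gold, pred, fallback):
    """Summarize how strongly predictions drift toward a fallback label.

    Hypothetical helper for illustration only; not the benchmark's metric.
    """
    n = len(gold)
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # Count correct predictions per gold label.
    hits = Counter(g for g, p in zip(gold, pred) if g == p)
    # Recall per label: correct predictions divided by gold occurrences.
    recall = {label: hits[label] / count for label, count in gold_counts.items()}
    return {
        "gold_fallback_rate": gold_counts[fallback] / n,
        "pred_fallback_rate": pred_counts[fallback] / n,
        "recall": recall,
    }


# Toy data (invented): the model answers "neutral" for almost everything.
gold = ["hate", "neutral", "sarcasm", "neutral", "hate", "sarcasm", "neutral", "hate"]
pred = ["neutral", "neutral", "neutral", "neutral", "hate", "neutral", "neutral", "neutral"]

report = collapse_report(gold, pred, fallback="neutral")
print(report)
```

In this toy case the fallback label makes up 37.5% of the gold labels but 87.5% of the predictions, and recall on the "sarcasm" category drops to zero, which is the kind of minority-category under-detection the summary describes.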
