safety evaluation — AI articles, news & research

RESEARCH · arXiv CS.AI · 27d ago

DisaBench: A Participatory Evaluation Framework for Disability Harms in Language Models

DisaBench introduces a participatory evaluation framework to assess disability-related harms in large language models, addressing the inadequacy of general-purpose safety benchmarks. It features a co-created taxonomy of twelve harm categories, a methodology pairing benign and adversarial prompts, and a dataset with human-annotated labels, revealing subtle harms often missed by standard evaluations.

language models · benchmarking · AI ethics · disability harms