RESEARCH27
Investigating Counterfactual Unfairness in LLMs towards Identities through Humor
arXiv cs.CL · April 22, 2026
This paper investigates counterfactual unfairness in LLMs by analyzing how their responses to humor change when the speaker and addressee identities are swapped. Experiments reveal consistent relational disparities: the models more often refuse jokes told by privileged speakers or judge them malicious.