
Investigating Counterfactual Unfairness in LLMs towards Identities through Humor

arXiv CS.CL · April 22, 2026

This paper investigates counterfactual unfairness in LLMs by analyzing how their responses to humor change when swapping speaker and addressee identities. Experiments reveal consistent relational disparities, where jokes told by privileged speakers are more often refused or judged as malicious by the models.
