RESEARCH · arXiv CS.CL · 26d ago
In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores
This paper proposes evaluating LLM fairness by observing in-situ conversational behavior rather than standardized-test scores. It introduces MAC-Fairness, a framework for analyzing model behavior in multi-agent dialogue, and uses it to show that traditional test-based fairness evaluations are unreliable.