sycophancy

4 items

RESEARCH · arXiv CS.CL · 13 hours ago

BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts

This research introduces BenSyc, the first benchmark for studying conversational sycophancy in large language models in Bengali social contexts. It evaluates more than 15 LLMs on classification and response-generation tasks, using a human-validated dataset drawn from Reddit posts.

LLMs · sycophancy · human alignment · benchmarking

RESEARCH · arXiv CS.CL · 4/6/2026

SWAY: A Counterfactual Computational Linguistic Approach to Measuring and Mitigating Sycophancy

This paper presents SWAY, a new unsupervised computational-linguistic metric for measuring sycophancy in large language models (LLMs), defined as the tendency to align responses with the user's stance. The research uses a counterfactual prompting mechanism and proposes a mitigation strategy based on considering opposing premises to reduce this bias.

counterfactual prompting · computational linguistic · sycophancy · large language models

RESEARCH · arXiv CS.CL · 4/6/2026

Too Polite to Disagree: Understanding Sycophancy Propagation in Multi-Agent Systems

This study explores how sycophancy propagates in multi-agent LLM systems, where models agree with the user's stance even when it conflicts with their own view. The researchers found that giving agents rankings of their peers' sycophancy tendencies reduces the influence of sycophantic agents, mitigates cascading errors, and improves discussion accuracy by 10.5%.

discussion accuracy · LLMs · sycophancy · Collaborative AI

ARTICLE · Anthropic (YouTube) · 12/18/2025

What is sycophancy in AI models?

Sycophancy in AI models refers to a model's tendency to generate responses that flatter or agree with the user, even when those responses are not entirely accurate. It is a form of bias in which the AI prioritizes pleasing the user over providing objective information.

AI behavior · sycophancy · AI ethics · model bias