RESEARCH

How Far Will They Go? Red-Teaming Online Influence with Large Language Models

arXiv cs.CL · May 25, 2026

This paper proposes an empirical red-teaming framework, grounded in concerns about information integrity, for evaluating how far locally deployed open-source large language models (LLMs) can be pushed to support political influence campaigns. It measures each model's "LLM Overton Window", the range of political opinions the model will express, and quantifies how much natural-language jailbreaks widen that range. The results reveal systematic asymmetries in political expressivity.

red-teaming security online influence misinformation LLM
