How Far Will They Go? Red-Teaming Online Influence with Large Language Models
arXiv CS.CL · May 25, 2026
This research proposes an empirical red-teaming framework for evaluating how far locally deployed open-source large language models (LLMs) can support political influence campaigns, with a focus on information integrity. It measures "LLM Overton Windows" and quantifies how natural-language jailbreaks expand the range of political opinions models can express, revealing systematic asymmetries in political expressivity.