activation steering

2 items

RESEARCH · arXiv CS.CL · 9d ago

Cross-Lingual Steering for Figurative Language Generation

This research investigates whether the internal signals that drive figurative language generation in multilingual large language models are language-specific or reusable across languages. The study found that figurative category directions reliably steer generation within their own language and also transfer robustly across languages, which points to a shared component for this capability.

figurative language · multilingual LLMs · language generation · cross-lingual transfer

RESEARCH · arXiv CS.CL · 14d ago

Cultural Value Alignment Via Latent Activation Steering in Large Language Models

This paper proposes a framework for evaluating and intervening in cultural value alignment in large language models (LLMs), addressing their often homogenized cultural perspectives. It uses scenario-based behavioral probing and implicit token probabilities to map latent cultural values, and introduces activation steering to shift these alignments without retraining.

LLMs · Cultural Alignment · AI ethics · Value Systems
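
Both items above rely on activation steering, so a toy sketch of the general idea may help. It assumes a difference-of-means direction, which is one common way to build a steering vector and is not confirmed as the method of either paper. The function names, the `alpha` strength parameter, and the 3-dimensional activations are all hypothetical and made up for illustration.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of equal-length activation vectors."""
    return [fmean(col) for col in zip(*rows)]


def steering_direction(target_acts, baseline_acts):
    """Difference of means: target mean minus baseline mean (an assumed recipe)."""
    target_mean = mean_vector(target_acts)
    baseline_mean = mean_vector(baseline_acts)
    return [t - b for t, b in zip(target_mean, baseline_mean)]


def steer(hidden, direction, alpha=1.0):
    """Add the scaled direction to a single hidden-state vector."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy, made-up activations; a real model would supply these from a chosen layer.
target_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
baseline_acts = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

direction = steering_direction(target_acts, baseline_acts)
steered = steer([0.5, 0.5, 0.5], direction, alpha=2.0)
```

In a real model the hidden state would be modified at a chosen layer during the forward pass, with no weight updates, which is what "without retraining" refers to.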