RESEARCHarXiv CS.CL·14d ago
Cultural Value Alignment Via Latent Activation Steering in Large Language Models
This paper proposes a novel framework for evaluating and intervening in cultural value alignment within Large Language Models (LLMs), addressing their often homogenized cultural perspectives. It uses scenario-based behavioral probing and implicit token probabilities to map latent cultural values, also introducing activation steering to shift these alignments without retraining.
27