MoE models

2 items

RESEARCH · arXiv CS.CL · 5d ago

Expert-Aware Refusal Steering

This paper extends refusal steering to Mixture-of-Experts (MoE) large language models and finds that the MoE architecture does not hinder steering performance. It proposes expert-aware refusal steering methods that exploit expert routing patterns, and shows that refusal behavior can be steered effectively through a single expert's output.
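To make the idea of steering through a single expert's output concrete, here is a minimal toy sketch. It is not the paper's implementation: it assumes a difference-of-means "refusal direction" and directional ablation, neither of which the summary confirms, and every name in it (`mean_diff_direction`, `ablate`, `moe_layer`, `steer_expert`) is hypothetical.

```python
import math

def unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_diff_direction(harmful_acts, harmless_acts):
    """Assumed recipe: unit difference between the mean activations of two prompt sets."""
    dim = len(harmful_acts[0])
    mean_h = [sum(a[i] for a in harmful_acts) / len(harmful_acts) for i in range(dim)]
    mean_b = [sum(a[i] for a in harmless_acts) / len(harmless_acts) for i in range(dim)]
    return unit([h - b for h, b in zip(mean_h, mean_b)])

def ablate(vec, direction):
    """Remove the component of vec that lies along a unit-length direction."""
    proj = sum(v * d for v, d in zip(vec, direction))
    return [v - proj * d for v, d in zip(vec, direction)]

def moe_layer(expert_outputs, gate_weights, steer_expert=None, direction=None):
    """Gate-weighted sum of expert outputs.

    If steer_expert is given, the direction is ablated from that one
    expert's output only; all other experts pass through unchanged.
    """
    out = [0.0] * len(expert_outputs[0])
    for idx, (vec, weight) in enumerate(zip(expert_outputs, gate_weights)):
        if idx == steer_expert and direction is not None:
            vec = ablate(vec, direction)
        out = [o + weight * v for o, v in zip(out, vec)]
    return out

# Toy data: two-dimensional activations, two experts, fixed gate weights.
direction = mean_diff_direction(
    harmful_acts=[[2.0, 1.0], [4.0, 1.0]],
    harmless_acts=[[0.0, 1.0], [0.0, 1.0]],
)
experts = [[1.0, 2.0], [3.0, 4.0]]
gates = [0.25, 0.75]

baseline = moe_layer(experts, gates)
steered = moe_layer(experts, gates, steer_expert=1, direction=direction)
print(baseline, steered)
```

In this toy setup only expert 1's contribution along the direction is removed, so the layer output still carries whatever expert 0 contributes along it. A real model would use learned routers and high-dimensional activations.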
