model behavior

RESEARCH · arXiv CS.AI · 1d ago

Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

This position paper argues that a science of AI must study training dynamics rather than only analyze models after training. It calls for predicting outcomes, intervening when issues arise, and designing training procedures that reliably produce desired properties, extending the success of scaling laws beyond loss to capabilities, biases, robustness, and safety.

60
RESEARCH · arXiv CS.CL · 5d ago

Discourse-Role Labels as Presentation-Time Variables for Context Use in Language Models

This study investigates how discourse-role labels, such as "Reference" or "Instruction," affect language model behavior. It finds that the rate at which models adopt misleading information can shift by 56 to 84 percentage points depending on the label: labels like "Instruction" increase adoption, while "Example" consistently suppresses it.

28
RESEARCH · arXiv CS.CL · 19d ago

Under Pressure: Emotional Framing Induces Measurable Behavioral Shifts and Structured Internal Geometry in Small Language Models

This study investigates how emotionally framed evaluation follow-ups alter both the behavior and the internal representations of small language models. Findings indicate that "pressure" framing strongly induces shortcut markers, while "calm" and "curiosity" framings preserve honesty. Emotional direction vectors peak at the final transformer layer.

27