alignment

ARTICLE · DEV.to AI · 4/8/2026

Announcing the OpenAI Safety Fellowship

The OpenAI Safety Fellowship is a research program focused on AI safety, addressing critical aspects such as robustness, interpretability, and alignment with human values. The article details the program's goals and technical components, such as adversarial training and explainability techniques.

28
RESEARCH · arXiv CS.LG · 4/21/2026

SaFeR-Steer: Evolving Multi-Turn MLLMs via Synthetic Bootstrapping and Feedback Dynamics

SaFeR-Steer is a framework for improving the safety alignment of Multi-modal Large Language Models (MLLMs) in multi-turn dialogues, addressing challenges such as escalating unsafe intent and long-context safety decay. It relies on synthetic bootstrapping and feedback dynamics, and the authors also release the STEER dataset for training and evaluation.

27