Model Monitoring

2 items

RESEARCH · arXiv CS.AI · 18d ago

Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs

This research introduces MOOD, a benchmark designed to study the detection of out-of-distribution (OOD) alignment failures in large language models (LLMs) using monitoring pipelines. It proposes combining guard models with OOD detectors to improve the generalization of safety classifiers, which often fail in OOD scenarios.

Model Monitoring · OOD Detection · LLMs · Benchmarking
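
As a rough illustration of the idea in the summary above, a monitoring pipeline can flag an input when either a guard model or an OOD detector raises a concern. The sketch below is not the paper's method: the function names, the RMS z-score used as an OOD score, and both thresholds are hypothetical choices made only for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class MonitorDecision:
    flagged: bool
    reason: str  # "guard", "ood", or "pass"


def rms_zscore(embedding, mean, std):
    """Hypothetical OOD score: root-mean-square z-score of an embedding
    against per-dimension statistics collected on in-distribution data."""
    z2 = [((x - m) / s) ** 2 for x, m, s in zip(embedding, mean, std)]
    return math.sqrt(sum(z2) / len(z2))


def monitor(guard_score, ood_score, guard_threshold=0.5, ood_threshold=3.0):
    """Flag when the guard model is confident OR the input looks OOD.

    Both thresholds are illustrative assumptions, not tuned values.
    """
    if guard_score >= guard_threshold:
        return MonitorDecision(True, "guard")
    if ood_score >= ood_threshold:
        # The guard score is low, but the input is far from the data the
        # guard was trained on, so its verdict is treated as unreliable.
        return MonitorDecision(True, "ood")
    return MonitorDecision(False, "pass")
```

The design choice here is a simple logical OR: the OOD detector acts as a fallback for inputs on which the guard model's low score may not be trustworthy.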

ARTICLE · Weights & Biases · 11/8/2019

Tracking the heartbeat of ML models by exploring gradients

This article explores methods for monitoring machine learning models by analyzing their gradients. It details how to track a model's "heartbeat" to check its performance and stability.

Model Monitoring · Machine Learning · Gradients · AI Development
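
To make the gradient-tracking idea concrete, here is a minimal sketch that records the gradient norm at every step of a toy training loop and flags suspicious values. It does not reproduce the article's code or any Weights & Biases API: the linear-regression model, the helper names, and the `explode` and `vanish` thresholds are all assumptions chosen for illustration.

```python
import math


def grad_norm(grads):
    """L2 norm of a flat list of gradient values."""
    return math.sqrt(sum(g * g for g in grads))


def train_and_track(xs, ys, lr=0.05, steps=50):
    """Fit y ~ w*x + b by gradient descent on mean squared error,
    logging the gradient norm before each update."""
    w, b = 0.0, 0.0
    history = []
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        history.append(grad_norm([dw, db]))
        w -= lr * dw
        b -= lr * db
    return (w, b), history


def flag_anomalies(history, explode=1e3, vanish=1e-8):
    """Return the step indices whose gradient norm is non-finite,
    above `explode`, or below `vanish` (hypothetical thresholds)."""
    return [
        i for i, g in enumerate(history)
        if not math.isfinite(g) or g > explode or g < vanish
    ]
```

In a healthy run the logged norms shrink steadily as the model converges; a sudden spike or a collapse toward zero is the kind of signal such a log is meant to surface.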