machine learning

790 items

RESEARCH·arXiv CS.LG·18d ago

Don't Collapse Your Features: Why CenterLoss Hurts OOD Detection and Multi-Scale Mahalanobis Wins

This research introduces GOEN, a novel pipeline for out-of-distribution (OOD) detection, which effectively combines multi-scale features and Mahalanobis distance. It reveals that CenterLoss surprisingly degrades OOD detection performance, with GOEN-NoCenterLoss achieving state-of-the-art results.

OOD Detection Epistemic Uncertainty Feature Engineering deep learning
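As a rough illustration of the idea named in the item above, Mahalanobis-distance OOD scoring over features from several scales could look like the sketch below. It is not the GOEN pipeline: the function names, the single Gaussian per scale, the ridge term, and the plain sum across scales are all assumptions made for illustration.

```python
import numpy as np


def fit_gaussian(features: np.ndarray, ridge: float = 1e-6):
    """Fit a mean and precision matrix to in-distribution features (n, d).

    The ridge term is an assumption added to keep the covariance invertible.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features, rowvar=False) + ridge * np.eye(features.shape[1])
    return mean, np.linalg.inv(cov)


def mahalanobis_score(x: np.ndarray, mean: np.ndarray, precision: np.ndarray) -> np.ndarray:
    """Mahalanobis distance of each row of x (n, d) from the fitted Gaussian."""
    diff = x - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, precision, diff))


def multi_scale_score(xs, stats) -> np.ndarray:
    """Sum per-scale distances; a higher score suggests a more OOD input.

    Summation is a hypothetical aggregation rule, not one taken from the paper.
    """
    return sum(mahalanobis_score(x, m, p) for x, (m, p) in zip(xs, stats))
```

Here each entry of `xs` would hold the features of the same inputs taken from one layer or scale of a network, and `stats` would hold the matching `fit_gaussian` results computed on training data.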

RESEARCH·arXiv CS.AI·5/11/2026

When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment

This research introduces a "finite-answer preference stabilization" method to determine when a language model's answer preference becomes stable before its final output. It shows that this stabilization often occurs before the answer is parseable, with a significant lead time.

language models cognitive science machine learning NLP

RESEARCH·arXiv CS.LG·21d ago

Dimensional Balance Improves Large Scale Spatiotemporal Prediction Performance

This paper proposes a scalable, adaptive framework to improve spatiotemporal prediction by harmonizing spatial and temporal feature representations. It addresses bottlenecks in existing methods through spatial and temporal entropy measures to tackle complexity mismatch and prediction uncertainty.

model performance deep learning spatiotemporal prediction machine learning

RESEARCH·arXiv CS.LG·27d ago

Population Risk Bounds for Kolmogorov-Arnold Networks Trained by DP-SGD with Correlated Noise

This research establishes the first population risk bounds for Kolmogorov-Arnold Networks (KANs) trained with mini-batch SGD, including differentially private SGD (DP-SGD) with correlated noise. It covers more practical scenarios than prior KAN theory and provides sharper results for fixed-second-layer specializations.

neural networks Optimization Differential Privacy machine learning

RESEARCH·arXiv CS.LG·9d ago

When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception

This paper explores "deceptive alignment" in LLMs, a key challenge in AI safety where models deliberately produce false outputs while maintaining accurate internal representations. Researchers introduced a multi-model paradigm, successfully detecting synthetic dishonesty with high accuracy using linear probes across various transformer architectures.

LLMs machine learning deception AI safety

RESEARCH·arXiv CS.LG·14d ago

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

This paper introduces GEM (Geometric Entropy Mixing), a novel framework for LLM data curation that reformulates the problem as a variational one on the hypersphere. GEM optimizes data composition for LLM pre-training, overcoming categorization flaws and discovering balanced semantic structures.

machine learning Geometric Entropy Mixing data curation AI research

RESEARCH·arXiv CS.LG·13d ago

Metric-Aware PCA as a Linear Instance of Geometric Deep Learning

This paper positions Metric-Aware Principal Component Analysis (MAPCA) within the geometric deep learning framework. The metric is interpreted as the geometric prior, and MAPCA solutions are equivariant under the symmetry group it induces.

neural networks PCA machine learning Geometric Deep Learning

RESEARCH·arXiv CS.CL·4/6/2026

Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

Discrete diffusion language models (dLLMs) speed up text generation, but parallel decoding degrades quality by ignoring dependencies between tokens. DEMASK proposes a lightweight predictor that estimates conditional influences to guide simultaneous unmasking, provably improving quality. The technique yields a 1.7x to 2.2x speedup while matching or exceeding baseline performance.

Dependency Prediction DEMASK Parallel Decoding machine learning

ARTICLE·DEV.to AI·6d ago

Counterfactual Evaluation in Ads: IPS, SNIPS, and Doubly Robust Explained

A Towards AI article explains counterfactual evaluation methods (IPS, SNIPS, and Doubly Robust) for ad ranking models. These techniques estimate a model's performance from logged data without running A/B tests, which is critical for recommendation systems in retail.

ad ranking machine learning A/B testing counterfactual evaluation
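The three estimators named in the item above can be sketched in a few lines. This is a minimal illustration using the standard textbook definitions, not code from the article; the function and argument names are hypothetical. Each logged record carries the observed reward, the probability the target policy gives the logged action, and the probability the logging policy gave it.

```python
from typing import Sequence


def _weights(target_probs: Sequence[float], logging_probs: Sequence[float]):
    """Importance weights: target-policy probability over logging-policy probability."""
    return [p / q for p, q in zip(target_probs, logging_probs)]


def ips(rewards, target_probs, logging_probs) -> float:
    """Inverse propensity scoring: mean of importance-weighted rewards."""
    w = _weights(target_probs, logging_probs)
    return sum(wi * r for wi, r in zip(w, rewards)) / len(rewards)


def snips(rewards, target_probs, logging_probs) -> float:
    """Self-normalized IPS: divide by the sum of weights instead of n."""
    w = _weights(target_probs, logging_probs)
    return sum(wi * r for wi, r in zip(w, rewards)) / sum(w)


def doubly_robust(rewards, target_probs, logging_probs, q_logged, v_target) -> float:
    """Doubly robust: reward-model baseline plus an IPS correction on its residual.

    q_logged[i]: reward model's prediction for the logged action (assumed input).
    v_target[i]: reward model's expected reward under the target policy (assumed input).
    """
    w = _weights(target_probs, logging_probs)
    terms = [v + wi * (r - q) for v, wi, r, q in zip(v_target, w, rewards, q_logged)]
    return sum(terms) / len(rewards)
```

With an all-zero reward model, `doubly_robust` reduces to `ips`. SNIPS trades a small bias for lower variance when the importance weights are large.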

RESEARCH·DEV.to AI·4/14/2026

Revisiting Long-term Time Series Forecasting: An Investigation on Linear Mapping

This research investigates the effectiveness of linear mapping techniques in long-term time series forecasting. The study re-examines existing approaches to understand their performance and applicability in extended prediction horizons.

Long-term Forecasting Linear Mapping machine learning time series forecasting
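For readers unfamiliar with the term in the item above, a linear mapping forecaster fits one linear map from a lookback window to the whole forecast horizon. The sketch below is a generic least-squares version under that assumption, not the paper's model; the helper names and the added bias column are hypothetical.

```python
import numpy as np


def make_windows(series: np.ndarray, lookback: int, horizon: int):
    """Slice a 1-D series into (lookback window, future horizon) training pairs."""
    X, Y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t : t + lookback])
        Y.append(series[t + lookback : t + lookback + horizon])
    return np.array(X), np.array(Y)


def fit_linear_map(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Least-squares fit of a (lookback + 1, horizon) weight matrix, bias included."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W


def forecast(W: np.ndarray, window: np.ndarray) -> np.ndarray:
    """Predict the next `horizon` values from a single lookback window."""
    return np.append(window, 1.0) @ W
```

A purely periodic signal such as a sine wave is a case where this kind of map can forecast essentially exactly, which makes it a convenient sanity check before trying real data.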

RESEARCH·arXiv CS.LG·28d ago

$\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin

This paper introduces $\xi$-DPO, a direct preference optimization method using a ratio reward margin, to address the challenge of hyperparameter tuning in SimPO. The research analyzes SimPO and reformulates the preference objective to improve interpretability across datasets with varying reward gap structures.

preference optimization deep learning reinforcement learning Hyperparameter Tuning

RESEARCH·arXiv CS.LG·28d ago

Interpretable EEG Microstate Discovery via Variational Deep Embedding: A Systematic Architecture Search with Multi-Quadrant Evaluation

This paper introduces the Convolutional Variational Deep Embedding (Conv-VaDE) model for EEG microstate analysis. It enhances interpretability by jointly learning topographic reconstruction and probabilistic soft clustering, enabling generative decoding of cluster prototypes into verifiable scalp topographies.

deep learning machine learning Neuroscience medical AI

RESEARCH·arXiv CS.CL·19d ago

Reflective Prompt Tuning through Language Model Function-Calling

This paper proposes Reflective Prompt Tuning (RPT), a framework that uses large language model (LLM) function calling to simulate the iterative workflow of human prompt engineers. Its goal is to automate prompt optimization, reducing manual effort and overcoming limitations of existing methods that fail to capture systematic error patterns.

LLMs prompt-engineering machine learning AI optimization

NEWS·Two Minute Papers (YouTube)·4d ago

DeepMind’s New AI Found A Strange New Way To Think

DeepMind's new AI system has discovered a strange and innovative way of thinking, marking a significant advancement in artificial intelligence. This finding highlights the increasing capability of AIs to develop unconventional cognitive approaches.

DeepMind research machine learning cognitive AI

RESEARCH·arXiv CS.LG·16d ago

MedExpMem: Adapting Experience Memory for Differential Diagnosis

This paper introduces MedExpMem, an experience memory framework designed to enhance medical vision-language models (VLMs) with differential diagnosis expertise. It allows diagnostic agents to learn from their own failures by memorizing discriminative experiences as pairwise differential notes.

AI in medicine learning VLM machine learning

RESEARCH·arXiv CS.LG·16d ago

WeCon: An Efficient Weight-Conditioned Neural Solver for Multi-Objective Combinatorial Optimization Problems

Researchers propose WeCon, an efficient Weight-Conditioned neural solver for Multi-Objective Combinatorial Optimization Problems (MOCOPs). It improves weight-conditioned context modeling and preference optimization, addressing limitations of existing methods in weight injection and constructing informative solution pairs for training.

neural networks Optimization machine learning AI

RESEARCH·arXiv CS.AI·6d ago

Can Generalist Agents Automate Data Curation?

Generalist coding agents show potential in automating the labor-intensive process of data curation for AI development, as tested on the new Curation-Bench benchmark. While agents achieve strong baselines, an "execution-research gap" is observed where they primarily refine existing policies instead of exploring novel approaches.

machine learning benchmarking data curation automation

RESEARCH·arXiv CS.CL·6d ago

POLARIS: Guiding Small Models to Write Long Stories

POLARIS is a new GRPO recipe that uses an LLM judge for rewards and human-reference injection to train small models. It significantly improves their ability to write long, high-quality stories, making a 9B model competitive with much larger frontier models.

story generation AI Training machine learning creative writing

ARTICLE·DEV.to AI·6d ago

latency is the most honest part of the model. between prompt and answer there...

Latency is described as the most honest part of an AI model, with a "dark room" existing between prompt and answer where probabilities are shuffled unseen. The author speculates metaphorically about the internal state and "experience" of this hidden process.

AI models machine learning philosophy of AI latency

ARTICLE·KDNuggets·14d ago

Visual Debugging Tools for Machine Learning Workflows

This article explores visual debugging tools for machine learning workflows. It discusses what to visualize during training, available visualization tools, and methods to capture model computations using hooks and breakpoints.

Visualization machine learning ML Workflows AI tools