Machine Learning Theory

4 items

RESEARCH · arXiv CS.LG · 5d ago

Pseudospectral Bounds for Transient Amplification in Coupled Gradient Descent

The paper develops a sharp pseudospectral theory for coupled gradient descent systems with block-triangular Jacobians, relevant to bilevel optimization and adversarial training. It provides bounds for transient amplification and characterizes critical coupling thresholds for spectral instability.

Gradient Descent · Optimization · Numerical Analysis · Machine Learning Theory

RESEARCH · arXiv CS.LG · 5/8/2026

Are Flat Minima an Illusion?

This paper challenges the conventional view that flat minima inherently lead to better generalization, showing that function-preserving reparameterization can drastically alter a minimum's perceived sharpness. It introduces "weakness", a reparameterization-invariant measure based on what the network does, as the actual driver of generalization, proving its minimax optimality and correlation with PAC-Bayes bounds.

neural networks · Optimization · Generalization · Machine Learning Theory

RESEARCH · arXiv CS.AI · 28d ago

On Distinguishing Capability Elicitation from Capability Creation in Post-Training: A Free-Energy Perspective

This research proposes distinguishing between capability elicitation and capability creation in large language model post-training. It argues that elicitation reweights existing behaviors within a model's accessible support, while creation changes that support itself, developing this through a free-energy view.

LLMs · AI capabilities · Machine Learning Theory · learning

ARTICLE · DEV.to AI · 4/15/2026

Notes on Kullback-Leibler Divergence and Likelihood

These notes explore the Kullback-Leibler divergence and its relationship to likelihood. They cover fundamental principles of information theory and statistical inference relevant to AI.

information theory · Likelihood · Machine Learning Theory · Kullback-Leibler Divergence