Neural Network Compression

2 items

RESEARCH · arXiv CS.LG · 4/8/2026

Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression

This paper proposes an ordered pipeline (pruning, INT8 quantization, and knowledge distillation) for neural network compression that targets measured inference latency rather than proxy metrics. The study finds that INT8 quantization delivers the main runtime benefit, while pruning acts as a preconditioner and knowledge distillation recovers accuracy.

Pruning · Knowledge Distillation · model efficiency · Neural Network Compression
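
The prune, quantize, distill ordering in the summary above can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the magnitude-based pruning criterion, the symmetric per-tensor INT8 scheme, and the temperature-scaled KL loss are all assumptions chosen as common textbook forms of each stage, and the sketch does not measure latency.

```python
import math

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights (assumed criterion)."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])  # indices of the k smallest-magnitude weights
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization; returns (integer values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map INT8 values back to approximate floating-point weights."""
    return [v * scale for v in q]

def softmax(logits, temperature=1.0):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Stage 1: prune, Stage 2: quantize (hypothetical toy weights)
weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

# Stage 3: the loss a distillation step would minimize to recover accuracy
loss = distillation_loss(student_logits=[1.0, 2.0, 2.5], teacher_logits=[1.0, 2.0, 3.0])
```

In a real pipeline the third stage would be a training loop that minimizes this loss over the compressed student; only the loss term is shown here.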

RESEARCH · arXiv CS.LG · 7d ago

Automatically Differentiable Nonlinear Tensor Networks (ADNTNs) for Exponential Compression of Deep Neural Networks

This paper introduces Automatically Differentiable Nonlinear Tensor Networks (ADNTNs), a family of structured weight generators for exponential compression of Deep Neural Networks. It extends low-rank adaptation and tensor factorization by building large weight tensors through a hierarchy of small cores and nonlinear activations.

deep learning · Automatic Differentiation · machine learning · Neural Network Compression
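
To make the idea of a structured weight generator concrete, here is a toy sketch of how a few small cores can generate a much larger weight matrix. The summary does not specify the ADNTN construction, so everything here is an assumption for illustration: the Kronecker-product combination, the `tanh` nonlinearity between levels, and the names `kron` and `generate_weight` are hypothetical and are not taken from the paper.

```python
import math

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]

def generate_weight(cores, activation=math.tanh):
    """Build a large matrix from small cores, one hierarchy level per core.

    Assumed scheme: combine the running matrix with the next core via a
    Kronecker product, then apply an elementwise nonlinearity.
    """
    w = cores[0]
    for core in cores[1:]:
        w = [[activation(v) for v in row] for row in kron(w, core)]
    return w

# Three hypothetical 2x2 cores: 12 stored parameters in total
cores = [
    [[0.5, -0.2], [0.1, 0.9]],
    [[1.0, 0.3], [-0.4, 0.7]],
    [[0.2, -0.8], [0.6, 0.4]],
]
weight = generate_weight(cores)

n_params = sum(len(row) for core in cores for row in core)
n_entries = sum(len(row) for row in weight)
```

With n cores of size 2x2, this scheme stores 4n parameters but generates 4^n weight entries, which is the sense in which such a generator compresses exponentially. In an automatic differentiation framework the same composition would let gradients flow back to the cores; that part is not shown in this plain-Python sketch.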