model efficiency

9 items

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/21/2026

PrismML — Introducing Ternary Bonsai: Top Intelligence at 1.58 Bits

PrismML introduces Ternary Bonsai, which it claims delivers top intelligence at 1.58 bits. The name and bit width point to ternary quantization, where each weight takes one of three values (log2 3 ≈ 1.58 bits), so the post appears to concern model compression.

AI models model efficiency machine learning quantization
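
The 1.58-bit figure matches log2(3), the information content of a weight restricted to three values. Below is a minimal sketch of a generic ternary quantizer, assuming absolute-mean scaling. It is an illustration only, not PrismML's method, and `ternarize` and `dequantize` are hypothetical helper names.

```python
import math

# Bits needed to encode one of three equally likely values {-1, 0, +1}.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585


def ternarize(weights):
    """Map floats to codes in {-1, 0, +1} plus one shared scale.

    Assumption: the scale is the mean absolute weight (one common choice).
    """
    scale = sum(abs(w) for w in weights) / len(weights)
    if scale == 0:
        return [0] * len(weights), 0.0
    # Round to the nearest integer, then clamp into the ternary range.
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate weights from ternary codes."""
    return [c * scale for c in codes]


codes, scale = ternarize([0.9, -0.05, -1.2, 0.3])
approx = dequantize(codes, scale)
```

Small weights collapse to zero and large ones saturate at plus or minus the scale, which is the accuracy cost that any ternary scheme has to recover through training or calibration.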


RESEARCH · arXiv CS.LG · 4/8/2026

Prune-Quantize-Distill: An Ordered Pipeline for Efficient Neural Network Compression

This paper proposes an ordered pipeline (pruning, INT8 quantization, and knowledge distillation) for neural network compression that targets measured inference latency rather than proxy metrics. It finds that INT8 quantization provides the main runtime benefit, while pruning acts as a preconditioner and knowledge distillation recovers accuracy.

Pruning Knowledge Distillation model efficiency Neural Network Compression
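
The ordering in this summary (prune, then quantize to INT8, then distill) can be sketched on a toy weight vector. This is a pure-Python illustration under stated assumptions: magnitude pruning, symmetric per-tensor INT8 quantization, and a temperature-softened KL distillation loss are common choices and may differ from the paper's. All function names are hypothetical.

```python
import math


def magnitude_prune(weights, sparsity):
    """Step 1: zero out the smallest-magnitude fraction of weights."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned


def quantize_int8(weights):
    """Step 2: symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def softmax(logits, temperature=1.0):
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Step 3: KL(teacher || student) on temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


weights = [0.8, -0.02, 0.5, 0.01, -1.27, 0.03]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
loss = distillation_loss([1.0, 0.2], [1.2, 0.1])
```

In a real pipeline the distillation loss would drive fine-tuning of the pruned, quantized student. Here it is only evaluated once to show where it sits in the sequence.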

NEWS · Hugging Face Blog · 21d ago

OlmoEarth v1.1: A more efficient family of models

OlmoEarth v1.1 is a new version of the OlmoEarth model family that emphasizes efficiency, aiming to improve performance and resource utilization.

updates Geospatial AI AI models model efficiency

RESEARCH · arXiv CS.LG · 4/28/2026

AutoCompress: Critical Layer Isolation for Efficient Transformer Compression

AutoCompress is a transformer compression method based on the empirical finding that Layer 0 carries disproportionately high task-critical information. Its Critical Layer Isolation (CLI) architecture achieves 2.47x compression on GPT-2 Medium with 59.5% parameter reduction, significantly outperforming a uniform bottleneck baseline.

AI architecture model efficiency deep learning GPT-2
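
The two figures in this summary agree with each other if compression is measured as original parameter count divided by remaining parameter count. That definition is an assumption here, since the summary does not state one. A quick check, with `compression_ratio` as a hypothetical helper:

```python
def compression_ratio(param_reduction):
    """Original size divided by remaining size, for a fractional parameter reduction."""
    if not 0 <= param_reduction < 1:
        raise ValueError("param_reduction must be in [0, 1)")
    return 1.0 / (1.0 - param_reduction)


# 59.5% of parameters removed leaves 40.5% of the original model.
ratio = compression_ratio(0.595)
print(f"{ratio:.2f}x")
```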

RESEARCH · arXiv CS.LG · 4/6/2026

LiME: Lightweight Mixture of Experts for Efficient Multimodal Multi-task Learning

LiME (Lightweight Mixture of Experts) proposes a new approach to MoE-PEFT that applies lightweight modulation to a single shared PEFT module instead of using separate adapters per expert. This significantly reduces parameter count, introduces zero-parameter routing, and generalizes to any PEFT method, overcoming limitations in scalability and applicability.

multi-task learning model efficiency Deep Learning Architectures Mixture of Experts

RESEARCH · arXiv CS.AI · 5/1/2026

Step-level Optimization for Efficient Computer-use Agents

This research highlights the inefficiency of current computer-use agents, which overuse large multimodal models for every GUI interaction. It argues that tasks are heterogeneous, with routine steps needing less compute, while errors concentrate at high-risk moments like stalls or semantic drift, requiring targeted optimization.

multimodal models model efficiency GUI automation AI agents
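
One way to picture step-level allocation is a router that sends routine steps to a small model and escalates only when a risk signal fires. The sketch below is hypothetical: the `Step` fields, the stall and drift signals, and the thresholds are illustrative assumptions, not the paper's design.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str
    repeated_screens: int   # consecutive steps with an unchanged screen (stall signal)
    goal_similarity: float  # 0..1 proxy for staying on task (low values suggest drift)


def choose_model(step, stall_limit=3, drift_floor=0.5):
    """Return which model tier should handle this step."""
    if step.repeated_screens >= stall_limit:
        return "large"  # the agent looks stuck
    if step.goal_similarity < drift_floor:
        return "large"  # the agent looks off task
    return "small"      # routine step


trajectory = [
    Step("click", repeated_screens=0, goal_similarity=0.9),
    Step("type", repeated_screens=3, goal_similarity=0.8),
    Step("scroll", repeated_screens=0, goal_similarity=0.3),
]
plan = [choose_model(s) for s in trajectory]
```

Only the second and third steps escalate, so the large model is invoked for two of three steps rather than all of them.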

RESEARCH · arXiv CS.LG · 4/24/2026

Forget, Then Recall: Learnable Compression and Selective Unfolding via Gist Sparse Attention

This paper introduces Gist Sparse Attention (GSA), an end-to-end learnable method to scale large language models to long contexts without architectural modifications. GSA compresses context into 'gist tokens' for summary, then selectively restores relevant raw chunks for detailed attention, combining compact global representations with targeted fine-grained access.

neural networks model efficiency attention mechanisms large language models
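
The compress-then-unfold idea can be illustrated with a non-learned stand-in: pool each chunk into one gist vector, score the gists against a query, and restore the raw tokens of only the top-scoring chunks. GSA is described as end-to-end learnable, so the mean pooling, dot-product scoring, and function names below are simplifying assumptions rather than the paper's mechanism.

```python
def mean_vector(vectors):
    """Average a list of equal-length vectors into one gist vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gist_and_unfold(tokens, query, chunk_size, top_k):
    """Compress tokens into per-chunk gists, then unfold the top_k relevant chunks."""
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    gists = [mean_vector(c) for c in chunks]
    # Rank chunks by how well their gist matches the query.
    ranked = sorted(range(len(chunks)), key=lambda i: dot(gists[i], query), reverse=True)
    selected = sorted(ranked[:top_k])  # keep original order among chosen chunks
    unfolded = [tok for i in selected for tok in chunks[i]]
    return gists, selected, unfolded


tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.5, 0.5], [0.4, 0.6]]
gists, selected, unfolded = gist_and_unfold(tokens, query=[0.0, 1.0], chunk_size=2, top_k=1)
```

Attention would then run over the three gist vectors plus the two unfolded raw tokens instead of all six tokens, which is where the savings would come from at long context lengths.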

RESEARCH · arXiv CS.AI · 24d ago

Enhanced and Efficient Reasoning in Large Learning Models

This paper proposes an efficient and principled method to enhance reasoning in Large Language Models, addressing the current lack of trustworthiness in produced content. It involves a preprocessing stage using a Unary Relational Integracode followed by a streamlined machine learning process.

model efficiency machine learning Reasoning data preprocessing

RESEARCH · arXiv CS.LG · 5/7/2026

MP-ISMoE: Mixed-Precision Interactive Side Mixture-of-Experts for Efficient Transfer Learning

This research introduces MP-ISMoE, a Mixed-Precision Interactive Side Mixture-of-Experts framework, to enhance parameter-efficient transfer learning by mitigating memory overhead. It employs a Gaussian Noise Perturbed Iterative Quantization (GNP-IQ) scheme for lower-bit weight quantization, freeing up memory to improve side network learning capacity and performance.

model efficiency learning Transfer Learning quantization