Computational Efficiency

10 items

ARTICLE ↑ trending · Reddit r/MachineLearning · 4/22/2026

I built a new category of AI called a Reductive Inference Model (RIM) that answers by elimination instead of generation — AMA [P]

POEM (Process Of Elimination Master) is a novel AI architecture that answers questions by progressively eliminating impossibilities rather than generating possibilities, and it operates independently of LLMs. The author reports 88% accuracy from a model that is 95.5x faster and 100x smaller than TinyLlama 1.1B, which the post presents as evidence of its computational efficiency.

AI architecture · inference · Computational Efficiency · sustainable AI

RESEARCH · arXiv CS.CL · 4/22/2026

Two-dimensional early exit optimisation of LLM inference

This paper introduces a two-dimensional early exit strategy for LLM classification tasks that coordinates layer-wise and sentence-wise exiting. On simpler tasks, the method achieves multiplicative computational savings and speed-ups of 1.4-2.3x over optimal layer-wise early exit, and it is applicable across various state-of-the-art LLMs.

LLMs · Computational Efficiency · Inference Optimization
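
To see why savings along two exit dimensions can multiply, here is a back-of-the-envelope sketch. It assumes cost scales linearly with both the fraction of sentences read and the fraction of layers executed, which is a simplification and not the paper's cost model. The function name and all numbers are hypothetical.

```python
def relative_cost(sentence_fraction, layer_fraction):
    """Cost relative to a full pass, under a linear-in-both-dimensions assumption."""
    return sentence_fraction * layer_fraction

# Hypothetical example: exiting after half the layers, with or without
# also stopping after 60% of the sentences.
layer_only = relative_cost(1.0, 0.5)   # layer-wise early exit alone
both = relative_cost(0.6, 0.5)         # layer-wise and sentence-wise exit combined
speedup_over_layer_only = layer_only / both
print(round(speedup_over_layer_only, 2))
```

Under these made-up fractions the combined exit is about 1.67x cheaper than layer-wise exit alone; the figures reported in the summary come from the paper's own experiments, not from this arithmetic.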

RESEARCH · arXiv CS.LG · 4/6/2026

Not All Denoising Steps Are Equal: Model Scheduling for Faster Masked Diffusion Language Models

This work explores model scheduling to accelerate Masked Diffusion Language Models (MDLMs), replacing the full model with a smaller one at certain denoising steps. The study shows that the early and late steps are more robust to this substitution, allowing a reduction of up to 17% in FLOPs with minimal degradation in generative perplexity.

Diffusion Models · language models · Computational Efficiency · denoising
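
The scheduling idea can be sketched as a per-step choice between two models plus a simple FLOPs tally. This is a minimal illustration under stated assumptions: every step of a given model costs the same, and the step counts and the small-to-full cost ratio are hypothetical values, not figures from the paper.

```python
def build_schedule(num_steps, n_early, n_late):
    """Use the small model on the first n_early and last n_late denoising steps."""
    return [
        "small" if (i < n_early or i >= num_steps - n_late) else "full"
        for i in range(num_steps)
    ]

def flops_saving(schedule, small_cost_ratio):
    """Fraction of FLOPs saved versus running the full model at every step.

    small_cost_ratio is the (assumed) per-step cost of the small model
    relative to the full model.
    """
    used = sum(small_cost_ratio if m == "small" else 1.0 for m in schedule)
    return 1.0 - used / len(schedule)

# Hypothetical setting: 100 steps, 10 early and 10 late steps swapped,
# small model at a quarter of the full model's per-step cost.
schedule = build_schedule(100, 10, 10)
saving = flops_saving(schedule, 0.25)
print(f"{saving:.2%}")
```

With these made-up values, 20 swapped steps at a quarter of the cost save 15% of the FLOPs; how much quality is lost depends on which steps are swapped, which is the question the paper studies.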

RESEARCH · arXiv CS.CL · 4/13/2026

WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models

WAND introduces a framework to adapt pretrained autoregressive text-to-speech (AR-TTS) models for constant computational and memory complexity. It achieves this by separating attention into global and local sliding-window mechanisms, employing curriculum learning, and utilizing knowledge distillation to maintain high-fidelity speech synthesis with significant KV cache memory reduction.

Knowledge Distillation · Autoregressive Text-to-Speech · Attention Mechanism · Computational Efficiency
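
A rough sketch of how combining global and sliding-window attention bounds the number of keys each query sees, and hence the KV cache size. The assumption that the global part is the first `n_global` positions (standing in for conditioning tokens) is illustrative; the summary does not say which tokens WAND treats as global, and the function name is hypothetical.

```python
def attended_positions(query_pos, n_global, window):
    """Causal key positions visible to one query: global prefix plus local window."""
    # Global part: the first n_global positions, clipped to the causal range.
    global_part = range(min(n_global, query_pos + 1))
    # Local part: the most recent `window` positions, excluding the global ones.
    local_start = max(n_global, query_pos - window + 1)
    local_part = range(local_start, query_pos + 1)
    return list(global_part) + list(local_part)

# The visible set never exceeds n_global + window entries, however long
# the generated sequence becomes.
print(len(attended_positions(10, 4, 8)), len(attended_positions(10_000, 4, 8)))
```

Because the count is capped at `n_global + window`, per-step compute and cache memory stay constant with sequence length in this toy setup.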

RESEARCH · arXiv CS.LG · 4/14/2026

Efficient Matrix Implementation for Rotary Position Embedding

This research proposes RoME, a novel and computationally efficient reformulation of Rotary Position Embedding (RoPE), a core component in modern Transformer architectures. By replacing vector-level operations with unified matrix transformations, RoME significantly reduces computational overhead and improves hardware utilization.

Matrix operations · Rotary Position Embedding · NPU optimization · Computational Efficiency
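
For context, the sketch below shows standard RoPE computed two ways: pair by pair on vector elements, and as a single block-diagonal rotation matrix applied to the whole vector. It only illustrates that the two views agree; it is not RoME's specific reformulation, whose details the summary does not give. The interleaved pairing and the base of 10000 follow the common RoPE convention.

```python
import math

def rope_angles(pos, dim, base=10000.0):
    """Rotation angle for each 2D pair at sequence position `pos`."""
    return [pos * base ** (-2 * i / dim) for i in range(dim // 2)]

def rope_pairwise(x, pos):
    """Vector-level view: rotate each (x[2i], x[2i+1]) pair separately."""
    out = []
    for i, a in enumerate(rope_angles(pos, len(x))):
        x0, x1 = x[2 * i], x[2 * i + 1]
        out += [x0 * math.cos(a) - x1 * math.sin(a),
                x0 * math.sin(a) + x1 * math.cos(a)]
    return out

def rope_matrix(pos, dim):
    """Matrix view: block-diagonal matrix of 2x2 rotations."""
    R = [[0.0] * dim for _ in range(dim)]
    for i, a in enumerate(rope_angles(pos, dim)):
        c, s = math.cos(a), math.sin(a)
        R[2 * i][2 * i], R[2 * i][2 * i + 1] = c, -s
        R[2 * i + 1][2 * i], R[2 * i + 1][2 * i + 1] = s, c
    return R

def matvec(R, x):
    return [sum(r * v for r, v in zip(row, x)) for row in R]

x = [1.0, 2.0, 3.0, 4.0]
same = all(abs(a - b) < 1e-12
           for a, b in zip(rope_pairwise(x, 5), matvec(rope_matrix(5, 4), x)))
print(same)
```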

RESEARCH · arXiv CS.LG · 5/5/2026

From Euler to Dormand-Prince: ODE Solvers for Flow Matching Generative Models

This paper systematically benchmarks four classical ODE solvers (Euler, Explicit Midpoint, RK4, Dormand-Prince 5(4)) for Flow Matching generative models, implementing them from scratch in PyTorch. It quantitatively compares their efficiency on tasks ranging from 2D distributions to MNIST, shows that RK4 at 80 function evaluations achieves sample quality comparable to Euler at 200, and observes that the Jacobian eigenvalue spectrum stiffens near t=1.

neural networks · machine learning · Computational Efficiency · ODE Solvers
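
The trade-off between step count and function evaluations can be illustrated on a toy problem. The sketch below integrates the scalar ODE dx/dt = x from t=0 to t=1 with Euler and RK4 while counting evaluations. The ODE is a stand-in for a learned velocity field, so this does not reproduce the paper's benchmark; it only shows how 20 RK4 steps add up to 80 evaluations versus 200 for Euler.

```python
import math

def integrate(f, x0, n_steps, method):
    """Integrate dx/dt = f(x, t) over [0, 1]; return (x(1), function evaluations)."""
    h = 1.0 / n_steps
    x, t, nfe = x0, 0.0, 0
    for _ in range(n_steps):
        if method == "euler":
            x = x + h * f(x, t)
            nfe += 1
        elif method == "rk4":
            k1 = f(x, t)
            k2 = f(x + 0.5 * h * k1, t + 0.5 * h)
            k3 = f(x + 0.5 * h * k2, t + 0.5 * h)
            k4 = f(x + h * k3, t + h)
            x = x + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
            nfe += 4
        else:
            raise ValueError(f"unknown method: {method}")
        t += h
    return x, nfe

velocity = lambda x, t: x          # toy field; exact solution is x0 * e^t
exact = math.e

x_euler, nfe_euler = integrate(velocity, 1.0, 200, "euler")
x_rk4, nfe_rk4 = integrate(velocity, 1.0, 20, "rk4")
print(nfe_euler, abs(x_euler - exact))
print(nfe_rk4, abs(x_rk4 - exact))
```

On this smooth toy problem RK4 is more accurate with fewer evaluations; for learned velocity fields the gap depends on the model and data.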

RESEARCH · arXiv CS.LG · 4/27/2026

LTBs-KAN: Linear-Time B-splines Kolmogorov-Arnold Networks

LTBs-KAN is a novel neural network architecture designed to overcome the computational slowness of traditional KANs by offering linear complexity and reduced parameters. Experiments demonstrate significant improvements in computational efficiency and parameter reduction on common datasets like MNIST, Fashion-MNIST, and CIFAR-10.

neural networks · B-splines · deep learning · Computational Efficiency

RESEARCH · arXiv CS.LG · 29d ago

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

The Toeplitz MLP Mixer (TMM) is a new transformer-like architecture that replaces attention with triangular-masked Toeplitz matrix multiplication, significantly reducing computational complexity to O(dn log n) time and O(dn) space. TMMs demonstrate superior training efficiency and better input information retention compared to traditional transformers, despite their simpler design.

neural networks · AI architecture · Computational Efficiency · sequence models
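
A lower-triangular (causal) Toeplitz matrix-vector product is a causal convolution, which is why it can be computed in O(n log n) per feature channel with an FFT rather than O(n^2) directly. The sketch below checks this equivalence in pure Python for a single channel. Using an FFT is a standard technique and an assumption about how the stated complexity is reached; the function names are hypothetical and this is not the authors' implementation.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; the inverse is left unnormalized."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def toeplitz_causal_direct(w, x):
    """y[i] = sum_{j <= i} w[i - j] * x[j], the O(n^2) reference."""
    return [sum(w[i - j] * x[j] for j in range(i + 1)) for i in range(len(x))]

def toeplitz_causal_fft(w, x):
    """Same product via zero-padded FFT convolution, O(n log n)."""
    n = len(x)
    size = 1
    while size < 2 * n:          # pad so circular wrap-around cannot occur
        size *= 2
    fw = fft(list(w) + [0.0] * (size - n))
    fx = fft(list(x) + [0.0] * (size - n))
    y = fft([a * b for a, b in zip(fw, fx)], invert=True)
    return [(v / size).real for v in y[:n]]

w = [1.0, 0.5, 0.25, 0.125]      # first column of the Toeplitz matrix
x = [1.0, 2.0, 3.0, 4.0]
y_direct = toeplitz_causal_direct(w, x)
y_fft = toeplitz_causal_fft(w, x)
print(y_direct)
```

Applying the same routine independently to each of d feature channels gives the O(dn log n) time quoted in the summary.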

RESEARCH · arXiv CS.AI · 21d ago

TTE-Flash: Accelerating Reasoning-based Multimodal Representations via Think-Then-Embed Tokens

This work proposes TTE-Flash, a method to accelerate reasoning-based multimodal representations by replacing explicit Chain-of-Thought (CoT) with latent think tokens. It aims to achieve high-performance, reasoning-aware representations at a constant inference cost.

neural networks · multimodal AI · machine learning · Computational Efficiency

RESEARCH · arXiv CS.AI · 21d ago

PRISMat: Policy-Driven, Permutation-Invariant Autoregressive Material Generation

This paper introduces PRISMat, a cost-effective, permutation-invariant model designed for the rapid identification of candidate materials. It addresses the inefficiencies of large language models in material generation by offering a faster and cheaper alternative for material filtering.

Materials Science · AI models · machine learning · Computational Efficiency