Transformer

10 items

ARTICLE↑ trendingReddit r/MachineLearning·4/23/2026

Optimizing Transformer model size & inference beyond FP16 + ONNX (pruning/graph opt didn’t help much) [P]

The user is optimizing a Transformer model for size and inference speed, having plateaued after FP16 conversion and ONNX optimization, with pruning yielding limited gains. They are seeking advice on advanced techniques like low-rank factorization, aggressive quantization (INT8/INT4), knowledge distillation, or hardware-specific optimizations to achieve further real-world improvements.

Pruning inference Transformer quantization

ARTICLEDEV.to AI·4/23/2026

Building a Bit-Accurate Fused QKV + RoPE Kernel for Qwen 2.5 in Triton

This article details the creation of a bit-accurate Triton kernel for Qwen 2.5, fusing QKV projection, RoPE, and KV cache write into a single GPU launch. It achieves a 4.5-5x speedup over multiple PyTorch operations while maintaining exact output accuracy, with the post explaining its design and benchmarking.

GPU computing Transformer AI optimization Triton

ARTICLEDEV.to AI·4/10/2026

"Attention Is All You Need": The 2017 paper that changed the world of artificial intelligence, explained without requiring a technical background.

The article explores the importance of the 2017 paper 'Attention Is All You Need', which revolutionized AI by introducing the Transformer architecture, the basis of models such as ChatGPT. It details how this innovation overcame the limitations of recurrent neural networks, allowing computers to understand and generate human language more efficiently.

Attention Is All You Need Transformer ChatGPT NLP

RESEARCHarXiv CS.CL·4/10/2026

Hybrid CNN-Transformer Architecture for Arabic Speech Emotion Recognition

This paper presents an Arabic Speech Emotion Recognition (SER) system based on a hybrid CNN-Transformer architecture. The model combines convolutional layers for spectral feature extraction with Transformer encoders that capture temporal dependencies, achieving 97.8% accuracy and a macro F1-score of 0.98.

CNN deep learning Transformer machine learning

ARTICLEDEV.to AI·4/25/2026

The hidden engine behind the AI Revolution: The Transformer

The true engine behind the AI revolution is not ChatGPT, but the Transformer architecture, introduced by the "Attention Is All You Need" paper. This innovation enabled massive parallel language processing, utilizing GPUs and fundamentally changing how machines understand language.

AI history deep learning Transformer NLP

ARTICLEDEV.to AI·5/7/2026

The Transformer: The Architecture Behind Modern AI

The Transformer architecture, introduced by Vaswani et al. in 2017, marked a pivotal shift in AI from sequential processing to parallel understanding, primarily through its attention mechanism. This innovation allows models to process meaning and context simultaneously, akin to thinking directly in a language rather than translating word by word.

AI architecture Attention Mechanism Transformer machine learning

DOCDEV.to AI·27d ago

Transformer Neural Network Architecture Diagram — A Visual Guide for Engineers

This visual guide explains the Transformer neural network architecture, covering the attention mechanism and encoder-decoder structure. It demonstrates how Transformers surpassed previous RNN models by introducing parallel processing and self-attention, becoming the backbone of modern LLMs like BERT and GPT.

neural networks deep learning learning Transformer

RESEARCHDEV.to AI·4/26/2026

Transformer-Transducer: End-to-End Speech Recognition with Self-Attention

This content describes the Transformer-Transducer model, a novel architecture for end-to-end speech recognition that leverages the self-attention mechanism of Transformers. It focuses on improving the accuracy and efficiency of transcribing spoken language directly into text.

deep learning Transformer Speech Recognition

RESEARCHDEV.to AI·12d ago

Sleep Phase Cuts Transformer Costs by Consolidating Memory

A new research paper introduces a "sleep phase" for language models, consolidating context into fixed-size memory layers. This method significantly reduces quadratic inference costs and enhances performance on long-horizon tasks.

language models inference Transformer memory

RESEARCHYannic Kilcher (YouTube)·11/1/2025

[Paper Analysis] The Free Transformer (and some Variational Autoencoder stuff)

This content provides an in-depth analysis of a paper titled "The Free Transformer" and related concepts concerning Variational Autoencoders. It delves into advanced technical aspects of AI model architectures.

AI models deep learning Transformer Variational Autoencoder
