Transformer Models

7 items

ARTICLE · Trending · Reddit r/LocalLLaMA · 4/15/2026

[P] Built GPT-2, Llama 3, and DeepSeek from scratch in PyTorch - open source code + book

A newly released book and its accompanying open-source code show how to build modern LLM architectures such as GPT-2, Llama 3, and DeepSeek from scratch in PyTorch. The material explains the architectural changes needed to turn GPT-2 into Llama 3 and implements DeepSeek's advanced features.

Open Source deep learning Transformer Models PyTorch
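
The summary above does not list the individual changes between GPT-2 and Llama 3. One widely documented difference is the normalization layer: GPT-2 uses LayerNorm, while Llama models use RMSNorm. The sketch below contrasts the two in plain Python. It is an illustration under simplifying assumptions (no learned gain or bias, a single vector instead of a batch), not code from the book.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm without learned gain/bias: center, then scale by std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-5):
    """RMSNorm without learned gain: scale by root-mean-square, no centering."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


if __name__ == "__main__":
    hidden = [1.0, 2.0, 3.0]  # hypothetical activation vector
    print("LayerNorm:", layer_norm(hidden))
    print("RMSNorm:  ", rms_norm(hidden))
```

RMSNorm skips the mean subtraction, so the output keeps the sign pattern of the input and needs one reduction fewer per vector.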

RESEARCH · DEV.to AI · 25d ago

Shared expert pool reduces parameters while maintaining performance

Conventional Mixture-of-Experts designs increase parameters linearly with depth by assigning private expert sets to each transformer layer. A new approach, UniPool, replaces this with a single, globally shared pool of experts that all routers draw from, significantly reducing the total expert parameter count while maintaining comparable predictive quality.

Parameter efficiency Deep learning architecture AI optimization Mixture of Experts
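
The parameter argument in the summary above can be checked with back-of-the-envelope arithmetic. The sketch below counts expert weights only (routers, attention, and biases are ignored), and every size in it, including the pool size, is a hypothetical value chosen for illustration rather than a figure from the UniPool work.

```python
def expert_params(d_model: int, d_ff: int) -> int:
    """Weights in one feed-forward expert: up- and down-projection, no biases."""
    return 2 * d_model * d_ff


def per_layer_total(num_layers: int, experts_per_layer: int,
                    d_model: int, d_ff: int) -> int:
    """Conventional MoE: each layer owns a private expert set."""
    return num_layers * experts_per_layer * expert_params(d_model, d_ff)


def shared_pool_total(pool_size: int, d_model: int, d_ff: int) -> int:
    """Shared pool: one global expert set that every layer's router draws from."""
    return pool_size * expert_params(d_model, d_ff)


# Hypothetical configuration: 12 layers, 8 private experts each, vs. a pool of 32.
private = per_layer_total(num_layers=12, experts_per_layer=8, d_model=512, d_ff=2048)
shared = shared_pool_total(pool_size=32, d_model=512, d_ff=2048)

if __name__ == "__main__":
    print(private, shared, private / shared)
```

With these assumed sizes the private design holds 201,326,592 expert weights and the shared pool 67,108,864, a 3x reduction. The private count grows with `num_layers`, while the shared count does not depend on depth at all.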

RESEARCH · arXiv CS.LG · 20d ago

Simply Stabilizing the Loop via Fully Looped Transformer

Looped Transformers improve model performance by iteratively reusing blocks without increasing the parameter count, but training becomes unstable at higher loop counts. The paper attributes this instability to gradient oscillation and residual explosion, and proposes the Fully Looped Transformer, which introduces a Fully Looped Architecture and Attention Injection.

neural networks AI architecture deep learning model training
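
The basic looping idea from the summary above, reusing one block several times with a residual connection, can be sketched in a few lines. The `block` function is a toy stand-in (an elementwise tanh with a single hypothetical weight), not a real transformer block, and the sketch shows only the plain looped baseline, not the Fully Looped Architecture or Attention Injection.

```python
import math


def block(x, weight):
    """Toy stand-in for a transformer block, parameterized by one shared weight."""
    return [math.tanh(weight * v) for v in x]


def looped_forward(x, weight, num_loops):
    """Apply the same block num_loops times, adding a residual each time."""
    x = list(x)
    for _ in range(num_loops):
        update = block(x, weight)
        x = [a + b for a, b in zip(x, update)]  # residual connection
    return x


def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))


if __name__ == "__main__":
    x0 = [1.0, -0.5]  # hypothetical input activations
    for loops in (1, 2, 4, 8):
        print(loops, l2_norm(looped_forward(x0, 0.5, loops)))
```

The parameter count (one `weight`) stays fixed regardless of `num_loops`. In this toy the residual stream's norm grows with every extra loop, which loosely illustrates why unnormalized residual accumulation can become a problem at high loop counts.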

RESEARCH · DEV.to AI · 5/2/2026

Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving

This research introduces a Temporal-Channel Transformer designed for 3D Lidar-based video object detection. It aims to improve the perception capabilities of autonomous driving systems by processing sequential Lidar data.

object detection computer vision autonomous driving LiDAR

RESEARCH · arXiv CS.CL · 4/7/2026

Noise Steering for Controlled Text Generation: Improving Diversity and Reading-Level Fidelity in Arabic Educational Story Generation

The paper investigates "noise steering," a technique that injects Gaussian perturbations into Transformer models during inference, to generate educational stories in Arabic. The method improves narrative diversity for early-grade reading assessments while preserving quality and reading level.

Noise Steering NLP Diversity text generation Transformer Models
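
The core operation named in the summary above, adding Gaussian perturbations at inference time, can be sketched as follows. Where the noise is injected (which layer or activation) and the value of `sigma` are assumptions made here for illustration, since the summary does not specify them, and the hidden vector is a made-up example.

```python
import random


def add_gaussian_noise(hidden, sigma, rng):
    """Return a copy of `hidden` with zero-mean Gaussian noise of std `sigma` added."""
    return [h + rng.gauss(0.0, sigma) for h in hidden]


if __name__ == "__main__":
    hidden = [0.2, -1.0, 0.7]  # hypothetical hidden-state vector
    # Different seeds give different perturbations, hence different continuations.
    for seed in (0, 1, 2):
        print(seed, add_gaussian_noise(hidden, sigma=0.1, rng=random.Random(seed)))
```

Setting `sigma` to 0 recovers the unperturbed activations, so the noise scale acts as a single knob that trades diversity against fidelity to the original output.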

RESEARCH · arXiv CS.LG · 20d ago

Robust Basis Spline Decoupling for the Compression of Transformer Models

This work introduces a B-spline-based decoupling framework for compressing Transformer models. It generalizes existing tensor-based methods and uses the properties of B-splines to address their numerical instability or limited expressiveness.

neural networks machine learning AI compression

RESEARCH · arXiv CS.LG · 11d ago

One Mask to Rule Them All: On Hidden Facts after Editing and How to Find Them

This paper investigates the internal mechanisms of knowledge editing methods such as ROME and MEMIT, revealing that diverse edits share a common functional structure reliant on a specific subset of weights. A binary mask over these edited weights reverses most changes by eliminating overattention in later layers, demonstrating this mechanism's necessity for successful edits.

AI models MLP Weights machine learning Transformer Models