Transformer Architecture

10 items

RESEARCH · arXiv CS.AI · 4/16/2026

Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models

This paper rigorously analyzes how numerical instability from finite precision leads to unpredictability in LLMs, a critical reliability issue in agentic workflows. It details rounding error propagation, identifying a chaotic "avalanche effect" in early layers and universal, scale-dependent chaotic behaviors.

Transformer Architecture LLMs chaos theory AI reliability
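
The finite-precision effect that the summary above refers to can be seen in miniature with plain floating-point addition. The sketch below is only an illustration of the general phenomenon, not the paper's method: the `pick_token` helper, the token labels, and the rival score of 0.6 are hypothetical. It shows that summing the same three numbers in a different order gives a different result, and that this last-bit difference can flip a discrete decision.

```python
def sum_left_to_right(values):
    """Accumulate in the given order."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_right_to_left(values):
    """Accumulate in reverse order; mathematically the same sum."""
    total = 0.0
    for v in reversed(values):
        total += v
    return total


def pick_token(score, rival_score=0.6):
    """Hypothetical greedy choice between two candidate tokens."""
    return "A" if score > rival_score else "B"


values = [0.1, 0.2, 0.3]
a = sum_left_to_right(values)   # (0.1 + 0.2) + 0.3
b = sum_right_to_left(values)   # (0.3 + 0.2) + 0.1

# Floating-point addition is not associative, so the two sums differ
# in the last bit even though the inputs are identical.
print(a == b, a, b)

# A last-bit difference is enough to change a thresholded decision.
print(pick_token(a), pick_token(b))
```

In a real model the reordering comes from things like parallel reduction order or kernel choice rather than an explicit `reversed` call, and whether the perturbation grows or dies out across layers is the question the paper studies.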

ARTICLE · DEV.to AI · 18d ago

Understanding Transformer Architecture in 2026 (SilentRecon Deep Dive)

The "SilentRecon Deep Dive" article explores the Transformer architecture, explaining how it surpassed RNNs and LSTMs by combining attention with parallel processing of whole sequences. According to the article, this brought scalability, faster training, deeper contextual understanding, and real-time inference, making Transformers the default intelligence layer for cybersecurity and automation.

Transformer Architecture cybersecurity deep learning learning

RESEARCH · arXiv CS.LG · 4/20/2026

Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit

This research introduces sequential KV compression, a novel two-layer architecture for transformer key-value caches that surpasses the per-vector Shannon limit. It leverages the sequential nature of KV cache tokens, using probabilistic prefix deduplication with language tries and predictive delta coding to achieve more efficient compression.

Transformer Architecture AI models LLMs data compression

RESEARCH · arXiv CS.LG · 4/20/2026

Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation

This paper presents causal evidence that hallucination in autoregressive language models is an early trajectory commitment governed by asymmetric attractor dynamics. The research shows that factual and hallucinated trajectories diverge at the very first token, and that correcting a hallucinated path requires sustained multi-step intervention, whereas corrupting a factual path takes less effort.

Transformer Architecture LLMs hallucination model dynamics

ARTICLE · DEV.to AI · 4/24/2026

Layer Normalization — Deep Dive + Problem: Largest Connected Region

This article is a deep dive into Layer Normalization, a core component of the Transformer architecture introduced in the "Attention Is All You Need" paper. It explains why the technique matters for stabilizing training and improving the performance of Large Language Models (LLMs).

Transformer Architecture LLMs deep learning NLP
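
For readers who want the operation from the Layer Normalization item above in concrete form, here is a minimal pure-Python sketch of the standard formula applied to one feature vector. It is not taken from the linked article; the function name `layer_norm`, the default `eps` of 1e-5, and the identity defaults for `gamma` and `beta` are assumptions made for illustration.

```python
import math


def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one feature vector to zero mean and (near) unit variance,
    then apply an optional per-feature scale (gamma) and shift (beta)."""
    n = len(x)
    mean = sum(x) / n
    # Biased (population) variance over the feature dimension.
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)

    # Assumed defaults: identity scale and zero shift.
    if gamma is None:
        gamma = [1.0] * n
    if beta is None:
        beta = [0.0] * n

    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


# Usage: each token's hidden vector is normalized independently of the batch.
hidden = [1.0, 2.0, 3.0, 4.0]
normalized = layer_norm(hidden)
print(normalized)
```

Because the statistics are computed per vector rather than per batch, the result does not depend on batch size or on the other sequences in the batch.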

ARTICLE · DEV.to AI · 22d ago

How Gemma 4's Per-Layer Embeddings Actually Work — And Why E2B Punches Above 2B

This article explains Per-Layer Embeddings (PLE), a mechanism in Gemma 4 E2B that enables it to outperform larger models despite its 2B parameter count. It delves into the exact mechanism, comparing E2B's benchmarks and discussing PLE's impact on LLM understanding, quantization, and deployment.

Transformer Architecture Gemma 4 E2B Per-Layer Embeddings

ARTICLE · DEV.to AI · 4/8/2026

Gemma 4: Byte for byte, the most capable open models

The Gemma 4 model, announced by DeepMind, represents a significant milestone in open-source LLMs. It employs a transformer-based architecture with 7 billion parameters and an efficient design that uses hierarchical self-attention mechanisms to optimize its capability.

Transformer Architecture LLMs DeepMind Gemma 4

RESEARCH · arXiv CS.AI · 12d ago

The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with components inspired by category theory and cognitive science. It achieved a 12% relative reduction in perplexity on WikiText-103 compared to a fine-tuned GPT-2 Small baseline, with 84% of the architectural improvement attributed to GT-Full simplicial message passing.

Transformer Architecture cognitive science GPT-2 Category Theory

RESEARCH · arXiv CS.AI · 4/7/2026

Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Layer Governability in Large Language Models

This paper introduces a new energy-based governance framework for LLMs that connects transformer inference dynamics to constraint-satisfaction models, challenging current AI safety methods. The research identifies a 57-token pre-commitment window in Phi-3-mini-4k-instruct, demonstrating that such signals exist but are specific to the model, task, and configuration, and proposes a taxonomy of inference behavior.

Transformer Architecture Inference Dynamics energy-based models Pre-commitment Signals

ARTICLE · DEV.to AI · 4/15/2026

Gemma 4: Byte for byte, the most capable open models

Gemma 4 is described as a highly capable, parameter-efficient open language model that achieves state-of-the-art performance. It uses a transformer architecture with innovations such as sparse attention and FFN optimizations to reduce computational cost and increase inference speed.

Parameter efficiency Transformer Architecture Gemma 4 sparse attention