← heapsort-ai

Transformer Architecture

10 items

RESEARCHarXiv CS.LG·4/20/2026

Sequential KV Cache Compression via Probabilistic Language Tries: Beyond the Per-Vector Shannon Limit

This research introduces sequential KV compression, a novel two-layer architecture for transformer key-value caches that surpasses the per-vector Shannon limit. It leverages the sequential nature of KV cache tokens, using probabilistic prefix deduplication with language tries and predictive delta coding to achieve more efficient compression.

27
RESEARCHarXiv CS.LG·4/20/2026

Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation

This paper presents causal evidence that hallucination in autoregressive language models is an early trajectory commitment governed by asymmetric attractor dynamics. The research shows that factual and hallucinated trajectories diverge at the very first token, and correcting a hallucinated path requires sustained multi-step intervention, whereas corruption needs less effort.

27
RESEARCHarXiv CS.AI·12d ago

The Cognitive Categorical Transformer: Category-Theoretic Inductive Biases for Language Modeling

The Cognitive Categorical Transformer (CCT) is a 306M-parameter architecture that augments a pretrained GPT-2 Small backbone with cognitively grounded components derived from category theory and cognitive science inspirations. It achieved a 12% relative reduction in perplexity on WikiText-103 compared to a fine-tuned GPT-2 Small baseline, with 84% of the architectural improvement attributed to GT-Full simplicial message passing.

27
RESEARCHarXiv CS.AI·4/7/2026

Structural Rigidity and the 57-Token Predictive Window: A Physical Framework for Inference-Layer Governability in Large Language Models

Este artigo introduz uma nova estrutura de governança baseada em energia para LLMs, que conecta a dinâmica de inferência de transformers a modelos de satisfação de restrições, desafiando métodos atuais de segurança de IA. A pesquisa identifica uma janela de pré-comprometimento de 57 tokens em Phi-3-mini-4k-instruct, demonstrando que tais sinais existem, mas são específicos do modelo, tarefa e configuração, e propõe uma taxonomia de comportamento de inferência.

27