
Transformer Models

7 items

RESEARCH · DEV.to AI · 25d ago

Shared expert pool reduces parameters while maintaining performance

Conventional Mixture-of-Experts designs give each transformer layer its own private set of experts, so the expert parameter count grows linearly with depth. A new approach, UniPool, replaces these per-layer sets with a single, globally shared pool of experts that all routers draw from. This substantially reduces the total expert parameter count while keeping predictive quality comparable.
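The parameter argument above can be made concrete with a small sketch. All sizes below are hypothetical, and the assumption that each layer keeps its own router while selecting from one shared pool is our reading of "all routers draw from" the pool, not UniPool's published implementation.

```python
import random

# Hypothetical sizes, chosen only to make the arithmetic concrete.
NUM_LAYERS = 12
EXPERTS_PER_LAYER = 8       # private experts per layer in a conventional MoE
SHARED_POOL_SIZE = 32       # assumed size of the single global pool
PARAMS_PER_EXPERT = 1_000   # assumed parameter count of one expert


def conventional_expert_params(num_layers, experts_per_layer, params_per_expert):
    # Each layer owns its experts, so the total grows linearly with depth.
    return num_layers * experts_per_layer * params_per_expert


def shared_pool_expert_params(pool_size, params_per_expert):
    # One pool serves every layer, so adding layers adds no expert parameters.
    return pool_size * params_per_expert


def route(router_scores, top_k):
    """Return the indices of the top_k experts in the shared pool for one token.

    router_scores holds one score per pool expert, produced by the current
    layer's own router (assumption: routers stay per-layer, experts do not).
    """
    ranked = sorted(range(len(router_scores)),
                    key=lambda i: router_scores[i], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    print(conventional_expert_params(NUM_LAYERS, EXPERTS_PER_LAYER, PARAMS_PER_EXPERT))  # 96000
    print(shared_pool_expert_params(SHARED_POOL_SIZE, PARAMS_PER_EXPERT))                # 32000
    rng = random.Random(0)
    for layer in range(3):
        # Different layers may pick different experts from the same pool.
        scores = [rng.random() for _ in range(SHARED_POOL_SIZE)]
        print(layer, route(scores, top_k=2))
```

With a shared pool, depth and pool size become independent knobs: the 12-layer model here stores 32 experts instead of 96, and each layer still routes to its own choice of experts.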

RESEARCH · arXiv CS.LG · 20d ago

Simply Stabilizing the Loop via Fully Looped Transformer

Looped Transformers improve model performance by iteratively reusing the same blocks without increasing the parameter count, but training becomes unstable as the number of loop iterations grows. The authors attribute this instability to gradient oscillation and residual explosion, and propose the Fully Looped Transformer, which introduces a Fully Looped Architecture and Attention Injection.
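The basic looping idea can be sketched with a toy weight-tied residual block applied repeatedly. This is a minimal illustration of generic looping only; the block function is hypothetical, and neither the Fully Looped Architecture nor Attention Injection is modeled. The recorded norms simply let you watch the residual stream accumulate across iterations.

```python
import math


def make_shared_block(weight, bias):
    """Build one toy residual block h -> h + tanh(W h + b).

    The same weight and bias are captured once and reused on every loop
    iteration, so looping more does not add parameters.
    """
    def block(h):
        update = [math.tanh(sum(w * x for w, x in zip(row, h)) + b)
                  for row, b in zip(weight, bias)]
        return [x + u for x, u in zip(h, update)]
    return block


def run_looped(block, h, num_loops):
    # Apply the shared block num_loops times, recording the hidden-state L2 norm
    # after each pass so residual growth across iterations can be inspected.
    norms = []
    for _ in range(num_loops):
        h = block(h)
        norms.append(math.sqrt(sum(x * x for x in h)))
    return h, norms


def param_count(weight, bias):
    return sum(len(row) for row in weight) + len(bias)


if __name__ == "__main__":
    W = [[0.5, 0.0], [0.0, 0.5]]
    b = [0.1, 0.1]
    block = make_shared_block(W, b)
    for loops in (1, 4, 16):
        _, norms = run_looped(block, [1.0, -1.0], loops)
        # The parameter count stays at 6 no matter how many loops run.
        print(loops, param_count(W, b), round(norms[-1], 3))
```

Because each pass adds its update on top of the previous hidden state, nothing in plain looping bounds the residual as iterations increase, which is the kind of growth the summary calls residual explosion.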

RESEARCH · arXiv CS.CL · 4/7/2026

Noise Steering for Controlled Text Generation: Improving Diversity and Reading-Level Fidelity in Arabic Educational Story Generation

The paper investigates "noise steering," a technique that injects Gaussian perturbations into Transformer models during inference, to generate Arabic educational stories. The method improves narrative diversity for early-grade reading assessments while preserving quality and reading level.
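A minimal sketch of inference-time noise injection follows, assuming noise is added to a single hidden-state vector before a toy greedy output head. The output head, the injection site, and the scale `sigma` are all hypothetical; the paper's actual layers and noise schedule are not reproduced. The point is that with greedy decoding held fixed, any variation between samples comes only from the injected noise.

```python
import random


def noise_steer(hidden, sigma, rng):
    """Add i.i.d. Gaussian noise N(0, sigma^2) to every hidden-state entry.

    Assumption: noise is added to one layer's hidden states at inference time;
    the paper's actual injection site, layers, and scale are not reproduced.
    """
    return [x + rng.gauss(0.0, sigma) for x in hidden]


def toy_next_token(hidden, vocab_weights):
    # Hypothetical output head: score each token by a dot product, take argmax.
    scores = [sum(w * h for w, h in zip(row, hidden)) for row in vocab_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def sample_tokens(hidden, vocab_weights, sigma, num_samples, seed=0):
    # Decode greedily several times; any variation comes only from the noise.
    rng = random.Random(seed)
    return [toy_next_token(noise_steer(hidden, sigma, rng), vocab_weights)
            for _ in range(num_samples)]


if __name__ == "__main__":
    hidden = [1.0, 0.9]
    vocab = [[1.0, 0.0], [0.0, 1.0]]
    print(sample_tokens(hidden, vocab, sigma=0.0, num_samples=8))  # always token 0
    print(sample_tokens(hidden, vocab, sigma=0.5, num_samples=8))  # a mix becomes possible
```

`sigma` acts as the diversity knob: at zero the output is deterministic, and larger values trade consistency for variety, which is why the summary emphasizes keeping quality and reading level intact.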

RESEARCH · arXiv CS.LG · 11d ago

One Mask to Rule Them All: On Hidden Facts after Editing and How to Find Them

This paper investigates the internal mechanisms of knowledge editing methods such as ROME and MEMIT, revealing that diverse edits share a common functional structure that relies on a specific subset of weights. A binary mask over these edited weights reverses most changes by eliminating over-attention in later layers, showing that this mechanism is necessary for edits to succeed.
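The idea of a binary mask that undoes an edit can be illustrated on a single weight matrix. This sketch assumes a rank-one update of the form W + u v^T (the form ROME applies to one matrix; MEMIT spreads edits across several layers) and uses a hand-picked, hypothetical mask. The paper's procedure for finding the mask and the over-attention mechanism are not modeled.

```python
def rank_one_edit(weight, u, v):
    """Return W + u v^T, the rank-one form of update ROME applies to one matrix."""
    return [[w + ui * vj for w, vj in zip(row, v)] for row, ui in zip(weight, u)]


def apply_revert_mask(original, edited, mask):
    # Where mask is 1, restore the pre-edit value; elsewhere keep the edited value.
    return [[o if m else e for o, e, m in zip(orow, erow, mrow)]
            for orow, erow, mrow in zip(original, edited, mask)]


def restored_fraction(original, edited, masked):
    # Share of entries changed by the edit that the mask returned to their original value.
    changed = [(i, j) for i, row in enumerate(original)
               for j, x in enumerate(row) if edited[i][j] != x]
    if not changed:
        return 1.0
    restored = sum(1 for i, j in changed if masked[i][j] == original[i][j])
    return restored / len(changed)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    edited = rank_one_edit(W, u=[1.0, 0.0], v=[0.0, 2.0])   # [[1.0, 2.0], [0.0, 1.0]]
    mask = [[0, 1], [0, 0]]                                    # hand-picked, hypothetical
    masked = apply_revert_mask(W, edited, mask)
    print(masked, restored_fraction(W, edited, masked))       # [[1.0, 0.0], [0.0, 1.0]] 1.0
```

In a real model the interesting question is whether a single small mask restores the original behaviour across many different edits, which is the shared structure the paper reports.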
