AI Architectures

7 items

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/26/2026

Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]

The author is transitioning from fine-tuning dense 3B/7B transformers to NVIDIA's Nemotron 3 Nano (a hybrid Mamba-Attention-MoE architecture) for multi-task reasoning. They are seeking guidance on how the hybrid architecture impacts standard LoRA fine-tuning, as their prior experience is limited to dense models.

42
RESEARCH · arXiv CS.CL · 22d ago

Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance

This paper presents a comprehensive analysis of neural activation patterns across six distinct large language model (LLM) architectures, examining their performance on twelve cognitive task categories. The findings reveal fundamental differences in how encoder and decoder architectures process diverse cognitive tasks, with mathematical reasoning consistently producing the highest attention entropy and decoder models exhibiting significantly higher sparsity.

29
RESEARCH · arXiv CS.CL · 4/7/2026

LPC-SM: Local Predictive Coding and Sparse Memory for Long-Context Language Modeling

This paper proposes LPC-SM, a hybrid autoregressive architecture for long-context language models that separates local attention, persistent memory, predictive correction, and run-time control. A 158M-parameter model is evaluated and shows improvements in LM loss and in stability on long sequences.

28
RESEARCH · DEV.to AI · 4/27/2026

An Attention Free Transformer

This post introduces the Attention Free Transformer, an architecture that aims to match the capabilities of standard Transformers without relying on the self-attention mechanism. It likely explores alternative mechanisms for processing contextual information in sequence-to-sequence tasks.

27
RESEARCH · arXiv CS.AI · 4/30/2026

Grounding vs. Compositionality: On the Non-Complementarity of Reasoning in Neuro-Symbolic Systems

This work challenges the assumption that compositional reasoning emerges as a byproduct of symbol grounding in neuro-symbolic AI. It introduces the $i$LTN architecture, demonstrating that models trained solely on a grounding objective fail to generalize, while joint training on perceptual grounding and multi-step reasoning is crucial.

27