AI Architectures

7 items

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/26/2026

Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]

The author is transitioning from fine-tuning dense 3B/7B transformers to NVIDIA's Nemotron 3 Nano (a hybrid Mamba-Attention-MoE architecture) for multi-task reasoning. They are seeking guidance on how the hybrid architecture impacts standard LoRA fine-tuning, as their prior experience is limited to dense models.

LLMs · multi-task reasoning · AI Architectures · Fine-tuning

ARTICLE · DEV.to AI · 4/11/2026

A Review of Sparse Expert Models in Deep Learning

This article reviews sparse expert models in deep learning, an architecture family used to scale large neural networks efficiently. It covers their applications and impact in advanced AI.

neural networks · deep learning · Sparse Models · AI Architectures

RESEARCH · arXiv CS.CL · 22d ago

Neural Activation Patterns Across Language Model Architectures: A Comprehensive Analysis of Cognitive Task Performance

This paper presents a comprehensive analysis of neural activation patterns across six distinct large language model (LLM) architectures, examining their performance on twelve cognitive task categories. The findings reveal fundamental differences in how encoder and decoder architectures process diverse cognitive tasks, with mathematical reasoning consistently producing the highest attention entropy and decoder models exhibiting significantly higher sparsity.

neural networks · language models · cognitive science · Model Analysis

RESEARCH · arXiv CS.CL · 4/7/2026

LPC-SM: Local Predictive Coding and Sparse Memory for Long-Context Language Modeling

This paper proposes LPC-SM, a hybrid autoregressive architecture for long-context language models that separates local attention, persistent memory, predictive correction, and run-time control. The 158M-parameter model is evaluated and shows improvements in LM loss and in stability on long sequences.

neural networks · language models · Long Context · attention mechanisms

ARTICLE · DEV.to AI · 27d ago

Beyond Basic RAG: The Rise of Agentic Retrieval

This article explores the limitations of basic Retrieval-Augmented Generation (RAG), such as context bloat and hallucination persistence. It proposes Agentic RAG as an evolution, where LLMs autonomously orchestrate the information retrieval process, deciding when and how to search for data.

LLMs · RAG · AI Architectures · Agentic AI

RESEARCH · DEV.to AI · 4/27/2026

An Attention Free Transformer

This post introduces the Attention Free Transformer, an architecture that aims to match the capabilities of standard Transformers without relying on self-attention. It likely explores alternative mechanisms for processing contextual information in sequence-to-sequence tasks.

neural networks · deep learning · AI Architectures · Transformers

RESEARCH · arXiv CS.AI · 4/30/2026

Grounding vs. Compositionality: On the Non-Complementarity of Reasoning in Neuro-Symbolic Systems

This work challenges the assumption that compositional reasoning emerges as a byproduct of symbol grounding in neuro-symbolic AI. It introduces the iLTN architecture, demonstrating that models trained solely on a grounding objective fail to generalize, while joint training on perceptual grounding and multi-step reasoning is crucial.

Compositional Generalization · Reasoning · AI Architectures · Symbol Grounding