Long Context

6 items

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/11/2026

Gemma 4 31B vs Qwen 3.5 27B: Which is best for long context workflows? My THOUGHTS...

The article compares Gemma 4 31B and Qwen 3.5 27B, which the author identifies as the best models for local use on 24GB GPUs. The author favors Qwen 3.5 27B for its stronger reasoning and for analyzing long contexts without hallucinating, and sees it as a significant step forward.

GPU · Gemma 4 31B · Long Context · Qwen 3.5 27B

RESEARCH · arXiv CS.CL · 4/7/2026

LPC-SM: Local Predictive Coding and Sparse Memory for Long-Context Language Modeling

This paper proposes LPC-SM, a hybrid autoregressive architecture for long-context language models. It separates local attention, persistent memory, predictive correction, and run-time control. The authors evaluate a 158M-parameter model and report improvements in LM loss and in stability on long sequences.

neural networks · language models · Long Context · attention mechanisms
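The summary does not say how LPC-SM implements its local-attention path. The sketch below shows what "local attention" usually means, as a generic illustration and an assumption rather than the paper's design. It builds a causal sliding-window mask. The function name and the `window` size are hypothetical.

```python
def local_attention_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal sliding window: each position sees itself and the previous
    `window - 1` positions, and never a future position.
    """
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


mask = local_attention_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Each row allows at most `window` keys. Attention cost therefore grows roughly with seq_len × window instead of seq_len². Long-context designs often pair this kind of local path with a separate memory for distant tokens.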

ARTICLE · DEV.to AI · 26d ago

The Death of RAG? Long-Context Windows vs. Vector Databases

The article discusses whether Retrieval-Augmented Generation (RAG) is becoming obsolete due to the rise of large context windows in new LLMs. It argues that RAG remains relevant, primarily due to its cost-effectiveness, lower latency, and efficiency in handling frequently updated proprietary data.

AI architecture · LLMs · Vector Databases · RAG
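A back-of-the-envelope token count makes the cost argument concrete. It compares the input tokens per query when the whole corpus goes into a long context window against the input tokens when only top-k retrieved chunks are sent. Every number below is hypothetical and chosen only for illustration. The sketch also ignores prompt/prefix caching, which some providers offer and which can lower the cost of re-sending an unchanged long context.

```python
# Input-token comparison per query; all workload numbers are hypothetical.
CORPUS_TOKENS = 500_000   # whole knowledge base placed in the prompt (long-context approach)
CHUNK_TOKENS = 500        # tokens per retrieved chunk (RAG approach)
TOP_K = 8                 # chunks retrieved per query
QUESTION_TOKENS = 50      # the user's question itself


def input_tokens(context_tokens: int, question_tokens: int = QUESTION_TOKENS) -> int:
    """Input tokens sent for one query: context plus question."""
    return context_tokens + question_tokens


long_context = input_tokens(CORPUS_TOKENS)
rag = input_tokens(TOP_K * CHUNK_TOKENS)
print(f"long context: {long_context} tokens/query")
print(f"RAG:          {rag} tokens/query")
print(f"ratio:        {long_context / rag:.1f}x")
```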

RESEARCH · arXiv CS.CL · 4/15/2026

LoSA: Locality Aware Sparse Attention for Block-Wise Diffusion Language Models

LoSA introduces Locality Aware Sparse Attention to address memory-bound attention and the KV Inflation problem in block-wise diffusion language models, especially for long contexts. It optimizes performance by reusing cached attention for stable tokens and applying sparse attention only to active tokens, significantly reducing KV index loading.

Memory Optimization · Long Context · KV Inflation · sparse attention
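The summary names LoSA's two mechanisms but does not give their details. Below is a toy, pure-Python sketch of the general idea for one denoising step:

- Tokens marked stable reuse an attention output cached from an earlier step.
- Active tokens attend only to a subset of keys.

The summary does not describe how LoSA picks that subset in a locality-aware way. The sketch therefore substitutes plain top-k selection by dot-product score. The function names, the `active` set, and `top_k` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax_attend(query, keys, values, indices):
    """Scaled dot-product attention of one query over the rows listed in `indices`."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, keys[j]) * scale for j in indices]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [
        sum(w * values[j][d] for w, j in zip(weights, indices)) / total
        for d in range(len(values[0]))
    ]


def step_attention(queries, keys, values, active, cache, top_k):
    """One denoising step (hypothetical API).

    Stable tokens (not in `active`) reuse their cached output; active tokens
    attend only to their top_k highest-scoring keys, and the result is cached.
    """
    outputs = []
    for i, q in enumerate(queries):
        if i not in active and i in cache:
            outputs.append(cache[i])  # stable: reuse output from an earlier step
            continue
        # Stand-in for LoSA's locality-aware selection: plain top-k by score.
        ranked = sorted(range(len(keys)), key=lambda j: dot(q, keys[j]), reverse=True)
        cache[i] = softmax_attend(q, keys, values, ranked[:top_k])
        outputs.append(cache[i])
    return outputs


keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = [[1.0, 0.0], [0.0, 2.0], [2.0, 0.0]]
cache = {0: [9.0, 9.0]}  # pretend token 0 settled in an earlier step
out = step_attention(queries, keys, values, active={1, 2}, cache=cache, top_k=1)
print(out)
```

Work is saved in two places. Only active rows are recomputed, and each of them aggregates `top_k` value rows instead of all of them. The ranking here still scores every key, which an optimized kernel would try to avoid.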

ARTICLE · DEV.to AI · 4/15/2026

We Gave an AI Agent a Long Context Caching Idea. Here's what happened next!

The article describes an experiment in which the KV cache of an LLM (Qwen3.5-35B-A3B, with a 1M-token context) serves as a "document store". The documents are prefilled once, and the resulting cache is persisted and reused to answer queries, eliminating embeddings and vector databases. NEO, an AI engineering agent, implemented this Cache-Augmented Generation system autonomously in just 30 minutes.

AI agent · Long Context · Caching · KV cache
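The summary does not reproduce the post's own implementation. The toy sketch below shows the cache-augmented pattern with a single attention head in pure Python:

- The document's keys and values are computed once and written to disk.
- For each query, the cache is loaded back and the query attends over every cached position.
- No embedding model or vector index is involved.

The 2x2 projection matrices, the JSON file format, and all function names are hypothetical. A real system would persist the per-layer KV tensors through its inference engine instead.

```python
import json
import math
import os
import tempfile

# Hypothetical key/value projection matrices for a single toy attention head.
W_K = [[1.0, 0.0], [0.0, 1.0]]
W_V = [[0.5, 0.5], [0.0, 1.0]]


def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def prefill(token_vectors):
    """Expensive pass, run once: compute keys and values for every document token."""
    return {
        "keys": [matvec(W_K, x) for x in token_vectors],
        "values": [matvec(W_V, x) for x in token_vectors],
    }


def save_cache(cache, path):
    with open(path, "w") as f:
        json.dump(cache, f)


def load_cache(path):
    with open(path) as f:
        return json.load(f)


def answer(query, cache):
    """Attend over every cached position; there is no retrieval step."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) * scale for key in cache["keys"]]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(cache["values"][0])
    return [sum(w * v[d] for w, v in zip(weights, cache["values"])) / total for d in range(dim)]


document = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # stand-in token embeddings
path = os.path.join(tempfile.mkdtemp(), "doc_kv_cache.json")
save_cache(prefill(document), path)  # done once, ahead of time
cache = load_cache(path)             # reused for every later query
print(answer([1.0, 0.0], cache))
```

The trade-off is that the cache grows with the document and must be rebuilt whenever the documents change. A RAG index, by contrast, can be updated chunk by chunk.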

RESEARCH · Together AI Blog · 3/26/2026

Plan, divide, and conquer: How weak models excel at long context tasks

The post describes a "Divide & Conquer" framework that lets smaller language models outperform larger ones such as GPT-4o on long-context tasks. It counters the performance degradation LLMs show as context windows grow by splitting documents into chunks that are processed in parallel.

model performance · LLMs · Llama 3 · Long Context
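The summary names the strategy but not its internals. The sketch below shows its general shape under stated assumptions:

- The document is split into fixed-size word chunks.
- A hypothetical `worker`, standing in for a small-model call, handles each chunk in parallel via `ThreadPoolExecutor`.
- A hypothetical `manager` merges the partial answers.

The title's "plan" step is reduced here to a fixed task string.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, chunk_words):
    """Split text into consecutive chunks of at most `chunk_words` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]


def worker(chunk, task):
    """Hypothetical stand-in for a small-model call on one chunk.

    Here the 'task' is just counting occurrences of a keyword.
    """
    return sum(1 for w in chunk.split() if w.lower().strip(".,") == task)


def manager(partial_results):
    """Merge per-chunk answers; a real system might use another model call."""
    return sum(partial_results)


def divide_and_conquer(text, task, chunk_words=4, max_workers=4):
    chunks = split_into_chunks(text, chunk_words)
    # Threads suit this because real worker calls would be I/O-bound API requests.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda c: worker(c, task), chunks))
    return manager(partials)


text = "The cache grows. The model reads the cache. Then the cache is saved."
print(divide_and_conquer(text, "cache"))
```

Counting is used only because its partial results merge exactly. For real questions, how the manager combines per-chunk answers is the hard part.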