LLM

609 items

DOC · ↑ trending · Reddit r/LocalLLaMA · 4/21/2026

ibm-granite/granite-4.1-8b · Hugging Face

Granite-4.1-8B is an 8B parameter long-context instruct model from IBM, enhanced through finetuning and alignment for advanced tool calling, instruction following, and chat capabilities. It supports multiple languages and was released in April 2026 under the Apache 2.0 license.

NLP Natural Language Processing AI Model Large Language Model

ARTICLE · DEV.to AI · 20h ago

Use Claude long enough and you'll end up with Karpathy's LLM Wiki without doing much.

Consistent use of Claude allows it to build up a working memory, which manifests as a pile of plain markdown files. This effectively creates a personal "LLM Wiki," where the model remembers user decisions and preferences without requiring re-explanation.

Claude knowledge management personal wiki AI memory

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/15/2026

[P] Built GPT-2, Llama 3, and DeepSeek from scratch in PyTorch - open source code + book

A new book and accompanying open-source code show how to build modern LLM architectures such as GPT-2, Llama 3, and DeepSeek from scratch in PyTorch. The book explains the architectural changes required to turn GPT-2 into Llama 3 and implements DeepSeek's advanced features.

Open Source deep learning Transformer Models PyTorch

DOC · Google for Developers (YouTube) · 21h ago

Gemma Playground: Robot Duck

A look at the Gemma Playground, using a 'Robot Duck' as the example application to demonstrate the Gemma model's capabilities in a practical scenario.

Gemma AI robotics LLM

RESEARCH · arXiv CS.LG · 21h ago

Enabling KV Caching of Shared Prefix for Diffusion Language Models

The paper introduces "bicache", which it describes as the first KV caching technique for shared prefixes in diffusion language models (DLMs). Existing LLM caching methods fail for DLMs because of their bidirectional attention. The approach aims to unlock high-throughput DLM serving by exploiting the observation that shared-prefix KVs are stable in shallow layers.

Diffusion Models KV Caching Performance optimization High-throughput serving

RESEARCH · arXiv CS.AI · 21h ago

Automatic Extraction of Structured Information from Brain MRI Reports Using an Open-Weight Large Language Model

This research paper explores the automatic extraction of data from brain MRI reports using the open-weight large language model LLaMA 3.1. It evaluates the LLM's performance in analyzing Dutch neuroradiology reports, demonstrating high zero-shot performance.

Data Extraction Natural Language Processing Neuroradiology Medical Imaging

RESEARCH · arXiv CS.CL · 21h ago

GraphLoRA: Structure-Aware Low-Rank Adaptation for Large Language Model Recommendation

GraphLoRA proposes a novel framework for Large Language Model Recommendation (LLMRec) that integrates structural information with textual semantics. It achieves this by embedding a trainable graph message-passing network within the low-rank adaptation pathway, allowing collaborative topology to explicitly guide parameter updates.

Low-Rank Adaptation Graph Neural Networks Recommendation Systems AI Research

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/15/2026

Video of how my LLM's decoder blocks changed while training

This content presents a video demonstrating how an LLM's decoder blocks changed during training, building upon a popular previous post. The author shares visual data to illustrate the model's evolution process.

neural networks deep learning Training decoder blocks

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/24/2026

New project about llm hallucination [P]

This content introduces a new side project and its GitHub repository, focusing on mitigating LLM hallucination through a novel contrastive sampling and selective training method. The core idea treats hallucination as a preference problem, using self-generated negative samples and divergence-based, gated learning to push correct answers and suppress wrong ones.

hallucination model training Natural Language Processing AI safety

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/15/2026

1-bit Bonsai 1.7B (290MB in size) running locally in your browser on WebGPU

The 1-bit Bonsai 1.7B model (290MB) runs directly in the browser using WebGPU. The post links to a demonstration on Hugging Face Spaces and presents it as an innovation in client-side ML.

Bonsai on-device AI browser AI LLM

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/9/2026

[R] Forced Depth Consideration Reduces Type II Errors in LLM Self-Classification: Evidence from an Exploration Prompting Ablation Study - (200 trap prompts, 4 models, 8 Step-0 variants) [R]

This study addresses Type II errors in LLM task classification, where seemingly simple prompts actually require deep understanding. The research showed that open-ended exploration prompts ("What's really going on here?") significantly reduce these errors compared with direct extraction prompts.

prompt-engineering Type II Error Metacognition Self-Classification

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/15/2026

Trained a Qwen2.5-0.5B-Instruct bf16 model on Reddit post summarization task with GRPO written from scratch in PyTorch - updates! [P]

The author successfully trained a Qwen2.5-0.5B-Instruct model for Reddit post summarization using GRPO, achieving an average rollout length of 64 tokens with combined quality and length rewards. The experiment, run on a Mac Mini cluster, uses an LLM-as-a-Judge (GPT-5) for evaluation and plans future iterations with adjusted reward functions.

reinforcement learning Qwen2.5 GRPO Reddit
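
As a rough illustration of the idea in the summary above, the sketch below shows the group-relative advantage step commonly used in GRPO, paired with a hypothetical reward that combines a quality score with a length term. The reward shape and the names `combined_reward`, `target_len`, and `length_weight` are assumptions made for illustration; they are not taken from the author's code.

```python
from statistics import mean, pstdev


def combined_reward(quality, length, target_len=64, length_weight=0.5):
    """Hypothetical reward: quality score plus a bonus for being near target_len tokens."""
    # Length score is 1.0 at the target length and decays linearly to 0.0.
    length_score = max(0.0, 1.0 - abs(length - target_len) / target_len)
    return quality + length_weight * length_score


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Each rollout is scored relative to its siblings, so no learned critic is needed.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for one prompt: (judge quality score, token count).
rollouts = [(0.8, 60), (0.5, 120), (0.9, 64), (0.3, 20)]
rewards = [combined_reward(q, n) for q, n in rollouts]
advantages = group_relative_advantages(rewards)
```

Rollouts with above-average reward in their group receive positive advantages and are reinforced; the rest are pushed down.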

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/18/2026

Cloudflare open-sources lossless LLM compression tool

Cloudflare released Unweight, a lossless compression system that reduces LLM size by 15-22% without sacrificing output accuracy. The tool, which saves roughly 3 GB of VRAM on Nvidia H100 GPUs for Llama-3.1-8B, has been open-sourced on GitHub with plans to extend compression.

Open Source Optimization GPU compression
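
A quick back-of-the-envelope check of the figures above, assuming about 8 billion parameters stored as 2-byte (bf16) weights and assuming the 15-22% reduction applies to the whole weight footprint. Both assumptions are illustrative and not stated in the summary; under them the savings come out at roughly 2.4 to 3.5 GB, which brackets the quoted 3 GB.

```python
def savings_range_gb(params_billion, bytes_per_param, low_frac, high_frac):
    """Return (low, high) estimated savings in GB for a given compression range."""
    # Billions of parameters times bytes per parameter gives decimal gigabytes.
    total_gb = params_billion * bytes_per_param
    return total_gb * low_frac, total_gb * high_frac


# Assumed: 8B parameters, bf16 weights (2 bytes each), 15-22% size reduction.
low, high = savings_range_gb(8.0, 2.0, 0.15, 0.22)
print(f"Estimated savings: {low:.2f} to {high:.2f} GB")
```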

NEWS · The Verge AI · 4/23/2026

OpenAI says its new GPT-5.5 model is more efficient and better at coding

OpenAI has announced its new GPT-5.5 model, touting it as its "smartest and most intuitive" offering to date and a significant step forward in computer interaction. This new version excels at complex tasks like coding, research, and document creation across various tools, handling multi-part requests by planning and self-checking.

AI models OpenAI GPT coding

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/15/2026

Built GPT-2, Llama 3, and DeepSeek from scratch in PyTorch - open source code + book [p]

A senior engineer spent the past year implementing five LLM architectures from scratch in PyTorch, including GPT-2, Llama 3, and DeepSeek. The project resulted in open-source code and a detailed book documenting the process, explaining advanced concepts like KV cache, MoE, and FP8 quantization.

DeepSeek Llama 3 GPT-2 PyTorch

CASE · ↑ trending · Reddit r/LocalLLaMA · 4/17/2026

Qwen3.6 is incredible with OpenCode!

The user praises Qwen3.6 running with OpenCode as an "incredible" local setup for complex coding tasks, highlighting its effectiveness in implementing RLS across a multi-language codebase. While not perfect, its ability to iterate on compiler errors makes it a viable alternative to tools like Claude Code for daily use.

coding assistant OpenCode AI model review Qwen

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/9/2026

Used ray tracing cores on my RTX 5070 Ti for LLM routing — 218x speedup, runs entirely on 1 consumer GPU

An innovative method uses GPU RT Cores for expert routing in MoE models, yielding a 218x speedup and 731x less VRAM for that task. The research also finds that MoE experts specialize by syntactic type rather than by topic, as had been believed.

Hardware Optimization AI MoE Ray Tracing Cores

CASE · ↑ trending · Reddit r/LocalLLaMA · 4/17/2026

Qwen3.6. This is it.

A user recounts their experience with the Qwen3.6 model, which successfully built and tested a tower defense game, demonstrating the ability to identify and fix its own bugs. The AI confirmed builds using screenshots, astonishing the user with its advanced capabilities.

game development code generation AI programming Qwen

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/10/2026

Started a video series on building an orchestration layer for LLM post-training [P]

The author has started a video series on building an orchestration layer for LLM post-training. He describes his efforts to improve the `verl` framework for RL training at scale, focusing on modernizing packages and removing irrelevant dependencies.

reinforcement learning post-training orchestration frameworks

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/12/2026

GLM 5.1 sits alongside frontier models in my social reasoning benchmark

GLM 5.1 appears highly competitive in social reasoning against frontier models, based on a custom benchmark involving autonomous Blood on the Clocktower games. It offers significant cost efficiency at $0.92 per game compared to Claude Opus 4.6's $3.69, with a 0% tool error rate.

AI benchmark Social Reasoning Blood on the Clocktower GLM 5.1