LLM

609 items

ARTICLE ↑ trending · Reddit r/LocalLLaMA · 4/10/2026

offline companion robot for my disabled husband (8GB RAM constraints) – looking for optimization advice

A person is building an offline AI companion robot for their quadriplegic husband, aiming to reduce his isolation. The current prototype runs Mistral-7B-Instruct on a ThinkPad with 8GB of RAM for conversation and faster-whisper on a Jetson Nano for speech recognition, and the author is seeking optimization advice.

42
ARTICLE ↑ trending · Reddit r/MachineLearning · 4/27/2026

How do you test AI agents in production? The unpredictability is overwhelming.[D]

A QA professional highlights the overwhelming challenges of testing non-deterministic LLM-based AI agents in production, where traditional quality assurance methods fail. They struggle with the variability of outputs and reasoning chains, finding existing approaches like snapshot testing and human evaluation insufficient or unscalable.

42
ARTICLE ↑ trending · Reddit r/LocalLLaMA · 4/10/2026

[Model Release] I trained a 9B model to be agentic Data Analyst (Qwen3.5-9B + LoRA). Base model failed 100%, this LoRA completes 89% of workflows without human intervention.

A developer trained a Qwen3.5-9B model with LoRA to act as an agentic data analyst, aiming to build autonomy into the weights. The model completed 89% of end-to-end workflows without human intervention, whereas the base model failed every one.

42
RESEARCH ↑ trending · Reddit r/LocalLLaMA · 4/14/2026

We benchmarked TranslateGemma-12b against 5 frontier LLMs on subtitle translation - it won across the board, with one significant catch

A study benchmarked TranslateGemma-12b against five frontier LLMs on subtitle translation for six language pairs, showing the task-specific model consistently outperformed general-purpose models. While initial numbers indicated a clear win, human QA added a significant catch which will be detailed in the full report.

42
RESEARCH ↑ trending · Reddit r/LocalLLaMA · 4/14/2026

Updated Qwen3.5-9B Quantization Comparison

This post compares various GGUF quantizations of the Qwen3.5-9B model, using KL divergence (KLD) to measure how faithful each one is to the BF16 baseline. The goal is to give users a data-driven basis for choosing a quantized file; lower KLD scores indicate less information loss.

42
ARTICLE ↑ trending · Reddit r/MachineLearning · 4/30/2026

A Hackable ML Compiler Stack in 5,000 Lines of Python [P]

The author built a simplified, hackable ML compiler stack in 5,000 lines of Python that emits raw CUDA, aiming to provide an easy-to-follow reference without the complexity of existing frameworks. It lowers small models like TinyLlama and Qwen2.5-7B through six Intermediate Representations, focusing on clarity over performance.

42
RESEARCH ↑ trending · Reddit r/MachineLearning · 4/26/2026

Speculative Decoding Implementations: EAGLE-3, Medusa-1, PARD, Draft Models, N-gram and Suffix Decoding from scratch [P]

A new educational implementation repository has been launched for speculative decoding, implementing various methods like EAGLE-3 and Medusa-1 from scratch to facilitate studying proposer design differences. It includes training and inference paths for models like Qwen/Qwen2.5-7B-Instruct and aims to clarify the distinction between proposer quality and verifier cost, and why a high acceptance rate doesn't always imply higher throughput.

42
ARTICLE ↑ trending · Reddit r/LocalLLaMA · 4/8/2026

I tracked a major cache reuse issue down to Qwen 3.5’s chat template

A developer investigated persistent cache misses in local AI agent workflows, which caused large blocks of context to be reprocessed unnecessarily. After ruling out other possibilities such as inference engine errors or bugs in the cache implementation, the cause was traced to a problem with the Qwen 3.5 chat template.

42
ARTICLE ↑ trending · Reddit r/LocalLLaMA · 4/8/2026

Qwen3.5-35B-A3B-Uncensored-FernflowerAI-GGUF

The author found and fixed a training bug in the Qwen3.5-35B-A3B model and released a fixed version, an improved system prompt, a chat template with tool calling support, and recommended settings for LM Studio. The fix addresses the context loss and repetition problems that occurred in long conversations with the previous version of the model.

42
NEWS ↑ trending · Reddit r/LocalLLaMA · 4/26/2026

HauhauCS (of "Uncensored Aggressive" fame) published an abliteration package that plagiarizes Heretic without attribution, and violates its license

An investigation reveals that HauhauCS, a publisher of popular uncensored LLM models, plagiarized code from the Heretic project, violating its AGPL-3.0 license. Detailed evidence was found in the recovered source code, including identical module and function names.

42