
LLM

609 items

RESEARCH ↑ trending · Reddit r/MachineLearning · 5/3/2026

torch-nvenc-compress: GPU NVENC silicon as a PCIe bandwidth multiplier — PCA + pure-ctypes Video Codec SDK wrapper. Parallel-path overlap measured at 67% of theoretical max on a real GEMM + encode workload. [P]

This project introduces "torch-nvenc-compress," a Python library that uses the GPU's NVENC/NVDEC hardware to compress LLM activations and KV cache, with the goal of easing PCIe bandwidth bottlenecks in multi-GPU setups built from consumer GPUs. On a real GEMM + encode workload, the overlap between the compute and encode paths reaches 67% of the theoretical maximum.
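The post does not say exactly how PCA is applied before encoding. As a hedged illustration only, assuming activations arrive as a `tokens × hidden` matrix, truncated PCA via SVD keeps the top `k` components and sends just the coefficients plus a shared basis. The helper names (`pca_compress`, `pca_decompress`, `compression_ratio`) are hypothetical, NumPy stands in for torch, and the NVENC encoding step is not shown.

```python
import numpy as np

def pca_compress(x: np.ndarray, k: int):
    """Project rows of x (tokens x hidden) onto the top-k principal components.

    Hypothetical sketch, not the torch-nvenc-compress implementation.
    """
    mean = x.mean(axis=0, keepdims=True)      # per-feature mean, shape (1, hidden)
    centered = x - mean
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                            # (k, hidden)
    codes = centered @ basis.T                # (tokens, k): what would be sent
    return codes, basis, mean

def pca_decompress(codes: np.ndarray, basis: np.ndarray, mean: np.ndarray) -> np.ndarray:
    """Approximate reconstruction from the top-k coefficients."""
    return codes @ basis + mean

def compression_ratio(tokens: int, hidden: int, k: int) -> float:
    """Raw element count vs. codes + basis + mean (ignores any codec gains)."""
    original = tokens * hidden
    compressed = tokens * k + k * hidden + hidden
    return original / compressed
```

For example, `compression_ratio(4096, 4096, 512)` comes out to about 4x before any video-codec compression; in practice the basis could be reused across steps, which would push the ratio closer to `hidden / k`.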

42
ARTICLE ↑ trending · Reddit r/MachineLearning · 4/26/2026

How to collect evidence for LLM reviewer? [D]

A researcher received a weak rejection from a reviewer suspected of using an LLM, whose points were irrelevant and unoriginal, contrasting with positive feedback from other reviewers. The author seeks advice on how to collect evidence and report the reviewer to the academic committee for low-quality or LLM-generated feedback, considering the challenge of proving AI usage.

42
DOC ↑ trending · Reddit r/LocalLLaMA · 4/15/2026

Gemma 4 Jailbreak System Prompt

This content discusses the "jailbreak" of the Gemma 4 model, focusing on the use of system prompts to exploit vulnerabilities. It explores the techniques employed to bypass the language model's safeguards and restrictions.

42
ARTICLE ↑ trending · Reddit r/LocalLLaMA · 4/10/2026

GLM 5.1 crushes every other model except Opus in agentic benchmark at about 1/3 of the Opus cost

An agentic benchmark shows GLM 5.1 matching Opus-level performance on agentic tasks at one-third of the cost, while outperforming every other model tested. The author stresses the value of testing in real environments such as OpenClaw and ranks GLM 5.1 among the top models for agents right now.

41
ARTICLE ↑ trending · Reddit r/LocalLLaMA · 4/21/2026

Gemma 4 Vision

Gemma 4's default vision budget is often too low for effective detail recognition, causing poor OCR performance. Users can significantly enhance its vision by configuring `llama.cpp` parameters like `--image-min-tokens` and `--image-max-tokens` to higher values, such as 560 and 2240.
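A small sketch of assembling such a launch command, assuming the `llama-server` binary with a Gemma 4 GGUF and its vision projector loaded via `-m` and `--mmproj`. The file names are hypothetical placeholders; the two image-token flags and the 560/2240 values come from the post, so confirm them against `llama-server --help` for your build.

```python
import shlex

def llama_server_cmd(model: str, mmproj: str,
                     min_tokens: int = 560, max_tokens: int = 2240) -> list[str]:
    """Build an argv list raising Gemma 4's vision token budget (values from the post)."""
    if not 0 < min_tokens <= max_tokens:
        raise ValueError("need 0 < min_tokens <= max_tokens")
    return [
        "llama-server",
        "-m", model,            # language model weights (GGUF)
        "--mmproj", mmproj,     # vision projector needed for image input
        "--image-min-tokens", str(min_tokens),
        "--image-max-tokens", str(max_tokens),
    ]

if __name__ == "__main__":
    # Hypothetical file names; substitute your own GGUF paths.
    print(shlex.join(llama_server_cmd("gemma-4.gguf", "mmproj-gemma-4.gguf")))
```

Raising both bounds gives the image encoder more tokens per image, which is what the post credits for better OCR; expect higher prompt-processing cost per image in exchange.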

41
ARTICLE ↑ trending · Reddit r/LocalLLaMA · 5/4/2026

The more I use it, the more I'm impressed

A user reports that Qwen 3.6 27B found a critical bug that both GPT 5.5 and Claude Opus 4.7 initially missed and even denied existed. The user takes this as a sign that slower, more thorough processing by models like Qwen can sometimes beat faster frontier models on critical problem-solving.

39
RESEARCH ↑ trending · Reddit r/LocalLLaMA · 4/19/2026

QWEN3.6 + ik_llama is fast af

A user reported running Qwen3.6 on ik_llama at over 50 tokens/second with a 200k context window on 16GB VRAM and 32GB RAM, a notable result for long-context inference on consumer hardware.

38