LLMs

714 items

NEWS · ↑ trending · Reddit r/MachineLearning · 4/22/2026

INT3 compression+fused metal kernels [R]

A solo founder developed INT3 model compression and a 2-bit KV cache with custom fused Metal kernels for Mac (M-series). Qwen 7B is available in preview, and further optimizations and GPU support are planned.

Hardware Acceleration LLMs quantization model optimization

DOC · DEV.to AI · 2d ago

MeghRoop Tech Blog

This guide aims to equip enterprise technical leaders to run AI agents in production by 2026. It describes AI agents as autonomous software entities powered by LLMs that can independently plan, execute, debug, and iterate on complex tasks within live enterprise environments, automating software development and operational workflows to accelerate innovation cycles.

LLMs software development Enterprise AI automation

DOC · DEV.to AI · 2d ago

How to Convert Webpages into Clean Markdown for LLMs (in 5ms)

This guide explains how to convert noisy web pages into clean, semantic Markdown suitable for Large Language Models (LLMs) in milliseconds. It details a multi-stage sanitization process to remove HTML clutter and optimize token usage, reducing API costs and improving model performance for applications like chatbots and RAG pipelines.

LLMs HTML cleanup data preprocessing markdown
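
As a rough illustration of the kind of multi-stage cleanup the webpage-to-Markdown guide above describes, the sketch below drops non-content elements and then maps a few block tags to Markdown. It is not the guide's pipeline: the tag lists, the `MarkdownExtractor` class, and `html_to_markdown` are hypothetical names chosen for this example, and it uses only Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Assumption: these tags carry no content worth sending to a model.
SKIP_TAGS = {"script", "style", "nav", "footer", "aside"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
BLOCK_TAGS = set(HEADINGS) | {"p", "li"}


class MarkdownExtractor(HTMLParser):
    """Collects text from headings, paragraphs, and list items as Markdown blocks."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.blocks = []      # finished Markdown blocks
        self.current = []     # text fragments of the block being built
        self.prefix = ""      # Markdown prefix for the current block

    def _flush(self):
        # Collapse runs of whitespace, then store the block if it is non-empty.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(self.prefix + text)
        self.current = []
        self.prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self._flush()
            self.prefix = HEADINGS.get(tag, "- " if tag == "li" else "")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0 and tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)


def html_to_markdown(html):
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text outside a block tag
    return "\n\n".join(parser.blocks)
```

Inline tags such as `<b>` or `<a>` are flattened to plain text here; a fuller converter would also preserve links and emphasis.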

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/22/2026

Training-time intervention yields 63.4% blind-pair human preference at matched val-loss (1.2B params, 320 judgments, p = 1.98 × 10⁻⁵) [R]

A training-time intervention for 1.2B-parameter LMs, using a precision-weighted gain function and divergence-scaled gradients, resulted in significantly higher human preference (63.4%, p < 0.00002) compared to standard training. Notably, this preference shift occurred without altering the aggregate validation loss metric, indicating that training interventions beyond RLHF can be effective.

LLMs machine learning Human Preference training methods

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/17/2026

Qwen 3.6 is the first local model that actually feels worth the effort for me

The author finds Qwen 3.6 to be the first local model genuinely worth the effort, unlike previous experiences with models that were either too weak or required excessive tweaking. Running on a 5090 + 4090 setup, the Q8 model provides 260k context and 170 tokens/second, proving effective for coding tasks like UI XML and embedded C++.

LLMs local models Qwen developer experience

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/22/2026

Qwen3.6-35B becomes competitive with cloud models when paired with the right agent

The author demonstrates that pairing the Qwen3.6-35B model with the "little-coder" agent drastically improves its performance on the Polyglot benchmark to 78.7%, making it competitive with top cloud models. This finding suggests that a "harness mismatch" in testing setups might explain performance gaps between local and cloud AI models.

LLMs coding agents Benchmarking Agent systems

ARTICLE · KDNuggets · 1d ago

Why Do LLMs Corrupt Your Documents When You Delegate?

This article analyzes several reasons why document structure may decay when complex editing tasks are delegated to Large Language Models (LLMs), and explores the challenges inherent in such delegation.

content editing LLMs AI limitations AI delegation

ARTICLE · DEV.to AI · 2d ago

ChatGPT vs Claude in 2026: which AI assistant should you use?

This article compares ChatGPT and Claude for 2026, focusing on which AI assistant best suits different workflows. It details the ideal use cases, ecosystems, strengths, and weaknesses of each for tasks like general Q&A, long documents, and coding.

LLMs Claude ChatGPT AI tools

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/15/2026

Jailbreaks as social engineering: 5 case studies suggest LLMs inherit human psychological vulnerabilities from training data [D]

This writeup documents 5 case studies demonstrating how LLMs (GPT-4, GPT-4o, Claude 3.5 Sonnet) can be jailbroken using human social engineering tactics, suggesting they inherit psychological vulnerabilities from training data. The central claim is that these alignment failures are not mathematical exploits but rather an outcome of simulating human traits, making LLMs susceptible to social manipulation.

LLMs social engineering jailbreaks psychological vulnerabilities

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/18/2026

Qwen3.6-35B-A3B-Uncensored-Wasserstein-GGUF

A user discovered and fixed a significant tensor drift issue in the `ssm_conv1d` layers of quantized Qwen3.6-35B GGUF models, proposing the Wasserstein metric as superior to Kullback-Leibler divergence for detecting numerical instability. The fix, which specifically targets recurrent state transition layers responsible for long-context memory, is now available in a shared model.

LLMs quantization GGUF model optimization
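
One general reason a Wasserstein distance can be more informative than Kullback-Leibler divergence for drift detection is that it grows with how far values have moved, whereas a histogram-based KL saturates once the two distributions stop overlapping. The toy sketch below shows that property on made-up numbers. It is not the post's method: the sample values, bin settings, and function names are all assumptions for illustration.

```python
import math


def wasserstein_1d(a, b):
    """W1 distance between two equal-sized 1-D samples (mean gap of sorted values)."""
    if len(a) != len(b):
        raise ValueError("samples must have equal length")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def histogram(values, lo, hi, bins):
    """Counts per bin over [lo, hi); out-of-range values are clamped to the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return counts


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """KL(p || q) from histogram counts; eps is additive smoothing to avoid log(0)."""
    p_total, q_total = sum(p_counts), sum(q_counts)
    kl = 0.0
    for pc, qc in zip(p_counts, q_counts):
        p = pc / p_total
        q = qc / q_total + eps
        if p > 0:
            kl += p * math.log(p / q)
    return kl


# Hypothetical "reference" values and two drifted copies (small vs large shift).
reference = [i / 100 for i in range(100)]
drift_small = [v + 2.0 for v in reference]
drift_large = [v + 6.0 for v in reference]

ref_hist = histogram(reference, 0.0, 8.0, 8)
kl_small = kl_divergence(ref_hist, histogram(drift_small, 0.0, 8.0, 8))
kl_large = kl_divergence(ref_hist, histogram(drift_large, 0.0, 8.0, 8))
w_small = wasserstein_1d(reference, drift_small)
w_large = wasserstein_1d(reference, drift_large)
```

With these inputs both KL values are identical, because neither drifted histogram overlaps the reference, while the Wasserstein distance is about three times larger for the larger shift.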

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 5/7/2026

why llama.cpp can’t combine speculative decode methods?

A user asks why speculative decoding methods such as MTP and N-gram cannot be combined in llama.cpp, noting that N-gram offers significant improvements for agentic coding. They want to know whether this is a fundamental limitation or an implementation one, and note that others have already asked the same question.

Optimization LLMs llama.cpp Qwen3.6
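
For readers unfamiliar with N-gram drafting, the idea can be sketched as follows: look for the most recent earlier occurrence of the last few tokens in the context, propose the tokens that followed it as a draft, and let the target model confirm how many of them it agrees with. This is a minimal illustration under those assumptions, not llama.cpp's implementation; `ngram_draft` and `accepted_prefix_len` are hypothetical helper names.

```python
def ngram_draft(tokens, n=2, max_draft=4):
    """Propose draft tokens by matching the last n tokens against earlier context."""
    if len(tokens) <= n:
        return []
    suffix = tokens[-n:]
    # Scan backwards so the most recent earlier match wins; the start index
    # stops short of the suffix's own position so it never matches itself.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            return tokens[start + n:start + n + max_draft]
    return []


def accepted_prefix_len(draft, target_tokens):
    """Number of leading draft tokens that agree with the target model's tokens."""
    count = 0
    for d, t in zip(draft, target_tokens):
        if d != t:
            break
        count += 1
    return count


# Hypothetical token stream from a coding session where a call is being repeated.
context = ["def", "foo", "(", ")", ":", "return", "foo", "("]
draft = ngram_draft(context, n=2, max_draft=4)
```

Repetitive text such as code edits tends to produce long matching drafts, which is consistent with the post's note that N-gram helps agentic coding.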

RESEARCH · arXiv CS.CL · 4/23/2026

PR-CAD: Progressive Refinement for Unified Controllable and Faithful Text-to-CAD Generation with Large Language Models

PR-CAD introduces a progressive refinement framework that unifies text-to-CAD generation and editing, overcoming limitations of disjoint approaches. It leverages a high-fidelity interaction dataset and a reinforcement learning-enhanced reasoning framework tailored for LLMs to enable controllable and faithful CAD modeling.

LLMs reinforcement learning CAD modeling text-to-CAD

RESEARCH · ↑ trending · Reddit r/MachineLearning · 27d ago

Learning, Fast and Slow: Towards LLMs That Adapt Continually [R]

Large language models (LLMs) face catastrophic forgetting and plasticity loss when updating parameters for downstream tasks. This work introduces a fast-slow learning framework for LLMs, utilizing model parameters as "slow" weights and optimized context as "fast" weights to adapt efficiently without compromising general reasoning.

LLMs learning Catastrophic Forgetting AI Research

ARTICLE · ↑ trending · Hacker News (AI) · 7d ago

I'm Done Using AI

The author expresses frustration with using LLMs for coding, experiencing a loss of flow, wasted time on architectural changes, and manipulated tests. They conclude that while LLMs are useful as a research search engine, they are an expensive waste of time for coding, leading to skill atrophy.

LLMs AI limitations developer productivity Skill Atrophy

ARTICLE · ↑ trending · Hacker News (AI) · 12d ago

Show HN: Local Coding Agent with LLMs to Delegate Tool Calls to Small AI Models

This project introduces a local coding agent that leverages Large Language Models (LLMs) to delegate specific tasks, particularly tool calls, to more specialized small AI models. It aims to improve efficiency and modularity in AI-powered development by distributing workloads.

Open Source AI models LLMs software development

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/12/2026

LLMs learn backwards, and the scaling hypothesis is bounded. [D]

This post argues that Large Language Models (LLMs) learn backwards and that the scaling hypothesis is bounded.

LLMs deep learning scaling hypothesis language models

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/27/2026

The 4B class of 2026 (benchmark)

This post details a benchmark comparison of five 3-4B models (gemma4, qwen3.5, granite4, nemotron-3-nano, phi4-mini) across 39 tasks in finance, reasoning, and code. Nemotron 3 Nano emerged as the clear winner with an 85% overall score, significantly outperforming its competitors.

AI models LLMs Benchmarking Generative AI

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 25d ago

Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version)

The author tested the Qwen 3.6 35b MTP model locally and observed a 1.5x increase in speed. They also explored a large context window, reaching 300k tokens with room to go higher.

LLMs Benchmarking Local AI Qwen

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/18/2026

Accidentally discovered you can teach frozen MoE models new knowledge by just steering their expert routing — no training needed

A novel method allows teaching frozen MoE models new knowledge by steering their expert routing, bypassing traditional training. Dubbed Adaptive Cognitive Intelligence (ACI), this technique demonstrated correcting factual errors in Gemma 4 using only a small configuration file.

model steering LLMs Gemma 4 Knowledge Injection

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/18/2026

Abliterlitics: Benchmark and Tensor Analysis Comparing Qwen 3/3.5 with HauhauCS / Heretic / Huihui models

This research project compares "abliterated" models (HauhauCS, Heretic, Huihui) against Qwen 3/3.5, using a full forensic suite that includes benchmarks and safety evaluations. The goal is to verify claims that these models are "lossless uncensored", with results the reader can replicate.

AI models LLMs Model Evaluation Benchmarking