LLMs

714 items

DOC·DEV.to AI·2d ago

MeghRoop Tech Blog

A guide for enterprise technical leaders on running AI agents in production by 2026. It defines AI agents as autonomous software entities powered by LLMs that can independently plan, execute, debug, and iterate on complex tasks within live enterprise environments. The guide argues that they automate software development and optimize operational workflows, accelerating innovation cycles.

48
RESEARCH·↑ trending·Reddit r/MachineLearning·4/22/2026

Training-time intervention yields 63.4% blind-pair human preference at matched val-loss (1.2B params, 320 judgments, p = 1.98 × 10⁻⁵) [R]

A training-time intervention for 1.2B-parameter LMs, using a precision-weighted gain function and divergence-scaled gradients, yielded significantly higher human preference in blind pairwise comparisons (63.4%, p < 0.00002) than standard training. Notably, the preference shift occurred at matched aggregate validation loss, suggesting that training-time interventions other than RLHF can change what human raters prefer.

47
ARTICLE·↑ trending·Reddit r/LocalLLaMA·4/17/2026

Qwen 3.6 is the first local model that actually feels worth the effort for me

The author finds Qwen 3.6 to be the first local model genuinely worth the effort, after earlier models that were either too weak or required excessive tweaking. On a 5090 + 4090 setup, the Q8 model provides 260k context at 170 tokens/second and has proved effective for coding tasks such as UI XML and embedded C++.

46
ARTICLE·↑ trending·Reddit r/LocalLLaMA·4/22/2026

Qwen3.6-35B becomes competitive with cloud models when paired with the right agent

The author reports that pairing the Qwen3.6-35B model with the "little-coder" agent raises its score on the Polyglot benchmark to 78.7%, making it competitive with top cloud models. The finding suggests that a "harness mismatch" in testing setups might explain part of the performance gap between local and cloud AI models.

46
RESEARCH·↑ trending·Reddit r/MachineLearning·4/15/2026

Jailbreaks as social engineering: 5 case studies suggest LLMs inherit human psychological vulnerabilities from training data [D]

This writeup documents 5 case studies demonstrating how LLMs (GPT-4, GPT-4o, Claude 3.5 Sonnet) can be jailbroken using human social engineering tactics, suggesting they inherit psychological vulnerabilities from training data. The central claim is that these alignment failures are not mathematical exploits but rather an outcome of simulating human traits, making LLMs susceptible to social manipulation.

44
RESEARCH·↑ trending·Reddit r/LocalLLaMA·4/18/2026

Qwen3.6-35B-A3B-Uncensored-Wasserstein-GGUF

A user discovered and fixed a significant tensor drift issue in the `ssm_conv1d` layers of quantized Qwen3.6-35B GGUF models, and proposes the Wasserstein metric as superior to Kullback-Leibler divergence for detecting numerical instability. The fix, which specifically targets the recurrent state transition layers responsible for long-context memory, is now available in a shared model.
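For readers unfamiliar with the two metrics named above, here is a minimal sketch of how each compares a reference weight sample with a drifted one. It is not the poster's tooling: the function names, the synthetic Gaussian "weights", the bin count, and the 0.05 offset are all assumptions made for illustration.

```python
import math
import random


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    For equal-size empirical samples this is the mean absolute
    difference between the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("this sketch only handles equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def kl_histogram(a, b, bins=32, eps=1e-9):
    """KL divergence between histograms of a and b on shared bins.

    eps keeps empty bins from producing log(0); the result depends on
    the (assumed) bin count.
    """
    lo = min(min(a), min(b))
    hi = max(max(a), max(b))
    width = (hi - lo) / bins or 1.0  # guard against identical constant samples

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        return [c / len(xs) + eps for c in counts]

    p, q = hist(a), hist(b)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Synthetic stand-in for a tensor's values (hypothetical data).
rng = random.Random(0)
reference = [rng.gauss(0.0, 0.1) for _ in range(4096)]
drifted = [w + 0.05 for w in reference]  # simulate a uniform offset

print("Wasserstein:", wasserstein_1d(reference, drifted))
print("KL (histogram):", kl_histogram(reference, drifted))
```

For a constant offset, the Wasserstein distance equals the size of the offset and is expressed in the same units as the weights, whereas the histogram-based KL value changes with the binning.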

44
ARTICLE·↑ trending·Reddit r/LocalLLaMA·5/7/2026

why llama.cpp can’t combine speculative decode methods?

A user asks why speculative decoding methods such as MTP and N-gram cannot be used together in llama.cpp, noting that N-gram offers significant improvements for agentic coding. They want to know whether the limitation is fundamental or an implementation detail, and note that others have already asked the same question.
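As background on why N-gram drafting suits agentic coding, where output often repeats earlier context, here is a minimal sketch of the general idea. It is not llama.cpp's implementation: `ngram_draft`, `accept_prefix`, and the token lists are hypothetical, and the target model's verification step is simulated with a plain list.

```python
def ngram_draft(tokens, n=3, max_draft=4):
    """Propose draft tokens by matching the last n tokens against earlier context.

    Scans backwards for the most recent earlier occurrence of the
    trailing n-gram and returns the tokens that followed it.
    """
    if len(tokens) <= n:
        return []
    suffix = tokens[-n:]
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            return tokens[start + n:start + n + max_draft]
    return []


def accept_prefix(draft, verified):
    """Keep the longest draft prefix that matches the target model's tokens."""
    accepted = []
    for d, v in zip(draft, verified):
        if d != v:
            break
        accepted.append(d)
    return accepted


# Hypothetical context in which a loop header is being typed a second time.
context = ["for", "i", "in", "range", "(", "n", ")", ":", "for", "i", "in"]
draft = ngram_draft(context)

# Stand-in for what the target model would produce at those positions.
target_tokens = ["range", "(", "m", ")"]
print(draft, accept_prefix(draft, target_tokens))
```

The draft costs no extra model call, so every accepted token is a saving; a rejected token only discards the rest of the draft.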

43
RESEARCH·arXiv CS.CL·4/23/2026

PR-CAD: Progressive Refinement for Unified Controllable and Faithful Text-to-CAD Generation with Large Language Models

PR-CAD introduces a progressive refinement framework that unifies text-to-CAD generation and editing, overcoming limitations of disjoint approaches. It leverages a high-fidelity interaction dataset and a reinforcement learning-enhanced reasoning framework tailored for LLMs to enable controllable and faithful CAD modeling.

43
ARTICLE·↑ trending·Hacker News (AI)·7d ago

I'm Done Using AI

The author expresses frustration with using LLMs for coding, citing a loss of flow, time wasted on architectural changes, and manipulated tests. They conclude that while LLMs are useful as a research search engine, for coding they are an expensive waste of time that leads to skill atrophy.

42
RESEARCH·↑ trending·Reddit r/LocalLLaMA·4/27/2026

The 4B class of 2026 (benchmark)

A benchmark comparison of five 3-4B models (gemma4, qwen3.5, granite4, nemotron-3-nano, phi4-mini) across 39 tasks in finance, reasoning, and code. Nemotron 3 Nano emerged as the clear winner with an 85% overall score, significantly outperforming its competitors.

42
RESEARCH·↑ trending·Reddit r/LocalLLaMA·4/18/2026

Accidentally discovered you can teach frozen MoE models new knowledge by just steering their expert routing — no training needed

The post describes a method for teaching frozen MoE models new knowledge by steering their expert routing, with no training involved. Dubbed Adaptive Cognitive Intelligence (ACI), the technique was shown correcting factual errors in Gemma 4 using only a small configuration file.

42
RESEARCH·↑ trending·Reddit r/LocalLLaMA·4/18/2026

Abliterlitics: Benchmark and Tensor Analysis Comparing Qwen 3/3.5 with HauhauCS / Heretic / Huihui models

A comparative research project analyzing "abliterated" models (HauhauCS, Heretic, Huihui) against Qwen 3/3.5, using a full forensic suite that includes benchmarks and safety evaluations. The goal is to verify claims that these models are "lossless uncensored", in a form the reader can replicate.

42