LLMs

714 items

NEWS · ↑ trending · Reddit r/MachineLearning · 25d ago

arXiv implements 1-year ban for papers containing incontrovertible evidence of unchecked LLM-generated errors, such as hallucinated references or results. [N]

arXiv has announced a new policy imposing a 1-year ban on authors who submit papers containing incontrovertible evidence of unchecked LLM-generated errors, such as hallucinated references or results. The policy emphasizes that authors are fully responsible for all of a paper's content, regardless of whether AI tools were used to generate it.

scientific publishing research ethics LLMs arXiv

RESEARCH · ↑ trending · Reddit r/MachineLearning · 4/16/2026

Built a political benchmark for LLMs. KIMI K2 can't answer about Taiwan (Obviously). GPT-5.3 refuses 100% of questions when given an opt-out. [P]

A researcher built a benchmark that maps LLMs onto a 2D political compass using 98 questions, and argues that refusing to answer is itself a political stance. Initial results cover GPT-5.3, Claude Opus 4.6, and KIMI K2, and the repository is fully open-source.

LLMs political-bias Benchmarking AI ethics

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 5/7/2026

ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference

ParoQuant is a novel technique that employs pairwise rotation quantization to significantly improve the efficiency of Large Language Model (LLM) inference. This method specifically targets reasoning LLMs, enabling more cost-effective and faster deployment by reducing computational and memory requirements.

Optimization LLMs efficiency quantization
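
The basic idea behind rotation-based quantization can be shown with a toy example. The sketch below is a generic illustration, not ParoQuant's actual algorithm: it assumes per-row 4-bit round-to-nearest quantization and hand-picks one channel pair and a pi/4 angle, whereas a real method would choose its rotations more carefully. Because the rotation R is orthogonal, W x = (W R)(R^T x), so rotating the weights changes nothing mathematically as long as the activations are counter-rotated. What changes is how an outlier channel's magnitude is spread out before rounding.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Per-row symmetric round-to-nearest quantization; returns dequantized weights."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale[scale == 0] = 1.0  # guard against all-zero rows
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def givens_product(n, pairs, angles):
    """Orthogonal n x n matrix: product of 2-D rotations, each acting on one (i, j) channel pair."""
    r = np.eye(n)
    for (i, j), theta in zip(pairs, angles):
        g = np.eye(n)
        c, s = np.cos(theta), np.sin(theta)
        g[i, i], g[i, j], g[j, i], g[j, j] = c, -s, s, c
        r = r @ g
    return r

# One output row whose input channel 0 is an outlier (illustrative values).
W = np.array([[8.0, 0.5, -0.3, 0.2]])
R = givens_product(4, pairs=[(0, 1)], angles=[np.pi / 4])  # hand-picked pair and angle

# Exactness check: W x == (W R)(R^T x) because R is orthogonal.
x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(W @ x, (W @ R) @ (R.T @ x))

# Quantize with and without the rotation; measure error in the original weight space.
err_plain = np.sum((quantize_rtn(W) - W) ** 2)
err_rot = np.sum((quantize_rtn(W @ R) @ R.T - W) ** 2)
print(f"plain: {err_plain:.3f}  rotated: {err_rot:.3f}")  # plain: 0.380  rotated: 0.153
```

Splitting the outlier across two channels shrinks the per-row scale, so the small weights no longer all round to zero.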

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/9/2026

Marco-Mini (17.3B, 0.86B active) and Marco-Nano (8B, 0.6B active) by Alibaba

Alibaba has recently released Marco-Mini and Marco-Nano, instruction-tuned variants of highly sparse multilingual Mixture-of-Experts (MoE) language models. Marco-Mini, which activates only 0.86B of its 17.3B parameters, stands out by outperforming other models with up to 12B activated parameters on benchmarks.

AI models LLMs Alibaba Sparse Models
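
The parameter counts in the title translate into small active fractions. This is simple arithmetic on the reported numbers. As a rough rule for MoE models, per-token compute scales with the active parameters, while the memory needed to hold the weights still scales with the total.

```python
def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters used per token in a sparse MoE model."""
    return active_params_b / total_params_b

# (total, active) parameter counts in billions, as reported in the title above
MODELS = {"Marco-Mini": (17.3, 0.86), "Marco-Nano": (8.0, 0.6)}

for name, (total, active) in MODELS.items():
    print(f"{name}: {active}B of {total}B active ({active_fraction(total, active):.1%})")
# Marco-Mini: 0.86B of 17.3B active (5.0%)
# Marco-Nano: 0.6B of 8.0B active (7.5%)
```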

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/27/2026

Skymizer Taiwan Inc. Unveils Breakthrough Architecture Enabling Ultra-Large LLM Inference on a Single Card

Skymizer Taiwan Inc. has unveiled a breakthrough architecture built around the HTX301 card, which allows inference of 700B-parameter LLMs on a single PCIe card with 384GB of memory and low power consumption (~240W). The approach offloads decoding to the HTX301 while GPUs handle prefill, enabling ultra-large LLM inference locally without massive amounts of GPU VRAM.

inference LLMs AI hardware
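
The headline numbers imply the weights cannot be held at 8 or 16 bits. A back-of-the-envelope sketch follows. It assumes decimal gigabytes and counts weights only, ignoring KV cache, activations, and quantization-scale overhead. The source does not say which precision the HTX301 actually uses.

```python
PARAMS = 700e9      # 700B parameters, per the announcement
CARD_MEM = 384e9    # 384 GB, assumed to be decimal gigabytes

def max_bits_per_param(params: float, mem_bytes: float) -> float:
    """Average bits per weight if the entire memory held weights and nothing else."""
    return mem_bytes * 8 / params

def weight_bytes(params: float, bits: int) -> float:
    """Storage for the weights alone at a given precision."""
    return params * bits / 8

print(f"budget: {max_bits_per_param(PARAMS, CARD_MEM):.2f} bits/param")
for bits in (16, 8, 4):
    size = weight_bytes(PARAMS, bits)
    print(f"{bits:>2}-bit weights: {size / 1e9:.0f} GB, fits: {size <= CARD_MEM}")
```

Under these assumptions the budget is about 4.39 bits per parameter. Only the 4-bit case fits (350 GB), leaving roughly 34 GB for everything else.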

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 27d ago

TextGen is now a native desktop app. Open-source alternative to LM Studio (formerly text-generation-webui).

TextGen (formerly text-generation-webui), an open-source alternative to LM Studio, has become a no-install desktop application for Windows, Linux, and macOS. In development since December 2022, the self-contained app provides a polished UI for text generation and works similarly to LM Studio, which is built on Electron.

desktop app Open Source LLMs text generation

ARTICLE · ↑ trending · Reddit r/MachineLearning · 5/6/2026

Stop letting LLMs edit your .bib [D]

The author is shocked by how often LLM-hallucinated citations turn up in academic papers, leaving them with incorrect author lists. They see this as a lack of respect for research, question whether harsher penalties are needed, and ask whether others are running into the same issue.

LLMs citations hallucinations AI ethics
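
A low-tech guard against silent author-list changes is to diff the author fields of a .bib file before and after any automated edit. The sketch below assumes simple BibTeX with brace-delimited `author = {...}` fields and no nested braces inside them. It only flags added, removed, or changed author fields for a human to check against the actual papers. It does not verify anything itself.

```python
import re

# One entry: "@type{key," followed by everything up to the next entry or end of file.
ENTRY_RE = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,(.*?)(?=\n@|\Z)", re.S)
AUTHOR_RE = re.compile(r"\bauthor\s*=\s*\{([^{}]*)\}", re.I)

def authors_by_key(bib_text):
    """Map citation key -> whitespace-normalized author string."""
    out = {}
    for key, body in ENTRY_RE.findall(bib_text):
        m = AUTHOR_RE.search(body)
        if m:
            out[key] = " ".join(m.group(1).split())
    return out

def changed_authors(before, after):
    """Keys whose author field was added, removed, or changed by an edit."""
    old, new = authors_by_key(before), authors_by_key(after)
    return sorted(k for k in set(old) | set(new) if old.get(k) != new.get(k))

before = """@article{smith2020,
  author = {Smith, Jane and Doe, John},
  title = {A Study},
}
@inproceedings{lee2021,
  author = {Lee, Min},
  title = {Another Study},
}
"""
after = before.replace("Doe, John", "Doe, Jonathan")  # simulated LLM edit
print(changed_authors(before, after))  # ['smith2020']
```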

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/14/2026

How to Distill from 100B+ to <4B Models

This content discusses the process of AI model distillation, focusing on how to reduce massive models with over 100 billion parameters to significantly smaller versions with less than 4 billion. The aim is to enhance the efficiency and accessibility of complex AI models.

Model Compression LLMs Model Distillation AI Efficiency
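
The source does not say which distillation method it uses. As a reference point, here is a sketch of the classic soft-label objective (Hinton-style knowledge distillation). The student matches the teacher's temperature-softened output distribution, mixed with ordinary cross-entropy on the labels. It assumes teacher and student share a vocabulary so their logits are directly comparable. The `t` and `alpha` values are illustrative defaults.

```python
import numpy as np

def softmax(z, t=1.0):
    """Numerically stable softmax over the last axis at temperature t."""
    z = z / t
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, labels, t=2.0, alpha=0.5):
    """alpha * t^2 * KL(teacher_t || student_t) + (1 - alpha) * CE(labels, student)."""
    p_t = softmax(teacher_logits, t)
    p_s = softmax(student_logits, t)
    kl = np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean()
    p = softmax(student_logits)  # hard-label term uses temperature 1
    ce = -np.log(p[np.arange(len(labels)), labels]).mean()
    # t^2 keeps the soft-target gradients on a similar scale as t changes
    return alpha * t * t * kl + (1 - alpha) * ce

teacher = np.array([[2.0, 0.5, -1.0]])
labels = np.array([0])
print(distill_loss(teacher, teacher, labels))           # KL term is zero here
print(distill_loss(np.zeros((1, 3)), teacher, labels))  # larger: student ignores teacher
```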

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/17/2026

what’s actually stopping an insider from leaking model weights?

The post asks what technical barriers actually prevent an insider from leaking flagship LLM weights from companies like OpenAI or Anthropic. It suggests that model weights are relatively self-contained, which could make them easier to exfiltrate than traditional software, and wonders why, if NDAs are the main safeguard, such leaks haven't occurred more often.

LLMs security Intellectual Property

ARTICLE · ↑ trending · Reddit r/MachineLearning · 27d ago

Sharing all KGC 2026 decks. More production-grade KG systems than I've seen at any conference. [D]

The Knowledge Graph Conference (KGC 2026) showcased a large number of live, production-grade knowledge graph systems from various enterprises, unlike typical AI events, which often present only proofs of concept. Examples included Bloomberg's ontology governance, AbbVie's drug-intelligence KG with an LLM interface, and Morgan Stanley's continuous SHACL drift detection for risk reporting.

AI applications LLMs Knowledge Graph Data Governance

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/19/2026

I'm running qwen3.6-35b-a3b with 8 bit quant and 64k context thru OpenCode on my mbp m5 max 128gb and it's as good as claude

A user reports running Qwen3.6-35b-a3b locally on an M5 Max MacBook Pro with 8-bit quantization and 64k context, finding its performance comparable to Claude. They are highly impressed with its speed, ability to handle complex research tasks, and the privacy benefits of local execution.

LLMs privacy Model Evaluation Local AI

CASE · ↑ trending · Reddit r/LocalLLaMA · 4/23/2026

Been using PI Coding Agent with local Qwen3.6 35b for a while now and it's actually insane

The user reports an extremely positive and effective experience with the PI Coding Agent, utilizing a local Qwen3.6 35b model for production projects. Success was attributed to a custom "plan-first skill file" that enforces a structured planning workflow, ensuring step-by-step execution and plan approval before any coding.

LLMs prompt-engineering workflow automation code generation

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/26/2026

Going from 3B/7B dense to Nemotron 3 Nano (hybrid Mamba-MoE) for multi-task reasoning — what changes in the fine-tuning playbook? [D]

The author is transitioning from fine-tuning dense 3B/7B transformers to NVIDIA's Nemotron 3 Nano (a hybrid Mamba-Attention-MoE architecture) for multi-task reasoning. They are seeking guidance on how the hybrid architecture impacts standard LoRA fine-tuning, as their prior experience is limited to dense models.

LLMs multi-task reasoning AI Architectures Fine-tuning
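
For reference, standard LoRA freezes a weight matrix W and learns a low-rank update, so the effective weight is W + (alpha / r) B A. The NumPy sketch below shows that mechanism on a single linear layer. It does not answer the post's question of which projections in Nemotron 3 Nano's Mamba, attention, and expert layers to adapt. The rank and alpha values are arbitrary.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                   # frozen base weight
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))
        self.B = np.zeros((d_out, r))                # zero init: adapter starts as a no-op
        self.scale = alpha / r

    def __call__(self, x):
        # Base projection plus the low-rank path; W itself is never modified.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def merged(self):
        """Fold the adapter into a single dense weight for inference."""
        return self.W + self.scale * self.B @ self.A

# 3x4 layer at rank 2: trainable params = r * (d_in + d_out) = 2 * (4 + 3) = 14
W = np.arange(12.0).reshape(3, 4)
layer = LoRALinear(W, r=2, alpha=4)
x = np.ones((5, 4))
print(np.allclose(layer(x), x @ W.T))  # True: B starts at zero
```

Only A and B are trained, which is why the trainable parameter count grows with r * (d_in + d_out) rather than d_in * d_out.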

CASE · ↑ trending · Reddit r/LocalLLaMA · 4/18/2026

qwen3.6 performance jump is real, just make sure you have it properly configured

A user reports that Qwen 3.6 delivers a significant performance leap and can handle workloads typically given to Opus and Codex, though it is not yet at their level. The user highlights its usefulness and speed when properly configured with `preserve_thinking` and other specific settings on an M5 Max.

LLMs AI hardware local inference AI performance

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/19/2026

LLM Neuroanatomy III - LLMs seem to think in geometry, not language

This article, part of the "LLM Neuroanatomy" series, posits that Large Language Models primarily process information geometrically rather than through linguistic representations. It explores the internal mechanisms and structural organization of these advanced AI models.

AI architecture LLMs deep learning Neuroscience

ARTICLE · DEV.to AI · 4/22/2026

Your LLM Isn't the Problem. Your Pipeline Is.

The article highlights a common architectural problem in LLM-powered product tagging for e-commerce: each LLM call may be correct on its own, but it has no memory of previous calls, so the taxonomy fragments. The issue is not the LLM but the pipeline's failure to supply a consistent tag vocabulary as input.

LLMs data consistency Architecture e-commerce
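
The fix the article points to is to feed the pipeline's existing tag vocabulary into every call. Below is a minimal sketch. It assumes a hypothetical `call_llm(prompt) -> list[str]` function, stubbed here. It uses simple string normalization (lowercasing and treating hyphens and underscores as spaces) to map near-duplicate tags onto the first spelling seen. A production system might enforce a stricter controlled vocabulary instead.

```python
from typing import Callable, Iterable

def normalize(tag: str) -> str:
    """Crude canonical key: lowercase, hyphens/underscores as spaces, collapsed whitespace."""
    return " ".join(tag.lower().replace("-", " ").replace("_", " ").split())

class TagPipeline:
    """Keeps one shared vocabulary so every LLM call sees the tags chosen so far."""

    def __init__(self, call_llm: Callable[[str], list[str]], seed_vocab: Iterable[str] = ()):
        self.call_llm = call_llm  # hypothetical: prompt -> list of raw tag strings
        self.vocab: dict[str, str] = {normalize(t): t for t in seed_vocab}

    def build_prompt(self, product: str) -> str:
        allowed = ", ".join(sorted(self.vocab.values()))
        return (f"Tag this product. Prefer these existing tags: [{allowed}]. "
                f"Only invent a new tag if none fit.\nProduct: {product}")

    def tag(self, product: str) -> list[str]:
        tags: list[str] = []
        for raw in self.call_llm(self.build_prompt(product)):
            # Reuse the existing spelling if an equivalent tag is already known.
            canonical = self.vocab.setdefault(normalize(raw), raw.strip())
            if canonical not in tags:
                tags.append(canonical)
        return tags

# Usage with a stub standing in for a real LLM call.
replies = iter([["Running Shoes", "Men"], ["running-shoes", "Women"]])
pipe = TagPipeline(lambda prompt: next(replies))
print(pipe.tag("Trail runner, men's size 10"))  # ['Running Shoes', 'Men']
print(pipe.tag("Road runner, women's size 8"))  # ['Running Shoes', 'Women']
```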

RESEARCH · ↑ trending · Reddit r/LocalLLaMA · 4/22/2026

Dense vs. MoE gap is shrinking fast with the 3.6-27B release

Dense models still outperform MoE models overall, but MoE is catching up quickly, particularly on coding benchmarks. For users with 24GB of VRAM who need large context windows, MoE is becoming the more appealing option.

AI models LLMs Benchmarks MoE

RESEARCH · arXiv CS.CL · 1d ago

The Piggyback Hypothesis of Generalization: Explaining and Mitigating Emergent Misalignment

The Piggyback Hypothesis explains how chat-template tokens can cause emergent misalignment in LLMs by generalizing finetuned behavior to out-of-domain queries. Token-Regularized Finetuning (TReFT) is proposed to mitigate this issue, preserving in-domain learning while reducing misalignment across models and datasets.

Finetuning Emergent Misalignment LLMs Generalization

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/16/2026

Gemma 4 31b 3D geometry

The author expresses strong satisfaction with Gemma 4's quality, highlighting its coding ability and adaptability in conversations and reasoning. A test involving 3D model generation from an F1 car image demonstrated that Gemma significantly outperformed models like Claude Sonnet, Gemini Pro, and ChatGPT, which exhibited notable flaws.

AI models LLMs 3D Generation Gemma

NEWS · ↑ trending · Reddit r/LocalLLaMA · 4/23/2026

Note the new recommended sampling parameters for Qwen3.6 27B

This content highlights the newly recommended sampling parameters for the Qwen3.6 27B model, which differ from those for Qwen3.5. It gives specific settings for general tasks, precise coding tasks, and instruct mode, including temperature, top_p, and various penalties.

AI models LLMs generation model parameters
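
The post's exact recommended values are not reproduced here. To show what the parameters control, here is a generic sketch of one decoding step that applies a presence penalty, temperature scaling, and nucleus (top-p) filtering. Real inference engines differ in the order they apply these and in how each penalty is defined. Treat this as illustrative, not as Qwen's or any engine's implementation.

```python
import numpy as np

def sample_next(logits, generated, temperature=1.0, top_p=1.0,
                presence_penalty=0.0, rng=None):
    """One decoding step: presence penalty, then temperature, then top-p filtering."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=float).copy()
    # Presence penalty: subtract a flat amount from every token that already appeared.
    for tok in set(generated):
        logits[tok] -= presence_penalty
    if temperature == 0:
        return int(np.argmax(logits))  # greedy decoding
    z = logits / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    # Nucleus filtering: keep the smallest set of top tokens whose mass reaches top_p.
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]
    mask = np.zeros_like(probs)
    mask[keep] = probs[keep]
    return int(rng.choice(len(probs), p=mask / mask.sum()))

logits = [5.0, 1.0, 0.0, -1.0]
print(sample_next(logits, generated=[], temperature=0))                         # 0
print(sample_next(logits, generated=[0], temperature=0, presence_penalty=10.0))  # 1
```

Lower temperature sharpens the distribution, a smaller top_p discards the low-probability tail, and a larger presence penalty pushes the model away from tokens it has already used.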