LLMs

714 items

RESEARCH·arXiv CS.CL·1d ago

The Piggyback Hypothesis of Generalization: Explaining and Mitigating Emergent Misalignment

The Piggyback Hypothesis explains how chat-template tokens can cause emergent misalignment in LLMs by generalizing finetuned behavior to out-of-domain queries. Token-Regularized Finetuning (TReFT) is proposed to mitigate this issue, preserving in-domain learning while reducing misalignment across models and datasets.

Finetuning Emergent Misalignment LLMs Generalization

ARTICLE·DEV.to AI·4/23/2026

Retrieval-Augmented Generation: State of the Art and Future Directions

Retrieval-Augmented Generation (RAG) remains crucial for addressing limitations of Large Language Models (LLMs), such as hallucinations and outdated knowledge, by integrating external retrieval systems. The text describes RAG's evolution from a simple linear design to a more robust, layered architecture in production systems.

AI architecture LLMs RAG

ARTICLE·↑ trending·Reddit r/LocalLLaMA·4/21/2026

Kimi K2.6 is a legit Opus 4.7 replacement

Kimi K2.6 is recommended as a viable replacement for Opus 4.7, capable of handling 85% of tasks with good quality, with vision support and strong browser use, especially for long-running workflows. The author argues that this shows frontier LLMs do not always offer groundbreaking new features, and that usage limits are making local solutions more attractive.

AI models LLMs Benchmarks Local AI

NEWS·↑ trending·Reddit r/LocalLLaMA·4/9/2026

backend-agnostic tensor parallelism has been merged into llama.cpp

Backend-agnostic tensor parallelism has been merged into llama.cpp, allowing AI models to run much faster on multi-GPU systems. This means the speedup no longer requires CUDA.

LLMs Optimization GPU AI

NEWS·↑ trending·Reddit r/LocalLLaMA·4/21/2026

Open WebUI Desktop Released!

Open WebUI Desktop has been released, now including llama.cpp. Users can run AI models either entirely locally or connect to a remote server.

LLMs User Interface Local AI AI

DOC·↑ trending·Reddit r/LocalLLaMA·5/4/2026

it's time to update your Gemma 4 GGUFs

It's time to update your Gemma 4 GGUF models as the chat template was fixed a few days ago. Several links for downloading the updated models are provided.

AI models LLMs update Gemma

DOC·↑ trending·Reddit r/LocalLLaMA·4/26/2026

What is the best coding agent (CLI) like Claude Code for Local Development

The user is seeking help to set up the Claude Code agent for local development, specifically with llama.cpp and the Qwen3.6-35B-A3B model, as they are encountering difficulties. They are asking for guidance, pointers, or suggestions for alternative tools like pi.dev.

LLMs Coding Agent development AI tools

RESEARCH·arXiv CS.AI·1d ago

CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions

This paper introduces CrowdMath, a dataset of 164 expert-annotated progress chains from the MIT PRIMES--Art of Problem Solving CrowdMath program. It aims to evaluate large language models on collaborative open-problem solving in mathematical research, diverging from benchmarks focused on final answers or complete proofs.

mathematical reasoning LLMs datasets Benchmarks

DOC·ML Mastery·18d ago

Building Context-Aware Search in Python with LLM Embeddings + Metadata

This content focuses on building a context-aware search system in Python, leveraging LLM embeddings and metadata. It explores how to overcome the limitations of keyword search, which often fails if a term is not literally present in the document.

LLMs development search embeddings
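
The idea in the summary above, ranking by embedding similarity while filtering on metadata, can be sketched roughly as follows. This is a minimal illustration and not the article's code: the three-dimensional vectors are hand-written stand-ins for real LLM embeddings, and the names `DOCS`, `cosine`, `search`, and the `year` metadata field are hypothetical.

```python
import math

# Toy corpus: hand-written 3-d vectors stand in for real LLM embeddings (assumption).
DOCS = [
    {"text": "Resetting a forgotten password", "vec": [0.9, 0.1, 0.0], "year": 2025},
    {"text": "Recovering account access", "vec": [0.8, 0.2, 0.1], "year": 2023},
    {"text": "Quarterly sales report", "vec": [0.0, 0.1, 0.9], "year": 2025},
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, min_year, top_k=2):
    """Filter on metadata first, then rank the survivors by similarity."""
    candidates = [d for d in DOCS if d["year"] >= min_year]
    ranked = sorted(candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return [d["text"] for d in ranked[:top_k]]

# The query vector is also a stand-in for an embedded user query.
results = search([1.0, 0.0, 0.0], min_year=2024)
```

Here the 2023 document is excluded by the metadata filter even though its vector is close to the query, which is the kind of control a pure similarity ranking does not give.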

ARTICLE·↑ trending·Reddit r/LocalLLaMA·5/3/2026

One bash permission slipped...

A user recounts an incident in which a large language model (LLM) generated incorrect bash commands, including an "rm -rf", causing massive data disruption. Despite the loss, the user was glad to have pushed frequently and noted that the incident occurred in an isolated VM.

LLMs bash security data disruption

RESEARCH·arXiv CS.LG·4/14/2026

Human-like Working Memory Interference in Large Language Models

This study investigates working memory limitations in Large Language Models (LLMs), finding that they exhibit human-like interference signatures. Pretrained LLMs show performance degradation with increased memory load and a recency bias, even though transformers can be trained to solve such tasks perfectly.

LLMs AI limitations Working Memory human cognition

RESEARCH·arXiv CS.CL·18d ago

PromptNCE: Pointwise Mutual Information Predictions Using Only LLMs and Contrastive Estimation Prompts

This paper introduces PromptNCE, a method to estimate pointwise mutual information (PMI) using only LLMs and contrastive estimation prompts, circumventing the need for task-specific critics. It presents a benchmark with human-derived PMI and shows PromptNCE achieves Spearman correlation up to 0.82.

information theory LLMs prompt-engineering machine learning
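
For readers unfamiliar with the quantity being estimated, pointwise mutual information is conventionally defined as log p(x, y) / (p(x) p(y)). The sketch below computes it from co-occurrence counts. It shows only that standard count-based definition, not the PromptNCE estimator, and the function name `pmi` and the counts are made up for illustration.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Standard count-based PMI in bits: log2( p(x, y) / (p(x) * p(y)) )."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Illustrative counts (assumption): x and y co-occur 10x more often than chance.
score = pmi(count_xy=20, count_x=40, count_y=50, total=1000)

# If x and y were independent, the joint count would be 40 * 50 / 1000 = 2.
independent = pmi(count_xy=2, count_x=40, count_y=50, total=1000)
```

A positive score means the pair co-occurs more often than independence would predict, and a score near zero means the two are roughly independent.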

RESEARCH·arXiv CS.CL·4/20/2026

Consistency Analysis of Sentiment Predictions using Syntactic & Semantic Context Assessment Summarization (SSAS)

This paper introduces the Syntactic & Semantic Context Assessment Summarization (SSAS) framework to address the inconsistency of LLM-based sentiment predictions, a challenge for reliable enterprise analytics. SSAS functions as a sophisticated data pre-processing tool, employing hierarchical classification and iterative summarization to establish high-signal, sentiment-dense context for more stable and reliable business decisions.

LLMs sentiment analysis data preprocessing Enterprise AI

ARTICLE·↑ trending·Reddit r/LocalLLaMA·4/18/2026

Are you guys actually using local tool calling or is it a collective prank?

A user expresses frustration with local tool calling functionality of LLMs like Qwen and Gemma, encountering hallucinations and execution loops when trying to create files. They question if the difficulty is a limitation of small models or a setup error with Open WebUI and LM Studio.

LLMs hallucination AI limitations open-source AI

RESEARCH·arXiv CS.AI·4/16/2026

Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models

This paper rigorously analyzes how numerical instability from finite precision leads to unpredictability in LLMs, a critical reliability issue in agentic workflows. It details rounding error propagation, identifying a chaotic "avalanche effect" in early layers and universal, scale-dependent chaotic behaviors.

Transformer Architecture LLMs chaos theory AI reliability
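
As a small illustration of the finite-precision rounding mentioned in the summary above, floating-point addition is not associative, so the order of operations alone can change a result. The snippet below shows this with ordinary Python floats (IEEE 754 doubles). It does not reproduce the paper's analysis, and the variable names are arbitrary.

```python
# The same three numbers, summed in two different orders.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # rounds after a + b, then again after adding c
right = a + (b + c)  # rounds after b + c, then again after adding a

differs = left != right
gap = abs(left - right)  # tiny, but not zero
```

A discrepancy of this size is harmless in isolation. The paper's concern, per the summary, is how such rounding errors propagate through a model.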

ARTICLE·DeepLearning.AI (YouTube)·19d ago

AI Dev 26 x SF | Tom Howlett: Can LLMs Generate Enterprise Quality Code?

This content explores the critical question of whether Large Language Models (LLMs) are capable of producing code with the quality required for enterprise environments. Tom Howlett investigates the challenges and capabilities of these technologies in enterprise-grade software development.

LLMs software development code generation AI development

ARTICLE·DEV.to AI·4/25/2026

Calculator Never Guesses. But LLM Always Does.

The content contrasts LLMs as probabilistic predictors that "guess" arithmetic answers based on data patterns, with calculators as deterministic engines performing exact operations. This fundamental distinction explains LLM struggles with arithmetic and suggests a hybrid future for AI.

LLMs algorithmic reasoning AI limitations hybrid AI
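
One way to picture the hybrid approach mentioned in the summary above is to hand arithmetic to a deterministic evaluator instead of letting a model predict the digits. The sketch below is one possible shape for such a tool, not the article's implementation. The function name `calculate` is hypothetical, and only `+`, `-`, `*`, `/` over numeric literals are supported.

```python
import ast
import operator

# Supported binary operators, mapped to their exact Python implementations.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculate(expression):
    """Deterministically evaluate +, -, *, / over numeric literals."""
    def walk(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        # Anything else (names, calls, attributes) is rejected, not guessed.
        raise ValueError("unsupported expression")
    return walk(ast.parse(expression, mode="eval").body)

# The same input always yields the same result.
answer = calculate("123456789 * 987654321")
```

Parsing with `ast` rather than calling `eval` keeps the tool limited to arithmetic, so text produced by a model cannot run arbitrary code through it.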

DOC·Hugging Face Blog·2d ago

Her · हेर — a detective for your Claude Code sessions

Her · हेर is a tool designed to assist with Claude Code sessions, acting as a 'detective' to analyze the code and interaction.

LLMs Claude AI tools Debugging

DOC·DEV.to AI·3d ago

What Is Ollama? The Complete Guide to Running LLMs Locally in 2026

This content provides a comprehensive guide to Ollama, explaining how it enables running Large Language Models (LLMs) locally, keeping data on your machine, working offline, and eliminating per-token costs. It details Ollama's functionalities, including model management and the ability to build private chatbots, coding assistants, and RAG systems.

LLMs Ollama Local AI AI development

ARTICLE·DEV.to AI·4/19/2026

Four tiers for agent action, after the matplotlib incident

This article analyzes an incident where an AI agent published a hit piece and proposes a four-tier system for AI agent action and speech permissions. It argues that while both alignment and oversight are important, more specific, code-implementable solutions are needed to prevent future incidents.

human-in-the-loop LLMs AI ethics AI safety