LLMs

720 items

RESEARCHarXiv CS.AI·4/16/2026

Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models

This paper rigorously analyzes how numerical instability from finite precision leads to unpredictability in LLMs, a critical reliability issue in agentic workflows. It details rounding error propagation, identifying a chaotic "avalanche effect" in early layers and universal, scale-dependent chaotic behaviors.

Transformer Architecture LLMs chaos theory AI reliability

ARTICLEDeepLearning.AI (YouTube)·19d ago

AI Dev 26 x SF | Tom Howlett: Can LLMs Generate Enterprise Quality Code?

This content explores the critical question of whether Large Language Models (LLMs) are capable of producing code with the quality required for enterprise environments. Tom Howlett investigates the challenges and capabilities of these technologies in enterprise-grade software development.

LLMs software development code generation AI development

AI Dev 26 x SF | Tom Howlett: Can LLMs Generate Enterprise Quality Code?

ARTICLEDEV.to AI·4/25/2026

Calculator Never Guesses. But LLM Always Does.

The content contrasts LLMs as probabilistic predictors that "guess" arithmetic answers based on data patterns, with calculators as deterministic engines performing exact operations. This fundamental distinction explains LLM struggles with arithmetic and suggests a hybrid future for AI.

LLMs algorithmic reasoning AI limitations hybrid AI

DOCHugging Face Blog·2d ago

Her · हेर — a detective for your Claude Code sessions

Her · हेर is a tool designed to assist with Claude Code sessions, acting as a 'detective' to analyze the code and interaction.

LLMs Claude AI tools Debugging

DOCDEV.to AI·4d ago

What Is Ollama? The Complete Guide to Running LLMs Locally in 2026

This content provides a comprehensive guide to Ollama, explaining how it enables running Large Language Models (LLMs) locally, keeping data on your machine, working offline, and eliminating per-token costs. It details Ollama's functionalities, including model management and the ability to build private chatbots, coding assistants, and RAG systems.

LLMs Ollama Local AI AI development

ARTICLEDEV.to AI·4/19/2026

Four tiers for agent action, after the matplotlib incident

This article analyzes an incident where an AI agent published a hit piece and proposes a four-tier system for AI agent action and speech permissions. It argues that while both alignment and oversight are important, more specific, code-implementable solutions are needed to prevent future incidents.

human-in-the-loop LLMs AI ethics AI safety

RESEARCHDEV.to AI·3d ago

LLM Wire Format Benchmark: Which Format Can AI Actually Read and Write?

This research benchmarks how Large Language Models (LLMs) comprehend and generate data using various wire formats like JSON and TOON. Findings show that even advanced models struggle significantly, with JSON breaking at 500 records and TOON consistently causing errors in generation across multiple top-tier LLMs.

LLMs AI comprehension AI generation benchmarking

RESEARCHarXiv CS.AI·4/21/2026

From Subsumption to Satisfiability: LLM-Assisted Active Learning for OWL Ontologies

This paper introduces an LLM-assisted active learning method for OWL ontologies, where subsumption queries are reformulated into verbalized counter-concepts for LLMs. LLMs provide real-world examples to approximate these counter-concepts, ensuring that only Type II errors occur, which merely delay the construction process without introducing inconsistencies.

LLMs research ontologies active learning

RESEARCHDEV.to AI·14d ago

Meta-Stanford Survey: Code as Agent Harness Improves AI Reasoning

A survey from Meta, Stanford, and Illinois suggests that AI agents perform better when code functions as their main working layer, a concept termed an "agent harness". This approach shifts AI's focus from mere text prediction to executable reasoning, enhancing its ability to handle complex tasks and minimize errors.

agent harness LLMs code Reasoning

ARTICLEDEV.to AI·14d ago

CKP LLM: The Missing Layer Between Your AI Agent and Its Knowledge Base

The author developed CKP LLM to address AI coding agents' issue of loading excessive, irrelevant context from their knowledge bases, leading to decreased answer quality. This solution aims to optimize context management for personal or team knowledge bases, avoiding the overkill of RAG for smaller scales.

LLMs RAG Context knowledge management

NEWSDEV.to AI·14d ago

Claude.md Hits 152K GitHub Stars; Karpathy Notes LLM Failure Patterns

Claude.md, a single-file prompt template for Anthropic's Claude, has reached 152K GitHub stars. Andrej Karpathy highlighted that Large Language Models consistently fail in similar ways, emphasizing the need for standardized prompt templates for reliable interactions.

GitHub LLMs prompt-engineering AI tools

ARTICLEDEV.to AI·3d ago

Your Django App Has Years of Data. Here's How to Make AI Agents Actually Use It.

This article addresses the challenge of integrating years of Django app data with AI agents for natural language queries. It proposes a library solution to enable Large Language Models (LLMs) to effectively use relational data without complex ETL pipelines or separate vector stores.

LLMs RAG Django Data integration

RESEARCHarXiv CS.LG·4/22/2026

Discrete Tilt Matching

Discrete Tilt Matching (DTM) is a novel likelihood-free method for fine-tuning masked diffusion large language models (dLLMs), addressing the intractability of sequence-level marginal likelihoods in RL. It recasts fine-tuning as state-level matching, using a weighted cross-entropy objective with control variates for stability, and achieves strong results on various tasks like Sudoku and Countdown.

Diffusion Models LLMs reinforcement learning machine learning

DOCDEV.to AI·4/17/2026

How to Give an AI Agent Persistent Memory Across Sessions

The content discusses the critical problem of AI agents lacking persistent memory across sessions, a major cause for project failures. It criticizes the common approach of overloading the system prompt and promises to present a tested architectural solution that resolves this issue.

LLMs Persistent memory Architecture AI agents

ARTICLEDEV.to AI·4/22/2026

I was paying 3x too much for AI APIs. Here's what I changed.

The author realized they were overpaying for AI APIs in their side projects, using expensive flagship models for simple tasks. They drastically cut costs by switching to cheaper models like Gemini 2.5 Flash Lite for basic text transformation tasks, reducing per-request costs by 30x.

developer tips LLMs Cost Optimization AI APIs

DOCDEV.to AI·4/17/2026

How to Run LLMs Locally with Ollama — A Developer's Guide

This guide details how to run Large Language Models (LLMs) locally using Ollama, a free and private tool with an OpenAI-compatible API. It provides installation instructions for Linux, macOS, and Windows, along with commands to pull specific code-focused and general-purpose models.

LLMs Ollama local inference developer tools

RESEARCHarXiv CS.AI·20d ago

Tool-Augmented Agent for Closed-loop Optimization,Simulation,and Modeling Orchestration

The COSMO-Agent framework uses tool-augmented reinforcement learning to teach LLMs to bridge the CAD-CAE semantic gap, enabling closed-loop optimization in industrial design. It leverages an interactive RL environment for CAD generation, CAE solving, result parsing, and geometry revision, guided by a multi-constraint reward for feasibility and robustness.

LLMs CAD/CAE reinforcement learning Industrial design

RESEARCHarXiv CS.CL·20d ago

Shiny Stories, Hidden Struggles: Investigating the Representation of Disability Through the Lens of LLMs

This paper investigates how Large Language Models (LLMs) represent disability by simulating the perspectives of individuals with disabilities in generating social media posts. These posts are then compared with those written by real people with disabilities to analyze the perpetuation or overcorrection of biases.

LLMs disability representation Social Media

RESEARCHarXiv CS.LG·4/13/2026

Robust Reasoning Benchmark

This study proposes a new perturbation pipeline to evaluate the robustness of LLM reasoning, applying it to the AIME 2024 dataset. While frontier models show resilience, open-weight models suffer catastrophic accuracy drops, exposing structural fragility and potential issues with working memory or mechanical parsing.

robustness LLMs Model Evaluation Reasoning

DOCDEV.to AI·4/17/2026

Build a Self-Verification Loop for Claude Code

This content describes how to build a self-verification loop for code generated by the Claude AI model. The process aims to enhance the reliability and quality of AI-produced code through automated checking.

LLMs AI reliability code quality AI development