LLM

612 items

ARTICLE · DEV.to AI · 10d ago

ai, deepseek, machinelearning

This article details the complete history of LLM development in China from 2017 to 2026, illustrating how Chinese AI labs evolved into genuine competitors. It highlights milestones such as Baidu's ERNIE 1.0 and the impact of OpenAI's GPT-2, alongside challenges like GPU export restrictions.

DeepSeek machinelearning AI China

ARTICLE · DEV.to AI · 4/13/2026

The AI Engineer's Toolkit: Building a Production-Ready Mocking Layer

This article argues that AI development needs a robust mocking strategy to work around LLM latency, rate limits, and costs during testing and CI/CD. It proposes building a programmable, multi-purpose mocking layer from scratch so that AI features can be tested reliably.

CI/CD Testing mocking AI engineering

ARTICLE · DEV.to AI · 4/17/2026

Local LLM with Google Gemma: On-Device Inference Between Theory and Practice

This article explores the feasibility and challenges of running LLMs locally on smartphones, using Google Gemma and LiteRT-LM within a Flutter app. It focuses on the trade-offs in model format, runtime, and performance for on-device inference, highlighting the shift from 'if it can be done' to 'how it's done'.

mobile development on-device AI LLM

ARTICLE · DEV.to AI · 5/8/2026

How I Built a Markdown Conversion API for AI Agents in Rust (and deployed it for $0.000003 per request)

This article details the creation of CleanMark, an API designed to convert any URL into clean, structured Markdown for AI agents and LLMs. It addresses the challenge of feeding relevant context to AI models by stripping away extraneous web elements like navigation and ads.

RAG API Rust AI Agents

ARTICLE · DEV.to AI · 4/18/2026

Kiwi-chan Progress Report: Steady Mining!

This devlog gives an update on Kiwi-chan, a local-LLM Minecraft bot, covering its progress in gathering resources such as oak logs. It describes the challenging debugging process and the AI's loop of generating, executing, and rewriting its own code to recover from failures in the game world.

bot Minecraft Debugging AI development

ARTICLE · DEV.to AI · 4/25/2026

Building a Free Instagram Editor with Svelte 5, WASM, & Llama 3.1

The author shares the technical journey of building SMM Turbo, a free in-browser Instagram carousel editor. It leverages Svelte 5, WASM for background removal, and Llama 3.1 via Groq API, highlighting a unique approach of direct DOM manipulation instead of Canvas for rendering.

Image processing WebAssembly Svelte AI

ARTICLE · DEV.to AI · 6d ago

I Measured My Memory at 2,000 Words. Turns Out I Was Measuring from the Wrong Angle.

The author conducted experiments on an AI model (RWKV) to measure its dynamic memory window, initially concluding it was 2,000-3,000 words based on fact recall tests. However, a persistent detail suggests that the measurement angle or methodology might be flawed, challenging the initial conclusion.

AI models RWKV experimentation AI memory

ARTICLE · DEV.to AI · 5/1/2026

I Rebuilt Karpathy's NanoChat in JAX. Here's What XLA Gets Right and What It Gets Dead Wrong.

This article describes porting Andrej Karpathy's NanoChat from PyTorch to JAX/Flax NNX, achieving fast training on a single GPU along with TPU compatibility. It details XLA's advantages in eliminating Python overhead while highlighting its limitations regarding advanced features and debugging.

deep learning XLA JAX PyTorch

DOC · DEV.to AI · 5/5/2026

7 CLAUDE.md Rules That Make AI Write Idiomatic Kotlin (Not Java in a Kotlin Hat)

The article discusses how AI often generates Kotlin code that resembles Java and provides seven rules to guide Claude in writing idiomatic Kotlin. These rules aim to prevent common issues like improper null safety and outdated concurrency practices, thereby improving code quality in real-world projects.

code quality programming AI development LLM

ARTICLE · DEV.to AI · 4/8/2026

AIMock: One Mock Server For Your Entire AI Stack

AIMock is a mock server designed for agentic AI stacks that aims to solve the problem of unreliable, expensive, and slow tests that depend on real APIs. It extends LLMock's capabilities to cover multiple services (LLM, vector database, reranker, etc.), ensuring fast, free, and reliable tests for complex AI applications.

Agentic Stack Testing Mock Server CopilotKit

ARTICLE · DEV.to AI · 4/15/2026

I Ran 163 Benchmarks Across 10 LLMs So You Don't Have To. Here's What I Found

This article argues that teams commonly overpay for LLM inference because they skip proper benchmarking and pick models by popularity rather than cost-efficiency. Using a tool called CostGuard, the author ran 163 benchmarks across 10 models and found price differences of up to 200x between models such as Gemini 2.5 Flash and GPT-5.

AI models inference benchmarking Cost Optimization

ARTICLE · DEV.to AI · 4/10/2026

LLM API Pricing in 2026: I Put Every Major Model in One Table

The article analyzes LLM API prices in 2026, revealing a variation of up to 100x between models and compiling a detailed reference table. It compares input, output, and cache costs as well as performance (SWE-bench) for models such as DeepSeek V4, GPT-5.4, Claude, Gemini, Mistral, and Groq, highlighting budget options and outliers.

API pricing AI models comparison benchmarks

CASE · DEV.to AI · 4/10/2026

My AI pipeline had a 1M token context window. The output still got worse.

An AIOps investigation pipeline that used a 1M-token context window with Gemini saw its output quality get worse because of poor context selection. A fixed proportion of irrelevant code being loaded, especially from a legacy repository, was degrading the model's performance, showing that context quality matters more than quantity.

Context Selection Context window AIOps Pipeline

RESEARCH · arXiv CS.CL · 4/16/2026

Bi-Predictability: A Real-Time Signal for Monitoring LLM Interaction Integrity

This paper introduces bi-predictability (P) and the Information Digital Twin (IDT) architecture for real-time monitoring of LLM interaction integrity. It aims to continuously ensure structural coupling in multi-turn workflows, addressing the shortcomings of current evaluation methods that fail to detect gradual degradation.

information theory monitoring evaluation real-time AI

RESEARCH · arXiv CS.CL · 4/17/2026

MemGround: Long-Term Memory Evaluation Kit for Large Language Models in Gamified Scenarios

MemGround is a new rigorous long-term memory benchmark for LLMs, designed to overcome the limitations of static evaluations by using rich, gamified interactive scenarios. It features a three-tier hierarchical framework to assess different memory types and a multi-dimensional metric suite for comprehensive quantification.

evaluation gamification memory benchmark

RESEARCH · arXiv CS.CL · 4/17/2026

HUOZIIME: An On-Device LLM-enhanced Input Method for Deep Personalization

HUOZIIME is an innovative LLM-enhanced input method editor (IME) designed for mobile devices, aiming for deep, real-time personalization. It leverages a post-trained base LLM and a hierarchical memory mechanism to capture user-specific history, ensuring efficient and private operation under mobile constraints.

personalization Mobile AI on-device AI LLM

ARTICLE · DEV.to AI · 4/21/2026

CI Tests Won't Save You from MCP Schema Drift

CI tests are effective at detecting when an AI agent's code drifts from MCP server schemas. However, they cannot catch the more dangerous scenario where the server's tool schemas change independently, potentially leading to silent adaptation or failure of the LLM agent without triggering CI.

system reliability CI/CD schema drift AI development

ARTICLE · DEV.to AI · 21d ago

Judea Pearl's Ladder of Causation and the Limits of LLM Reasoning

This article explores the fundamental limitations of Large Language Models (LLMs) in causal reasoning, referencing Judea Pearl's Ladder of Causation. It highlights that LLMs often operate at the lowest rung of association, failing to identify true causes and instead patching correlations, which explains common errors in AI tools.

AI limitations Judea Pearl causality AI Reasoning

RESEARCH · DEV.to AI · 11d ago

Multi-Agent LLM System Discovers 29 Zero-Day Vulnerabilities in Open-Source Projects

Researchers developed FuzzingBrain V2, a multi-agent LLM system that autonomously discovers and reproduces software vulnerabilities. It found 29 zero-day vulnerabilities in 12 open-source projects, confirmed by maintainers, highlighting significant defensive and offensive implications.

open-source security AI systems vulnerability discovery

ARTICLE · DEV.to AI · 4/12/2026

Daemon that "Dreams" about your codebase so your AI agents stop hallucinating and save tokens

AI agents often hallucinate and waste tokens in large codebases due to excessive noise in the context window. Entroly is a local daemon that optimizes the context window by pre-loading answers and analyzing code architecture to prevent hallucinations and speed up AI agents.

Optimization Hallucination AI software development