LLM

609 items

ARTICLE·ML Mastery·11d ago

Serving Multiple Users at Once: How Continuous Batching Keeps LLM Inference Efficient

This article explores how continuous batching improves LLM inference efficiency, addressing the issues of static batching. It details dynamic scheduling and ragged batching to process multiple requests simultaneously.

inference deep learning efficiency Batching

ARTICLE·DEV.to AI·4/25/2026

Stop Building One Giant Prompt: A Better Way to Design LLM Systems

The article proposes moving away from single, monolithic prompts for LLM systems, advocating for a more modular and structured design approach. This method aims to improve system robustness, maintainability, and performance in complex AI applications.

System Design prompt-engineering Software Architecture AI development

ARTICLE·DEV.to AI·4/12/2026

Plug-and-Play Context Compression for Any LLM API — CRISP

CRISP is a Python library that compresses the context sent to LLM APIs to maximize information per token, aiming to reduce costs and latency. It offers a plug-and-play alternative to complex solutions like advanced RAG for addressing the problem of excessive context.

context compression RAG API python-library

NEWS·Together AI Blog·4/29/2026

DeepSeek-V4 Pro now available on Together AI

DeepSeek-V4 Pro is now accessible on Together AI, featuring a 512K context window and controllable reasoning modes. It offers cached-input pricing, making it suitable for long-context reasoning workloads like code agents and document intelligence.

AI models Cloud AI API LLM

RESEARCH·Hugging Face Blog·3/9/2026

Ulysses Sequence Parallelism: Training with Million-Token Contexts

This content covers Ulysses sequence parallelism, an innovative technique for training AI models. The focus is on enabling models to process million-token contexts efficiently.

deep learning Long Contexts Training High-Performance Computing

ARTICLE·Qwen Blog·1/26/2025

Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens

This post announces the release of the open-source Qwen2.5-1M models (7B and 14B), which support a context length of up to one million tokens. The release extends the Qwen models' ability to handle long contexts, following the upgrade of the Turbo version.

1M Context Open Source Artificial Intelligence language models

ARTICLE·Andrej Karpathy (YouTube)·2/5/2025

Deep Dive into LLMs like ChatGPT

This content provides a deep dive into Large Language Models (LLMs), using ChatGPT as a primary example. It explores the characteristics and functionalities of these advanced AI technologies.

ChatGPT artificial intelligence Generative AI LLM

ARTICLE·The AI Epiphany (YouTube)·7/31/2024

LLaMA 3 Deep Dive! (Thomas Scialom - Meta)

This content provides an in-depth analysis of LLaMA 3, Meta's advanced large language model. It features insights from Thomas Scialom, a key figure from Meta, offering a detailed exploration of its architecture, capabilities, and potential applications.

deep learning Llama 3 Meta Generative AI

RESEARCH·Yannic Kilcher (YouTube)·7/23/2025

Context Rot: How Increasing Input Tokens Impacts LLM Performance (Paper Analysis)

This analysis examines "Context Rot," a phenomenon where the performance of Large Language Models degrades as the length of their input context increases. It delves into how longer inputs reduce LLM accuracy and reliability.

AI models research Context window performance

ARTICLE·The AI Epiphany (YouTube)·7/3/2024

Best LLM? Qwen 2 LLM w/ author Junyang Lin

This content discusses Qwen 2, a large language model, potentially reviewing its capabilities or comparing it with other LLMs, featuring insights from its author, Junyang Lin.

AI models Qwen 2 large language models LLM

ARTICLE·DEV.to AI·4/23/2026

Why Most AI Teams Are Flying Blind: And What to Do About It

AI teams often find their agentic LLM applications, which perform well in demos, behave unexpectedly when deployed to real users. This common problem, where models exhibit weird outputs in production, stems from an evaluation gap and makes teams 'fly blind' regarding performance shifts and regressions.

Production AI Agentic AI AI evaluation AI development

ARTICLE·DEV.to AI·4/11/2026

The MCP Ecosystem in 2025: A Connector Opportunity Map

The Model Context Protocol (MCP), an open standard by Anthropic for integrating AI with tools, became established in 2025: it was adopted by major companies and donated to the Agentic AI Foundation. It operates on a client-server model using JSON-RPC to facilitate connections between AI applications and external services.

Open Standard Anthropic AI Integration AI

ARTICLE·DEV.to AI·4/24/2026

Why AI Terms Everyone Misses (But Experts Use Daily)

This guide aims to bridge the knowledge gap between casual AI users and experts by explaining essential concepts like MCP, RAG, LLMs, and tokens. It empowers readers to understand, build, evaluate, and scale AI systems effectively.

RAG AI concepts tokens AI development

ARTICLE·DEV.to AI·4/11/2026

Two Ends of the Token Budget: Caveman and Tool Search

The article discusses the single token budget of Claude Code's context window, which must accommodate all model inputs and outputs. It introduces the 'Caveman' plugin, which saves about 75% of output tokens by instructing the model to be more concise.

token budget Claude Code Context window AI

ARTICLE·DEV.to AI·4/12/2026

Review-First Skill Development — Building Complex AI Skills One Rule at a Time

The article introduces a 'Review-First Skill Development' methodology to enhance the quality of AI-generated code. This approach focuses on identifying and correcting specific errors, rule by rule, thereby facilitating the development of complex AI skills and the resolution of recurring problems.

code quality AI development LLM

ARTICLE·DEV.to AI·4/10/2026

Adding Authentication and Remote Support to a Local MCP Server

The text describes Model Context Protocol (MCP) servers, which connect large language models (LLMs) to external tools, covering both local and remote setups. It stresses the critical importance of proper authentication and authorization for remote MCP servers in order to mitigate security risks and enable secure collaboration.

Remote Support MCP server security Authentication

ARTICLE·DEV.to AI·4/9/2026

Claude Code Forgot My Code. Here's Why.

The article explains why Claude Code "forgets" the user's code: the finite context window fills up with lengthy output from CLI commands (such as npm install), which causes the actual code to be compressed or dropped. This shows how terminal "noise" can quickly consume an AI's context capacity.

Claude Code Context window development AI

ARTICLE·DEV.to AI·4/15/2026

🎙️ Building a Voice-Controlled AI Agent with Tool Execution

This article details the creation of a voice-controlled AI agent capable of understanding commands, executing tools like file creation or code generation, and responding naturally via a web interface. The system utilizes OpenAI Whisper for speech-to-text, an LLM for decision-making, and Streamlit for the interactive UI.

tool execution AI agent Speech-to-Text voice control

ARTICLE·DEV.to AI·23d ago

AI Wrote This Design Hoodie — built by an AI agent

This content introduces a hoodie featuring a design generated by an AI agent, with a cheeky slogan "My AI wrote this design. And I'm not even sorry." It highlights the concept of outsourcing creativity to large language models, openly disclosing that both the product and its description were autonomously AI-generated.

AI Creativity AI-generated content AI products AI agents

ARTICLE·DEV.to AI·4/12/2026

Your RAG pipeline doesn't tell you when it's wrong. Here's how to fix that.

This article discusses how RAG pipelines fail to signal when an LLM response is incorrect, even when retrieval confidence is high. It presents verification tools such as the Wauldo API, which compare the claims in a response against the source text to check their accuracy.

hallucination accuracy RAG AI evaluation