LLM

612 items

DOC·DEV.to AI·24d ago

LLM Model Routing: How to Automatically Pick the Right AI Model for Each Task

The content explains LLM model routing, a strategy to automatically direct AI requests to the most cost-effective model based on task complexity. This approach can lead to substantial cost savings compared to using a single, powerful LLM for all tasks.

AI models model routing efficiency cost optimization
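
A minimal routing sketch follows. It assumes a simple length-plus-keyword heuristic stands in for the complexity classifier. The tier names, prices, keyword list and `threshold` are all hypothetical placeholders and are not taken from the article.

```python
from dataclasses import dataclass

# Hypothetical model tiers; names and per-1M-token prices are placeholders,
# not real vendor pricing.
@dataclass(frozen=True)
class ModelTier:
    name: str
    usd_per_million_tokens: float

SMALL = ModelTier("small-model", 0.25)
LARGE = ModelTier("large-model", 15.0)

# Hypothetical "hard task" signals; a production router might use a trained classifier.
HARD_KEYWORDS = ("prove", "refactor", "architecture", "multi-step", "debug")

def complexity_score(prompt: str) -> float:
    """Return a rough score in [0, 1]: longer prompts and hard keywords score higher."""
    length_part = min(len(prompt.split()) / 400, 1.0)
    keyword_hits = sum(kw in prompt.lower() for kw in HARD_KEYWORDS)
    keyword_part = min(keyword_hits / 2, 1.0)
    return 0.5 * length_part + 0.5 * keyword_part

def route(prompt: str, threshold: float = 0.3) -> ModelTier:
    """Send simple prompts to the cheap tier and complex ones to the strong tier."""
    return LARGE if complexity_score(prompt) >= threshold else SMALL
```

With these settings, `route("Translate 'hello' to French")` selects `small-model`. `route("Debug and refactor this multi-step pipeline")` selects `large-model`. The savings depend on what share of traffic the cheap tier can handle acceptably.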

NEWS·DEV.to AI·4/27/2026

DeepSeek V4 Pro Just Dropped — Here's What Changed for AI Agents

DeepSeek V4 Pro launched on April 24, 2026, under an MIT license. It has 1.6T parameters, a 1M-token context window, and Think/Non-Think modes. It handles long-context tasks well and shows stronger multi-step planning, which makes it an efficient, competitively priced option for AI agent workloads.

deepseek-v4-pro API integration AI agents pricing

DOC·Hugging Face (YouTube)·7d ago

How to Create an LLM Dataset | FineWeb Overview

This content provides a guide to creating datasets for Large Language Models (LLMs). It includes an overview of FineWeb, Hugging Face's large-scale web-text pretraining dataset, as a reference point for this process.

learning datasets AI development FineWeb

ARTICLE·DEV.to AI·4/15/2026

Kiwi-chan Progress Report: Steady Mining!

This devlog details the progress of Kiwi-chan, an autonomous LLM navigating Minecraft, during a 4-hour mining sprint. The primary challenge was the AI's consistent failure to pick up mined cobblestone, despite proper code and strict rules.

failure analysis Minecraft Autonomous AI AI development

DOC·DEV.to AI·16d ago

Local LLM Setup Guide (v10)

This guide provides practical steps for setting up Large Language Models (LLMs) locally on a Linux system, detailing hardware requirements and performance benchmarks. It compares frameworks like llama.cpp, Ollama, vLLM, and LocalAI, recommending llama.cpp with setup instructions for model deployment.

AI frameworks learning Linux local deployment

DOC·DEV.to AI·4/27/2026

Running Local LLMs in Your Development Workflow

This 2026 guide demonstrates how to integrate local LLMs, specifically Ollama, into a development workflow to address privacy, cost, and latency concerns. It provides practical steps for installation, model pulling, and usage in tasks like code review and test generation.

development workflow Ollama privacy Local AI
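
The sketch below shows how a script in a dev workflow might call a local model. It uses only the standard library and targets Ollama's local REST endpoint (`/api/generate` on the default port 11434). It assumes a model was already fetched with `ollama pull llama3.1`. The helper names `build_request` and `generate` are hypothetical and do not come from the guide.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request (requires a running Ollama server) and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

A code-review step could call `generate("llama3.1", "Review this diff:\n" + diff_text)`, where `diff_text` is whatever your tooling supplies. The request never leaves the machine, which is the privacy argument the guide makes.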

DOC·DEV.to AI·4/27/2026

I Built a PDF Q&A App with RAG, FAISS, and Llama 3.1 — Here's Everything I Learned

This article details building an end-to-end RAG application that allows users to chat with PDFs. It leverages FAISS for vector search, sentence-transformers for embeddings, and Llama 3.1 via Groq for free LLM inference.

FAISS RAG Llama 3.1 embeddings
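
The retrieval step of such a pipeline can be sketched in pure Python. Exhaustive inner-product search over unit-normalized vectors is what a flat FAISS inner-product index (`IndexFlatIP`) computes, so the ranking equals cosine similarity. The toy vectors below stand in for sentence-transformers embeddings. `top_k` and `build_prompt` are hypothetical helpers, not the article's code.

```python
import math

def normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def top_k(query: list[float], corpus: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Brute-force inner-product search over normalized vectors, best matches first."""
    q = normalize(query)
    scored = [
        (i, sum(a * b for a, b in zip(q, normalize(doc))))
        for i, doc in enumerate(corpus)
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def build_prompt(question: str, chunks: list[str], hits: list[tuple[int, float]]) -> str:
    """Insert the retrieved PDF chunks into the prompt sent to the LLM."""
    context = "\n\n".join(chunks[i] for i, _ in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real app the vectors would come from an embedding model and the index from FAISS. The ranking logic is the same, but FAISS makes it fast at scale.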

RESEARCH·DEV.to AI·4/12/2026

AI Agent Skill Security Report — 2026-04-12

The report details automated security audits of AI agent skill ecosystems, classifying thousands of skills as safe, suspicious, or malicious. It highlights specific malicious skills and outlines their key risks, such as dynamic code evaluation, along with findings from LLM-based semantic detection.

Malicious AI threats AI security AI agents

ARTICLE·DEV.to AI·5/4/2026

Anthropic Message Batching: When 50% Off Is Worth the Latency

The Anthropic Message Batches API is designed for processing large evaluation sets, allowing up to 100,000 requests in a single POST with a 50% cost reduction compared to the standard token rate. The primary trade-off is latency, but batches typically complete in under an hour, making it ideal for non-urgent tasks.

API Anthropic batch processing cost optimization
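
The trade-off comes down to simple arithmetic, sketched below. The per-million-token prices are hypothetical placeholders, so check current pricing for your model. The 50% discount and the 100,000-requests-per-batch limit come from the summary above. `request_cost` and `split_into_batches` are hypothetical helpers.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_usd_per_mtok: float, output_usd_per_mtok: float,
                 batched: bool = False) -> float:
    """USD cost of one request; batching halves the standard token rate."""
    cost = (input_tokens * input_usd_per_mtok
            + output_tokens * output_usd_per_mtok) / 1_000_000
    return cost * 0.5 if batched else cost

def split_into_batches(items: list, max_size: int = 100_000) -> list[list]:
    """Split an evaluation set into chunks no larger than the per-batch request limit."""
    return [items[i:i + max_size] for i in range(0, len(items), max_size)]
```

Consider a hypothetical eval of 10,000 requests at 2,000 input and 500 output tokens each, priced at $3 and $15 per million tokens. It costs about $135 at the standard rate and about $67.50 batched. A 250,000-item set would need three batches.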

ARTICLE·DEV.to AI·5/4/2026

n8n vs Real AI Agents: Why Your Workflow Isn't an Agent (Yet)

The article differentiates n8n workflows from real AI agents by highlighting their behavior when errors occur. While n8n workflows stop, real AI agents can independently adapt and retry, demonstrating true agency.

workflow automation Autonomous systems n8n AI agents

RESEARCH·arXiv CS.LG·4/28/2026

KARL: Mitigating Hallucinations in LLMs via Knowledge-Boundary-Aware Reinforcement Learning

KARL is a novel framework designed to mitigate hallucinations in large language models by enabling them to appropriately abstain from questions beyond their knowledge. It achieves this through a Knowledge-Boundary-Aware Reward that dynamically estimates the model's knowledge and a Two-Stage RL Training Strategy that prevents excessive caution.

reinforcement learning hallucinations AI safety LLM

RESEARCH·arXiv CS.CL·5/5/2026

Model Organisms Are Leaky: Perplexity Differencing Often Reveals Finetuning Objectives

This paper introduces a perplexity-based method to reveal finetuning objectives of large language models, particularly those exhibiting "model organism" behaviors. This method leverages models' tendency to overgeneralize, generating and ranking completions to identify the finetuning goals without prior assumptions.

Finetuning Perplexity model safety Research Methods
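
A minimal sketch of the ranking idea follows, assuming per-token log-probabilities are already available from a base model and a finetuned model for the same completions. Each completion is scored by log(ppl_base / ppl_finetuned), so text the finetuned model finds unusually likely rises to the top. This is an illustrative stand-in; the paper's exact scoring may differ. The log-prob values in the test are made up.

```python
import math

def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(-mean log-probability) over a completion's tokens."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def rank_by_perplexity_gap(
    completions: dict[str, tuple[list[float], list[float]]],
) -> list[tuple[str, float]]:
    """Rank completions by log(ppl_base / ppl_finetuned), largest gap first.

    Each value is (base_model_logprobs, finetuned_model_logprobs) for the same tokens.
    A large positive gap means finetuning made that text much more likely, which
    hints at what the finetuning objective pushed toward.
    """
    gaps = {
        text: math.log(perplexity(base) / perplexity(ft))
        for text, (base, ft) in completions.items()
    }
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)
```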

RESEARCH·arXiv CS.LG·4/28/2026

Parameter Efficiency Is Not Memory Efficiency: Rethinking Fine-Tuning for On-Device LLM Adaptation

This research challenges the assumption that Parameter-Efficient Fine-Tuning (PEFT) equates to memory efficiency for on-device LLMs, showing existing methods can still lead to out-of-memory errors. It introduces LARS (Low-memory Activation-Rank Subspace), a novel framework that decouples memory consumption from sequence length by constraining the activation subspace, achieving an average 33.54% memory footprint reduction.

Memory Optimization on-device AI fine-tuning PEFT

RESEARCH·arXiv CS.CL·5/5/2026

Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison Triggers They Fail to Detect

This paper introduces XHS-SCoRE, a reader-grounded benchmark for detecting if a text-only Xiaohongshu (RedNote) post elicits upward, downward, or neutral social comparison. The study finds a consistent mismatch between LLM generation fluency and reliable detection ability, indicating that LLMs generate social-comparison triggers they fail to robustly detect.

benchmarking Natural Language Processing social comparison AI

RESEARCH·arXiv CS.AI·5/1/2026

When Your LLM Reaches End-of-Life: A Framework for Confident Model Migration in Production Systems

This research introduces a framework for migrating production LLM systems when their underlying models reach end-of-life or need replacement. It employs a Bayesian statistical approach to calibrate automated evaluation metrics against human judgments, ensuring confident model comparison with limited manual data.

Production AI model migration Evaluation Metrics LLM
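
One of the simplest Bayesian ways to calibrate an automated metric against a small set of human judgments is a conjugate Beta-Binomial model of their agreement rate. It is sketched below as an illustration of the general idea, not as the paper's model. The uniform Beta(1, 1) prior and the class name `AgreementPosterior` are assumptions.

```python
from dataclasses import dataclass

@dataclass
class AgreementPosterior:
    """Beta posterior over how often the automated metric agrees with human raters."""
    alpha: float = 1.0  # uniform Beta(1, 1) prior (an assumption)
    beta: float = 1.0

    def update(self, agreements: int, disagreements: int) -> None:
        # Conjugate Beta-Binomial update: add observed counts to the prior.
        self.alpha += agreements
        self.beta += disagreements

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        a, b = self.alpha, self.beta
        return a * b / ((a + b) ** 2 * (a + b + 1))
```

Suppose the metric matches the human label on 27 of 30 hand-checked outputs. The posterior is then Beta(28, 4), with a mean agreement of 0.875. The variance shows how much uncertainty remains from so few labels, which is what lets a migration decision rest on limited manual data.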

ARTICLE·DEV.to AI·4/8/2026

Building a RAG System in Rails — Retrieval-Augmented Generation from Scratch

This article walks through building a RAG (Retrieval-Augmented Generation) pipeline from scratch in Rails. It covers document ingestion, chunking, embedding generation, vector search with pgvector, and using OpenAI to generate answers grounded in specific content.

OpenAI Rails RAG tutorial
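
The chunking step can be sketched as overlapping word windows. The sketch is written in Python rather than Ruby to keep one language across examples. Chunking by words, along with the `size` and `overlap` defaults, is an assumption; the article may chunk by characters or tokens. `chunk_words` is a hypothetical helper.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk. Each chunk would then be embedded and stored in a pgvector column for similarity search.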

RESEARCH·DEV.to AI·4/12/2026

Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following

Point-Bind and Point-LLM are introduced as novel approaches to align point cloud data with multiple modalities. The objective is to enhance 3D understanding, 3D content generation, and instruction following within a 3D context.

point cloud 3D AI Multi-modality Generative AI

RESEARCH·arXiv CS.CL·4/9/2026

LLM-Augmented Knowledge Base Construction For Root Cause Analysis

This study evaluates three Large Language Model (LLM) methodologies for building a Root Cause Analysis (RCA) knowledge base from support tickets: fine-tuning, RAG, and a hybrid approach. Experiments on a real-world industrial dataset show that the generated knowledge base speeds up RCA tasks and improves network resilience.

RAG knowledge base fine-tuning LLM

RESEARCH·arXiv CS.AI·4/17/2026

Simulating Human Cognition: Heartbeat-Driven Autonomous Thinking Activity Scheduling for LLM-based AI systems

This paper introduces Heartbeat-Driven Autonomous Thinking Activity Scheduling for LLM agents, aiming to overcome rigid and reactive control flows. The system enables proactive, adaptive, and continuous self-regulation by dynamically orchestrating cognitive modules, mirroring the rhythm of human cognition.

AI architecture self-regulation cognitive AI AI agents

RESEARCH·arXiv CS.LG·5/1/2026

Detecting Clinical Discrepancies in Health Coaching Agents: A Dual-Stream Memory and Reconciliation Architecture

LLM agents in healthcare face the challenge of reconciling patient self-reports (prone to bias) and electronic health records (validated but often stale). This research introduces a dual-stream memory architecture to strictly separate and reconcile these sources, detecting discrepancies to enhance clinical safety.

patient safety data management Healthcare AI agents