LLM

611 items

CASE · DEV.to AI · 24d ago

Building an AI WhatsApp Bot for Business: Lessons from SARA

SARA is a WhatsApp AI assistant for SMBs that handles customer inquiries, qualifies leads, and schedules appointments 24/7. Built with Baileys, Ollama, and LLaMA 3, it leverages WhatsApp's high engagement rates and supports multiple languages.

AI assistant SMBs WhatsApp bot customer service AI

NEWS · DEV.to AI · 4/17/2026

Everything You Need to Know About Claude Opus 4.7

Anthropic has released Claude Opus 4.7, a direct upgrade to Opus 4.6 that maintains the same price and API shape. This new version offers significant improvements in coding and agentic tasks, showing notably better results in production benchmarks.

Anthropic AI Model LLM

ARTICLE · DEV.to AI · 15d ago

I Got 96% Recall on LLM Hallucination Detection With No ML Model – Just 50 Lines of Python

This article presents a method for detecting LLM hallucinations using statistical signals and just 50 lines of Python code, achieving 96% recall without needing to train an additional ML model. The approach avoids the computational overhead and opacity of other methods like SelfCheckGPT.

hallucination detection statistical analysis machine learning Python

ARTICLE · DEV.to AI · 25d ago

Why your local LLM knowledge base gives bad answers (and how to fix it)

Local LLMs often give poor answers from personal knowledge bases, not because of the model itself but because of issues in the retrieval layer. This article examines the problem, explains how the retrieval pipeline works, and covers how to fix it.

Retrieval Augmented Generation knowledge base Local AI LLM

ARTICLE · DEV.to AI · 4/17/2026

Claude Opus 4.6 vs 4.7: Every Difference Side by Side

Claude Opus 4.7 introduces significant upgrades including 3x vision resolution, a new 'xhigh' effort slot, removed sampling parameters, and a new tokenizer with higher token usage. It also features behavioral shifts with more literal prompts and fewer tool calls, alongside three breaking changes requiring immediate migration from 4.6 code.

API changes AI updates Anthropic model comparison

RESEARCH · DEV.to AI · 4/17/2026

A comprehensive evaluation of ChatGPT's zero-shot Text-to-SQL capability

This study evaluates ChatGPT's zero-shot Text-to-SQL capability, that is, its ability to convert natural language into SQL queries without prior examples. It explores the model's performance and limitations on this complex task.

evaluation Text-to-SQL ChatGPT benchmark

ARTICLE · DEV.to AI · 5/9/2026

10 Trending Reddit Posts About AI Agents — What the Community Is Actually Talking About (May 2026)

This article analyzes the 10 most trending Reddit posts about AI agents in May 2026, revealing community discussions focused on debugging and production reality over hype. It highlights insights from subreddits like r/LocalLLaMA regarding practical agent implementation.

development Reddit community discussion AI Agents

RESEARCH · DEV.to AI · 5/7/2026

Stateless scheduler doubles LLM training speed

Fine-tuning large language models often faces bottlenecks from rigid GPU allocation and inefficient pipeline parallelism. A new stateless scheduler, RoundPipe, optimizes training by dynamically dispatching computation stages across a pool of GPUs, effectively doubling LLM training speed.

deep learning machine learning GPU optimization Parallelism

DOC · DEV.to AI · 26d ago

Spin Up a Multi‑Machine MCP Server Mesh with Cord in 10 Minutes

This guide demonstrates how to quickly set up a multi-machine MCP server mesh using Cord agents and an LLM runtime in under ten minutes. It focuses on achieving fast discovery, secure authentication, and zero-copy data sharing for distributed AI agent stacks without writing custom glue code.

tutorials learning distributed systems AI Agents

RESEARCH · arXiv CS.AI · 22d ago

LinAlg-Bench: A Forensic Benchmark Revealing Structural Failure Modes in LLM Mathematical Reasoning

LinAlg-Bench is a new diagnostic benchmark evaluating 10 frontier large language models (LLMs) on structured linear algebra computation, revealing structural failure modes. It assesses LLM performance across a dimensional gradient of matrices, classifying failures into ten primary error types and identifying a behavioral threshold at 4x4 matrices.

mathematical reasoning benchmarking linear algebra AI evaluation

RESEARCH · arXiv CS.AI · 5/9/2026

LaTA: A Drop-in, FERPA-Compliant Local-LLM Autograder for Upper-Division STEM Coursework

LaTA is an open-source, FERPA-compliant local-LLM autograder for upper-division STEM courses, designed to run on-premises to mitigate data risk. It uses a four-stage pipeline and a local chain-of-thought LLM to grade LaTeX-native student work against reference solutions using a YAML rubric.

open-source learning autograding security

RESEARCH · arXiv CS.LG · 4/23/2026

WorkflowGen: an adaptive workflow generation mechanism driven by trajectory experience

WorkflowGen addresses the high overhead and instability of LLM agents in complex tasks by proposing an adaptive, trajectory experience-driven framework for workflow generation. It captures full execution trajectories to extract reusable knowledge and performs lightweight generation on variable nodes, significantly reducing token usage and improving efficiency.

workflow automation efficiency AI Agents LLM

RESEARCH · arXiv CS.LG · 4/23/2026

Transparent Screening for LLM Inference and Training Impacts

This paper presents a transparent screening framework for estimating the inference and training impacts of large language models under limited observability. It aims to improve comparability, transparency, and reproducibility by providing an auditable proxy methodology for opaque proprietary services.

transparency sustainability LLM

RESEARCH · arXiv CS.CL · 5/7/2026

Not All That Is Fluent Is Factual: Investigating Hallucinations of Large Language Models in Academic Writing

This study investigates hallucinations in Large Language Models (ChatGPT, Grok, Gemini, Copilot) when generating academic content, using 80 prompts across four categories. A novel weighted metric, the Hallucination Index (HI), was introduced to measure factual accuracy and reference validity.

academic writing AI quality Model Evaluation hallucinations

DOC · DEV.to AI · 4/23/2026

How to Build Karpathy's LLM Wiki: The Complete Guide to AI-Maintained Knowledge Bases

This guide describes Andrej Karpathy's LLM Wiki pattern, where an LLM agent builds and maintains a structured markdown knowledge base from raw sources. It provides a complete setup for creating such an AI-maintained knowledge system, offering an alternative to traditional RAG.

tutorials knowledge management AI Agents LLM

RESEARCH · arXiv CS.CL · 28d ago

RETUYT-INCO at BEA 2026 Shared Task 2: Meta-prompting in Rubric-based Scoring for German

This paper introduces RETUYT-INCO's participation in the BEA 2026 shared task on rubric-based short answer scoring for German, proposing a "Meta-prompting" method where an LLM generates custom prompts for grading. The team achieved 6th place in Track 1 and 4th place in Track 3, demonstrating the effectiveness of their LLM-based and other approaches.

German learning Short Answer Scoring Meta-prompting

CASE · DEV.to AI · 4/11/2026

How we turned a flaky OpenClaw agent into a deterministic, 7.2x cheaper production workflow

This article details how to reduce LLM costs in OpenClaw workflows by a factor of 7.2. The solution replaced constant LLM orchestration with one-time compilation of workflows using AI Native Lang (AINL), delivering significant efficiency gains and savings in production.

workflow automation cost reduction AI Agents AINL

ARTICLE · DEV.to AI · 5/9/2026

Steno: Opensource AI powered intelligence layer for your confidential conversations.

Steno is an open-source, privacy-focused AI notepad that handles data privately, with no cloud usage and no usage limits. Version 0.3.0 now lets users query across all their notes over time.

privacy security AI opensource

ARTICLE · DEV.to AI · 5d ago

Why LLM Agents Still Can't Query NoSQL Databases

LLMs excel at querying SQL databases due to SQL's precise nature and abundant training data, making it a natural interface. However, LLM agents struggle significantly with NoSQL databases, a common production data store, primarily because NoSQL lacks the specificity and consistent syntax found in SQL.

NoSQL SQL databases AI Agents

ARTICLE · DEV.to AI · 5d ago

AI API Cost Attribution in 2026: How to Track LLM Spend by Team and Request

Managing AI API costs in 2026 requires detailed attribution per team and per request, not just per account. This means propagating a stable ownership contract (such as trace_id and owner_team) across every hop from the gateway to the model providers, so that attribution does not fail when the bill arrives.

cost management attribution API Management FinOps
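
As a rough illustration of the ownership contract mentioned in the item above, the sketch below carries trace_id and owner_team as request headers and attributes cost per team and per request. It is not taken from the article: the header names (x-trace-id, x-owner-team), the Ownership class, and the record_usage and spend_by_team helpers are all hypothetical names chosen for this example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical header names; a real gateway may use different ones.
TRACE_HEADER = "x-trace-id"
TEAM_HEADER = "x-owner-team"


@dataclass(frozen=True)
class Ownership:
    """The ownership contract that must survive every hop."""
    trace_id: str
    owner_team: str

    def to_headers(self) -> dict:
        # Serialize the contract so the next hop can forward it unchanged.
        return {TRACE_HEADER: self.trace_id, TEAM_HEADER: self.owner_team}

    @classmethod
    def from_headers(cls, headers: dict) -> "Ownership":
        # Fail loudly instead of silently billing an "unknown" owner.
        missing = [h for h in (TRACE_HEADER, TEAM_HEADER) if h not in headers]
        if missing:
            raise ValueError(f"missing ownership headers: {missing}")
        return cls(headers[TRACE_HEADER], headers[TEAM_HEADER])


def record_usage(ledger: dict, headers: dict, cost_usd: float) -> Ownership:
    """Attribute one model call to its (team, request) pair."""
    owner = Ownership.from_headers(headers)
    ledger[(owner.owner_team, owner.trace_id)] += cost_usd
    return owner


def spend_by_team(ledger: dict) -> dict:
    """Roll per-request costs up to per-team totals."""
    totals = defaultdict(float)
    for (team, _trace_id), cost in ledger.items():
        totals[team] += cost
    return dict(totals)


if __name__ == "__main__":
    ledger = defaultdict(float)
    headers = Ownership("req-001", "search").to_headers()
    record_usage(ledger, headers, 0.25)  # first model call in the request
    record_usage(ledger, headers, 0.5)   # second call, same request
    print(spend_by_team(ledger))
```

Rejecting calls that lack the headers is one possible design choice: it surfaces a broken hop at request time rather than when the invoice is reconciled.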