LLMs

715 items

CASE · DEV.to AI · 3d ago

We Built an AI That Remembers Everything Your Team Forgets

An AI system called ECHO was developed to transform Slack chaos into a living Knowledge Graph, addressing team forgetfulness. It uses LLMs for entity extraction, builds relationships into a graph, and applies temporal decay to maintain the relevance of team expertise.

LLMs Knowledge Graph team collaboration knowledge management
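
The temporal decay mentioned in the summary above could look roughly like the sketch below. It assumes exponential decay with a half-life; the article summary does not say which decay function ECHO uses, and the names `decayed_weight`, `half_life_days`, and the sample edges are hypothetical.

```python
import math


def decayed_weight(base_weight: float, age_days: float,
                   half_life_days: float = 30.0) -> float:
    """Halve an edge weight every `half_life_days` (assumed decay rule)."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return base_weight * math.pow(0.5, age_days / half_life_days)


# Hypothetical graph edges: (person, topic, base weight, days since last mention)
edges = [
    ("alice", "kubernetes", 1.0, 0.0),
    ("bob", "kubernetes", 1.0, 30.0),
    ("carol", "kubernetes", 1.0, 90.0),
]

# Rank people by how fresh their link to the topic is.
ranked = sorted(
    ((person, decayed_weight(weight, age)) for person, _topic, weight, age in edges),
    key=lambda pair: pair[1],
    reverse=True,
)
```

With a half-life, an expert who has not touched a topic in three months still appears in the ranking, but below someone who discussed it today.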

RESEARCH · DEV.to AI · 4/13/2026

TALM: Tool Augmented Language Models

TALM (Tool Augmented Language Models) focuses on integrating external tools with large language models to augment their capabilities. This approach allows LLMs to perform complex tasks more effectively by leveraging specialized functions and real-world interactions.

language models LLMs NLP Tool Augmentation

ARTICLE · DEV.to AI · 3d ago

How I built an intent drift detector for LLM agents

This article details the creation of SIP (State Integrity Protocol), a tool designed to detect intent and semantic drift in LLM agent outputs. It addresses the silent failure problem of AI agents by automatically checking for discrepancies between expected and actual outcomes.

LLMs Semantic Drift Intent Detection AI agents

RESEARCH · arXiv CS.CL · 4/13/2026

Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models

This study evaluates the performance of prompting strategies (chain-of-thought and zero-shot) in extended reasoning LLMs like Grok-4.1, varying the sampling temperature across 39 challenging mathematical problems. It found that zero-shot prompting peaks at moderate temperatures, while chain-of-thought performs best at temperature extremes, significantly increasing the benefit of extended reasoning.

mathematical reasoning LLMs Prompting Temperature

ARTICLE · DEV.to AI · 3d ago

AI agent memory management: beyond the context window

This article addresses the critical issue of AI agents forgetting information due to context window limitations, where older messages are evicted. It highlights that this is a memory architecture problem, not hallucination, and proposes going beyond treating the context window as the agent's sole memory.

AI architecture LLMs Context window memory management
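
As a rough illustration of the eviction problem described above, the sketch below keeps a token-budgeted window and moves evicted messages into a searchable archive instead of dropping them. This is an assumed design, not the article's implementation; `WindowedMemory`, `count_tokens`, and the whitespace token count are hypothetical stand-ins.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: count whitespace-separated words.
    return len(text.split())


class WindowedMemory:
    """Context window with a token budget plus an archive for evicted messages."""

    def __init__(self, budget: int):
        self.budget = budget
        self.window = deque()   # messages still visible to the model
        self.archive = []       # evicted messages, kept for later recall

    def add(self, message: str) -> None:
        self.window.append(message)
        # Evict the oldest messages until the window fits the budget again,
        # always keeping at least the newest message.
        while (sum(count_tokens(m) for m in self.window) > self.budget
               and len(self.window) > 1):
            self.archive.append(self.window.popleft())

    def recall(self, keyword: str) -> list:
        # Naive keyword lookup over evicted messages.
        return [m for m in self.archive if keyword.lower() in m.lower()]


memory = WindowedMemory(budget=6)
memory.add("my name is Ada")
memory.add("I like graphs")
```

After the second `add`, the first message no longer fits in the window, but `memory.recall("ada")` can still retrieve it from the archive.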

RESEARCH · arXiv CS.CL · 4/23/2026

Do Hallucination Neurons Generalize? Evidence from Cross-Domain Transfer in LLMs

Recent research indicates that "hallucination neurons" (H-neurons) predicting LLM hallucinations do not generalize across different knowledge domains. This suggests that hallucination may not be a single mechanism with a universal neural signature, but rather context-dependent.

LLMs hallucination AI safety AI Research

RESEARCH · arXiv CS.CL · 4d ago

Improving Heart-Focused Medical Question Answering in LLMs via Variance-Aware Rubric Rewards with GRPO

This research investigates optimizing Large Language Models (LLMs) for heart-focused medical question answering using Group Relative Policy Optimization (GRPO) for post-training. A Variance-Aware Reward Framework is proposed to enhance rubric-based supervision with continuous analytical reward functions.

LLMs Medical Question Answering GRPO healthcare AI

ARTICLE · DEV.to AI · 4/13/2026

I built a data platform that lets AI agents query 2,500+ verified datasets

The creator built autario, a data platform making 2,500+ verified public datasets from various sources queryable for humans, apps, and especially AI agents. This platform aims to prevent LLM hallucinations by allowing real-time data querying and chart publishing with verified information.

verified data LLMs Data Platform data querying

RESEARCH · arXiv CS.CL · 19d ago

MedicalBench: Evaluating Large Language Models Toward Improved Medical Concept Extraction

This paper introduces MedicalBench, a new benchmark for evaluating Large Language Models in medical concept extraction from electronic health records. It focuses on implicit medical reasoning and evidence grounding, addressing the challenge of identifying concepts not explicitly stated.

LLMs concept extraction Healthcare Benchmarking

RESEARCH · arXiv CS.AI · 12d ago

Why LLMs Fail at Causal Discovery and How Interventional Agents Escape

This research paper reveals that large language models fundamentally fail at causal discovery due to their inability to distinguish between causal graphs generating similar observational data. It introduces a "kernel obstruction theorem" to formalize this intrinsic limitation of current learning paradigms.

LLMs research Causal Discovery machine learning

RESEARCH · arXiv CS.CL · 4/16/2026

Mathematical Reasoning Enhanced LLM for Formula Derivation: A Case Study on Fiber NLI Modelling

This research introduces a mathematical reasoning-enhanced generative AI approach for deriving optical communication formulas, specifically for fiber nonlinear interference modelling. By guiding an LLM with structured prompts, the study successfully reconstructed known expressions and derived a novel approximation, demonstrating both physical consistency and practical accuracy.

mathematical reasoning LLMs Scientific Discovery Generative AI

RESEARCH · arXiv CS.CL · 21d ago

Retrieval-Based Multi-Label Legal Annotation: Extensible, Data-Efficient and Hallucination-Free

The paper proposes casting multi-label legal annotation as a retrieval task, using frozen models and k-nearest neighbors to assign labels. This method achieves competitive accuracy and strong data efficiency across legal datasets, significantly reducing computational costs compared to fine-tuning large language models.

Multi-label Classification LLMs Legal AI Data efficiency
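
The retrieval framing summarized above can be sketched as k-nearest-neighbor voting over precomputed embeddings. This is a minimal sketch under stated assumptions, not the paper's method: the two-dimensional vectors stand in for embeddings from a frozen model, and `knn_labels`, `min_votes`, and the label names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_labels(query, index, k=3, min_votes=2):
    """Assign every label that at least `min_votes` of the k nearest items carry."""
    neighbors = sorted(index, key=lambda item: cosine(query, item[0]),
                       reverse=True)[:k]
    votes = Counter(label for _vec, labels in neighbors for label in labels)
    return sorted(label for label, n in votes.items() if n >= min_votes)


# Toy index of (embedding, label set) pairs; values are made up for illustration.
index = [
    ([1.0, 0.0], {"contract", "liability"}),
    ([0.9, 0.1], {"contract"}),
    ([0.8, 0.3], {"contract", "liability"}),
    ([0.0, 1.0], {"privacy"}),
]

predicted = knn_labels([1.0, 0.05], index)
```

Because the embedding model stays frozen, supporting a new label only requires adding labeled examples to the index, with no retraining.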

RESEARCH · arXiv CS.CL · 13d ago

Pretraining Data Exposure in Large Language Models: A Survey of Membership Inference, Data Contamination, and Security Implications

This paper offers the first unified survey of Pretraining Data Exposure (PDE) in Large Language Models (LLMs), covering data contamination and membership inference. It formalizes PDE, reviews attack and defense methods, and highlights open challenges to ensure evaluation integrity and protect privacy.

LLMs membership inference data privacy security

RESEARCH · arXiv CS.AI · 5d ago

VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark

We introduce VAMPS, a new benchmark for multimodal large language models (MLLMs) focusing on visual-assisted mathematical problem solving. It contains 1,168 bilingual multiple-choice question-answer pairs drawn from Iranian University Entrance Exams, where plotting provides a natural solution strategy.

multimodal AI LLMs Benchmarking mathematics

ARTICLE · DEV.to AI · 4/21/2026

How we handle LLM context window limits without losing conversation quality

This article addresses LLM context window limits, which cause chatbots to forget information and agents to lose track of goals even as models offer larger windows. It argues that simply expanding the context window is insufficient because of prohibitive cost and increased latency, and promises to share production strategies and trade-offs.

LLMs Context window Cost Optimization performance

ARTICLE · DEV.to AI · 4/8/2026

I Built a Tool to Test Whether Multiple LLMs Working Together Can Beat a Single Model

Occursus Benchmark is an open-source benchmarking platform that tests whether multiple LLMs collaborating can outperform a single model. The tool evaluates 22 orchestration strategies across four LLM providers, using double-blind judging to score output quality.

multi-model AI performance evaluation Orchestration LLMs

RESEARCH · arXiv CS.AI · 5d ago

StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

StepPRM-RTL is a novel framework that enhances LLM-based RTL code generation by combining stepwise trajectory modeling, process-reward modeling (PRM), and retrieval-augmented fine-tuning (RAFT). It uses dense feedback from a PRM to guide reinforcement-style updates and Monte Carlo Tree Search (MCTS) to enrich the training dataset.

LLMs reinforcement learning code generation RTL Synthesis

ARTICLE · DEV.to AI · 4/11/2026

Why Chunking Is the Biggest Mistake in RAG Systems

This article criticizes the 'chunking' technique in RAG systems, highlighting its problems with context loss and errors in structured documents such as clinical records. It proposes structure-aware indexing and summarization as more effective methods for handling complex data.

chunking LLMs RAG Document Intelligence

ARTICLE · DEV.to AI · 4d ago

This article delves into cost-effective alternatives to GPT-4o, revealing how other AI models can offer significant savings for developers. It provides direct cost comparisons, highlighting options like DeepSeek V4 Flash and Qwen3-32B.

LLMs API Management development Cost Optimization

DOC · ML Mastery · 5d ago

Using Scikit-LLM with Open-Source LLMs

This article provides a tutorial on integrating locally hosted open-source large language models such as Mistral, Gemma, and Llama 3 for language tasks like text classification. It demonstrates how to achieve this for free using Ollama and the Scikit-LLM Python library.

Open Source LLMs learning Python