LLMs

722 items

RESEARCH · arXiv CS.CL · 28d ago

ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV

This paper introduces ClinicalBench, a 400-question benchmark designed to stress-test assertion-aware retrieval for cross-admission clinical QA on MIMIC-IV using real EHR notes. It also presents EpiKG, a patient knowledge graph system that incorporates assertion and temporality tags to route retrieval by question intent, demonstrating significant performance improvements across various LLMs.

28
RESEARCH · arXiv CS.CL · 7d ago

Fixing FOLIO and MALLS: Verified Annotations and an LLM-assisted Framework to Focus Human Relabeling

A systematic inspection of the FOLIO and MALLS validation splits revealed high rates of incorrect FOL formalizations and ambiguous NL sentences, distorting AI model evaluation. The authors developed and released corrected ground truths for these datasets, demonstrating how annotation errors impact the evaluation of state-of-the-art LLMs.

28
DOC · DEV.to AI · 4/22/2026

RAG Systems in Production: Building Enterprise Knowledge Search

Retrieval-Augmented Generation (RAG) systems are presented as a revolutionary approach for enterprises to build intelligent knowledge systems by combining LLMs with domain-specific knowledge. This guide, based on Groovy Web's experience with Fortune 500 companies, covers the comprehensive process of building and deploying production-ready RAG systems, from architecture to monitoring.

28
RESEARCH · arXiv CS.AI · 4/13/2026

SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks

Sequence-Level PPO (SPPO) addresses the limitations of standard token-level PPO in long-horizon LLM reasoning tasks by reformulating the process as a Sequence-Level Contextual Bandit problem. This approach uses a decoupled scalar value function to derive low-variance advantage signals, offering improved sample efficiency and stability without the high computational overhead of critic-free alternatives.

28
RESEARCH · arXiv CS.CL · 4/10/2026

Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs

This paper proposes a reasoning-based refinement framework that uses LLMs as semantic judges to validate and restructure the outputs of unsupervised text clustering algorithms. The framework comprises coherence verification, redundancy adjudication, and label grounding, aiming to improve cluster quality without labeled data.

28
ARTICLE · DEV.to AI · 4/18/2026

Multi-Agent Architecture: Specialist Routing in an Autonomous Task System

This article details a specialist routing architecture for autonomous agent systems, arguing that using a single powerful generalist model for all tasks is inefficient and costly. Drawing on a production deployment, it describes how classifying requests and dispatching them to specialized agents reduces expenses and produces cleaner, more contextually relevant outputs.

28