Information Extraction

RESEARCH · arXiv CS.CL · 4/17/2026

EviSearch: A Human in the Loop System for Extracting and Auditing Clinical Evidence for Systematic Reviews

EviSearch is a multi-agent AI system that automates high-precision extraction and auditing of clinical evidence from trial PDFs for systematic reviews. It uses specialized agents and a reconciliation module for human verification and correction, which ensures per-cell provenance and improves accuracy over baselines.

RESEARCH · arXiv CS.CL · 4/30/2026

Information Extraction from Electricity Invoices with General-Purpose Large Language Models

This study evaluates general-purpose LLMs such as Gemini 1.5 Pro and Mistral-small for information extraction from Spanish electricity invoices and finds that prompt quality matters more than hyperparameter tuning. Few-shot strategies yield significantly better results than zero-shot approaches, with a performance gap exceeding 19 percentage points.

RESEARCH · arXiv CS.CL · 5/7/2026

Self-Prompting Small Language Models for Privacy-Sensitive Clinical Information Extraction

This research presents a locally deployable framework that enables small language models to extract privacy-sensitive clinical entities from unstructured dental notes using self-generated and refined prompts. The study evaluated open-weight models and reports high F1 scores for Qwen2.5-14B-Instruct and Llama-3.1-8B-Instruct after supervised fine-tuning and direct preference optimization.

RESEARCH · arXiv CS.CL · 5/6/2026

MedStruct-S: A Benchmark for Key Discovery, Key-Conditioned QA and Semi-Structured Extraction from OCR Clinical Reports

MedStruct-S is a new benchmark for semi-structured information extraction from OCR-derived clinical reports, addressing challenges such as heterogeneous key representations and OCR noise. It aims to evaluate model robustness in real-world settings across three tasks: key discovery, key-conditioned QA, and key-value pair extraction.
