LLMs

RESEARCH · arXiv CS.AI · 4/7/2026

Toward Full Autonomous Laboratory Instrumentation Control with Large Language Models

This work explores the potential of Large Language Models (LLMs), such as ChatGPT, and AI agents for automating and controlling laboratory instrumentation. It shows how these tools lower programming barriers and how they can evolve into autonomous agents capable of operating scientific equipment and refining control strategies.

RESEARCH · arXiv CS.CL · 4/9/2026

The Stepwise Informativeness Assumption: Why are Entropy Dynamics and Reasoning Correlated in LLMs?

This paper investigates the correlation between internal entropy dynamics and correct reasoning in Large Language Models (LLMs), a puzzle that remains unresolved. It proposes the Stepwise Informativeness Assumption (SIA), which holds that models reason correctly by accumulating information relevant to the answer through informative prefixes, a process reinforced by standard training methods.
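"Entropy dynamics" here refers to how the entropy of the model's next-token distribution changes across generation steps. As a rough illustration only (this is not the paper's code), the sketch below computes a per-step Shannon entropy trajectory from raw logits; the function names are hypothetical and the logits are toy values rather than real model outputs.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: List[float]) -> float:
    # Shannon entropy in nats; zero-probability entries contribute nothing
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_trajectory(step_logits: List[List[float]]) -> List[float]:
    # One entropy value per generation step (hypothetical helper)
    return [entropy(softmax(logits)) for logits in step_logits]


traj = entropy_trajectory([[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]])
print(traj)
```

With uniform logits over four tokens, the first step has entropy log 4 ≈ 1.386 nats; the sharply peaked second step is much lower, which is the kind of drop one would inspect when relating entropy to reasoning progress.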

RESEARCH · arXiv CS.CL · 4/20/2026

LLM attribution analysis across different fine-tuning strategies and model scales for automated code compliance

This paper analyzes the interpretive behaviors of LLMs for automated code compliance using perturbation-based attribution analysis, comparing different fine-tuning strategies and model scales. Results show full fine-tuning yields more focused attribution patterns, and larger models prioritize specific textual elements like numerical constraints.
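Perturbation-based attribution scores each input token by how much the model's output changes when that token is perturbed. A minimal leave-one-out (occlusion) sketch is shown below, assuming a generic scoring function; `toy_score` is a hypothetical stand-in for a model's compliance score that simply counts numeric tokens, and this is not the paper's exact perturbation scheme.

```python
from typing import Callable, List


def occlusion_attribution(
    tokens: List[str],
    score: Callable[[List[str]], float],
    mask: str = "[MASK]",
) -> List[float]:
    """Leave-one-out attribution: score drop when token i is replaced by a mask."""
    base = score(tokens)
    attributions = []
    for i in range(len(tokens)):
        perturbed = tokens[:i] + [mask] + tokens[i + 1:]
        attributions.append(base - score(perturbed))
    return attributions


def toy_score(tokens: List[str]) -> float:
    # Hypothetical stand-in for a model's output: counts numeric tokens
    return float(sum(tok.isdigit() for tok in tokens))


tokens = ["height", "must", "not", "exceed", "30", "m"]
print(occlusion_attribution(tokens, toy_score))  # → [0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```

Here all of the attribution lands on the numerical constraint "30", mirroring (by construction) the kind of focus on numerical constraints the summary reports for larger models.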

RESEARCH · arXiv CS.AI · 4/9/2026

Weakly Supervised Distillation of Hallucination Signals into Transformer Representations

This paper proposes a new method for hallucination detection in LLMs that distills external supervision signals directly into the model's internal representations during training. To do so, it introduces a weak-supervision framework that combines substring matching, embedding similarity, and an LLM-as-a-judge, and uses it to build a 15,000-sample dataset for this purpose.
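The summary names three weak signals but not how they are combined. The sketch below assumes a simple majority vote over the three signals; the vote rule, the 0.8 cosine threshold, the toy embedding vectors, and the stubbed `judge` callable are all hypothetical choices for illustration, not the paper's pipeline.

```python
import math
from typing import Callable, Sequence


def substring_vote(answer: str, reference: str) -> bool:
    # Signal 1: the reference answer appears (case-insensitive) in the model output
    return reference.lower() in answer.lower()


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def embedding_vote(ans_vec: Sequence[float], ref_vec: Sequence[float],
                   threshold: float = 0.8) -> bool:
    # Signal 2: answer and reference embeddings are close (threshold is hypothetical)
    return cosine(ans_vec, ref_vec) >= threshold


def weak_label(answer: str, reference: str,
               ans_vec: Sequence[float], ref_vec: Sequence[float],
               judge: Callable[[str, str], bool]) -> int:
    """Return 1 (hallucination) unless at least two of three signals say 'supported'."""
    votes = [
        substring_vote(answer, reference),
        embedding_vote(ans_vec, ref_vec),
        judge(answer, reference),  # Signal 3: LLM-as-a-judge, stubbed here
    ]
    return 0 if sum(votes) >= 2 else 1


def stub_judge(answer: str, reference: str) -> bool:
    # Hypothetical stand-in for an LLM judge call; always says "not supported"
    return False


print(weak_label("The capital of France is Paris.", "Paris", [1.0, 0.0], [1.0, 0.0], stub_judge))
print(weak_label("The capital of France is Lyon.", "Paris", [1.0, 0.0], [0.0, 1.0], stub_judge))
```

In the first call, substring matching and embedding similarity outvote the stub judge, so the sample is labeled as grounded (0); in the second, no signal supports the answer, so it is labeled as a hallucination (1).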

RESEARCH · arXiv CS.CL · 4/15/2026

LLMs Struggle with Abstract Meaning Comprehension More Than Expected

This research investigates LLMs' ability to comprehend abstract meanings, revealing that models like GPT-4o struggle in zero-shot, one-shot, and few-shot settings, while fine-tuned models like BERT and RoBERTa perform better. It proposes a bidirectional attention classifier that significantly enhances the accuracy of fine-tuned models in interpreting abstract concepts.

RESEARCH · arXiv CS.AI · 5/9/2026

When Helpfulness Becomes Sycophancy: Sycophancy is a Boundary Failure Between Social Alignment and Epistemic Integrity in Large Language Models

This position paper argues that sycophancy in LLMs is a boundary failure between social alignment and epistemic integrity. It proposes that sycophancy is not merely agreement, but alignment behavior that displaces independent epistemic judgment, outlining a three-condition framework to define it.

RESEARCH · arXiv CS.CL · 4/23/2026

Saying More Than They Know: A Framework for Quantifying Epistemic-Rhetorical Miscalibration in Large Language Models

This research introduces a framework to quantify the miscalibration between rhetorical intensity and epistemic grounding in Large Language Models. Applying an epistemic-rhetorical marker taxonomy to argumentative texts, the study reveals a distinct LLM epistemic signature, showing models overuse certain rhetorical devices and perform hesitancy markers more frequently than human authors.

RESEARCH · arXiv CS.AI · 5/7/2026

Temporal Reasoning Is Not the Bottleneck: A Probabilistic Inconsistency Framework for Neuro-Symbolic QA

This research paper argues that the bottleneck in large language models' temporal reasoning is not logical deduction but the conversion of unstructured text into event representations. It introduces a neuro-symbolic question-answering framework that uses a Probabilistic Inconsistency Signal (PIS) to decouple semantic extraction from symbolic reasoning, improving performance.

RESEARCH · arXiv CS.CL · 20d ago

Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification

This research examines how various lower-bit quantization levels impact LLaMA-3.1's performance in qualitative analysis, noting that low-bit models often produce hallucinations. It proposes a quantization-aware multi-pass prompt verification method to enhance accuracy by systematically reducing hallucinations and filtering unreliable content.
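One common way to filter unreliable output across multiple passes is to keep only the items that recur in enough independent runs. The sketch below assumes that each pass of a quantized model yields a list of extracted qualitative codes and that an item is kept when it appears in at least `min_agreement` passes; this agreement rule and the example codes are hypothetical, not the paper's verification procedure.

```python
from collections import Counter
from typing import List


def multi_pass_filter(passes: List[List[str]], min_agreement: int) -> List[str]:
    """Keep items that appear in at least `min_agreement` passes (sorted for stable output)."""
    # set(p) ensures an item repeated within one pass counts only once for that pass
    counts = Counter(item for p in passes for item in set(p))
    return sorted(item for item, c in counts.items() if c >= min_agreement)


# Hypothetical codes extracted from the same transcript in three passes
passes = [["cost", "trust"], ["cost", "delay"], ["cost", "trust", "trust"]]
print(multi_pass_filter(passes, min_agreement=2))  # → ['cost', 'trust']
```

Raising `min_agreement` trades recall for precision: items that only a single pass produced, which are more likely to be hallucinated, are dropped first.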

RESEARCH · arXiv CS.CL · 28d ago

ClinicalBench: Stress-Testing Assertion-Aware Retrieval for Cross-Admission Clinical QA on MIMIC-IV

This paper introduces ClinicalBench, a 400-question benchmark designed to stress-test assertion-aware retrieval for cross-admission clinical QA on MIMIC-IV using real EHR notes. It also presents EpiKG, a patient knowledge graph system that incorporates assertion and temporality tags to route retrieval by question intent, demonstrating significant performance improvements across various LLMs.
