
Natural Language Processing

168 items

RESEARCH · arXiv CS.CL · 4/15/2026

Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs

This paper introduces the Weighted Syntactic and Semantic Context Assessment Summary (wSSAS), a deterministic framework to optimize text categorization using LLMs. It addresses LLM limitations by organizing text hierarchically and employing a Signal-to-Noise Ratio (SNR) to focus on high-value semantic features.

RESEARCH · arXiv CS.CL · 5/5/2026

Psychologically Potent, Computationally Invisible: LLMs Generate Social-Comparison Triggers They Fail to Detect

This paper introduces XHS-SCoRE, a reader-grounded benchmark for detecting whether a text-only Xiaohongshu (RedNote) post elicits upward, downward, or neutral social comparison. The study finds a consistent mismatch between how fluently LLMs generate social-comparison triggers and how reliably they detect them.

RESEARCH · arXiv CS.CL · 4/10/2026

TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization

This study presents the TR-EduVSum dataset, focused on Turkish educational videos, and proposes the AutoMUP method. The method generates gold-standard summaries automatically and reproducibly from multiple human summaries, using meaning-unit clustering and statistical consensus modeling.

RESEARCH · arXiv CS.CL · 5/5/2026

Compared to What? Baselines and Metrics for Counterfactual Prompting

This work argues that effects observed under "counterfactual prompting" in LLMs cannot be attributed to a targeted factor without a baseline of meaning-preserving text modifications that establishes the model's general sensitivity. The research shows that prediction flip rates from surgically changing patient gender are statistically indistinguishable from the rates induced by simply paraphrasing inputs, so the results do not support concluding that the models are specially sensitive to patient gender.

RESEARCH · arXiv CS.CL · 4/27/2026

An End-to-End Ukrainian RAG for Local Deployment. Optimized Hybrid Search and Lightweight Generation

This paper introduces an efficient Retrieval-Augmented Generation (RAG) system for Ukrainian document question answering, which placed 2nd in the UNLP 2026 Shared Task. It combines a custom hybrid search with a specialized Ukrainian language model, compressed for high-quality, verifiable local deployment on resource-constrained hardware.

RESEARCH · arXiv CS.CL · 4/9/2026

Beyond Facts: Benchmarking Distributional Reading Comprehension in Large Language Models

This paper introduces Text2DistBench, a new benchmark for evaluating the ability of LLMs to infer distributional knowledge from natural language. Unlike traditional benchmarks, it focuses on real-world tasks, such as estimating sentiment proportions or identifying frequent topics in text collections like YouTube comments.

RESEARCH · arXiv CS.CL · 4/30/2026

MATH-PT: A Math Reasoning Benchmark for European and Brazilian Portuguese

This paper introduces MATH-PT, a novel dataset of 1,729 mathematical problems in European and Brazilian Portuguese, to address the linguistic bias in LLM mathematical reasoning evaluations. The benchmark reveals that frontier reasoning models perform strongly on multiple-choice questions but worse on open-ended ones.

RESEARCH · arXiv CS.CL · 5/1/2026

BatteryPass-12K: The First Dataset for the Novel Digital Battery Passport Conformance Task

This paper introduces BatteryPass-12K, the first public dataset for the novel task of digital battery passport (DBP) conformance classification, addressing a critical need ahead of new EU regulations. It benchmarks 22 language models and finds that "Thinking models" like GPT-5.4 achieve the best performance and that few-shot examples significantly improve results on this challenging task.

RESEARCH · arXiv CS.CL · 4/16/2026

A Proactive EMR Assistant for Doctor-Patient Dialogue: Streaming ASR, Belief Stabilization, and Preliminary Controlled Evaluation

This paper introduces a proactive EMR assistant for doctor-patient dialogue, designed to overcome limitations of passive systems by integrating streaming ASR, belief stabilization, and action planning. The system was evaluated in a preliminary controlled setting, achieving an F1 of 0.84 and Recall@5 of 0.87.

RESEARCH · arXiv CS.CL · 4/30/2026

CogRAG+: Cognitive-Level Guided Diagnosis and Remediation of Memory and Reasoning Deficiencies in Professional Exam QA

CogRAG+ is a training-free framework designed to diagnose and remediate memory and reasoning deficiencies in large language models for professional exam QA. It decouples and aligns retrieval and reasoning with human cognitive hierarchies, employing Reinforced Retrieval and cognition-stratified Constrained Reasoning to enhance accuracy and consistency.
