LLMs

RESEARCH · arXiv CS.CL · 5/8/2026

A Few Good Clauses: Comparing LLMs vs Domain-Trained Small Language Models on Structured Contract Extraction

This paper evaluates whether a domain-trained Small Language Model (SLM) can outperform frontier Large Language Models on structured contract extraction at radically lower cost. Olava Extract achieved the strongest aggregate performance and highest precision scores, reducing inference cost by 78% to 97% compared with the frontier models tested.

RESEARCH · arXiv CS.LG · 4/20/2026

The Spectral Geometry of Thought: Phase Transitions, Instruction Reversal, Token-Level Dynamics, and Perfect Correctness Prediction in How Transformers Reason

This paper reports spectral phase transitions in the hidden activation spaces of large language models during reasoning versus factual recall. A systematic spectral analysis across 11 models and 5 architecture families identifies seven core phenomena, including reasoning spectral compression and instruction-tuning spectral reversal.

RESEARCH · arXiv CS.AI · 4/27/2026

When Does LLM Self-Correction Help? A Control-Theoretic Markov Diagnostic and Verify-First Intervention

This research frames LLM self-correction as a cybernetic feedback loop, using a two-state Markov model to determine when iterative refinement helps versus hurts. It identifies a critical EIR threshold (<= 0.5%) separating beneficial from harmful self-correction, showing that only a few models improve, while others like GPT-5 degrade.
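
To make the two-state idea concrete, here is a minimal sketch of a generic correct/incorrect Markov chain. It is not the paper's model: the parameter names `p_fix` (chance a wrong answer becomes right in one refinement round) and `p_break` (chance a right answer becomes wrong) are assumptions introduced for illustration, and the summary's EIR threshold is not reproduced here.

```python
def accuracy_after_rounds(a0, p_fix, p_break, rounds):
    """Return the accuracy after 0..rounds refinement rounds.

    a0      -- initial probability that the answer is correct
    p_fix   -- hypothetical P(incorrect -> correct) per round
    p_break -- hypothetical P(correct -> incorrect) per round
    """
    trajectory = [a0]
    a = a0
    for _ in range(rounds):
        # Correct answers survive with 1 - p_break; wrong ones are fixed with p_fix.
        a = a * (1.0 - p_break) + (1.0 - a) * p_fix
        trajectory.append(a)
    return trajectory


def stationary_accuracy(p_fix, p_break):
    """Long-run accuracy of the chain, p_fix / (p_fix + p_break)."""
    if p_fix + p_break == 0:
        raise ValueError("chain never changes state; accuracy stays at a0")
    return p_fix / (p_fix + p_break)


def self_correction_helps(a0, p_fix, p_break):
    """True if the long-run accuracy exceeds the starting accuracy."""
    return stationary_accuracy(p_fix, p_break) > a0
```

Under these assumptions, a model that starts at 80% accuracy with `p_fix = 0.1` and `p_break = 0.1` drifts toward 50% and is hurt by iteration, while the same model with `p_fix = 0.3` and `p_break = 0.01` drifts toward roughly 97% and benefits.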

RESEARCH · arXiv CS.CL · 4/27/2026

When Cow Urine Cures Constipation on YouTube: Limits of LLMs in Detecting Culture-specific Health Misinformation

This research examines how LLMs struggle to detect culture-specific health misinformation, using cow urine discourse in India as a case study. It finds that LLMs, primarily trained on Western data, are ill-equipped to analyze content blending traditional language with pseudo-scientific claims, highlighting the need for cultural competency in AI-assisted analysis.

RESEARCH · arXiv CS.LG · 4/17/2026

TOPCELL: Topology Optimization of Standard Cell via LLMs

TOPCELL is a framework that uses Large Language Models (LLMs) to optimize transistor topology in standard cell design, overcoming the limitations of traditional exhaustive search methods. By reformulating topology exploration as a generative task and fine-tuning with GRPO, it significantly improves the discovery of routable, physically aware layouts for advanced technology nodes.

ARTICLE · DEV.to AI · 29d ago

When I started running models locally, I thought quantization meant squeezing more into RAM. Turns o

The article advises against defaulting to Q4_K_M for local LLM inference, emphasizing that optimal performance comes from testing quantization levels tailored to specific workflows. It suggests that aggressive quantization like Q3_K_S can significantly cut latency with imperceptible quality loss for many tasks, though context length presents a trade-off.

RESEARCH · arXiv CS.AI · 4/20/2026

Structured Abductive-Deductive-Inductive Reasoning for LLMs via Algebraic Invariants

This research introduces a symbolic reasoning scaffold to address systematic limitations in LLMs' structured logical reasoning, such as conflating hypothesis generation and propagating weak inferences. It operationalizes Peirce's tripartite inference and enforces logical consistency through algebraic invariants, notably the 'Weakest Link bound', which prevents a conclusion's reliability from exceeding that of its least-supported premise.
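
As a rough illustration of the Weakest Link bound described above, the sketch below caps a conclusion's reliability at the minimum of its premises. The function names and the representation of reliability as a number in [0, 1] are assumptions made for this example; the paper may define and enforce the invariant differently.

```python
def weakest_link_bound(premise_reliabilities):
    """Upper bound on a conclusion's reliability: its least-supported premise."""
    if not premise_reliabilities:
        raise ValueError("a conclusion needs at least one premise")
    return min(premise_reliabilities)


def clamp_conclusion(claimed_reliability, premise_reliabilities):
    """Lower a claimed reliability to the bound if it exceeds it."""
    return min(claimed_reliability, weakest_link_bound(premise_reliabilities))


# A conclusion claimed at 0.95 that rests on premises rated 0.9, 0.6 and 0.8
# is capped at 0.6, the reliability of its weakest premise.
capped = clamp_conclusion(0.95, [0.9, 0.6, 0.8])
```

A claim that is already below the bound passes through unchanged, so the check only ever lowers overconfident conclusions.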

RESEARCH · arXiv CS.CL · 4/24/2026

Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech

This paper introduces Hierarchical Policy Optimization (HPO) for Simultaneous Speech Translation (SST) using LLMs, addressing challenges like high computational cost and imperfect supervised fine-tuning data. HPO employs a hierarchical reward to balance translation quality and latency, demonstrating substantial improvements in COMET and MetricX scores.

RESEARCH · arXiv CS.CL · 5/4/2026

How Frontier LLMs Adapt to Neurodivergence Context: A Measurement Framework for Surface vs. Structural Change in System-Prompted Responses

This study proposes NDBench, a benchmark for examining how frontier LLMs adapt their outputs to neurodivergence context in system prompts. The findings consistently show significant adaptation, with longer and more structured outputs under fully instructed conditions.