natural language processing

167 items

RESEARCH·arXiv CS.CL·4/6/2026

Dependency-Guided Parallel Decoding in Discrete Diffusion Language Models

Discrete diffusion language models (dLLMs) speed up text generation, but parallel decoding degrades quality because it ignores dependencies between tokens. DEMASK proposes a lightweight predictor that estimates conditional influences to guide simultaneous unmasking, provably improving quality. The technique yields a 1.7x to 2.2x speedup while matching or exceeding baseline performance.

29
RESEARCH·arXiv CS.CL·4d ago

Efficient Punctuation Restoration via Weighted Lookahead Scoring Method for Streaming ASR Systems

This paper introduces a non-autoregressive scoring method for efficient punctuation restoration in streaming Automatic Speech Recognition (ASR) systems. It compares punctuation insertion hypotheses against a no-insertion baseline using a bounded K-subword-token lookahead, outperforming existing prompt-based methods.

28
RESEARCH·arXiv CS.CL·6d ago

Fixing FOLIO and MALLS: Verified Annotations and an LLM-assisted Framework to Focus Human Relabeling

A systematic inspection of the FOLIO and MALLS validation splits revealed high rates of incorrect FOL formalizations and ambiguous NL sentences, distorting AI model evaluation. The authors developed and released corrected ground truths for these datasets, demonstrating how annotation errors impact the evaluation of state-of-the-art LLMs.

28
RESEARCH·arXiv CS.CL·22d ago

Greedy or not, here I come: Language production under vocabulary constraints in humans and resource-rational models

This research explores how humans communicate with limited vocabularies, comparing their strategies to computational sampling algorithms powered by large language models. The study reveals that human language production under constraint often mirrors greedy sampling, although more skilled individuals exhibit non-greedy revision behaviors.

28
RESEARCH·arXiv CS.CL·22d ago

Fluency and Faithfulness in Human and Machine Literary Translation

This research investigates the balance between fluency and faithfulness in literary translation, comparing human, Google Translate, and TranslateGemma translations of 106 novels in 16 source languages. It reveals a consistent negative correlation between fluency and faithfulness, particularly for human and Google Translate output, and indicates that segment length significantly affects automatic evaluation.

28
RESEARCH·arXiv CS.CL·15d ago

Learnability-Informed Fine-Tuning of Diffusion Language Models

This research introduces LIFT, a learnability-informed fine-tuning algorithm designed to enhance the reasoning capabilities of diffusion language models. LIFT addresses the shortcomings of standard supervised fine-tuning (SFT) by adaptively learning tokens based on their difficulty and the context available at different diffusion time steps, showing improved performance over existing baselines.

28
ARTICLE·DEV.to AI·5/1/2026

From Mumbles to Memos: Teaching AI to Decipher Technician Voice Notes

This article addresses the productivity bottleneck caused by manually deciphering technician voice notes, proposing AI as a solution to transform field recordings into professional summaries. It outlines a methodology, the 'Actionable Framework: The 3-Part Jargon List,' to train AI to categorize specific information from unstructured audio.

27