Natural Language Processing

168 items

RESEARCH · arXiv CS.CL · 4/17/2026

Chinese Essay Rhetoric Recognition Using LoRA, In-context Learning and Model Ensemble

This paper explores Chinese essay rhetoric recognition using Large Language Models (LLMs), LoRA, and in-context learning to assess linguistic and higher-order thinking skills. The proposed method achieved the best performance and won first prize in the CCL 2025 Chinese essay rhetoric recognition evaluation task.

AI for education LLMs machine learning rhetoric recognition

RESEARCH · arXiv CS.CL · 5/8/2026

SLAM: Structural Linguistic Activation Marking for Language Models

SLAM (Structural Linguistic Activation Marking) is a novel white-box watermarking scheme for LLMs that embeds the mark into structural geometry rather than token frequencies. It achieves 100% detection accuracy with minimal quality loss, outperforming existing schemes.

LLMs watermarking Natural Language Processing model generation

RESEARCH · arXiv CS.CL · 20d ago

Parallel LLM Reasoning for Bias-Resilient, Robust Conceptual Abstraction

This study proposes a structured framework to improve LLM reasoning when analyzing long documents, addressing issues like contextual bias and omission error. It combines parallel chunk-level processing with evidence-anchored consolidation to generate more robust and bias-resilient conceptual abstractions.

Contextual Reasoning Natural Language Processing AI Research Bias

RESEARCH · arXiv CS.CL · 4/17/2026

Decoupling Scores and Text: The Politeness Principle in Peer Review

This study investigates how hard peer review feedback is to interpret by comparing how well numerical scores and review text predict acceptance. Score-based models are markedly more accurate (91%) than text-based models (81%, even with LLMs), indicating that review text is a considerably less reliable signal than the scores.

machine learning Natural Language Processing large language models Peer review

RESEARCH · arXiv CS.CL · 5/8/2026

Generating Query-Focused Summarization Datasets from Query-Free Summarization Datasets

This paper proposes an evidence-based model to generate queries from query-free summarization datasets, addressing the challenge of finding suitable datasets for Query-Focused Summarization (QFS). Experimental results indicate that summaries generated using these evidence-based queries achieve competitive ROUGE scores, supporting their effectiveness for the QFS task.

query generation Natural Language Processing datasets summarization
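
As background for the ROUGE scores mentioned in the entry above, here is a minimal sketch of ROUGE-1 F1 (clipped unigram overlap between a candidate summary and a reference). It is an illustration only: the function name `rouge1_f1` and the whitespace tokenization are assumptions, not the paper's evaluation setup, and real evaluations typically use an established ROUGE package with stemming and the other ROUGE variants.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1 on lowercased, whitespace-split tokens (simplified)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token,
    # which is the "clipped" overlap used by ROUGE-N.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    score = rouge1_f1("the cat sat on the mat", "the cat is on the mat")
    print(f"ROUGE-1 F1: {score:.3f}")
```

In the toy example, five of the six candidate tokens also appear in the reference, so precision and recall are both 5/6.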

RESEARCH · arXiv CS.CL · 4/24/2026

Machine learning and digital pragmatics: Which word category influences emoji use most?

This study employs Machine Learning, specifically the MARBERT model, to predict emoji use in Arabic tweets collected from X.com. The model achieved an overall accuracy of 0.75, indicating promising results while highlighting a need for further model improvement.

Emoji Prediction Social Media Analysis Arabic Language machine learning

RESEARCH · arXiv CS.CL · 5/8/2026

AdaGATE: Adaptive Gap-Aware Token-Efficient Evidence Assembly for Multi-Hop Retrieval-Augmented Generation

AdaGATE is a training-free evidence controller for multi-hop Retrieval-Augmented Generation (RAG) designed to address noisy or redundant retrieved evidence in limited contexts. It frames evidence selection as a token-constrained repair problem, combining entity-centric gap tracking and targeted micro-query generation to balance coverage, corroboration, and novelty.

Retrieval Augmented Generation AI models Multi-hop RAG Evidence Selection

RESEARCH · arXiv CS.CL · 4/20/2026

Applied Explainability for Large Language Models: A Comparative Study

This paper presents a comparative study of three explainability techniques (Integrated Gradients, Attention Rollout, and SHAP) on a fine-tuned DistilBERT model for sentiment classification. The study concludes that gradient-based attribution provides more stable and intuitive explanations, while attention-based methods are computationally efficient but less aligned with prediction-relevant features.

Comparative Study Natural Language Processing Explainable AI large language models
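
Of the three techniques named in the entry above, Attention Rollout is the simplest to sketch. The snippet below is a rough illustration under stated assumptions, not the study's code: it takes head-averaged attention matrices (one per layer, each row summing to 1), mixes each with the identity to account for residual connections, and multiplies them from the first layer to the last. The names `matmul` and `attention_rollout` and the toy 2-token matrices are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of two square lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def attention_rollout(layers):
    """Combine per-layer attention maps into one token-to-token map.

    `layers` is ordered from the first layer to the last; each entry is an
    n x n head-averaged attention matrix.
    """
    n = len(layers[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        # Add the identity for the residual path, then re-normalize rows.
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0)
                  for j in range(n)] for i in range(n)]
        mixed = [[v / sum(row) for v in row] for row in mixed]
        # Later layers are applied on the left.
        rollout = matmul(mixed, rollout)
    return rollout


if __name__ == "__main__":
    toy_layers = [
        [[0.9, 0.1], [0.2, 0.8]],
        [[0.5, 0.5], [0.3, 0.7]],
    ]
    for row in attention_rollout(toy_layers):
        print([round(v, 3) for v in row])
```

Each row of the result still sums to 1, so it can be read as how much each output position draws on each input token across all layers.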

RESEARCH · arXiv CS.CL · 4/24/2026

Hierarchical Policy Optimization for Simultaneous Translation of Unbounded Speech

This paper introduces Hierarchical Policy Optimization (HPO) for Simultaneous Speech Translation (SST) using LLMs, addressing challenges like high computational cost and imperfect supervised fine-tuning data. HPO employs a hierarchical reward to balance translation quality and latency, demonstrating substantial improvements in COMET and MetricX scores.

LLMs machine learning Natural Language Processing speech-translation

RESEARCH · arXiv CS.CL · 4/21/2026

Cross-Family Speculative Decoding for Polish Language Models on Apple Silicon: An Empirical Evaluation of Bielik 11B with UAG-Extended MLX-LM

This research evaluates cross-family speculative decoding for Polish LLMs on Apple Silicon, extending the MLX-LM framework with Universal Assisted Generation (UAG) for cross-tokenizer compatibility. Experiments show that context-aware token translation significantly improves acceptance rates for Bielik 11B on Polish language datasets.

apple-silicon Natural Language Processing Inference Optimization Speculative Decoding

ARTICLE · DEV.to AI · 4/16/2026

From Mumbles to Memos: Teaching AI to Understand Technician Voice Notes and Jargon

This article describes how owners of local HVAC and plumbing businesses waste time manually deciphering technicians' jargon-filled voice notes. It proposes training AI to extract specific, structured data from the unstructured speech, removing this business bottleneck.

Natural Language Processing Small business AI automation

RESEARCH · arXiv CS.CL · 4/21/2026

CFMS: Towards Explainable and Fine-Grained Chinese Multimodal Sarcasm Detection Benchmark

CFMS introduces the first fine-grained Chinese multimodal sarcasm detection benchmark, comprising 2,796 image-text pairs with triple-level annotations. This dataset aims to improve AI's fine-grained semantic understanding and metaphoric reasoning, addressing limitations in existing benchmarks.

Dataset multimodal AI Natural Language Processing benchmark

RESEARCH · arXiv CS.LG · 4/24/2026

Absorber LLM: Harnessing Causal Synchronization for Test-Time Training

Transformers struggle with high computational costs and memory consumption for long sequences, while alternatives lose long-tail dependencies. Absorber LLM proposes a self-supervised causal synchronization to absorb historical contexts into parameters, ensuring a contextless model matches the original full-context one on future generations.

AI architecture Natural Language Processing Machine Learning Optimization large language models

RESEARCH · arXiv CS.CL · 4/21/2026

LiFT: Does Instruction Fine-Tuning Improve In-Context Learning for Longitudinal Modelling by Large Language Models?

LiFT is a new instruction fine-tuning framework designed to improve in-context learning for large language models on longitudinal NLP tasks, which require reasoning over temporally ordered text. It uses a curriculum that progressively increases temporal difficulty, incorporating few-shot structure and temporal conditioning, consistently outperforming base models across various datasets and parameter sizes.

LLMs temporal reasoning Natural Language Processing in-context learning

RESEARCH · arXiv CS.CL · 26d ago

Derivation Prompting: A Logic-Based Method for Improving Retrieval-Augmented Generation

This paper introduces Derivation Prompting, a novel prompting technique for the Retrieval-Augmented Generation (RAG) framework. The method aims to reduce hallucinations and erroneous reasoning in Large Language Models (LLMs) by systematically applying predefined rules to derive conclusions. A case study demonstrated a significant reduction in unacceptable answers compared to traditional RAG methods.

LLMs RAG Prompting Natural Language Processing

RESEARCH · arXiv CS.CL · 5/7/2026

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

This paper details participation in SemEval-2026 Task 13, focusing on lightweight detection of LLM-generated code using stylometric signals. The approach employs ratio-based features, parsing engines, and language classifiers, proving computationally efficient with near-instant inference time.

security machine learning Natural Language Processing Code Analysis

RESEARCH · arXiv CS.CL · 5/11/2026

Can LLMs Take Retrieved Information with a Grain of Salt?

This paper evaluates the ability of large language models (LLMs) to adapt their responses to the certainty of retrieved information, revealing systematic limitations. It proposes an interaction strategy combining prior reminders, certainty recalibration, and context simplification to enhance LLM reliability. This approach reduces obedience errors by 25% without modifying model weights.

LLMs context certainty Natural Language Processing AI reliability

RESEARCH · arXiv CS.CL · 22d ago

Exploring Lightweight Large Language Models for Court View Generation

This research explores the capabilities of lightweight Large Language Models (LLMs) in Criminal Court View Generation (CVG) and their impact on charge prediction within Legal AI. It systematically examines the effects of architecture and model size, compares the LLMs with Deep Neural Networks, and introduces the CVGEvalKit framework for evaluation.

Legal AI research Court View Generation Natural Language Processing

RESEARCH · arXiv CS.CL · 5/11/2026

MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media

MultiSoc-4D is a new Bengali social media dataset benchmark designed to diagnose LLM behavior in closed-set annotation. The research identifies "instruction-induced label collapse," a phenomenon where LLMs systematically prefer fallback labels, leading to under-detection of minority categories.

LLMs Natural Language Processing Data Annotation Benchmarks

RESEARCH · arXiv CS.CL · 22d ago

A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research

This research introduces a scalable computational approach to measure manner and result verbs, a crucial distinction for developmental language studies. It leverages large language models for sentence annotations and trains a RoBERTa-based classifier, demonstrating promising performance on evaluation datasets.

Language Acquisition machine learning Natural Language Processing linguistics