LLM

612 items

RESEARCH · arXiv CS.CL · 5/7/2026

FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

This paper details participation in SemEval-2026 Task 13, focusing on lightweight detection of LLM-generated code using stylometric signals. The approach employs ratio-based features, parsing engines, and language classifiers, proving computationally efficient with near-instant inference time.

security, machine learning, Natural Language Processing, Code Analysis

RESEARCH · arXiv CS.LG · 26d ago

Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction

Collider-Bench is a new benchmark designed to evaluate LLM agents' ability to reproduce experimental analyses from the LHC using public data and software. Agents must apply physical reasoning and domain knowledge to overcome missing implementation details and generate predicted collision event yields.

particle physics, benchmarking, scientific reproduction, AI agents

RESEARCH · arXiv CS.AI · 25d ago

From Descriptive to Prescriptive: Uncover the Social Value Alignment of LLM-based Agents

This paper proposes a value-based framework that uses GraphRAG to align LLM-based agents with human social values, addressing deficiencies in self-cognition and dilemma decision-making. The method shows significant performance gains on the DAILYDILEMMAS benchmark, which the authors present as a basis for the emergence of self-emotion in AI systems.

ethics, GraphRAG, Social Value, AI agents

RESEARCH · arXiv CS.CL · 22d ago

Exploring Lightweight Large Language Models for Court View Generation

This research explores the capabilities of lightweight Large Language Models (LLMs) in Criminal Court View Generation (CVG) and their impact on charge prediction within Legal AI. It systematically investigates the effects of architectural differences and model size, compares the models with deep neural networks, and introduces the CVGEvalKit framework for evaluation.

Legal AI, research, Court View Generation, Natural Language Processing

RESEARCH · arXiv CS.AI · 29d ago

Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction

This paper investigates strategies to improve multimodal LLM accuracy in extracting data from scientific charts. It demonstrates that a simple grid-based spatial priming method significantly outperforms semantic prompting techniques.

Data Extraction, spatial priming, chart analysis, AI research

RESEARCH · arXiv CS.CL · 8d ago

CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards

This paper proposes CSRP, a three-stage framework for Chinese Grammatical Error Correction (CGEC) using Large Language Models (LLMs). To address the limitations of general-purpose models and the difficulty of metric optimization, CSRP combines continual pre-training, Chain-of-Thought SFT, and policy optimization with efficiency-aware rewards that penalize unnecessary edits. It achieves state-of-the-art performance on the NACGEC benchmark.

reinforcement learning, Grammar Correction, Natural Language Processing, AI research

RESEARCH · arXiv CS.AI · 18d ago

The Shape of Testimony: A Scalable Framework for Oral History Archive Comparison

This study presents a scalable computational framework for comparing oral history archives, specifically focusing on Holocaust survivor testimonies. By leveraging LLM-based analysis, discourse segmentation, and topic modeling, it quantifies the "structuredness" of testimonies. The research largely corroborates earlier distinctions while revealing significant overlaps between collections.

Archive Comparison, Computational Analysis, Oral History, digital humanities

RESEARCH · arXiv CS.CL · 5/11/2026

IntentGrasp: A Comprehensive Benchmark for Intent Understanding

IntentGrasp is a new comprehensive benchmark for evaluating the intent understanding capability of Large Language Models, derived from 49 high-quality corpora. Extensive evaluations of 20 LLMs showed unsatisfactory performance, with scores below 60% on the All Set and below 25% on the Gem Set.

evaluation, benchmarking, IntentGrasp, intent understanding

RESEARCH · arXiv CS.LG · 5/11/2026

LKV: End-to-End Learning of Head-wise Budgets and Token Selection for LLM KV Cache Eviction

This paper introduces LKV (Learned KV Eviction), a novel approach to optimize Key-Value (KV) cache memory in Large Language Models (LLMs). LKV formulates KV compression as an end-to-end differentiable optimization problem, learning budgets and token selection to overcome limitations of heuristic methods.

deep learning, Memory Optimization, efficiency, KV cache

RESEARCH · arXiv CS.LG · 23d ago

GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding

This paper introduces Group-Query Latent Attention (GQLA), a modification to Multi-head Latent Attention (MLA). GQLA exposes two algebraically equivalent decoding paths, allowing a single set of trained weights to adapt efficiently to different hardware platforms like H100 and H20 without retraining.

deep learning, Attention Mechanism, AI Efficiency, hardware optimization

RESEARCH · arXiv CS.AI · 23d ago

SkillSmith: Compiling Agent Skills into Boundary-Guided Runtime Interfaces

SkillSmith is a novel compiler-runtime framework that optimizes skill execution in LLM-based agent systems. It reduces token usage and redundancy by compiling skill packages into minimal executable interfaces.

skill management, efficiency, compilers, AI agents

RESEARCH · arXiv CS.LG · 21d ago

Theory-optimal Quantization Based on Flatness

This research models the relationship between quantization error and outliers in Large Language Models (LLMs) and introduces a new metric, Flatness, to quantify outlier distribution. Based on this model, it derives a theoretically optimal solution and proposes Bidirectional Diagonal Quantization (BDQ) for post-training quantization.

deep learning, machine learning, quantization, AI

RESEARCH · arXiv CS.LG · 28d ago

Rotation-Preserving Supervised Fine-Tuning

This paper introduces Rotation-Preserving Supervised Fine-Tuning (RPSFT) to improve out-of-domain generalization in large language models while mitigating the degradation caused by standard SFT. RPSFT penalizes changes in projected singular subspaces of pretrained weights, acting as an efficient proxy for Fisher-sensitive directions and outperforming standard SFT baselines.

neural networks, research, machine learning, fine-tuning

RESEARCH · arXiv CS.CL · 27d ago

BoostTaxo: Zero-Shot Taxonomy Induction via Boosting-Style Agentic Reasoning and Constraint-Aware Calibration

BoostTaxo introduces a novel boosting-style LLM framework designed for zero-shot taxonomy induction, aiming to overcome limitations in generalization and efficiency of existing methods. It refines taxonomy construction through a coarse-to-fine parent identification process, leveraging retrieval-augmented definition refinement and hybrid candidate selection.

Taxonomy induction, Semantic hierarchies, AI research, LLM

RESEARCH · arXiv CS.AI · 21d ago

Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance

This position paper advocates for developing systematic methodologies to generate synthetic sequences, termed 'data probes,' to fundamentally understand how data characteristics affect LLM performance across various stages. The aim is to move beyond current compute-intensive empirical approaches by providing a principled way to comprehend model behavior.

research, machine learning, data, LLM

RESEARCH · arXiv CS.AI · 21d ago

Embedding by Elicitation: Dynamic Representations for Bayesian Optimization of System Prompts

This paper introduces ReElicit, a Bayesian optimization framework based on "embedding by elicitation" for tuning system prompts in AI. It leverages LLMs to elicit an interpretable feature space and a Gaussian process surrogate to select and refine prompts based on aggregate feedback.

Bayesian Optimization, Optimization, System prompts, machine learning

RESEARCH · arXiv CS.AI · 12d ago

Frontier LLM-based agents can overcome the ontology curation bottleneck for natural phenotypes

Frontier LLM-based agents demonstrate potential in overcoming the ontology curation bottleneck for natural phenotypes, a labor-intensive process reliant on human experts. This could significantly scale the annotation of free-text phenotype descriptions to ontology terms, essential for comparative morphological data integration.

Phenotype Annotation, NLP, Research Methods, LLM

RESEARCH · arXiv CS.AI · 12d ago

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

This research evaluates LLM-generated reviews for scientific papers from both author and reviewer perspectives. It identifies limited alignment between LLM and human reviews and explores how authors can effectively "game" LLM reviews to improve submissions.

scientific review, human-AI interaction, AI evaluation, LLM

RESEARCH · arXiv CS.LG · 12d ago

Representation Signatures and Risk-Feedback Alignment in LLM Trading Agents

This research investigates the behavioral alignment and representation dynamics of large language model (LLM) agents in financial decision environments. Using TradeArena, the authors find measurable pre-failure signatures: planning embeddings drift and fused plan-risk representations separate before drawdowns, indicating effective-rank contraction.

Behavioral Alignment, Financial AI, Trading Agents, risk management

RESEARCH · arXiv CS.LG · 15d ago

Truthful Online Preference Aggregation for LLM Fine-Tuning in Mobile Crowdsourcing

This paper investigates truthful online preference aggregation for fine-tuning Large Language Models (LLMs) in mobile crowdsourcing. It proposes a novel online weighted aggregation mechanism to address strategic misreporting by workers, modeling the process as a dynamic Bayesian game. The goal is to overcome the limitations of existing approaches, which fail to identify the most accurate worker and incur linear regret.

Preference Aggregation, machine learning, game theory, Crowdsourcing