
LLMs


RESEARCH · arXiv CS.CL · 4/24/2026

AFRILANGTUTOR: Advancing Language Tutoring and Culture Education in Low-Resource Languages with Large Language Models

This paper introduces AFRILANGDICT, a collection of African language-English dictionary entries, and AFRILANGEDU, a dataset. Both resources are used to train the AFRILANGTUTOR models for language tutoring in low-resource African languages, addressing the scarcity of AI systems for local languages on the African continent.

RESEARCH · arXiv CS.CL · 5/4/2026

Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues

This research addresses the gap in evaluating cultural reasoning in LLMs by introducing ArabCulture-Dialogue, a culturally grounded conversational dataset covering 13 Arabic-speaking countries. Experiments indicate that models perform worse on cultural reasoning, translation, and generation tasks in dialectal settings than in Modern Standard Arabic.

RESEARCH · arXiv CS.LG · 18d ago

Harnesses for Inference-Time Alignment over Execution Trajectories

This research investigates harness engineering as an inference-time technique for large language model (LLM) agents, focusing on improving long-term performance via task decomposition and guided execution. It quantifies how design elements like workflow granularity and guidance impact performance, revealing common failure modes such as over-decomposition and hallucinated execution.

RESEARCH · arXiv CS.CL · 4/21/2026

LiFT: Does Instruction Fine-Tuning Improve In-Context Learning for Longitudinal Modelling by Large Language Models?

LiFT is a new instruction fine-tuning framework designed to improve in-context learning for large language models on longitudinal NLP tasks, which require reasoning over temporally ordered text. It uses a curriculum that progressively increases temporal difficulty and incorporates few-shot structure and temporal conditioning. LiFT consistently outperforms base models across various datasets and parameter sizes.

RESEARCH · arXiv CS.CL · 26d ago

Derivation Prompting: A Logic-Based Method for Improving Retrieval-Augmented Generation

This paper introduces Derivation Prompting, a novel prompting technique for the Retrieval-Augmented Generation (RAG) framework. The method aims to reduce hallucinations and erroneous reasoning in Large Language Models (LLMs) by systematically applying predefined rules to derive conclusions. A case study demonstrated a significant reduction in unacceptable answers compared to traditional RAG methods.

RESEARCH · arXiv CS.CL · 5/7/2026

Adapt to Thrive! Adaptive Power-Mean Policy Optimization for Improved LLM Reasoning

This research introduces Adaptive Power-Mean Policy Optimization (APMPO) to improve Large Language Model (LLM) reasoning capabilities within Reinforcement Learning with Verifiable Rewards (RLVR). APMPO combines a generalized power-mean objective and feedback-adaptive clipping to enhance learning dynamics and performance, addressing limitations of static optimization schemes.

RESEARCH · arXiv CS.CL · 5/11/2026

Can LLMs Take Retrieved Information with a Grain of Salt?

This paper evaluates the ability of large language models (LLMs) to adapt their responses to the certainty of retrieved information, revealing systematic limitations. It proposes an interaction strategy combining prior reminders, certainty recalibration, and context simplification to enhance LLM reliability. This approach reduces obedience errors by 25% without modifying model weights.

RESEARCH · arXiv CS.CL · 5/11/2026

MultiSoc-4D: A Benchmark for Diagnosing Instruction-Induced Label Collapse in Closed-Set LLM Annotation of Bengali Social Media

MultiSoc-4D is a new benchmark of Bengali social media text designed to diagnose LLM behavior in closed-set annotation. The research identifies "instruction-induced label collapse," a phenomenon in which LLMs systematically prefer fallback labels, leading to under-detection of minority categories.

RESEARCH · arXiv CS.CL · 5/7/2026

Free Energy-Driven Reinforcement Learning with Adaptive Advantage Shaping for Unsupervised Reasoning in LLMs

FREIA is a novel reinforcement learning algorithm designed to enhance LLMs for unsupervised reasoning, addressing the lack of adaptability in existing methods. It employs a Free Energy-Driven Reward (FER) to balance consensus and exploration, and Adaptive Advantage Shaping (AAS) to adjust learning signals. FREIA outperforms unsupervised baselines across various reasoning tasks, particularly in mathematical reasoning.
