
LLM

612 items

RESEARCH · arXiv CS.LG · 5/1/2026

When Continual Learning Moves to Memory: A Study of Experience Reuse in LLM Agents

This study investigates the role of external memory in LLM agents for continual learning, showing that the stability-plasticity dilemma resurfaces at the memory level due to limited context windows. A (k,v) framework is introduced to disentangle how experience is represented and organized, finding that abstract procedural memories transfer more reliably than detailed trajectories and finer-grained memory organization is beneficial.

27
RESEARCH · arXiv CS.CL · 5/1/2026

Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations

CarryOnBench is introduced as the first interactive benchmark to measure how LLMs recover utility and revise user intent interpretation in multi-turn, safe conversations. It reveals that current models fulfill only 10.5-37.6% of benign user information needs at the initial turn, highlighting a gap in safety-aligned LLMs regarding helpfulness recovery.

27
RESEARCH · arXiv CS.CL · 4/9/2026

Temporally Phenotyping GLP-1RA Case Reports with Large Language Models: A Textual Time Series Corpus and Risk Modeling

This study developed a textual time series corpus from type 2 diabetes case reports to extract complex clinical timelines with LLMs. GPT5 was highly effective at event retrieval and temporal sequencing, and downstream applications suggest a reduced risk of respiratory sequelae among GLP-1 users.

27
RESEARCH · arXiv CS.CL · 4/27/2026

Lightweight Retrieval-Augmented Generation and Large Language Model-Based Modeling for Scalable Patient-Trial Matching

This paper introduces a lightweight framework for scalable patient-trial matching, addressing challenges posed by long, complex electronic health records. It combines retrieval-augmented generation (RAG) to identify relevant EHR segments with large language models (LLMs) to encode these segments into informative representations, improving efficiency and generalization.

27
RESEARCH · arXiv CS.CL · 20d ago

Under Pressure: Emotional Framing Induces Measurable Behavioral Shifts and Structured Internal Geometry in Small Language Models

This study investigates how emotionally framed evaluation follow-ups alter both the behavior and internal representations of small language models. Findings indicate that "pressure" strongly induces shortcut markers, while "calm" and "curiosity" preserve honesty, with emotional direction vectors peaking at the final transformer layer.

27
RESEARCH · arXiv CS.LG · 4/24/2026

FairyFuse: Multiplication-Free LLM Inference on CPUs via Fused Ternary Kernels

FairyFuse is a new inference system designed for CPU-only platforms, enabling multiplication-free execution of large language models. It uses ternary weights ({-1, 0, +1}) to replace floating-point multiplications with conditional additions and subtractions, significantly reducing memory bandwidth bottlenecks and offering up to 16x weight compression.
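To make the multiplication-free idea concrete, here is a minimal sketch of a ternary matrix-vector product in plain Python. It is not FairyFuse's fused kernel: the function names `ternary_matvec` and `pack_ternary` and the 2-bit packing scheme are hypothetical, chosen only for illustration. The packing assumes 2 bits per weight, which would correspond to a 16x reduction relative to 32-bit floats (32 / 2 = 16); whether that is how the paper arrives at its figure is an assumption.

```python
def ternary_matvec(weights, x):
    """Compute W @ x where every entry of W is -1, 0, or +1.

    No multiplications are used: +1 adds the input, -1 subtracts it,
    and 0 skips it.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing
        out.append(acc)
    return out


def pack_ternary(row):
    """Pack ternary weights at 2 bits each, 4 weights per byte.

    Hypothetical encoding for illustration: 0 -> 00, +1 -> 01, -1 -> 10.
    """
    codes = {0: 0b00, 1: 0b01, -1: 0b10}
    packed = bytearray((len(row) + 3) // 4)
    for i, w in enumerate(row):
        packed[i // 4] |= codes[w] << (2 * (i % 4))
    return bytes(packed)


if __name__ == "__main__":
    W = [[1, 0, -1], [-1, -1, 1]]
    x = [2.0, 3.0, 5.0]
    print(ternary_matvec(W, x))
    print(len(pack_ternary([1, 0, -1, 1, 0, 0, 1, -1])), "bytes for 8 weights")
```

A real CPU kernel would operate on packed weights with SIMD instructions rather than looping over Python lists; the sketch only shows why ternary weights remove the need for floating-point multiplies.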

27
RESEARCH · arXiv CS.CL · 4/21/2026

Cross-Family Speculative Decoding for Polish Language Models on Apple Silicon: An Empirical Evaluation of Bielik 11B with UAG-Extended MLX-LM

This research evaluates cross-family speculative decoding for Polish LLMs on Apple Silicon, extending the MLX-LM framework with Universal Assisted Generation (UAG) for cross-tokenizer compatibility. Experiments show that context-aware token translation significantly improves acceptance rates for Bielik 11B on Polish language datasets.

27
RESEARCH · arXiv CS.AI · 18d ago

AttuneBench: A Conversation-Based Benchmark for LLM Emotional Intelligence

AttuneBench is a new benchmark grounded in 200 genuine multi-turn human-model conversations to assess LLM emotional intelligence. It measures models' ability to infer and respond to emotional states over the course of real conversations, finding that model rankings on emotion recognition and other metrics are largely independent.

27
RESEARCH · arXiv CS.LG · 4/24/2026

Clinically Interpretable Sepsis Early Warning via LLM-Guided Simulation of Temporal Physiological Dynamics

This paper proposes an LLM-guided temporal simulation framework for clinically interpretable early sepsis warning. The model simulates physiological trajectories prior to disease onset by integrating spatiotemporal feature extraction, medical reasoning cues, and agent-based post-processing for physiologically plausible predictions.

27