research

78 items

RESEARCH · arXiv CS.LG · 5/1/2026

When Continual Learning Moves to Memory: A Study of Experience Reuse in LLM Agents

This study investigates the role of external memory in LLM agents for continual learning, showing that the stability-plasticity dilemma resurfaces at the memory level because context windows are limited. It introduces a (k,v) framework to disentangle how experience is represented and organized, and finds that abstract procedural memories transfer more reliably than detailed trajectories and that finer-grained memory organization is beneficial.

27
RESEARCH · arXiv CS.CL · 22d ago

Exploring Lightweight Large Language Models for Court View Generation

This research explores the capabilities of lightweight Large Language Models (LLMs) in Criminal Court View Generation (CVG) and their impact on charge prediction within Legal AI. It systematically investigates architectural differences and model size, compares the models with deep neural networks, and introduces the CVGEvalKit framework for evaluation.

27
RESEARCH · arXiv CS.AI · 18d ago

AOP-Wiki EMOD 3.0: Data Model Expansions and Content Evaluation Framework for Using Agentic AI to Improve Integration between AOPs and New Approach Methodologies (NAMs)

This paper introduces AOP-Wiki EMOD 3.0, focusing on data model expansions and a content evaluation framework. It leverages agentic AI to improve the integration between Adverse Outcome Pathways (AOPs) and New Approach Methodologies (NAMs), addressing current limitations in the AOP-Wiki's infrastructure to support continued growth.

27
RESEARCH · arXiv CS.AI · 23d ago

NOVA: Fundamental Limits of Knowledge Discovery Through AI

The NOVA framework models AI knowledge discovery as an adaptive sampling process, identifying conditions for genuine knowledge accumulation and common failure modes like contamination and forgetting. It highlights a "contamination trap" where invalid artifacts can accumulate faster than genuine discoveries as easy-to-find knowledge is exhausted, even with small false-positive rates.

27
RESEARCH · arXiv CS.LG · 28d ago

Rotation-Preserving Supervised Fine-Tuning

This paper introduces Rotation-Preserving Supervised Fine-Tuning (RPSFT) to improve out-of-domain generalization in large language models while mitigating the degradation caused by standard SFT. RPSFT penalizes changes in projected singular subspaces of pretrained weights, acting as an efficient proxy for Fisher-sensitive directions and outperforming standard SFT baselines.

27
RESEARCH · arXiv CS.AI · 21d ago

Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance

This position paper advocates for developing systematic methodologies to generate synthetic sequences, termed 'data probes,' to fundamentally understand how data characteristics affect LLM performance across various stages. The aim is to move beyond current compute-intensive empirical approaches by providing a principled way to comprehend model behavior.

27
RESEARCH · arXiv CS.LG · 15d ago

LLM-AutoSciLab: Closed-Loop Scientific Discovery via Active Experimentation with LLMs

LLM-AutoSciLab proposes a closed-loop framework for scientific discovery, moving beyond static inference by actively coupling hypothesis generation with experiment selection and mechanism refinement. It iteratively suggests plausible hypotheses, selects informative experiments to distinguish or refine them, and updates its state using the resulting evidence.

27
RESEARCH · arXiv CS.AI · 14d ago

Experiments in Agentic AI for Science

This paper introduces two novel frameworks for developing autonomous, agentic AI in scientific workflows, leveraging a hybrid Local Body, Remote Brain architecture with LLM cloud backends. The systems, DeepTS/DeepCollector and DeepScribe, automate time-series dataset curation and scientific presentation analysis, demonstrating how agentic AI can overcome context and reasoning limitations.

27
ARTICLE · DEV.to AI · 14d ago

AI for science is becoming a builder workflow, not a lab demo

The next valuable shift in AI is toward helping people conduct better investigations, moving from answering questions to supporting research workflows. Google's Gemini for Science exemplifies this, with AI tools built around practical research processes. The approach is valuable not only for scientists but for anyone who needs to turn messy information into defensible results, encouraging sharper questions and the testing of assumptions.

27
RESEARCH · arXiv CS.CL · 5/6/2026

Geometric Deviation as an Unsupervised Pre-Generation Reliability Signal: Probing LLM Representations for Answerability

This research explores using geometric deviation of LLM hidden states as a pre-generation signal to determine if a query is outside the model's knowledge, requiring no labeled failure data. Across various models and prompt forms, it finds that this signal effectively predicts unanswerable math prompts but not factual ones.

27