machine learning

RESEARCH · arXiv CS.LG · 22d ago

AdaGraph: A Graph-Native Clustering Algorithm That Overcomes the Curse of Dimensionality and Enables Scientific Discovery

AdaGraph is a graph-native clustering algorithm from the Structure-Centric Machine Learning (SC-ML) paradigm. The authors argue that it sidesteps the curse of dimensionality by replacing geometry-centric computation with topology-based computation. It operates on the kNN graph topology, requires no a priori specification of the number of clusters, handles noise, and is reported to scale well.
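To make the idea of clustering on kNN graph topology concrete, here is a minimal sketch. It is not AdaGraph: it swaps in a much simpler stand-in technique, connected components of the mutual kNN graph. The function name `mutual_knn_components` and the parameter `k` are hypothetical and chosen for illustration. It does share one property with the summary above: the number of clusters is not specified in advance.

```python
import numpy as np

def mutual_knn_components(x, k=2):
    """Label points by connected components of the mutual kNN graph.

    Illustrative stand-in only; this is NOT the AdaGraph algorithm.
    x is an (n, d) array; returns an (n,) array of consecutive cluster ids.
    """
    n = len(x)
    # Pairwise Euclidean distances, used only to build the graph.
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]        # indices of the k nearest neighbours
    is_nn = np.zeros((n, n), dtype=bool)
    is_nn[np.arange(n)[:, None], nn] = True
    mutual = is_nn & is_nn.T                 # keep an edge only if both ends agree

    parent = list(range(n))                  # union-find over the mutual edges

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i

    for i, j in zip(*np.nonzero(np.triu(mutual))):
        parent[find(i)] = find(j)

    roots = np.array([find(i) for i in range(n)])
    return np.unique(roots, return_inverse=True)[1]
```

Once the graph is built, cluster membership depends only on which edges exist, not on the distances themselves. Points with no mutual neighbour end up as singleton components, which is one crude way to flag noise.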

27
RESEARCH · arXiv CS.LG · 5/11/2026

From Canopy to Collision: A Hybrid Predictive Framework for Identifying Risk Factors in Tree-Involved Traffic Crashes

This study develops a hybrid predictive framework that combines machine learning (CatBoost, SHAP) with logistic regression to identify and quantify risk factors contributing to injury severity in tree-involved traffic crashes. It analyzes CRSS data from 2020-2023 to characterize these high-energy impacts, which often result in fatal or severe injuries.

27
RESEARCH · arXiv CS.LG · 22d ago

Forecasting Medium-Horizon Alzheimer's Disease Progression: Residual Gap-Aware Transformers for 24-Month CDR-SB Change from ADNI Clinical and Biomarker Histories

This paper introduces a residual gap-aware transformer for forecasting 24-month Alzheimer's disease progression using ADNI clinical and biomarker histories. The research analyzes changes in CDR-SB scores, anchoring samples at mild cognitive impairment visits.

27
RESEARCH · arXiv CS.LG · 26d ago

A Unified Geometric Framework for Weighted Contrastive Learning

Contrastive learning aims to preserve relational structure in sample representations by reflecting a similarity graph. This paper interprets weighted InfoNCE objectives as Distance Geometry Problems, providing a unified geometric framework and exact characterizations of optimal embeddings, revealing how class imbalance affects inter-class similarities in SupCon.
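As a rough illustration of what a weighted InfoNCE objective looks like, the sketch below computes a generic form: for each anchor, a weighted sum of log-softmax scores over the other samples. The function names `weighted_infonce` and `supcon_weights`, the temperature default, and the exact normalization are assumptions made for this example; the paper's own formulation may differ. SupCon appears as the special case where each anchor spreads uniform weight over its same-class samples.

```python
import numpy as np

def weighted_infonce(z, w, temperature=0.1):
    """Generic weighted InfoNCE (illustrative form, not taken from the paper).

    z: (n, d) embeddings; w: (n, n) non-negative pair weights.
    Returns the mean over anchors i of -sum_j w[i, j] * log p(j | i), j != i.
    """
    z = z / np.linalg.norm(z, axis=1, keepdims=True)       # unit-normalize embeddings
    logits = z @ z.T / temperature                          # scaled cosine similarities
    np.fill_diagonal(logits, -np.inf)                       # exclude self-pairs from the softmax
    logits = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    np.fill_diagonal(log_prob, 0.0)                         # avoid 0 * -inf on the diagonal
    w = np.array(w, dtype=float)
    np.fill_diagonal(w, 0.0)                                # self-pairs carry no weight
    return float(-(w * log_prob).sum() / len(z))

def supcon_weights(labels):
    """SupCon-style weights: uniform over same-class samples, excluding the anchor."""
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    np.fill_diagonal(same, 0.0)
    # Anchors with no positives get an all-zero row instead of a division by zero.
    return same / np.maximum(same.sum(axis=1, keepdims=True), 1.0)
```

With these weights, a class with many samples contributes many more weighted pairs than a rare class, which is where class imbalance enters the objective.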

27
RESEARCH · arXiv CS.LG · 29d ago

Statistical Inference and Quality Measures of KV Cache Quantisations Inspired by TurboQuant

This research analyzes three KV cache quantization schemes (KV, KQV, QKQV) and their effect on inner-product variance, in particular how applying QJL to K inflates that variance, an effect that softmax then amplifies. Empirically, KQV performs best at a budget of n=4; QKQV is consistently worse than KQV in KL divergence, an unconditional K-V asymmetry; and geometric K reconstruction shows budget-dependent crossovers.

27
RESEARCH · arXiv CS.CL · 22d ago

A Scalable Tool for Measuring Manner and Result Verbs in Developmental Language Research

This research introduces a scalable computational approach to measure manner and result verbs, a crucial distinction for developmental language studies. It leverages large language models for sentence annotations and trains a RoBERTa-based classifier, demonstrating promising performance on evaluation datasets.

27
RESEARCH · arXiv CS.LG · 5/7/2026

Lookahead Drifting Model

This paper proposes a "lookahead drifting model" for distribution mapping, which enhances image generation performance via one-step neural functional evaluation. The model computes a set of drifting terms sequentially at each training iteration, utilizing positive samples and model outputs to capture higher-order gradient information.

27
RESEARCH · arXiv CS.LG · 29d ago

Path-Based Gradient Boosting for Graph-Level Prediction

This paper proposes PathBoost, a gradient tree boosting method for graph-level classification and regression that learns discriminative path-based features directly from the input graph structure. The method adds adaptations for binary classification, incorporates multiple node and edge attributes, and selects anchor nodes automatically. On several benchmark datasets it is reported to match or outperform graph neural networks and graph kernel approaches.

27
RESEARCH · arXiv CS.AI · 5/11/2026

CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

This paper introduces Deployment-Time Learning (DTL) as a new stage for LLMs, allowing them to continually adapt from experience post-training without modifying core parameters. It presents CASCADE, a framework that uses an explicit, evolving episodic memory for LLM agents, formalizing experience reuse as a contextual bandit problem with no-regret guarantees.

27