data analysis

RESEARCH · arXiv CS.AI · 5/1/2026

Unsupervised Electrofacies Classification and Porosity Characterization in the Offshore Keta Basin Using Wireline Logs

This study applies an unsupervised machine learning workflow based on K-means clustering to electrofacies analysis and porosity characterization, using wireline log data from the offshore Keta Basin. The method identified four distinct electrofacies with moderate separation, providing a robust log-only approach to geological interpretation where core data is scarce.

RESEARCH · arXiv CS.LG · 5/4/2026

Learning physically grounded traffic accident reconstruction from public accident reports

This paper presents a method for reconstructing traffic accidents from public reports and scene measurements, formulating the task as a parameterized multimodal learning problem. The authors created the CISS-REC dataset of 6,217 real-world cases and developed a framework that outperforms baselines in reconstruction fidelity, including accident point accuracy.

RESEARCH · arXiv CS.LG · 29d ago

From Canopy to Collision: A Hybrid Predictive Framework for Identifying Risk Factors in Tree-Involved Traffic Crashes

This study develops a hybrid predictive framework that combines machine learning (CatBoost with SHAP) and logistic regression to identify and quantify the risk factors that contribute to injury severity in tree-involved traffic crashes. It analyzes CRSS data from 2020 to 2023 to characterize these high-energy impacts, which often result in fatal or severe injuries.

RESEARCH · arXiv CS.LG · 8d ago

LongDS-Bench: On the Failure of Long-Horizon Agentic Data Analysis

This research introduces LongDS, a new benchmark for evaluating AI agents on long-horizon, multi-turn data analysis, featuring 68 tasks drawn from real-world Kaggle notebooks. It finds that state-of-the-art models achieve only 48.45% accuracy and that performance drops significantly in later turns, pointing to a critical failure in tracking evolving analytical context.

RESEARCH · arXiv CS.AI · 12d ago

Discovery Agents for Real-Time Analytics: Toward Proactive Insight Systems

This research proposes a multi-agent architecture for autonomous insight discovery in real-time data streams, addressing the limitations of reactive analytics systems. The architecture runs a continuous loop of hypothesis generation, analytics compilation, validation, and visualization, built on technologies such as Kafka, Flink, and large language models.
