Reproducibility

RESEARCH · ↑ trending · Reddit r/MachineLearning · 5/5/2026

Struggling to reproduce paper results before improving them — stuck below reported accuracy [R]

A PhD student in AI/computer vision cannot reproduce the accuracy reported in a published paper, consistently reaching ~73% against the paper's ~77% baseline. Thorough checks and attempts to contact the authors have not closed the gap, which is blocking further research.

36
ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/27/2026

Submitting to top ML Conferences without Sharing code [D]

A researcher asks whether to stop sharing code with ML conference submissions (e.g., NeurIPS, ICML) out of concern about idea theft, and proposes releasing it only after acceptance. They note that reviewers often expect code, but that some recent submissions without it were not penalized, and that other reproducibility aspects could be emphasized.

35
RESEARCH · DEV.to AI · 5/7/2026

AI agent logs expose reproducibility gaps

AI agent logs reveal significant reproducibility gaps, where autonomous agents frequently fail even after initial successes, especially in web navigation tasks. Research, including the SWE-chat corpus, highlights that less than half of agent-produced code survives into user commits, exposing a critical discrepancy between benchmark scores and real-world reliability.

27
RESEARCH · arXiv CS.AI · 4/27/2026

Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results

This work introduces an agentic reproduction system that uses LLMs to replicate social science research results, given only a paper's methods description and original data. Evaluating different agents and LLMs across 48 papers, it finds that published results can largely be recovered, though performance varies and failures are traceable to agent errors.

27
RESEARCH · arXiv CS.AI · 24d ago

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration

Agentic LLM frameworks often suffer from hallucinated routing and non-reproducible execution when relying on prompted orchestration. GraphBit introduces an engine-orchestrated framework that explicitly and deterministically defines workflows as a directed acyclic graph, ensuring reproducibility and auditability with a Rust-based engine and a three-tier memory architecture.

27