Reproducibility

9 items

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/15/2026

Failure to Reproduce Modern Paper Claims [D]

A user attempted to reproduce 7 claims from modern papers, finding 4 to be irreproducible and 2 with active GitHub issues. This experience raises concerns about the current state of research, particularly regarding reproducibility.

AI research challenges · academic integrity · open science · research quality

ARTICLE · DEV.to AI · 2d ago

AgentUnit: Shipping AI like Software

AgentUnit addresses the challenges of deploying and managing AI agents by introducing a packaging standard modeled on software package formats such as rpm and deb. It brings discipline to four areas (identity, contract, governance, and reproducibility), turning agents into auditable, production-ready units.

deployment · Packaging · Reproducibility · Software engineering
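
To make those four areas concrete, here is a minimal sketch of a package manifest that records identity, contract, governance, and reproducibility fields, plus a validator that rejects incomplete packages. The field names (`name`, `version`, `inputs`, `outputs`, `owner`, `approved`, `model`, `seed`), the `validate_manifest` helper, and the model string are hypothetical illustrations, not AgentUnit's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentManifest:
    # Identity: stable name plus version, as with an rpm/deb package (hypothetical fields)
    name: str
    version: str
    # Contract: what the agent accepts and what it promises to return
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    # Governance: accountable owner and release approval
    owner: str
    approved: bool
    # Reproducibility: pinned model identifier and sampling seed
    model: str
    seed: int


def validate_manifest(m: AgentManifest) -> list[str]:
    """Return a list of problems; an empty list means the manifest passes."""
    problems = []
    if not m.name or not m.version:
        problems.append("identity: name and version are required")
    if not m.inputs or not m.outputs:
        problems.append("contract: inputs and outputs must be declared")
    if not m.owner:
        problems.append("governance: an owner is required")
    if not m.approved:
        problems.append("governance: package is not approved for release")
    if m.model.endswith(":latest"):
        problems.append("reproducibility: model must be pinned, not ':latest'")
    return problems


m = AgentManifest("triage-bot", "1.2.0", ("ticket",), ("label",),
                  "ml-platform", True, "example-model:2026-01", 7)
print(validate_manifest(m))  # → []
```

Treating the manifest as frozen data and validating it before release mirrors how package managers refuse to install a package with missing metadata.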

RESEARCH · ↑ trending · Reddit r/MachineLearning · 5/5/2026

Struggling to reproduce paper results before improving them — stuck below reported accuracy [R]

A PhD student in AI/computer vision cannot reproduce a published paper's reported accuracy, consistently reaching ~73% against the paper's reported ~77%. Despite thorough checks and attempts to contact the authors, the gap persists and is blocking further research.

research · PhD student · machine learning · computer vision

ARTICLE · ↑ trending · Reddit r/MachineLearning · 4/27/2026

Submitting to top ML Conferences without Sharing code [D]

A researcher asks whether to stop sharing code with ML conference submissions (e.g., NIPS, ICML) out of concern that ideas could be stolen, and instead release it only after acceptance. They note that although reviewers often expect code, some recent submissions without it were not penalized, and that other aspects of reproducibility could be emphasized instead.

research ethics · academic publishing · Reproducibility · Intellectual Property

RESEARCH · arXiv CS.AI · 4/14/2026

Seven simple steps for log analysis in AI systems

This research proposes a standardized pipeline for log analysis in AI systems, addressing the current lack of a common approach. It offers a framework with concrete code examples using the Inspect Scout library, guiding researchers through steps for rigorous and reproducible analysis.

Model Evaluation · Log Analysis · Reproducibility · AI Systems
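
The paper's own examples use the Inspect Scout library. The sketch below does not use that library and does not reproduce the paper's seven steps; it is a generic, standard-library illustration of one reproducible habit in log analysis: parse per-sample records from JSON Lines, then summarize them with a fixed, documented procedure and a deterministic output order. The record fields (`task`, `score`) and the pass threshold are assumptions.

```python
import json
from collections import defaultdict


def load_records(lines):
    """Parse JSON Lines text, skipping blank lines; each record is a dict."""
    return [json.loads(line) for line in lines if line.strip()]


def pass_rate_by_task(records, threshold=1.0):
    """Fraction of samples per task whose 'score' meets the threshold (hypothetical fields)."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in records:
        totals[r["task"]] += 1
        if r["score"] >= threshold:
            passes[r["task"]] += 1
    # Sort by task name so the summary has the same order on every run
    return {task: passes[task] / totals[task] for task in sorted(totals)}


log = [
    '{"task": "web", "score": 1.0}',
    '{"task": "web", "score": 0.0}',
    '{"task": "code", "score": 1.0}',
]
print(pass_rate_by_task(load_records(log)))  # → {'code': 1.0, 'web': 0.5}
```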

RESEARCH · DEV.to AI · 5/7/2026

AI agent logs expose reproducibility gaps

AI agent logs reveal significant reproducibility gaps: autonomous agents frequently fail even after initial successes, especially in web navigation tasks. Research including the SWE-chat corpus shows that less than half of agent-produced code survives into user commits, exposing a critical discrepancy between benchmark scores and real-world reliability.

Software Development · Reliability · Reproducibility · Benchmarks
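
A "code survival" figure like the one above depends on how survival is defined. The sketch below assumes a simple line-level definition, which is not necessarily the one used with SWE-chat: the share of the agent's non-blank lines that appear verbatim (ignoring surrounding whitespace) in the code the user eventually committed. The function name `survival_rate` and the sample snippets are hypothetical.

```python
from collections import Counter


def survival_rate(agent_code: str, committed_code: str) -> float:
    """Fraction of the agent's non-blank lines that appear verbatim in the commit.

    Lines are compared after stripping surrounding whitespace; a repeated line
    is matched at most as many times as it occurs in the committed code.
    Returns 0.0 if the agent produced no non-blank lines.
    """
    agent_lines = [l.strip() for l in agent_code.splitlines() if l.strip()]
    if not agent_lines:
        return 0.0
    available = Counter(l.strip() for l in committed_code.splitlines() if l.strip())
    kept = 0
    for line in agent_lines:
        if available[line] > 0:
            available[line] -= 1
            kept += 1
    return kept / len(agent_lines)


agent = "x = 1\ny = 2\nprint(x + y)\n"
commit = "x = 1\ny = 3\nprint(x + y)\n"
print(round(survival_rate(agent, commit), 3))  # → 0.667
```

Exact line matching is deliberately strict; a lenient variant might count lightly edited lines as surviving, which would raise the figure.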

RESEARCH · arXiv CS.AI · 4/27/2026

An Artifact-based Agent Framework for Adaptive and Reproducible Medical Image Processing

This research presents an artifact-based agent framework to enhance medical image processing, focusing on adaptability and reproducibility. It introduces a semantic layer and an artifact contract to enable structured workflow interrogation and goal-conditioned configuration based on dataset-specific conditions.

workflow automation · machine learning · Reproducibility · Medical Imaging
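
One way to picture an artifact contract is that each workflow step declares which artifacts it consumes and which it produces, so a pipeline can be checked before anything runs. The sketch below is a hypothetical illustration of that idea, not the paper's contract definition; `Step`, `check_contracts`, and the artifact names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    requires: frozenset[str]  # artifacts the step consumes (hypothetical)
    produces: frozenset[str]  # artifacts the step promises to emit (hypothetical)


def check_contracts(steps: list[Step], initial: set[str]) -> list[str]:
    """Walk the steps in order and report any step whose inputs are not yet available."""
    available = set(initial)
    errors = []
    for step in steps:
        missing = step.requires - available
        if missing:
            errors.append(f"{step.name}: missing {sorted(missing)}")
        available |= step.produces
    return errors


pipeline = [
    Step("resample", frozenset({"raw_scan"}), frozenset({"resampled_scan"})),
    Step("segment", frozenset({"resampled_scan", "model_weights"}), frozenset({"mask"})),
]
print(check_contracts(pipeline, {"raw_scan"}))  # → ["segment: missing ['model_weights']"]
```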

RESEARCH · arXiv CS.AI · 4/27/2026

Read the Paper, Write the Code: Agentic Reproduction of Social-Science Results

This work introduces an agentic reproduction system that uses LLMs to replicate social science research results, given only a paper's methods description and original data. Evaluating different agents and LLMs across 48 papers, it finds that published results can largely be recovered, though performance varies and failures are traceable to agent errors.

scientific methods · social science research · LLM Agents · Reproducibility
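
Saying a published result has been "recovered" requires an explicit matching rule. A minimal sketch, assuming a 5% relative tolerance and hypothetical result names with illustrative values; the paper's actual matching criterion may differ.

```python
import math


def compare_results(published: dict[str, float],
                    reproduced: dict[str, float],
                    rel_tol: float = 0.05) -> dict[str, str]:
    """Label each published result as 'match', 'mismatch', or 'missing'."""
    report = {}
    for name, value in sorted(published.items()):
        if name not in reproduced:
            report[name] = "missing"
        elif math.isclose(reproduced[name], value, rel_tol=rel_tol):
            report[name] = "match"
        else:
            report[name] = "mismatch"
    return report


# Illustrative values only, not taken from any paper
published = {"coef_age": 0.42, "coef_income": 1.5, "r_squared": 0.31, "sample_size": 1200.0}
reproduced = {"coef_age": 0.41, "r_squared": 0.25, "sample_size": 1200.0}
print(compare_results(published, reproduced))
# → {'coef_age': 'match', 'coef_income': 'missing', 'r_squared': 'mismatch', 'sample_size': 'match'}
```

Reporting "missing" separately from "mismatch" keeps agent failures (nothing produced) distinct from results that were computed but disagree.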

RESEARCH · arXiv CS.AI · 24d ago

GraphBit: A Graph-based Agentic Framework for Non-Linear Agent Orchestration

Agentic LLM frameworks often suffer from hallucinated routing and non-reproducible execution when relying on prompted orchestration. GraphBit introduces an engine-orchestrated framework that explicitly and deterministically defines workflows as a directed acyclic graph, ensuring reproducibility and auditability with a Rust-based engine and a three-tier memory architecture.

workflow automation · Reproducibility · LLM Frameworks · Graph Orchestration
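
The key idea of engine-orchestrated workflows is that execution order follows from the graph rather than from a model's routing decision. A minimal Python sketch of that idea (GraphBit's engine is written in Rust, and this is not its API): Kahn's topological sort with ties broken alphabetically through a min-heap, so the same DAG always yields the same order, and cycles are rejected. The function name and node names are hypothetical.

```python
import heapq


def execution_order(deps: dict[str, list[str]]) -> list[str]:
    """Return a deterministic topological order for a DAG.

    deps maps each node to the nodes it depends on. Ready nodes are taken in
    alphabetical order via a min-heap, so ties always resolve the same way.
    Raises ValueError if the graph contains a cycle.
    """
    nodes = set(deps)
    for parents in deps.values():
        nodes.update(parents)
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for node, parents in deps.items():
        for p in parents:
            indegree[node] += 1
            children[p].append(node)
    ready = [n for n in nodes if indegree[n] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        n = heapq.heappop(ready)
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                heapq.heappush(ready, c)
    if len(order) != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return order


deps = {
    "retrieve": [],
    "summarize": ["retrieve"],
    "classify": ["retrieve"],
    "report": ["summarize", "classify"],
}
print(execution_order(deps))  # → ['retrieve', 'classify', 'summarize', 'report']
```

Fixed tie-breaking is what makes the run reproducible: independent branches could execute in any order, but pinning one order makes logs from repeated runs directly comparable.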