A case study of evaluating AI agents on a neuroscience data-to-discovery pipeline
This study empirically evaluates general-purpose AI coding agents on a neuroscience data-to-discovery pipeline, assessing their ability to automate complex scientific tasks. It finds that agents can solve individual pipeline stages but struggle with scientific judgment when no predefined iteration criteria are given.
