Verification

12 items

RESEARCH · arXiv CS.LG · 1d ago

When Should an AI Scientist Stop? Verifiable Experiment Steering and Refusal for Autonomous Discovery

This paper introduces CARTOGRAPH, a verification layer for AI scientists that integrates experiment steering, ambiguity closure, and inadequacy detection. It demonstrates superior performance over raw projection methods and successfully identifies and revokes out-of-library pharmacokinetic mechanisms, enhancing autonomous discovery.

experiment steering machine learning autonomous discovery Verification

ARTICLE · ↑ trending · Hacker News (AI) · 14d ago

Agile V: Turning AI Agents into Verifiable Engineering Systems

Agile V proposes a framework to transform AI agents into robust, verifiable engineering systems. It aims to apply traditional software engineering principles to AI development, ensuring reliability and accountability.

Reliability AI Systems Verification Software Engineering

RESEARCH · arXiv CS.AI · 6d ago

Toward Pre-Deployment Assurance for Enterprise AI Agents: Ontology-Grounded Simulation and Trust Certification

This paper proposes an ontology-grounded verification framework for enterprise AI agents, addressing the critical gap in pre-deployment assurance. The framework includes an Agent Operational Envelope, an ontology-to-scenario generation pipeline, and a Trust Certificate with machine-verifiable attestations for deployment verdicts.

security Trust Verification AI agents

RESEARCH · arXiv CS.LG · 4/22/2026

The Cost of Relaxation: Evaluating the Error in Convex Neural Network Verification

This paper evaluates the worst-case divergence between original neural networks and their convex relaxations, which are used in verification systems to improve performance at the cost of soundness. The study provides analytical upper and lower bounds for the error, demonstrating it grows exponentially with network depth and linearly with the input's radius.

robustness neural networks mathematical analysis Verification
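
As a rough illustration of the kind of relaxation error the item above refers to, the sketch below measures the gap for a single ReLU under the standard "triangle" convex relaxation. This is an assumption made for illustration: the paper's own relaxation and its bounds are not reproduced here, and the function names are hypothetical.

```python
def relu(x: float) -> float:
    return max(0.0, x)

def triangle_upper(x: float, l: float, u: float) -> float:
    # Upper edge of the triangle relaxation of ReLU on [l, u] with l < 0 < u:
    # the line through (l, 0) and (u, u).
    return u * (x - l) / (u - l)

def relaxation_gap(x: float, l: float, u: float) -> float:
    # How far the relaxed upper bound sits above the true ReLU value at x.
    return triangle_upper(x, l, u) - relu(x)

def worst_case_gap(l: float, u: float) -> float:
    # For this single-neuron relaxation the gap is largest at x = 0.
    return -l * u / (u - l)
```

For a single neuron the worst-case gap scales with the width of the input interval; how such gaps compound across layers is the question the paper studies.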

DOC · DEV.to AI · 5/1/2026

Stop Your RAG Pipeline From Hallucinating: A 15-Line Fix

This article presents a 15-line fix to combat hallucinations in RAG pipelines, even when responses appear grounded in retrieved documents. It details a 'retrieve → generate → verify' pattern to catch errors before the AI agent acts.

hallucination AI quality RAG Verification
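
The "retrieve → generate → verify" pattern named in the item above can be sketched as follows. This is not the article's 15-line fix: the word-overlap retriever, the `verify` heuristic, the `threshold` value, and all function names are assumptions chosen to keep the example self-contained.

```python
import re

def tokens(text):
    # Lowercased alphanumeric words, as a set.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, corpus, k=2):
    # Toy retriever: rank documents by word overlap with the query.
    q = tokens(query)
    return sorted(corpus, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

def verify(answer, docs, threshold=0.6):
    # Accept the answer only if every sentence shares at least `threshold`
    # of its words with some retrieved document.
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = tokens(sentence)
        if not words:
            continue
        support = max((len(words & tokens(d)) / len(words) for d in docs), default=0.0)
        if support < threshold:
            return False
    return True

def answer_with_check(query, corpus, generate):
    # retrieve -> generate -> verify; return None instead of an unverified draft.
    docs = retrieve(query, corpus)
    draft = generate(query, docs)
    return draft if verify(draft, docs) else None
```

Here `generate` stands in for whatever model call the pipeline uses. Returning `None` lets the caller retry or escalate rather than act on an unsupported answer.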

ARTICLE · DEV.to AI · 4/20/2026

agent-consistency – a Python consistency layer for multi-agent workflows

The author highlights common issues in AI agent workflows, such as stale states, incomplete handoffs, and unverified task completion. They introduce `agent-consistency`, an MIT-licensed Python package, to address these problems and seek feedback on its approach.

workflow automation consistency Verification Python

RESEARCH · arXiv CS.LG · 4/27/2026

Kernel Contracts: A Specification Language for ML Kernel Correctness Across Heterogeneous Silicon

This research proposes a specification language for ML kernel contracts to formally define their expected behavior across heterogeneous silicon platforms. It introduces an eight-part contract structure and twelve contract classes to arbitrate disputes arising from precision, ordering, or other failure modes.

machine learning Verification Software Engineering

RESEARCH · arXiv CS.LG · 28d ago

Vertex-Softmax: Tight Transformer Verification via Exact Softmax Optimization

The paper introduces Vertex-Softmax, a novel method for certified verification of transformer attention by exactly optimizing the softmax function. It proves that the exact optimum is attained at a vertex of the constraint box, yielding a tighter sound bound.

Optimization machine learning Verification AI
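
To give a feel for why a vertex of the constraint box can carry the exact optimum, the sketch below covers only the simplest case: bounding a single softmax output over a box of logits. The paper's objective and proof are presumably more general and are not reproduced here; the function names are hypothetical.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of floats.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def softmax_component_bounds(lower, upper, i):
    # softmax_i increases in z_i and decreases in every other z_j, so its
    # minimum and maximum over the box [lower, upper] sit at two vertices.
    n = len(lower)
    hi_vertex = [upper[j] if j == i else lower[j] for j in range(n)]
    lo_vertex = [lower[j] if j == i else upper[j] for j in range(n)]
    return softmax(lo_vertex)[i], softmax(hi_vertex)[i]
```

Because each bound is the softmax evaluated at an actual point of the box, it is attained, so no tighter bound exists for this single-output case.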

RESEARCH · arXiv CS.AI · 27d ago

Think Twice, Act Once: Verifier-Guided Action Selection For Embodied Agents

This paper proposes Verifier-Guided Action Selection (VegAS), a test-time framework to enhance the robustness of MLLM-based embodied agents. It uses a generative verifier to identify the most reliable action choice from an ensemble of candidates.

robustness MLLM embodied agents Verification
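
The selection step described in the item above, picking the most reliable action from a set of candidates, can be sketched in a few lines. This is a minimal illustration rather than the VegAS implementation: the `verifier` callable is a hypothetical stand-in for the paper's generative verifier, and the scores in the usage example are invented.

```python
from typing import Callable, Sequence

def select_action(candidates: Sequence[str], verifier: Callable[[str], float]) -> str:
    # Score every distinct candidate once and return the highest-scoring one.
    # Ties go to the candidate that appeared first.
    if not candidates:
        raise ValueError("need at least one candidate action")
    unique = list(dict.fromkeys(candidates))  # de-duplicate, keep order
    scores = {action: verifier(action) for action in unique}
    return max(unique, key=scores.__getitem__)

# Toy usage with an invented scoring table standing in for a learned verifier.
toy_scores = {"open drawer": 0.2, "pick up cup": 0.9, "move forward": 0.5}
best = select_action(
    ["open drawer", "pick up cup", "pick up cup", "move forward"],
    lambda action: toy_scores[action],
)
```

De-duplicating first means repeated samples from the policy cost only one verifier call each.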

ARTICLE · DEV.to AI · 9d ago

Stop Building CI Pipelines For Humans. Your AI Agents Need A Harness.

The article argues that traditional CI pipelines, designed for human review, are inadequate for AI agents because agents lack a human reviewer's intuition for potential issues. It proposes a "verification harness" for AI agents, featuring deterministic infrastructure and ephemeral preview environments, to integrate them safely into development workflows.

CI/CD DevOps Verification Software Engineering

RESEARCH · arXiv CS.LG · 5/6/2026

Delay, Plateau, or Collapse: Evaluating the Impact of Systematic Verification Error on RLVR

This paper examines the impact of systematic verification errors on Reinforcement Learning with Verifiable Rewards (RLVR), a method used to enhance the reasoning capabilities of large language models. Unlike prior analyses that treated errors as random, this work shows that systematic errors can lead models to learn unwanted behaviors. Experiments on arithmetic tasks reveal that systematic false negatives have similar effects to random noise, while systematic false positives can have more complex impacts.

reinforcement learning AI Errors Verification large language models

DOC · DEV.to AI · 16d ago

Top 5 Best Sites To Buy Google Voice Accounts In Days

The content outlines methods for acquiring Google Voice accounts, including official signup and Google Workspace integration. It discusses the importance of verified accounts and provides a step-by-step guide for creating accounts.

Google Workspace Verification Google Voice Account Acquisition