AI safety

496 items

ARTICLEDEV.to AI·5/10/2026

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Big tech firms are rapidly accelerating AI investments and integration, while simultaneously focusing on safety and responsible adoption. This analysis explores key developments, from record-breaking industry spending to ethical considerations and AI's impact on software development and global markets.

Regulation software development AI investments market dynamics

ARTICLEOpenAI Blog·8d ago

Our views on AI policy and political advocacy

The company outlines its approach to AI policy, supporting thoughtful regulation and AI safety. It also emphasizes its commitment to transparency and that no outside political group speaks on its behalf.

Regulation AI policy transparency advocacy

NEWSDEV.to AI·5/8/2026

Google, Microsoft, and xAI agree to federal AI testing in the U.S.

Google, Microsoft, and xAI have agreed to submit their AI models to federal testing in the U.S., coordinated by NIST's U.S. AI Safety Institute. The voluntary agreement is the first three-party framework between direct industry rivals and a federal regulator, and it aims to address risks from the rapid deployment of AI.

US government AI regulation NIST AI safety

ARTICLEDEV.to AI·5/4/2026

The dangerous part of AI agents is when they receive authority

The danger with AI agents arises when they are granted authority to act, such as API access or cloud roles, a risk that extends beyond model safety alone. "AI Admissibility" functions as an external pre-execution admission boundary, requiring a deterministic decision before any high-impact action runs.

security automation risk management AI safety
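
As a rough illustration of the pre-execution admission boundary described in this item, the sketch below puts a deterministic allow/deny check between an agent and any high-impact action. Every name here (`Action`, `admit`, `execute`, `HIGH_IMPACT`, `APPROVED`) is hypothetical and chosen for illustration; none of it is taken from the article.

```python
from dataclasses import dataclass

# Hypothetical set of action kinds treated as high-impact (illustrative assumption).
HIGH_IMPACT = {"delete_resource", "assume_cloud_role", "call_payment_api"}

# Hypothetical allow-list of (agent_id, action kind) pairs approved in advance.
APPROVED = {("deploy-bot", "assume_cloud_role")}


@dataclass(frozen=True)
class Action:
    agent_id: str
    kind: str


def admit(action: Action) -> bool:
    """Deterministic admission decision, made before the action executes."""
    if action.kind not in HIGH_IMPACT:
        return True  # low-impact actions pass through unchanged
    return (action.agent_id, action.kind) in APPROVED


def execute(action: Action) -> str:
    """Run the action only if the admission check allows it."""
    if not admit(action):
        return "denied"
    return "executed"
```

The check sits outside the model and depends only on the request and a fixed policy, so the same request always gets the same decision.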

ARTICLEDEV.to AI·26d ago

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

Major tech firms are significantly accelerating AI investments and integration, transforming the industry landscape. Alongside this growth, there's a critical focus on AI safety, ethical development, and responsible adoption across various market dynamics and global strategies.

Regulation software development AI investments market trends

ARTICLEDEV.to AI·4/28/2026

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

This article explores the rapid growth and transformation of the AI landscape, detailing record-breaking investments and AI's integration into software development. It also examines critical safety considerations, market dynamics, and global AI strategies to provide a deep dive for tech leaders and enthusiasts.

Regulation software development AI ethics AI investment

RESEARCHarXiv CS.LG·4/28/2026

KARL: Mitigating Hallucinations in LLMs via Knowledge-Boundary-Aware Reinforcement Learning

KARL is a novel framework designed to mitigate hallucinations in large language models by enabling them to appropriately abstain from questions beyond their knowledge. It achieves this through a Knowledge-Boundary-Aware Reward that dynamically estimates the model's knowledge and a Two-Stage RL Training Strategy that prevents excessive caution.

reinforcement learning hallucinations AI safety LLM

RESEARCHarXiv CS.LG·4/14/2026

Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model

This research investigates Deliberative Alignment in LLMs, a method designed to improve safety by distilling reasoning capabilities from stronger models. It uncovers an alignment gap between teacher and student models, showing that student models can retain unsafe behaviors from the base model despite learning advanced reasoning patterns. The paper proposes a BoN sampling method to address these challenges.

Model Alignment LLMs Deliberative Alignment Reasoning
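
For readers unfamiliar with the term, "BoN" is assumed here to mean Best-of-N sampling: generate several candidate responses and keep the one a scorer ranks highest. The sketch below shows only that generic idea. The function names and the keyword-based scorer are hypothetical stand-ins, not the paper's method.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate with the highest score (generic Best-of-N selection)."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def toy_safety_score(text: str) -> float:
    """Toy scorer (assumption): penalize each flagged phrase found in the text."""
    unsafe_terms = ("rm -rf", "bypass")
    lowered = text.lower()
    return -float(sum(term in lowered for term in unsafe_terms))
```

In practice the candidates would come from sampling a model N times and the scorer would be a learned safety or reward model; both are replaced by placeholders here.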

ARTICLEDEV.to AI·29d ago

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

The AI landscape is experiencing unprecedented growth, with major tech firms accelerating investments and integrating AI into software development. There's a growing focus on safety and responsibility, influencing market dynamics and global strategies.

Regulation market trends AI investment AI safety

RESEARCHarXiv CS.AI·4/17/2026

NuHF Claw: A Risk Constrained Cognitive Agent Framework for Human Centered Procedure Support in Digital Nuclear Control Rooms

This study proposes NuHF Claw, a cognitive-risk agent framework for human-centered procedure support in digital nuclear control rooms. It introduces a risk-constrained agent runtime that tightly couples cognitive state inference with probabilistic safety assessment to regulate autonomous system behavior in real time.

autonomous agents human-AI interaction AI safety

RESEARCHarXiv CS.CL·4/9/2026

Hallucination as output-boundary misclassification: a composite abstention architecture for language models

This paper frames hallucination in large language models as a classification error and proposes a composite intervention that combines instruction-based refusal with a structural abstention gate. The gate uses a support-deficit score computed from signals such as self-consistency and citation coverage, but the controlled evaluation showed that no single mechanism was sufficient to fully mitigate the problem.

hallucination Abstention Architectures large language models AI safety
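
A minimal sketch of how an abstention gate driven by a support-deficit score might look, using the two signals the summary names (self-consistency and citation coverage). The equal weighting, the 0.5 threshold, and all function names are assumptions made for illustration, not the paper's formulation.

```python
from collections import Counter


def self_consistency(samples: list[str]) -> float:
    """Fraction of sampled answers that match the most common answer."""
    if not samples:
        return 0.0
    _, count = Counter(samples).most_common(1)[0]
    return count / len(samples)


def support_deficit(samples: list[str], cited_claims: int, total_claims: int) -> float:
    """1 minus the average of self-consistency and citation coverage (assumed weighting)."""
    coverage = cited_claims / total_claims if total_claims else 0.0
    return 1.0 - 0.5 * (self_consistency(samples) + coverage)


def gate(answer: str, samples: list[str], cited_claims: int,
         total_claims: int, threshold: float = 0.5) -> str:
    """Abstain when the support deficit exceeds the (assumed) threshold."""
    if support_deficit(samples, cited_claims, total_claims) > threshold:
        return "I don't know."
    return answer
```

A higher deficit means weaker support for the answer, so the gate abstains only when agreement across samples and citation coverage are both low enough to push the score past the threshold.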

RESEARCHarXiv CS.LG·5/1/2026

Dynamic Adversarial Fine-Tuning Reorganizes Refusal Geometry

This research investigates the training-time mechanisms of refusal in safety-aligned language models, specifically comparing supervised fine-tuning with R2D2-style dynamic adversarial fine-tuning. Findings show R2D2 initially achieves strong refusal on HarmBench but then partially reopens, while SFT remains consistently less robust.

language models model robustness fine-tuning Adversarial Training

RESEARCHarXiv CS.AI·5/9/2026

Understanding Annotator Safety Policy with Interpretability

The paper examines why annotators disagree when applying AI safety policies; such disagreement can stem from operational failures, policy ambiguity, or value pluralism. It highlights how hard it is to identify the root cause of a given disagreement and how unreliable annotators' self-reported reasoning can be.

policy machine learning Data Annotation interpretability

ARTICLEDEV.to AI·4/27/2026

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

The AI landscape is experiencing rapid growth with record investments from major tech firms and its integration into software development processes. There's an increasing focus on AI safety, responsibility, and ethics, alongside its influence on market dynamics and global strategies.

AI regulation AI integration AI ethics AI investment

RESEARCHarXiv CS.CL·5/1/2026

Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations

CarryOnBench is introduced as the first interactive benchmark to measure how LLMs recover utility and revise user intent interpretation in multi-turn, safe conversations. It reveals that current models fulfill only 10.5-37.6% of benign user information needs at the initial turn, highlighting a gap in safety-aligned LLMs regarding helpfulness recovery.

Multi-turn conversations benchmarking AI safety user interaction

RESEARCHarXiv CS.AI·4/20/2026

Subliminal Transfer of Unsafe Behaviors in AI Agent Distillation

This research provides the first empirical evidence that unsafe AI agent behaviors can transfer subliminally during model distillation. Experiments show a student agent, trained on seemingly safe tasks, can inherit a destructive "deletion bias" from its teacher, even when explicit dangerous keywords are filtered.

machine learning Model Distillation agent systems AI safety

ARTICLEDEV.to AI·4/19/2026

Big Tech firms are accelerating AI investments and integration, while regulators and companies focus on safety and responsible adoption.

The AI landscape is experiencing unprecedented growth and transformation, driven by massive investments and integration into software development. There is a growing focus on safety and responsibility, alongside adaptation to market strategies and global trends.

AI integration AI Market software development AI investments

RESEARCHarXiv CS.AI·4/17/2026

Formalizing Kantian Ethics: Formula of the Universal Law Logic (FULL)

This paper introduces the Formula of the Universal Law Logic (FULL), a multi-sorted quantified modal logic, to formalize Kantian ethics for machine ethics. FULL aims to overcome limitations of current axiomatic approaches by enabling Artificial Moral Agents (AMAs) to reason about morality and enhance AI safety.

machine ethics Kantian ethics modal logic AI safety

RESEARCHarXiv CS.AI·5/4/2026

ARMOR 2025: A Military-Aligned Benchmark for Evaluating Large Language Model Safety Beyond Civilian Contexts

ARMOR 2025 is a new military-aligned benchmark designed to evaluate the safety of large language models (LLMs) in defense applications, beyond civilian contexts. It addresses the gap in existing benchmarks by grounding evaluations in military doctrines like the Law of War, Rules of Engagement, and Joint Ethics Regulation.

ethics military AI benchmarks AI safety

RESEARCHarXiv CS.CL·5/4/2026

Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations

This research introduces a scalable framework for safety evaluation of multi-turn interactions with AI companion applications, addressing concerns about their emotional engagement risks. It integrates persona construction, scenario generation, simulation, and harm evaluation, applying it to Replika with high-risk user personas like those with depression or anxiety.

Multi-turn conversations Persona Modeling Harm Evaluation AI companions