RESEARCH

ARMOR 2025: A Military-Aligned Benchmark for Evaluating Large Language Model Safety Beyond Civilian Contexts

arXiv CS.AI · May 4, 2026

ARMOR 2025 is a new military-aligned benchmark for evaluating the safety of large language models (LLMs) in defense applications rather than only in civilian contexts. It fills a gap left by existing benchmarks by grounding its evaluations in military doctrine, including the Law of War, the Rules of Engagement, and the Joint Ethics Regulation.

ethics military AI Benchmarks AI safety LLM
