ARMOR 2025: A Military-Aligned Benchmark for Evaluating Large Language Model Safety Beyond Civilian Contexts
arXiv CS.AI · May 4, 2026
ARMOR 2025 is a new military-aligned benchmark for evaluating the safety of large language models (LLMs) in defense applications, where civilian-focused evaluations fall short. It addresses this gap in existing benchmarks by grounding its evaluations in military doctrine such as the Law of War, Rules of Engagement, and the Joint Ethics Regulation.