content moderation

23 items

RESEARCH · arXiv CS.AI · 4/25/2026

Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI

This paper proposes a new framework for evaluating rule-governed AI, particularly in content moderation, by moving beyond simple agreement metrics. It introduces the Defensibility Index (DI), Ambiguity Index (AI), and Probabilistic Defensibility Signal (PDS) to assess policy-grounded correctness and reasoning stability, using LLM traces to verify logical derivability from governing rules.

LLMs · content moderation · AI ethics · AI evaluation

ARTICLE · DEV.to AI · 4/12/2026

We Built an NSFW Detection API That's 2x Cheaper Than AWS — Here's What We Learned

PixelAPI has launched an NSFW content detection API priced at $0.0005 per image, which the article says is half the price of comparable AWS and Google offerings. The article highlights the cost savings for platforms that process high image volumes and the lessons learned in building the API.

NSFW Detection · API · Cost Optimization · content moderation

NEWS · The Verge AI · 4/27/2026

Canva apologizes after its AI tool replaces ‘Palestine’ in designs

Canva has apologized after one of its new AI tools, "Magic Layers," was caught replacing the word "Palestine" with "Ukraine" in user designs. The company states it has resolved the issue, which appeared to be specific to the word "Palestine," and is taking steps to prevent future occurrences.

AI bias · security · content moderation · AI ethics