content moderation

22 items

ARTICLE · ↑ trending · Hacker News (AI) · 1d ago

If HN policy disallows AI comments, why is linking to AI generated content ok?

The post asks why Hacker News policy allows links to AI-generated content while disallowing AI-generated comments. It opens a discussion about how consistent the platform's rules on AI content are and what they imply.

Hacker News AI policy content moderation

RESEARCH · DEV.to AI · 4/24/2026

"Go eat a bat, Chang!": On the Emergence of Sinophobic Behavior on Web Communities in the Face of COVID-19

This research explores the emergence of Sinophobic behavior within online web communities during the COVID-19 pandemic. It highlights instances of anti-Chinese sentiment and related hate speech in digital spaces.

hate-speech social media natural language processing content moderation

ARTICLE · ↑ trending · Reddit r/LocalLLaMA · 4/14/2026

Please stop using AI for posts and showcasing your completely vibe coded projects

The user expresses frustration with the overwhelming presence of fully AI-coded projects and AI-generated posts with minimal human input in an AI-focused community. They argue that while AI assistance is acceptable, the sub should not become an "AI slop sub" due to a lack of original human contribution.

AI coding AI-generated content human-AI interaction content moderation

NEWS · ↑ trending · Hacker News (AI) · 13d ago

YouTube to begin automatically labeling AI videos

YouTube will begin automatically labeling AI-generated videos. This initiative aims to enhance transparency and inform viewers about the nature of synthetic content.

YouTube video transparency content moderation

ARTICLE · ↑ trending · Hacker News (AI) · 7d ago

The Rise of Anti-AI AI Slop

This article discusses the growing phenomenon of low-quality AI-generated content, dubbed "AI slop," and the emerging backlash against it. It explores the proliferation of such content and the efforts to counteract it.

digital media AI quality AI content content moderation

RESEARCH · arXiv CS.AI · 5d ago

Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal

This paper argues that reducing disagreement in multi-agent systems is insufficient for value-laden tasks, proposing a knowledge-representation layer. This layer abstracts reasoning traces and agent decisions into symbolic disagreement states, distinguishing four types, with application in content moderation.

Disagreement Knowledge Representation Reasoning content moderation

NEWS · The Verge AI · 4/15/2026

Grok’s sexual deepfakes almost got it banned from Apple’s App Store. Almost.

In January, Apple quietly threatened to remove Grok, Elon Musk's AI app, from the App Store over the app's failure to curb a surge of nonconsensual sexual deepfakes. Apple demanded that the app's developers create a plan to improve content moderation.

Apple Grok content moderation AI

NEWS · Hugging Face Blog · 5d ago

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

Nemotron 3.5 Content Safety introduces a customizable multimodal safety solution tailored for global enterprise AI. This feature is designed to ensure content protection across various modalities for businesses worldwide.

multimodal AI security content moderation Enterprise AI

NEWS · DEV.to AI · 5d ago

Meta's Oversight Board Challenges Algorithmic Due Process and Transparency in Account Enforcement

Meta's Oversight Board has criticized Meta's account enforcement practices, citing a lack of transparency and algorithmic due process. According to the summary, these gaps fuel user distrust and undermine the platform's legitimacy.

transparency content moderation Algorithmic Due Process Meta

ARTICLE · The Verge AI · 5d ago

Let us filter AI slop, you cowards

This article argues that social media platforms should allow users to filter out AI-generated content, rather than merely labeling it, to prevent the proliferation of "AI slop" in their feeds. Current efforts to label AI content haven't significantly improved user experience.

AI filters social media AI-generated content content moderation

ARTICLE · DEV.to AI · 5/7/2026

Write a Reddit-karma skill.md — how to grow karma safely without bans

The text is an AI's refusal to assist with manipulating platform metrics or circumventing community rules. The response justifies the refusal as appropriate and ethical, offering help for other software development, writing, or analysis requests that do not violate platform rules.

AI limitations platform manipulation content moderation AI ethics

NEWS · The Verge AI · 4/21/2026

Celebrities will be able to find and request removal of AI deepfakes on YouTube

YouTube is expanding its AI deepfake monitoring feature to celebrities, allowing them to find and request the removal of AI-generated deepfake content of themselves. This tool, previously tested with creators and then expanded to politicians and journalists, aims to help public figures manage their likeness online.

deepfake security content moderation

ARTICLE · DEV.to AI · 17d ago

YouTube Just Made Every Creator a Deepfake Cop — Here's Why Investigators Should Be Nervous

YouTube's expanded deepfake detection tools transform synthetic media verification into a standard production requirement, shifting the burden of proof in digital investigations. This "democratization of detection" implies that platform likeness detection flags will become primary artifacts in legal and insurance disputes.

deepfake security computer vision fraud detection

ARTICLE · DEV.to AI · 4/27/2026

Toxicity & Content Safety — Deep Dive + Problem: Depth-Based View Synthesis

This article analyzes toxicity and content safety in LLMs in depth, emphasizing the role of safety measures in preventing the generation of harmful material. It covers the technical, ethical, social, and legal aspects of ensuring LLMs do not disseminate offensive content.

LLMs content moderation AI ethics

ARTICLE · DEV.to AI · 4/26/2026

False Positives in Child Safety AI: Architecture Tradeoffs and Why They Matter

False positives in child safety AI erode trust, create injustices, and pose significant legal and social challenges. This article analyzes their causes, how different system architectures handle them, and specific engineering choices to mitigate these errors.

security child safety content moderation AI ethics

NEWS · The Verge AI · 25d ago

ArXiv will ban researchers who upload papers full of AI slop

ArXiv will ban researchers for a year if their papers contain "incontrovertible evidence" of unchecked LLM generation, such as hallucinated references. Future submissions from these authors will also require acceptance from a reputable peer-reviewed venue.

AI Content Generation academic publishing content moderation AI ethics

DOC · AWS Machine Learning Blog · 22d ago

Prompting Amazon Nova 2 for content moderation

This post explains how to use Amazon Nova 2 Lite for content moderation through structured and free-form prompting techniques. It also benchmarks the model's capabilities against several foundation models using public datasets, grounded in the MLCommons AILuminate Assessment Standard.

AI models learning Prompting Benchmarking

ARTICLE · DEV.to AI · 4/25/2026

Fairness in Child Safety AI: Why Demographic Parity Audits Are Not Optional

This article argues that fairness evaluation, specifically demographic parity, is a critical and non-negotiable deployment constraint for AI systems in child safety. Disproportionate flagging harms users, creates legal risks, and undermines trust, while also missing threats in underrepresented groups due to biased datasets.

ethics AI bias child safety content moderation
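
As a rough illustration of the demographic parity audit named in the item above, the sketch below compares flag rates across groups. It is not taken from the linked article: the function names, the group labels "A" and "B", and the toy records are all hypothetical, and a real audit would also need confidence intervals and an agreed tolerance for the gap.

```python
from collections import defaultdict


def flag_rates(records):
    """Return the fraction of flagged items for each group.

    records: iterable of (group, flagged) pairs, where flagged is a bool.
    """
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for group, is_flagged in records:
        totals[group] += 1
        if is_flagged:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}


def demographic_parity_gap(records):
    """Largest difference in flag rate between any two groups."""
    rates = flag_rates(records)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: (group label, whether the classifier flagged the item).
sample = [
    ("A", True), ("A", False), ("A", False), ("A", False),
    ("B", True), ("B", True), ("B", False), ("B", False),
]

rates = flag_rates(sample)
gap = demographic_parity_gap(sample)
print(rates, gap)
```

A gap of zero means every group is flagged at the same rate; what counts as an acceptable gap is a policy decision, not something this sketch decides.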

ARTICLE · DEV.to AI · 4/20/2026

ModSense Moderation Intelligence System

ModSense is an AI-assisted moderation intelligence system, a production-grade prototype designed for large communities like Reddit. It combines real-time anomaly detection and graph-based community health modeling with an agentic AI layer (Gemini 3 Flash) to identify and respond to evolving issues like toxicity, brigading, and misinformation.

Anomaly Detection content moderation AI Gemini AI

RESEARCH · arXiv CS.AI · 4/25/2026

Escaping the Agreement Trap: Defensibility Signals for Evaluating Rule-Governed AI

This paper proposes a new framework for evaluating rule-governed AI, particularly in content moderation, by moving beyond simple agreement metrics. It introduces the Defensibility Index (DI), Ambiguity Index (AI), and Probabilistic Defensibility Signal (PDS) to assess policy-grounded correctness and reasoning stability, using LLM traces to verify logical derivability from governing rules.

LLMs content moderation AI ethics AI evaluation