Automated content moderation is an effective way to reduce brand risk by pre-screening content before it goes live. It also saves moderators time and ensures that community guidelines are applied consistently.
The sheer quantity of user-generated content (UGC) that human moderators must sift through daily takes a serious toll on their mental health. Constant exposure to graphic images depicting hate, violence, and other forms of harm is draining and stressful.
Machine Learning
Machine learning (ML) is a branch of artificial intelligence that uses algorithms to build models that recognize patterns in data. These models can be used to automate content moderation for text, images, and video.
These tools are trained on labeled data, including web pages, social media postings, examples of abusive speech in different languages and communities, and more. They must also be designed and implemented in compliance with international human rights laws.
Many platforms use this type of software to identify and flag toxic users, spam, and other inappropriate content, allowing the platform to issue warnings to users or even ban them from the site.
The goal is to reduce the amount of content that human moderators need to review, so they can focus on more challenging cases. These tools often improve through active learning, which involves tuning the AI model using customer feedback, moderator actions (e.g., de-flagging a piece of text that was incorrectly flagged as profanity), and updates to language models with emerging slang and connotations.
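As a rough illustration of that feedback loop (not any particular vendor's pipeline), the sketch below trains a simple text classifier incrementally and folds each moderator confirmation or de-flag back into the model; the sample texts, labels, and threshold are hypothetical.

```python
# Minimal active-learning sketch for text moderation.
# Assumes scikit-learn is installed; all sample data and labels are hypothetical.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
model = SGDClassifier(loss="log_loss")  # logistic regression trained incrementally

# Initial training on a small labeled seed set (1 = violates guidelines, 0 = acceptable).
seed_texts = ["you are an idiot", "great photo, thanks for sharing"]
seed_labels = [1, 0]
model.partial_fit(vectorizer.transform(seed_texts), seed_labels, classes=[0, 1])

def score(text: str) -> float:
    """Probability that the text violates community guidelines."""
    return model.predict_proba(vectorizer.transform([text]))[0][1]

def record_moderator_decision(text: str, violates: bool) -> None:
    """Fold a moderator's confirmation or de-flag back into the model."""
    model.partial_fit(vectorizer.transform([text]), [int(violates)])

# A moderator de-flags a false positive (slang, not abuse); the model is nudged accordingly.
flagged = "that movie was sick"
print(f"before feedback: {score(flagged):.2f}")
record_moderator_decision(flagged, violates=False)
print(f"after feedback:  {score(flagged):.2f}")
```

In practice the retrained model would be evaluated offline before redeployment, but the core idea is the same: moderator actions become fresh labeled data.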
Amazon Rekognition
Amazon Rekognition identifies content in images and videos that is inappropriate, unwanted, or dangerous. It can automatically detect a variety of categories, including violence, nudity, rude gestures, hate symbols, and alcohol or drug use. By default, its moderation APIs return only labels detected with at least 50% confidence; callers can raise or lower this threshold (the MinConfidence parameter) to match their own risk tolerance.
Each detected label comes with a name, a parent category, and a confidence score that developers can use to filter unsuitable images according to their business needs. For example, an image can be labeled with the top-level category “Explicit Nudity” or the more specific “Graphic Female Nudity.”
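A minimal sketch of that filtering using the boto3 SDK’s detect_moderation_labels call; the bucket name, object key, blocked-category list, and threshold below are placeholders you would replace with your own policy.

```python
# Sketch: flag an S3-hosted image whose Rekognition moderation labels
# fall into categories this (hypothetical) platform disallows.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

BLOCKED = {"Explicit Nudity", "Violence", "Drugs"}  # example policy, not exhaustive
MIN_CONFIDENCE = 80  # stricter than the 50% default

response = rekognition.detect_moderation_labels(
    Image={"S3Object": {"Bucket": "my-ugc-bucket", "Name": "uploads/photo.jpg"}},
    MinConfidence=MIN_CONFIDENCE,
)

# A label matches either directly or through its parent category.
violations = [
    (label["Name"], label["Confidence"])
    for label in response["ModerationLabels"]
    if label["Name"] in BLOCKED or label.get("ParentName") in BLOCKED
]

if violations:
    print("Rejected:", violations)
else:
    print("Approved for publication")
```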
Rekognition also recognizes celebrities in supplied images and stored videos, providing face details and tracking information for each celebrity throughout a video. It can also detect text and analyze activities in images and videos. For video analysis, results include timestamps and can be retrieved through the console, API, or CLI, and access to Rekognition capabilities can be controlled securely with AWS Identity and Access Management (IAM).
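For stored video, moderation runs asynchronously: StartContentModeration kicks off a job and GetContentModeration returns timestamped labels once it completes. A minimal sketch, with placeholder bucket, key, and polling interval:

```python
# Sketch: asynchronous moderation of a stored video, printing each flagged
# label with its timestamp (milliseconds from the start of the video).
import time
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

job = rekognition.start_content_moderation(
    Video={"S3Object": {"Bucket": "my-ugc-bucket", "Name": "uploads/clip.mp4"}},
    MinConfidence=70,
)
job_id = job["JobId"]

# Poll until the job finishes (a production system would subscribe to the
# SNS completion notification instead of polling).
while True:
    result = rekognition.get_content_moderation(JobId=job_id)
    if result["JobStatus"] in ("SUCCEEDED", "FAILED"):
        break
    time.sleep(5)

for item in result.get("ModerationLabels", []):
    label = item["ModerationLabel"]
    print(f'{item["Timestamp"]} ms: {label["Name"]} ({label["Confidence"]:.1f}%)')
```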
Spectrum Labs’ Data Vault
As platforms expand globally and attract users who speak many different languages, detecting toxic behavior becomes increasingly difficult. This is especially true when terms that are harmless in one language are offensive in another, and translation tools don’t always capture implied meanings.
Spectrum’s patented technology recognizes distinct toxic behaviors like hate speech, violent extremism and child sexual abuse material (CSAM) in any language. Its workflow automation allows for quick action on harmful content to help Trust & Safety teams keep their communities safe and welcoming.
Spectrum Labs’ Behavioral AI can be implemented in real-time or asynchronously through the company’s well-documented API and webhooks. Once the API receives a JSON request, it analyzes the user-generated content and returns a boolean (i.e., “true” or “false”) determination for each behavior it screens for. Then, based on the scenario and parameters set, the API can take automated action or flag the content for review by a human moderator.
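The endpoint URL, field names, and behavior keys below are invented for illustration and are not Spectrum Labs’ actual API; the sketch only shows the general shape of the flow just described: post the content as JSON, read back per-behavior booleans, then either act automatically or enqueue the item for human review.

```python
# Hypothetical client-side flow for a behavior-detection API.
# Endpoint, credentials, field names, and behavior keys are illustrative only.
import requests

API_URL = "https://api.example.com/v1/analyze"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"                        # placeholder credential

def moderate(content_id: str, text: str) -> None:
    payload = {"content_id": content_id, "text": text, "language_hint": "auto"}
    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    behaviors = resp.json().get("behaviors", {})  # e.g. {"hate_speech": true, ...}

    if behaviors.get("csam") or behaviors.get("violent_extremism"):
        remove_content(content_id)                # automated action for severe harms
    elif any(behaviors.values()):
        send_to_human_review(content_id, behaviors)

def remove_content(content_id: str) -> None:
    print(f"removed {content_id}")

def send_to_human_review(content_id: str, behaviors: dict) -> None:
    print(f"queued {content_id} for review: {behaviors}")
```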
Spectrum Guardian
Alongside identifying positive user behavior, Spectrum Labs helps platforms remove toxic users and content at scale while building better communities. Its contextual AI platform is designed to keep billions of users safe, driving engagement and retention through healthier user experiences.
With Guardian, content moderation is prioritized and ordered according to your community guidelines. You can also track and visualize trends like hate speech incidents so you can proactively take action to protect your community.
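As a rough sketch of guideline-driven prioritization (not Guardian’s internal logic), the snippet below orders a review queue by a per-behavior severity ranking and tallies incidents for trend tracking; the severity values, behavior names, and sample items are made up.

```python
# Hypothetical prioritization of a moderation queue by guideline severity,
# plus a simple daily tally for trend tracking. All values are illustrative.
from collections import Counter
from dataclasses import dataclass

SEVERITY = {"csam": 0, "violent_extremism": 1, "hate_speech": 2, "spam": 3}  # lower = review first

@dataclass
class FlaggedItem:
    content_id: str
    behavior: str
    confidence: float
    day: str

queue = [
    FlaggedItem("c1", "spam", 0.91, "2024-05-01"),
    FlaggedItem("c2", "hate_speech", 0.78, "2024-05-01"),
    FlaggedItem("c3", "hate_speech", 0.95, "2024-05-02"),
]

# Most severe behaviors first; within a behavior, highest confidence first.
queue.sort(key=lambda i: (SEVERITY[i.behavior], -i.confidence))
for item in queue:
    print(item.content_id, item.behavior, item.confidence)

# Daily hate-speech counts, e.g. for a trend dashboard.
trend = Counter(i.day for i in queue if i.behavior == "hate_speech")
print(dict(trend))
```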
While they are not a direct replacement for human moderators, ML-based systems may allow for more rapid and larger-scale content moderation than manual review (Zuckerberg 2017a). They may also enable more sophisticated forms of filtering and censorship (Cambridge Consultants 2019; Bloch-Wehba 2020; Gorwa et al. 2020), including the use of fingerprinting or hash matching (Cambridge Consultants 2019) to identify prohibited material and proactively censor it before it can be posted. This shift towards ex ante censorship raises important ethical and legal concerns (Cambridge Consultants 2019; Llanso et al. 2020).
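To make the hash-matching idea concrete, here is a minimal sketch using exact SHA-256 digests against a blocklist; production systems such as PhotoDNA use perceptual hashes that tolerate re-encoding and cropping, which plain cryptographic hashes do not. The file path and blocklist entries are placeholders.

```python
# Sketch: block an upload whose cryptographic hash matches a list of known
# prohibited files. This exact-match version only illustrates the ex ante
# filtering step; real deployments use perceptual hashing so that resized
# or re-encoded copies still match.
import hashlib
from pathlib import Path

# Placeholder blocklist of hex digests of known prohibited files.
BLOCKED_HASHES = {
    "5ab5aa8f9a3b0f0f3b7c1f2a9f4f0d6b7e8c9d0a1b2c3d4e5f60718293a4b5c6",
}

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def allow_upload(path: Path) -> bool:
    """Return False if the file matches a known prohibited item."""
    return sha256_of(path) not in BLOCKED_HASHES

# Example usage with a placeholder path:
# print(allow_upload(Path("uploads/photo.jpg")))
```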