Content moderation is a longstanding problem that has intensified in the last few years as increasing pressure forces tech giants like Google and Facebook to take harsher action over offensive, hateful, illegal or otherwise harmful material.
Although the line between content moderation and censorship can be hard to draw, there is a fairly strong consensus among both governments and the public: online content should be moderated, at least to some extent.
As such, Facebook, Google and other key stakeholders have continued to iterate on increasingly advanced forms of AI to assist in the battle against harmful content. One of the most recent and capable models is XLM-RoBERTa (XLM-R), which marks a significant improvement on the former NLP frontrunner, BERT.
The paradigm shift that NLP facilitates is the move from reactive moderation to proactive moderation. Proactive pre-moderation is the process of moderating content before it’s posted, as opposed to reactive moderation, which relies on community reports.
In pre-moderation, AI triages content to pre-filter it prior to manual analysis by a human. This reduces the burden on humans, whilst ensuring that human teams still have the final say over pre-screened content. Facebook’s new content moderation system aims to remove the most damaging content before the least, placing both reactively and proactively screened content in the same queue.
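The single shared queue described above can be illustrated with a minimal sketch. This is not Facebook’s implementation: the `ReviewQueue` class, the `severity` score (assumed to be a model output between 0 and 1) and the `source` labels are all hypothetical names chosen for illustration. The sketch only shows the ordering idea, namely that the most damaging items reach human reviewers first, regardless of whether they were flagged by AI or reported by users.

```python
import heapq
from itertools import count


class ReviewQueue:
    """Hypothetical shared review queue: highest severity is reviewed first."""

    def __init__(self):
        self._heap = []
        self._counter = count()  # tie-breaker: earlier arrivals come out first

    def push(self, content_id, severity, source):
        # heapq is a min-heap, so the severity is negated to pop the worst item first.
        # `source` is "proactive" (AI-flagged) or "reactive" (user-reported).
        heapq.heappush(self._heap, (-severity, next(self._counter), content_id, source))

    def pop(self):
        neg_severity, _, content_id, source = heapq.heappop(self._heap)
        return content_id, -neg_severity, source

    def __len__(self):
        return len(self._heap)


queue = ReviewQueue()
queue.push("post-1", 0.35, "reactive")
queue.push("post-2", 0.92, "proactive")
queue.push("post-3", 0.92, "reactive")

# Human moderators work through the queue in this order.
review_order = [queue.pop()[0] for _ in range(len(queue))]
print(review_order)
```

Both sources feed the same heap, so a user report and an AI flag of equal severity are handled in arrival order rather than by origin.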
In 2020, Facebook estimated that some 88.8% of hateful content was proactively flagged by AI, up from 80.2% in the first quarter of that year. The current content moderation system has been criticised for overdeleting, but its reception has been generally positive.
The content moderation system used by Google is perhaps even more multifaceted, as it has to cover Google Search as well as Maps, YouTube, Ads and many other products. Google moderates some content during the indexing process but also uses an array of proactive approaches to remove user-generated content (UGC) across its other platforms.
In 2020, Forbes reported that Facebook’s content analysis and moderation AI flagged over 3 million pieces of content every day as potentially violating its community standards. Facebook employs some 15,000 content moderators who have to sift through millions of pieces of content, either affirming or denying the AI’s suspicions. Zuckerberg admitted that some 1 out of 10 pieces of content slips through the net and is allowed to go live, but believes the very worst content is filtered out with a near-100% success rate.
Whilst modern moderation AIs are exceptionally efficient at working at scale, this doesn’t remove the human burden of content moderation, and the reliance on end-point human involvement is likely to persist. The work stressors that content moderators have to endure whilst working on the ‘terror queue’ (the moderation queue of the very worst content) have also received considerable coverage in recent years.
An Ofcom paper on content moderation highlights the difficulties presented by our ever-expanding content universe. NLP algorithms are now learning to interpret nuances such as sarcasm marked by emoji use without the need for human data labelling, and can accurately derive meaning from short video clips and GIFs, but the development of new offensive slang and hateful imagery presents an ongoing challenge.
Modern content moderation algorithms now have both the depth and breadth to understand advanced multi-modal content types with deep semantic connotations, but false-positive rates remain high, and AI is struggling to keep up with the lengths to which creators of hateful content will go to keep their content under the radar.
Moreover, there is considerable diversity in language and imagery itself. The ‘Scunthorpe problem’ and numerous related problems highlight the difficulty of applying simple logic to classify strings of offensive words. The risk of false positives remains high and concern has been raised over whether fully automated moderation would lead to cultural or linguistic discrimination.
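The Scunthorpe problem is easy to reproduce. The sketch below contrasts a naive substring filter with a word-boundary filter; the `BLOCKLIST` terms are mild placeholders chosen for illustration, and neither function is taken from any real moderation system. It also shows why simple string logic is not enough: the word-boundary version removes the false positive but is trivially evaded by spacing out the letters.

```python
import re

# Placeholder blocklist, for illustration only.
BLOCKLIST = {"ass", "hell"}


def naive_flag(text):
    """Flag text if any blocked term appears anywhere, even inside another word."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


# Match blocked terms only when they stand alone as whole words.
_WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in sorted(BLOCKLIST)) + r")\b",
    re.IGNORECASE,
)


def word_boundary_flag(text):
    """Flag text only if a blocked term appears as a whole word."""
    return _WORD_PATTERN.search(text) is not None


greeting = "Hello from Scunthorpe"
print(naive_flag(greeting))          # "hello" contains "hell": a false positive
print(word_boundary_flag(greeting))  # no blocked term stands alone as a word

evasion = "what the h e l l"
print(word_boundary_flag(evasion))   # spaced-out letters slip past both filters
```

Tightening the match reduces false positives, but each rule of this kind is also a recipe for evasion, which is part of why context-aware models have displaced plain keyword lists.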
There has been major progress in tackling false positives. For example, image and video content moderation APIs like Amazon’s Rekognition use two-layer video labelling and audio labelling hierarchies to accurately discern harmful images from closely related non-harmful images, e.g. breastfeeding women from sexually suggestive content.
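To make the idea of a label hierarchy concrete, the sketch below groups moderation labels under their parent category. It assumes a response shaped like the one Rekognition’s moderation-label detection returns, a `ModerationLabels` list whose entries carry `Name`, `ParentName` and `Confidence`, but it makes no API call: `sample_response` is hand-written illustrative data, and `group_by_parent` and its `min_confidence` threshold are hypothetical helpers rather than part of any SDK.

```python
from collections import defaultdict

# Hand-written illustrative data, not real API output.
sample_response = {
    "ModerationLabels": [
        {"Name": "Suggestive", "ParentName": "", "Confidence": 91.2},
        {"Name": "Female Swimwear Or Underwear", "ParentName": "Suggestive", "Confidence": 91.2},
        {"Name": "Explicit Nudity", "ParentName": "", "Confidence": 12.0},
    ]
}


def group_by_parent(response, min_confidence=80.0):
    """Group sufficiently confident labels under their immediate parent category."""
    groups = defaultdict(list)
    for label in response.get("ModerationLabels", []):
        if label["Confidence"] < min_confidence:
            continue  # ignore low-confidence labels
        parent = label.get("ParentName", "")
        if parent:
            groups[parent].append(label["Name"])
        else:
            groups.setdefault(label["Name"], [])  # top-level category
    return dict(groups)


print(group_by_parent(sample_response))
```

A downstream policy could then act on the specific second-level label rather than the broad category, which is how a hierarchy helps separate closely related cases.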
Automated pre-moderation of publicly accessible content imposed by multinationals and governments presents an ethical problem. Corporations like YouTube are tightening their grip on creators that spread misinformation and conspiracy theories, but classifying content as either of these things is likely only possible after a rigorous interrogation of that content rather than a cursory glance.
There has also been considerable debate about where such practices intersect with human rights laws, with critics highlighting how increased false positives, or ambiguous decisions, are robbing people of their public voice and sometimes even their careers. GDPR requires ‘data subjects’ to be informed when their content is removed via automated procedures, and privacy regulations further complicate the boundary between corporate responsibility and censorship.
On a practical or technological level, AI is certainly assisting in filtering out content that the vast majority of individuals agree should not be exposed to the public. However, it is still unclear how AI will make the leap from triage to fully automated pre-moderation with no human involvement.
As it stands, AI is an assistive tool that streamlines the moderation process for those teams of moderators that still bear the burden of having a final say. The system will be imperfect regardless of whether humans are involved or not, but retaining human involvement currently represents the lowest risk option.