Opinion

Can Google detect AI-Generated Content?

AI plagiarism expert Jon Gillham examines the war between search engines and text-generation engines like ChatGPT
By
Jon Gillham

AI-generated content is becoming immensely popular, with many AI content-generation platforms and programs such as ChatGPT springing up.

As the name implies, AI-generated content refers to content generated by an AI. The content can range from blogs, articles, essays, business plans, and other written content to images and music.

It is often hard to distinguish AI-generated content from content that humans actually created, and this can raise ethical issues.

While there isn’t a body in place to regulate the use of AI, various algorithms and processes are being developed to detect AI-generated content.

Here’s how Google, the search engine giant, is tackling the problem of AI-generated content.

How Google Detects AI-Generated Content

To answer the question, yes, Google can detect AI-generated content… sort of!

To explain this, we’ll be focusing mostly on written content.

Google is constantly developing and tweaking new algorithms to deal with the issue of AI-generated content.

With algorithms, Google can check for how well-written content is, as well as various inconsistencies and patterns that show up in AI-generated content. Google will look for sentences that are meaningless to human readers but contain keywords. The company will also check for content generated through stochastic models and sequences like Markov chains.
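To see why Markov-chain text is comparatively easy to spot, it helps to look at how little the technique does. The following is a minimal sketch and not anything Google has published: the function names `build_chain` and `generate` and the sample sentence are made up for illustration. Each next word is chosen only from words that followed the current word in the source text, so the output is locally plausible but has no overall meaning.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the words that follow it in `text`."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length=12, seed=0):
    """Walk the chain, picking each next word at random from observed followers."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # start from a random known state
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: nothing ever followed this state
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Illustrative sample text (an assumption, not taken from any real page).
sample = (
    "google can detect content and google can rank content "
    "and content can rank well"
)
chain = build_chain(sample)
print(generate(chain))
```

Every adjacent word pair in the output already exists in the source, which is exactly the kind of repetitive, recombined structure a pattern-based detector could look for.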

Google also checks for content generated from scraping RSS feeds. Content stitched from various web pages without adding any tangible value will also be detected as AI-generated content. Content generated through deliberate obfuscation and like-for-like replacement of words with synonyms will be identified as AI-generated content.
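Like-for-like synonym replacement can be illustrated with a small sketch. This is only an assumption about how such a check might work, not Google's actual method: the `CANONICAL` synonym table, the helper names, and the two example sentences are all hypothetical. The idea is to compare overlapping three-word sequences ("shingles") between two texts, first as written and then after mapping synonyms to a common form.

```python
def shingles(text, size=3):
    """Return the set of overlapping `size`-word sequences in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical synonym table; a real system would need a far larger resource.
CANONICAL = {"large": "big", "huge": "big", "quick": "fast", "rapid": "fast"}


def normalize(text):
    """Replace each known synonym with its canonical form."""
    return " ".join(CANONICAL.get(word, word) for word in text.lower().split())


original = "the big dog made a fast escape from the yard"
spun = "the huge dog made a rapid escape from the yard"

raw = jaccard(shingles(original), shingles(spun))
norm = jaccard(shingles(normalize(original)), shingles(normalize(spun)))
print(f"raw overlap: {raw:.2f}, after synonym normalization: {norm:.2f}")
```

In this toy case the raw overlap is low because two swapped words break most of the shingles, while the normalized texts match exactly. That gap is the signal that one text may be a "spun" copy of the other.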

Basically, if it falls within a structure that is identifiable by the algorithm, Google flags it.

That said, while content written by older NLP models like GPT-1 and GPT-2 is easier to detect, the newer GPT-3 is more advanced and is harder to detect, hence the “sort of.”

Google explains that the better it gets at detecting AI-generated content, the more the creators of these tools find ways to circumvent the system. Google’s Search Advocate, John Mueller, likens this to a “cat and mouse” game.

Tools such as Originality.AI can also detect whether content was generated by AI writers like ChatGPT. It offers an AI Content Detection Chrome Extension that provides free credits for checking whether the content you are reading is AI-generated.

Importance of detecting AI-generated content

At its very core, the main reason for developing algorithms to detect AI-generated content is ethics. How ethical is it to use content generated by AI? Does AI-generated content fall under plagiarism or copyright laws, or is it actually newly generated data?

Many universities and other educational institutions require students to work on content independently without submitting AI-generated content or outsourcing it, mainly because these schools believe that students who leave all their essays to AI could become less sharp.

Also, companies and SEO agencies pay copywriters and content writers to write content for them. Sadly, some of these writers use AI to generate content that may not meet the specific needs of their clients, which makes it even more important to detect AI-generated content.

Currently, Google penalizes websites and blogs for having AI-generated content. John Mueller, Google’s Search Advocate, confirmed that Google considers all AI-generated content to be spam.

He explained that using machine learning tools to generate content is considered the same as translation tricks, word shuffling, synonym tricks and other similar tricks. He further confirmed that Google would invoke a manual penalty for AI-generated content.

This battle isn't going away

AI-generated content is the latest application of machine learning to everyday life. More and more AI content generators are springing up, with their creators hoping to carve out a share of a growing user base.

But Google will always be there to detect AI-generated content and its users. Google has always found a way to win against black hat SEO practices and other nefarious means that people use to circumvent its guidelines, and this won’t be different.

AI-generated content won’t go away, but it will be used ethically. John Mueller, Google’s Search Advocate, predicts that AI content generators will be used ethically for content planning and to eliminate grammar and spelling issues. This is different from using AI to churn out written pieces within minutes.

The issue of AI-generated content is quite new, but as it has always done, Google will keep innovating and finding more accurate ways to identify AI-generated content.
