Duplicate Content SEO: Facts, Myths, And Best Practices

There’s a lot of confusion regarding duplicate content in the SEO community. Are duplicate pages penalized? What counts as duplicate content? Will you lose your ranking if you decide to copy a section of text from a resource page?

How does it affect your site’s SEO performance?

With the many changes that search engines like Google make to their algorithms, it’s getting harder to keep track of what you’re allowed to do. This has given rise to various myths around duplicate content.

It’s time to set the record straight.

In this post, we’ll go over content duplication and how it affects search engine optimization. Not only will you learn what’s allowed, but you’ll also see some of our tips on avoiding duplicate content issues.

Let’s begin.

What Does Duplicate Content Mean?

Our discussion begins with answering the most obvious question:

What is considered duplicate content?

Fortunately, we won’t have to guess what Google means by duplicate content, for they have defined the term themselves in one of their Search Console support pages.

duplicate content - what does it mean - search console

Google defines duplicate content as “substantive blocks of content within or across domains that either completely [match] other content or are appreciably similar.”

However:

There is a caveat. Duplicate content should not be deceptive in nature. And for most sites, it isn’t.

Online stores often end up with several near-identical landing pages for the same product. This happens because some products have multiple variants (or SKUs). A shirt would not only have different sizes, but it could also come in different colors.

Let’s take Los Angeles Apparel as an example. They sell crew neck shirts of different colors and sizes.

duplicate content - what does it mean - los angeles apparel

As soon as you click on any of the variants, the URL changes to reflect the changes made by the user.

This is the original URL:

duplicate content - what does it mean - los angeles apparel orig url

When you change the size, it changes to:

duplicate content - what does it mean - los angeles apparel new url

Notice the addition of the extra set of characters at the end of the new URL. SEOs refer to this as faceted or filtered navigation. It’s commonly used by e-commerce sites to sort their products.

The problem with this system is that each URL is a standalone page in Google’s eyes. That means the URLs are treated as separate pages even though their content is identical.
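To make the idea concrete, here is a minimal Python sketch of how variant URLs collapse into a single page once the extra characters are ignored. The URLs and the `variant` and `color` parameter names are made up for illustration, and the sketch assumes the variants differ only in their query string, which will not hold for every store.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def strip_parameters(url):
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def group_variants(urls):
    """Group URLs that point at the same page once parameters are ignored."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_parameters(url)].append(url)
    return dict(groups)


# Hypothetical product URLs; the parameter names are assumptions.
urls = [
    "https://shop.example.com/products/crew-neck-shirt",
    "https://shop.example.com/products/crew-neck-shirt?variant=101",
    "https://shop.example.com/products/crew-neck-shirt?variant=102&color=blue",
    "https://shop.example.com/products/hoodie",
]

groups = group_variants(urls)
for clean_url, variants in groups.items():
    print(clean_url, "<-", len(variants), "URL(s)")
```

Any clean URL that maps to more than one variant is a candidate for the fixes covered later in this post.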

Obviously, Los Angeles Apparel has no intention of cheating the system by creating multiple pages with the same content. It’s a byproduct of the e-commerce structure that the company built for itself. Google understands this. So while it does have duplicate content, the company isn’t penalized for it.

Duplicate vs Copied Content

While there are SEOs and business owners who create duplicate content unintentionally, you can’t say the same for everyone.

There are those who try to cheat the system so that their sites gain more visibility in the search engines. They blatantly copy content from other sources and promote it as if it were their own original work.

Google refers to these instances as copied content.

You might ask yourself: Don’t duplicate content and copied content basically mean the same thing?

And under normal circumstances, you’d be right. But Google made a distinction between the two terms. So what’s the difference? Intent.

If there’s no intent to manipulate search rankings, what you have is a case of duplicate content. Otherwise, what you have is copied content.

And when you copy content from other sources, a Google penalty isn’t far behind.

Here’s a video where Google’s own John Mueller explains how they penalize webmasters and authors who purposefully copy content from other sources.

It’s a lengthy video, but it does show you how Google treats duplicate content. It’s worth watching.

Examples of Copied Content

Let’s take this blog post from Neil Patel. In this post, Neil talks about the importance of heading tags and how they can affect your ranking. Pretty compelling stuff.

duplicate content - what does it mean - neil patel

Because Neil is a prominent figure in the SEO community, it shouldn’t surprise you that his content finds its way to other sites. He’s often quoted in SEO blogs and used as a resource when SEO bloggers want to get a point across.

That also means some webmasters will try to take advantage.

We found another site that copied Neil’s work with no attribution to the source. Simply put, the author tried to pass off the content as an original.

Don’t believe us? Just take a look.

duplicate content - what does it mean - neil patel copy

The offending party didn’t even bother changing the featured image.

These are the kinds of posts that Google would normally penalize. There is a blatant effort to copy the original source to rank in the SERPs.

How to Check Duplicate Content on a Website

Since most duplicate pages happen by accident, you really can’t confidently say that your site doesn’t have any. That’s why it’s good to check your site for duplicate content.

But how do you go about doing that?

We have you covered.

In this section, we’ll go over finding duplicate content within your domain. And you’d be surprised by how easy it is. Best of all, you can do it for free.

Find Duplicate Content Using Screaming Frog

Screaming Frog SEO Spider Tool is a program used to check sites for possible SEO-related issues. It is free to use, but the free version caps how many pages it can scan. To do more, you’d have to pay for a license.

But on smaller sites (fewer than 500 pages), we’d argue that the free version is more than enough to get you started.

duplicate content - check duplicate content on website - screamingfrog

Note: You can use the Screaming Frog SEO Spider Tool on Windows, Mac, and Ubuntu.

Once you have the program installed, you’re greeted by the main dashboard. This is where the bulk of the work will take place. To start, enter your domain in the space just below the main menu.

detecting duplicate content using screaming frog

Click Start to begin the process. Screaming Frog will use this time to crawl through your whole site. Not only will it show you a list of all your pages, but it will also tell you other relevant information.

It can show you the meta tags used, the headings, page status, images, response codes, the links inserted, and much more.

duplicate content - check duplicate content on website - screamingfrog results

And while it won’t tell you explicitly which pages have duplicates, the data that the tool provides would give you an idea if you have duplicate pages in your domain.

Take your titles as an example. Having two pages with the same title is unusual. Therefore, you can assume that pages that share the same title are probably duplicate copies.

And if you investigate further and find that even the meta description and subheadings share the same text, you can conclude that the pages are duplicates.

To check the titles of your posts, hit the Page Titles tab.

duplicate content - check duplicate content on website - page titles

Click Title 1 to show the titles in alphabetical order. This will make it easier to find pages using the same post/page title.

duplicate content - check duplicate content on website - title 1

You can also look for duplicate pages using meta tags and subheadings by clicking the appropriate tabs and arranging the results in alphabetical order.

To make rooting out duplicate pages much easier, we suggest that you export the list as a CSV or Excel file and work on it in a spreadsheet.

duplicate content - check duplicate content on website - screamingfrog export

The Export button is located just below the main menu.
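If you would rather not sort the spreadsheet by hand, a short script can group the exported rows by title. The Python sketch below assumes the CSV has an `Address` column for the URL and a `Title 1` column for the page title. Check the header row of your own export, since the column names may differ. The sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def duplicate_titles(csv_file, url_col="Address", title_col="Title 1"):
    """Return {title: [urls]} for every title used by more than one URL."""
    by_title = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Normalize case and whitespace so trivial differences do not hide a match.
        title = (row.get(title_col) or "").strip().lower()
        if title:
            by_title[title].append(row[url_col])
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


# Stand-in for an exported file; column names are assumptions.
sample = io.StringIO(
    "Address,Title 1\n"
    "https://example.com/shirt,Crew Neck Shirt\n"
    "https://example.com/shirt?size=m,Crew Neck Shirt\n"
    "https://example.com/about,About Us\n"
)

dupes = duplicate_titles(sample)
for title, urls in dupes.items():
    print(title, urls)
```

To run it on a real export, pass an open file instead of the sample, for example `open("export.csv", newline="", encoding="utf-8")`, where `export.csv` is a placeholder for your own file name.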

Avoid Duplicate Content Using Plagiarism Checker Tools

Of course, the ideal solution to not having duplicate content is prevention. Before you publish a post, it’s better to run the whole post through a plagiarism checker tool so you don’t run into issues down the line.

We know what you’re thinking:

You didn’t copy anyone’s work. So why should you use a plagiarism checker? Wouldn’t that be a waste of your time?

It’s not.

Why? Because it’s easy to copy someone’s thoughts and words, even by accident. When you write about a topic that others covered before, the content tends to borrow words and phrases from research materials and other sources.

In simpler terms: You might end up writing phrases and sentences that others have used in their work.

Fortunately, there are many plagiarism checker tools on the internet. You can use a tool like Quetext or Copyscape to go over your post prior to publishing. And both tools yield fantastic results.

And what makes them even better is that both of these are free to use.

Note: Both tools are free, but like Screaming Frog, these won’t give you full access. If you want to scan unlimited pages, you’d have to buy the full version.

Let’s check out Quetext.

This online tool is fairly simple to use. Just copy your text into the allocated space for it. Once done, hit the Check Plagiarism button and wait for the results.

duplicate content - check duplicate content on website - quetext results

In the example above, we used text from a blog post on Ahrefs. After a quick scan, Quetext recognized the source of the post and reported it as such.

The underlined sections show which words, phrases, and sentences were lifted from the source. Quetext even tells you how much of your content was stolen. In our case, Quetext sees that we copied 100% of Ahrefs’ post.

Copyscape works the same way. But in their case, you enter the URL of the page you want to check instead of the text itself.

duplicate content - check duplicate content on website - copyscape

This makes Copyscape a great solution to inspect pages that you have already published.
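For a rough feel of how overlap between two texts can be measured, here is a small Python sketch that compares three-word sequences (often called shingles) and reports the share they have in common. This is a generic technique chosen for illustration. We don’t know how Quetext or Copyscape score matches internally, and the shingle size of three is an arbitrary assumption.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a, b, size=3):
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "the quick brown fox jumps over the lazy dog"
reworded = "the quick brown fox sleeps under the lazy dog"

print(similarity(original, original))  # identical texts score 1.0
print(similarity(original, reworded))  # partial overlap scores in between
```

A score close to 1.0 suggests the two texts are nearly the same, while a score near 0.0 means they share few exact phrases. It says nothing about intent, which is the part Google cares about.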

Is Duplicate Content Bad for SEO?

Let’s go back to the question at hand:

Is having duplicate content bad for your site as far as SEO goes? Can we take Google at their word that they won’t punish you for having duplicate content?

Giving an either-or answer won’t satisfy you because—if we’re being honest—it can go both ways.

Sure, Google does not issue manual penalties on domains simply for having duplicate content. Instead, their basis for penalties falls on the publisher’s intent.

So long as you don’t copy others and pass off their work as your own, there’s no need to worry.

And if others copy your work and publish it on their site, Google will penalize them. Your site will be unaffected by this.

Again, the context behind the duplicate content is what matters here.

Google is pretty consistent in that regard. In fact, they addressed this very issue as far back as 2012.

Matt Cutts, who used to work at Google, posted a video explaining how the search engine sees duplicate content. As he explains:

[If] you’re a regular blogger and you just want to quote an excerpt [from] some blogger you like or some other blogger who has good insight. Just put that in a blockquote, include a link to the original source, and you’re in good shape.

In 2013, Matt posted another video. This time, he answered a question that specifically asked how Google handles duplicate content from an SEO perspective.

Here, Matt even goes on to say that 25% or 30% of web content is duplicate content. Matt further explains:

So duplicate content does happen. People will quote a paragraph of a blog and then link it to the blog. That sort of thing. So it’s not the case that every single time there’s duplicate content, it’s spam. And if we made that assumption, the changes that happened, as a result, would end up probably hurting our search quality rather than helping our search quality.

So what Google actually does when it finds duplicate content is group everything together and treat it as one big piece of content. For example, if Google sees two pages that are nearly identical, it will select only one of them to show in the search results.

When Does Duplicate Content Become a Problem?

Now that we’re clear on Google’s stance on duplicate content, there should be no need to worry, right?

Not quite.

As we’ve mentioned, it’s not really that simple. Because while there is no penalty for duplicate content, this very issue can cause other problems that will affect other facets of your SEO campaign.

How?

  • Duplicate Content Confuses Google: We mentioned earlier that some duplicate pages share the same base URL and differ only in the extra characters added to account for product variations, tracking parameters, and other factors. When this happens, Google might rank the wrong page (the one with extra characters) instead of the right one (the one with the “clean” URL).
  • Duplicate Content Splits Backlinks: It’s important for SEOs to get backlinks to the right pages to get the most out of the link equity they could produce. But when you have two or more pages with the same content, the backlinks get spread across them and the link equity is split, which is a waste.
  • Duplicate Content Makes It Harder for Google to Crawl Your Site: Having more pages means Google will have to put in more effort to crawl your site. So getting rid of duplicate pages will only help your cause.

What Can Duplicate Content Do to Your Site?

As we mentioned earlier, a webpage can have duplicate or copied content. Duplicate content—according to Google—is acceptable and won’t result in penalties. However, copied content is not tolerated.

Let’s look at real-world examples of both cases in action.

Hook Agency published a post that accuses Neil Patel of having ghostwriters that “steal” content from other marketing experts. Blocks of text are copied then used in Neil’s blog posts.

duplicate content - is duplicate content bad for seo - alex birkett

Alex Birkett accused Neil Patel of lifting text from his post. It wasn’t long before other marketers like Kieran Flanagan sounded off and implied the same.

duplicate content - is duplicate content bad for seo - kieran flanagan

Was Neil Patel affected by any of this?

No.

In fact, he seems to be doing just fine. While his reputation might have taken a hit, he was able to walk away without being penalized by Google. That’s because borrowing passages of text from other sources is not, by itself, penalized by search engines.

If Neil (or his alleged ghostwriters) copied content from another source and added his thoughts or spin on the topic, he’s not doing anything bad in Google’s eyes.

But what about those who blatantly steal content?

Unfortunately, the victims of stolen content might suffer despite Google’s claims that it knows who the original source of the text is.

Pi Datametrics published a post that details how some of their clients fared against copied content. Despite having great and original content, their clients dropped in the SERPs as soon as other sites copied their posts.

How Much Duplicate Content Is Acceptable?

Google wants you to publish unique and fresh content. That’s why it’s part of Google’s ranking factors. However, we already pointed out earlier that 25% to 30% of the internet is duplicate content.

Raven Tools’ numbers seem to back up Google’s numbers, putting the internet’s duplicate content at 29%.

duplicate content - how much duplicate is acceptable - raven

So when you put that into perspective, publishing nothing but 100% unique content isn’t really a realistic goal.

When writing an original piece, there’s a chance that you’ll eventually circle back to the same ideas others have explored before you.

What About Article Spinning?

Many would think that article spinning—or the act of rewriting content (often through software) to make it unique—is the ideal solution.

But that would be wrong.

In a YouTube video, John Mueller explains that spun content goes against Google’s guidelines. So, yes, those pages would have penalties imposed on them. Therefore, it’s definitely something you’d want to avoid.

John also laid out scenarios that Google considers as spam content:

  • Completely auto-generated content where you just have a few seed words and go off to target those keywords
  • Scripts that take existing content on the web and swap out individual words
  • Cases where people take existing content and translate it into another language

And the Google team takes action against these violations, as John further explains:

The webspam team generally when they run across situations like this they do take action. And that action can be anywhere from the site is not showing up so well on search to the site is completely removed from search when we think there’s really no value at all in that site.

Value Over Uniqueness

So how much duplicate content is acceptable?

If we’re being honest, nobody knows for sure.

But we do know one thing: When it comes to content, you should zero in on value over uniqueness. What does that mean?

Even if your post is 100% original, if it fails to provide value to users then there’s a chance that your competitors would rank higher anyway. And yes, that includes those who quoted sections of your post.

Let’s look at another real-world example.

Mark Schaefer coined the term “content shock” a couple of years back. It refers to the influx of new content that marketers face in today’s climate. He correctly predicted that it would become harder to rely on content marketing due to the volume of posts published by businesses and other marketers.

In this post, Mark explains that Harvard Business Review rejected his posts. When he asked the editor why, Mark was simply told that they were getting more submissions — and that the quality of the newer submissions is great.

duplicate content - how much duplicate is acceptable - mark schafer

So the problem isn’t people copying content. The real issue for most content marketers is that writers are now submitting better entries.

If you really want to compete, your focus should be on improving the quality of the content first.

How Do I Stop SEO Duplicate Content?

It’s true: you shouldn’t dwell too much on people who duplicate your content. If they quote you here and there, the SEO ramifications won’t be too severe.

But if it ever comes to a point where a site is ripping off your pages in their entirety, you’ll be happy to know that there are steps you can take to stop the offending party dead in their tracks.

Below are a few tips.

Use Feed Delay

A feed delay holds your post back from your RSS feed for a set time after you publish. Why is this important? Because content scrapers often pull new posts straight from RSS feeds. If a search engine indexes the scraped copy before it indexes your page, there’s a chance that it will see your post as the duplicate. Delaying the feed gives Google time to index your original first.

Install Preventive Plugins

WordPress users will find many plugins that would help in their fight against content thieves.

Yoast SEO, for example, lets you add attribution links to your RSS feed. Should people scrape your content from the feed, a link back to your post will appear in the copy, showing readers who the original author is.

duplicate content - how to stop duplicate - wp content copy

WP Content Copy Protection is another nifty tool that disables the right-click function and the keyboard commands that let people copy your content.

Disable Copy Image Function

Speaking of disabling the ability to copy text, you can also do the same for images. That means anyone who’d like to use your image won’t be able to copy or save the image.

Note: Users may still be able to screenshot your image, but that would leave them with low-quality images.

Add a DMCA Badge

A DMCA badge is a good deterrent for potential offenders. DMCA.com is a private online service, named after the Digital Millennium Copyright Act, that helps protect content from being stolen. Full protection is a paid service, but a free version of the badge is available. DMCA.com gives paying members recourse when their content is stolen.

duplicate content - how to stop duplicate - dmca

DMCA even provides a service that lets you find duplicates of your content.

How Do I Fix Duplicate Content?

More specifically, how do you fix duplicate content on e-commerce sites?

Here are just some of the ways you could correct duplicate issues within your domain.

Set Up 301 Redirect Pages

When you have multiple variations of the same page, the easiest way to fix that issue would be to redirect the duplicate pages to the original one.
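How you set up a redirect depends on your server or platform, so treat the following as a sketch rather than a recipe. It is a minimal Python WSGI app that answers requests for duplicate paths with a 301 status and a `Location` header pointing to the original. The paths in `REDIRECTS` are hypothetical.

```python
# Hypothetical mapping from duplicate paths to the one page that should rank.
REDIRECTS = {
    "/crew-neck-shirt-blue": "/crew-neck-shirt",
    "/products/crew-neck-shirt": "/crew-neck-shirt",
}


def app(environ, start_response):
    """Minimal WSGI app: send duplicate paths to the original with a 301."""
    path = environ.get("PATH_INFO", "/")
    target = REDIRECTS.get(path)
    if target is not None:
        # 301 tells browsers and crawlers that the move is permanent.
        start_response("301 Moved Permanently", [("Location", target)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"original page"]
```

To try it locally, you could serve `app` with `wsgiref.simple_server.make_server` from the Python standard library. On a live site, the same mapping would more likely sit in your web server or CMS redirect settings.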

Use Rel="Canonical"

If redirecting a page is out of the question, you can try using the rel="canonical" attribute. It goes on a link tag in the <head> of the duplicate page’s HTML and points to the URL of the original. This tells search engines that the page they’re looking at is a copy and which page to treat as the original.
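As a small illustration, the Python helper below builds the tag for a given original URL. The function name and the example URL are our own, and most CMS and SEO plugins will generate this tag for you.

```python
from html import escape


def canonical_tag(original_url):
    """Build the link tag that goes inside <head> on the duplicate page."""
    # Escape the URL so characters like & and " cannot break the attribute.
    return f'<link rel="canonical" href="{escape(original_url, quote=True)}">'


# Hypothetical URL of the page that should rank.
print(canonical_tag("https://shop.example.com/products/crew-neck-shirt"))
```

Every variant of the page would carry the same tag, all pointing at the one URL you want to appear in search results.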

Use NoIndex

As with rel="canonical", you add the meta robots noindex tag to the HTML of the duplicate page. As the name suggests, it tells search engines not to index that page, which avoids duplicate issues.
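The tag itself is a single line in the page’s <head>. The Python sketch below shows it as a string and adds a small checker, built on the standard library’s `HTMLParser`, that reports whether a page’s HTML carries it. The class and function names are our own, and the checker only looks at meta tags, not HTTP headers.

```python
from html.parser import HTMLParser

# The tag to place inside <head> on the duplicate page.
NOINDEX_TAG = '<meta name="robots" content="noindex">'


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def is_noindexed(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


page = f"<html><head>{NOINDEX_TAG}</head><body>Duplicate page</body></html>"
print(is_noindexed(page))
```

Running a check like this over your duplicate URLs is one way to confirm the tag actually made it into the published pages.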

Explore Google Search Console

Inside Google Search Console, you’ll find an option to select the domain and parameter handling. This means you get to choose how Google crawls URL parameters. For example, you can set your URLs to load as http://www.abcd.com or as http://abcd.com instead.

Why is this important?

Because if the same page can be reached at two slightly different URLs, Google counts them as two different pages. That would mean that they automatically count as duplicate pages.

So you really want to set your URL parameters to avoid this problem.
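To see why the www and non-www versions count as different URLs, and what settling on one looks like, here is a short Python sketch that rewrites any URL to a single preferred host. The function name and the `use_www` flag are our own. In practice, you would enforce the choice with redirects on your server rather than in a script.

```python
from urllib.parse import urlsplit, urlunsplit


def preferred_url(url, use_www=False):
    """Rewrite a URL to the preferred host form (with or without www)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    bare = host[4:] if host.startswith("www.") else host
    host = f"www.{bare}" if use_www else bare
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


print(preferred_url("http://www.abcd.com"))            # non-www form
print(preferred_url("http://abcd.com", use_www=True))  # www form
```

Whichever form you pick, the point is that every internal link, redirect, and canonical tag should agree on it.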

Conclusion

Having duplicate content is something you should keep in mind but not panic over. So long as you’re sticking to Google’s guidelines, you shouldn’t run into any problems.

Just don’t steal from other people, especially if you’re not adding any value for your users. That’s when Google will spring into action and penalize you.

Craig Campbell

I am a Glasgow based SEO expert who has been doing SEO for 18 years.
