Dealing with duplicate content

Of all of the possible SEO problems to tackle, columnist Ryan Shelley says duplicate content is one of the easiest. Learn why, and get the scoop on what you need to do.

The words “duplicate content” strike fear into the hearts of many webmasters and SEOs. But the truth is, not all duplicate content is created equal.

Since content is a core element of good SEO, many have tried to manipulate the result by using the old “copy and paste” approach. Google punishes this method, so it should strike fear into your heart.

But if you have unintentionally created some duplicate content on your site, don’t freak out. Below, we will look at how Google treats duplicate material, and I’ll share a few tips you can use to ensure that your site’s content is fresh and unique.

To gain a better understanding of how Google treats duplicate content, it helps to read Google's own overview of the topic. If you are afraid of getting penalized, let me help you breathe a bit easier with this quote from that overview.

“Duplicate content on a site is not grounds for action on that site unless it appears that the intent of the duplicate content is to be deceptive and manipulate search engine results.” — Google

OK, so now you know Google is not out to get you, but if you do have duplicate content, you should take some time to clean it up. Duplicate content typically falls into one of three categories: exact duplicates, near duplicates and cross-domain duplicates.

  • Exact duplicates: Two URLs have identical content.
  • Near duplicates: Two pieces of content differ only in small ways.
  • Cross-domain duplicates: Exact or near duplicate content exists on multiple domains.
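
To make the three categories concrete, here is a minimal Python sketch of how two pages might be compared. It is an illustration only, not how Google classifies content: the function names, the whitespace normalization and the 0.9 similarity cutoff are all assumptions made for this example, and comparing hostnames is a simplification of what counts as a separate domain.

```python
import hashlib
from difflib import SequenceMatcher
from urllib.parse import urlsplit

# Assumed cutoff for "near duplicate"; tune it against your own content.
NEAR_DUPLICATE_THRESHOLD = 0.9


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def classify_pair(text_a: str, text_b: str) -> str:
    """Return 'exact', 'near' or 'distinct' for two pieces of page text."""
    a, b = normalize(text_a), normalize(text_b)
    digest_a = hashlib.sha256(a.encode("utf-8")).hexdigest()
    digest_b = hashlib.sha256(b.encode("utf-8")).hexdigest()
    if digest_a == digest_b:
        return "exact"
    # ratio() is 1.0 for identical strings and approaches 0.0 as they diverge.
    similarity = SequenceMatcher(None, a, b).ratio()
    return "near" if similarity >= NEAR_DUPLICATE_THRESHOLD else "distinct"


def is_cross_domain(url_a: str, url_b: str) -> bool:
    """Simplification: treats any two different hostnames as different domains."""
    return urlsplit(url_a).hostname != urlsplit(url_b).hostname
```

In a real audit, the text would come from a crawl of your own pages, and pairs flagged as "exact" or "near" would be the ones to review first.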

Duplicate content can result from a variety of factors. In some cases, websites license content for use in other places; flaws in site architecture can play a role, too. Plagiarism results in duplicate content, and the most common cause, in my opinion, is CMS issues.

While all of these can create problems, we must deal with each in a specific way. Before we get into the tips, let’s address the consequences of duplicate content.

Duplicate content consequences

If you posted a piece of duplicate content due to an oversight, the search engines in most cases would simply filter it out and display what they believe to be the best version in the SERPs.

Sometimes, they will filter it out before indexing the piece at all. Users want diversity in their search results, so the crawlers and engines do their best to deliver that. Below are a few of the common consequences associated with duplicate content.

  • Wasted crawls: A search bot comes to your site with a crawl budget. If you have a lot of duplicate content, it wastes the bot’s crawl budget, and fewer of your unique, good pages will be crawled and indexed.
  • Wasted link equity: Duplicate pages can gain PageRank and link authority, but it won’t help, because Google won’t rank the duplicated content. This means you waste your link authority from those pages.
  • Wrong listing in SERPs: No one knows exactly how the search algorithms work. So if you have multiple pages with exact- or near-duplicate information, you don’t get to decide which pages get filtered and which pages rank. This means the version you want to rank may get suppressed.

How to avoid duplicate content

Having duplicate content on your site is not useful for the search engines or your end users. That said, you can prevent negative impacts by taking care of the issue.

Below are a few ways you can fix any duplicate content issues you come across.

  • Using 301 redirects: This is a useful approach if you plan to remove any duplicate content pieces from your site. Since some of those pages may have received links, it’s a good idea to have those pages redirected permanently to the correct URL. This tells users and, more importantly in this case, the search bots where to find the proper content.
  • Blocking with robots.txt: Another option often recommended is using your robots.txt file to block duplicate pages from being crawled. However, Google doesn’t recommend this approach, stating, “If search engines can’t crawl pages with duplicate content, they can’t automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages.”
  • Using rel="canonical": If you plan to leave your duplicate content up, using the rel="canonical" link element is a great option. This step tells the search bots which version of the content is the “true” version. Add this tag to the <head> of your duplicate page, like so:
    <link rel="canonical" href="https://mytruecontent.com">

    This code tells the search bots where to find the true piece of content.
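
If you want to confirm that a duplicate page actually declares the canonical tag described above, a small script can help. The following is a minimal Python sketch using only the standard library's html.parser module. The class and function names are hypothetical, it reports only the first canonical link it finds, and it assumes you already have the page's HTML as a string.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> elements matter, and only until the first match.
        if tag != "link" or self.canonical is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens, so check each one.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr_map.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


page = (
    '<html><head>'
    '<link rel="canonical" href="https://mytruecontent.com">'
    '</head><body>Duplicate copy</body></html>'
)
print(find_canonical(page))
```

Running a check like this across the duplicate URLs found in a crawl is one way to see which pages still lack a canonical declaration.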

While duplicate content is an issue and can harm you in the SERPs, it’s not as scary as many make it out to be. Unless you are maliciously trying to manipulate the SERPs, Google and the other search engines won’t typically penalize you. But, as stated above, there are still negative consequences for having duplicate content on your site. I recommend crawling your site, then doing your best to clean up and resolve all issues. The crawlers and your users will thank you!