Internal Website Duplication
- Duplicate Content & Canonicalisation – Internal Website Duplication
- What is Internal Website Duplication?
- Impact of Issue
- How to Resolve
- Benefit of Resolving
Internal website duplication can be a major problem for some sites. Unlike most canonicalisation problems, the issue is not that a single page is accessible from multiple URLs; rather, the content itself (the words on a page) has been duplicated on other pages, meaning the same content exists in multiple locations and is therefore accessible from different URLs.
This article is one of several in the duplicate content and canonicalisation series in the Raptor Knowledge Base. Please see the list below for the other articles, covering all of the different types of duplicate content and canonicalisation issues that a website can experience:
- Cached URL
- Canonical Duplication
- HTTP / HTTPS Duplication
- Sub-Domain Duplication
- Lowercase / Uppercase Duplication
- Trailing Slash Duplication
- Session ID Duplication
- External Website Duplication
- Printer-Friendly Page Duplication
- Index Page Duplication
Unlike external website content duplication, this issue can easily be resolved by your own web admins. Internal website duplication refers to content that is duplicated within your own website. Unlike some of the other issues discussed in this series on duplicate content and canonicalisation, it is not a case of the same content being served from different URLs as the result of a technical issue.
Rather, it is the result of using the same (or very similar) content on different web pages. Nor does this article refer to small pieces of content duplicated across multiple pages. For example, the menu or footer of a website will usually be present on every webpage; this is not a problem from a duplicate content perspective.
Duplicating a tag line or a snippet of text will not cause a duplicate content problem; a page should be evaluated by looking at the ratio of unique content to duplicate content. There is no hard and fast rule for this, but a good rule of thumb is to keep the majority of the content on a page unique.
There are instances where it is a requirement to have chunks of identical text, such as disclaimers or legal information, present on multiple pages; Google is fairly adept at identifying this type of content. Another factor is the total volume of content on a page: a page with very little content, a high percentage of which is not unique, will typically have less value than a page with more content or a better ratio.
The impact of having content duplicated within your website is that it may be devalued by Google, which will affect your ability to rank for the target keyword(s). Google will typically only serve unique content within the SERPs, so if you have the same content on multiple pages, Google will generally serve only one of them.
One of the more unique components of our SEO Tools is how our machine learning algorithms assess and categorise internal content duplication. We assess how much content on each page is considered to be unique; we identify legal disclaimers and repeated sales messages throughout a site and create a score of how unique your content is.
For example, if you have a legal disclaimer, a top and side menu, and a footer on each page, Raptor will evaluate how much of the remaining content is unique; this score is represented as a percentage.
Problems can arise where a content-thin page with, say, 200 words of unique content also carries 150 words of duplicated content. This produces a low percentage: in this example the page would contain only 57% unique content (200 of 350 words).
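The arithmetic behind this example can be sketched as follows. This is a minimal illustration of the percentage calculation only, assuming the unique and duplicated word counts are already known; it is not Raptor's actual scoring algorithm.

```python
def unique_content_percentage(unique_words: int, duplicated_words: int) -> float:
    """Return the percentage of on-page words that are unique.

    Hypothetical helper for illustration; real tools would first need to
    classify each block of text as unique or duplicated.
    """
    total = unique_words + duplicated_words
    if total == 0:
        return 0.0
    return round(100 * unique_words / total, 1)

# The example from the text: 200 unique words alongside 150 duplicated words
print(unique_content_percentage(200, 150))  # → 57.1
```

A page with the same 150 duplicated words but 600 unique words would score 80%, which illustrates why total content volume matters as much as the duplicate itself.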
As such, we provide you with a breakdown of your content and how unique each page is, which can help to identify problem areas. Although we typically describe duplicate content as "Page A is a duplicate of Page B", Google can devalue content even if it is not an exact duplicate but is broadly very similar to another page.
There are really only three ways to resolve this issue; all of the options are preferable to leaving duplicate content on the site.
The first solution is to remove the duplicate and/or replace it with unique content. This is the ideal solution, though it requires the most work: it is ideal because it increases the value of the content and thus the potential to rank for target keywords.
Use our content analysis tool to assess how unique your internal website content is relative to other pages on your site. It can provide a range of recommendations designed to help improve your on-page content strategy.
This topic is covered in greater detail in a separate article on canonical tags, but for the purposes of this article: adding a canonical tag to every page of the site (following the guide above) will prevent most duplicate content issues on any site.
This solution will resolve the issue, but it will not allow the duplicate page to rank for the content posted on it. It will, however, pass on any authority that the page may have to the canonical page.
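As a sketch, a canonical tag on a duplicate page points Google at the preferred version of the content. The URL below is a placeholder for illustration:

```html
<!-- Placed in the <head> of the duplicate page; href is a placeholder URL -->
<link rel="canonical" href="https://www.example.com/original-page/" />
```

On the preferred page itself, the tag would reference the page's own URL (a self-referencing canonical).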
If dedicated paid search landing pages are being used on a site, ensure that they are not indexed.
There are two on-site ways to remove content from Google’s index:
- NOINDEX meta Tag
- Disallow from Robots.txt
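As illustrative sketches, the two options look like this. The path in the robots.txt example is a placeholder; a disallow rule blocks crawling of everything under that path, whereas the NOINDEX tag allows crawling but asks search engines not to index the page:

```html
<!-- Placed in the <head> of the page to be excluded from the index -->
<meta name="robots" content="noindex" />
```

```text
# robots.txt at the site root; /landing-pages/ is a placeholder path
User-agent: *
Disallow: /landing-pages/
```

Note that for the NOINDEX tag to be seen, the page must remain crawlable, so the two methods should not normally be combined on the same URL.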
Both of these options are discussed in greater depth in separate articles on Canonical Tags, Rel Tags, Meta Tags & HTML Code, and Robots. For more information on how to implement these solutions, please refer to the articles linked above.
Controlling the URLs from which your site and pages can be accessed prevents the potential impact of duplicate content issues.