- Duplicate Content & Canonicalisation – Sub-Domain Duplication
- What is Sub-Domain Duplication?
- Impact of Issue
- How to Resolve
- Benefit of Resolving
This is a common issue leading up to website migrations or with multi-regional and mobile sites that use sub-domains to target different regions.
There are, as described above, completely legitimate reasons to duplicate your site onto a sub-domain; however, problems can occur if the relationship is not handled correctly with ‘rel’ tags. Content that is duplicated on a sub-domain and not handled correctly can lead to a host of unwanted ranking problems.
This article is one of several in the duplicate content and canonicalisation series in the Raptor Knowledge Base. See the list below for the other articles, which cover all of the different types of duplicate content and canonicalisation issues that a website can experience:
- Cached URL
- Canonical Duplication
- HTTP / HTTPS Duplication
- Sub-Domain Duplication
- Lowercase / Uppercase Duplication
- Trailing Slash Duplication
- Session ID Duplication
- External Website Duplication
- Internal Website Duplication
- Printer-Friendly Page Duplication
- Index Page Duplication
Sub-domain duplication refers to the duplication of a website, or pages of its content, on a sub-domain. Often this is the result of a development site being held temporarily on a sub-domain. To check whether a sub-domain has been indexed, you can use search operators within Google, substituting the XXX for your domain name.
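As a sketch of how this check might look, a common approach combines Google’s `site:` operator with a `-site:` exclusion. The first query below lists indexed pages on any sub-domain other than www; the second checks one specific suspected sub-domain (dev. is a hypothetical example):

```
site:XXX.com -site:www.XXX.com
site:dev.XXX.com
```

If either query returns results for a sub-domain you did not intend to be public, you have a sub-domain duplication issue to resolve.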
Google can see the content presented on multiple URLs as duplicate content and will decide for itself which URL to present to users in the SERPs, essentially devaluing the content on one URL and promoting the other.
Where a whole site, or a large portion of a site’s content, is accessible through a sub-domain being used as a development / staging server, this can be a critical problem. Without taking the appropriate steps to resolve the issue, unfinished pages on a staging server could be listed in the SERPs.
Below we describe the three most common causes of this problem, as each impacts your site differently:
Having a sub-domain that targets a different region but that isn’t set up correctly can result in the sub-domain not appearing in the SERPs for searches in the target region.
In some cases, this could lead to the wrong domain / sub-domain being shown in each region.
Having a sub-domain that targets mobile devices but that isn’t set up correctly can result in the sub-domain not appearing in the SERPs for searches on those devices.
In some cases, this could lead to the wrong domain / sub-domain being shown on each device. The obvious impact is a poor user experience, as each of the two sites is designed to perform on a specific type of device.
Having a sub-domain that is being used as a staging server but that isn’t setup correctly could lead to Google showing the staging server in the SERPs rather than the actual domain.
This could cause massive problems if the staging site is not fully functional.
Depending on the nature of the duplicate content issue, there are differing solutions. Using the above example of a new site being developed on a sub domain, use the following steps to resolve the issue:
- Using the robots.txt file on the sub-domain, disallow the sub-domain from being crawled (robots.txt applies per host, so the file must sit on the sub-domain itself, with your domain name in place).
- Add NOFOLLOW and NOINDEX tags to all of the sub-domain’s pages using the code examples below (which should be added to the header section of the source code):
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
- Setup canonical tags on the sub-domain pages that point to the main website pages.
- Remove links to the sub-domain where and if they exist on the main website (top level domain).
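The robots.txt disallow described in step one could look like the following minimal sketch, served from the root of the sub-domain itself (dev.XXX.com is a hypothetical staging host):

```
# http://dev.XXX.com/robots.txt
# Block all compliant crawlers from the entire sub-domain
User-agent: *
Disallow: /
```

One caveat: blocking crawling in robots.txt also prevents search engines from fetching the pages and seeing the NOINDEX tags from step two, so in practice many teams use one method or the other, or protect staging servers with HTTP authentication instead.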
This topic is covered in greater detail in a separate article specific to canonical tags, but for the purposes of this article, adding a canonical tag to every page of the site (following the guide above) will prevent most duplicate content issues on any site.
This solution will resolve the issue but will not allow the duplicate page to rank for the content posted on it. It will, however, pass on any authority that the page may have to the canonical page.
Mobile sites are less common now than responsive or adaptive websites, which are far more efficient. However, there are still a lot of mobile sites around, and they often sit on an m. sub-domain (e.g. m.domain.com) or something similar. Fundamentally, this equates to duplicating the majority of your content onto a sub-domain.
To handle this properly, Google has set out how the relationship between the two sites should be structured using the <link> tag. This means using ‘rel=alternate’ in addition to ‘rel=canonical’ tags to define the relationship in line with Google’s requirements.
On the desktop site, you need to add a ‘rel=alternate’ tag that links to the corresponding page on the mobile site, together with a media attribute describing when the mobile version applies. On the page http://www.example.com.au/page-1.html the following code should be used:
<link rel="alternate" media="only screen and (max-width: 640px)" href="http://www.m.example.com.au/page-1.html" />
This allows Google’s crawler (Googlebot) to more easily identify the mobile URL version of this page’s content. On the mobile page:
http://www.m.example.com.au/page-1.html the following code should be implemented:
<link rel="canonical" href="http://www.example.com.au/page-1.html" />
It is worth noting that you can still add a self-referencing rel canonical tag to the desktop version of each page, but avoid creating canonical chains, where a page’s canonical points at a URL whose own canonical points somewhere else again.
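For example, a self-referencing canonical on the desktop page points at the page’s own URL rather than at a third URL, which is what prevents a chain from forming (URLs reused from the example above):

```
<!-- On http://www.example.com.au/page-1.html -->
<link rel="canonical" href="http://www.example.com.au/page-1.html" />
```

The mobile page’s canonical then resolves in a single hop: mobile page to desktop page, and the desktop page confirms itself.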
Use the canonicalisation analyser in our SEO Tools to identify sub-domain duplication problems and any other opportunities for improvement. It can provide a range of recommendations designed to help improve your on-page SEO. Raptor’s SEO Tools check for, analyse, and make recommendations for every item in this knowledge base.
Resolving this issue removes the opportunity for Google or other search engines to devalue your content, whilst giving you control over which content appears in the SERPs and on which URL.