Non-Indexable | Canonical Pages
In this guide we look at non-indexable canonical pages. This check is performed exclusively on canonical pages (pages with a self-referential canonical tag).
What is a non-indexable page?
A non-indexable page is a web page that cannot be indexed by Google. This means that it will not (or is very unlikely to) appear in the organic search results.
A page can be non-indexable for several reasons. It may have a ‘noindex’ meta robots tag in its HTML source code, which instructs Google not to index it. It may also be disallowed in the robots.txt file, which instructs Google not to crawl it. Other reasons are largely related to accessibility issues, such as the page returning a 404 error.
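The two main signals above can be checked with the Python standard library. The following is a minimal sketch, not the crawler's actual implementation: the names `MetaRobotsParser`, `has_noindex` and `is_disallowed` are hypothetical, and the `Googlebot` user agent is an assumed default.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            # The content attribute is a comma-separated list, e.g. "noindex, follow".
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html: str) -> bool:
    """True if the HTML contains a meta robots tag with a noindex directive."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return "noindex" in parser.directives


def is_disallowed(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """True if the given robots.txt content disallows crawling of the URL."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return not rules.can_fetch(user_agent, url)
```

Both functions take the page HTML and the robots.txt content as strings, so they can be tried without making any network requests.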
What is a canonical page?
A canonical page is a page that contains a self-referential canonical tag within the HTML source code. For example, if a page is accessed from https://example.com and the canonical tag contains a link that matches this URL exactly, the page is considered canonical. This essentially means it is the preferred version of a page or piece of content when the same content can be accessed from multiple URLs.
If you have duplicated content across multiple pages, you should use canonical tags to advise Google as to which is the preferred (canonical) version. The canonical content / pages on a site are typically the ‘SEO’ pages, or pages that you would like to appear in the SERPs (Search Engine Results Pages).
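The self-referential check described above can be sketched as follows. This is an illustration only: `CanonicalParser`, `canonical_href` and `is_canonical_page` are hypothetical names, and the comparison is a plain exact string match with no URL normalisation (trailing slashes, letter case and relative links are not handled).

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        # rel may hold several space-separated values.
        if "canonical" in attr.get("rel", "").lower().split():
            self.href = attr.get("href", "").strip()


def canonical_href(html: str) -> Optional[str]:
    """Return the canonical link found in the HTML, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.href


def is_canonical_page(page_url: str, html: str) -> bool:
    """True if the canonical tag matches the URL the page was accessed from exactly."""
    return canonical_href(html) == page_url
```

For example, a page fetched from https://example.com/page whose canonical tag points to https://example.com/page is treated as canonical, while the same HTML served from https://example.com/page?ref=1 is not.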
Why is a non-indexable canonical page an error?
If a page is both canonical and non-indexable, it sends a conflicting message: the canonical tag says the page is the preferred version, while the noindex tag or robots.txt rule keeps it out of the index. There are very few legitimate reasons why a page would be configured this way. In almost all scenarios you will want your canonical pages to be indexed, and hence indexable.
We mark this check as a serious error, as it is very likely to affect the organic visibility of a page within the search results.
A page should be either:
- Non-indexable and non-canonical
- Indexable and canonical
What do we check and how?
When our SEO web crawler scrapes data from pages, we pull out components such as the noindex tag and the canonical tag, if present, and analyse these to assess whether the page is indexable and whether it is canonical. When we find pages that are both canonical and non-indexable, they are flagged as errors.
Our software determines a page to be non-indexable if either or both of the following statements are true:
- The page has a noindex meta robots tag present on it
- The page is disallowed in the robots.txt file
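The rule above can be expressed as a small decision function. This is a sketch under stated assumptions rather than the software's real logic: the `PageSignals` record and the function names are hypothetical, and the signals are assumed to have already been extracted by the crawler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSignals:
    """Signals extracted from one crawled page (hypothetical record)."""
    url: str
    canonical_href: Optional[str]  # None when the page has no canonical tag
    has_noindex: bool              # noindex meta robots tag present
    robots_disallowed: bool        # URL disallowed in robots.txt


def is_non_indexable(page: PageSignals) -> bool:
    # Either signal on its own is enough to make the page non-indexable.
    return page.has_noindex or page.robots_disallowed


def is_canonical(page: PageSignals) -> bool:
    # Canonical means the canonical tag points back at the page itself.
    return page.canonical_href == page.url


def is_non_indexable_canonical(page: PageSignals) -> bool:
    # The error condition: the page meets both criteria at once.
    return is_canonical(page) and is_non_indexable(page)


def flagged_urls(pages: list) -> list:
    """Return the URLs of all pages that would be flagged."""
    return [page.url for page in pages if is_non_indexable_canonical(page)]
```

A page with a noindex tag whose canonical tag points to a different URL is not flagged here, because it is non-indexable and non-canonical, which is one of the two consistent configurations.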
What should you do with this data?
Using our software, you can identify all the pages that are canonical and non-indexable. You can use this data for any of the following purposes:
- Resolve indexation issues
- Resolve incorrectly configured canonical issues
- Provide the list of problem pages to a client
- Count the SEO errors on a site