Non-Indexable URLs - Raptor SEO Data
Non-indexable pages are pages that Google cannot index. In terms of the checks we perform, these are pages that have a noindex tag present, are disallowed in the robots.txt file, or both.
Use this data to make sure that the content you want to be indexed can be indexed. Check out the video below for a short summary of what crawl data means.
Why Are URLs Not Indexable
There are several reasons why a page may not be indexable, and we cover each of them in this section. Pages that were once indexed but have since had something implemented to prevent indexing may take some time to be deindexed. Google needs to recrawl the page, or fetch the relevant file, and determine that it has changed before removing the page from its index.
NOINDEX TAG (Meta Robots Tag)
A meta robots tag is a page-level tag that is added to the head section of the source code (<head>…</head>). It controls aspects of indexation, among other things. Adding a noindex meta robots tag to a page will prevent Google from indexing that page. Google does, however, have to crawl the page to identify the tag.
The noindex tag looks like this:
<meta name="robots" content="noindex">
To prevent only Google's web crawlers from indexing a page, use the following meta robots tag:
<meta name="googlebot" content="noindex">
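A quick way to audit a page for these tags is to parse its HTML and look for a noindex directive. The following is a minimal sketch using only the Python standard library. The names MetaRobotsParser and has_noindex are hypothetical helpers made up for this illustration, and the sketch only looks at meta tags, not at anything else that might affect indexing.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives found in <meta name=... content=...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if not name:
            return
        # A content value such as "noindex, nofollow" holds several directives.
        values = {part.strip().lower() for part in attr.get("content", "").split(",")}
        self.directives.setdefault(name, set()).update(values - {""})


def has_noindex(html, crawler="googlebot"):
    """True if the generic robots tag or the crawler-specific tag says noindex."""
    parser = MetaRobotsParser()
    parser.feed(html)
    generic = parser.directives.get("robots", set())
    specific = parser.directives.get(crawler.lower(), set())
    return "noindex" in generic or "noindex" in specific
```

For example, has_noindex('<meta name="googlebot" content="noindex">') reports the page as noindexed for Google, while passing crawler="bingbot" for the same HTML does not, because the tag only addresses Google's crawlers.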
Disallow in Robots.txt
The robots.txt file primarily controls which content should be crawled by search engines and other robots. Not all web crawlers obey it, but Google typically does. You can see an example of this in the robots.txt file on our own site.
The robots.txt file is just a plain text file. The format for a disallow rule looks like this:
User-agent: [user-agent name]
Disallow: [URL you don’t want crawled]
To prevent all search engines and crawlers from crawling a section of your site, set the user-agent to * (which matches every crawler) and disallow the path of that section.
There are also other ways to manage this, such as using wildcards to prevent any URL with a given file extension from being crawled.
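A disallow rule can be tested offline before you rely on it. The sketch below uses urllib.robotparser from the Python standard library. The rule set in ROBOTS_TXT, the /private/ path, the example.com URLs and the is_crawlable helper are all assumptions made up for this illustration, not rules taken from our own file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule set: block every crawler from the /private/ directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_crawlable(ROBOTS_TXT, "https://example.com/private/report.html"))
print(is_crawlable(ROBOTS_TXT, "https://example.com/blog/"))
```

As far as we are aware, this standard-library parser matches paths by simple prefix and does not interpret Google-style wildcards, so treat it as a check for plain path rules only.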
Canonical Tag (Non-Canonical Page)
Canonical tags identify the canonical, or preferred, version of a piece of content. They are most often used when the same content is accessible from multiple URLs and you want to stipulate which URL is the right one. You can also use a canonical tag to reference another page if that page's content has been largely or completely copied.
Non-canonical pages, which are pages with a canonical tag that references another URL, are unlikely to be indexed by Google, but this is not always the case. A canonical tag looks like this:
<link rel="canonical" href="https://raptor-dmt.com/support/non-indexable-pages/">
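To spot non-canonical pages in bulk, you can compare each page's own URL with the URL in its canonical tag. This is a minimal sketch using the Python standard library. CanonicalParser and is_non_canonical are hypothetical names for this illustration, and the comparison is a strict string match, so it assumes you have already normalised things like trailing slashes and tracking parameters in whatever way suits your site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = {key: (value or "") for key, value in attrs}
        rels = attr.get("rel", "").lower().split()
        href = attr.get("href", "").strip()
        if "canonical" in rels and href:
            self.canonical = href


def is_non_canonical(html, page_url):
    """True if the page declares a canonical URL other than its own URL."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical is None:
        return False  # no canonical tag, so nothing points elsewhere
    # Resolve relative hrefs against the page URL before comparing.
    return urljoin(page_url, parser.canonical) != page_url
```

A page whose canonical tag points at itself returns False, while the same content reached through a different URL (for instance one with an extra query string) returns True.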
Redirected URLs are also non-indexable. A redirect can be permanent or temporary, but in either case the redirecting URL itself cannot be accessed and as such will not be indexed by Google. Google will typically index the page being redirected to, if that page is indexable.
Pages can also be inaccessible for reasons other than redirects, for example because they are broken. A page that returns a 404 status or a 5XX response code will not be indexed by Google because it cannot be accessed.
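When reviewing crawl data, it can help to group URLs by response code along the lines described above. The following is a rough sketch only. The function name classify_status and the category labels are assumptions made for this illustration and are not the labels used in our reports.

```python
# Status codes that redirect to another URL, permanently or temporarily.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def classify_status(status_code):
    """Map an HTTP status code to a rough accessibility category."""
    if status_code in REDIRECT_CODES:
        return "redirect"      # the target may be indexed, this URL will not
    if 200 <= status_code < 300:
        return "accessible"    # indexable, subject to the other checks
    if 400 <= status_code < 500:
        return "broken"        # e.g. 404 Not Found
    if 500 <= status_code < 600:
        return "server error"  # 5XX responses
    return "other"


for code in (200, 301, 404, 503):
    print(code, classify_status(code))
```

Only URLs in the "accessible" group are candidates for indexing, and they still need to pass the noindex, robots.txt and canonical checks.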
Flash content, although rare nowadays, will not be indexed by Google because Google cannot read or understand it.
This is something which is becoming more indexable, but it can still cause the content to be non-indexable.
Content Behind a Login
If you have content that you sell, or that you only want specific people to access, you will typically put it behind a paywall or a login. Unless the crawler has the login credentials, which Google will not, this content will be non-indexable. A good example of this is the set of SEO tools that we provide: you need to sign up, verify an email address and create a password, which you then use to access the content. This means that none of our tools are indexed by Google.
Why Use Non-Indexable Pages?
There are many reasons why a page might be non-indexable. The first is accidental: misapplying a tag, or setting the contents of a tag incorrectly, can result in content not being indexed.
Alternatively, you may have content that is used for paid landing pages for AdWords or native advertising, or you may have a lot of duplicate content on your site or content accessible from multiple URLs. In these cases you may want to make the content non-indexable to avoid issues with Google, such as duplicate content.
This guide is part of an extensive series of guides covering the data that we show in the summary tab of our SEO reporting feature. The following list shows all of the categories of data guides, videos and tutorials that we have. If you have any feedback on this or anything else, please feel free to get in touch:
- Canonical Content
- Content Data
- Linking Data
- Page Speed Data
- Meta Data
- Google Analytics Data