This guide describes what canonical content data is and how we have summarised that data within the summary tab. Canonical content is the content found on canonical pages. A page is considered canonical when it has a self-referential canonical tag present in the HTML. Canonical pages are preferred by Google and are the most likely to show in Google’s search results.
The video below shows why canonical content is so valuable and how this data can be used to assess the SEO of your website.
Why is Canonical Content So Important?
Canonical pages are the principal focus of much SEO work, as non-canonical pages are unlikely to show in the search results. This section of the summary data looks at the content (primarily word count) that exists on canonical pages.
A site whose pages are accessible from multiple URLs may appear to have more content than it really does. For example, suppose there are 100 pages, each page is accessible from four URLs, and there is an average of 500 words of content on each page. If you were to look at the raw crawl data, there would appear to be 200,000 words of content on the site (400 URLs × 500 words).
The reality is that there would only be 50,000 words of content on the site (100 pages × 500 words), which is why the canonical view of the content on a site is the most valuable view.
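The arithmetic behind this example can be sketched in a few lines. This reads the example as 100 distinct pages, each reachable from four URLs; the variable names are illustrative and are not part of the tool.

```python
# Figures from the example above; names are illustrative only.
pages = 100          # distinct pages on the site (assumed reading of the example)
urls_per_page = 4    # each page is reachable from four URLs
avg_words = 500      # average words of content per page

# Raw crawl data counts the same content once per URL.
raw_words = pages * urls_per_page * avg_words

# Canonical content counts each page's content once.
canonical_words = pages * avg_words

print(raw_words)        # 200000
print(canonical_words)  # 50000
```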
Content is the primary component of a site that search engines use to determine the relevance between a page or site and its target keywords.
Pages Missing Canonical Tags
If pages are missing canonical tags, they cannot be canonical, so you will need to look at the ‘content data’ section instead, which is also included in the summary tab.
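As a rough illustration of what "self-referential canonical tag" means in practice, the sketch below checks whether a page's HTML contains a canonical link pointing back at the page's own URL. It uses only Python's standard library. The class and function names are hypothetical, and the URL comparison is deliberately simplified (it only ignores a trailing slash and does not resolve relative URLs), so it is not a description of how our crawler performs this check.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalLinkParser(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag found."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical_href is not None:
            return
        attr_map = dict(attrs)
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical_href = attr_map.get("href")


def is_self_canonical(page_url: str, html: str) -> bool:
    """Return True if the page declares itself as its own canonical URL."""
    parser = CanonicalLinkParser()
    parser.feed(html)
    href = parser.canonical_href
    if href is None:
        # No canonical tag at all: the page cannot be canonical.
        return False
    # Simplified comparison (assumption): ignore a trailing slash only.
    return href.rstrip("/") == page_url.rstrip("/")
```

A page with no canonical tag, or with a canonical tag pointing at a different URL, returns `False` here and would fall outside the canonical content data.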
Compare Canonical Content Data
Comparing canonical content over time is useful in a range of circumstances. For example, if your site has undergone a sitewide change for any reason, you can compare the canonical content data between crawl dates from before and after the change was made.
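Conceptually, a before-and-after comparison of this kind looks like the sketch below. It assumes each crawl has been reduced to a mapping of canonical URL to word count; that data shape and the `compare_crawls` helper are hypothetical and only illustrate the idea, not the tool's own export format or comparison feature.

```python
from typing import Dict, List, Union

CrawlDiff = Dict[str, Union[int, List[str], Dict[str, int]]]


def compare_crawls(before: Dict[str, int], after: Dict[str, int]) -> CrawlDiff:
    """Compare two crawls given as {canonical URL: word count} mappings."""
    common = before.keys() & after.keys()
    return {
        "total_before": sum(before.values()),
        "total_after": sum(after.values()),
        # Canonical pages present in only one of the two crawls.
        "removed": sorted(before.keys() - after.keys()),
        "added": sorted(after.keys() - before.keys()),
        # Change in word count for pages present in both crawls.
        "changed": {
            url: after[url] - before[url]
            for url in sorted(common)
            if after[url] != before[url]
        },
    }


# Made-up example data for two crawl dates.
before = {"/a": 500, "/b": 300, "/c": 80}
after = {"/a": 450, "/b": 300, "/d": 1200}
diff = compare_crawls(before, after)
print(diff["removed"], diff["added"], diff["changed"])
```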
Identify Canonical Content Issues
Sites frequently undergo overhauls to redesign their look and feel, adapt to new technology, or migrate to target multiple new regions. Such changes can have an impact on the content on a site. Using the functionality within our software, you can easily compare crawl dates to see how canonical content has been affected.
You can also use this summarised data to identify thin content pages which may be devalued by Google. This is especially important for canonical pages.
What Canonical Content Summary Data Do We Show?
We show the following canonical content data within the summary tab of our SEO tool. Each individual metric can be used to identify specific issues, but overall this section shows the distribution of content across a site.
Thin Content Pages
Thin content pages are pages that contain very little content; we classify these as pages with fewer than 100 words. Google can see these pages as low quality or as providing little value and, depending on how large a percentage of the whole site they make up, they may cause issues with organic visibility.
Read more about thin content pages.
Nearly Thin Content Pages
These are pages with fewer than 250 words of content, which is a general rule of thumb for the minimum amount of content required for a page to be fully understood by Google.
Read more about nearly thin content pages.
251 to 500 Words
These pages have a completely reasonable amount of content on them and are not suffering from any type of error. This and the following metrics are primarily used to show content distribution.
Read more about pages with 251 to 500 words.
500 to 1,000 Words
These pages have a good amount of content on them and are not suffering from any type of error. This and the following metrics are primarily used to show content distribution.
Read more about pages with 500 to 1,000 words.
1,001 to 2,000 Words
These pages have a large volume of content on them and are not suffering from any type of error. This and the following metric are primarily used to show content distribution.
Read more about pages with 1,001 to 2,000 words.
Over 2,000 Words
These pages have a lot of content on them and are not suffering from any type of error. This metric is primarily used to show content distribution.
Read more about pages with over 2,000 words.
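The word-count bands above can be expressed as a small classification function. This is a minimal sketch. It assumes the bands are mutually exclusive, that exactly 250 words counts as nearly thin, and that exactly 500 words falls in the 251 to 500 band. These boundary choices and the function names are our illustration here, not a statement of how the tool assigns pages.

```python
from collections import Counter
from typing import Iterable


def content_bucket(word_count: int) -> str:
    """Map a canonical page's word count to one of the summary bands."""
    if word_count < 100:
        return "Thin Content Pages"
    if word_count <= 250:   # assumption: exactly 250 words is nearly thin
        return "Nearly Thin Content Pages"
    if word_count <= 500:   # assumption: exactly 500 words belongs here
        return "251 to 500 Words"
    if word_count <= 1000:
        return "500 to 1,000 Words"
    if word_count <= 2000:
        return "1,001 to 2,000 Words"
    return "Over 2,000 Words"


def content_distribution(word_counts: Iterable[int]) -> Counter:
    """Count how many pages fall into each band."""
    return Counter(content_bucket(count) for count in word_counts)


# Made-up word counts for five canonical pages.
distribution = content_distribution([50, 120, 300, 300, 2500])
for band, pages in distribution.items():
    print(band, pages)
```

Expressing each band's count as a share of all canonical pages gives the content distribution that this section of the summary tab is meant to show.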