Web Crawler Data

Our Web Crawler collects a lot of data. When you export it, each data point appears as a column header in CSV files, spreadsheets and reports. If you come across a header you don’t understand, this guide provides definitions.

  • URL – The URL of each page, image, video and resource.
  • File Type – Such as HTML, CSS, JPEG, SWF, etc.
  • Status – The status code returned by a URL, such as 200, 301, etc.
  • Indexable – HTML/Text pages that are not restricted by a robots.txt or meta robots tag from being indexed and have a status code of 200.
  • Non-Indexable – Pages that are not indexable due to robots.txt or meta robots tags, or a status code other than 200.
  • Crawlable – Pages and resources that are not disallowed by the robots.txt.
  • Canonical – HTML Pages with a self-referential canonical tag.
  • Non-Canonical – HTML pages with a canonical tag that links to another page / URL.
  • Canonical URL – The URL within the canonical tag.
  • Page Title – The page title or meta title of each page.
  • Page Title (Length) – The number of characters including punctuation and spaces of the page title.
  • Meta Description – The meta description scraped from each page.
  • Meta Description (Length) – The number of characters including punctuation and spaces of the meta description.
  • Meta Keywords – The meta keywords scraped from each page.
  • Meta Keywords (Length) – The number of characters including punctuation and spaces of the meta keywords.
  • Implemented GA Tracking – Whether tracking code is implemented in some form on every HTML page.
  • UA Number (First) – The first UA number (for Google Analytics) identified on each page.
  • UA Number (Second) – The second UA number (for Google Analytics), if present on a page.
  • OG Tags – We scrape all Open Graph (Facebook) tags on each page.
  • Twitter Cards – We scrape all Twitter Card tags on each page.
  • Google+ Tags – We scrape all Google+ tags on each page.
  • H1 (First) – The first H1 header from each HTML page.
  • H1 (Second) – The second H1 header from each HTML page.
  • H2 (First) – The first H2 header from each HTML page.
  • H2 (Second) – The second H2 header from each HTML page.
  • H2 (Third) – The third H2 header from each HTML page.
  • H2 (Fourth) – The fourth H2 header from each HTML page.
  • H2 (Fifth) – The fifth H2 header from each HTML page.
  • Other H tags – We scrape all header tags on each page.
  • Word Count – The number of words on an HTML page.
  • Text Ratio – The ratio of text to code on each HTML page.
  • URL Length – The number of characters in each URL.
  • Page Depth – The depth of a page within the structure of the site.
  • Redirect to – Where a redirect exists, this identifies the URL it redirects to.
  • Linked from XML Sitemap – Yes or No.
  • In links – The number of links pointing to each URL.
  • Unique In links – The number of unique links (one per page) pointing to each URL.
  • Follow In links – The number of ‘follow’ links pointing to each URL.
  • Outlinks – The number of links pointing to other pages on the same domain, for each URL.
  • Unique Outlinks – The number of unique links pointing to other pages on the same domain, for each URL.
  • Follow Outlinks – The number of ‘follow’ links pointing to other pages on the same domain, for each URL.
  • External Links – The number of links pointing to another domain, for each URL.
  • Response Time – The response time of each URL, in milliseconds (ms).
  • Size (Kb) – The size, in kilobytes, of each page, image, video and resource.
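
Several of the definitions above are simple rules, so it may help to see a few of them written out. The following is a minimal Python sketch, not the crawler's actual implementation: the `Page` record, its field names and the helper functions are all hypothetical, chosen only to illustrate how Indexable, Canonical / Non-Canonical, Page Title (Length) and URL Length could be derived from the definitions in this guide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical record for one crawled HTML page (not the crawler's real schema)."""
    url: str
    status: int
    blocked_by_robots_txt: bool
    meta_robots_noindex: bool
    canonical_url: Optional[str]
    title: str


def is_indexable(page: Page) -> bool:
    # Indexable: status code 200 and not restricted by robots.txt or a meta robots tag.
    return (
        page.status == 200
        and not page.blocked_by_robots_txt
        and not page.meta_robots_noindex
    )


def canonical_label(page: Page) -> str:
    # Canonical: self-referential canonical tag.
    # Non-Canonical: canonical tag that points to another URL.
    if page.canonical_url is None:
        return ""  # assumption: the guide does not define the "no canonical tag" case
    return "Canonical" if page.canonical_url == page.url else "Non-Canonical"


def derived_columns(page: Page) -> dict:
    return {
        "Indexable": is_indexable(page),
        "Canonical": canonical_label(page),
        "Page Title (Length)": len(page.title),  # counts punctuation and spaces
        "URL Length": len(page.url),
    }


page = Page(
    url="https://example.com/shoes",
    status=200,
    blocked_by_robots_txt=False,
    meta_robots_noindex=False,
    canonical_url="https://example.com/shoes",
    title="Shoes | Example",
)
print(derived_columns(page))
```

Under these assumptions, a page that returns a 301, is disallowed by robots.txt, or carries a noindex meta robots tag would come out as Non-Indexable, matching the definitions above.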