SEO Web Crawler
Also known as website scrapers, web scrapers or website crawlers, web crawlers are among the most commonly used SEO tools. If you work in SEO, you are likely familiar with them: they scrape valuable SEO data and often facilitate the first stages of many SEO processes.
Our web crawler (Raptorbot) is cloud based, meaning that it can crawl millions of web pages quickly and efficiently without you needing to install any software. All you need to get started is a web browser and access to the internet.
Benefits of Raptor's SEO Web Crawler
Quickly Crawl Any Website
Easily Crawl Multiple Websites
Identify Technical SEO Issues
Cloud-Based Web Crawling
Access Your Data From Any Device
What the Raptor Web Crawler Does for You
Because our tool is developed by SEOs for SEOs, it meets your needs head on. Our web crawler has been developed over several years, and has evolved into a powerful and efficient SEO tool.
We scrape all the SEO data you need to perform a technical or SEO audit. More than this, we perform a range of checks on the data that we scrape, saving you valuable time. This enables you to spend less time finding problems, collecting data and analysing it, and more time fixing issues.
Our SEO web crawler collects a lot of data, but we don’t just dump it on you. We check and analyse it, then aggregate it into sections within the reporting feature. From there, you can efficiently scroll, click and drill down into clearly labelled sections of data to find issues.
Because we’ve been doing SEO for many years, we know what you’re looking for and have made it easy to identify SEO issues.
Identify Indexation Issues
Identify and solve indexation problems with Raptor’s web crawler, such as pages that are disallowed by the robots.txt file or by a meta robots tag on the page. We group non-indexable pages by other properties, such as whether they are canonical or listed in a sitemap.
These page sets are either errors by definition or have a strong potential to be errors, and you can find them within a few clicks of logging in.
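As a rough illustration of the kind of check described above (a minimal sketch, not Raptorbot's actual implementation), the following applies the three conditions this page uses for an indexable page: a 200 status code, no robots.txt disallow and no noindex meta robots tag. The function and class names are hypothetical, and the page HTML and robots.txt text are assumed to have been fetched already.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_indexable(url, status, html, robots_txt, user_agent="*"):
    """Hypothetical helper: True if the page passes all three checks."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if status != 200 or not rules.can_fetch(user_agent, url):
        return False
    parser = MetaRobotsParser()
    parser.feed(html)
    # "none" is shorthand for "noindex, nofollow".
    return not ({"noindex", "none"} & parser.directives)
```

A page that fails this check and is also canonical or listed in a sitemap would fall into one of the page sets mentioned above.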
Identify Canonical Issues
Our reports delineate which pages are canonical, non-canonical or missing a canonical tag. You can also see all canonical links for each page, alongside a range of canonical errors.
The clearly labelled and clustered datasets make identifying canonical issues easy, even if you’re a complete noob. We also mark up the data with warning or error icons so you can see if they are a problem, a severe problem or just informational.
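The three-way split between canonical, non-canonical and missing can be sketched as follows. This is an illustrative example only: `classify_canonical` is a hypothetical name, the "multiple" label is an assumed example of a canonical error, and URLs are compared as exact strings with no normalisation.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalParser(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def classify_canonical(page_url, html):
    """Hypothetical helper: label a page by its canonical tag."""
    parser = CanonicalParser()
    parser.feed(html)
    if not parser.hrefs:
        return "missing"
    if len(parser.hrefs) > 1:
        return "multiple"  # assumed error label, for illustration
    # Resolve relative hrefs against the page URL before comparing.
    target = urljoin(page_url, parser.hrefs[0])
    return "canonical" if target == page_url else "non-canonical"
```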
Analyse Meta Data
Meta data contains some of the most powerful individual on-page components for SEO, and optimising these is standard practice in an audit. Ensure that all the page titles and meta descriptions on a site are implemented correctly and optimised for target keywords.
We scrape all the site’s meta data and present you with any issues that we have identified such as:
- Duplicate sets
- Missing meta data
- Multiple iterations on the same page
- Meta data that exceed the character limits
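The four issue types listed above could be detected along these lines. This is a minimal sketch for page titles only; `audit_titles` is a hypothetical name and the 60-character limit is an assumption chosen for illustration, not a figure taken from Raptor.

```python
from collections import defaultdict

TITLE_LIMIT = 60  # assumed character limit, for illustration only


def audit_titles(titles_by_url, limit=TITLE_LIMIT):
    """titles_by_url maps each URL to the list of <title> texts found on it."""
    issues = defaultdict(list)
    urls_by_title = defaultdict(list)
    for url, titles in titles_by_url.items():
        if not titles:
            issues[url].append("missing")
            continue
        if len(titles) > 1:
            issues[url].append("multiple")
        if len(titles[0]) > limit:
            issues[url].append("too long")
        # Compare titles case-insensitively when looking for duplicates.
        urls_by_title[titles[0].strip().lower()].append(url)
    for urls in urls_by_title.values():
        if len(urls) > 1:
            for url in urls:
                issues[url].append("duplicate")
    return dict(issues)
```

The same shape would work for meta descriptions with a different limit. Pages with no issues are simply absent from the result.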
Shore-Up XML Sitemaps
Evaluate XML sitemaps to ensure that only the right pages are listed within them. Finding listed pages that are broken, redirect to other pages, are non-canonical or are not indexable is super easy. We also make it easy to find pages that should be listed but are not found in any sitemap.
Make sure that, when you submit your sitemap to Google, it contains a concise list of all accessible, canonical and indexable pages.
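A sitemap check of this kind can be sketched by comparing the URLs in the sitemap with crawl results. The example below is illustrative only: the function names and the shape of the `crawl` dictionary are assumptions, and sitemap index files are not handled.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs in a standard <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def _listable(page):
    # A page belongs in the sitemap if it is accessible, indexable and canonical.
    return page["status"] == 200 and page["indexable"] and page["canonical"]


def audit_sitemap(xml_text, crawl):
    """crawl maps URL -> {"status": int, "indexable": bool, "canonical": bool}."""
    listed = sitemap_urls(xml_text)
    wrongly_listed = sorted(
        u for u in listed if u in crawl and not _listable(crawl[u])
    )
    not_listed = sorted(
        u for u, page in crawl.items() if u not in listed and _listable(page)
    )
    return wrongly_listed, not_listed
```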
Identify Broken Links
Inaccessible pages are an issue not just for organic visibility, but for users who won’t be able to access your content. Broken links easily find their way into sites: it only takes one mistyped character in a URL to produce one.
Crawl your website whenever you need and immediately identify all broken links, redirects and server errors. Our broken link reports show you where the links are located, allowing you to easily fix them.
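To show how a report can tie each problem link back to where it is located, here is a small sketch that groups links by the status code of their target. The names `classify_status` and `link_report` and the input shapes are assumptions for illustration; status codes are assumed to have been collected during a crawl.

```python
def classify_status(status):
    """Map an HTTP status code to a coarse link category."""
    if 300 <= status < 400:
        return "redirect"
    if 400 <= status < 500:
        return "broken"
    if 500 <= status < 600:
        return "server error"
    return "ok"


def link_report(links, status_by_url):
    """links is an iterable of (source_page, target_url) pairs.

    Returns, per category, a mapping of target URL -> pages that link to it.
    """
    report = {"redirect": {}, "broken": {}, "server error": {}}
    for source, target in links:
        if target not in status_by_url:
            continue  # target was not crawled, so nothing to report
        kind = classify_status(status_by_url[target])
        if kind != "ok":
            report[kind].setdefault(target, []).append(source)
    return report
```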
Optimise Internal Linking
Evaluate the internal linking structure of your site and identify poorly linked and orphaned pages. Review tables and charts that break down the internal links on your site into the categories that affect SEO, such as follow and nofollow links, unique links, in links, out links and much more.
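The in-link categories mentioned above can be counted from a list of crawled links. The following is a minimal sketch with hypothetical names; it treats any page with no in links as orphaned, which means a home page that nothing links back to would also be reported.

```python
def inlink_stats(pages, links):
    """pages: iterable of known URLs; links: (source, target, is_follow) triples."""
    stats = {
        url: {"in_links": 0, "unique_in_links": 0, "follow_in_links": 0}
        for url in pages
    }
    sources = {url: set() for url in stats}
    for source, target, is_follow in links:
        if target not in stats:
            continue  # external or uncrawled target
        stats[target]["in_links"] += 1
        if is_follow:
            stats[target]["follow_in_links"] += 1
        sources[target].add(source)
    # Unique in links: one per linking page, however many links it contains.
    for url, linking_pages in sources.items():
        stats[url]["unique_in_links"] = len(linking_pages)
    orphans = sorted(url for url, s in stats.items() if s["in_links"] == 0)
    return stats, orphans
```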
Identify Redirects & Redirection Errors
Redirects come in several forms. We identify them all and show you where they’re located. We also check whether they appear in places they shouldn’t, such as in XML sitemaps and canonical links. Additionally, we identify redirect loops and redirect chains, which can cause havoc with site accessibility.
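Redirect chains and loops can be found by following each redirect until it either reaches a final URL or revisits one already seen. The sketch below assumes the redirects have already been recorded as a simple source-to-target mapping; `trace_redirect` and its labels are hypothetical names.

```python
def trace_redirect(start, redirects):
    """Follow redirects from start; redirects maps source URL -> target URL.

    Returns (path, kind) where kind is "none", "single", "chain" or "loop".
    """
    path = [start]
    seen = {start}
    while path[-1] in redirects:
        target = redirects[path[-1]]
        path.append(target)
        if target in seen:
            return path, "loop"  # we have come back to an earlier URL
        seen.add(target)
    hops = len(path) - 1
    if hops == 0:
        return path, "none"
    return path, "single" if hops == 1 else "chain"
```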
Improve Site Navigation
By providing data on click depth and on content that is poorly linked to, Raptor makes it possible to change your site in ways that improve navigation for users. These improvements also make it quicker and easier for Google to find your content.
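Click depth is commonly computed as the minimum number of clicks needed to reach a page from the home page, which is a breadth-first search over the internal link graph. The sketch below shows that general technique under the assumption that the links are available as a dictionary; it is not a description of how Raptor calculates depth.

```python
from collections import deque


def click_depths(start, links):
    """links maps each page to the list of pages it links to.

    Returns the minimum number of clicks from start to every reachable page.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest route
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths
```

Pages that never appear in the result cannot be reached by following links from the start page at all.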
Audit Google Analytics & Tag Manager Code
Tracking the users on your site is incredibly important. Without proper tracking you can fail to spot user flow issues, count conversions or measure revenue, which can lead to a range of problems. For site owners, this can create confusion as to why things are not working, when in reality they may be working just fine.
For SEOs working for clients, poor tracking can be the difference between demonstrating value and keeping a client happy, or losing that client. Raptor can help ensure you have the data you need to make informed decisions.
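One simple way to audit tracking code is to look for Google Analytics property IDs in the page source. The sketch below is an assumption-laden illustration: it only matches the "UA-" style IDs that this page lists among the scraped data, using a deliberately loose pattern, and `find_ua_numbers` is a hypothetical name.

```python
import re

# Loose pattern for IDs such as "UA-12345-1" (assumed format for illustration).
UA_PATTERN = re.compile(r"\bUA-\d+-\d+\b")


def find_ua_numbers(html):
    """Return distinct UA numbers in order of first appearance."""
    found = []
    for match in UA_PATTERN.findall(html):
        if match not in found:
            found.append(match)
    return found
```

An empty result on an HTML page would suggest that this style of tracking is not implemented there, and two different IDs on one page may be worth a closer look.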
Identify Social Media Tag Issues
Social media tags, like OpenGraph (OG) and Twitter Cards, are essential components for ensuring that your content appears the way you want it to when shared on social media. These tags also provide an opportunity to include target keywords and help drive relevance between the page and those keywords.
If you have social media tags on a web page, you will also want to know whether they are implemented correctly. Our web crawler finds errors with social media tags such as missing components and multiple occurrences of the same component, among other things. Whether you’re auditing these components or collecting the data to optimise them, we can help.
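Checks for missing and repeated components can be sketched for Open Graph tags as follows. The required set used here is the four properties the Open Graph protocol marks as required for every page; the class and function names are hypothetical, and Twitter Card tags would need a similar parser keyed on the `name` attribute.

```python
from collections import Counter
from html.parser import HTMLParser

REQUIRED_OG = ("og:title", "og:type", "og:image", "og:url")


class OpenGraphParser(HTMLParser):
    """Counts how often each og:* property appears in <meta> tags."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        prop = (dict(attrs).get("property") or "").lower()
        if tag == "meta" and prop.startswith("og:"):
            self.counts[prop] += 1


def audit_open_graph(html):
    """Return (missing required properties, properties that occur more than once)."""
    parser = OpenGraphParser()
    parser.feed(html)
    missing = [p for p in REQUIRED_OG if parser.counts[p] == 0]
    repeated = sorted(p for p, n in parser.counts.items() if n > 1)
    return missing, repeated
```

Some properties, such as `og:image`, may legitimately appear more than once, so the repeated list is best read as something to review rather than a definite error.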
Analyse Page Speed
Page speed is an important algorithmic ranking factor, especially since the rise in mobile web browsing, where load times matter even more. Many factors affect page speed, such as the size of the page and its images.
We provide a suite of page speed data and some interesting analysis that helps you identify slow pages and the potential cause of the slowness.
Identify URL Issues
URLs are the addresses of your content; they need to meet a range of requirements and best practices, such as:
- Not being too long
- Not containing spaces
- Not containing underscores
- Not having too many slugs (directories)
- Containing keywords
With Raptor you are able to quickly and easily identify all these issues and have the data you need to make optimisations to URLs to improve performance.
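The URL best practices listed above translate naturally into a small set of checks. The thresholds in this sketch (100 characters, 4 path segments) are assumptions chosen for illustration, not Raptor's limits, and `url_issues` is a hypothetical name.

```python
from urllib.parse import urlsplit


def url_issues(url, keywords=(), max_length=100, max_depth=4):
    """Return a list of best-practice issues found in a URL."""
    path = urlsplit(url).path
    issues = []
    if len(url) > max_length:
        issues.append("too long")
    if " " in url or "%20" in url:  # literal or percent-encoded space
        issues.append("contains spaces")
    if "_" in path:
        issues.append("contains underscores")
    # Count non-empty path segments, including the final page slug.
    depth = len([segment for segment in path.split("/") if segment])
    if depth > max_depth:
        issues.append("too many directories")
    if keywords and not any(k.lower() in path.lower() for k in keywords):
        issues.append("no keyword")
    return issues
```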
Help Optimise Images
There are several components of an image that can be utilised for SEO, such as the filename and alt attribute, both of which can contain a target keyword. Raptorbot crawls and scrapes all this information and provides a downloadable report that identifies all images, all alt attributes and every location where each image appears on your site.
Image file size is also a major factor when looking to optimise a website’s performance. Easily identifying large images saves you time and lets you focus on fixing the issues.
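An image report of this shape can be sketched by grouping image records by file. The record format, the `image_report` name and the 100 KB threshold are all assumptions made for illustration.

```python
def image_report(images, max_kb=100):
    """images: iterable of dicts with "src", "alt", "size_kb" and "page" keys.

    Returns, per image URL, the pages it appears on and two issue flags.
    """
    report = {}
    for image in images:
        entry = report.setdefault(
            image["src"], {"pages": [], "missing_alt": False, "oversized": False}
        )
        entry["pages"].append(image["page"])
        if not (image.get("alt") or "").strip():
            entry["missing_alt"] = True  # empty or absent alt attribute
        if image["size_kb"] > max_kb:
            entry["oversized"] = True
    return report
```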
Scrape and Report on All These SEO Components
Raptorbot is a powerful and robust SEO web crawler, capable of crawling sites of any size quickly and easily. We provide some real-time data during crawls and perform a range of analyses on the gathered data to save you time.
We scrape all the on-page SEO data you need to make better decisions, identify issues and make recommendations:
- URL – of all pages, images, videos and resources.
- File Type – Such as HTML, CSS, JPEG, SWF, etc.
- Status – The status code returned by a URL, such as 200, 301, etc.
- Indexable – HTML/Text pages that are not restricted by a robots.txt or meta robots tag from being indexed and have a status code of 200.
- Non-Indexable – Pages that are not indexable due to robots.txt or meta robots tags, or a status code other than 200.
- Crawlable – Pages and resources that are not disallowed by the robots.txt.
- Canonical – HTML Pages with a self-referential canonical tag.
- Non-Canonical – HTML pages with a canonical tag that links to another page / URL.
- Canonical URL – The URL within the canonical tag.
- Page Title – The page title or meta title of each page.
- Page Title (Length) – The number of characters including punctuation and spaces of the page title.
- Meta Description – This is scraped from every page.
- Meta Description (Length) – The number of characters including punctuation and spaces of the meta description.
- Meta Keywords – This is scraped from every page.
- Meta Keywords (Length) – The number of characters including punctuation and spaces of the meta keywords.
- Implemented GA Tracking – Whether tracking code is implemented in some form on every HTML page.
- UA Number (First) – The first UA number (For Google Analytics) identified on each page.
- UA Number (Second) – The second UA number (For Google Analytics) if present on a page.
- OG Tags – We scrape all OpenGraph (Facebook) tags on each page.
- Twitter Cards – We scrape all Twitter Card tags on each page.
- Google+ Tags – We scrape all Google+ tags on each page.
- H1 (First) – The first H1 header from each HTML page.
- H1 (Second) – The second H1 header from each HTML page.
- H2 (First) – The first H2 header from each HTML page.
- H2 (Second) – The second H2 header from each HTML page.
- H2 (Third) – The third H2 header from each HTML page.
- H2 (Fourth) – The fourth H2 header from each HTML page.
- H2 (Fifth) – The fifth H2 header from each HTML page.
- Other H tags – We scrape all header tags on each page.
- Word Count – Number of words on an HTML page.
- Text Ratio – The ratio of text to code on each HTML page.
- URL Length – The number of characters in each URL.
- Page Depth – The depth of a page within the structure of the site.
- Redirect to – Where a redirect exists, this identifies the URL it redirects to.
- Linked from XML Sitemap – Yes or No.
- In links – The number of links pointing to each URL.
- Unique In links – The number of unique links (one per page) pointing to each URL.
- Follow In links – The number of ‘follow’ links pointing to each URL.
- Outlinks – The number of links pointing to other pages on the same domain, for each URL.
- Unique Outlinks – The number of unique links pointing to other pages on the same domain, for each URL.
- Follow Outlinks – The number of ‘follow’ links pointing to other pages on the same domain, for each URL.
- External Links – The number of links pointing to another domain, for each URL.
- Response Time – Measured in milliseconds (ms).
- Size (KB) – Of all pages, images, videos and resources.
Custom Crawl Options
We have now added custom crawl options and settings that allow our users and customers to fine-tune the way they crawl their sites. The screenshot below shows the new options available:
These options allow you to set crawl settings for each of the following:
- Crawl all sub-domains
- Crawl only starting directory
- Crawl images
- Crawl CSS (Stylesheet) files
- Set a maximum directory depth
The custom crawl options for our web crawler allow you to limit or expand the crawl to any or all areas of a site, its sub-domains, file types, and directories. This can preserve your crawl budget and provides you with full control over what is crawled.
With a ‘crawl everything’ approach, we often find that a site with 500 indexed URLs has over 1,000 URLs crawled once images, CSS and other file types are included. This can be vital for technical and SEO audits, but for some customers it is a waste of their crawl budget. These newly added features allow you to choose what you crawl and when.
These settings are set at the site level, giving you the most granular control over your website crawling. Choosing these options is part of the setup process for adding a new site or project, and they can always be accessed by editing the settings for a site or project.
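To make the effect of these options concrete, here is a minimal sketch of how a crawler might decide whether a discovered URL is in scope. The `CrawlOptions` fields mirror the five options listed above, but the names, defaults and matching rules are assumptions for illustration and not Raptor's actual settings model. In particular, the sub-domain check assumes the start URL uses the bare domain (for example `example.com` rather than `www.example.com`).

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg")


@dataclass
class CrawlOptions:
    crawl_subdomains: bool = False
    only_start_directory: bool = False
    crawl_images: bool = True
    crawl_css: bool = True
    max_directory_depth: Optional[int] = None  # None means no limit


def should_crawl(url, start_url, options):
    """Hypothetical scope check for one discovered URL."""
    start, target = urlsplit(start_url), urlsplit(url)
    start_host, host = start.hostname or "", target.hostname or ""
    if host != start_host:
        # Only sub-domains of the start host are ever in scope.
        if not (options.crawl_subdomains and host.endswith("." + start_host)):
            return False
    path = target.path or "/"
    if path.lower().endswith(IMAGE_EXTENSIONS) and not options.crawl_images:
        return False
    if path.lower().endswith(".css") and not options.crawl_css:
        return False
    if options.only_start_directory:
        start_dir = start.path.rsplit("/", 1)[0] + "/"
        if not path.startswith(start_dir):
            return False
    if options.max_directory_depth is not None:
        # Count directories only, ignoring the final file or page name.
        depth = len([s for s in path.split("/")[:-1] if s])
        if depth > options.max_directory_depth:
            return False
    return True
```

With image and CSS crawling switched off, a check like this is what would keep the crawled URL count close to the number of HTML pages.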
To learn more about the custom crawling options we currently have available in our SEO tools, please click the link.