Web Scraper

Scraping websites for data is most efficient when run from servers rather than your own computer. We have developed a robust cloud-based website scraper that allows you to quickly and easily scrape websites for anything you want.

Layman’s Terms

Our web scraper is an automation tool. The easiest way to explain it is to imagine you wanted to build a list of all the pages and resources (images, files and so on) on a site. Done manually, you would have to open each page in a web browser, copy the URL and paste it into a spreadsheet or document.

Now imagine you also want to know the number of words on each page: you would have to count them yourself or run every page through a word-count tool. The more data you want to extract from a page, the bigger the job, and on a site with more than a few pages this quickly becomes a slow and tedious process. Our web scraper does all of this for you.
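
To make the idea concrete, here is a minimal sketch of the manual process being automated: crawl a site, record every URL found, and count the words on each page. It is illustrative only, not Raptor’s implementation; the start URL is a placeholder, and the requests/BeautifulSoup libraries are just one way to do this by hand.

```python
# A minimal crawler: collect same-domain URLs and per-page word counts.
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def crawl(start_url, max_pages=50):
    domain = urlparse(start_url).netloc
    to_visit, seen, word_counts = [start_url], set(), {}
    while to_visit and len(word_counts) < max_pages:
        url = to_visit.pop()
        if url in seen:
            continue
        seen.add(url)
        response = requests.get(url, timeout=10)
        soup = BeautifulSoup(response.text, "html.parser")
        # Word count: split the page's visible text on whitespace.
        word_counts[url] = len(soup.get_text(separator=" ").split())
        # Queue every same-domain link found on the page.
        for link in soup.find_all("a", href=True):
            target = urljoin(url, link["href"])
            if urlparse(target).netloc == domain:
                to_visit.append(target)
    return word_counts

for page, words in crawl("https://example.com").items():  # placeholder URL
    print(page, words)
```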

Why Cloud-Based Is Best

Installing a web scraper on your own computer comes with several disadvantages compared with a cloud-based alternative:

Desktop-Based Scrapers

  • Any problem with your computer and you can’t scrape
  • You can only scrape from the computer with the program installed
  • All requests come from your computer’s IP address (which can get blocked by anti-scraping technologies)
  • You often need to buy and implement your own proxy servers to scrape from multiple IP addresses
  • Updates to the operating system have the potential to prevent the program from functioning as intended
  • Typically, no analysis is performed on the raw data that you export (less automation = more work for you)
  • You need your computer on and connected to the internet for the full duration of a crawl
    • If your computer crashes, the crawl fails
    • If your computer turns off or restarts, the crawl fails
    • If the internet drops out, the crawl fails
    • If the program crashes, the crawl fails
  • Very inefficient on larger sites

Cloud-Based Web Scraping

  • Set and forget: set up a crawl and come back when we email you to let you know it’s complete
  • You don’t need to have the web page open for the crawl to complete
  • No processing, memory or hard drive usage for you during crawls
  • All you need is a web browser, no operating system or app conflicts
  • Access your account from multiple devices, including your phone
  • Let us worry about IP addresses
  • The most efficient way to scrape website data on sites of any size
  • Schedule crawls
  • Past crawls are all archived and easy to access at any time and from any location

Scraped Web Data

There are many reasons and situations in which you might need a website scraper, and whatever yours is, Raptor provides an easy-to-use web scraping tool. Typically, our users scrape data for SEO; whether it’s a competitor’s site or their own, Raptorbot scrapes all the SEO data they need.

Below is an up-to-date list of the different types of data that we scrape from a site, on a per-URL basis (a short extraction sketch follows the list):

  • URL – of all pages, images, videos and resources.
  • File Type – Such as HTML, CSS, JPEG, SWF, etc.
  • Status – The status code returned by a URL, such as 200, 301, etc.
  • Indexable – HTML/text pages that return a 200 status code and are not blocked from indexing by robots.txt or a meta robots tag.
  • Non-Indexable – Pages that are not indexable, due to robots.txt, meta robots tags, or a status code other than 200.
  • Crawlable – Pages and resources that are not disallowed by the robots.txt.
  • Canonical – HTML Pages with a self-referential canonical tag.
  • Non-Canonical – HTML pages with a canonical tag that links to another page / URL.
  • Canonical URL – The URL within the canonical tag.
  • Page Title – The page title or meta title of each page.
  • Page Title (Length) – The number of characters in the page title, including punctuation and spaces.
  • Meta Description – The meta description scraped from each page.
  • Meta Description (Length) – The number of characters in the meta description, including punctuation and spaces.
  • Meta Keywords – The meta keywords scraped from each page.
  • Meta Keywords (Length) – The number of characters in the meta keywords, including punctuation and spaces.
  • Implemented GA Tracking – Whether Google Analytics tracking code is implemented in some form on each HTML page.
  • UA Number (First) – The first UA number (for Google Analytics) identified on each page.
  • UA Number (Second) – The second UA number (for Google Analytics), if present on a page.
  • OG Tags – We scrape all Open Graph (Facebook) tags on each page.
  • Twitter Cards – We scrape all Twitter Card tags on each page.
  • Google+ Tags – We scrape all Google+ tags on each page.
  • H1 (First) – The first H1 header from each HTML page.
  • H1 (Second) – The second H1 header from each HTML page.
  • H2 (First) – The first H2 header from each HTML page.
  • H2 (Second) – The second H2 header from each HTML page.
  • H2 (Third) – The third H2 header from each HTML page.
  • H2 (Fourth) – The fourth H2 header from each HTML page.
  • H2 (Fifth) – The fifth H2 header from each HTML page.
  • Other H tags – We scrape all header tags on each page.
  • Word Count – The number of words on an HTML/text page.
  • Text Ratio – The ratio of text to code on each HTML page.
  • URL Length – The number of characters in each URL.
  • Page Depth – The depth of a page within the structure of the site.
  • Redirect to – Where a redirect exists, this identifies the URL it redirects to.
  • Linked from XML Sitemap – Yes or No.
  • In links – The number of links pointing to each URL.
  • Unique In links – The number of unique links (one per page) pointing to each URL.
  • Follow In links – The number of ‘follow’ links pointing to each URL.
  • Outlinks – The number of links pointing to other pages on the same domain, for each URL.
  • Unique Outlinks – The number of unique links pointing to other pages on the same domain, for each URL.
  • Follow Outlinks – The number of ‘follow’ links pointing to other pages on the same domain, for each URL.
  • External Links – The number of links pointing to another domain, for each URL.
  • Response Time – The response time of each URL, in milliseconds (ms).
  • Size (KB) – The size of each page, image, video and resource.
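
As a rough illustration of how a handful of these fields could be collected for a single URL, here is a short sketch. It is not Raptor’s implementation; the libraries used and the simplified indexability check (status 200 plus no noindex meta robots directive, ignoring robots.txt) are assumptions made for the example.

```python
# Collect a few of the per-URL fields listed above for one page.
import requests
from bs4 import BeautifulSoup

def scrape_url(url):
    response = requests.get(url, timeout=10)
    html = response.text
    soup = BeautifulSoup(html, "html.parser")

    title = soup.title.get_text(strip=True) if soup.title else ""
    meta_desc = soup.find("meta", attrs={"name": "description"})
    description = meta_desc.get("content", "") if meta_desc else ""
    canonical_tag = soup.find("link", rel="canonical")
    canonical_url = canonical_tag.get("href", "") if canonical_tag else ""
    h1s = [h.get_text(strip=True) for h in soup.find_all("h1")]
    robots_tag = soup.find("meta", attrs={"name": "robots"})
    robots = robots_tag.get("content", "").lower() if robots_tag else ""
    text = soup.get_text(separator=" ")

    return {
        "URL": url,
        "Status": response.status_code,
        "Page Title": title,
        "Page Title (Length)": len(title),  # incl. punctuation and spaces
        "Meta Description": description,
        "Meta Description (Length)": len(description),
        "H1 (First)": h1s[0] if h1s else "",
        "H1 (Second)": h1s[1] if len(h1s) > 1 else "",
        "Word Count": len(text.split()),
        "Text Ratio": round(len(text) / len(html), 2) if html else 0.0,
        "URL Length": len(url),
        "Canonical URL": canonical_url,
        # Simplified: indexable = 200 status and no "noindex" directive
        # (a full check would also consult robots.txt, as described above).
        "Indexable": response.status_code == 200 and "noindex" not in robots,
        "Response Time": int(response.elapsed.total_seconds() * 1000),  # ms
    }

print(scrape_url("https://example.com/"))  # placeholder URL
```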

Why Us?

This scraped data helps you to perform a range of SEO functions:

  • Technical audits
  • Optimisation audits
  • Competitor analysis
  • Keyword Research (Competitive Data)

Unlike some of our competitors, we perform some of the analysis for you. Most notably, we tell you which pages are indexable, based on a range of criteria (see above). We also tell you which URLs are canonical; a URL is treated as canonical only when the crawled URL exactly matches the URL specified within the canonical tag. For example (a small sketch of this check follows the list):

Canonical URL = https://example.com/page/

  • https://www.example.com/page/ = Not canonical (uses www)
  • http://example.com/page/ = Not canonical (not https)
  • https://example.com/another-page/ = Not canonical (is a completely different URL)
  • https://example.com/page = Not canonical (not using a trailing slash)
  • https://example.com/Page/ = Not canonical (Uses a capital letter)
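
A minimal sketch of that check, assuming a plain exact-string comparison (scheme, subdomain, case and trailing slash all have to match):

```python
# A URL only counts as canonical when it matches the canonical tag exactly.
def is_canonical(crawled_url, canonical_url):
    return crawled_url == canonical_url

canonical = "https://example.com/page/"
for url in [
    "https://example.com/page/",      # canonical (exact match)
    "https://www.example.com/page/",  # not canonical (uses www)
    "http://example.com/page/",       # not canonical (not https)
    "https://example.com/page",       # not canonical (no trailing slash)
    "https://example.com/Page/",      # not canonical (capital letter)
]:
    print(url, "->", is_canonical(url, canonical))
```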

Product Features

Cloud-Based Web Scraping

The most efficient way to scrape sites of any size is from cloud-based servers. All you need is a web browser and your logins to access and use Raptor!

Scrape All SEO Data

Scrape all the SEO data you need to perform a range of SEO processes and functions. Easily export only what you need, in CSV or XLS format.

Let Us Do Some of the Analysis

We provide you with a range of additional analysis based on the scraped data. This includes whether a page is indexable and canonical.

Multi-Tab Spreadsheets

All of the data is automatically delineated into different tabs of a spreadsheet. No need to filter, copy and paste data to create a beautiful report.

Easy to Use

Raptor is designed to be simple to use; all the complex stuff happens in the background. Adding projects, scraping sites and exporting data couldn't be easier.

Set & Forget

Set a web scrape going and come back once you receive an email notification informing you the scrape is complete.

Sign Up For Early Access
& Earn a Chance to Win a Year’s Free Subscription!

Sign up for early access today!