How to Crawl JavaScript Websites with Raptor

Raptor’s SEO Web Crawler now includes JavaScript rendering, allowing it to crawl and scrape websites whose content only appears once JS has been executed. In this guide, we walk you through some of the history and technical aspects of JavaScript and JS rendering, as well as how we tackle it.

JavaScript is a complex subject, but we have kept the acronyms and technical language in this guide to a minimum to make it as easy to understand as possible.

 

A Brief Introduction to JavaScript, HTML and SEO

For many years, most websites were built with HTML alone, which is ideal for static content such as images and text. With HTML it is easy to specify how a page looks in terms of the colour and size of components, to create links and to lay out text. However, to change the page or add new content based on a user’s interaction, a completely new page needs to be downloaded from the remote server. JavaScript allows the HTML content to be changed dynamically without the need to contact the remote server again.

The JS scripts downloaded with the first request are executed on the client’s own machine, so these interactions happen much more quickly. JS can also send remote requests, via AJAX, that return only the data needed to dynamically update a small portion of content on a page. Once again, this speeds up page load, because a smaller amount of data is transferred.
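As a rough illustration of the kind of AJAX-style update described above, here is a minimal sketch in TypeScript. The /api/cart endpoint and the element IDs are hypothetical examples, not part of any particular site or library.

```ts
// Minimal sketch of a client-side (AJAX-style) update.
// The "/api/cart" endpoint and the element IDs are hypothetical examples.
async function refreshCartCount(): Promise<void> {
  const response = await fetch("/api/cart"); // small JSON payload, not a full page
  const cart: { itemCount: number } = await response.json();

  // Update a single element in the already-loaded page; no full reload needed.
  const badge = document.getElementById("cart-count");
  if (badge) {
    badge.textContent = String(cart.itemCount);
  }
}

// Wire the update to a button click instead of navigating to a new page.
document.getElementById("add-to-cart")?.addEventListener("click", () => {
  void refreshCartCount();
});
```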

This gives developers the opportunity to produce web pages that are fast, dynamic and very user-friendly. At the low end, clicking a button or selecting an option from a drop-down menu could change the content on a page, the data in a table, or anything else you want. At the high end, it allows single-page applications to be developed, so that a web app can feel like a desktop app running locally. This is great, as it helps to improve the user experience and delivers a level of interaction that is difficult or impossible to achieve with HTML alone.

Nowadays, most eCommerce sites use JS to deliver the functionality they need; updating shopping carts, prices and the like is all controlled by JavaScript. This has produced some extremely cool websites, but it initially created headaches for those of us working in SEO. In the next section, we will talk about the issues that arise when an indexing bot can’t crawl your website because it doesn’t understand JavaScript.

 

JavaScript & SEO

The problem was that crawlers such as Googlebot, which is used to crawl and index your site, had difficulty crawling sites that use JavaScript (JS), as they needed to render the JS in the same manner as a browser rather than just download and parse a flat HTML page. This meant that if your site relied on JS and you wanted to appear in the search results, you had a big problem. Google is now able to render JS sites (for the most part) and thus both crawl and index them, which is great news for JS websites.

This means that most JavaScript sites are now indexable by Google, which is vital for the SEO of the site. Additionally, most mobile phones can now also render JS, meaning that the site will be accessible from mobile devices; something that was not always the case historically.

If, however, you want to audit a JavaScript site, you need a website crawler capable of rendering JS so that it can crawl and scrape SEO data from the site. If JavaScript rendering is not enabled, or not available in the web crawler tool you are using, the crawl will typically fail, either partially or completely, depending on how much of the site requires JS to be rendered.

Rendering JS is more computationally expensive than just downloading and parsing plain HTML, and hence web crawlers often charge a premium for it. Therefore, it’s not a good idea to leave this option on at all times. Next, we will discuss how to check whether you need it turned on for a particular site.

 

How to Detect if a Website Requires JavaScript Rendering

The easiest way to determine whether a site requires JS rendering is to disable JS within your web browser. The exact steps vary depending on the browser, but in most cases you simply need to navigate to your browser’s settings and find the switch that turns JS off. Once you have disabled JS in your web browser, reload the site you want to check and see what happens.

If the site renders at all, try navigating it: see whether the links work, whether you can see images and text, and whether the functionality still works. If you can still see and use the site as you would with JS enabled, you can assume that the site doesn’t rely on JS for the most part. There may be aspects of the site that use JS, such as tracking code, AdSense and other such components. There may even be some pages or directories that use JS while others do not, so this method is good up to a point but falls over on massive sites.
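If you want a programmatic spot check rather than clicking around manually, you can compare what a page looks like with and without JS using a headless browser. The sketch below uses Puppeteer purely as an illustration; it is not part of Raptor, and the 50% threshold is an arbitrary assumption.

```ts
// Rough spot check: compare visible text with JS disabled vs enabled.
// Uses Puppeteer (npm install puppeteer); the 0.5 ratio is an arbitrary heuristic.
import puppeteer from "puppeteer";

async function visibleTextLength(url: string, jsEnabled: boolean): Promise<number> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setJavaScriptEnabled(jsEnabled);
  await page.goto(url, { waitUntil: "networkidle0" });
  const text = await page.evaluate(() => document.body.innerText);
  await browser.close();
  return text.trim().length;
}

async function needsJsRendering(url: string): Promise<boolean> {
  const withoutJs = await visibleTextLength(url, false);
  const withJs = await visibleTextLength(url, true);
  // If most of the visible text only appears once JS has run,
  // the site very likely needs JS rendering to be crawled properly.
  return withoutJs < withJs * 0.5;
}

needsJsRendering("https://example.com").then((needed) =>
  console.log(needed ? "JS rendering likely required" : "Plain HTML is probably enough")
);
```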

 

Types of JavaScript Framework

There is no single point of truth for JavaScript: there are many different frameworks, and they all work differently, with no cross-framework standardisation. This can create issues with cross-browser compatibility, as something that works in Chrome may not work in Safari.

There are multiple JavaScript frameworks that you are likely to find on websites, each with its own uses, benefits, pros and cons. We won’t dive into the pros and cons or features of each framework in this guide, but we have listed the most common JavaScript frameworks below:

  • Angular

  • Aurelia

  • Backbone

  • Ember

  • Meteor

  • Mithril

  • Node

  • Polymer

  • React

  • Vue

 

Server-Side vs Client-Side Rendering

HTML is rendered server-side, meaning that the page is assembled on the server (backend) and then sent to the web browser ready for you to see. JavaScript is typically rendered client-side (on your computer), so there is an additional step before the content can be seen. This is where crawlers would historically fail, as they needed to render the content themselves rather than receive it pre-rendered from the server.

If Google or other web crawlers are unable to render the JS, they may not be able to read the content or find and follow the links to other content. If Google cannot see the content, they are unlikely to rank the page in the search results. If Google cannot find and follow links, they will not be able to find the content on your site. In either case, there is a strong potential for indexation problems.

As discussed, Google is now able to render JavaScript in order to find content and links for the most part, but there are some instances where it will not. For example, if images are lazy-loaded, meaning that they are only loaded when the user scrolls down to reveal them in the browser, Google may not index those images, because the crawler does not scroll down a page in the way that a user does. Consequently, some types of JS rendering can have a negative impact on a site.
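To make the scrolling issue concrete, here is a minimal sketch of a scroll-based lazy-loading pattern using the browser’s IntersectionObserver API. The .lazy class and data-src attribute are illustrative conventions rather than a specific library; because the real image URL is only moved into src when the image scrolls into view, a crawler that never scrolls may only ever see the placeholder.

```ts
// Minimal scroll-based lazy-loading sketch (illustrative, not a specific library).
// Markup assumed: <img class="lazy" data-src="real.jpg" src="placeholder.gif">
const lazyObserver = new IntersectionObserver((entries, observer) => {
  for (const entry of entries) {
    if (!entry.isIntersecting) continue; // still off-screen: keep the placeholder
    const img = entry.target as HTMLImageElement;
    img.src = img.dataset.src ?? img.src; // swap in the real image URL on scroll
    observer.unobserve(img);
  }
});

document.querySelectorAll<HTMLImageElement>("img.lazy").forEach((img) => lazyObserver.observe(img));
```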

 

Why Is JavaScript Rendering Important?

Because of the prevalence of JavaScript on websites nowadays, it is important for SEOs to be able to get crawl data for a site. In order to perform a technical audit, an SEO audit, or to optimise a site in almost any way, you will need to crawl it and scrape the relevant SEO data. Just because Google can crawl a site doesn’t mean that you have access to the data you need to perform any of these SEO tasks.

Hence, using an SEO tool that enables you to crawl and scrape website data, even on a site that requires JS rendering, is essential for most SEOs.

 

Search Engines and JavaScript Rendering

Google renders and crawls JavaScript sites, but Bing does not support this at the moment, so you will need to factor this in if Bing plays a major part in your channel mix. That said, no software is perfect, and that includes Google’s technology. There are still a range of considerations you should take into account with regard to what Google will crawl, render and index.

 

JavaScript Considerations

Google will not do certain things, such as rendering mouseover JS or scrolling; consequently, there is a range of JS solutions that will result in components of a page not being indexed. Google recommends that you use a hybrid of HTML and JS to build sites, as pure JS sites may still have indexation issues. Some other things to consider are:

  • Google requires clean, unique URLs for any page, and links to be in proper HTML anchor tags (it is fine to offer a static link that also calls a JavaScript function; see the sketch after this list)

  • Google doesn’t click around your site in the way a user does; consequently, content that only loads after additional events (clicks, hovers and scrolling, for example) is not going to be rendered

  • All the page resources (such as JS, CSS, images, etc) need to be available in order to be crawled, rendered and then indexed

  • The rendered page snapshot is estimated to be taken at around 5 seconds, although there may be some flexibility in this. If a page takes too long to render (typically longer than 5 seconds), there is a risk that some elements won’t be seen or rendered, and consequently won’t be indexed

  • Images that are lazy-loaded on scroll may not be indexed, as indexing them would require scrolling

  • Google’s rendering is performed separately from the indexing process. Initially, Google crawls the static HTML of a website while deferring the rendering until it has resources available to do so. Only after rendering will Google discover the further content and links available, and this can take from days to a week
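To illustrate the first point in the list above, here is a sketch of the kind of crawlable, progressively enhanced link that guidance describes: the anchor keeps a real href that a crawler can follow, while a JavaScript handler adds behaviour on top. The URL and the quick-view behaviour are hypothetical examples.

```ts
// Progressive enhancement sketch: a real, crawlable href plus a JS handler on top.
// Markup assumed: <a class="product-link" href="/products/red-widget">Red widget</a>
// The URL and the quick-view behaviour are hypothetical examples.
document.querySelectorAll<HTMLAnchorElement>("a.product-link").forEach((link) => {
  link.addEventListener("click", (event) => {
    event.preventDefault();   // users with JS get the enhanced experience...
    openQuickView(link.href); // ...e.g. the product opens in an overlay
  });
  // ...while crawlers (and users without JS) can still follow the plain href.
});

function openQuickView(url: string): void {
  console.log(`Would open a quick view for ${url}`); // placeholder behaviour
}
```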

WRS & PRS

Web Rendering Service (WRS) and Page Rendering Service (PRS) are the terms Google uses when referring to how it renders website content.

 

Google vs Raptor JavaScript Rendering

We use the latest stable version of ChromeDriver to render website content, which is exactly how Chrome renders websites and their content for its users. Google tackles this task in the same way, and as such you should get similar, if not identical, results using Raptor as you would using Googlebot.
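As an illustration of the general approach (not Raptor’s actual implementation), fetching the rendered DOM with headless Chrome might look something like the Puppeteer sketch below.

```ts
// Illustration only: fetching the fully rendered DOM with headless Chrome via Puppeteer.
import puppeteer from "puppeteer";

async function renderedHtml(url: string): Promise<string> {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // Wait until network activity settles so client-side content has a chance to load.
  await page.goto(url, { waitUntil: "networkidle0" });
  const html = await page.content(); // serialised DOM after JS has run
  await browser.close();
  return html;
}

renderedHtml("https://example.com").then((html) =>
  console.log(`Rendered HTML is ${html.length} characters long`)
);
```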

 

How to Crawl a JavaScript Website

Using Raptor to crawl websites is super easy: within the setup options, and within the site settings for each site added to Raptor, there is a switch that turns JavaScript rendering on. It is switched off by default but can be turned on with a single click of the mouse.

You don’t need to know anything about JavaScript or JS rendering in order to render and crawl a JS website with Raptor. All you need to know is whether the site needs the JavaScript rendering functionality that we provide. You can determine this by trying to crawl the site without JavaScript rendering first; if that doesn’t work, try enabling JavaScript rendering.

Because this process is more computationally heavy, we remove 4 URLs from your monthly limit for each URL crawled. This means that crawling a site with 10K URLs will remove 40K URLs from the limit, which is refreshed / replenished at the beginning of each new monthly billing cycle. Unlike some of our competitors, we do not withhold this feature from cheaper subscription plans!

To limit usage, you can turn off the crawling of other file types such as images and CSS files, or you can restrict where the crawler goes by limiting the crawl to a specified directory or sub-directory. It is also possible to prevent the crawler from crawling subdomains to save on your URL usage.

You can check how many URLs you have used and have remaining by going to the usage page within the software. From here you can see where you are using URLs, how frequently, how many you have left, and when they will renew.

 

Limitations of Raptor’s JavaScript Rendering and Website Crawling

Much like Google, we will not scroll down a page when rendering page components; any components that require scrolling will not be seen, rendered or crawled.

We do not render mouseover JS components or clickable components that load additional content on the same page (URL).

Crawls will be capped if you reach your monthly usage limit. We add a small amount of wiggle room to crawls, but this will not allow you to crawl a 100K-URL site if you only have 50K URLs in your plan. Bear in mind that usage is consumed 4x faster with JavaScript rendering enabled on all resources.

 

What Happens After a JavaScript Website is Crawled?

After you have crawled a JavaScript website with Raptor, the output is exactly the same as you would get had you crawled a purely HTML website. We still give you all the same data, perform the same checks and analysis and allow for the same reports and downloads.

Fundamentally, the only difference between using JavaScript rendering and not using it lies in how the site is crawled and scraped, not in the analysis or reporting.

 

 
