Custom Web Crawling Options

At Raptor we have recently added more functionality to our web crawler, allowing users to change crawl settings and options. This lets you create customised crawls for a number of reasons, with significant benefits. First, let's cover the settings themselves.

You will see these options when you:

  • Add a project
  • Edit the settings for a site
  • Edit the settings for a project
  • Add a new site to a project

The aim of these options is to give you custom control over what does and does not get crawled for each site. Every site can have its own crawl options, set specifically to whatever your needs for that site are.

Benefits of Our Custom Web Crawler Options

  • Crawl only what you need to
  • Segment a site by region
  • Crawl or exclude sub-domains
  • Choose the file types to crawl
  • Set your maximum directory depth

Website Crawl Options

The options look like this throughout the software, no matter where you are setting them:

[Image: Custom Crawl Settings]

They are controlled by switches that show as green when set to ‘on’. In the example above (which shows the default settings), the option ‘Crawl only starting directory’ is set to on. The only exception to this at the moment is the maximum directory depth, which is set as a number between 1 and 100.

We cover these options in more detail in the sections below.

Crawl All Sub-Domains

Switching this ‘on’ means that any sub-domains we find during the crawl will be included. If your site has multiple sub-domains, we will crawl them all while this option is turned on.
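
Conceptually, the check behind this option works along the following lines. This is a simplified Python sketch for illustration only; the function and parameter names are hypothetical, not our actual crawler code:

    from urllib.parse import urlparse

    def include_in_crawl(url, root_domain, crawl_all_subdomains):
        """Decide whether a discovered URL's host belongs to this crawl."""
        host = urlparse(url).hostname or ""
        if host in (root_domain, "www." + root_domain):
            return True                              # the main site itself
        if crawl_all_subdomains:
            return host.endswith("." + root_domain)  # e.g. uk.example.com
        return False                                 # other sub-domains are skipped

    print(include_in_crawl("https://uk.example.com/page", "example.com", True))   # True
    print(include_in_crawl("https://uk.example.com/page", "example.com", False))  # False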

[Image: Crawl All Sub-Domains]

Crawl Only Starting Directory

This option is set to ‘on’ by default and means that we will not crawl any links found during the crawl that sit outside the starting directory. For instance, if you have a multi-regional site with the URL “example.com/uk/”, we will only crawl URLs that sit within the UK directory; if you have directories for other countries, such as “example.com/us/”, we will not crawl those while this option is turned on.

It is also worth noting that if you enter a URL that contains a sub-domain, such as “uk.example.com/”, we will crawl that sub-domain and anything that sits within it, but we will not crawl other sub-domains.
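
In effect, the option filters discovered URLs against the host and path of the starting URL. A minimal Python sketch of the idea, assuming simple prefix matching (illustrative only, not our production logic):

    from urllib.parse import urlparse

    def within_starting_directory(url, start_url):
        """True if url sits inside the directory of start_url."""
        start, found = urlparse(start_url), urlparse(url)
        if found.hostname != start.hostname:
            return False  # stay on the starting host, e.g. uk.example.com
        # Normalise so "/uk" and "/uk/" both mean the "/uk/" directory
        prefix = start.path if start.path.endswith("/") else start.path + "/"
        return found.path.startswith(prefix)

    print(within_starting_directory("https://example.com/uk/about/", "https://example.com/uk/"))  # True
    print(within_starting_directory("https://example.com/us/", "https://example.com/uk/"))        # False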

[Image: Crawl Only Starting Directory]

Crawl Images

This option is turned off by default but can easily be switched on by clicking the switch icon. Once switched on, it instructs our web crawler to crawl any images we identify during the crawl. This can, however, be restricted by other options: for example, if your images sit on a sub-domain that is not part of the URL entered, and you have opted not to crawl all sub-domains or to crawl only the starting directory, we will not crawl those images.

For the most part this option simply allows you to conserve your URL usage by not crawling images, which may not be relevant to you unless you need them for an SEO audit. Each page of a site often has at least one image, so a site of 500 web pages might have over 1,000 URLs once you include images.
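
As a rough back-of-the-envelope check of that arithmetic (assuming, purely for illustration, at least one image per page):

    def estimated_urls(pages, images_per_page=1, crawl_images=True):
        """Rough URL count for a crawl, with or without images."""
        return pages + (pages * images_per_page if crawl_images else 0)

    print(estimated_urls(500))                      # 1000 -> pages plus one image each
    print(estimated_urls(500, crawl_images=False))  # 500  -> pages only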

If you are looking to optimise images for any reason, such as to target keywords more effectively or to improve page load times, we suggest crawling them.

[Image: Crawl Images]

Crawl CSS Files

CSS (Cascading Style Sheets) files control how a site looks. Often a site has only a handful of CSS files, though some sites can have hundreds; either way, they are rarely used in the auditing or SEO analysis of a website. As such, this option is set to ‘off’ by default, meaning that we will not follow and crawl any URLs that are CSS files.

[Image: Crawl CSS Files]

Crawl JS Files

JS (JavaScript) files control functionality on a site, and the number a site has can vary greatly. As with CSS files, you are unlikely to need to crawl JS files for SEO or technical audits, so this is set to ‘off’ by default.
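
Taken together, the image, CSS and JS options act as a simple file-type filter on discovered URLs. Roughly speaking, and purely as an illustrative Python sketch (the option names here are hypothetical):

    from urllib.parse import urlparse

    IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp"}

    def should_crawl_file(url, crawl_images=False, crawl_css=False, crawl_js=False):
        """Apply the file-type crawl options to a discovered URL."""
        path = urlparse(url).path.lower()
        if any(path.endswith(ext) for ext in IMAGE_EXTENSIONS):
            return crawl_images
        if path.endswith(".css"):
            return crawl_css
        if path.endswith(".js"):
            return crawl_js
        return True  # ordinary pages are always candidates for crawling

    print(should_crawl_file("https://example.com/logo.png"))                     # False (default)
    print(should_crawl_file("https://example.com/logo.png", crawl_images=True))  # True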

[Image: Crawl JS Files]

Render JS Pages

This option differs from the rest: if a site runs on JS (JavaScript) and requires JavaScript rendering in order to be crawled, you will need this option switched on.

It is important to note that rendering JavaScript sites is much more computationally intensive, so URL usage is four times higher when this option is switched on. This means that for each URL crawled, we deduct 4 URLs from your usage for the month / billing cycle. Hence, a site of 500 URLs will cost 2,000 URLs to crawl using JavaScript rendering.
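
The usage arithmetic is straightforward; as a quick illustration in Python (assuming the flat 4x multiplier described above):

    def url_usage(pages_crawled, render_js=False):
        """URLs deducted from the monthly allowance for a crawl."""
        multiplier = 4 if render_js else 1
        return pages_crawled * multiplier

    print(url_usage(500))                  # 500  -> a standard crawl
    print(url_usage(500, render_js=True))  # 2000 -> the same site with JS rendering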

[Image: JavaScript Rendering]

Read more about JavaScript rendering by clicking the link.

Maximum Directory Depth

This custom crawl option determines the maximum number of directories that the software will crawl down to. We set this to a default of 10 and a maximum of 100. For example, the URL below is one directory deep:

example.com/dir/

The URL below is five directories deep:

example.com/dir/category/range/product/model/

Typically, sites are fewer than 10 directories deep, but if you want to limit your crawl to, say, the category pages of a site that you know sit in the 3rd directory, you can specify this by setting the number to three.
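
Depth here is simply the number of directory segments in a URL's path. A small illustrative Python sketch of how such a count might work (not our actual implementation):

    from urllib.parse import urlparse

    def directory_depth(url):
        """Count how many directories deep a URL sits."""
        return len([segment for segment in urlparse(url).path.split("/") if segment])

    print(directory_depth("https://example.com/dir/"))                               # 1
    print(directory_depth("https://example.com/dir/category/range/product/model/"))  # 5

    # A crawler would skip URLs deeper than the configured maximum:
    max_depth = 3
    deep_url = "https://example.com/dir/category/range/product/model/"
    print(directory_depth(deep_url) <= max_depth)  # False -> not crawled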

[Image: Crawl Maximum Directory Depth]

Additionally, if your site has a known problem with infinite loops, this option will stop a crawl from running on until all of your URLs for the current billing cycle have been used up.

Project Management and Crawl Settings

You can add multiple main sites within a project; this allows you to add multiple sites, or multiple directories within a single TLD. The example below shows how you could structure this, with each regional version of a site added separately: using the ‘Crawl only starting directory’ option, each regional site will be crawled without crawling the other regions.
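
One way to picture that structure (purely illustrative; the field names are hypothetical, not our actual configuration format):

    # A project holding each regional directory as its own "site",
    # each with 'Crawl only starting directory' switched on.
    project = {
        "name": "Example multi-regional project",
        "sites": [
            {"name": "UK site", "start_url": "https://example.com/uk/",
             "crawl_only_starting_directory": True},
            {"name": "US site", "start_url": "https://example.com/us/",
             "crawl_only_starting_directory": True},
            {"name": "DE site", "start_url": "https://example.com/de/",
             "crawl_only_starting_directory": True},
        ],
    }

    for site in project["sites"]:
        print(site["name"], "->", site["start_url"])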

[Image: Crawl Options for Regional Sites and Directories]

You can give each site a name that matches the region or language, or, if your site is divided by sub-domains that target regions, you can set it up this way to get the same result. Equally, if you have a site with multiple unrelated product ranges that are segmented into different directories, you can add them separately like this.

This functionality allows for a range of different real-world applications depending on your needs.

Adding New Sites

You can always add new sites or new variations of a site to a project, whether they are competitors or main sites. When doing this you can set the crawl options for each site as you add it; the screenshot below shows how this looks when adding either main or competitor sites:

[Image: Crawl Options for New Sites]

Site-Level Crawl Options

All crawl settings are set at the ‘site level’, meaning that each site can have a unique set of crawl options. These can be changed at any point before a crawl is performed, by clicking the site settings icon within a project or the project settings icon on the home page of the software.

The screenshot below shows what this icon looks like throughout the software; by clicking it you can change the crawl options for a crawl. Bear in mind that if you change these options, comparisons with historical crawl data could produce odd results.

[Image: Site-Level Crawl Options]

If you were to crawl an entire site and all sub-domains and compare this to a crawl of a specific directory, the data could be radically different. This may be what you want to look at, but it is typically a good idea to keep settings consistent for any one site; this helps when comparing or analysing crawl data over time in any meaningful way.

How Does This Benefit You?

At Raptor we are customer-centric, and as such we are always looking for ways to improve the experience of using our SEO tools. Our primary focus is on giving our customers the functionality they need while improving that experience. We also want to ensure that our customers are not spending more money than they need to in order to use our services.

These options allow you to limit crawls or open them up as wide as possible. If keeping your usage low is important to you, they let you restrict a crawl to specific areas of a site or specific types of files within it. This saves your crawl budget and allows you to allocate URLs where they are needed.

We are continually adding new crawl options to improve the functionality of our software and the user experience it provides, so check back regularly for updates and new releases!
