
Search Engines

How Do Search Engines Work?

To understand SEO (Search Engine Optimisation) you need to know a little about how search engines work. You are probably already aware of Google, the most popular search engine everywhere except China, where Baidu leads. We typically refer to Google because it is not just the most popular search engine but dominant by a wide margin, with a market share of 70%–90% in most countries.

Crawling, Indexing and User Agents

Search engines use ‘user agents’, also referred to as crawlers, robots, or spiders, to navigate through a website and log data about what they find. This process is referred to as ‘crawling’ a site, with the purpose of ‘indexing’ pages.

Google tries to crawl all web content by default. Whether it’s a webpage with text, an image, a video or a PDF, all of it is added to Google’s index of web content.


Essentially, crawling involves following links on a webpage. These could be in the main menu, the footer or within content, and they could lead the user agent to other pages on your site or to another website.

Thus one objective of SEO is to make your site as easy to crawl as possible. Because Google’s user agent (Googlebot) will only spend a limited amount of time crawling a website, you may want to guide or restrict it to crawling only the webpages and content that are relevant. Sitemaps, Google Webmaster Tools (now Search Console), robots.txt and robots meta tags all assist in this process.
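For example, an XML sitemap simply lists the URLs you want crawled. A minimal sketch using the standard sitemaps.org format; the URL and date are placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One <url> entry per page you want the crawler to discover -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2020-01-01</lastmod>
  </url>
</urlset>
```

The sitemap is usually placed at the root of the domain and referenced from robots.txt or submitted via Search Console.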

There are a number of things that can prevent a crawler from being able to crawl a website such as:

  • Poor internal linking
  • Missing or incorrect sitemaps
  • JavaScript menu navigation and links
  • Incorrect use of code within robots.txt
  • Incorrect use of robots meta tags


What we mean by indexing is that the pages discovered during a crawl are added to Google’s index (list) of web pages that it can show in the SERPs. Once a page is indexed, it can appear in the search results when someone searches for a term relevant to the page (its target keyword), provided the page is not breaking Google’s guidelines and has not been deliberately set with tags or code that prevent it from being indexed.

Thus SEO has a role to play in this process. There is a suite of tools available to an SEO, such as canonical tags, robots.txt and meta tags, that help to control or influence the indexation of pages. There are also a number of issues that can prevent a page or piece of content from being indexed, for example:

  • Flash content
  • JavaScript menu navigation and links
  • Incorrect use of code within robots.txt
  • Incorrect use of robots meta tags
  • Incorrect use of canonical tags
  • Violations of Google’s guidelines, which can cause pages or whole websites to be removed from the index or never added to it
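Two of the tools mentioned above, robots meta tags and canonical tags, live in a page’s HTML head. A minimal sketch; the URL is a placeholder:

```html
<head>
  <!-- Ask crawlers not to index this page, but still follow its links -->
  <meta name="robots" content="noindex, follow">

  <!-- Point duplicate variants of a page at the preferred (canonical) URL -->
  <link rel="canonical" href="https://www.example.com/preferred-page/">
</head>
```

Used correctly these tags control which version of a page is indexed; used incorrectly (e.g. a stray `noindex` on an important page) they cause the indexation problems listed above.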

Google’s Algorithm

Google applies its algorithm to indexed pages. For every search performed there are likely thousands to millions of pages within Google’s index that could be shown. Using a combination of around 200 factors, or signals, Google’s algorithm determines which pages from its index to show and in what order they appear in the results.

Some of these factors are things like the region and language of the user, the search phrase itself and the webpages available. Other factors, such as the device on which the search is performed, combined with the time of day and location, can all influence the search results. Typically these signals can be broken down into the two main categories described below.


Relevance

Primarily, Google needs its search results to be relevant to the people searching; this is the very foundation of the service it offers. If its search results are not relevant, users will switch to another search engine. Consequently, relevance is what Google strives to achieve with its search results.

There are both on-page and off-page factors that can contribute to relevance such as the content on a webpage, usage of keywords in on-page components, etc. We cover these factors in more depth in our complete on-page SEO checklist.

One of the strongest relevance factors is CTR (Click-Through Rate). Organic CTR is calculated by taking the number of clicks an organic listing received and dividing it by the number of impressions that the listing received. An impression is counted each time a person views the organic listing, i.e. each time the listing is shown in the SERPs.
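The calculation above can be sketched in a few lines of Python; the figures are invented for illustration:

```python
def organic_ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions, as a percentage."""
    if impressions == 0:
        return 0.0  # a listing with no impressions has no CTR
    return clicks / impressions * 100

# A listing shown 2,000 times in the SERPs that received 90 clicks
# has a CTR of 90 / 2000 * 100 = 4.5%.
print(organic_ctr(90, 2000))
```

Tools like Search Console report these clicks and impressions per query, so CTR can be monitored listing by listing.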

A high CTR is a strong signal to Google that the listing being shown is relevant to the searcher. If a webpage receives a higher CTR than other listings did in the same position, it is likely to be moved up in the rankings. Consequently, on-page components like the meta description act as indirect ranking factors, as they can influence the CTR.


Authority

Because there may be thousands or millions of relevant webpages that could be shown for a search, Google must use another set of signals to determine which ones to show. This additional set of signals contributes to a webpage’s authority and is primarily driven by the volume of backlinks that a webpage or website has.

Page authority is measured in an aggregated metric called PageRank, named after Larry Page, one of the founders of Google. PageRank essentially works by counting the number of links to a page and assessing the quality of those links. PageRank is also distributed throughout a site via internal linking. We discuss backlinks in more detail later in this guide.
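The counting-and-distributing idea behind PageRank can be sketched as a simplified power iteration in Python. This is a toy model based on the classic published damping-factor formulation, not Google’s production algorithm, and the three-page link graph is invented:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank. `links` maps each page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1 / n for p in pages}  # start with equal rank everywhere
    for _ in range(iterations):
        # every page keeps a small base rank, independent of links
        new_rank = {p: (1 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            # a page shares its rank equally among the pages it links to
            share = rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += damping * share
        rank = new_rank
    return rank

# Three pages: A and C both link to B, and B links back to A.
ranks = pagerank({"A": ["B"], "B": ["A"], "C": ["B"]})
print(max(ranks, key=ranks.get))  # B gathers the most link equity
```

Note how B, the page with the most inbound links, ends up with the highest score, while C, which nothing links to, keeps only the base rank. This is the sense in which backlinks and internal links "pass" authority.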

