- XML Sitemaps
- XML Sitemap Benefits
- Sitemap Size
- Types of Sitemap
- Multiple Sitemaps
- Google Webmaster Tools
- Common Mistakes
Depending on the size of the site and volume of pages contained within each directory you may need to have several XML sitemaps.
XML sitemaps are vital for indexation by search engines. This is especially true after a website migration, when concise XML sitemaps allow search engines to identify the new site structure.
An example of an XML sitemap can be found on our site.
Ensure that the website uses an XML sitemap that contains entries for all canonical URLs you want indexed. The sitemap helps Google index your site and tells it specific information about each page, such as how frequently it is updated and how important it is.
Ensure the XML file is properly formatted and passes validation tests. Simply run the file through a sitemap validator to test. Once submitted to Google Webmaster Tools, Google will inform you of problems with the XML sitemap but it is better to pre-empt this with your own tests first.
Below we list the different components or metrics that can be added to an XML sitemap.
The code below is an example of how a single entry should look in an XML sitemap along with the opening code:
<?xml version="1.0" encoding="UTF-8"?>
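A minimal sketch of a complete file containing a single entry might look like the following. The URL, date and values are illustrative placeholders rather than taken from a real site.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The urlset element wraps every entry and declares the sitemap protocol namespace -->
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One url element per page; only loc is required, the other tags are optional -->
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```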
We describe in more detail what each of the above components means and how it should be used.
The priority component defines the priority of a page relative to the other pages in the sitemap, essentially letting Google know which pages are the most important.
This is expressed as a number between 0.0 and 1.0, with 1.0 most typically being the value of the home page.
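As a rough illustration of priority, the fragment below gives a home page the highest value and a deeper page a lower one. The URLs and the chosen values are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Home page: highest priority relative to the rest of this sitemap -->
  <url>
    <loc>https://www.example.com/</loc>
    <priority>1.0</priority>
  </url>
  <!-- A deeper page: lower priority relative to the home page -->
  <url>
    <loc>https://www.example.com/red-widgets/small-red-widget.html</loc>
    <priority>0.6</priority>
  </url>
</urlset>
```

Priority is relative to other pages on the same site only, so setting every page to 1.0 tells a search engine nothing about which pages matter most.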
Specifying the change frequency of URLs within XML sitemaps tells search engines how frequently to come back and check pages for new content. This should be set to reflect how frequently content is updated on any given webpage. It may not match exactly how often Google actually crawls the page.
This is defined by one of a fixed set of values: always, hourly, daily, weekly, monthly, yearly or never.
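A short sketch of how change frequency might be set is shown here. In the sitemap protocol the changefreq tag takes a keyword rather than a date; the URLs and the frequencies chosen are assumptions for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- A blog index that gains new posts every day -->
  <url>
    <loc>https://www.example.com/blog/</loc>
    <changefreq>daily</changefreq>
  </url>
  <!-- A contact page that rarely changes -->
  <url>
    <loc>https://www.example.com/contact.html</loc>
    <changefreq>yearly</changefreq>
  </url>
</urlset>
```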
The last modified date (‘lastmod’) component defines when a page was last changed. Setting this tag can help speed up indexing when submitting an updated sitemap to Google Webmaster Tools: Google knows when it last crawled a page and will prioritise pages that have changed since that time.
This is defined by a date, expressed in the format ‘YYYY-MM-DD’.
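For instance, an entry carrying a last modified date could be sketched as follows. The URL and date are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/red-widgets/</loc>
    <!-- Date the page content last changed, in YYYY-MM-DD form -->
    <lastmod>2024-01-15</lastmod>
  </url>
</urlset>
```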
There should be no more than 50,000 URLs per sitemap, and the uncompressed file should not exceed 50MB (older versions of the protocol set this limit at 10MB).
If the website contains a lot of quality images that would be ideal for Google Image search, ensure the website contains an XML Image sitemap and adheres to the rules detailed in this article.
This will assist in having your images rank within Google for image-based searches. Equally, if you do not want your images to appear in the SERPs, do not add an image sitemap and instead disallow them within the robots.txt file.
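A minimal sketch of an image sitemap entry, using Google's image sitemap namespace, might look like this. The page and image URLs are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <!-- The page on which the images appear -->
    <loc>https://www.example.com/red-widgets/</loc>
    <!-- One image:image element per image on that page -->
    <image:image>
      <image:loc>https://www.example.com/images/red-widget-front.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://www.example.com/images/red-widget-side.jpg</image:loc>
    </image:image>
  </url>
</urlset>
```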
If the website contains a lot of quality videos that would be ideal for Google Video search, ensure the website contains an XML Video sitemap listing the videos. As with images, this will assist in ranking your videos and pointing Google towards them when it crawls your site for content.
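A video sitemap entry could be sketched along these lines, using Google's video sitemap namespace. All URLs, the title and the description are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <!-- The page that hosts the video -->
    <loc>https://www.example.com/videos/red-widget-demo.html</loc>
    <video:video>
      <video:thumbnail_loc>https://www.example.com/thumbs/red-widget-demo.jpg</video:thumbnail_loc>
      <video:title>Red widget demonstration</video:title>
      <video:description>A short demonstration of the red widget.</video:description>
      <!-- Location of the video file itself -->
      <video:content_loc>https://www.example.com/media/red-widget-demo.mp4</video:content_loc>
    </video:video>
  </url>
</urlset>
```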
If the website contains frequently created quality articles that would be ideal for Google News (usually 3 articles per day from multiple authors), ensure the website contains an XML News sitemap and adheres to the rules above.
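As an illustration, a single article in a news sitemap might be listed as follows, using Google's news sitemap namespace. The publication name, URL, date and title are invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>https://www.example.com/news/red-widgets-launch.html</loc>
    <news:news>
      <!-- The publication the article belongs to -->
      <news:publication>
        <news:name>Example News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2024-01-15</news:publication_date>
      <news:title>Red widgets launch</news:title>
    </news:news>
  </url>
</urlset>
```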
On larger sites there is a greater need for dynamically generated sitemaps. This means that when new pages are created they are automatically added to the sitemap, reducing the maintenance and web development time spent updating it. Equally, when pages are removed from the site, they should also be removed from the sitemap.
If a site has thousands of pages it becomes more important to break these pages up over several sitemaps, primarily because this makes it easier to identify indexing issues. With smaller sitemaps you can see more easily in Google Webmaster Tools where pages are not being indexed.
If the sitemaps are structured based on the site’s structure this should provide even more of an indication as to potential indexation errors.
For example, if you have an XML sitemap that contains all URLs for the ‘red widgets’ products and you notice that none of them are indexed, you have a much clearer idea of which pages are affected than you would with one sitemap of 5,000 URLs, and you can check those pages for on-page issues that could be the cause.
There should be no more than 50,000 URLs per sitemap, and the uncompressed file should not exceed 50MB (older versions of the protocol set this limit at 10MB); Sitemaps.org confirms this. These limits help prevent your web server from being weighed down serving large files. 50,000 URLs is a lot, so see the section on site structure below for more information on how to break this up.
Sitemap structure should reflect site structure, for example:
- Master Category (sitemap) – contains links to all other category sitemaps
- Category 1 (sitemap) – contains all products in this category
- Category 2 (sitemap) – contains all products in this category
Group similar content or content located in a similar location into sitemaps to create consistency and logical structure.
A Sitemap Index is the master XML file that links to all of the other sitemap files.
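A sitemap index that mirrors the category structure described above might be sketched as follows. The file names and dates are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A sitemap index uses sitemapindex/sitemap elements instead of urlset/url -->
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- One sitemap element per child sitemap file -->
  <sitemap>
    <loc>https://www.example.com/sitemap-category-1.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/sitemap-category-2.xml</loc>
    <lastmod>2024-01-10</lastmod>
  </sitemap>
</sitemapindex>
```

Each child file listed here is an ordinary sitemap containing the URLs for that category, which keeps any indexing problem traceable to one part of the site.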
We cover Google Webmaster Tools in more detail in another part of the knowledge base that addresses SEO Tools and how to use them. For now, it is worth noting that the sitemap.xml should be submitted to Google Webmaster Tools.
We list below the most common mistakes we see with XML sitemaps.
Ensure the sitemap is up to date and does not include duplicate entries or links to pages that are no longer present on the site. This can be an issue if you use a tool to create a sitemap and do not filter out duplicate entries, such as URLs that differ only by a trailing slash or by upper and lower case characters.
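To make the duplicate problem concrete, the sketch below lists one canonical form of a page and notes, in a comment, the kind of variants that should be filtered out. The URLs are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- List one canonical form of each URL only -->
  <url>
    <loc>https://www.example.com/red-widgets/</loc>
  </url>
  <!-- Leave out variants of the same page, for example:
       https://www.example.com/red-widgets   (no trailing slash)
       https://www.example.com/Red-Widgets/  (different letter case) -->
</urlset>
```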
Do not include pages that you do not want indexed within the sitemap. This wastes Google’s time and provides no benefit to you. In fact, if this happens in bulk it can also cause other problems, such as a percentage of pages not being indexed.
Stipulating parameters such as priority or change frequency can slow or impede crawling of the site by Google.
Having too many links in a sitemap makes problem solving difficult when it comes to indexing errors. Also, if the total sitemap is too big it can cause performance issues.
Internal URLs should not carry custom tracking parameters, either in links on the site or when listed in a sitemap.
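A brief sketch of the difference is given here, with an invented URL and invented tracking parameter names.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <!-- Clean URL with no tracking parameters -->
    <loc>https://www.example.com/red-widgets/</loc>
  </url>
  <!-- Avoid entries such as:
       https://www.example.com/red-widgets/?utm_source=sitemap&utm_medium=xml -->
</urlset>
```

Where a URL legitimately needs a query string with more than one parameter, the ampersand must be written as `&amp;` inside the loc tag for the file to remain valid XML.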