XML sitemaps are a great way to ensure your site is crawled and indexed properly. Learn how to take control and build your own!
When it comes to creating an XML sitemap, a car analogy works best. Sure, automatic is great. It’s convenient and affords you an extra hand to turn up that Adele song you love to sing along to terribly. But any driving enthusiast will tell you that a manual shift gives you a closer connection to the vehicle and to the road, and that’s exactly what we’re after – more connection. More control.
These days, there are many options for automating the creation of XML sitemaps, whether through a plugin or an online sitemap generator. Some are better than others (the Yoast plugin for WordPress does a pretty good job), but the machines haven’t replaced us just yet. Automation still does not measure up to a sitemap carefully constructed by hand. So roll up your sleeves and follow these steps to create and submit custom XML sitemaps that represent your site better than any plugin or tool can.
Step 1: Know What You’re Looking For
An XML sitemap is essentially just a list of the pages that make up your website. But the key thing to remember is that we are only concerned with pages that should be in Google’s index. You don’t want to put a login page or a post-purchase “thank you” page on your sitemap, for instance. Before you set out to gather up the URLs of the pages on a site, let’s ask a simple question:
“Is this a page that should be in Google’s index?”
If you’re a bit more versed in SEO, you can also ask:
“Does the page return a 200 status code?”
“Is the page self-canonical (does its canonical tag point to its own URL)?”
Doing this exercise will give meaning to everything we encounter in Step 2.
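If you like to keep notes in a script rather than a spreadsheet, the three questions above can be sketched as a small filter. This is only an illustration in Python: the Page record, its field names, and the sample URLs on yoursite.com are all made up for the example, and treating a page with no canonical tag as acceptable is an assumption you may want to tighten.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    url: str
    status: int               # HTTP status code returned for the URL
    canonical: Optional[str]  # href of rel="canonical", or None if the tag is absent
    want_indexed: bool        # your own answer to "should this be in Google's index?"


def belongs_in_sitemap(page: Page) -> bool:
    """Apply the three Step 1 questions to a single page."""
    if not page.want_indexed:
        return False
    if page.status != 200:
        return False
    # Assumption: a page with no canonical tag at all is treated as acceptable.
    return page.canonical is None or page.canonical == page.url


# Hypothetical pages, used only to exercise the filter.
SAMPLE_PAGES = [
    Page("https://yoursite.com/", 200, "https://yoursite.com/", True),
    Page("https://yoursite.com/login/", 200, "https://yoursite.com/login/", False),
    Page("https://yoursite.com/old-page/", 301, None, True),
    Page("https://yoursite.com/services/?ref=nav", 200, "https://yoursite.com/services/", True),
]

if __name__ == "__main__":
    for page in SAMPLE_PAGES:
        verdict = "keep" if belongs_in_sitemap(page) else "skip"
        print(verdict, page.url)
```

Only the home page survives in this sample: the login page fails the first question, the redirect fails the second, and the parameterized services URL points its canonical somewhere else.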
Step 2: Collect Your Pages
Now that we know exactly what we’re looking for, let’s go find it! In the first part of this step, we’re going to gather up all of the website’s URLs. The easiest way to do this is with a crawler like Screaming Frog, which can quickly crawl the pages of your site and spit out a list of URLs.
Alternatively, you can simply follow each of the site’s main navigation options down to their deepest level (also known as a human crawl). This is actually the method I prefer. If the site isn’t too big, it’s a great way to learn about the navigational logic and user-friendliness of your site.
Let’s use Go Fish Digital’s site as an example. Before I toss it into a crawler, I’m going to browse it manually and gain some insights. My first takeaway, as is often the case, is from the main navigation.
On the far left, we have a logo and branding, which links to the home page. You guessed it – the home page URL is going in the sitemap.
On the right, we have About, Services, Blog, and Contact.
Right away, I’m going to begin grouping. The About and Contact pages are more general pages, like the home page, so I consider those three URLs as a “General” section of the site.
Next, we have Services and Blog.
Services has a drop-down menu – this is a perfect reason to group these pages together!
Then, the blog. I’ve only displayed 3 posts here, but there are a lot more blog posts on GFD’s site. This is where a crawler would come into play.
Would you look at that? We now have the site sectioned out nicely. With our URLs grouped together like this, we can make a beautifully organized sitemap!
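When a crawler hands you a long list of URLs, the same grouping can be roughed out by looking at the first folder in each path. The sketch below is one possible approach, not the exact method used above: the URLs are placeholders on yoursite.com, and the choice to fold the home, About, and Contact pages into a "general" group simply mirrors the grouping described in this step.

```python
from urllib.parse import urlparse

# First path segments that count as the "General" section (an assumption
# based on the grouping in this step; "" is the home page).
GENERAL_SEGMENTS = {"", "about", "contact"}


def section_for(url):
    """Return the section name for a URL based on its first path segment."""
    path = urlparse(url).path.strip("/")
    first_segment = path.split("/")[0]
    if first_segment in GENERAL_SEGMENTS:
        return "general"
    return first_segment


def group_urls(urls):
    """Group URLs by section, keeping the order in which they were seen."""
    groups = {}
    for url in urls:
        groups.setdefault(section_for(url), []).append(url)
    return groups


if __name__ == "__main__":
    # Placeholder URLs, not real pages.
    crawled = [
        "https://yoursite.com/",
        "https://yoursite.com/about/",
        "https://yoursite.com/services/seo/",
        "https://yoursite.com/blog/first-post/",
        "https://yoursite.com/contact/",
    ]
    for section, members in group_urls(crawled).items():
        print(section, len(members))
```

A script like this is a starting point at best. The manual browse is still what tells you whether the groups make sense to a visitor.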
Step 3: Code Your URLs
If you’ve applied Step 2 carefully to your website’s pages, you now have a list of URLs that need to be formatted with the appropriate tags. XML is a lot like HTML – in fact, the “ML” in both stands for “markup language.”
For this step, you’ll need a text editor so you can create an XML file. I highly recommend Sublime Text. They offer a lifetime license key, and it will serve your SEO and text-editing future better than the finest hound.
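a.) Let’s begin with an opening <urlset> tag, which also declares the sitemap namespace: <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
b.) Next, add your first URL with the appropriate <url> and <loc> tags: <url><loc>https://yoursite.com/</loc></url>
c.) When you’ve entered your last URL, simply close the <urlset> tag: </urlset>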
Now that you know the different tags, get your eyes used to looking at a simple XML sitemap. Here is what the finished product would look like:
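The original illustration is not reproduced here, so what follows is a minimal sketch rather than the exact file from this walkthrough. It holds a hand-written sitemap as a Python string (Python is used only so the example can be checked) and reads the URLs back out to confirm the <urlset>, <url>, and <loc> tags nest as described. The three yoursite.com URLs are placeholders.

```python
import xml.etree.ElementTree as ET

# A hand-written sitemap. The three URLs are placeholders, not real pages.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
  </url>
  <url>
    <loc>https://yoursite.com/about/</loc>
  </url>
  <url>
    <loc>https://yoursite.com/contact/</loc>
  </url>
</urlset>
"""

# Namespace declared on the <urlset> tag, needed to look up child tags.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_locs(xml_text):
    """Parse a sitemap and return the URLs found in its <loc> tags."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


if __name__ == "__main__":
    for url in list_locs(SITEMAP_XML):
        print(url)
```

The text between the triple quotes is what you would save as your XML file. The rest is just a sanity check.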
Step 4: Validate Your Sitemap
Now it’s time to run your sitemap through a validator to make sure all the syntax is correct. Go ahead and save your file and name it sitemap.xml. Then, visit https://validator.w3.org/#validate_by_upload and upload your XML file. Hopefully, you see this message:
If there are any errors, the validator will quote the line that contains the error so you can go back into Sublime Text and easily locate it.
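If you want a quick local check before uploading, Python's standard library can catch basic syntax slips such as a missing closing tag. This is a rough sketch that only tests whether the XML is well-formed; it does not check the file against the sitemap schema and is not a replacement for the validator. The function name is made up for the example.

```python
import xml.etree.ElementTree as ET


def find_syntax_error(xml_text):
    """Return None if xml_text is well-formed XML, else (line, column, message)."""
    try:
        ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as err:
        line, column = err.position  # position of the problem reported by the parser
        return line, column, str(err)
    return None


if __name__ == "__main__":
    # Deliberately broken sample: the <url> tag is never closed.
    broken = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        "  <url>\n"
        "    <loc>https://yoursite.com/</loc>\n"
        "</urlset>\n"
    )
    print(find_syntax_error(broken))
```

Like the validator, the parser reports the line where it tripped, which is usually at or just after the actual mistake.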
Step 5: Add It To The Root
Next, you’ll want to add your sitemap file (sitemap.xml) to the root folder of your site. This can be done locally, through FTP, or (ideally) by a developer. Adding your sitemap file to the root folder means that it will be located at yoursite.com/sitemap.xml. Many sites keep their sitemap at exactly this location! Try picking a couple of sites you regularly visit and typing “/sitemap.xml” after the TLD (the “.com,” “.net,” etc.).
Step 6: Add It To The Robots(.txt)
A robots.txt file is a simple text file with instructions for the crawler that is visiting your site. The file exists in the root folder, so you can probably guess where it’s located – yoursite.com/robots.txt. One of the lines you can add to your robots.txt file is the “Sitemap:” line. This tells the crawler where to find your perdy, custom XML sitemap. Here’s how the line would look, assuming your site is secure (HTTPS):
Sitemap: https://yoursite.com/sitemap.xml
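If you would rather not edit robots.txt by hand, a few lines of Python can add the “Sitemap:” line without duplicating it. Treat this as a sketch under stated assumptions: the function name is made up, the sample robots.txt content is invented, and matching the directive name case-insensitively is a choice made for this example.

```python
def ensure_sitemap_line(robots_txt, sitemap_url):
    """Return robots.txt content with a 'Sitemap:' line for sitemap_url,
    appending the line only if it is not already present."""
    for line in robots_txt.splitlines():
        name, separator, value = line.partition(":")
        # Assumption: the directive name is compared case-insensitively.
        if separator and name.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_txt
    if robots_txt and not robots_txt.endswith("\n"):
        robots_txt += "\n"
    return robots_txt + "Sitemap: " + sitemap_url + "\n"


if __name__ == "__main__":
    # Invented robots.txt content for demonstration.
    current = "User-agent: *\nDisallow: /login/\n"
    print(ensure_sitemap_line(current, "https://yoursite.com/sitemap.xml"))
```

Running the function a second time on its own output leaves the file unchanged, so it is safe to use in a deploy script.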
Apple.com has a number of “Sitemap:” lines in their robots.txt file (https://www.apple.com/robots.txt):
Whether a “Sitemap:” line in robots.txt actually helps is somewhat debated, but the purpose of this guide is to be thorough, and it is still a best practice I see used by many top SEOs and successful websites.
Step 7: Submit Your Sitemap
We gathered, we grouped, we tagged, we validated, and we added to the root. Now we’ll discuss how to submit your sitemap to Google and Bing. Doing so can improve the indexation of your site! Please note that I’m assuming you have Google Search Console and Bing Webmaster Tools accounts set up.
How to submit a sitemap to Google
a.) Sign into your GSC account.
b.) Click Crawl > Sitemaps > Add/Test Sitemap
c.) Enter “/sitemap.xml” into the available field and submit your sitemap!
How to submit a sitemap to Bing
a.) Sign into your BWT account.
b.) Click Configure My Site > Sitemaps
c.) Enter the full URL of your sitemap and submit your sitemap!
Check in periodically (but not obsessively) to ensure your sitemap URLs are being crawled. It is NOT uncommon for only part of your sitemap to be crawled. In fact, we rarely see a sitemap crawled in its entirety. That’s asking a lot, and the major search engines love to be coy.
(Bonus) Next-Level Sitemapping: Creating an Index
The whole point of a sitemap is to make the pages of your site as crawler-accessible as possible. To do this, we present them in a simple, organized list. If you want to take order to the next level, you’ll want to create a sitemap index.
A sitemap index is an XML file that refers to a number of individual XML sitemaps. For Go Fish Digital’s site, we could make an individual sitemap for each grouping we created in Step 2:
We would add each of these files to the root folder of the site and point to them within a sitemap index, which uses its own XML tags:
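The original illustration of the index is not reproduced here, so the following is a sketch of what such a file could look like. A sitemap index wraps each child sitemap in <sitemap> and <loc> tags inside a <sitemapindex> tag. The three file names (sitemap-general.xml, sitemap-services.xml, sitemap-blog.xml) are hypothetical names chosen to match the Step 2 groupings, and yoursite.com is a placeholder. You can just as easily type the same text by hand.

```python
from xml.sax.saxutils import escape

SITEMAP_NAMESPACE = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return the text of a sitemap index that points to each given sitemap URL."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<sitemapindex xmlns="' + SITEMAP_NAMESPACE + '">',
    ]
    for url in sitemap_urls:
        lines.append("  <sitemap>")
        # escape() handles characters such as & that are not allowed raw in XML.
        lines.append("    <loc>" + escape(url) + "</loc>")
        lines.append("  </sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, one per grouping from Step 2.
    index_text = build_sitemap_index([
        "https://yoursite.com/sitemap-general.xml",
        "https://yoursite.com/sitemap-services.xml",
        "https://yoursite.com/sitemap-blog.xml",
    ])
    print(index_text)
```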
We would then name the sitemap index, validate, add it to the root folder, and submit it within the search engine consoles for Google and Bing – no need to submit each individual sitemap! The index will take care of everything. Additionally, you can add a “Sitemap:” line to your robots.txt file that points to the index, rather than pointing to each individual sitemap (looking at you, Apple).
A sitemap index with individual sitemaps represents the highest level of organization and is a superb way to present the indexable pages of your site to the major search engines.
Make Your Map(s)!
Whether you’re looking at your own site, a friend’s site, or a client’s site, you now have some great guidelines for creating a meaningful XML sitemap or sitemap index. So build your own custom sitemap and take charge of your SEO, learn more about your website, and cut the fat caused by automation.
Follow me on Twitter @briangormanGFD