What are XML sitemaps?
"Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL..." (https://www.sitemaps.org)

On the surface this seems to be a great addition to any website's armoury. However, before you rush away and create your sitemap, there are a number of pros and cons you should be aware of.
Benefits of using an XML sitemap
The first set of benefits revolves around being able to pass extra information to the search engines.
- Your sitemap can list all URLs from your site. This could include pages that aren't otherwise discoverable by the search engines.
- Giving the search engines priority information. There is an optional tag in the sitemap for the priority of the page. This is an indication of how important a given page is relative to all the others on your site, and it allows the search engines to order their crawling of your site accordingly.
- Passing temporal information. Two other optional tags (lastmod and changefreq) pass more information to the search engines that should help them crawl your site in a more optimal way. "lastmod" tells them when a page last changed, and "changefreq" indicates how often the page is likely to change. (There's a minimal example entry showing these optional tags just after this list.)
- Google Webmaster Central gives some useful information when you have a sitemap, such as a graph of Googlebot activity over the last 90 days. The example graph we looked at was actually taken from a friend of ours in our building who offers market research reports.
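For reference, here is a minimal example of a single sitemap entry using the optional tags discussed above. The URL and values are made up for illustration; the tag set and structure follow the sitemaps.org protocol.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/widgets/</loc>
    <lastmod>2007-06-20</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>

All three of lastmod, changefreq and priority are optional; priority defaults to 0.5 if omitted, and changefreq accepts values such as always, hourly, daily, weekly, monthly, yearly and never.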
Negative aspects of XML sitemaps
- Rand has already covered one of the major issues with sitemaps, which is that they can hide site architecture problems by getting pages indexed that a normal web crawl wouldn't find.
- Competitive intelligence. If you are telling the search engines the relative priority of all of your pages, you can bet this information will be of interest to your competitors. I know of no way of protecting your sitemap so only the search engines can access it.
- Generation. This is not actually a problem with sitemaps, but rather a problem with the way a lot of sitemaps are generated. Any time you generate a sitemap by sending a program to crawl your site, you are asking for trouble. I'd put money on the search engines having a better crawling algorithm than any of the tools out there that generate sitemaps. The other issue with sitemaps that aren't dynamically generated from a database is that they will become out of date almost immediately. (A rough sketch of database-driven generation follows this list.)
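To make the "generate from the database, not from a crawl" point concrete, here is a minimal sketch in PHP. The table and column names (pages, url, updated_at) and the connection details are hypothetical; the point is simply that the sitemap is built from the same data that builds the site, so it cannot drift out of date the way a crawled snapshot can.

<?php
// Minimal sketch: build the sitemap straight from the CMS database instead of crawling.
// Table and column names are hypothetical; adjust for your own schema.
$pdo = new PDO('mysql:host=localhost;dbname=site', 'user', 'pass');

header('Content-Type: application/xml; charset=UTF-8');
echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' . "\n";

foreach ($pdo->query('SELECT url, updated_at FROM pages') as $row) {
    $loc     = htmlspecialchars($row['url']);            // escape &, <, > for XML
    $lastmod = date('Y-m-d', strtotime($row['updated_at']));
    echo "  <url><loc>$loc</loc><lastmod>$lastmod</lastmod></url>\n";
}

echo '</urlset>' . "\n";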
XML sitemap guidelines
With all of the above in mind, I would avoid putting a sitemap on a site by default, especially a new site or one that has recently changed structure. By not submitting a sitemap, you can use the information gathered from seeing which pages Google indexes, and how quickly they are indexed, to validate that your site architecture is correct.

There is a set of circumstances that would lead me to recommend using a sitemap. If you have a very large site, have spent the time looking at the crawl stats, and are completely happy with why pages are in and out of the index, then adding a sitemap can lead to an increase in the number of pages in the index. It's worth saying that these pages are going to be the poorest of the poor in terms of link juice. These pages are the fleas on the runt of a litter. They aren't going to rank for anything other than the long tail. However, I'm sure you don't need me to tell you that even the longest of the long tail can drive significant traffic when thousands of extra pages are suddenly added to the index.
One question still in my mind is the impact of removing an XML sitemap from a site that previously had one. Should we recommend all new clients remove their sitemap in order to see issues in the site architecture? I'm a big fan of using the search engines to diagnose site architecture issues. I'm not convinced that removing a sitemap would remove pages that are only indexed because of the XML sitemap, but if it does, that's a very nice bit of information. *Wishes he'd kept that tidbit under his oh so very white hat*
So I guess let the discussions start: do you follow amazon.co.uk (who does have a sitemap), or are you more of an ebay.co.uk (which doesn't)?
Duncan... your posts really are refreshing. It's great to see you bringing up issues that lean more towards the advanced/technical SEO side of things. I have a few things to add to this discussion.

I don't understand how a Sitemap could possibly hide architectural issues. I read Rand's post, but it doesn't make any sense to me. The part where I get lost is when he/you assume that:

1.) Google will index all URLs submitted in a Sitemap.
2.) It is BAD to have pages indexed "unnaturally" because they were listed in the Sitemap. The only GOOD indexed pages are ones that were discovered naturally during a web crawl.
3.) A poor site architecture can be diagnosed by researching the SERPs and finding out which pages have not been indexed.

I think all of those assumptions are wrong. This makes your first "negative aspect of XML Sitemaps" entirely invalid, in my opinion.

As for competitive intelligence... I can't think of a single reason why someone would be interested in a competitor's Sitemap. There's simply nothing valuable in that information. Even if my competitors all handed me a list of URLs on their sites that they are trying to get ranked, and included the keywords that each URL was targeting, what would I do with that information? Maybe I'm missing something here, but I just don't see how that information would affect my SEO strategy whatsoever.

In any case, if I am missing something and there really is a reason to "hide" your Sitemap, then here is the easiest solution I can think of: don't name it "sitemap.xml"! Last I checked, you can submit a Sitemap to Google using any filename you want. Name it something like "secretlistofurls.xml" and your competitors will never know it exists. The other solution would be to cloak your Sitemap so that only the search engines can see it, but honestly... that's just ludicrous.

Some of the benefits of XML Sitemaps that weren't mentioned: you can use them to diagnose architectural issues. That's right... I'm suggesting the exact opposite of what Rand's post suggested. The reason is... you can submit multiple Sitemaps for a given site. This allows you to submit a Sitemap for each major section (or tier) of your site, and find out if any particular section is being indexed at a lower rate or ratio than the rest. For example, SEOmoz could submit a Sitemap that lists all the URLs in the /blog/ directory, and one that lists all the URLs in the /ugc/ directory. Then you can compare the data that Google Webmaster Tools gives you for each directory. What data? Well... most importantly, the type of data that uncovers architectural issues: Crawl Errors. While researching SERPs may uncover some architectural issues, it still doesn't provide any information about WHY certain pages are not indexed. That's where a Sitemap has a distinct advantage. Notice this excerpt from Google's Crawl Errors page (see previous link):

"The Web crawl errors page provides details about the URLs in your site that we tried to crawl but could not access... We provide statistics on two types of URLs:
* URLs contained in your Sitemap
* URLs found through our regular web crawl"

I agree with you that being careless with Sitemaps could lead to negative effects, but if we assume that the Sitemap data has been generated and tweaked appropriately, I don't see any reason why we would neglect this potentially useful feature.

(BTW... I wrote this comment fairly quickly, so if I came across as arrogant or overly critical of other ideas, then I apologize. I just wanted to flesh out as much info as possible, and I didn't take the time to analyze how I might sound to other people. For the record... I love these kinds of discussions, and I think this was an awesome post! Keep 'em coming, Duncan!)
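To illustrate the per-section approach Darren describes, here is a minimal sketch of a sitemap index file that points at one sitemap per directory. The filenames and dates are hypothetical; the index format is defined by the sitemaps.org protocol, and you can also simply submit each file individually in Webmaster Tools.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://www.example.com/blog-sitemap.xml</loc>
    <lastmod>2007-06-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://www.example.com/ugc-sitemap.xml</loc>
    <lastmod>2007-06-20</lastmod>
  </sitemap>
</sitemapindex>

Either way, the per-sitemap reporting is what lets you compare crawl errors section by section, as described above.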
Darren - Excellent comment on a great article. And yes, what difference does it make whether your pages get indexed because of great architecture or because of a sitemap?
And yes again, GWT's ability to point out sitemap errors is a real aid to less-than-experts like myself, who sometimes do boneheaded things like redirect something twice - it will still display fine, but Googlebot chokes on it. Without the sitemap / webmaster tools I would still be scratching my head (or something) about that.
My completely unqualified opinion would be that if you're an SEO Jedi Master, and have a good reason not to use one - great. Otherwise the benefits seem pretty persuasive.
Darren, Very well said, even without being edited. Good reminder that we can and should have multiple sitemaps for large sites, something I didn't know until SMX Adv.
Duncan, Really appreciate the technical posts and answering questions and bringing up issues we need to understand. Thumbs!
I sat back and watched the discussion but wanted to jump in and ponder the scenarios where pages get indexed for reasons unrelated to a sitemap or architecture issues. I had a new site that NO ONE knew about, but Google found it via my toolbar. Also, if you link to a page, those nosey SE spiders will follow that link. Another mistake: I had a dev site within a password-protected directory, BUT I forgot about disallowing the directory in my robots.txt file. So, by mistake, two sites were indexed due to my oversight.
On another note...good time to re-read Dr_Pete's article:
Thanks Darren, I think you raise some very valuable points.
As you may have spotted from previous posts I'm a huge advocate of sorting out the site architecture, as I think this is the number one issue with most larger sites. I don't think I worded my argument very well, though I'm also willing to believe I'm completely wrong!
I think my argument is that getting a page indexed by including it in a sitemap hides the fact that it doesn't have enough trust and is unlikely to rank for anything. In the majority of cases you would rather have the page indexed than not, but by having it indexed through a sitemap it becomes harder (not impossible by any means) to spot that the page isn't sending you any traffic because it isn't ranking for anything.
The cases where this becomes an issue are on large sites where a page is perhaps 5 or 6 clicks from the homepage and then on page 20 of a paginated list. In this scenario the spider could reach the page and could index it, but personally I like the fact that I can see it decided not to.
Once you know that it could index the page but decided not to, you can do something about it. One very valid option at this point is to decide your architecture is as good as you need it to be, and so adding a sitemap at this stage might be your best option.
Keep the comments coming, I'm more than happy to be proved wrong. It's happened numerous times before and it will happen again, so certainly don't worry about disagreeing with me (or anyone else here).
"Once you know that it could index the page but decided not to, you can do something about it." This is where you lose me. I don't understand how you would know that Google found the page naturally but didn't index it. What if Google just hasn't found the page yet? What I'm saying is that the only way to know for sure that Google is aware of a page but won't index it is if you submit the URL in a Sitemap.
You can check your logs for a Googlebot UA (user agent) having accessed your site. When Googlebot visits, check Google and see if you're listed. You'd know whether Googlebot had seen the page or not.
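For anyone who wants to try this, here is a rough PHP sketch of scanning an Apache-style access log for Googlebot requests to a particular page. The log path, log format, and URL are assumptions for illustration only. Also note that the user-agent string can be spoofed, so a reverse DNS lookup is the reliable way to confirm a hit really came from Googlebot.

<?php
// Rough sketch: find access-log lines where Googlebot requested a given page.
// Path, log format, and the page URL below are all hypothetical examples.
$page = '/some/deep/page.html';
$log  = '/var/log/apache2/access.log';

foreach (file($log) as $line) {
    if (strpos($line, 'Googlebot') !== false && strpos($line, $page) !== false) {
        echo $line;   // each match is one visit by (something claiming to be) Googlebot
    }
}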
Very much agree with Darren and others. I feel like a few other tools are missing that should be considered.
Creating multiple sitemaps (an XML sitemap as well as a user-facing HTML sitemap) and then reviewing analytics data should be incorporated into frequent monitoring of site architecture and indexed pages. For smaller or newer sites, I do believe XML sitemaps are necessary: they help the search engines know that you exist, and while there is no guarantee the pages will get indexed, it does help when they are submitted regularly. Additionally, it helps the diagnosis process and identifies pages with higher priority or with errors, so that you are a few steps ahead in the game. An XML sitemap is just another tool in the toolbox, and while it isn't a universal tool, it shouldn't be discarded.
One more thing: generating an XML sitemap and then checking the file before I submit it helps me find unknown problems too.
Didn't Eric address these issues as well and come to the conclusion that you should create a custom sitemap that sort of replicates a siloing function? I.e. list a few key structural pages, like the home and category-level pages, then list third-tier pages, skipping over the second tier, thus giving more weight to the deeper pages, but without using the priority tag.
Can you please provide the link to this blog post by Eric?
Thank you.
Pritam.
Yeah... I think it's this one: https://www.seomoz.org/blog/whiteboard-friday-link-building-tactics-from-white-to-black
But vimeo is down at the moment, so you'll have to wait to verify.
Thanks a lot.
Nice post, Duncan!
I'm definitely with you on the site architecture ramifications of submitting. None of my clients' sites is large enough that I feel I need an XML sitemap & I'd much rather let the engines sort out the site architecture for themselves and tell ME what the most important pages are. I can learn a lot more from that strategy.
Honestly, the advantages of an XML sitemap far outweigh the potential disadvantages in my mind. Rand is, as always, correct when he says it could obfuscate link architecture problems. Then again, what sort of architecture are you using if you doubt that your pages are crawlable? If you're using best practices in your links and menus, this should rarely be a problem.
Also, if your site's already out there and your competitor has an analyst smart enough to plumb your XML sitemap for intelligence, you can bet that he won't need it to pick you apart.
Likewise, any site with enough content to require an auto-generated sitemap is likely running on some sort of CMS, which makes generating one an easy process for any competent developer.
Yes, I agree with you, and I use XML sitemaps for the sites I maintain.
You said,
In case any fellow readers use Drupal as their CMS, there is a great module, XML Sitemap, which creates the sitemaps for you automatically from the database.
Similarly, there is Dynamic gSitemap for Joomla and Google XML Sitemaps for WordPress. (I exclusively use Drupal for my CMS purposes and haven't tried the above-listed modules for Joomla and WordPress, so I can't vouch for them. Just added them to the comment for the sake of completeness.)
Joomla CMS - XMAP is the best sitemap component; it supports both Joomla 1.0 and 1.5 natively. Dynamic gSitemap is out of date and didn't work very well. I really like Google XML Sitemaps for WordPress: it auto-updates at each post and notifies the search engines of the update. Both allow you to completely customize your sitemaps.
So how often should an XML sitemap be updated if you are NOT running a CMS capable of dynamic updating?
That depends on what has changed. For example: If you add a bunch of new pages to your site, update the Sitemap accordingly. If you only modified a single page of content, don't worry about updating the Sitemap to reflect the new "last modified" date. That would be a waste of time.
The principal advantage is to notify Google that your site has been updated...
If your site has been updated, then Google will send its robots to index it more quickly.
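For what it's worth, besides resubmitting in Webmaster Tools, one way to tell Google that an updated sitemap is ready to be fetched again is the sitemap ping URL, swapping in your own sitemap's address for the example one:

http://www.google.com/ping?sitemap=http://www.example.com/sitemap.xml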
Very informative post.
Thanks to Duncan for bringing up this issue, and a lot of thanks to Darren for throwing so much light on it.
I have always been in favour of XML sitemaps, but being new in this field I would love to hear from experts.
Very nice article. I was searching on Google about this topic but unfortunately did not get the answer I wanted; now I have cleared my mind through moz.com. Thanks for it.
After registering a client's site with Google Webmaster Tools and adding the sitemap, we lost a significant part of our organic traffic due to some old gateway pages that were not included in the sitemap. It seems that they were soon de-indexed by Google.
I have since deleted the sitemap for the site.
Thanks Duncan, this is a great post for a newbie like me.
Great post. Sitemaps have always been a point of interest to me as my site has over 4 million posts and I want to index as many as I can.
I used to work for a very large, dynamic site with constantly changing content coming from a variety of sources. My boss (our director of SEO) wouldn't implement XML sitemaps because he had a bad experience at his previous employer--something about XML sitemaps not being an ideal solution for huge, enterprise-level sites and actually causing a lot of pages to drop *out* of the index. I'm not actually sure I believe him, but he had a very specific, very bad experience so I couldn't exactly argue with it. Has anyone else had similar experiences with massive, dynamically generated sites and XML sitemaps?
Yeah...I'm dealing with that now.
I just took over managing the SEO for a site that has around 400,000 URLs and had dismal SEO already (1% organic traffic). We've seen our traffic actually drop since I submitted the gigantic sitemap. This could be partly due to other changes around the site, but I don't think it helped us at all.
FYI . . . The priority field is ignored.
Wow, they ignore the field!
Why even give us a field if they ignore it!
Anyway, as time goes on I tend to neglect sitemaps completely... I never found them to do a great deal, and the irony is I was deleting the priority field anyway!
Brent... not exactly: "World's Greatest Low Priority Page". BTW... thanks for making me feel like my site has a purpose after all.
Nice, a warning system about a feature that isn't used.
Trust me . . . I seldom state something so flatly . . . Google doesn't use this priority field. At least they don't use it to determine a page's priority.
Brent
" FYI . . . The priority field is ignored."
Oh good. I'll update my sitemap-generation code to:
$urls = array('https://www.example.com/', 'https://www.example.com/about'); // whatever pages you have

foreach ($urls as $url) {
    $priority = rand(0, 10) / 10; // a perfectly meaningless random priority between 0.0 and 1.0
    echo <<<XML
<url><loc>$url</loc><priority>$priority</priority></url>

XML;
}
BTW.. for those who don't read PHP, this bad techy joke has nothing to do with the CEO of SEOmoz.
Just a quick note to say that Will, Duncan and I are all off on Will's stag do for the weekend so won't be around to comment much. Hopefully Duncan and I will bring Will back in a fit state to comment next week :-)
This is nice information about the XML tool, but I have one query regarding it. Kindly review this XML file: https://www.viteb.com/sitemap.xml
Will it work the same as the other XML format?
I did an experiment for this site and it is working fine for me, but I would like to hear from other experts.
This is a great post and super for beginners through advanced users. Thanks for taking the time to put this together :-)
Isn't Google's goal to completely automate rankings without human intervention? If so, wouldn't the notion that Google gives a crap about human-generated sitemaps fly directly in the face of their stated goal/mission?
Is that a serious question?