UPDATE: There's now an even better way to tackle the issue I described in this post. I recommend you turn your attention to my newer post on the topic: Creating Crawlable, Link-Friendly AJAX Websites Using pushState().
I'll leave this post here for nostalgia!
This post begins with a particular dilemma that SEOs have often faced:
- websites that use AJAX to load content into the page can be much quicker and provide a better user experience
- BUT: these websites can be difficult (or impossible) for Google to crawl, and using AJAX can damage the site's SEO.
Fortunately, Google has made a proposal for how webmasters can get the best of both worlds. I'll provide links to Google documentation later in this post, but it boils down to some relatively simple concepts.
Although Google made this proposal a year ago, I don't feel it's attracted a great deal of attention - even though it ought to be particularly useful for SEOs. This post is aimed at people who've not explored Google's AJAX crawling proposal yet - I'll try to keep it short, and not too technical!
I'll explain the concepts and show you a famous site where they're already in action. I've also set up my own demo, which includes code that you can download and look at.
The Basics
Essentially, sites following this proposal are required to make two versions of their content available:
- Content for JS-enabled users, at an 'AJAX style' URL
- Content for the search engines, at a static 'traditional' URL - Google refers to this as an 'HTML snapshot'
Historically, developers have made use of the 'named anchor' part of URLs on AJAX-powered websites (this is the 'hash' symbol, #, and the text following it). For example, take a look at this demo - clicking menu items changes the named anchor and loads the content into the page on the fly. It's great for users, but search engine spiders can't deal with it.
Rather than using a hash, #, the new proposal requires using a hash and an exclamation point: #!
The #! combination has occasionally been called a 'hashbang' by people geekier than me; I like the sound of that term, so I'm going to stick with it.
Hashbang Wallop: The AJAX Crawling Protocol
As soon as you use the hashbang in a URL, Google will spot that you're following their protocol, and interpret your URLs in a special way - they'll take everything after the hashbang, and pass it to the site as a URL parameter instead. The name they use for the parameter is: _escaped_fragment_
Google will then rewrite the URL, and request content from that static page. To show what the rewritten URLs look like, here are some examples:
- www.demo.com/#!seattle/hotels becomes www.demo.com/?_escaped_fragment_=seattle/hotels
- www.demo.com/users#!name=rob becomes www.demo.com/users?_escaped_fragment_=name=rob
As long as you can get the static page (the URL on the right in these examples) to display the same content that a user would see (at the left-hand URL), then it works just as planned.
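To make this concrete, here's a minimal sketch of how a PHP site might branch on that parameter; render_snapshot() and render_app_shell() are hypothetical helpers standing in for however your site actually builds its pages:

```php
<?php
// Minimal sketch (not the downloadable demo code): serve a plain-HTML
// snapshot whenever the crawler requests the _escaped_fragment_ version,
// and the normal Javascript-driven page otherwise.
// render_snapshot() and render_app_shell() are hypothetical helpers.
if (isset($_GET['_escaped_fragment_'])) {
    $fragment = $_GET['_escaped_fragment_']; // e.g. 'seattle/hotels'
    echo render_snapshot($fragment);         // static HTML, no JS required
} else {
    echo render_app_shell();                 // normal page; JS reads location.hash
}
```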
Two Suggestions about Static URLs
For now, it seems that Google is returning static URLs in its index - this makes sense, since they don't want to damage a non-JS user's experience by sending them to a page that requires Javascript. For that reason, sites may want to add some Javascript that will detect JS-enabled users, and take them to the 'enhanced' AJAX version of the page they've landed on.
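As a sketch of that detection, a static snapshot page could emit a small script like this (the fragment variable is hypothetical - in practice it would be derived from the static URL the visitor landed on):

```php
<?php
// Sketch: a static snapshot page sends Javascript-capable visitors on to
// the equivalent AJAX URL. Visitors without Javascript never run the
// script, so the static page still works perfectly for them.
$fragment = 'seattle/hotels'; // hypothetical: derived from the current static URL
?>
<script type="text/javascript">
    window.location.replace('/#!<?php echo $fragment; ?>');
</script>
```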
In addition, you probably don't want your indexed URLs to show up in the SERPs with the '_escaped_fragment_' parameter in them. This can easily be avoided by having your 'static version' pages at more attractive URLs, and using 301 redirects to guide the spiders from the _escaped_fragment_ version to the more attractive example.
E.G.: In my first example above, the site may choose to implement a 301 redirect from
www.demo.com/?_escaped_fragment_=seattle/hotels to www.demo.com/directory/seattle/hotels
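In PHP, that redirect might look something like this (the /directory/ path scheme is just my example, not part of Google's proposal):

```php
<?php
// Sketch: permanently redirect the ugly _escaped_fragment_ URL to the
// attractive static URL, so the clean version is what gets indexed.
if (isset($_GET['_escaped_fragment_'])) {
    $fragment = $_GET['_escaped_fragment_']; // e.g. 'seattle/hotels'
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: /directory/' . $fragment);
    exit;
}
```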
A Live Example
Fortunately for us, there's a great demonstration of this proposal already in place on a pretty big website: the new version of Twitter.
If you're a Twitter user, logged-in, and have Javascript, you'll be able to see my profile here:
However, Googlebot will recognize that as a URL in the new format, and will instead request this URL:
Sensibly, Twitter want to maintain backward compatibility (and not have their indexed URLs look like junk) so they 301 redirect that URL to:
(And if you're a logged-in Twitter user, that last URL will actually redirect you back to the first one.)
Another Example, With Freely Downloadable Code
I've set up a demo of these practices in action, over at: www.gingerhost.com/ajax-demo
Feel free to have a play and see how that page behaves. If you'd like to see how it's implemented from a 'backend' perspective, hit the download link on that page to grab the PHP code I used. (N.B.: I'm not a developer; if anyone spots any glaring errors, please feel free to let me know so I can correct them!)
More Examples, Further Reading
The Google Web Toolkit showcase adheres to this proposal; experimenting with removing the hashbang is left as an exercise for the reader.
The best place to begin further reading on this topic is definitely Google's own help pages. They give information about how sites should work to fit with this proposal, and have some interesting implementation advice, such as using server-side DOM manipulation to create the snapshot (though I think their focus on this 'headless browser' may well have put people off implementing this sooner).
Google's Webmaster Central blog has the official announcement of this, and John Mueller invited discussion in the WMC Forums.
Between Google's blog, forum and help pages, you should find everything you need to turn your fancy AJAX sites into something that Google can love, as well as your users. Have fun!
Great explanation using the Twitter example, Rob. I still think AJAX crawling is flawed, as the bot and the browser are treated differently. I wrote a YOUmoz post back in May with some of my thoughts: https://www.seomoz.org/ugc/exploring-googles-ajax-crawling
Thanks for the explanation regarding the hashbang. Now I can look smart if somebody asks about it. :)
haha I was thinking the same thing....
From a usability standpoint, developers should be building content that is accessible to both AJAX and non-JavaScript-enabled users. It does add to development time, but it gives you the ability to enhance the website for those who are running with JavaScript turned on.
The only time you should lean towards having things run in AJAX only is when you control the users - such as when you're building a control panel for staff at a business, where you can say "you must have JavaScript turned on".
I think it's a good idea in essence on Google's part, though.
Good post, but I still feel that AJAX and SEO are not quite bedfellows yet.
Advocating anything to do with content in AJAX from an SEO perspective is reckless if you don't also point out that this is still a bad practice. Just because Google can now supposedly index content inside AJAX does not mitigate, in the least, the several flaws with it. Let's separate out the pure usability issues and focus exclusively on the SEO aspects.
Google Only
First, just because Google can, does not mean any other search engines can. So right away, you're excluding all those people who come from other engines.
Topic Optimization Limits
Most AJAX content is presented on a page that has multiple AJAX links, typically tabs. You can NOT properly optimize a page for all the varying content displayed inside AJAX tabs. The topical focus of that page becomes trash.
Designer/Developer Free-For-All
As soon as you tell designers or developers that Google can index AJAX via this new method, they go completely off the deep end and move what should be mission-critical content into AJAX displays, thus further killing SEO.
The Bottom Line
The bottom line, if you care about proper SEO best practices, is to NOT use the Google protocol for indexing AJAX content.
Well, if FB and Twitter are adopting it, then it can't be too reckless.
My understanding was that no matter what you do the other search engines won't pick up AJAX, so any search engine creating strategies to crawl & index AJAX is a breakthrough. I will be the first to admit I am not a developer so my understanding may be elementary at best.
Actually, if Twitter and FB choose to adopt it, it just means they can. They don't care about real SEO - neither site ever has. And in fact, because they do this, people who don't know better then think exactly what you just stated. So it proves it's reckless.
Good point! They are so big they really do not need to be excruciatingly selective on SEO methods. I suppose it's much more about usability and getting UGC indexed wherever possible.
After a long time I've got to use SEO for AJAX, and this post helped me a lot to explain it to the client and win the project. Thanks for the awesome post.
AJAX and SEO were a big problem a few years ago! We had doubts that it would work with SEO on our shop site (main keyword is "tepisi"): https://www.aloser.rs - but Google crawls it well. We will continue using AJAX with SEO in mind.
Very useful post, with great examples.
I thought I'd share my collection of posts related to the use of the # tag and AJAX - some of them are also from SEOmoz:
https://goo.gl/8kzL3
https://goo.gl/14XTE
I have created a B2B membership community site and use the Google Maps API, which contains our membership data in AJAX. My developer is creating a sitemap and has just created HTML pages to contain the same data so that it can be crawled (these will be dynamically generated with membership registration). Now he is using ISAPI Rewrite to make the URLs friendly, and then he will make the redirects work so the crawling can begin. Is this the correct approach? Secondly, when a visitor clicks to view a member's data, there is an 'on' command in the map view which contains the member content, and I want this to count as a pageview - but no one will agree that this can be accomplished. I need a believer to explain how this could be done!
Hi. After reading your post I thought I had found the solution I needed. I ran a little test on a site that simulates Googlebot's view. Unfortunately, it seems that this method doesn't work, if that simulation is trustworthy. The site I tested it on is www.smart-it-consulting [dot] com/internet/google/googlebot-spoofer/index.htm
I hope someone can confirm that I'm wrong. Thanks
Hi,
I am using advanced AJAX on my website, and only the # tag is used, not the #!.
So should I use #! in order to make the pages crawlable, or is it fine with # only?
Also, some pages do not change URL even after a user clicks - they run AJAX, but the URL remains the same.
So should I modify them so that the URL changes on every click?
Please reply ASAP
Hello, I have a doubt after reading this post and testing it. The demo shown in this post shows how to present content to users (#!) and how to present content to search engines (_escaped_fragment_); however, in both situations the meta tags remain the same. How can we solve this problem, so that search engine results display different titles? Thank you for your help!
Hi,
I have this site written in AJAX. Recently we started to use the hashbang so Google can crawl it. I'd like to ask you if my developer did it right. Sorry for the URLs, but I had to use a live example. We can get rid of them later.
1. Main page
https://dyskonti.pl is visible for G. at https://dyskonti.pl/#! (and is being indexed correctly), but at https://dyskonti.pl/#! I can see different things than on the pure https://dyskonti.pl - isn't it some kind of cloaking?
2. Categories
https://dyskonti.pl/#!dom-51 should be visible in the SERPs as https://dyskonti.pl/dom-51 (with its own title and description - URLs similar to Twitter's).
So I go to https://dyskonti.pl/dom-51 and it redirects me to https://dyskonti.pl/#!dom-51, and everything seems to be correct. In the SERPs the URL is also OK - https://dyskonti.pl/dom-51. But when I check the headers for https://dyskonti.pl/dom-51, it says 302. Shouldn't it rather be 301? Inside the #! version of dom-51, G. can see a canonical tag pointing to https://dyskonti.pl/dom-51. We also use that clean URL in sitemap.xml.
Wow, it's complicated.
In the G. SERPs it looks like everything is OK, but I am concerned whether we did all we could to improve SEO.
Thank you Rob, finally I found the problem solver. I'm not an expert in programming - what about a URL that has more than one parameter? E.g. index.php?animal=cat&fruit=banana, where I want to split the values into different div elements.
Thank you
Wadebatyu,
First, you should understand that AJAX crawling is only needed if you have dynamic parts of the page that are not represented in the basic HTML loaded at the URL.
If, for example, you have buttons on your site that let users choose 'cat' and 'banana', and the content changes without a round trip to the server (meaning no reloading of the page), it means you have AJAX content on your site.
In that case the URL a user sees on the page will change to something like index.php#!animal=cat once the user clicks the cat button, and to index.php#!animal=cat&fruit=banana after the banana button.
This link should be included in the actual HTML of the page, and the bot will get to your site with this URL: index.php?_escaped_fragment_=animal=cat&fruit=banana
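A minimal PHP sketch of how the server might split those values apart (variable names are just for illustration):

```php
<?php
// Sketch: split a multi-parameter fragment server-side. PHP URL-decodes
// $_GET values automatically, so parse_str() can separate the pairs.
if (isset($_GET['_escaped_fragment_'])) {
    parse_str($_GET['_escaped_fragment_'], $params);
    $animal = isset($params['animal']) ? $params['animal'] : null; // 'cat'
    $fruit  = isset($params['fruit'])  ? $params['fruit']  : null; // 'banana'
    // ...render each value into its own div in the HTML snapshot...
}
```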
BTW - Google say they will not include the _escaped_fragment_ URL in the search results; the clean URL with the #! will be used, so I think there is no need for redirects.
I kind of like how the hashbang ("#!") is called "shebang" in Unix. It lends itself to Ricky Martin or William Hung jokes.
It doesn't look like Facebook's #! implementation is actually serving anything for the _escaped_fragment_? The Twitter 301 is nice - I always wondered whether doing that or a rel canonical would be better for this stuff? It seems like either way you're leaking some Google juice?
Thanks for the post!
I was actually digging into some tactics for tracking AJAX content, basically required for tracking a form process. I then implemented virtual pageviews, which allow you to see each AJAX sub-page's performance.
This also involves 301 redirects, so I guess it can also be used for goal tracking?
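For reference, a virtual pageview for an AJAX sub-page can be fired like this - a sketch using the classic Google Analytics async syntax (the virtual path scheme is just an example):

```php
<?php /* Sketch: fire a Google Analytics virtual pageview whenever AJAX
         swaps new content in, using the classic async _gaq queue. */ ?>
<script type="text/javascript">
    function trackAjaxPage(fragment) {
        // e.g. fragment = 'seattle/hotels' -> virtual path '/#!seattle/hotels'
        _gaq.push(['_trackPageview', '/#!' + fragment]);
    }
</script>
```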
Thanks, I was wondering what the hashbang was for...
We tried to get AJAX content indexed when I was at a former company. We thought we had it cracked, but it turned out that only some of the content was being indexed, and not in any pattern we could tell.
Wish we'd had this article back then!
Seeing as how Google has been promoting Ajax for some time, I am glad to see that they have developed a way to index the content. I can't believe I didn't see this a year ago.
Here you have a similar post in Spanish https://www.bitacoradewebmaster.com/2010/01/03/propuesta-seo-ajax-google/
This is awesome, and actually answers a lot of questions I've seen surrounding the new #! urls on Twitter. Nice job Rob.
Superb post. I have seen the hashbang implemented on lots of websites, but I had not noticed this one, and also did not think of this. Thanks for sharing.
I was thinking of just catching either the IP or the browser of the common search engines, and creating a special page that would display, at the same URL, all the content that normal visitors access through AJAX. The article will be broken into sub-sections accessed through AJAX, with little to no scrolling. The website is Right2Say, and I plan to actively PR it starting December 2010. Would it be smart to use PHP logic to direct Google and the like to a page showing all the content at the same URL?
Nice article! Thank you,
Tim
I think the guys over at the Internet Marketing podcast covered this a week or two back. The way I understood it, the page could still be indexed even without the "hashbang", but would strictly be "as is." With the '#!' you are able to tell Google you want it indexed, correct?
Pretty funny how Google FAQ addresses this.
Question: What if my app doesn't use hash fragments? Maybe it should!
=D
So here's a question: what if your content is completely crawlable, but you are using AJAX for seamless page transitions (i.e. the page doesn't refresh)? So the site is using the hash, #, to allow users to use the back button and to bookmark content, but really only part of the page is in AJAX. However, our URLs in the browser bar are the hash values, https://mysite.com/#/custom-url-string. But we also have the SEO-friendly URL we get link credit for, https://mysite.com/custom-url-string.aspx; if we start using the hashbang, https://mysite.com/#!/custom-url-string, we are then going to have a split in link efficacy.
Why can't Google/Yahoo/Bing just give credit to hashed URL values?