Crawling and indexing have been hot topics over the last few years. As soon as Google launched Google Panda, people rushed to their server logs and crawling stats and began fixing their index bloat. None of these problems existed in the "SEO = backlinks" era of a few years ago. With the exponential growth of technical SEO, we need to get more and more technical ourselves. That said, we still don't know exactly how Google crawls our websites, and many SEOs still can't tell the difference between crawling and indexing.
The biggest problem, though, is that when we want to troubleshoot indexing issues, the only tools in our arsenal are Google Search Console and its Fetch and Render tool. Once your website includes more than HTML and CSS, there's a lot of guesswork about how your content will be indexed by Google. This approach is risky, expensive, and can fail multiple times. Even when you discover the pieces of your website that weren't indexed properly, it's extremely difficult to get to the bottom of the problem and find the fragments of code responsible.
Fortunately, this is about to change. Recently, Ilya Grigorik from Google shared one of the most valuable insights into how crawlers work:
Interestingly, this tweet didn’t get nearly as much attention as I would expect.
So what does Ilya’s revelation in this tweet mean for SEOs?
Knowing that Chrome 41 is the technology behind the Web Rendering Service (WRS) is a game-changer. Before this announcement, our only option was to use Fetch and Render in Google Search Console to see our pages as the WRS renders them. Now we can troubleshoot technical problems that would otherwise have required experiments and staging environments: all you need to do is download and install Chrome 41 to see how your website loads in that browser. That's it.
You can check the features and capabilities that Chrome 41 supports by visiting Caniuse.com or Chromestatus.com (Googlebot should support similar features). These two websites make a developer’s life much easier.
Even though we don't know exactly which version Ilya had in mind, we can find the Chrome version used by the WRS by looking at our server logs. It's Chrome 41.0.2272.118.
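If you want to verify this on your own site, here's a minimal sketch for digging the version out of your logs. It assumes a typical access log path and that the relevant requests carry a "Chrome/" token somewhere in a Google user-agent string; both are assumptions you'll need to adapt to your own setup.

```javascript
// Hypothetical sketch: scan an access log for Google requests and
// extract the Chrome version token from the user-agent string.
// Log path and UA format are assumptions; adjust them to your server.
const fs = require('fs');
const readline = require('readline');

const rl = readline.createInterface({
  input: fs.createReadStream('/var/log/nginx/access.log'),
});

rl.on('line', (line) => {
  const match = line.match(/Google[^"]*Chrome\/(\d+(?:\.\d+)+)/);
  if (match) {
    console.log(match[1]); // expect something like 41.0.2272.118
  }
});
```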
It will be updated sometime in the future
Chrome 41 was released two years ago (in 2015), so it's far behind the current version of the browser. However, as Ilya Grigorik said, an update is coming:
I was lucky enough to get Ilya Grigorik to read this article before it was published, and he provided a ton of valuable feedback on this topic. He mentioned that they are hoping to have the WRS updated by 2018. Fingers crossed!
Google uses Chrome 41 for rendering. What does that mean?
We now have some interesting information about how Google renders websites. But what does that mean, practically, for site developers and their clients? Does this mean we can now ignore server-side rendering and deploy client-rendered, JavaScript-rich websites?
Not so fast. Here is what Ilya Grigorik had to say in response to this question:
We now know the WRS's capabilities for rendering JavaScript and how to debug them. However, remember that not all crawlers support JavaScript crawling. As of today, JavaScript crawling is only supported by Google and Ask (and Ask is most likely powered by Google). Even if you don't care about social media or search engines other than Google, keep in mind that even with Chrome 41, not all JavaScript frameworks can be indexed by Google (read more about JavaScript frameworks crawling and indexing). Still, knowing the rendering engine lets us troubleshoot and diagnose these problems far better.
For example, we ran into a problem indexing content generated by Polymer. Ilya Grigorik provided insight on how to deal with such issues in our experiment (below). We used this feedback to make https://jsseo.expert/polymer/ indexable; it now works fine in Chrome 41 and indexes properly.
"If you look at the raised Javascript error under the hood, the test page is throwing an error due to unsupported (in M41) ES6 syntax. You can test this yourself in M41, or use the debug snippet we provided in the blog post to log the error into the DOM to see it."
I believe this is another powerful technique for web developers who want to make their JavaScript websites indexable.
If you want to see a live example, open https://jsseo.expert/angular2-bug/ in Chrome 41 and use the Chrome Developer Tools to play with JavaScript troubleshooting (screenshot below):
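As an aside, it's worth making the "unsupported ES6 syntax" failure mode concrete. The example below is my own illustration, not code from the test page: arrow functions arrived in Chrome well after version 41 (per caniuse.com), so a single one is enough to make M41 throw a SyntaxError and abandon the entire script.

```javascript
// Assume `items` is an array of objects with a `title` field.

// Breaks in Chrome 41: the arrow function is a SyntaxError, so the
// whole script fails to parse and nothing in it runs.
var titles = items.map(item => item.title);

// ES5 equivalent (what a transpiler like Babel would emit), which
// Chrome 41 parses without complaint:
var titles = items.map(function (item) {
  return item.title;
});
```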
Fetch and Render is the Chrome 41 preview
There's another interesting thing about Chrome 41. Google Search Console's Fetch and Render tool is simply the Chrome 41 preview. The right-hand view ("This is how a visitor to your website would have seen the page") is generated by the Google Search Console bot, which is... Chrome 41.0.2272.118 (see screenshot below).
There's evidence that both Googlebot and the Google Search Console bot render pages using Chrome 41, but we don't know exactly how the two differ. One noticeable difference is that the Google Search Console bot doesn't respect robots.txt. There may be more, but for the time being we're not able to point them out.
Chrome 41 vs Fetch as Google: A word of caution
Chrome 41 is a great tool for debugging Googlebot. However, sometimes (though not often) Chrome 41 renders a page properly while the screenshots from Fetch and Render suggest that Google can't handle it. This can be caused by CSS animations and transitions, Googlebot timeouts, or the use of features Googlebot doesn't support. Let me show you an example.
Chrome 41 preview:
The above page has quite a lot of content and images, but it looks completely different in Google Search Console.
Google Search Console preview for the same URL:
As you can see, Google Search Console's preview of this URL is completely different from the previous screenshot (Chrome 41). All the content is gone; all we can see is the search bar.
From what we've noticed, Google Search Console renders CSS a little differently than Chrome 41 does. This doesn't happen often, but as with most tools, we need to double-check whenever possible.
This leads us to a question...
What features are supported by Googlebot and WRS?
According to the Rendering on Google Search guide (a quick feature-detection sketch follows this list):
- Googlebot doesn't support IndexedDB, WebSQL, and WebGL.
- HTTP cookies and local storage, as well as session storage, are cleared between page loads.
- All features requiring user permissions (like Notifications API, clipboard, push, device-info) are disabled.
- Google can’t index 3D and VR content.
- Googlebot only supports HTTP/1.1 crawling.
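Most of these limitations can be probed at runtime. Here's the promised sketch: a small, hedged script (my own, illustrative rather than exhaustive) that detects a few of the features above and writes the results into the DOM, so the Fetch and Render preview, or Chrome 41 itself, shows exactly what the rendering engine reports.

```html
<script>
  // Hedged sketch: report feature support into the DOM so it shows
  // up in rendered previews. These checks only test for the presence
  // of the APIs, which is enough for a quick diagnostic.
  var checks = [
    'indexedDB: ' + ('indexedDB' in window),
    'WebSQL: ' + ('openDatabase' in window),
    'WebGL: ' + !!document.createElement('canvas').getContext('webgl'),
    'localStorage: ' + ('localStorage' in window),
    'Notifications: ' + ('Notification' in window)
  ];
  var out = document.createElement('p');
  out.textContent = 'Feature report: ' + checks.join(', ');
  (document.body || document.documentElement).appendChild(out);
</script>
```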
The last point on the list is really interesting: despite statements from Google over the last two years, Googlebot still crawls only over HTTP/1.1.
No HTTP/2 support (still)
We've mostly been covering how Googlebot uses Chrome, but there's another recent discovery to keep in mind.
There is still no support for HTTP/2 for Googlebot.
Since it's now clear that Googlebot doesn't support HTTP/2, serving your website over HTTP/2 doesn't let you drop your HTTP/1.1 optimization: Googlebot can crawl only using HTTP/1.1.
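One practical consequence is that it's worth sanity-checking that your pages still respond correctly over plain HTTP/1.1, not only over a negotiated HTTP/2 connection. A minimal Node.js sketch (example.com is a placeholder; Node's built-in https client speaks HTTP/1.1, which makes it a convenient stand-in here):

```javascript
// Hedged sketch: fetch a page over HTTP/1.1 and confirm it responds.
// Node's built-in https client does not negotiate HTTP/2, so this
// exercises the same protocol Googlebot crawls with.
const https = require('https');

https.get('https://example.com/', (res) => {
  // Expect "HTTP/1.1 200" for a healthy, crawlable page.
  console.log('HTTP/' + res.httpVersion + ' ' + res.statusCode);
  res.resume(); // drain the response so the connection can close
});
```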
There were several announcements recently regarding Google’s HTTP/2 support. To read more about it, check out my HTTP/2 experiment here on the Moz Blog.
Googlebot’s future
Rumor has it that Chrome 59’s headless mode was created for Googlebot, or at least that it was discussed during the design process. It's hard to say if any of this chatter is true, but if it is, it means that to some extent, Googlebot will “see” the website in the same way as regular Internet users.
This would definitely make everything simpler for developers who wouldn’t have to worry about Googlebot’s ability to crawl even the most complex websites.
Chrome 41 vs. Googlebot’s crawling efficiency
Chrome 41 is a powerful tool for debugging JavaScript crawling and indexing. However, it's crucial not to jump on the hype train here and start launching websites that “pass the Chrome 41 test.”
Even if Googlebot can "see" our website, many other factors affect your site's crawling efficiency. For example, we already have proof that Googlebot can crawl and index JavaScript and many JavaScript frameworks. That doesn't mean JavaScript is great for SEO, though: I've gathered significant evidence showing that JavaScript pages aren't crawled even half as effectively as HTML-based pages.
In summary
Ilya Grigorik’s tweet sheds more light on how Google crawls pages and, thanks to that, we don’t have to build experiments for every feature we're testing — we can use Chrome 41 for debugging instead. This simple step will definitely save a lot of websites from indexing problems, like when Hulu.com’s JavaScript SEO backfired.
It's safe to assume that Chrome 41 will now be a part of every SEO’s toolset.
Comments

Since you can't download Chrome 41 anymore (no archives, sorry!), you can get any version of Chromium using the instructions here:
https://www.chromium.org/getting-involved/download...
That's the easiest route. Alternatively, you can get the version 41 sources and build it yourself, though that's nearly impossible for non-devs:
https://chromium.googlesource.com/chromium/src/+/m...
https://chromium.googlesource.com/chromium/src/+/m...
https://chromium.googlesource.com/chromium/src/+/m...
Building also requires a lot of time (a few hours depending on your processor, possibly even a day), at least 16 GB of RAM, and almost 100 GB of free disk space.
In the last three docs you'll find an interesting link: "Are you a Google employee? See go/building-chrome instead." The main difference between Chrome and Chromium is that Chrome ships with APIs that talk to Google services, plus branding and proprietary codecs. More information here:
https://en.wikipedia.org/wiki/Chromium_(web_browse...
Thanks to Grigorik and Goralewicz for bringing this topic to light with their experiments. Now we just have to adjust our plans and wait for what's next. The sooner Googlebot supports HTTP/2, the better.
Excellent insights into the world of Googlebot. I've noticed several things in the course of my work: certain CMSes tend to produce multiple URLs for the same page. This is especially true of quite a few e-commerce CMSes, where you can have the original page plus a URL with the category identifier appended for every category the product appears in.
There's also the case of a website's internal search pages getting indexed. I've observed traffic returning to its original level after de-indexing the internal search pages.
In WordPress there's the problem of category and tag pages getting indexed. There was a case in the Google webmaster forums of a WordPress website belonging to a sail manufacturer in the UK that lost rankings heavily and then magically regained them after correcting the technical flaws.
Our PWA renders fine in GSC Fetch and Render, but the same pages in Chrome 41 don't load all content, or render some of it incorrectly.
So it seems there's something else going on in Googlebot that isn't in Chrome 41.
Maximilian, this sounds interesting; can you share the URL? There are a few things that differ between Chrome 41 and GSC (e.g., Chrome 41 will work with HTTP/2 and GSC won't). We had a couple of examples where GSC differed from Chrome 41, and every time it was due to CSS issues. It would be great to have a look at the URL :)
Hi Bartosz,
Great post!
Your knowledge will truly help me understand a lot more about crawling issues. I'm not a dev here, only an SEO auditor, but now I'll be able to point to the right steps to solve many issues. Thanks.
After reading the article, my conclusion was "Geez, I've got to learn more about crawling..." :)
Thank you Bartosz for sharing the insights!
Great article. I've actually come across a few sites that showed as entirely blank after doing a fetch and render. YIKES!
The truth is, SEO is the most powerful way to grow our site or blog. We want to know all about SEO and get our articles onto the first page of Google. Please help us...
Interesting article. The truth is that technical SEO is increasingly important and we have to stay on top of the latest news.
Thank you!
Thanks for giving complete information about Googlebot and the Web Rendering Service in such detail.
After reading this article I understand that Googlebot only uses HTTP/1.1, not HTTP/2.
But I'm still not clear about Chrome 41 and 56. I do know that Googlebot reads IP addresses.
This is the kind of posts I love. Thanks for the detailed explanation, Bartosz!
That was really helpful! I started blog commenting a few days ago for my blog and it's going really well. But my concern is: will Google treat blog commenting as spam and penalize us?
Googlebot will support HTTP/2 eventually. I think website performance will improve through this and provide better rankings as well. Thanks for sharing a very technical article about search engine indexing.
Great information, though technically difficult. I think that whatever techniques Google uses to crawl and index, the first and most important thing to consider is writing content for users.
Wow! Google develops impressively rapidly!
Thank you for all the time and effort you put into this piece!
Nothing worse than creating an entire project only to have it be unreadable by Googlebot.
Thanks for your input. Next time I have to go deep under the hood, I'll know how to debug it best.
Nice insight directly from the inside into the "black box". Thanks, Bartosz. We should set up a bounty site for whoever can successfully prove the release and version of the next update (if it is Chrome 59). :)
Hello Bartosz
I'm not sure whether I will really spend lot of time over this new information, but it is still nice to be informed of these new updates.
Thank you for keeping us updated.
Good, detailed article with lots of information. It's quite technical, so non-IT readers may have some trouble understanding it.
Keep posting. Thanks, buddy.
Hi Bartosz
It would be interesting to know why Google doesn't crawl and index all my pages. My programming skills are limited and some ideas escape me.
Thank you very much for the information.
Hi Bartosz, thanks for bringing this to light, certainly interesting information and as you say, surprising not more has been made of it. Now Chrome 41 will be another browser to fill up the dock :)
I'm glad you've highlighted that it's not a be-all and end-all; so often we have to remind people that there's no such tool in SEO.
Always stick to best practices, especially with technical SEO, and if you're ever in doubt, go back to the start of this sentence.
Wow, this is a very high-value share, Bartosz. You can only get the best out of something when you fully understand it.
Thanks for sharing. I hope there will be more detailed posts about this issue; this discovery is a revolutionary change.
Thank you for sharing!