There are a handful of data sources relied upon by nearly every search engine optimizer. Google Search Console (formerly Google Webmaster Tools) has perhaps become the most ubiquitous. There are simply some things you can do with GSC, like disavowing links, that cannot be accomplished anywhere else, so we are in some ways forced to rely upon it. But, like all sources of knowledge, we must put it to the test to determine its trustworthiness — can we stake our craft on its recommendations? Let's see if we can pull back the curtain on GSC data and determine, once and for all, how skeptical we should be of the data it provides.
Testing data sources
Before we dive in, I think it is worth having a quick discussion about how we might address this problem. There are basically two concepts that I want to introduce for the sake of this analysis: internal validity and external validity.
Internal validity refers to whether the data accurately represents what Google knows about your site.
External validity refers to whether the data accurately represents the web.
These two concepts are extremely important for our discussion. Depending upon the problem we are addressing as SEOs, we may care more about one or another. For example, let's assume that page speed was an incredibly important ranking factor and we wanted to help a customer. We would likely be concerned with the internal validity of GSC's "time spent downloading a page" metric because, regardless of what happens to a real user, if Google thinks the page is slow, we will lose rankings. We would rely on this metric insofar as we were confident it represented what Google believes about the customer's site. On the other hand, if we are trying to prevent Google from finding bad links, we would be concerned about the external validity of the "links to your site" section because, while Google might already know about some bad links, we want to make sure there aren't any others that Google could stumble upon. Thus, depending on how well GSC's sample links comprehensively describe the links across the web, we might reject that metric and use a combination of other sources (like Open Site Explorer, Majestic, and Ahrefs) which will give us greater coverage.
The point of this exercise is simply to say that we can judge GSC's data from multiple perspectives, and it is important to tease these out so we know when it is reasonable to rely upon GSC.
GSC Section 1: HTML Improvements
Of the many useful features in GSC, Google provides a list of common HTML errors it discovered in the course of crawling your site. This section, located at Search Appearance > HTML Improvements, lists several potential issues, including Duplicate Titles, Duplicate Descriptions, and other actionable recommendations. Fortunately, this first example gives us an opportunity to outline methods for testing both the internal and external validity of the data. As you can see in the screenshot below, GSC has found duplicate meta descriptions because the website has case-insensitive URLs and no canonical tag or redirect to resolve them. Essentially, you can reach the page from either /Page.aspx or /page.aspx, and Googlebot has found the URL both with and without capitalization. Let's test Google's recommendation to see if it is externally and internally valid.
External Validity: In this case, external validity is simply whether the data accurately reflects the pages as they currently appear on the Internet. As one can imagine, the list of HTML improvements can be woefully out of date depending upon the crawl rate of your site. In this case, the site had already repaired the issue with a 301 redirect.
This really isn't terribly surprising. Google shouldn't be expected to update this section of GSC every time you apply a correction to your website. However, it does illustrate a common problem with GSC. Many of the issues GSC alerts you to may have already been fixed by you or your web developer. I don't think this is a fault with GSC by any stretch of the imagination, just a limitation that can only be addressed by more frequent, deliberate crawls like Moz Pro's Crawl Audit or a standalone tool like Screaming Frog.
Internal Validity: This is where things start to get interesting. While it is unsurprising that Google doesn't crawl your site so frequently as to capture updates to your site in real-time, it is reasonable to expect that what Google has crawled would be reflected accurately in GSC. This doesn't appear to be the case.
By executing an info: query in Google for the upper-case version of the URL (info:https://concerning-url), we can learn something about what Google knows about it. Google returns results for the lower-case version of the URL! This indicates that Google both knows about the 301 redirect correcting the problem and has applied it in its search index. As you can imagine, this presents us with quite a problem. The HTML Improvements recommendations in GSC may not only fail to reflect changes you have made to your site; they may not even reflect corrections Google is already aware of. Given this difference, it almost always makes sense to crawl your site for these types of issues in addition to using GSC.
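If you want to spot-check stale HTML Improvements entries yourself, a quick scripted request is enough to confirm whether the redirect fix is already in place, whatever the report still says. Below is a minimal sketch in Python using the requests library; the URLs are placeholders, not the site from the example above.

```python
# Minimal sketch: confirm that the mixed-case URL now 301s to its lower-case
# canonical form, even if the HTML Improvements report still flags it.
# The URL below is a placeholder.
import requests

def check_case_redirect(url):
    """Fetch a URL without following redirects; return its status code and Location header."""
    resp = requests.get(url, allow_redirects=False, timeout=10)
    return resp.status_code, resp.headers.get("Location")

status, target = check_case_redirect("https://www.example.com/Page.aspx")
if status in (301, 308) and target and target.lower().endswith("/page.aspx"):
    print("Permanent redirect in place; the GSC warning is likely stale.")
else:
    print(f"No permanent redirect found (status {status}, Location: {target}).")
```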
GSC Section 2: Index Status
The next metric we are going to tackle is Google's Index Status, which is supposed to provide you with an accurate number of pages Google has indexed from your site. This section is located at Google Index > Index Status. This particular metric can only be tested for internal validity since it is specifically providing us with information about Google itself. There are a couple of ways we could address this...
- We could compare the number provided in GSC to site: commands
- We could compare the number provided in GSC to the number of internal links to the homepage in the internal links section (assuming 1 link to homepage from every page on the site)
We opted for both. The biggest problem with this particular metric is being certain what it is measuring. Because GSC allows you to authorize the http, https, www, and non-www versions of your site independently, it can be unclear exactly what is included in the Index Status metric.
We found that when carefully applied to ensure no crossover of varying types (https vs http, www vs non-www), the Index Status metric seemed to be quite well correlated with the site:site.com query in Google, especially on smaller sites. The larger the site, the more fluctuation we saw in these numbers, but this could be accounted for by approximations performed by the site: command.
We found the link count method to be difficult to use, though. Consider the graphic above. The site in question has 1,587 pages indexed according to GSC, but the home page to that site has 7,080 internal links. This seems highly unrealistic, as we were unable to find a single page, much less the majority of pages, with 4 or more links back to the home page. However, given the consistency with the site: command and GSC's Index Status, I believe this is more of a problem with the way internal links are represented than with the Index Status metric.
I think it is safe to conclude that the Index Status metric is probably the most reliable one available to us in regards to the number of pages actually included in Google's index.
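If you want to run a similar sanity check on your own site, comparing the Index Status figure against an independent crawl of indexable URLs is a quick way to spot gross discrepancies. Here is a minimal sketch, assuming a Screaming Frog style CSV export; the column names and file path are assumptions you will need to adjust to your own tool.

```python
# Minimal sketch: compare GSC's Index Status count with the number of
# indexable HTML pages found by your own crawler. Column names are assumptions.
import csv

def indexable_page_count(crawl_csv):
    """Count crawled URLs that are HTML, return 200, and are not canonicalized elsewhere."""
    count = 0
    with open(crawl_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            is_html = "text/html" in row.get("Content Type", "")
            is_ok = row.get("Status Code") == "200"
            canonical = (row.get("Canonical Link Element 1") or "").strip()
            self_canonical = canonical in ("", row.get("Address", ""))
            if is_html and is_ok and self_canonical:
                count += 1
    return count

gsc_index_status = 1587  # the figure reported under Google Index > Index Status
crawled = indexable_page_count("crawl_export.csv")
print(f"GSC: {gsc_index_status} indexed, crawl: {crawled} indexable, "
      f"ratio: {gsc_index_status / crawled:.2f}")
```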
GSC Section 3: Internal Links
The Internal Links section, found under Search Traffic > Internal Links, seems to be rarely used, but it can be quite insightful. If external links tell Google what others think is important on your site, then internal links tell Google what you think is important on your site. This section once again serves as a useful example of knowing the difference between what Google believes about your site and what is actually true of your site.
Testing this metric was fairly straightforward. We took the internal links numbers provided by GSC and compared them to full site crawls. We could then determine whether Google's crawl was fairly representative of the actual site.
Generally speaking, the two were modestly correlated, with some fairly significant deviation. As an SEO, I find this incredibly important. Google does not start at your home page and crawl your site in the same way that your standard site crawlers do (like the one included in Moz Pro). Googlebot approaches your site via a combination of external links, internal links, sitemaps, redirects, etc., which can give a very different picture. In fact, we found several examples where a full site crawl unearthed hundreds of internal links that Googlebot had missed. Navigational pages, like category pages in the blog, were crawled less frequently, so certain pages didn't accumulate nearly as many links in GSC as one would expect from a traditional crawl alone.
As search marketers, in this case we must be concerned with internal validity, or what Google believes about our site. I highly recommend comparing Google's numbers to your own site crawl to determine if there is important content which Google determines you have ignored in your internal linking.
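A comparison like the one described above is easy to script. The sketch below correlates GSC's internal link counts with the inlink counts from your own crawl and flags pages your crawler sees as well linked but Google barely counts; the file paths and column names are assumptions, so adjust them to whatever your exports actually contain.

```python
# Minimal sketch: correlate GSC "Internal Links" counts with inlink counts
# from an independent crawl. File paths and column names are assumptions.
import csv
from scipy.stats import pearsonr

def load_counts(path, url_col, count_col):
    """Read a CSV export into a {url: count} dict."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[url_col]: int(row[count_col]) for row in csv.DictReader(f)}

gsc = load_counts("gsc_internal_links.csv", "Target pages", "Links")
crawl = load_counts("crawl_inlinks.csv", "Address", "Unique Inlinks")

shared = sorted(set(gsc) & set(crawl))
r, _ = pearsonr([gsc[u] for u in shared], [crawl[u] for u in shared])
print(f"Pearson r across {len(shared)} shared URLs: {r:.2f}")

# Pages the crawler sees as heavily linked but Google barely counts deserve a look.
gaps = [u for u in shared if crawl[u] >= 20 and gsc[u] <= 5]
print("Possible internal-linking gaps:", gaps[:10])
```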
GSC Section 4: Links to Your Site
Link data is always one of the most sought-after metrics in our industry, and rightly so. External links continue to be the strongest predictive factor for rankings and Google has admitted as much time and time again. So how does GSC's link data measure up?
In this analysis, we compared the links reported by GSC to those reported by Ahrefs, Majestic, and Moz, checking whether those links are still live. To be fair to GSC, which provides only a sampling of links, we only used sites with fewer than 1,000 total backlinks, increasing the likelihood that we would get a full picture (or at least close to it) from GSC. The results are startling. GSC's lists, both "sample links" and "latest links," were the lowest-performing in terms of live links for every site we tested, never once beating out Moz, Majestic, or Ahrefs.
I do want to be clear and upfront about Moz's performance in this particular test. Because Moz has a smaller total index, it is likely we only surface higher-quality, long-lasting links. Our out-performing Majestic and Ahrefs by just a couple of percentage points is likely a side effect of index size and not reflective of a substantial difference. However, the several percentage points which separate GSC from all 3 link indexes cannot be ignored. In terms of external validity — that is to say, how well this data reflects what is actually happening on the web — GSC is out-performed by third-party indexes.
But what about internal validity? Does GSC give us a fresh look at Google's actual backlink index? The two do appear to be consistent, insofar as GSC rarely reports links that Google already knows are no longer live. We randomly selected hundreds of URLs which were "no longer found" according to our test to determine whether Googlebot still had old versions cached and, uniformly, that was the case. While we can't be certain that GSC shows a complete picture of Google's link index for your site, we can be confident that it tends to show only results that are in accord with Google's latest data.
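For reference, the "still live" test itself is straightforward to reproduce: fetch each reported linking page and check whether an anchor pointing at your domain is still present. A minimal sketch follows; the domain and the sample URLs are placeholders, and a production version would also need user-agent headers, rate limiting, and nofollow checks.

```python
# Minimal sketch: check whether reported backlinks are still live.
# TARGET_DOMAIN and the sample source URLs are placeholders.
import requests
from bs4 import BeautifulSoup
from urllib.parse import urlparse

TARGET_DOMAIN = "example.com"

def link_is_live(source_url, target_domain=TARGET_DOMAIN):
    """Return True if source_url still contains an <a href> pointing at target_domain."""
    try:
        resp = requests.get(source_url, timeout=10)
    except requests.RequestException:
        return False
    if resp.status_code != 200:
        return False
    soup = BeautifulSoup(resp.text, "html.parser")
    return any(target_domain in urlparse(a["href"]).netloc
               for a in soup.find_all("a", href=True))

reported_links = ["https://some-blog.example/post", "https://another-site.example/page"]
live = sum(link_is_live(u) for u in reported_links)
print(f"{live}/{len(reported_links)} reported links are still live")
```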
GSC Section 5: Search Analytics
Search Analytics is probably the most important and heavily utilized feature within Google Search Console, as it gives us some insight into the data lost with Google's "Not Provided" updates to Google Analytics. Many have rightfully questioned the accuracy of the data, so we decided to take a closer look.
Experimental analysis
The Search Analytics section gave us a unique opportunity to use an experimental design to determine the reliability of the data. Unlike some of the other metrics we tested, we could control reality by delivering clicks under certain circumstances to individual pages on a site. We developed a study that worked something like this (a small sketch of how the test pages might be generated follows the list):
- Create a series of nonsensical text pages.
- Link to them from internal sources to encourage indexation.
- Use volunteers to perform searches for the nonsensical terms, which inevitably reveal the exact-match nonsensical content we created.
- Vary the circumstances under which those volunteers search to determine if GSC tracks clicks and impressions only in certain environments.
- Use volunteers to click on those results.
- Record their actions.
- Compare to the data provided by GSC.
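For illustration, here is a minimal sketch of how the nonsense test pages might be generated. The term construction, file paths, and page template are assumptions made for the example, not the exact method used in the study.

```python
# Minimal sketch: generate pages of pronounceable nonsense terms for an
# indexation/click-tracking experiment. All names and paths are illustrative.
import random
from pathlib import Path

def nonsense_term(pairs=5):
    """Build a pronounceable but meaningless token from consonant-vowel pairs."""
    consonants, vowels = "bcdfghjklmnprstvz", "aeiou"
    return "".join(random.choice(consonants) + random.choice(vowels) for _ in range(pairs))

def write_test_page(out_dir, term, filler_words=50):
    """Write a simple HTML page whose title and body are built around the term."""
    body = " ".join(nonsense_term() for _ in range(filler_words))
    html = f"<html><head><title>{term}</title></head><body><h1>{term}</h1><p>{body}</p></body></html>"
    path = Path(out_dir) / f"{term}.html"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return path

random.seed(42)  # reproducible terms for the volunteers to search
pages = [write_test_page("test-pages", nonsense_term()) for _ in range(5)]
print([p.name for p in pages])
```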
We decided to check 5 different environments for their reliability:
- User performs search logged into Google in Chrome
- User performs search logged out, incognito in Chrome
- User performs search from mobile
- User performs search logged out in Firefox
- User performs the same search 5 times over the course of a day
We hoped these variants would answer specific questions about the methods Google used to collect data for GSC. We were sorely and uniformly disappointed.
Experimental results
Method | Delivered | GSC Impressions | GSC Clicks
--- | --- | --- | ---
Logged In Chrome | 11 | 0 | 0
Incognito | 11 | 0 | 0
Mobile | 11 | 0 | 0
Logged Out Firefox | 11 | 0 | 0
5 Searches Each | 40 | 2 | 0
GSC recorded only 2 impressions out of 84, and absolutely 0 clicks. Given these results, I was immediately concerned about the experimental design. Perhaps Google wasn't recording data for these pages? Perhaps we didn't hit a minimum number necessary for recording data, only barely eclipsing that in the last study of 5 searches per person?
Unfortunately, neither of those explanations made much sense. In fact, several of the test pages picked up impressions by the hundreds for bizarre, low-ranking keywords that just happened to occur at random in the nonsensical tests. Moreover, many pages on the site recorded very low impressions and clicks, and when compared with Google Analytics data, did indeed have very few clicks. It is quite evident that GSC cannot be relied upon, regardless of user circumstance, for lightly searched terms. It is, by this account, not externally valid — that is to say, impressions and clicks in GSC do not reliably reflect impressions and clicks performed on Google.
As you can imagine, I was not satisfied with this result. Perhaps the experimental design had some unforeseen limitations which a standard comparative analysis would uncover.
Comparative analysis
The next step I undertook was comparing GSC data to other sources to see if we could find some relationship between the data presented and secondary measurements which might shed light on why the initial GSC experiment had reflected so poorly on the quality of data. The most straightforward comparison was that of GSC to Google Analytics. In theory, GSC's reporting of clicks should mirror Google Analytics's recording of organic clicks from Google, if not identically, at least proportionally. Because of concerns related to the scale of the experimental project, I decided to first try a set of larger sites.
Unfortunately, the results were wildly different. The first example site received around 6,000 clicks per day from Google Organic Search according to GA. Dozens of pages with hundreds of organic clicks per month, according to GA, received 0 clicks according to GSC. But, in this case, I was able to uncover a culprit, and it has to do with the way clicks are tracked.
GSC tracks a click based on the URL in the search results (let's say you click on /pageA.html). However, let's assume that /pageA.html redirects to /pagea.html because you were smart and decided to fix the casing issue discussed at the top of the page. If Googlebot hasn't picked up that fix, then Google Search will still have the old URL, but the click will be recorded in Google Analytics on the corrected URL, since that is the page where GA's code fires. It just so happened that enough cleanup had taken place recently on the first site I tested that GA and GSC had a correlation coefficient of just .52!
So, I went in search of other properties that might provide a clearer picture. After analyzing several properties without similar problems as the first, we identified a range of approximately .94 to .99 correlation between GSC and Google Analytics reporting on organic landing pages. This seems pretty strong.
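If you want to run the same comparison on your own properties, the sketch below correlates GSC clicks with GA organic sessions by landing page, normalizing URLs first so that redirect targets (like the lower-casing fix above) collapse onto the same key. The CSV layouts and column names are assumptions; adjust them to your exports.

```python
# Minimal sketch: correlate GSC clicks with GA organic sessions per landing page.
# URL normalization collapses /PageA.html and /pagea.html onto one key.
# File paths and column names are assumptions.
import csv
from urllib.parse import urlparse
from scipy.stats import pearsonr

def normalize(url):
    """Reduce a URL to a lower-cased path so redirected variants line up."""
    path = urlparse(url).path or "/"
    return path.lower().rstrip("/") or "/"

def load(path, url_col, metric_col):
    data = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            key = normalize(row[url_col])
            data[key] = data.get(key, 0) + int(row[metric_col])
    return data

gsc_clicks = load("gsc_pages.csv", "Page", "Clicks")
ga_sessions = load("ga_landing_pages.csv", "Landing Page", "Organic Sessions")

shared = sorted(set(gsc_clicks) & set(ga_sessions))
r, _ = pearsonr([gsc_clicks[u] for u in shared], [ga_sessions[u] for u in shared])
print(f"Correlation across {len(shared)} landing pages: r = {r:.2f}")
```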
Finally, we did one more type of comparative analytics to determine the trustworthiness of GSC's ranking data. In general, the number of clicks received by a site should be a function of the number of impressions it received and at what position in the SERP. While this is obviously an incomplete view of all the factors, it seems fair to say that we could compare the quality of two ranking sets if we know the number of impressions and the number of clicks. In theory, the rank tracking method which better predicts the clicks given the impressions is the better of the two.
Call me unsurprised, but this wasn't even close. Standard rank tracking methods performed far better at predicting the actual number of clicks than the rank as presented in Google Search Console. We know that GSC's rank data is an average position which almost certainly presents a false picture. There are many scenarios where this is true, but let me just explain one. Imagine you add new content and your keyword starts at position 80, then moves to 70, then 60, and eventually to #1. Now, imagine you create a different piece of content and it sits at position 40, never wavering. GSC will report both as having an average position of 40. The first, though, will receive considerable traffic for the time that it is in position 1, and the latter will never receive any. GSC's averaging method based on impression data obscures the underlying features too much to provide relevant projections. Until something changes explicitly in Google's method for collecting rank data for GSC, it will not be sufficient for getting at the truth of your site's current position.
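The comparison itself can be framed very simply: feed each source's position, along with impressions, through a CTR-by-position curve and see whose predictions miss the actual clicks by less. The sketch below shows the idea; the CTR curve and the sample rows are made-up illustrative values, not data from the study.

```python
# Minimal sketch: compare two rank sources by how well position + impressions
# predict actual clicks. The CTR curve and sample rows are illustrative only.

CTR_BY_POSITION = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05,
                   6: 0.04, 7: 0.03, 8: 0.025, 9: 0.02, 10: 0.018}

def predicted_clicks(impressions, position):
    """Estimate clicks from impressions and a SERP position via a simple CTR curve."""
    ctr = CTR_BY_POSITION.get(min(max(int(round(position)), 1), 10), 0.01)
    return impressions * ctr

# (keyword, impressions, actual clicks, GSC average position, rank-tracker position)
rows = [
    ("keyword a", 1200, 310, 4.2, 1),
    ("keyword b",  800,  20, 3.9, 6),
]

for source, idx in (("GSC average position", 3), ("rank tracker", 4)):
    error = sum(abs(predicted_clicks(r[1], r[idx]) - r[2]) for r in rows)
    print(f"{source}: total absolute error = {error:.0f} clicks")
```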
Reconciliation
So, how do we reconcile the experimental results with the comparative results, both the positives and negatives of GSC Search Analytics? Well, I think there are a couple of clear takeaways.
- Impression data is misleading at best, and simply false at worst: We can be certain that all impressions are not captured and are not accurately reflected in the GSC data.
- Click data is proportionally accurate: Clicks can be trusted as a proportional metric (i.e., they correlate with reality) but not as an exact count.
- Click data tells you which URLs earn clicks in the search results, but not necessarily which pages those visitors actually land on (redirects can intervene).
Understanding this reconciliation can be quite valuable. For example, if you find your click data in GSC is not proportional to your Google Analytics data, there is a high probability that your site is utilizing redirects in a way that Googlebot has not yet discovered or applied. This could be indicative of an underlying problem which needs to be addressed.
Final thoughts
Google Search Console provides a great deal of invaluable data which smart webmasters rely upon to make data-driven marketing decisions. However, we should remain skeptical of this data, as with any data source, and continue to test it for both internal and external validity. We should also pay careful attention to the appropriate ways in which we use the data, so as not to draw conclusions that are unsafe or unreliable where the data is weak. Perhaps most importantly: verify, verify, verify. If you have the means, use different tools and services to verify the data you find in Google Search Console, ensuring you and your team are working with reliable data. Also, there are lots of folks to thank here: Michael Cottam, Everett Sizemore, Marshall Simmonds, David Sottimano, Britney Muller, Rand Fishkin, Dr. Pete, and so many more. If I forgot you, let me know!
Honestly, I was one of those sheep who assumed that, while everyone else got it wrong, Google had it perfect. I guess I should start questioning things from the big G more often... :P
You weren't a sheep :-) My guess is that Google doesn't test the validity of this information in the way that we do - they simply provide the best information they have at the time. The question still remains whether their best is good enough for our purposes. It appears that sometimes it isn't.
I don't think Google will invest too much in improving GSC because there's no business benefit. Analytics is used by marketers who spend money with Google, but GSC is mostly used by SEOs who don't spend that much, or anything at all. I have a feeling that GSC is just a project they work on occasionally and don't invest too much in.
"I don't think Google will invest too much in improving GSC"
If they did, and they did an awesome job, there could be quite a few SEO tools out of business overnight. A bit of a scary business model for some of them.
Google will, and already is, investing time to improve GSC. They build it for marketers. If you are the best, you will be successful in internet marketing as well.
Take the disavow tool, for example: they say to only take into account the links they show, not other sources like the Moz index, etc., which have been shown to be more reliable.
I have a rule I follow and impart to those I teach: an analytics platform is most accurate when compared to itself. That rule saves a lot of heartache when I'm comparing data from different sources.
Dear Russ,
This is a very insightful piece of content; I appreciate the effort you have put into making this post.
Perhaps 50% of SEOs use GSC mainly to check the clicks on search queries. Keeping this in mind, I would like to add that sometimes it's hard to find exactly which search query a visitor used to land on our site organically, because Google Analytics will show "Not Provided" and even Search Console won't display those search queries (as you know, the logged-in problem, etc.).
In that case, I suggest we check the landing pages in the organic section of Google Analytics, filter those landing pages, and then use a Moz/Ahrefs/Majestic tool to see how many keywords rank for each particular landing page. I know this won't give you the exact picture, but based on the search volume and keyword positions, we can infer which search queries visitors probably came from.
I hope this helps people!
This is a good tip. A lot of us correlate rankings->keywords but it isn't an exact science. Thanks!
Yes, absolutely, there are a lot more things to focus on than just rankings and keywords. Proper optimization of a landing page will take you to the next level, especially for an eCommerce business where we target so many keywords on a single landing page. So we should think wisely and not limit ourselves to keyword rankings only.
Good luck !
Hi Russ,
I enjoyed your article. I can only imagine how much work went into it. Respect!
Our software, SearchConsoleHelper.com, works with GSC's API data, so the topic is very relevant to us. I have a couple of insights for you regarding the Search Analytics section:
1. The reason your "nonsensical" experiment with clicks went wrong is most likely that Google removes "very rare queries" from the data set (source: https://support.google.com/webmasters/answer/6155685?hl=en#aboutdata). In other words, if Google has never encountered a query before, it will likely not report it in GSC. That would explain the 0 clicks.
This non-reporting of "very rare queries" also partially influences the GSC vs. GA comparisons, because in GSC you never work with the full data set.
The other reason for differences between GSC clicks and GA sessions comes from the fact that Google measures clicks differently than it measures sessions. In GA, Google uses last non-direct attribution, so the organic figure includes the true number of Google organic sessions plus the direct sessions that followed them. There are other minor issues, too (such as GA's need to execute JavaScript). But, overall, comparing clicks to sessions is like comparing apples to oranges. Lunametrics published a good article about this back in 2015: https://www.lunametrics.com/blog/2015/08/05/google-...
That said, your comparative analysis is intriguing and I would like to know more about it. Would you be willing to discuss it over the phone or Skype?
2. I do not subscribe to your conclusion about impressions in GSC (misleading at best, false at worst) once you understand how Google calculates impressions and positions.
GSC has two different ways to aggregate impressions, neither of which is intuitive: by site and by page. The default is by site, which means that even if you have multiple links on the results page (incl. knowledge graph, site links, multiple positions, etc.), GSC will record only 1 impression, and the average position will be equal to the highest-placed link. Aggregating by page means that each unique result will count as a separate impression and the position will be calculated as a true average. Sources: https://support.google.com/webmasters/answer/70428... and https://support.google.com/webmasters/answer/61556...
This clearly complicates the interpretation of the data, but it doesn't mean that the data is incorrect.
Unfortunately, the data interpretation is further complicated by the fact that, for an impression to occur, the results don't need to be actually scrolled into view or otherwise visible. The only condition is that the result has to be on the page viewed.
Overall, I believe the GSC data is correct, but it needs to be properly understood.
Btw, we wrote an article that touches on some of these issues here:
https://searchconsolehelper.com/google-search-conso...
Awesome, thank you for your response...
1. Regarding GSC and rare keywords: Great find, but it still means that GSC's data is incomplete (albeit intentionally). This could be particularly problematic for a site with a good deal of long tail traffic. If most of their keywords are rarely searched, but there are a sufficient number of them, they could appear to have little to no traffic in GSC when actually accumulating a decent stream of visitors via organic.
2. Clicks in GSC vs GA: No disagreement here, as I concluded "After analyzing several properties without similar problems as the first, we identified a range of approximately .94 to .99 correlation between GSC and Google Analytics reporting on organic landing pages. This seems pretty strong." The differences here really are trivial at the aggregate level.
3. Conclusion about impressions in GSC: I stand by this conclusion. Our experiments clearly demonstrated that at least some impressions are ignored (which you admit is the case for rare keywords) and that the collection methods render the data rather meaningless. This is why I said "misleading at best, false at worst."
Thanks for your thoughts and the links!
Hi Russ,
Thanks for your reply. The fact that GSC won't give us all the queries is upsetting, I agree. Especially, because the number of the hidden queries could be really significant (30% is totally normal in our experience).
At the same time, however, if we trust Google that these are truly unique queries, their value for SEO is limited at best. Yes, they can perhaps help us better understand user intent, but it still wouldn't make sense to optimize for those queries.
So, all considered, not much is lost.
Thanks for the great article, Russ, which I'm sure took you a lot of time to write up. I'm glad to have seen a post about the validity of GSC data; at the beginning I trusted it 100%, and as time has gone on, it's been less and less. I now use it for indexing my new posts, along with totalping, a free service that lets you "ping" articles or URLs, which can help Google find and index them.
Hi Russ,
Nice tests. I absolutely love stuff like this and I have a few comments/questions for you about the Search Analytics section:
About your test on how Google is tracking impressions and clicks: how big was the site you tested it on? If you tested this on a large, high-traffic site, I think it would also be interesting to test on a smaller, low-traffic site to see if there are any differences.
Maybe a small site where GSC is recording less data would record these impressions?
On your correlation of Rank+Impressions to Clicks – did you set the country when looking at GSC average rank? Your rank tracker is probably looking at Google US so I was just wondering if you were only looking at US ranking data in GSC? Was mobile or desktop specified in GSC to match your rank tracker data?
I would imagine the rank tracker figures were from a single day. Were your GSC rankings also from that same day? Or were you using average rank over a longer time frame (eg. the standard 28 days report in GSC)?
Also, I'm not sure your example of GSC average rank data is fair. We know that Google only records the ranking position when it gets an impression, so in your first example, with the piece of content moving from position 80, to 70, to 60, and eventually to position 1, it would obviously be getting a lot more impressions at position 1 than at position 80. This would heavily weight the average rank towards position one.
Eg. Maybe at position 80, 70, and 60 it gets one impression, while at position 1 it gets 20 impressions (for a very low volume keyword). That would give it an average ranking position (per impression) of 10.
For reporting purposes, I much prefer to use GSC’s average rank figures than point-to-point figures in a keyword tracking tool because I believe this paints a fairer picture of overall monthly keyword performance.
I hope you have time to respond :)
Cheers,
David
Great Questions
About your test on how Google is tracking impressions and clicks: how big was the site you tested it on? If you tested this on a large, high-traffic site, I think it would also be interesting to test on a smaller, low-traffic site to see if there are any differences.
The experiment was done on a low traffic site, but the comparative method was tested on sites ranging from a few visits a week to hundreds of thousands per week. I think it was a fairly diverse set.
On your correlation of Rank+Impressions to Clicks – did you set the country when looking at GSC average rank? Your rank tracker is probably looking at Google US so I was just wondering if you were only looking at US ranking data in GSC? Was mobile or desktop specified in GSC to match your rank tracker data?
Good question. I did not differentiate. However, it would be even more bizarre if desktop, US tracking did a better job of predicting GSC click numbers (which are blended desktop, country, etc.) than the appropriate blended ranking of GSC. However, it is worth taking another look! Great catch!
We know that Google only records the ranking position when it gets an impression, so in your first example, with the piece of content moving from position 80, to 70, to 60, and eventually to position 1, it would obviously be getting a lot more impressions at position 1 than at position 80.
I would agree with you if it were not for the experiment we ran which showed that Google doesn't count a large number of impressions (out of 84 impressions delivered, it only showed 2). Bizarrely, it showed hundreds of impressions for those same landing pages except for keywords that ranked in the 80+ position!
At any rate, I think the critique is fair to consider, and perhaps my explanation of why GSC is untrustworthy in this regard isn't quite correct, but it still stands that GSC is untrustworthy here; it just isn't yet fully explained.
Thanks for the really bright and thoughtful critiques / insights. You have a fantastic mind for this sort of thing, I'm impressed!
Very detailed and insightful article. I'm new to Google Analytics and Search Console. I use the latter to improve my SERPs by optimizing my keywords. Guess that's not enough?
Hello Adonis,
You are doing the right things, but that is not enough. For your information, if a user is logged into any Google product (like YouTube, Gmail, etc.) while searching, the search is conducted over SSL by Google. So those kinds of search queries will not be available in Search Console and Google Analytics.
So, most probably you will not get enough data in Search Console to identify what's actually happening with your site in Google. I guess we should use GSC but not rely on it 100% (as Russ suggests in this post as well). You may use Moz/Ahrefs tools for some more information about your site's performance and improvement.
Good luck !
Google wants you to invest money into Adwords. You'll get better search query data from PPC that will help your SEO. The only catch is that it aint free. Welcome to digital marketing! :-)
The comparison was carried out to show the trustworthiness of Google's tools. But I want to say that Google provides the best resources absolutely free. You cannot compare paid software with a free version.
Where to begin: I think the entire approach toward GSC in this article is just wrong. GSC doesn't necessarily reflect what is; it reflects the current snapshot of what Google believes to be the state of your website. Let's start with what I feel are a few misconceptions:
(1) HTML Improvements - these are only there to be used as pointers. I seriously doubt anyone at Google thinks you should be using this tool to determine whether you have valid HTML (or CSS or JS). But it is an indicator of whether or not your HTML is clean and well formed. If you really want to get your HTML clean and well formed, there are several really good tools out there, like those provided by the W3C. The real question here is WHY your HTML should be clean and up to date: because it makes the job of the crawler easier (e.g. more efficient). I know from my company's experience; we are in the process of dramatically re-writing our HTML from scratch with a focus on having as simple a DOM as possible and making as few calls as possible. The net result is that our page speed is dramatically improved - probably because the crawlers, which are really just modifications of browsers, are able to interpret our webpages much more quickly. Our full page load times are now 700-800ms vs. 1.8 seconds. By having clean code, you are also ensuring a greater likelihood of your content being interpreted correctly.
(2) Index Status - I think this is confusing indexing vs. crawling. They aren't the same thing. You CAN actually validate whether or not a page is indexed externally. Again, tools like URL Profiler can help you to see if your pages are in the index. GSC is not going to be perfect here either, but I have found it to be pretty darn close. Unfortunately they don't provide solid details to compare what is in the index vs. what comprises the count of indexed pages in GSC.
(3) Internal Links - I think this is the MOST important report in all of GSC. From experimentation, I have found that the order and relative count of internal links has a very strong correlation to relative SERP rank. This should come as no surprise: the pages listed first in this report have the most internal links and are thus being signaled to Google as the most important pages on your website. Again, by looking at which pages are appropriately or inappropriately ranked highly in the report, one can quickly set about modifying the internal architecture to adjust. We have made significant adjustments to our internal architecture, reducing navigation, reducing the number of non-valuable links on a page, and promoting our 'money pages', with the net result that we have seen our rankings for the keywords associated with these money pages improve dramatically over the year - almost to the point of domination.
The bottom line is GSC should be thought of more like a medical diagnostic tool. It's not a data warehouse.
Thanks for your response. A couple of thoughts...
"HTML Recommendations: I seriously doubt anyone at Google thinks you should be using this tool determine whether you have valid HTML"
I agree, but certainly many Webmasters think they should use this tool to do just that. My intent is to inform webmasters why the data isn't wholly trustworthy. I say just as much in my conclusion to that section... "Given this difference, it almost always makes sense to crawl your site for these types of issues in addition to using GSC"
"Index Status: GSC is not going to be perfect here either, but I have found it to be pretty darn close"
I'm not sure what you are complaining about here. I said just the same thing... "I think it is safe to conclude that the Index Status metric is probably the most reliable one available to us in regards to the number of pages actually included in Google's index."
"Internal Links: I think this is the MOST important report in all of GSC"
Good, so do I. Again, I refer you to the conclusion of the section on internal links in which I write... "As search marketers, in this case we must be concerned with internal validity, or what Google believes about our site. I highly recommend comparing Google's numbers to your own site crawl to determine if there is important content which Google determines you have ignored in your internal linking."
I have to admit, I am really confused. I agree with everything you have said and my research as presented above does as well. Did you read the article? Perhaps I didn't make my conclusions clear enough.
Fantastically fascinating summary of GSC/WMT's reliability! Especially the experimental results of Search Analytics. I've always taken everything in Search Console with a pinch of salt... Even when it's "straight from the horse's mouth", it's hard not to get suspicious when Google's own numbers don't always align. Fantastic write-up.
Thanks for going the extra mile on this article, Russ. I've always been wondering about the validity of GSC data and this gives me more of an idea of what I can trust. I just don't understand why Google can't give us data that we can trust and that correlates better with GA. Hopefully someone over there reads your article and decides it's time for a change! (Not holding my breath on that, though.)
I have always been cautious with the GSC data because much of it can be cross-referenced with GA and other analytics tools and the data just doesn't jibe. However, it is obliquely valuable for trends, so we incorporate it in the decision process. The real question is WHY IS IT INACCURATE? Is it intentional? I can't imagine a company that hangs its hat on data allowing something with so many inaccuracies in the output to continue unless it is what they want. Google continually hides behind data privacy as the reasoning for obscuring data, but it rings hollow when you see how they are using it. Any thoughts? Conspiracy theories (wink)?
I think the inaccuracies are often just byproducts of the data collection and presentation method. Take, for example, the HTML Recommendations. Chances are, at some regularity, Google produces a list of issues and ports it into GSC for you. However, less often does GSC prune those examples which are no longer an issue. So you end up with out-dated material, both in reference to the web and Google's current index.
I don't think any of these are malicious on Google's behalf by any stretch of the imagination.
I like to check indexation level based on the number of links reported as indexed in the GSC sitemaps section. It gives better results for me than "sites indexed". Usually these results are similar to the site: ones.
Good tip!
I've always felt the GSC and GA search data for my site is at least partially inaccurate. Sometimes GA records more clicks, sometimes GSC records more clicks. It's a bit all over.
I rely on GSC for search "trends" only. Are my impressions increasing? Good. Is my average position increasing? Good. Position is decreasing? Good, now let's find out why and try to fix that.
Hi Russ,
A very well-detailed article on GSC, and I loved reading the Google Search Analytics part. You cleared up a lot of things I was curious about. I was never clear on why I receive different results from Google Analytics and GSC. Thanks for that gem.
Regards
There are many cases where the site owner doesn't want to change the URL structure, especially for e-commerce websites. In those cases we can set URL parameters in GSC to prevent the duplicate-content issue. This tells Google how to crawl URLs with a given parameter.
Hi Russ,
thanks for the great insight! I agree with everything except the end of the 5th section, about the prediction of clicks.
Based on your description under the graph, I assume you used average position, which is a metric based on the average position over previous days (according to your example of a new article's position changing over time, you didn't use just one day). But Rank Tracker, or any other tool you use, provides data from the current SERP. The correct method in this case would be an average position from a specific country, on a specific device, with data for a specified day. For example, if you use Rank Tracker today, then after a few days you can check the same day with the same conditions (device, country, search type, etc.) for that keyword in GSC and compare.
I'm now working on similar research to yours and I'm seeing slightly different results for some separate topics, so if you want, I can provide you (or anybody) the results after I finish :-)
Thanks, Filip
My intent here was simply to show that rank tracking, with all its flaws, still predicts actual clicks better than the "average position" in GSC. You can get better data in GSC by clicking into a keyword and seeing the number of impressions at each individual position, but even then the data can be problematic. Thanks for the comment!
Hi!
Thanks for the article!
English is not my first language, so I didn't catch some things.
In GSC Section 5: Search Analytics (Experimental analysis) you said: >>impressions and clicks in GSC do not reliably reflect impressions and clicks performed on Google.
And later there were: >> Click data is proportionally accurate.
So, should I trust click data? Or not? =)
Many thanks for the post, Russ. I am still inexperienced in many SEO topics and you have clarified a lot of doubts I had. Although in the summary you say, with reason, that we should remain skeptical of this and other data sources, I strongly agree that GSC should be a basic and essential tool for monitoring our work.
Many thanks for your advice Russ !!
Google Search Console is a great tool that we should all take into account, with the added advantage for small web developers of being free.
Google makes it very easy by telling us what we are doing right and what we are doing wrong, as do other search engines. It is a tool that I analyze and observe constantly, and from which I also learn a lot.
Regarding page indexing and crawl rates: one thing I've found to be helpful is using structured data appropriately, but also as a gauge of crawl rates and indexing.
Imagine a site that is rolling out a whole new products section. When you add the pages, you'll start to see breadcrumb entries in the structured data reporting area increase, which ultimately gives you a second check on how quickly the pages are getting picked up. It's interesting to see how these numbers differ from page index counts week to week.
Great work! Answers some questions we had and raises others. Puts us on the right track to where we need to be heading.
Hi Russ
Thanks for a great write-up on a topic that has been the subject of much internal discussion where I work.
A couple of things:
GSC Section 4: Links to Your Site. How can we be sure that the backlinks from Moz, Majestic, Ahrefs, or any other source are an accurate picture of the total number of backlinks? Even if those tools are correct, if Google isn't aware of those backlinks, they aren't going to be much use to you anyway. I guess what I am saying is that GSC backlink data may be the most useful to SEOs even if it isn't the most accurate.
In my own very basic research, the reliability of the search analytics data has been questionable based on changing the filters used. For example if you switch from queries to pages you often get a very different number of impressions. There are some explanations for this but none of them explain (to me at least) why there would be such a difference.
Regarding ranking, again this seems off a lot of the time. If a brand term is used and we know that the brand is always P1, it doesn't make sense that GSC provides a lower average rank.
Saying all that, even if the data is slightly questionable, surely google is going to be the best source of their own data?
Hi Russ, Thanks for this post, really interesting.
Honestly, the only part of GSC I do trust is the analysis of keywords and impressions, and maybe the crawl stats as well. You can't get better insight into your results if you don't connect your GSC and GA. Regarding the HTML improvements, my point of view is that GSC is not fully equipped to analyze all the components of a page or to produce the recommendations needed to improve your page.
One more question appeared after second reading:
>GSC's averaging method based on impression data obscures the underlying features too much to provide relevant projections.
Are you saying that GSC shows not actual positions but projections based on impressions? Did I get it right?
This is cool! Thanks a lot for sharing Russ!
Hi, I am new here. My site had a redirect over a year ago from .html to .com and it seems like I have been invisible to Google ever since. You mention under the Reconciliation section that perhaps Google hasn't picked up on the redirect and that may be the cause of the numbers being off vs. GA. How do I get Google to recognize the redirect? Ask it to recrawl my site? I am not that great with all this technical stuff, so I apologize if I sound clueless, ha. BUT I do know something is wrong, just not sure how to fix it. Thank you :)
Nice one, Russ; missing, however, is a check on the reliability of the crawl stats. I always get the impression that the crawl stats reported by Search Console are not linked to the actual crawl data you can find in the log files of your site. Any thoughts on that?
I tested that briefly but didn't complete in time for launch. I found that the numbers were actually fairly accurate IF you grepped your log files carefully. You have to exclude any non-HTML entry in the logs, you have to remove redirects, you have to sync the dates appropriately to GMT, etc. It took several steps, but eventually the numbers came out fairly close. That being said, this was for smaller sites and not enough for me to include in my evaluation above.
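For anyone who wants to try the same comparison, here is a minimal sketch of that log filtering, assuming the common combined Apache/Nginx log format; it skips reverse-DNS verification of Googlebot and anything else production-grade, and the asset-extension list is just an example.

```python
# Minimal sketch: count Googlebot HTML 200s per day (UTC) from an access log,
# so the totals can be lined up against GSC's crawl stats. Assumes combined
# Apache/Nginx log format; no reverse-DNS check on the Googlebot user agent.
import re
from collections import Counter
from datetime import datetime, timezone

LINE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})')
ASSET = re.compile(r"\.(css|js|png|jpe?g|gif|svg|ico|woff2?)$", re.I)

def crawl_counts(log_path):
    per_day = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if "Googlebot" not in line:
                continue
            m = LINE.search(line)
            if not m or m.group("status") != "200":
                continue  # drop redirects and errors
            if ASSET.search(m.group("path").split("?")[0]):
                continue  # drop non-HTML assets
            ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
            per_day[ts.astimezone(timezone.utc).date()] += 1
    return per_day

for day, hits in sorted(crawl_counts("access.log").items()):
    print(day, hits)
```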
Maybe you could write a post on that ;-)
After reading this post, I realized that I'm still "a frog in a well" and know nothing about the GSC tool. I got very valuable new facts from this post. Points 1 and 2 are quite useful for me; thanks for sharing.
@Russ What are your thoughts on how the Search Analytics data can/should be used?
Good question.
1. I like comparing landing pages from GSC to GA because discontinuity means that there are redirects Google's Index hasn't picked up yet.
2. I think it is valuable for finding new keywords
At this point, if you can afford a 3rd party rank tracking solution, I would recommend it for just about everything else.
For me, while I agree data should always be validated and understood, etc., the impression data is really valuable; sure, it's not always going to be counted, but in some cases it's a better indicator than AdWords data.
I've seen instances where Search Analytics data has shown me keywords with nice little volumes, whereas AdWords shows these as 0 searches per month.
I think your post about Keyword Planner's Dirty Little Secrets concluded it better: essentially, the data is not 100% accurate; however, use it along with other data and variables to make decisions.
Keywords are also highlighted through impression data, i.e. which ones appear to be most searched for - again, further investigation and testing can confirm this during your developments.
Position data is a tricky one, due to personalisation and localisation; however, again, this is where further investigation into your data can help.
What would be a good 3rd party rank tracking solution?
Thank You Russ for sharing detailed research
It was a good read. Like many others, I was skeptical about the data provided in GSC, and I'm glad to know that I was right. I just have one query about GSC Section 5: Search Analytics, where you showed the result of no clicks and no impressions. Does the time frame between performing the search and checking the results/impressions/clicks matter? A couple of times I have tested the same thing and did not find what I was looking for when I checked the impressions and clicks immediately, or even hours after the search. Correct me if I am wrong.
Thank you once again for confirming my understanding ;) by providing the detailed research. Much appreciated.
Google Search Console is fundamental to working in digital marketing, but some of its features are too advanced. Thanks for this article, Russ. It has helped me a lot to understand GSC a little better! :)
I admit that I prefer Average Position as an indicator of overall page success over more straightforward keyword rankings. I like the weighted averages and find they generally correlate to what I see when I spot check, so I'm good with that.
Where I get tripped up in GSC is when I look at Average Position by Device Type compared to the catchall Landing Page. Impressions don't add up, average positions can be wildly different in ways that don't make sense (like a mobile at 5, desktop at 6, and somehow Landing Page, which should be a combined bucket, cites it as 8. I would expect to see it between 5 and 6 after weighting, no?) If you've got any insight on why that happens, and which dataset is more trustworthy, I'd love to hear about it.
Solid work, Russ...thanks for putting this all together! Despite some shaky data in GSC, I find it's tremendously valuable for keyword research--finding terms that a page might rank just off page 1 for, and with a little tweaking, we can push that page up onto page 1 and start getting some more traffic. I was surprised to find that the Index Status numbers are relatively accurate--wasn't it back in August when Google admitted that was broken? I guess it's fixed now. Now, if they would fix the initial "About xxxxx results" at the top of the SERPs. That is still often wrong by orders of magnitude!
+1 Michael's comments. Search Analytics is amazingly powerful if you take the time to understand its strengths and weaknesses. Keyword expansion / discovery especially.
Being totally honest, Search Console is a tool I do a daily drive-by with - especially on sites I'm executing brand new content strategy on or for sites in start up mode. Along with screaming frog, Ahrefs, SEMrush and occasionally keywordtool.io this is one of the few valuable tools I'm in constant contact with.
Google's Search Console team deserves our gratitude. It must be a challenge getting things done on behalf of the SEO community in that team.
Thanks for the great in-depth article. If you work with GSC for a while, it becomes obvious a lot of the data there is stale and can be misleading if you take it too seriously. I think it's particularly dangerous to rely on the HTML Improvements data to tell you what needs to be fixed on your site. By the time you see it in GSC you're doing damage control. Proactive monitoring with a tool like Screaming Frog will give you more complete and timely info.
As far as the query data, it's great as a starting point for keyword research and rank improvement, but as an analytics tool you're better off using a combo of landing pages and keyword groups as previously commented on.
Thanks for putting the work in, Russ. This is exactly where Moz as an organization shows their incredible value.