If you’ve ever compared two analytics implementations on the same site, or compared your analytics with what your business is reporting in sales, you’ve probably noticed that things don’t always match up. In this post, I’ll explain why data is missing from your web analytics platforms and how large the impact could be. Some of the issues I cover are actually quite easily addressed, and have a decent impact on traffic — there’s never been an easier way to hit your quarterly targets. ;)
I’m going to focus on GA (Google Analytics), as it's the most commonly used provider, but most on-page analytics platforms have the same issues. Platforms that rely on server logs do avoid some issues but are fairly rare, so I won’t cover them in any depth.
Side note: Our test setup (multiple trackers & customized GA)
On Distilled.net, we have a standard Google Analytics property running from an HTML tag in GTM (Google Tag Manager). In addition, for the last two years, I’ve been running three extra concurrent Google Analytics implementations, designed to measure discrepancies between different configurations.
(If you’re just interested in my findings, you can skip this section, but if you want to hear more about the methodology, continue reading. Similarly, don’t worry if you don’t understand some of the detail here — the results are easier to follow.)
Two of these extra implementations — one in Google Tag Manager and one on page — run locally hosted, renamed copies of the Google Analytics JavaScript file (e.g. www.distilled.net/static/js/au3.js instead of www.google-analytics.com/analytics.js) to make them harder for ad blockers to spot. I also used renamed JavaScript functions (“tcap” and “buffoon,” rather than the standard “ga”) and renamed trackers (“FredTheUnblockable” and “AlbertTheImmutable”) to avoid having duplicate trackers (which can often cause issues).
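For illustration, here's a minimal sketch of what one of these modified snippets looks like, based on the standard analytics.js loader (the tracking ID is a placeholder):

```html
<script>
// Standard analytics.js bootstrap, with two changes: the script source points
// at the locally hosted, renamed copy, and the global function is "tcap"
// instead of "ga". UA-XXXXXXX-X is a placeholder tracking ID.
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','/static/js/au3.js','tcap');

// A named tracker avoids colliding with the default tracker on the same page.
tcap('create', 'UA-XXXXXXX-X', 'auto', 'FredTheUnblockable');
tcap('FredTheUnblockable.send', 'pageview');
</script>
```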
This was originally inspired by 2016-era best practice on how to get your Google Analytics setup past ad blockers. I can’t find the original article now, but you can see a very similar one from 2017 here.
Lastly, we have “DianaTheIndefatigable,” which just has a renamed tracker but otherwise uses the standard code and is implemented on-page. This completes the set of all combinations of modified and unmodified GTM and on-page trackers.
Overall, this table summarizes our setups:
Tracker | Renamed function? | GTM or on-page? | Locally hosted JavaScript file? |
---|---|---|---|
Default | No | GTM HTML tag | No |
FredTheUnblockable | Yes - “tcap” | GTM HTML tag | Yes |
AlbertTheImmutable | Yes - “buffoon” | On page | Yes |
DianaTheIndefatigable | No | On page | No |
I tested their functionality in various browser/ad-block environments by watching for the pageview hits appearing in the browser’s developer tools.
Reason 1: Ad Blockers
Ad blockers, mostly in the form of browser extensions, have been growing in popularity for some time now. Primarily this has been driven by users looking for better performance and UX on ad-laden sites, but in recent years an increased emphasis on privacy has also crept in, hence the blocking of analytics as well as ads.
Effect of ad blockers
Some ad blockers block web analytics platforms by default; others can be configured to do so. I tested Distilled’s site with Adblock Plus and uBlock Origin, two of the most popular ad-blocking desktop browser add-ons, but it’s worth noting that ad blockers are increasingly prevalent on smartphones, too.
Here’s how Distilled’s setups fared (“Pass” means the pageview was recorded; “Fail” means it was blocked):
(All numbers shown are from April 2018)
Setup | Vs. Adblock Plus | Vs. Adblock Plus with “EasyPrivacy” enabled | Vs. uBlock Origin |
---|---|---|---|
GTM | Pass | Fail | Fail |
On page | Pass | Fail | Fail |
GTM + renamed script & function | Pass | Fail | Fail |
On page + renamed script & function | Pass | Fail | Fail |
Seems like those tweaked setups didn’t do much!
Lost data due to ad blockers: ~10%
Ad blocker usage can be in the 15–25% range depending on region, but many of these installs will be default setups of AdBlock Plus, which, as we’ve seen above, does not block tracking. Estimates of AdBlock Plus’s market share among ad blockers vary from 50–70%, with more recent reports tending towards the former. So, if we assume that at most 50% of installed ad blockers block analytics, that leaves your exposure at around 10%.
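To make the arithmetic explicit, this is just the two rough estimates above multiplied together (both numbers are assumptions, not measurements):

```js
// Back-of-the-envelope exposure estimate.
const adBlockerUsage = 0.20;         // 15–25% depending on region; call it 20%
const shareBlockingAnalytics = 0.50; // at most ~50% of installed blockers
const lostTraffic = adBlockerUsage * shareBlockingAnalytics;
console.log(`~${(lostTraffic * 100).toFixed(0)}% of traffic`); // "~10% of traffic"
```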
Reason 2: Browser “do not track”
This is another privacy-motivated feature, this time of browsers themselves. You can enable it in the settings of most current browsers. It’s not compulsory for sites or platforms to obey a “do not track” request, but Firefox offers a stronger feature under the same set of options, which I decided to test as well.
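For sites that do choose to honor it, the setting is readable in JavaScript before any tracking fires. A minimal sketch (the function name is mine, and honoring the flag is entirely voluntary):

```js
// Read the "do not track" flag across browser variants.
// "1" (or "yes" in some older browsers) means the user enabled it.
function userOptedOutOfTracking() {
  var dnt = navigator.doNotTrack || window.doNotTrack || navigator.msDoNotTrack;
  return dnt === '1' || dnt === 'yes';
}

if (!userOptedOutOfTracking()) {
  // ...load the analytics snippet and send the pageview here...
}
```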
Effect of “do not track”
Most browsers now offer the option to send a “Do not track” message. I tested the latest releases of Firefox & Chrome for Windows 10.
Setup | Chrome “do not track” | Firefox “do not track” | Firefox “tracking protection” |
---|---|---|---|
GTM | Pass | Pass | Fail |
On page | Pass | Pass | Fail |
GTM + renamed script & function | Pass | Pass | Fail |
On page + renamed script & function | Pass | Pass | Fail |
Again, it doesn’t seem that the tweaked setups are doing much work for us here.
Lost data due to “do not track”: <1%
Only Firefox Quantum’s “Tracking Protection,” introduced in February, had any effect on our trackers. Firefox has a 5% market share, but Tracking Protection is not enabled by default. The launch of this feature had no effect on the trend for Firefox traffic on Distilled.net.
Reason 3: Filters
It’s a bit of an obvious one, but filters you’ve set up in your analytics might intentionally or unintentionally reduce your reported traffic levels.
For example, a filter excluding certain niche screen resolutions that you believe to be mostly bots, or excluding internal traffic, will obviously cause your setup to underreport slightly.
Lost data due to filters: ???
The impact is hard to estimate, as setups will obviously vary on a site-by-site basis. I do recommend having a duplicate, unfiltered “master” view in case you realize too late that you’ve lost something you didn’t intend to.
Reason 4: GTM vs. on-page vs. misplaced on-page
Google Tag Manager has become an increasingly popular way of implementing analytics in recent years, due to its increased flexibility and the ease of making changes. However, I’ve long noticed that it can tend to underreport vs. on-page setups.
I was also curious about what would happen if you didn’t follow Google’s guidelines in setting up on-page code.
By combining my numbers with numbers from my colleague Dom Woodman’s site (you’re welcome for the link, Dom), which happens to use a Drupal analytics add-on as well as GTM, I was able to see the difference between Google Tag Manager and misplaced on-page code (right at the bottom of the <body> tag). I then weighted this against my own Google Tag Manager data to get an overall picture of all five setups.
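For reference, the difference between the two on-page placements looks roughly like this (schematic only; the snippet bodies are elided, and a real page would use one placement or the other, not both):

```html
<html>
  <head>
    <!-- Per Google's guidelines: the snippet sits high in the <head>, so it
         loads and fires before the user has a chance to navigate away. -->
    <script>/* analytics snippet */</script>
  </head>
  <body>
    <!-- ...page content... -->

    <!-- Misplaced: at the very bottom of the <body>, the snippet runs last,
         so slow connections and early exits cost you pageviews. -->
    <script>/* analytics snippet */</script>
  </body>
</html>
```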
Effect of GTM and misplaced on-page code
Traffic as a percentage of baseline (standard Google Tag Manager implementation):
 | Google Tag Manager | Modified & Google Tag Manager | On-Page Code In <head> | Modified & On-Page Code In <head> | On-Page Code Misplaced In <body> |
---|---|---|---|---|---|
Chrome | 100.00% | 98.75% | 100.77% | 99.80% | 94.75% |
Safari | 100.00% | 99.42% | 100.55% | 102.08% | 82.69% |
Firefox | 100.00% | 99.71% | 101.16% | 101.45% | 90.68% |
Internet Explorer | 100.00% | 80.06% | 112.31% | 113.37% | 77.18% |
There are a few main takeaways here:
- On-page code generally reports more traffic than GTM
- Modified code is generally within a margin of error, apart from modified GTM code on Internet Explorer (see note below)
- Misplaced analytics code will cost you up to a third of your traffic vs. properly implemented on-page code, depending on browser (!)
- The customized setups, which are designed to get more traffic by evading ad blockers, are doing nothing of the sort.
It’s worth noting also that the customized implementations actually got less traffic than the standard ones. For the on-page code, this is within the margin of error, but for Google Tag Manager, there’s another explanation: because I used unfiltered profiles for this comparison, there’s a lot of bot spam in the main profile, and that spam primarily masquerades as Internet Explorer. Our main profile is by far the most spammed, and it’s also acting as the baseline here, so the difference between on-page code and Google Tag Manager is probably somewhat larger than what I’m reporting.
I also split the data by device category, out of curiosity:
Traffic as a percentage of baseline (standard Google Tag Manager implementation):
 | Google Tag Manager | Modified & Google Tag Manager | On-Page Code In <head> | Modified & On-Page Code In <head> | On-Page Code Misplaced In <body> |
---|---|---|---|---|---|
Desktop | 100.00% | 98.31% | 100.97% | 100.89% | 93.47% |
Mobile | 100.00% | 97.00% | 103.78% | 100.42% | 89.87% |
Tablet | 100.00% | 97.68% | 104.20% | 102.43% | 88.13% |
The further takeaway here seems to be that mobile browsers, like Internet Explorer, can struggle with Google Tag Manager.
Lost data due to GTM: 1–5%
Google Tag Manager seems to cost you a varying amount depending on what make-up of browsers and devices use your site. On Distilled.net, the difference is around 1.7%; however, we have an unusually desktop-heavy and tech-savvy audience (not much Internet Explorer!). Depending on vertical, this could easily swell to the 5% range.
Lost data due to misplaced on-page code: ~10%
On Teflsearch.com, the impact of misplaced on-page code was around 7.5% vs. Google Tag Manager. Keeping in mind that Google Tag Manager itself underreports, the total loss could easily be in the 10% range.
Bonus round: Missing data from channels
I’ve focused above on areas where you might be missing data altogether. However, there are also lots of ways in which data can be misrepresented, or detail can be missing. I’ll cover these more briefly, but the main issues are dark traffic and attribution.
Dark traffic
Dark traffic is direct traffic that didn’t really come via direct — which is generally becoming more and more common. Typical causes are listed below (see the campaign-tagging sketch after the list):
- Untagged campaigns in email
- Untagged campaigns in apps (especially Facebook, Twitter, etc.)
- Misrepresented organic
- Data sent from botched tracking implementations (which can also appear as self-referrals)
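As a sketch, this is the kind of campaign tagging that stops email clicks from being reported as direct (the URLs and parameter values are illustrative):

```html
<!-- Untagged: clicks from most email clients arrive with no referrer,
     so GA buckets them as direct. -->
<a href="https://www.example.com/offer">See the offer</a>

<!-- Tagged: the UTM parameters tell GA the source, medium, and campaign. -->
<a href="https://www.example.com/offer?utm_source=newsletter&utm_medium=email&utm_campaign=spring-sale">See the offer</a>
```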
It’s also worth noting the trend towards genuinely direct traffic that would historically have been organic. For example, due to increasingly sophisticated browser autocompletes, cross-device history, and so on, people end up “typing” a URL that they’d have searched for historically.
Attribution
I’ve written about this in more detail here, but in general, a session in Google Analytics (and any other platform) is a fairly arbitrary construct — you might think it’s obvious how a group of hits should be grouped into one or more sessions, but in fact, the process relies on a number of fairly questionable assumptions. In particular, it’s worth noting that Google Analytics generally attributes direct traffic (including dark traffic) to the previous non-direct source, if one exists. For example, if someone arrives via organic search one day and returns by typing the URL directly the next, both sessions are typically reported as organic.
Discussion
I was quite surprised by some of my own findings when researching this post, but I’m sure I didn’t get everything. Can you think of any other ways in which data can end up missing from analytics?
Nothing like a link to sweeten a public call-out for cocking up an analytics implementation....
I will at some point fix this.
Indeed, great post Tom.
I've been noticing for the past few years that bot data is missing. Google Analytics shows limited data and hides the bot data. The same happens with Google AdSense, where data is missing in some segments.
Helpful topic.
Dang, I had no idea that ad blocker extensions could block Analytics data, or that Google Tag Manager generally leads to less data being reported compared to having an Analytics tag in the <head>. Very insightful findings here, thank you for putting this together Tom!
If you're not on top of the new update you will lose a lot of valuable data, so you have to make sure all the data appears, and watch out for the new analytics guidelines, which can leave you without it.
Great post. What about nofollow links and logged-in user data? Are they tracked in GA? Is it possible that one of our clients couldn't see traffic coming from us if the user was logged in to their profile on our website when going to our client's website? Also, how are nofollow links tracked in GA?
If you have tracking code in place, I don't see why this data would be missing. Do you have a mechanism in mind?
In fact this is exactly the situation I have: traffic from website A is seen in the GA of website B as direct traffic. On website A the link is nofollowed. Could this be the reason for the black hole in the data?
There are a few reasons this could happen, depending on how you've tested it. However, I'm not aware of any special treatment of nofollow links by Google Analytics, and I'm not sure how Google Analytics would even be able to identify whether that was the case — the nofollow attribute isn't marked up in the referrer, and GA doesn't have the capacity to go back and crawl referring pages.
Thanks Tom for sharing this detailed post with us. It's very helpful for understanding what is going on in Google Analytics, and how minor mistakes like not placing on-page code in the right place can matter.
Thank you for the detailed post!
What experience do you have with traffic coming from social campaigns? I believe I read an article in the recent past that said something about traffic coming from an app (such as Facebook) not being properly tracked in GA due to an issue with JS not running properly when traffic comes from an app.
I've run many FB paid campaigns that have clicks to the website but in many cases show 0:00 time on site with an average of 1 page per session. I know that there will be some people who accidentally click on an ad and then click back before the page even loads, but I would think that there's something not being tracked in GA.
Thoughts?
Any ad blocker using a list like EasyPrivacy will block the Measurement Protocol requests made to google-analytics.com directly, so there's definitely no point in hosting/renaming the analytics.js file alone. It looks like the article you link to also proxies its Measurement Protocol requests, which would get around that block.
I'd be interested to read more about what you think is going on with GTM on IE.
GTM itself is blocked by EasyPrivacy and others, but I would expect the measurement gap has more to do with chaining GA via GTM than with the rare instance of someone who has GTM blocked but not GA.
Great comment - I forgot to mention that proxying the hits would be needed for a full-blooded avoidance setup. I think that part was missing from the original article I read a couple of years ago, hence not being reflected in my experiment.
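Roughly, the proxying means serving a first-party endpoint that forwards hits on to google-analytics.com. A minimal sketch, assuming a Node/Express server (the /c endpoint name is illustrative):

```js
// Sketch of proxying Measurement Protocol hits through your own domain.
// The page sends hits to a first-party URL; the server forwards them to GA.
const express = require('express');
const https = require('https');

const app = express();

app.get('/c', (req, res) => {
  // Rebuild the query string and forward it to the real collect endpoint.
  const qs = new URLSearchParams(req.query).toString();
  https.get('https://www.google-analytics.com/collect?' + qs, (gaRes) => {
    res.sendStatus(gaRes.statusCode);
  });
  // A fuller version would also pass along the client's IP (uip) and
  // user agent (ua) parameters so geo/device reporting stays accurate.
});

app.listen(3000);
```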
I think given the difference between smartphone and desktop, as well as IE Vs Chrome, it could be a simple performance issue - perhaps people are giving up on pages before everything has run on slower systems.
Thanks Tom, this has really got my brain ticking! So would you recommend implementing the GA code directly on the page and keeping all the other tags in GTM, to get the most accurate figures?
I think, to be fair, that if you want a totally exhaustive setup, any form of GA is going to be inferior to a server-side solution.
Personally I think the flexibility of GA outweighs the traffic difference on older systems.
Hey Tom Capper,
Thanks for bringing up this topic!
I have a recent case where I saw one of my clients' traffic reduced by 49% globally! The root cause was that they had added a filter to one of their most important Google Analytics views, and because of it they were losing data. We were fortunate that the data loss lasted just 3–4 days, with no major impact.
But I really like your suggestion of an unfiltered "master" view; to be honest, I implemented it across all my clients' GA accounts just before thanking you via this comment!
Thanks for the great help
Cheers
Great breakdown of these 4 reasons for missing Analytics data, Tom. I've never seen this topic covered in such detail, nor did I know there was so much opportunity for GA to miss data! Crazy stuff, thanks for sharing your data and insight here sir.
Hi Tom,
Nice to see some of our assumptions backed by data!
I was wondering though: how come GA via GTM reports less traffic than the standard on-page code?
Cheers, Ivo
I suspect it has to do with performance and/or support for complex JavaScript on older and slower systems.
Wow, so this is pretty much all news to me. I had no idea that setting up Analytics via GTM would lead to less data being reported in Analytics (when compared to directly installing the Analytics code in the <head> section). The impact of people using ad blocker extensions was also unknown to me! Thanks for laying all this out and sharing Tom!
Congratulations on the good job! I had no idea that ad blocker extensions could block analytics data, thanks for sharing this with us. I found it really useful, and I am going to start working on this right now.
Thank you for the amazing tips!
Very nice article. I'm very frustrated when I see my metrics fluctuate, especially when I have a day where a lot of people access the site but analytics can't retrieve their data (such as time on page, for instance). Now I understand a bit more of what is happening behind the code.
The deeper into the weeds I go, the less credence I give to the efficacy of GA data. All relative of course. We make the best of it.
A daunting task to communicate to clients, but content like this makes it a bit easier. Thanks Tom.
Wow Tom! What a wonderful and well thought out experiment. I just forwarded this to my clients. Thank you for all of your hard work on this! I'm grateful that we have analysts like you in the community!
One follow-up question: Did you group Internet Explorer and Edge together in the results above? I ask because I think it is probable that Internet Explorer would perform worse than Edge as IE is older and now obsolete software.
Hi Danny, thanks for the kind words. I didn't group IE with Edge - it still gets decent traffic levels on both of the sites in the experiment, and I thought it was interesting as an extreme example. Hope that helps!
Thank you for doing this research. It's a very interesting article on something I had never considered. I've been drinking the Google Kool-Aid when it comes to GTM, and am now questioning my decision to use it for GA implementation.
Man! Every time I turn on the computer I come across new things that I need to learn in order to stay abreast of what goes on in the SEO world! Thanks for all these helpful articles and great information on this particular one!
I find it impressive how people here on Moz can come up with new approaches and tests to back up assumptions with very consistent data. It's quite a difficult job, very analytical!
It's even more impressive when you remember they are not Google, so they only have clues about what is happening and have to figure everything out by themselves. Good job Tom and Moz team!
Excellent post Tom! As always... I have learned things that I hadn't taken into account in my work, and from now on they will certainly help me improve.
It's normal that Google hides some data in the statistics to make SEO more difficult. Thank you for finding the lost analytics data and for the work done. Regards.
So I'm not going to lie: the fact that Analytics is missing data to this degree is shocking. I had no idea that ad blocker extensions and using GTM had that much of an impact. Awesome topic and examples Tom!
Hi Tom, very good post; many valid and little-known points are mentioned here. I think Google Analytics does not give detailed information in its reports about website traffic from social networking sites.
Thanks for the post, but we need to see if there are any future implications due to GDPR. Google might remove some other important features too.
This is an excellent article and I agree with the statements that you make about ad blockers. Our company, Adtoniq, is currently implementing a technology that creates a win-win situation for ad blocker users and websites that use Google Analytics and other software that could be disrupted by ad blockers. The data that we have collected from our product users is consistent with your statements.
I think that analytics will never stop advancing.
Hi Tom, very good post and very interesting without a doubt. In my case I have never considered taking measurements with anything other than Google Analytics. Here I see that you do it with other trackers and with different parameters. It can be very interesting for large companies, but also for companies that are just starting out and don't yet have millions of visits, only thousands.
Greetings from Spain!
Thanks for this post! But I think it's time to update it, taking into account GDPR compliance changes. As of now, almost all data on EU residents can't be processed without their consent... so it can probably affect a lot of the analytics you had before. By the way, have you seen how the Chicago Tribune is handling it? They closed off access for EU users :) Probably not the best idea, but I think in the near future we will all comply with GDPR regulations, as it's a good thing to have on the internet. So I hope to see an updated post with research on GDPR's effects on analytics.
Hi Kosta, I don't see how GDPR could cause any sessions to go missing altogether, unless you mean due to deletion by Google? I think it's a bit of a separate topic, to be honest, but there are lots of other good posts out there, including two recently here on Moz and one over on the Distilled blog.
Thanks for your answer, Tom!
No, I meant something a little different. You were talking about how ad blockers change analytics results (impressive research), while I was just curious what a big influence GDPR has on analytics. As of now, it's not enough to get past ad blockers and the browser's 'do not track' setting; you also need user consent to track activity, so that's a huge amount of missed analytics.
I think I should rephrase my previous message: I'm not waiting for an 'updated topic' but for a 'new topic' on GDPR's effects on analytics.
Thanks very much for your contribution!