Digital marketing is a proudly data-driven field. Yet, as SEOs especially, we often work with data so incomplete or questionable that we end up jumping to the wrong conclusions when we try to substantiate our arguments or quantify our issues and opportunities.
In this post, I’m going to outline 4 data analysis pitfalls that are endemic in our industry, and how to avoid them.
1. Jumping to conclusions
Earlier this year, I conducted a ranking factor study around brand awareness, and I posted this caveat:
"...the fact that Domain Authority (or branded search volume, or anything else) is positively correlated with rankings could indicate that any or all of the following is likely:
- Links cause sites to rank well
- Ranking well causes sites to get links
- Some third factor (e.g. reputation or age of site) causes sites to get both links and rankings"
~ Me
However, I want to go into this in a bit more depth and give you a framework for analyzing these yourself, because it still comes up a lot. Take, for example, this recent study by Stone Temple, which you may have seen in the Moz Top 10 or Rand’s tweets, or this excellent article discussing SEMRush’s recent direct traffic findings. To be absolutely clear, I’m not criticizing either of the studies, but I do want to draw attention to how we might interpret them.
Firstly, we do tend to suffer a little confirmation bias — we’re all too eager to call out the cliché “correlation vs. causation” distinction when we see successful sites that are keyword-stuffed, but all too approving when we see studies doing the same with something we think is or was effective, like links.
Secondly, we fail to critically analyze the potential mechanisms. The options aren’t just causation or coincidence.
Before you jump to a conclusion based on a correlation, you’re obliged to consider various possibilities:
- Complete coincidence
- Reverse causation
- Joint causation
- Linearity
- Broad applicability
If those don’t make any sense, then that’s fair enough — they’re jargon. Let’s go through an example:
Before I warn you not to eat cheese because you may die in your bedsheets, I’m obliged to check that it isn’t any of the following:
- Complete coincidence - Is it possible that so many datasets were compared, that some were bound to be similar? Why, that’s exactly what Tyler Vigen did! Yes, this is possible.
- Reverse causation - Is it possible that we have this the wrong way around? For example, perhaps your relatives, in mourning for your bedsheet-related death, eat cheese in large quantities to comfort themselves? This seems pretty unlikely, so let’s give it a pass. No, this is very unlikely.
- Joint causation - Is it possible that some third factor is behind both of these? Maybe increasing affluence makes you healthier (so you don’t die of things like malnutrition), and also causes you to eat more cheese? This seems very plausible. Yes, this is possible.
- Linearity - Are we comparing two linear trends? A linear trend is a steady rate of growth or decline, and any two statistics which are both roughly linear over time will be very well correlated. Both of our statistics here (cheese consumption and bedsheet deaths) trend linearly upwards; drawn on different scales they might look completely unrelated, but because they both grow at a steady rate, they’d still be very well correlated (there’s a short code sketch of this after these checks). Yes, this looks likely.
- Broad applicability - Is it possible that this relationship only exists in certain niche scenarios, or, at least, not in my niche scenario? Perhaps, for example, cheese does this to some people, and that’s been enough to create this correlation, because there are so few bedsheet-tangling fatalities otherwise? Yes, this seems possible.
So we have 4 “Yes” answers and one “No” answer from those 5 checks.
If your example doesn’t get 5 “No” answers from those 5 checks, it’s a fail, and you don’t get to say that the study has established either a ranking factor or a fatal side effect of cheese consumption.
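To make the linearity check concrete, here’s a minimal sketch in Python with made-up numbers, showing that two completely unrelated statistics will still correlate almost perfectly as long as both trend steadily in the same direction:

```python
# Minimal sketch of the "linearity" trap: two invented, unrelated statistics
# that both grow at a steady rate end up almost perfectly correlated.
import numpy as np

years = np.arange(2000, 2010)

# Two unrelated, roughly linear trends with a little random noise
cheese_consumption = 29.0 + 0.35 * (years - 2000) + np.random.normal(0, 0.1, len(years))
bedsheet_deaths = 320 + 15 * (years - 2000) + np.random.normal(0, 5, len(years))

correlation = np.corrcoef(cheese_consumption, bedsheet_deaths)[0, 1]
print(f"Pearson correlation: {correlation:.2f}")  # typically 0.95+
```

The correlation tells you nothing beyond "both of these went up over the period," which is true of a huge number of things, from smartphone adoption to the number of pages Google has indexed.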
A similar process should apply to case studies, which are another form of correlation — the correlation between you making a change, and something good (or bad!) happening. For example, ask:
- Have I ruled out other factors (e.g. external demand, seasonality, competitors making mistakes)?
- Did I increase traffic by doing the thing I tried to do, or did I accidentally improve some other factor at the same time?
- Did this work because of the unique circumstance of the particular client/project?
This is particularly challenging for SEOs, because we rarely have data of this quality, but I’d suggest an additional pair of questions to help you navigate this minefield:
- If I were Google, would I do this?
- If I were Google, could I do this?
Direct traffic as a ranking factor passes the “could” test, but only barely — Google could use data from Chrome, Android, or ISPs, but it’d be sketchy. It doesn’t really pass the “would” test, though — it’d be far easier for Google to use branded search traffic, which would answer the same questions you might try to answer by comparing direct traffic levels (e.g. how popular is this website?).
2. Missing the context
If I told you that my traffic was up 20% week on week today, what would you say? Congratulations?
What if it was up 20% this time last year?
What if I told you it had been up 20% year on year, up until recently?
It’s funny how a little context can completely change this. This is another problem with case studies and their evil inverted twin, traffic drop analyses.
If we really want to understand whether to be surprised at something, positively or negatively, we need to compare it to our expectations, and then figure out what deviation from our expectations is “normal.” If this is starting to sound like statistics, that’s because it is statistics — indeed, I wrote about a statistical approach to measuring change way back in 2015.
If you want to be lazy, though, a good rule of thumb is to zoom out, and add in those previous years. And if someone shows you data that is suspiciously zoomed in, you might want to take it with a pinch of salt.
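If you do want to go a step further than eyeballing a zoomed-out chart, the check can be fairly lightweight. Here’s a rough sketch, assuming you have a couple of years of weekly session totals to hand; the eight-week window and two-standard-deviation threshold are arbitrary choices for illustration, not recommendations:

```python
# Rough sketch: is this week's traffic genuinely surprising, or within the
# normal wobble we'd expect given recent year-on-year performance?
# Assumes `weekly_sessions` is a dict of {year: [weekly totals]} pulled from
# your analytics platform, and that `week >= lookback`.
import statistics

def is_surprising(weekly_sessions, year, week, lookback=8, threshold=2.0):
    # Year-on-year growth ratios for the weeks leading up to this one
    recent_ratios = [
        weekly_sessions[year][w] / weekly_sessions[year - 1][w]
        for w in range(week - lookback, week)
    ]
    expected_ratio = statistics.mean(recent_ratios)   # what we'd forecast
    normal_noise = statistics.stdev(recent_ratios)    # how much it usually wobbles

    this_week = weekly_sessions[year][week] / weekly_sessions[year - 1][week]
    return abs(this_week - expected_ratio) > threshold * normal_noise
```

A "True" here means the change is bigger than the usual week-to-week variation, so it’s worth digging into; a 20% jump that happens every year at this time would come back "False."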
3. Trusting our tools
Would you make a multi-million dollar business decision based on a number that your competitor could manipulate at will? Well, chances are you do, and the number can be found in Google Analytics. I’ve covered this extensively in other places, but there are some major problems with most analytics platforms around:
- How easy they are to manipulate externally
- How arbitrarily they group hits into sessions
- How vulnerable they are to ad blockers
- How they perform under sampling, and how obvious they make this
For example, did you know that, above a certain amount of traffic (~500,000 within the date range), the Google Analytics API v3 can heavily sample data whilst telling you that the data is unsampled? Neither did I, until we ran into it whilst building Distilled ODN.
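If you do need to pull large date ranges, one common mitigation is to split the query into smaller chunks (sampling is applied per query, based on the traffic in that query’s date range) and stitch the results back together. Here’s a rough sketch of the idea, where `fetch_report` is a hypothetical placeholder for whichever client call you’re actually using, not part of any Google library:

```python
# Rough sketch: query one day at a time to stay under sampling thresholds,
# then combine the rows. `fetch_report(start, end)` is a hypothetical
# placeholder assumed to return a list of rows for that date range.
from datetime import date, timedelta

def fetch_unsampled(fetch_report, start: date, end: date):
    rows = []
    day = start
    while day <= end:
        rows.extend(fetch_report(day, day))  # one request per day
        day += timedelta(days=1)
    return rows
```

It’s slower, you still have to aggregate the chunks yourself, and you need to be careful with metrics like users that don’t sum cleanly across days, but it sidesteps a lot of silent sampling.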
Similar problems exist with many “Search Analytics” tools. My colleague Sam Nemzer has written a bunch about this — did you know that most rank tracking platforms report completely different rankings? Or how about the fact that the keywords grouped by Google (and thus tools like SEMRush and STAT, too) are not equivalent, and don’t necessarily have the volumes quoted?
It’s important to understand the strengths and weaknesses of tools that we use, so that we can at least know when they’re directionally accurate (as in, their insights guide you in the right direction), even if not perfectly accurate. All I can really recommend here is that skilling up in SEO (or any other digital channel) necessarily means understanding the mechanics behind your measurement platforms — which is why all new starts at Distilled end up learning how to do analytics audits.
One of the most common solutions to the root problem is combining multiple data sources, but…
4. Combining data sources
There are numerous platforms out there that will “defeat (not provided)” by bringing together data from two or more of:
- Analytics
- Search Console
- AdWords
- Rank tracking
The problems here are that, firstly, these platforms do not have equivalent definitions, and secondly, ironically, (not provided) tends to break them.
Let’s deal with definitions first, with an example — say we want to count the organic visits landing on a particular page:
- In Search Console, these are reported as clicks, and can be vulnerable to heavy, invisible sampling when multiple dimensions (e.g. keyword and page) or filters are combined.
- In Google Analytics, these are reported using last non-direct click, meaning that your organic traffic includes a bunch of direct sessions, time-outs that resumed mid-session, etc. That’s without getting into dark traffic, ad blockers, etc.
- In AdWords, most reporting uses last AdWords click, and conversions may be defined differently. In addition, keyword volumes are bundled, as referenced above.
- Rank tracking is location specific, and inconsistent, as referenced above.
Fine, though — it may not be precise, but you can at least get to some directionally useful data given these limitations. However, about that “(not provided)”...
Most of your landing pages get traffic from more than one keyword. It’s very likely that some of these keywords convert better than others, particularly if they are branded, meaning that even the most thorough click-through rate model isn’t going to help you. So how do you know which keywords are valuable?
The best answer is to generalize from AdWords data for those keywords, but it’s very unlikely that you have analytics data for all those combinations of keyword and landing page. Essentially, the tools that report on this make the very bold assumption that a given page converts identically for all keywords. Some are more transparent about this than others.
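To make that assumption visible, here’s roughly the logic these tools rely on, sketched in Python with invented numbers and structures (nothing here comes from a real export):

```python
# Rough sketch of the assumption behind "defeat (not provided)" tools: every
# keyword sending traffic to a landing page is treated as converting at that
# page's blended conversion rate. All data below is invented for illustration.

# Hypothetical Search Console rows: (keyword, landing page, clicks)
sc_rows = [
    ("blue widgets", "/widgets/", 400),
    ("acme widgets", "/widgets/", 250),      # branded: probably converts far better
    ("what is a widget", "/widgets/", 900),  # informational: probably far worse
]

# Hypothetical analytics data: page-level conversion rate, all keywords blended
page_conversion_rate = {"/widgets/": 0.02}

# The bold assumption: conversions are spread evenly across keywords
estimated_conversions = {
    keyword: clicks * page_conversion_rate[page]
    for keyword, page, clicks in sc_rows
}
print(estimated_conversions)
```

Every keyword gets credited with a 2% conversion rate, because that’s all the page-level data can tell you; the branded term is almost certainly doing better than that, and the informational one worse.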
Again, this isn’t to say that those tools aren’t valuable — they just need to be understood carefully. The only way you could reliably fill in these blanks created by “not provided” would be to spend a ton on paid search to get decent volume, conversion rate, and bounce rate estimates for all your keywords, and even then, you’ve not fixed the inconsistent definitions issues.
Bonus peeve: Average rank
I still see this way too often. Three questions:
- Do you care more about losing rankings for ten very low volume queries (10 searches a month or less) than for one high volume query (millions plus)? If the answer isn’t “yes, I absolutely care more about the ten low-volume queries”, then this metric isn’t for you, and you should consider a visibility metric based on click through rate estimates.
- When you start ranking at 100 for a keyword you didn’t rank for before, does this make you unhappy? If the answer isn’t “yes, I hate ranking for new keywords,” then this metric isn’t for you — because that will lower your average rank. You could of course treat all non-ranking keywords as position 100, as some tools allow, but is a drop of 2 average rank positions really the best way to express that 1/50 of your landing pages have been de-indexed? Again, use a visibility metric, please.
- Do you like comparing your performance with your competitors? If the answer isn’t “no, of course not,” then this metric isn’t for you — your competitors may have more or fewer branded keywords or long-tail rankings, and these will skew the comparison. Again, use a visibility metric (there’s a rough sketch of one just below).
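For reference, here’s what a simple visibility metric might look like: weight each ranking keyword by its search volume and an estimated click-through rate for its position. The CTR-by-position values below are invented placeholders rather than a published curve, so swap in whichever model you trust:

```python
# Minimal sketch of a volume- and CTR-weighted visibility metric, as an
# alternative to average rank. CTR values are illustrative placeholders.
CTR_BY_POSITION = {1: 0.30, 2: 0.15, 3: 0.10, 4: 0.07, 5: 0.05,
                   6: 0.04, 7: 0.03, 8: 0.03, 9: 0.02, 10: 0.02}

def visibility_score(rankings):
    """rankings: list of (keyword, monthly_volume, position) tuples.
    Returns an estimate of organic clicks captured per month."""
    return sum(
        volume * CTR_BY_POSITION.get(position, 0.0)
        for _keyword, volume, position in rankings
    )
```

Unlike average rank, a high-volume keyword dominates the score (as it should), and picking up a new position-100 ranking adds roughly nothing rather than dragging the metric down.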
Conclusion
Hopefully, you’ve found this useful. To summarize the main takeaways:
- Critically analyse correlations & case studies by checking whether you can explain them as coincidence, reverse causation, joint causation (some third factor driving both), two linear trends that would correlate with almost anything, or narrow niche applicability.
- Don’t look at changes in traffic without looking at the context — what would you have forecasted for this period, and with what margin of error?
- Remember that the tools we use have limitations, and do your research on how that impacts the numbers they show. “How has this number been produced?” is an important component in “What does this number mean?”
- If you end up combining data from multiple tools, remember to work out the relationship between them — treat this information as directional rather than precise.
Let me know what data analysis fallacies bug you, in the comments below.
Great article Tom! Especially point 3 about trusting the tools you use!
Totally agree!
The text color in the email version of this article sent by Moz is such a light gray, and hard to read, that I have to come to the website to read the article. Maybe it's that way to drive traffic to the site?
Great article by the way, especially the correlation and causation parts.
This is definitely something to be careful of. We have multiple clients, each with different goals, so each of them needs different reports. The hardest part is standardizing our analytics and SEO reports so that we can get an accurate picture of how our marketing strategies went, but also compare from client to client and learn from our strategies.
I'd love to see examples of marketing reports, organized by company goals, that speak for themselves without being tainted by the bias of the person who worked on the report.
Thanks Tom for this article!
First off, excellent article, and I totally agree with all your cautionary tales.
However, I feel compelled to say a few words regarding our Stone Temple link study that you cited early in the post. I feel that the casual reader might infer (not necessarily your intention) that Eric had been guilty of the types of erroneous conclusions you cite in this post. Certainly it is possible that others drew false conclusions from Eric’s study, but I want to make clear that he did no such thing.
On the contrary, anyone who reads Eric’s full study will see that he was very careful to make clear that links are not the be-all and end-all of ranking. The main point and conclusion of the study is simply that by use of a more accurate statistical analysis Eric showed that links are actually more highly correlated with ranking than other studies had shown.
However, I feel you also left out an important premise of link correlation studies that sets them apart from many other correlated factors: Google has always been, and continues to be, very explicit and forthright that links to a page ARE a ranking factor. Not the only factor, and not even a necessary one in all cases, but still a factor. So the question with links is not “are they a factor?” but simply, “how much of a factor are they in aggregate?” Which is all Eric’s study set out to show.
This sets links apart from many other studies that boldly proclaim things like CTR or the amount of direct traffic to be actual direct factors simply because they correlate highly, despite the fact that Google has explicitly stated they are not direct factors. The reader may decide Google is lying in those cases, but then the burden of proof is on them to demonstrate that the thing MUST be a factor.
Again, a wonderful post, and I’m delighted you are calling out these common fallacies. I just think there are many better examples of studies making unjustified claims that you could have cited. (But hey, thanks for the Moz link! ;-)
Hi Mark,
Yep - absolutely not criticising correlation studies in general, and I suggested the same caveats in my own earlier this year (https://moz.com/blog/rankings-correlation-study-do...).
I only picked yours and SEMRush's because they're examples I've seen frequently cited recently, so take it as a compliment!
As for links as a ranking factor, I absolutely do think that they can influence rankings, but the picture is complex. I've written about this, too (https://moz.com/blog/state-of-links).
Thanks for the considered comment!
Tom
Thanks for the fast reply, Tom.
I certainly agree that when it comes to any ranking factor, the Facebook relationship status is "It's complicated" ;-) I only jumped in because I think Eric makes that abundantly clear in his study, stating explicitly that while links remain a powerful influence on ranking, they are far from the only important factor. He uses the example of the increasing importance of content quality, and that if your content sucks compared to your competition, links aren't going to help you all that much in a competitive space. Google isn't foolish enough to make anything hinge on just one factor these days, and as you point out in your other post, they certainly have the capacity to make many other judgments about what should rank.
My only purpose in commenting was not to accuse you of besmirching our study (I know that wasn't your intent!), but to make clear for the casual reader that our study was indeed very careful about the pitfalls you cite (including not jumping to conclusions), and was measured and balanced in its claims.
As I said earlier, all that aside, your post is extremely valuable, because even the most carefully researched, analyzed, and explicated studies can still be misinterpreted (or over-interpreted) by others once they are out in the wild. Thanks again for providing this valuable resource to our community!
Great article Tom, but a bit complex.
Very good post!!!
I always learn new things from you. Sometimes completely new things, other times details that had gone over my head but that I can immediately start putting into practice.
Great article,
I would be lying if I said I didn't jump to conclusions in the beginning when analyzing all the data for a site.
Analyzing whether I can explain the changes as coincidence, reverse causation, or joint causation is a great tip that I will add to my analysis technique.
Great guide
And I'd emphasize the importance of point 3. Often, depending on where we measure, the results will be different in some tools than in others, and the most likely situation is that none of them is telling us the whole truth.
Great article on being careful about collecting and presenting misleading data. Sometimes we do look in the wrong direction in certain circumstances and overlook some facts.
No. 3 is my favorite because we always tend to do that, but your point was indeed very strong.
Great article listing ways to avoid these endemic data analysis pitfalls.