In an industry where knowing exactly how to get ranked on Google is murky at best, SEO ranking factors studies can be incredibly alluring. But there's danger in believing every correlation you read, and wisdom in looking at it with a critical eye. In this Whiteboard Friday, Rand covers the myths and realities of correlations, then shares a few smart ways to use and understand the data at hand.
Video Transcription
Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we are chatting about SEO ranking factors and the challenge around understanding correlation, what correlation means when it comes to SEO factors.
So you have likely seen, over the course of your career in the SEO world, lots of studies like this. They're usually called something like ranking factors or ranking elements study or the 2017 ranking factors, and a number of companies put them out. Years ago, Moz started to do this work with correlation stuff, and now many, many companies put these out. So people from Searchmetrics and I think Ahrefs puts something out, and SEMrush puts one out, and of course Moz has one.
These usually follow a pretty similar format, which is they take a large number of search results from Google, from a specific country or sometimes from multiple countries, and they'll say, "We analyzed 100,000 or 50,000 Google search results, and in our set of results, we looked at the following ranking factors to see how well correlated they were with higher rankings." That is to say how much they predicted that, on average, a page with this factor would outrank a page without the factor, or a page with more of this factor would outrank a page with less of this factor.
Correlation in SEO studies like these usually means:
So, basically, in an SEO study, they usually mean something like this. They do something like a scatter plot. It doesn't have to be a scatter plot specifically, but some visualization of the results. Then they'll say, "Okay, linking root domains had a correlation with higher organic rankings in the 10 blue link-style results to the degree of 0.39." They'll usually use either Spearman or Pearson correlation. We won't get into that here. It doesn't matter too much.
Across this many searches, the metric predicted higher or lower rankings with this level of consistency. 1.0, by the way, would be perfect correlation. So, for example, if you were looking at days that end in Y and days that follow each other, well, there's a perfect correlation because every day's name ends in Y, at least in English.
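To make a number like 0.39 concrete, here is a minimal Python sketch of how such a figure could be produced. It assumes (the video doesn't spell out the method) that a study computes a Spearman correlation between a metric and ranking position within each search result page, then averages across all searches. The function names and the sample data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation, or None when either variable is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def mean_serp_correlation(serps):
    """Average per-SERP Spearman correlation.

    serps: list of SERPs, each a list of metric values ordered by
    ranking position (position 1 first).
    """
    scores = []
    for metric_values in serps:
        positions = range(1, len(metric_values) + 1)
        # Negate positions so "more of the metric at better positions"
        # comes out as a positive correlation.
        rho = spearman(metric_values, [-p for p in positions])
        if rho is not None:
            scores.append(rho)
    return mean(scores)


# Hypothetical linking-root-domain counts for three 5-result SERPs.
example_serps = [
    [120, 80, 95, 40, 10],
    [15, 60, 30, 45, 5],
    [200, 150, 90, 70, 20],
]
print(mean_serp_correlation(example_serps))
```

A SERP where the metric falls steadily from position 1 downward scores 1.0, one where it rises steadily scores -1.0, and the headline number is the average over many such SERPs.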
So search visits, let's walk down this path just a little bit. So search visits, saying that that's 0.47 correlated with higher rankings, if that sounds misleading to you, it sounds misleading to me too. The problem here is that's not necessarily a ranking factor. At least I don't think it is. I don't think that the more visits you get from search from Google, the higher Google ranks you. I think it's probably that the correlation runs the other way around: the higher you rank in search results, the more visits on average you get from Google search.
So these ranking factors, I'll run through a bunch of these myths, but these ranking factors may not be factors at all. They're just metrics or elements where the study has looked at the correlation and is trying to show you the relationship on average. But you have to understand and intuit this information properly, otherwise you can be very misled.
Myths and realities of correlation in SEO
So let's walk through a few of these.
1. Correlation doesn't tell us which way the connection runs.
So it does not say whether factor X influences the rankings or whether higher rankings influence factor X. Let's take another example: number of Facebook shares. Could it be the case that search results that rank higher in Google oftentimes get people sharing them more on Facebook because they've been seen by more people who searched for them? I think that's totally possible. I don't know whether it's the case. We can't prove it right here and now, but we can certainly say, "You know what? This number does not necessarily mean that Facebook shares influence Google results." It could be the case that Google results influence Facebook shares. It could be the case that there's a third factor that's causing both of them. Or it could be the case that there's, in fact, no relationship and this is merely a coincidental result, probably unlikely given that there is some relationship there, but possible.
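The "third factor" case can be illustrated with a small simulation. In this sketch, an invented hidden variable called quality drives both Facebook shares and a ranking score, while shares have no effect on the score at all. Every name, distribution, and sample size here is an assumption chosen for illustration, not data from any study.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=5000, seed=0):
    """Return (raw, controlled) correlations between shares and ranking score."""
    rng = random.Random(seed)
    # Hidden third factor, e.g. how good the page is.
    quality = [rng.gauss(0, 1) for _ in range(n)]
    # Both observed variables depend on quality plus independent noise.
    # Note that rank_score never reads from shares.
    shares = [q + rng.gauss(0, 1) for q in quality]
    rank_score = [q + rng.gauss(0, 1) for q in quality]

    raw = pearson(shares, rank_score)
    # Remove the hidden factor from both sides and correlate what is left.
    controlled = pearson(
        [s - q for s, q in zip(shares, quality)],
        [r - q for r, q in zip(rank_score, quality)],
    )
    return raw, controlled


raw, controlled = simulate()
print(f"shares vs. ranking score: {raw:.2f}")
print(f"after removing quality:   {controlled:.2f}")
```

Under these assumptions the raw correlation lands near 0.5 even though shares never feed into the score, and it all but disappears once the hidden factor is subtracted out. A real study rarely gets to observe the third factor, which is why the raw number alone can't settle the direction of the connection.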
2. Correlation does not imply causation.
This is a famous quote, but let's continue with the famous quote. But it sure is a hint. It sure is a hint. That's exactly what we like to use correlation for: as a hint of things we might investigate further. We'll talk about that in a second.
3. In an algorithm like Google's, with thousands of potential ranking inputs, if you see any single metric at 0.1 or higher, I tend to think that, in general, that is an interesting result.
It doesn't prove something, and it doesn't mean that there's a direct connection. It is just interesting. It's worthy of further exploration. It's worthy of understanding. It's worthy of forming hypotheses and then trying to prove those wrong. It is interesting.
4. Correlation does tell us what more successful pages and sites do that less successful sites and pages don't do.
Sometimes, in my opinion, that is just as interesting as what is actually causing rankings in Google. So you might say, "Oh, this doesn't prove anything." What it proves to me is pages that are getting more Facebook shares tend to do a good bit better than pages that are not getting as many Facebook shares.
I don't really care, to be honest, whether that is a direct Google ranking factor or whether that's just something that's happening. If it's happening in my space, if it's happening in the world of SERPs that I care about, that is useful information for me to know and information that I should be applying, because it suggests that my competitors are doing this and that if I don't do it, I probably won't be as successful, or I may not be as successful as the ones who are. Certainly, I want to understand how they're doing it and why they're doing it.
5. None of these studies that I have ever seen so far have looked specifically at SERP features.
So one of the things that you have to remember, when you're looking at these, is think organic, 10 blue link-style results. We're not talking about AdWords, the paid results. We're not talking about Knowledge Graph or featured snippets or image results or video results or any of these other, the news boxes, the Twitter results, anything else that goes in there. So this is kind of old-school, classic organic SEO.
6. Correlation is not a best practice.
So it does not mean that because this list descends and goes down in this order that those are the things you should do in that particular order. Don't use this as a roadmap.
7. Low correlation does not mean that a metric or a tactic doesn't work.
For example, a high percentage of sites using an element or a tactic will result in a very low correlation. So, for example, when we first did this study in I think it was 2005 that Moz ran its first one of these, maybe it was '07, we saw that keyword use in the title element was strongly correlated. I think it was probably around 0.2, 0.15, something like that. Then over time, it's gone way, way down. Now, it's something like 0.03, extremely small, infinitesimally small.
What does that mean? Well, it could mean one of three things. One, it could mean Google is using it less as a ranking factor. Two, it could mean that it was never connected, and it's just total speculation, total coincidence. Or three, it could mean that a lot more people who rank in the top 20 or 30 results, which is what these studies usually look at, top 10 to top 50 sometimes, a lot more of them are putting the keyword in the title, and therefore, there's just no difference between result number 31 and result number 1, because they both have it in the title. So you're seeing a much lower correlation between pages that don't have it and do have it and higher rankings. So be careful about how you intuit that.
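The third explanation can be sketched in a few lines of Python. In this toy model, having the keyword in the title adds exactly the same fixed boost to a page's ranking score in every run, and the only thing that changes is the share of pages using it. The scoring model, the effect size, and the adoption rates are all assumptions made up for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def title_keyword_correlation(adoption, n=20000, effect=1.0, seed=1):
    """Correlation between 'keyword in title' and a toy ranking score.

    adoption: fraction of pages that put the keyword in the title.
    effect:   boost the keyword adds to the score (held constant).
    """
    rng = random.Random(seed)
    has_keyword = [1 if rng.random() < adoption else 0 for _ in range(n)]
    # Score = constant keyword boost + everything else (modeled as noise).
    score = [effect * k + rng.gauss(0, 1) for k in has_keyword]
    return pearson(has_keyword, score)


for adoption in (0.5, 0.9, 0.99):
    r = title_keyword_correlation(adoption)
    print(f"{adoption:.0%} of pages use the keyword -> correlation {r:.2f}")
```

The keyword helps just as much in every run, yet the measured correlation shrinks as adoption approaches 100%, because there are almost no pages left without it to compare against. A falling correlation on its own therefore can't distinguish "Google cares less" from "everyone does it now."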
Oh, one final note. I did put -0.02 here. A negative correlation means that as you see less of this thing, you tend to see higher rankings. Again, unless there is a strong negative correlation, I tend to watch out for these, or I tend to not pay too much attention. For example, the keyword in the meta description, it could just be that, well, it turns out pretty much everyone has the keyword in the meta description now, so this is just not a big differentiating factor.
What is correlation good for?
All right. What's correlation actually good for? We talked about a bunch of myths, ways not to use it.
A. IDing the elements that more successful pages tend to have
So if I look across a correlation and I see that pages that rank highly are twice as likely to have X as the ones that don't rank highly, well, that is a good piece of data for me.
B. Watching elements over time to see if they rise or lower in correlation.
For example, we watch links very closely over time to see if they rise or lower so that we can say: "Gosh, does it look like links are getting more or less influential in Google's rankings? Are they more or less correlated than they were last year or two years ago?" And if we see that drop dramatically, we might intuit, "Hey, we should test the power of links again. Time for another experiment to see if links still move the needle, or if they're becoming less powerful, or if it's merely that the correlation is dropping."
C. Comparing sets of search results against one another to identify unique attributes that might be true
So, for example, in a vertical like news, we might see that domain authority is much more important than it is in fitness, where smaller sites potentially have much more opportunity or dominate. Or we might see that something like https is not a great way to stand out in news, because everybody has it, but in fitness, it is a way to stand out and, in fact, the folks who do have it tend to do much better. Maybe they've invested more in their sites.
D. Judging a metric's predictive ranking ability
Essentially, when I'm looking at a metric like domain authority, how good is that at telling me on average how much better one domain will rank in Google versus another? I can see that this number is a good indication of that. If that number goes down, domain authority is less predictive, less sort of useful for me. If it goes up, it's more useful. I did this a couple years ago with Alexa Rank and SimilarWeb, looking at traffic metrics and which ones are best correlated with actual traffic, and found Alexa Rank is awful and SimilarWeb is quite excellent. So there you go.
E. Finding elements to test
So if I see that large images embedded on a page that's already ranking on page 1 of search results have a 0.61 correlation with the image from that page ranking in the first few image results, wow, that's really interesting. You know what? I'm going to go test that and take big images and embed them on my pages that are ranking and see if I can get the image results that I care about. That's great information for testing.
This is all stuff that correlation is useful for. Correlation in SEO, especially when it comes to ranking factors or ranking elements, can be very misleading. I hope that this will help you to better understand how to use and not use that data.
Thanks. We'll see you again next week for another edition of Whiteboard Friday.
Video transcription by Speechpad.com
The image used to promote this post was adapted with gratitude from the hilarious webcomic, xkcd.
I often feel these studies do more harm than good. Too many people just read these reports and then change their site based on the findings.
Even the term, "ranking factors study" is misleading and unhelpful. The dictionary definition of "factor" is "a circumstance, fact, or influence that contributes to a result". These studies show nothing of the kind and should be more accurately named as a "Ranking Correlation Study".
It doesn't help to reduce the confusion when SEO tool companies describe their methodology as "... unique to the field of SEO studies — we traded correlation analysis for the Random Forest machine learning algorithm".
Once you understand that US sales of ice cream are closely correlated to the US murder rate, you can then view these studies with the appropriate perspective.
The 10 Most Bizarre Correlations:
https://www.buzzfeed.com/kjh2110/the-10-most-bizar...
To benefit from correlation you need to understand what it is. It's very clear that this study is based on correlations in the SERP to highlight differences. Some of those might be direct ranking factors, others could be indirect ranking factors and others should be more or less ignored. But you might also find other interesting data.
If you get correlation data from research, you have interesting data to look at. If you just put together random data to find a correlation (ice cream sales vs. murder rate), you have meaningless data.
If you don't have enough knowledge to actually benefit from the data, then it's recommended to ignore it altogether. It's the same in other fields. I would never judge correlation data in medicine because I lack enough knowledge to benefit from it.
Edit: The point is that they do not do more harm than good. They are actually quite helpful and interesting.
They may be of some help to those that understand the distinction.
Unfortunately, I talk to far too many business owners and marketing execs who do not understand this distinction. Most of them have a very limited understanding of statistics.
They see these reports posted on social media, believe that these are causation factors and then act on them.
I am not suggesting that these reports should not be published, but it should be explained in much clearer terms what they actually represent.
Agreed Danny - "Correlation Study" or "Correlated Metrics" would be far better titles, but also less sexy (and less keyword optimized, since all the search volume is for "ranking factors").
Interesting..
Thanks for watching WBFriday all! I had a good email exchange with Ted Kubaitis of SEOToolLab.com that he's given permission for me to share here (and I think is worth a read):
From Ted:
I watched your recent video on correlation and feel you got a few things wrong. If you like I would be happy to come on WBF and discuss.
Reply from Rand:
Follow-Up from Ted:
Hope this is as useful to y'all as it was to me!
Thanks Rand for sharing this. It's interesting and useful to read conversations like this.
Well, I still believe relevance plays a great part when it comes to ranking. Traffic is obviously important as well, because without traffic, a website is dead, like a body without blood.
If you ask me, you need to do proper on-page optimization, have the right keyword density, and write valuable, unique content that is for humans to read, which is obvious. Also do geo-tagging if you want to rank locally, and include NAP for the specific location.
Then it comes to off-page and link building. I've noticed cases where two people had very similar links and power on their websites, but the one with more Facebook shares and social signals in general ranked higher on Google. So you are right, this is definitely something you should look for. Also, it would be unrealistic to have all the links and social signals but no traffic.
Not sure what other people think, would like to hear other opinions as well. :)
I completely agree, nickey: without traffic corresponding to links, you will never rank, because your site is seen as spam by Google.
I also think relevance is king. Proper content with your keywords in mind (on-site SEO) is the best ranking method; you don't even need links!
Good WBF Rand! It will sort out a lot of misunderstandings about correlations.
Correlations are interesting and it requires at least a decent understanding of the subject to be able to sort out the good parts. You need to be able to argue logically about a correlation to know the meaning of it. If you can't do that you lack information to jump to any conclusions at all. To understand when you lack information can often be a problem.
An interesting thing about collective correlation data like this is to find different sites/pages that, based on the data, should rank equally but do not. Comparing cases like that can open new doors or identify differences that are really hard to collect (like UI/UX).
Agreed! Always hard to know what pages are actually "similar" enough to warrant that investigation, but for testing purposes, a new domain or multiple new pages on the same domain linked internally in similar ways can certainly help.
Solid overall analysis and breakdown of the difference between causation & correlation. I've seen a lot of "ranking factors" throughout the years that were more correlation, but that people treated as 100% causation.
Another great WBF to kickoff 2018! Looking forward to more awesome ones this year as well Rand.
Want to reference the ranking factors post we put out a couple of days ago if I may (https://www.branded3.com/blog/seo-ranking-factors-...): we didn't analyse lots of queries...we took the analysis of lots of queries that Moz and its contributors put out 2 years ago and, over those 2 years, attempted to influence those factors with the actions we took (we regularly write about what those actions are too).
For me that's the point of these studies: not "what is Google doing?" because I doubt even Google really knows, but "what can I do?" and this is where you're
Appreciate you linking to that! I didn't know you'd have it out when we filmed this or I would have mentioned and included it.
Excellent post, Rand!! I did not think the FB shares were so strong. This shows and reinforces the attention we must give to our FB fan page and to getting followers who are segmented and interested in our subject.
Great WBF as usual Rand.
In my opinion, if you want to be successful in SEO, you have to be agile and always open to changing your approach, because if more competitors implement similar techniques, you might rank lower over time. Having said that, it is also important to be relevant with your copy rather than just pack loads of keywords in.
As the Google algorithm changes fast and moves in the direction of AI, I guess in 5-6 years' time we won't necessarily have to use any keywords in our copy and will still be able to rank high (remember that we can have a chat on a topic without mentioning "keywords" and still know and understand what we're talking about).
I agree with you, the use of keywords is becoming less and less important, so in a few years' time it may not be necessary to include them.
I'm going to disagree for three reasons:
1) I think smart, high quality content creators will still bias to use keywords intelligently, and thus Google's ML algorithms will notice that content which tends to be more relevant/useful also tends to have smart keyword targeting
2) Keywords are a user signal too -- when you search for something and see a result that has words that generally mean the same thing but aren't exactly the same, it takes some extra processing power in your brain to connect those up vs. very close keyword matches that require less of that. The "processing fluency" bias is well observed in human beings and I think we'll keep seeing it in searcher behavior.
3) For a lot of search queries, especially in the mid and long tail (though also present in the head), keyword use is hard to separate from relevance in any way. A query for "Lin Manuel Miranda" that results in pages that don't use his name? Or even pages that do but aren't specifically titled with it? Just doesn't make sense. Same in the tail -- a query for "Excel macro to average all columns in a row" that doesn't have at least "Excel" and "Macro" and "columns in a row" just won't be relevant.
I agree that in some cases, especially around news topics or research topics, there will be increasing flexibility on exact match vs. intent match of keyword use. But I don't think keywords are going away even in the long term. They're how we think, how we search, and how we determine if the information we find is relevant.
These "correlations" and "ranking factors" annoy me to no end and are a huge disservice to the SEO community. I wish Moz would stop publishing them. They are bad applications of statistics, even more so now with the changes in how the algorithm functions: machine learning/A.I. makes it all that much harder to ascertain what element gets how much weight and consideration. This is far too simplistic an approach, especially when you know that rankings are a composite of many, many, many more considerations.
My advice and what we do in my group is think of how a human being would process and absorb information/content and optimize page structure, text, copy, etc. accordingly.
Agree with you 100% Jose. I think they get people focused on all of the wrong things.
When it comes to content creation, we need to be doing more search user optimization and less search engine optimization. That, after all, is exactly what Google is optimizing for!
LOVED this!
Thank you Rand for taking us back to statistics 101.
It's super important to understand that a lot of factors affect correlation and it shouldn't be taken as one-to-one.
Super-minor point, but I assume the green number in the X/Y chart should be 0.31 not 0.39.
Thanks for Fridays.
People have been looking for things that correlate with Google search results for a long time. Back in about the late 1990s Web Position Gold looked at onpage optimization elements and recommended changes to your page. That was ingenious at the time, but Google had already moved on to links.
Today we might look at lots of different factors that include "observable" information like onpage factors, links, social, etc. However, Google has probably moved on to "unobservable" information or difficult to observe information such as visitor behaviors in the SERPs, visitor behavior on the site, assessments of E-A-T, complex content assessment and more. So, correlations that we are able to identify are only a fraction of what Google has to work with - and the unobservable is probably the more valuable, I am guessing.
Totally agree that the un-observable play a big role, but we also keep getting access to more and more of the kinds of data Google gets (in particular, clickstream data from providers like SimilarWeb and Jumpshot). I think those types of factors will be really fascinating to watch and hopefully some of the big SEO firms (Moz included) will make some efforts to do so.
Good stuff. Most of those that I've seen citing the Moz/SEMrush correlation studies appear to be misusing them; sharing and using as flat fact. This should probably be a disclaimer alongside future annual studies.
We chose patent filings + direct statements + experimental studies for our list (https://northcutt.com/wr/google-ranking-factors/), requiring 2 of the 3 to gain significant confidence in a factor. But if you understand that there are shades of uncertainty in almost any source of information about SEO, correlation is at least an interesting starting place.
A Great start for 2018, Rand!
In my opinion, the same correlation can sometimes change with target audience and target location too. Some things are more likely to rank better because of Factor "X" in Location "A" but may not do as well in Location "C or D".
For me, this is something I keep testing and analyzing at regular intervals to see which factors are booming and how we can take advantage of them in our SERPs.
Thanks
Hi Ankit, Totally Agreed with you. As Rand said, these studies trigger ideas and you have to keep testing which ones work for you.
Totally agree. We've seen correlations look pretty different in the US vs. UK and even more different in non-English-languages. Makes sense to watch correlations on as precise a scale as you can (one reason I keep hoping Moz will invest in showing you correlations for just the keyword set you're tracking in the product -- I think that would be a killer feature).
Point 7 is one that has interested me since these started appearing, particularly in regards with content and attributes such as the keyword appearing in the title - they always seemed to underestimate the ranking ability of keywords.
Is there another way these studies could be done though? Perhaps compare the top 10 / 20 pages that rank against a wider set of pages that don't rank? It would seem impossible to me though to get a data set big enough to avoid bias without being Google!
Rand, this is a great post and video!! Thank you for taking the time to put it all together.
I agree with most of what you have here. I think that your comment that introduces the arguments from Ted helps to shed light on the areas I was not quite on board with. However I feel that you may have benefited from including the most important aspect of Google Search Algorithms, the human factor.
This is a post about correlation, and I understand that, but with the inclusion of the human factor your argument would have been strengthened. The human mind is sporadic at best. We search for one thing, see a result that is interesting, and go in a completely different direction. Google is trying to track our decisions and decipher them so that they can sell more AdWords. The more we like our search results, the more we use Google over others, the more impressions they sell for AdWords, the more money Google makes.
As SEO professionals, whatever that means these days, we all have to remember that at the end of the day, our websites are focused on spreading a message or providing a service. If you have clearly defined your website's purpose and know why you have created it, creating content and then sharing that content with your target audience will always be better than any SEO tricks you can learn from SEO correlation data.
Once again great food for thought, and this time it is quite heavy to digest. This is actually an overhauling WBF to make me have a look back at the SEO ranking factors and correlation.
Joseph Dyson
Awesome Rand. I had this idea in mind for a while and kept discouraging people who used correlations as their best practices.
Seems like it's currently getting proved.
Apologies for being late to the conversation but this same topic has been hotly discussed at our agency in recent weeks.
The general consensus here is that although correlation studies can be interesting, the fact that we're now dealing with a much more sophisticated algorithm which includes machine learning means that over time, we're seeing the value of these studies decline.
Searchmetrics started to do sector-specific studies, which are slightly more useful, but internally, we did a study of a particular sector in the UK and found that for some of the metrics we looked at, the correlations to increased rankings were up to 3x stronger when compared to the 'all industry' data we collect. Whilst the variances in correlation scores appear significant, the actual correlation scores were still low, usually between 0.1 and 0.25.
Statistically, this is a very low correlation and we ended up concluding that, in this study, an aggregated analysis of ranking variances provided insufficient evidence for actionable insight.
We continue to explore ways of unearthing insight and are making some good progress with alternative methods but ultimately, the truth is that Google has access to data that we can only dream of as SEOs and in a world of a multi-layered algorithm with potentially thousands of signals and machine learning in the mix, we're always a few steps behind. Of course, that's what makes it interesting!
We put together a little write up of the study mentioned above if anybody is interested in reading more about it: https://www.searchlaboratory.com/2018/01/how-to-identify-the-right-seo-strategy-for-your-industry/
Incredible post.
Interesting content, but a bit thick for me. It's been a long time since I last studied statistics at university :-)
It is very true that it all depends on what the other pages/competitors are doing.
What you mention about https being a great factor for fitness and not for news (because all of them have it) is definitely something to keep in mind.
If you could find a great factor like that in your niche, it's an awesome, fast and cheap way to get higher in the rankings.
A wonderful post. I usually used to look at the social shares of the top organic results along with social platforms. But with this WBF, you just cleared up many of my doubts. Thanks
Great post Rand!
I would task each and every one of us to sincerely ponder correlation with each assumption about SEO we make. At best, it will save you time and energy; at worst, it’ll force you to fully understand all the angles of a situation before tackling it.
What a good post, I have cleared many doubts I had about the factors that really affect SEO positioning.
Thank you.
Thanks for this post Rand,
Do you have any studies that present ranking factor correlations? I would love to apply your teaching to a real world example!
Regards!
Sure - there's a good number out there:
https://www.searchmetrics.com/knowledge-base/ranki...
https://www.semrush.com/blog/semrush-ranking-facto...
https://backlinko.com/search-engine-ranking
and of course Moz's https://moz.com/search-ranking-factors/correlation...
Thank you so much. I will look into these.
It is the same with economics or anything else: most people think correlation is causality. So, as Moz tells us (and Moz ranks in the top position for Google algorithm changes), "Each year, Google changes its search algorithm around 500–600 times." I would say it is important to have a strategy and to adapt it, of course, but not to change it every day, unless there is a major breakthrough, for example if Google's bots become able to crawl voice. Soon soon :-)
Again a thought-provoking topic for WBF! Do you have any other post or document available on the same? I'm curious to research a lil more and analyse your theory.
I'm starting to think that Google purposely tweaks the ranking algorithm just to mess with our theories about causation and correlation. It gives me anxiety.
These SEO factors have really changed over the last two years. People who are addicted to SEO know how important Whiteboard Friday is for SEO people, and today we learned the latest SEO trends for 2018, which apply to everyone! Again, link building and HTTPS are still important SEO factors in 2018.
Great post and as always also a great video Rand!!
Very useful and interesting article ;)
Thanks Rand.