In an industry where knowing exactly how to get ranked on Google is murky at best, SEO ranking factors studies can be incredibly alluring. But there's danger in believing every correlation you read, and wisdom in examining each one with a critical eye. In this Whiteboard Friday, Rand covers the myths and realities of correlation, then shares a few smart ways to use and understand the data at hand.

SEO Ranking Factors and Correlation


Video Transcription

Howdy, Moz fans, and welcome to another edition of Whiteboard Friday. This week we are chatting about SEO ranking factors and the challenge around understanding correlation, what correlation means when it comes to SEO factors.

So you have likely seen, over the course of your career in the SEO world, lots of studies like this. They're usually called something like a ranking factors study, a ranking elements study, or "the 2017 ranking factors," and a number of companies put them out. Years ago, Moz started doing this correlation work, and now many, many companies publish these studies. Searchmetrics puts one out, I think Ahrefs does too, SEMrush puts one out, and of course Moz has one.

These usually follow a pretty similar format: they take a large number of search results from Google, from a specific country or sometimes from multiple countries, and they'll say, "We analyzed 100,000 or 50,000 Google search results, and in our set of results, we looked at the following ranking factors to see how well correlated they were with higher rankings." That is to say, how well each factor predicted that, on average, a page with the factor would outrank a page without it, or that a page with more of the factor would outrank a page with less of it.

Correlation in SEO studies like these usually means:

So that's basically what correlation means in an SEO study. They'll often show a scatter plot, or at least some visualization of the results. Then they'll say, "Okay, linking root domains were correlated with higher organic rankings in the 10 blue link-style results to the degree of 0.39." They'll usually use either Spearman or Pearson correlation. We won't get into the difference here. It doesn't matter too much.

Across this many searches, the metric predicted higher or lower rankings with this level of consistency. 1.0, by the way, would be perfect correlation, and -1.0 would be perfect negative correlation. So, for example, if you were looking at days that end in Y and days that follow each other, well, there's a perfect correlation because every day's name ends in Y, at least in English.
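To make "across this many searches" concrete, here is a minimal Python sketch. It assumes a study correlates a metric with position inside each SERP and then averages those per-SERP values; that is one common approach, not necessarily what any particular study does. The function names and the three toy SERPs are hypothetical, and the Spearman/Pearson choice is exposed as a parameter.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either sequence is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / (vx * vy) ** 0.5


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def mean_serp_correlation(serps, method=spearman):
    """serps: one list per search result page, holding the metric's value for
    each result in position order (position 1 first). Correlates the metric
    with 'ranking higher' inside each SERP, then averages across SERPs."""
    per_serp = []
    for metric_values in serps:
        # Negate positions so a better (smaller) position is a larger number.
        higher_rank = [-p for p in range(1, len(metric_values) + 1)]
        rho = method(metric_values, higher_rank)
        if rho == rho:  # NaN != NaN: skip SERPs where the metric never varies
            per_serp.append(rho)
    return mean(per_serp)


serps = [
    [50, 40, 30, 20, 10],  # metric falls steadily with position: rho = 1.0
    [10, 20, 30, 40, 50],  # metric rises with position: rho = -1.0
    [9, 7, 8, 2, 1],       # mostly aligned, one swap: rho = 0.9
]
print(mean_serp_correlation(serps))  # about 0.3
print(mean_serp_correlation(serps, method=pearson))
```

Spearman only looks at the ordering of the metric's values, while Pearson rewards a linear relationship, so on heavily skewed data the two numbers can differ somewhat.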

So let's walk down this path a little bit with search visits. If a study says search visits have a 0.47 correlation with higher rankings, and that sounds misleading to you, it sounds misleading to me too. The problem is that search visits are not necessarily a ranking factor. At least I don't think they are. I don't think that the more visits you get from Google search, the higher Google ranks you. I think it's probably the other way around: the higher you rank in search results, the more visits you get from Google search, on average.
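Here is a small simulation of that "other way around" explanation, under loudly hypothetical assumptions: the order of results is fixed first, visits are then generated from position through a made-up click-through-rate curve (the CTR values are illustrative, not measured), and visits never influence rank. Even so, visits come out strongly correlated with ranking higher.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


# Hypothetical click-through rates for positions 1-10 (illustrative only).
CTR = [0.30, 0.15, 0.10, 0.07, 0.05, 0.04, 0.03, 0.025, 0.02, 0.015]


def simulate_serp_visits(rng, monthly_searches):
    """Visits per position, position 1 first. The order of results is fixed
    before any visits happen; visits are produced *by* position via CTR."""
    visits = []
    for ctr in CTR:
        expected = monthly_searches * ctr
        visits.append(max(0.0, rng.gauss(expected, 0.2 * expected)))
    return visits


def mean_visits_rank_correlation(n_serps=1000, seed=1):
    """Average per-SERP correlation between visits and ranking higher."""
    rng = random.Random(seed)
    higher_rank = [-p for p in range(1, len(CTR) + 1)]  # larger = better position
    rhos = [
        pearson(simulate_serp_visits(rng, rng.uniform(100, 10_000)), higher_rank)
        for _ in range(n_serps)
    ]
    return sum(rhos) / len(rhos)


print(round(mean_visits_rank_correlation(), 2))
```

Nothing in this model lets visits push a page up, yet a correlation study run on its output would report visits as strongly "correlated with higher rankings."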

So I'll run through a bunch of myths in a minute, but first: these ranking factors may not be factors at all. They're just metrics or elements whose correlation with rankings the study has measured, and the study is trying to show you that relationship on average. You have to understand and interpret this information properly; otherwise you can be very misled.

Myths and realities of correlation in SEO

So let's walk through a few of these.

1. Correlation doesn't tell us which way the connection runs.

So it does not say whether factor X influences the rankings or whether higher rankings influence factor X. Let's take another example: number of Facebook shares. Could it be the case that search results that rank higher in Google often get shared more on Facebook because more people have seen them after searching? I think that's totally possible. I don't know whether it's the case. We can't prove it right here and now, but we can certainly say, "You know what? This number does not necessarily mean that Facebook shares influence Google results." It could be that Google results influence Facebook shares. It could be that a third factor is causing both of them. Or it could be that there's, in fact, no relationship and this is merely a coincidental result, which is unlikely given that the relationship shows up across so many searches, but possible.
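The "third factor" possibility can be simulated the same way. In this hypothetical model, a hidden quality score drives both rankings and Facebook shares, and shares get zero weight in the ranking score. The share counts, noise levels, and function names are all made up for illustration.

```python
import random

N_RESULTS = 10


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate_serp_shares(rng, share_effect):
    """Hypothetical model: a hidden 'quality' value drives the ranking and,
    with strength share_effect, Facebook shares. Shares are never used in
    the ranking score. Returns shares in position order, position 1 first."""
    quality = [rng.gauss(0, 1) for _ in range(N_RESULTS)]
    shares = [max(0.0, 100 + share_effect * q + rng.gauss(0, 40)) for q in quality]
    ranking_score = [q + rng.gauss(0, 1) for q in quality]  # note: no shares term
    by_position = sorted(range(N_RESULTS), key=lambda i: ranking_score[i], reverse=True)
    return [shares[i] for i in by_position]


def mean_shares_rank_correlation(share_effect=40, n_serps=2000, seed=7):
    """Average per-SERP correlation between shares and ranking higher."""
    rng = random.Random(seed)
    higher_rank = [-p for p in range(1, N_RESULTS + 1)]
    rhos = [pearson(simulate_serp_shares(rng, share_effect), higher_rank) for _ in range(n_serps)]
    return sum(rhos) / len(rhos)


print(round(mean_shares_rank_correlation(), 2))
print(round(mean_shares_rank_correlation(share_effect=0), 2))
```

With share_effect=0 (shares unrelated to quality), the average correlation collapses to roughly zero, so in this toy model the correlation comes entirely from the shared cause.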

2. Correlation does not imply causation.

This is a famous quote, but let's finish it: correlation does not imply causation, but it sure is a hint. That's exactly what we like to use correlation for: as a hint about things we might investigate further. We'll talk about that in a second.

3. In an algorithm like Google's, with thousands of potential ranking inputs, if you see any single metric at 0.1 or higher, I tend to think that, in general, that is an interesting result.

That doesn't prove anything, and it doesn't mean there's a direct connection; it's just interesting. It's worthy of further exploration. It's worthy of understanding. It's worthy of forming hypotheses and then trying to prove those wrong. It is interesting.
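One way to see why 0.1 stands out, using a toy model and not a claim about how Google actually combines signals: if a score were the plain sum of n independent, equally weighted inputs, each input would correlate with the total at only about 1/sqrt(n), so roughly 0.1 with 100 inputs and about 0.03 with 1,000. The sketch below checks that by simulation; the model and names are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def single_input_correlation(n_inputs, n_pages=20_000, seed=3):
    """Hypothetical toy model: a page's score is the plain sum of n_inputs
    independent, equally weighted standard-normal inputs. Returns the
    correlation between the first input alone and the total score."""
    rng = random.Random(seed)
    first_input, totals = [], []
    for _ in range(n_pages):
        x = rng.gauss(0, 1)
        # The other n_inputs - 1 inputs summed in one draw: a sum of independent
        # standard normals is normal with variance n_inputs - 1.
        rest = rng.gauss(0, (n_inputs - 1) ** 0.5)
        first_input.append(x)
        totals.append(x + rest)
    return pearson(first_input, totals)


for n in (10, 100, 1000):
    print(n, round(single_input_correlation(n), 3), round(n ** -0.5, 3))
```

Real ranking inputs are neither independent nor equally weighted, so this only illustrates the intuition: in a system with many inputs, a single metric reaching 0.1 is doing more than its "fair share."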

4. Correlation does tell us what more successful pages and sites do that less successful sites and pages don't do.

Sometimes, in my opinion, that is just as interesting as what is actually causing rankings in Google. So you might say, "Oh, this doesn't prove anything." What it shows me is that pages getting more Facebook shares tend to do a good bit better than pages not getting as many.

I don't really care, to be honest, whether that is a direct Google ranking factor or just something that's happening. If it's happening in my space, in the world of SERPs that I care about, that is useful information for me to know and to apply, because it suggests that my competitors are doing this and that if I don't, I may not be as successful as the ones who are. Certainly, I want to understand how they're doing it and why they're doing it.

5. None of these studies that I have ever seen so far have looked specifically at SERP features.

So one of the things you have to remember when you're looking at these is to think organic, 10 blue link-style results. We're not talking about AdWords, the paid results. We're not talking about Knowledge Graph or featured snippets or image results or video results or any of the other features, like news boxes, Twitter results, or anything else that shows up there. So this is kind of old-school, classic organic SEO.

6. Correlation is not a best practice.

So just because this list descends in a particular order doesn't mean those are the things you should do, in that order. Don't use this as a roadmap.

7. Low correlation does not mean that a metric or a tactic doesn't work.

For example, a tactic that a high percentage of sites use will tend to show a very low correlation. When Moz ran its first one of these studies (in 2005, I think, or maybe it was '07), we saw that keyword use in the title element was strongly correlated. I think it was probably around 0.2 or 0.15, something like that. Then over time, it's gone way, way down. Now it's something like 0.03, extremely small.

What does that mean? Well, it could mean one of three things. One, it could mean Google is using it less as a ranking factor. Two, it could mean that it was never connected, and the earlier correlation was total coincidence. Or three, it could mean that a lot more of the pages that rank in the top 20 or 30 results (which is what these studies usually look at; sometimes it's the top 10 to top 50) are putting the keyword in the title, and therefore there's just no difference between result number 31 and result number 1, because they both have it in the title. So you see a much lower correlation between having the keyword in the title and ranking higher. So be careful about how you intuit that.
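The third explanation, saturation, is easy to reproduce in a toy model. Below, having the keyword in the title always adds the same fixed boost to a page's score, so the tactic "works" equally well throughout; only the share of pages using it changes. The model, the boost size, and the pooled (rather than per-SERP) correlation are simplifying assumptions, and the function name is hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def title_keyword_correlation(prevalence, boost=1.0, n_pages=50_000, seed=11):
    """Hypothetical model: a page with the keyword in its title gets a fixed
    `boost` added to its ranking score; everything else is random noise.
    The boost never changes; only the share of pages using the tactic does."""
    rng = random.Random(seed)
    has_keyword, score = [], []
    for _ in range(n_pages):
        k = 1.0 if rng.random() < prevalence else 0.0
        has_keyword.append(k)
        score.append(boost * k + rng.gauss(0, 1))
    return pearson(has_keyword, score)


for prevalence in (0.5, 0.9, 0.98):
    print(prevalence, round(title_keyword_correlation(prevalence), 2))
```

In this model the correlation works out to about b·sqrt(p(1-p)) / sqrt(b²·p(1-p) + 1) for boost b and prevalence p: roughly 0.45 at 50% prevalence and 0.14 at 98%, even though the boost itself never changed.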

Oh, one final note. I did put -0.02 here. A negative correlation means that the less you see of this thing, the higher the rankings tend to be. Again, unless there is a strong negative correlation, I tend not to pay too much attention to these. Take the keyword in the meta description, for example: it could just be that pretty much everyone has the keyword in the meta description now, so it's not a big differentiating factor.

What is correlation good for?

All right. What's correlation actually good for? We talked about a bunch of myths, ways not to use it.

A. IDing the elements that more successful pages tend to have

So if I look across the data and see that pages that rank highly are twice as likely to have X as the ones that don't rank highly, well, that is a good piece of data for me.
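A simple way to get at "twice as likely" is to compare how often element X appears among top-ranking results versus lower-ranking ones. This is a hypothetical sketch; the top_n cutoff, the function name, and the two toy SERPs are assumptions for illustration.

```python
def presence_rates(serps, top_n=3):
    """serps: one list per SERP of booleans (does the result at that position
    have element X?), position 1 first. Returns (share of top-N results with X,
    share of lower results with X, ratio of the two)."""
    top = [has_x for serp in serps for has_x in serp[:top_n]]
    lower = [has_x for serp in serps for has_x in serp[top_n:]]
    top_rate = sum(top) / len(top)
    lower_rate = sum(lower) / len(lower)
    ratio = top_rate / lower_rate if lower_rate else float("inf")
    return top_rate, lower_rate, ratio


# Two made-up six-result SERPs, comparing positions 1-2 with positions 3-6.
serps = [
    [True, False, True, False, False, False],
    [True, False, False, True, False, False],
]
print(presence_rates(serps, top_n=2))  # (0.5, 0.25, 2.0)
```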

B. Watching elements over time to see if they rise or fall in correlation.

For example, we watch links very closely over time to see if they rise or fall so that we can say: "Gosh, does it look like links are getting more or less influential in Google's rankings? Are they more or less correlated than they were last year or two years ago?" And if we see that drop dramatically, we might intuit, "Hey, we should test the power of links again. Time for another experiment to see if links still move the needle, or if they're becoming less powerful, or if it's merely that the correlation is dropping."
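When a correlation moves between two studies, it helps to check whether the move is bigger than the noise from which SERPs happened to be sampled. One way to do that (our choice, not a method described in the video) is a percentile bootstrap over the per-SERP correlations from each year. The function name is hypothetical and the data below is synthetic.

```python
import random


def bootstrap_change(before, after, n_boot=2000, alpha=0.05, seed=0):
    """before/after: per-SERP correlations for one metric from two study years.
    Returns the change in mean correlation and a percentile bootstrap
    (1 - alpha) interval, resampling SERPs with replacement."""
    rng = random.Random(seed)

    def avg(xs):
        return sum(xs) / len(xs)

    diffs = sorted(
        avg(rng.choices(after, k=len(after))) - avg(rng.choices(before, k=len(before)))
        for _ in range(n_boot)
    )
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return avg(after) - avg(before), (lo, hi)


# Synthetic example: the metric's true mean correlation slips from 0.30 to 0.20.
rng = random.Random(42)
last_year = [rng.gauss(0.30, 0.3) for _ in range(500)]
this_year = [rng.gauss(0.20, 0.3) for _ in range(500)]
change, (lo, hi) = bootstrap_change(last_year, this_year)
print(f"change {change:+.3f}, 95% interval ({lo:+.3f}, {hi:+.3f})")
```

If the interval comfortably excludes zero, the drop is unlikely to be sampling noise alone, which is the point where an experiment on links themselves becomes worth running.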

C. Comparing sets of search results against one another to identify attributes unique to each set

So, for example, in a vertical like news, we might see that domain authority is much more important than it is in fitness, where smaller sites potentially have much more opportunity or dominate. Or we might see that something like https is not a great way to stand out in news, because everybody has it, but in fitness, it is a way to stand out and, in fact, the folks who do have it tend to do much better. Maybe they've invested more in their sites.
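Comparing verticals can be as simple as reporting, for each set of SERPs, how common an element is and how well it correlates with position. This sketch uses made-up toy data for https in news and fitness; the function name is hypothetical. For a yes/no element against position, Pearson and Spearman give the same number, so Pearson is used. When every result in a vertical has the element, the correlation is undefined, which is the "everybody has it" case.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation, or None if either sequence never varies."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5


def vertical_summary(serps_by_vertical):
    """serps_by_vertical maps a vertical to a list of SERPs; each SERP is a
    list of 0/1 flags (1 = the result has the element, e.g. https), position 1
    first. Returns {vertical: (prevalence, mean per-SERP correlation or None)}."""
    summary = {}
    for vertical, serps in serps_by_vertical.items():
        flags = [f for serp in serps for f in serp]
        prevalence = sum(flags) / len(flags)
        rhos = []
        for serp in serps:
            r = pearson(serp, [-p for p in range(1, len(serp) + 1)])
            if r is not None:  # skip SERPs where every result (or none) has it
                rhos.append(r)
        summary[vertical] = (prevalence, mean(rhos) if rhos else None)
    return summary


# Made-up toy data: every news result has https; fitness results are mixed.
data = {
    "news": [[1, 1, 1, 1], [1, 1, 1, 1]],
    "fitness": [[1, 1, 0, 0], [1, 0, 1, 0]],
}
for vertical, (prevalence, corr) in vertical_summary(data).items():
    print(vertical, prevalence, corr)
```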

D. Judging metrics by their predictive ranking ability

Essentially, when I'm looking at a metric like domain authority, how good is that at telling me on average how much better one domain will rank in Google versus another? I can see that this number is a good indication of that. If that number goes down, domain authority is less predictive, less sort of useful for me. If it goes up, it's more useful. I did this a couple years ago with Alexa Rank and SimilarWeb, looking at traffic metrics and which ones are best correlated with actual traffic, and found Alexa Rank is awful and SimilarWeb is quite excellent. So there you go.
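The same machinery can grade a metric: correlate each candidate with the ground truth and see which one tracks it more closely. The sketch below uses synthetic sites and two hypothetical traffic estimates (not Alexa or SimilarWeb data). A rank-style metric, where 1 is best, has to be flipped before comparing.

```python
import random


def ranks(values):
    """1-based ranks; assumes no ties (true for continuous synthetic data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(xs, ys):
    """Spearman's rho using the no-ties formula 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


def score_metrics(actual_traffic, candidates):
    """candidates maps a metric name to (values, higher_is_better).
    Returns each metric's Spearman correlation with actual traffic, flipping
    the sign for rank-style metrics where 1 is best."""
    scores = {}
    for name, (values, higher_is_better) in candidates.items():
        oriented = values if higher_is_better else [-v for v in values]
        scores[name] = spearman(oriented, actual_traffic)
    return scores


# Synthetic sites with two hypothetical traffic estimates of differing quality.
rng = random.Random(5)
actual = [rng.lognormvariate(8, 1.5) for _ in range(500)]
metric_a = [t * rng.lognormvariate(0, 0.3) for t in actual]  # tight estimate
noisy = [t * rng.lognormvariate(0, 2.0) for t in actual]     # loose estimate
order = sorted(range(len(noisy)), key=lambda i: noisy[i], reverse=True)
metric_b = [0] * len(noisy)
for position, i in enumerate(order, start=1):
    metric_b[i] = position  # rank-style metric: 1 = most traffic

print(score_metrics(actual, {
    "metric_a": (metric_a, True),
    "metric_b": (metric_b, False),
}))
```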

E. Finding elements to test

So if I see that large images embedded on a page that's already ranking on page 1 of search results have a 0.61 correlation with the image from that page appearing in the first few image results, wow, that's really interesting. You know what? I'm going to go test that: take big images, embed them on my pages that are already ranking, and see if I can get the image results I care about. That's great information for testing.
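One way to run that kind of test, sketched under our own assumptions rather than as Moz's procedure: randomly split the eligible pages into a treatment group and a control group, make the change only on the treated pages, record how many positions each page gained after a waiting period, and compare the groups with a permutation test. The control group matters because rankings move on their own. Function names and numbers are hypothetical.

```python
import random


def assign_groups(pages, seed=0):
    """Randomly split candidate pages into treatment and control halves."""
    rng = random.Random(seed)
    shuffled = list(pages)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def permutation_test(treated, control, n_perm=10_000, seed=0):
    """treated/control: per-page positions gained after the change.
    Returns the observed difference in means and a two-sided p-value
    estimated by repeatedly shuffling the group labels."""
    rng = random.Random(seed)
    observed = sum(treated) / len(treated) - sum(control) / len(control)
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / (len(pooled) - n_t)
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Made-up outcomes: positions gained by pages with and without the big images.
treated_gains = [3, 2, 4, 3, 5, 2, 3, 4]
control_gains = [0, 1, -1, 0, 1, 0, -1, 1]
print(permutation_test(treated_gains, control_gains))
```

The permutation test makes no assumption about how position changes are distributed, which suits small, lumpy samples like a handful of pages.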

This is all stuff that correlation is useful for. Correlation in SEO, especially when it comes to ranking factors or ranking elements, can be very misleading. I hope that this will help you to better understand how to use and not use that data.

Thanks. We'll see you again next week for another edition of Whiteboard Friday.

Video transcription by Speechpad.com

The image used to promote this post was adapted with gratitude from the hilarious webcomic, xkcd.