Do more tweets of a URL lead to higher search rankings on Google? Do longer articles get more shares on Facebook? Do emails that contain images have lower open rates?
These, and hundreds of other questions marketers are constantly asking, can be answered mathematically through correlation data. Yet, it seems there's an unfortunate bias against correlations, specifically in the SEO community. Part of this has to do with the well-known maxim "correlation is not causation." This is eminently true.
However, I LOVE to know correlation, even when it's wholly disconnected from causation, and I'm surprised so many marketers rail against the acquisition of this knowledge. After all, we constantly use correlation-based observations in our everyday lives, and scientists use them all the time to discover potential hypotheses and then design experiments to test them.
For example, I personally care less about what Google actually uses as ranking elements in their massive algorithm than on what kinds of sites and pages tend to perform well. To my mind, it's much more fascinating to learn that, for example, stories appearing in Google News results are much more likely to have images originally sourced by the news publisher than it would be to find out that the algorithm uses an exponential decay factor on freshness, based on usage signals from a certain set of trusted accounts. The former is actionable; the latter much less so.
We can apply this to email outreach, public relations, talks at conferences, conversion rate optimization (a practice based almost entirely on correlation), and virtually any other quantifiable practice in our work.
Here are just a few examples of great work in the field of marketing that leverage correlation data:
- Dan Zarrella's series on the Science of Social Media, Science of Retweets, Science of Timing, and Science of Facebook Marketing
- The Open Algorithm Project from Mark Collier
- Correlation data in SEOmoz's own Ranking Factors Study
- What Mike King learned analyzing 300K outreach emails w/ Buzzstream
I fail to understand why this work is criticized as being "just correlation; doesn't mean anything" rather than embraced as "awesome; new correlation data on which to form testable hypotheses." Yes - correlation does not imply causation. But it does show a relationship, and those relationships can form the basis of guesses and tests. I find it challenging to argue why this work should not be done and shared, yet the bias is clearly out there.
Of course, there's always the danger of presenting correlation research which is then misinterpreted or misused, as the folks from PHDComics brilliantly illustrated below:
But, I'd rather risk some misunderstanding and have the data available than not investigate the connections between things in the marketing world out of fear.
Here are just a few ideas for correlation-based research that I'd love to see someone put together (a rough sketch of how one of these might be computed follows the list):
- Correlation between a topic/phrase/brand trending on Twitter and search volume spiking on Google
- Correlation between Facebook shares, Tweets and Google+ shares for URLs across various industries (where are some networks potentially stronger/weaker, what are the outliers, etc)
- Correlation between amount of funding and revenue/growth/success across industries (think this would be fascinating to entrepreneurs)
- Correlation between types of share buttons used on a website and quantity of shares received
- Correlation between # of email subscribers to an RSS feed and the rankings / social shares of that feed's content
- Correlation between search rankings and RSS feed inclusion overall (do URLs that are included in feeds tend to perform better than those that aren't?)
- Correlation between sentiment (positive, negative, neutral) of content on various sites and their success in social media
- Correlation between social shares and traffic
- Correlation between Klout score and traffic driven to URLs shared (to see if Klout lines up with how much traffic that person's tweets/shares drive)
- Correlation between having a testimonial, physical address, email address, and/or phone number on the page and higher/lower rankings in Google's search results
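To make the first of these steps concrete, here's a minimal sketch of how the social-shares-vs-traffic idea might be computed, assuming Python with pandas and SciPy and a hypothetical per-URL dataset (the file name and column names are invented for illustration):

```python
import pandas as pd
from scipy import stats

# Hypothetical dataset: one row per URL with its total social shares and monthly visits.
df = pd.read_csv("urls_shares_traffic.csv")  # assumed columns: url, shares, visits

# Spearman rank correlation is a reasonable choice here because share and traffic
# counts are heavily skewed; it only asks whether the rank orders move together.
rho, p_value = stats.spearmanr(df["shares"], df["visits"])

print(f"Spearman rho = {rho:.2f} (p = {p_value:.4f}, n = {len(df)})")
# A small p-value only suggests the association isn't chance -- it says nothing
# about whether shares drive traffic, traffic drives shares, or a third factor drives both.
```

Swap in Pearson, add more variables, or segment by industry as needed; the point is that each item on this list starts with a table like that, a correlation coefficient, and a significance check.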
If you or your team feel confident, capable, and excited about potentially doing this work but need some funding or publishing support, we'd love to talk. Just drop me an email (rand followed by the @ and seomoz dot org).
p.s. Check out Dr. Pete's excellent "Mathographic" on correlation vs. causation to learn more about the difference and the nuances.
+1 for XKCD, +2 for PhD comics. Two of my favorites there!
+3 for encouraging more understanding and use of data amongst marketers. Some of the best marketers have an intellectual curiosity about data, analysis, and statistics. It's similar to how some of the best SEOs have an intellectual curiosity about programming, database architecture, and related technical topics. Literacy in these disciplines is a big step forward in anyone's practice as a marketer or SEO.
And +4 just for this line: "For example, I personally care less about what Google actually uses as ranking elements in their massive algorithm than on what kinds of sites and pages tend to perform well."
Yes. This!
One of my grad-school professors liked to say something like: "Any first year grad student can attack someone's work. It takes a real scientist to do it better." If you just read "Correlation" in the title of a blog post and then rush to comment "CORRELATION DOES NOT IMPLY CAUSATION!" just to prove you're smart, then you're absolutely right - just because using big words is correlated with being smart doesn't mean you're not an idiot.
We absolutely have to keep each other honest, but I see three types of attacks on correlation work (or, really, any work in our field):
(1) Honest, critical analysis from people who care about the results
(2) Bandwagon criticism from people who want to show they're smart
(3) Political criticism from people who don't like the source
We need (1) - those folks contribute to the betterment of our industry and create dialogues that improve all of our work. Groups (2) and (3) need to get a hobby.
I agree (it's why I have presented at conferences on how to present data/correlation), but can we add a (4): people who will automatically applaud and never analyze? They belong with the latter groups as well. That behavior creates echo chambers rather than true dialogue and analysis.
"For example, I personally care less about what Google actually uses as ranking elements in their massive algorithm than on what kinds of sites and pages tend to perform well."
I'm with you on this one Rand.
Here's something I would like to see someone work on.
Correlation between social shares and money in our client's bank account.
I just finished reading the previous blog post "GM's Doing it Wrong: Facebook Marketing Lessons" and I think there needs to be more correlation analysis on social media and conversions.
I'm not certain you can see a direct correlation between social shares and money in a bank account, purely because, in my mind at least, social isn't really there to drive conversions; it's there to build a brand, nurture relationships, and manage reputation.
You can see a direct correlation between Google AdWords Quality Score and money in a bank account. That's why AdWords works for making money. Facebook marketing is important, but it sure won't pay my client's bills. I like social media because it's another way to communicate with prospective customers on an international level.
Maybe everyone else makes a lot of money off social media, but not me. When I look at my Google Analytics, even Microsoft Adcenter (Yahoo/Bing) brings in WAY more money into my client's bank accounts than social media. We make more money off Craigslist than social media.
Nothing wrong with correlations. It's when correlations are trumpeted around as facts that problems start to arise. Or when people take correlations, run a flawed experiment and then claim the correlation as The Truth.
The fact of the matter is that most SEOs are not trained mathematicians or scientists, and many of the experiments that are conducted in our industry are flawed or statistically insignificant. People need to apply a lot more rigor to their experiments before they rush off to tweet or blog about their latest "discovery". And of course, we never hear about the studies which didn't go the way we wanted them to.
Correlation ONLY works when you understand context. If you try to correlate results in events where you don't understand the basics, then you are GUESSING.
It kills me when people completely throw logic out the door and suggest or believe that correlation proves causation. Show me some cause-and-effect with a little education on the side, and then I'm a believer.
The whole quantum mechanics theory is based on correlations (my physics background speaking). We wouldn't have technologies that we have today without it.
Same applies to search and all marketing strategies.
+1 Max, that's right. I don't think it's useless or that people shouldn't do it; I think it's important for people to research, study, and collect data to come up with answers based on correlation. That research will allow us to come up with ideas (with hypothetical stats) that eventually lead to something NEW!
I do agree that 'correlation is not causation,' but I'll never believe that correlation can never be causation!
P.S.: I would love to see more research on how Google sees and uses traditional link data vs. social share data!
The biggest thing here is to show at first glance that it's significant. A lot of people have only a passing understanding of statistics, and if they're smart, they'll be cautious rather than jumping straight to "oh wow, a correlation exists!"
The Open Algorithm Project is a great site, but when I see r values below 0.3, unless there's some strong significance behind them, I'm concerned that it's just not a meaningful correlation. This was the issue I had with Dan Zarrella's work too -- lots of data, but nothing published that lets you check what is actually a correlation and what is simply small and insignificant.
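For what it's worth, that kind of sanity check is straightforward to run when the raw data is shared. Here's a minimal sketch (Python with NumPy/SciPy; the arrays are synthetic, not anyone's real study data) of reporting an r alongside its p-value and a Fisher-z confidence interval:

```python
import numpy as np
from scipy import stats

def pearson_with_ci(x, y, alpha=0.05):
    """Pearson r plus its p-value and a Fisher-z confidence interval."""
    r, p = stats.pearsonr(x, y)
    n = len(x)
    z = np.arctanh(r)                 # Fisher z-transform of r
    se = 1.0 / np.sqrt(n - 3)         # approximate standard error of z
    z_crit = stats.norm.ppf(1 - alpha / 2)
    lo, hi = np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)
    return r, p, (lo, hi)

# Illustrative call on made-up per-URL metrics (synthetic, for demonstration only):
rng = np.random.default_rng(0)
tweets = rng.poisson(20, size=500).astype(float)
outcome = tweets + rng.normal(0, 30, size=500)
r, p, ci = pearson_with_ci(tweets, outcome)
print(f"r = {r:.2f}, p = {p:.4f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

An r of 0.25 with a confidence interval of (0.20, 0.30) and an r of 0.25 with an interval of (-0.05, 0.50) are very different findings; publishing the interval (or the raw data) is what makes a study checkable.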
If you're using correlation studies solely as marketing, that's OK; but there is a greater opportunity for information sharing that I think is heavily overlooked in this space.
@Ferk: I have said time and time again, and anyone with a passing understanding of data analysis and APIs would already know this: All of my data comes from publicly available sources. You are more than welcome to reproduce my research by gathering your own data and checking my correlations. Simply complaining that other people haven't already done your homework for you is a little lazy, isn't it?
I don't think marketers deplore correlation data at all. It's more the offering advice based on correlation alone that gets our goat!
I must be really missing something here. I have no idea why there is so much discussion about correlations. It is just a basic statistical indicator, much like an arithmetic mean.
Why not just create a hypothesis, collect and transform the data, build a suitable (multivariate) econometric/regression model, and test the results? This is what happens in all fields of data-led science. You will need to hire someone to do this unless you have advanced statistical knowledge.
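For anyone curious what that looks like in practice, here's a minimal sketch using Python's statsmodels on a hypothetical per-URL dataset (the file name, column names, and model specification are invented for illustration, not a recommended design):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical dataset: one row per ranking URL with a few candidate explanatory metrics.
df = pd.read_csv("pages.csv")  # assumed columns: rank, tweets, fb_shares, root_domains, word_count

# A multivariate OLS model asks whether each metric is still associated with rank
# once the others are held constant, rather than inspecting each pairwise correlation alone.
model = smf.ols("rank ~ tweets + fb_shares + root_domains + word_count", data=df).fit()
print(model.summary())  # coefficients, standard errors, p-values, R-squared
```

This is still observational (it controls only for the variables you thought to include), but it's a big step past a pile of pairwise correlations.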
Because SEO isn't generally a data-led science, at least, not yet.
Agreed. The problem is that it's impossible to do a truly scientific study in SEO and anybody who does a study just gets slated for it.
We did a tweets vs rankings study recently and it was slated even though we gave all the data away.
https://www.branded3.com/tweets-vs-rankings
It is absolutely possible to do truly scientific studies in SEO. Certainly not all factors, but specific factors are quite easy to test. For example, back in 2009 we ran some studies that resulted in a piece called "The Triviality of On Page Optimization". This was actually an experiment, where we controlled nearly every variable by getting the sites indexed via Google Sitemaps rather than links. Using this technique, we were able to rule out several key HTML elements as having a meaningful impact on search.
You are right that it is very difficult to do experiment-based SEO studies but it is definitely possible.
I personally agree with you. I used to work on a lot of regression models, and I try to support my conclusions with Pearson correlations and the like; the way I present SEO to my clients is much easier and more effective as a result.
Here's how I like to think of doing scientific findings/studies/analysis for SEO:
If there are billion (soon to be trillion?) dollar corporations doing this for the stock market, primarily based on many data points and on the concept of human fear, perception, expectations, etc. then doing the same for SEO should be a (comparative) cakewalk.
Whether it's cost-effective is another thing entirely. ;-)
Disagree,
The problem isn't that some marketers can't enjoy a good correlation (we can); the problem is when that becomes all that's written or researched on the topic and by default becomes "fact".
In your own words, "After all, we constantly use correlation-based observations in our everyday lives, scientists use it frequently to discover potential hypotheses and put forward experiments to test them."
Yes, scientists use correlations to discover potential hypotheses. The problem is that internet marketers don't: they leave out the hard part (applying the scientific method) after they've seen a correlation and simply treat it as causation. Or worse yet, they put a disclaimer at the end of their blog post reminding people that "correlation is not causation" and the next day move on to a different topic.
I think the key difference between general science and SEO is that SEO (or Google) can change completely in the future. The Earth might still be going around the Sun for 1000000000 years to come but Google can turn around tomorrow and say: we no longer value tweets, only Google Plus signals affect rankings.
Therefore, any SEO study carries a likelihood of becoming wrong as Google's algorithm changes. That's why I'd like to think we follow best practices in SEO rather than chasing the one "right" practice. And the best way to do that is to look at those who rank well and analyse why they rank.
Also, what many of us have been doing (myself included at times) is trying to outsmart Google. What I commonly see in well-ranked websites is that they violate Google's Webmaster Guidelines very little, if at all.
So IMHO Google is showing us how to do SEO, it's just whether we want to follow their "advice" or not.
Totally agreed, and great examples of some good work Rand. We need a lot more!
Sorry to say, but a lot of people in this industry cannot understand what I consider to be just intermediate math; show them a graph with a best-fit line and an R-squared value and their eyes just glaze over.
The over-reliance by people on industry consensus causes disbelief even when the numbers are staring them right in the face.
I think we are all prone to this in one way or another, it's psychological anchoring. My favorite example of this is SEOMoz's correlation numbers for 2011. One of the graphs showed that nofollowed links to a page versus followed links to a page had the *exact* same correlation with ranking. This seems to me to be a finding of *huge* importance - but I'm not even sure you folks believe this yourselves, otherwise you'd be shouting it from the hilltops (!)
I'm late to the party on this one (that's what I get for taking a vacation and what not)... so who knows who'll read this now, but couple of points to note:
During correlation analysis it is common for an analyst to overlook the role played by other variables (variables other than the one being investigated) which may be responsible for the apparent correlation. This results in forming and testing a relationship that may be statistically insignificant in the first place; in other words, you are testing the wrong hypothesis to begin with. Before you measure a correlation you need to establish that the relationship exists, that it is statistically significant, and that the premises from which you are going to draw conclusions are likely to be true. For example, before you measure the correlation between social shares and rankings, you first need to show that there is a relationship between the two and that it is statistically significant. I think the way we approach correlation studies gives outsiders the chance to speak ill of our work and industry.
Disclaimer: I am just a student of statistics and no expert
Wow - that is going to take a lot of data to establish. No bad thing btw - the more data the more confidence in the correlations.
Some of the more focused correlation work could be undertaken by split testing on a small number of *massive* websites, but we'd need a balance of small-site data too.
I wonder if there is a way data could be put into the correlation pot anonymously; the project is going to need data from a large number of sites. The type of data required would need to be clearly stipulated to participants. It would be a great crowd project to be involved in.
Hi Rand,
I believe that the marketing world needs more correlation research. Conversion rate optimisation is so important, and correlation is a necessity for this technique. It's useless to have bazillions of readers without significant conversions. Anyway, thanks for sharing this very informative article.
I thought I would explain my comment above a little more clearly. With regards to the research questions Rand posted above what needs to be taken care of is omitted variable bias. https://en.wikipedia.org/wiki/Omitted-variable_bias
Causation is naturally a concern but sometimes it’s clearly not a problem, such as the correlation between share buttons used on a website and quantity of shares received. The direction of causation is clear in this case.
Given the example above, there may be a correlation of 0.3, say. However, the model only includes one explanatory variable: share buttons. If you were to add another variable, like a dummy stating whether the author shared that link on his own Facebook account (which would encourage re-shares), then the correlation might fall to 0.1, say. This is the main issue with correlations: there is significant omitted variable bias.
This is why correlations are only an initial indication. I've seen correlations as high as 0.9 fall to practically zero when another variable is added. An example might be tweets and rankings: because those tweets get scraped into followed links, the links might be the factor actually driving rankings, and this is not accounted for in the initial correlation (I haven't tested this).
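To illustrate the point with a toy example (a simulation, not anyone's real data; the variable names and effect sizes are invented), here's how an omitted variable can manufacture a healthy-looking raw correlation that evaporates once the lurking driver is included:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 2000
links = rng.normal(size=n)  # the lurking driver in this toy world

df = pd.DataFrame({
    "links": links,
    "tweets": 0.8 * links + rng.normal(size=n),        # tweets track links, not rankings
    "ranking_score": 1.0 * links + rng.normal(size=n), # rankings are driven by links alone
})

print("raw corr(tweets, ranking_score):", round(df["tweets"].corr(df["ranking_score"]), 3))

# With links omitted, tweets soak up a spurious effect; adding links removes it.
naive = smf.ols("ranking_score ~ tweets", data=df).fit()
full = smf.ols("ranking_score ~ tweets + links", data=df).fit()
print("tweet coefficient, links omitted: ", round(naive.params["tweets"], 3))
print("tweet coefficient, links included:", round(full.params["tweets"], 3))
```

With these made-up effect sizes the raw correlation comes out well above zero (roughly 0.4-0.5) even though tweets have no direct effect at all; controlling for the confounder pushes the tweet coefficient toward zero.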
Perhaps it's time for a peer-reviewed scientific journal in the area of online marketing, one that insists on proper scientific method and rigor and ensures that any studies on correlations are published with the proper caveats regarding their significance.
I'd subscribe to that.
I recognise that much of online marketing practice is simply defined as "what works". It is essentially an engineering approach. There is still room though for the science approach to investigating why it works or indeed for confirming if it works.
As Rand suggests, while investigating and comparing these types of data might not give us true causal relationships, it'll certainly give us some interesting new questions.
Been wondering for a while about the direct correlation between social shares and actual web traffic.
This gets my thoughts swimming, conjures an image of the pied piper, and reminds me of that saying villains and elitists say, "a little information is dangerous."
+1 for the best use of comics on SEOMoz in a while.
Woo! Black-box correlation studies are interesting if used to support best practices and existing case studies, rather than just random study areas (which take up all your time). There is one downside: while you might have found an interesting result at one time, the same result can become meaningless tomorrow. That becomes ever more true with statistical studies.
Perhaps, Rick, but the pattern of change of validity over time would in itself be a useful thing to have. It would enable a clearer view of trends in both user behaviour and search engine signal importance (and any other data set) over time.
Most anthropological dataset standards have to be changed over time. An "average" man is not the same now as he was 100 years ago, but the fact that we even investigate the reasons for this is proof that over time a lot of data becomes invalid. That's not a reason for not doing the research.
The problem with much of the thinking here is the assumption that this somehow renders the research worthless. If something isn't the case today, then you know something. If it suddenly IS the case tomorrow, then you actually know not two but three things, if not more: that it wasn't the case yesterday, that it is the case today, and that this has changed at least once in the space of a day. Over time, the rate of change of different types of factors would be a very useful thing to have.
That Open Algorithm Project is fantastic - hadn't come across that before so thank you for sharing such a brilliant resource, and great post as always!
I saw you speak about the difference between causation and correlation last year at MozCon; this is a great extension to the comments you made then. Great post!
Hi Rand,
I agree with the sentiment of this. One of the things that I love about the internet, and about SEO, is that they provide an endless source of free learning for those with curious minds looking for answers, and I also agree that research should be encouraged, not criticised.
Those that dedicate their time and resources to the research and analysis that establish correlations leading to insights, hypotheses and sometimes proofs help us all and I applaud them.
I haven't seen any examples of work being criticised for being 'just correlation' so can't comment.
The dangers you have pointed out about misinterpretation are real though and this is wonderfully depicted in the graphic. These dangers escalate with the perceived authority of the publisher. For example, if you guys at SEOmoz publish some stats with conclusions, the vast majority of your readers will take it as gospel and propagate that knowledge because of this site's perceived authority.
As in the graphic, it takes a few seconds on the internet before an item is published as 'scientists say'. Before you know it the whole world has changed their diet from natural food to low fat food and back again on the back of an irresponsible news headline that established a correlation between cancer rates and low fat foods. The research may lead to the saving of humankind from cancer ... but not without a little further testing.
Great post, Rand. It's true that it is quite difficult to utilize the big data around us. Because of social media, most businesses are trying to use it to make an impact, but it's also important to find the correlation between Twitter, Facebook, and rankings. The study patrickaltoft shared in his post gives a good idea about tweets and rankings.
Thanks, Rand, for such a good post.
Have a look at Branded3's Tweets vs. Rankings study. Evidence based reporting!
https://www.branded3.com/tweets-vs-rankings
Hi Randfish,
Thanks for this good and informative article. Yes, it's right that nowadays all marketers need correlation research.
Thanks,
It almost feels like we are trying to quantify quality itself here. If we start with the traditional phrase "we focus on quality (say, of links), not quantity," we end up asking how much of something (say, links) it takes to reach quality X or produce result Y. Quality itself cannot exist in isolation without something to represent it. Then again, isn't that where correlation comes in?
PHDComics brilliantly illustrated!!!
And the whole post is really useful ...
Very wise words:
"For example, I personally care less about what Google actually uses as ranking elements in their massive algorithm than on what kinds of sites and pages tend to perform well."
+1!!
That was a general article Rand, but it is always a pleasure to read you.
I like where your head is at on this one, Rand. I would like to see the trails leading to some statistically significant patterns in online marketing. I think such information would be really useful for those a bit more detached from the industry: business owners who are 'just the facts, ma'am' types or who don't have the time to heavily invest in the everyday ambiance of the industry.
Though, admittedly, because I really dig marketing and others' thoughts, I do enjoy the theoretical points as well; they are great for inspiration. As you reference above, if we didn't 'dream' to some degree, we would keep going round and round.
It calls for a balance; but, we would definitely benefit from seeing more correlative posts.
"If it were not for the Poetic or Prophetic character the Philosophic & Experimental would soon be at the ratio [rational calculation] of all things, & stand still unable to do other than repeat the same dull round over again." - William Blake
Great Topic!
I think from a business owner's point of view there is always an ROI question: what kind of $$$ do I get for all of this, or if I invest this much time or hire this many employees to do social media, what will I get? A lot of social media activity, to me, SHOULD be something you are proactively doing on a constant basis, kind of like going to boring trade shows where you network with colleagues and friends but may not get any $$$. It's all about your LONG TERM approach and VISIBILITY.
People don’t correlate because they don’t want to give up their “competitive” advantage. If I’m doing something which is giving me a ton of traffic why would I share it? Then when Google changes I’m pleading with anyone to correlate with me to find answers.
I think correlation is a great idea but we will never get the full benefits because so few do it.
I've always been interested in which of the listed correlations (and things like Google Insights and traffic predictions) have applications in the stock market.
The marketing correlation is brilliantly explained; hats off, Rand. It's easy to understand via the PHDComics image.
"Correlation between Facebook shares, Tweets and Google+ shares for URLs across various industries (where are some networks potentially stronger/weaker, what are the outliers, etc)"
I need more explanation of this.
I think he means to compare the strength of networks by industry. For instance, a techy startup might be more liable to receive a stronger following on G+, while a fashion company might have better success on Facebook.
"I personally care less about what Google actually uses as ranking elements in their massive algorithm than on what kinds of sites and pages tend to perform well." (Quote of the Day)
Loved this Rand. Going back to keyword density percentages all the way up to the debated 60% anchor text threshold - who cares what the percentages are or where the line is. The reason sites rank well is that they follow actual best practices in design, development, hosting, backlink profiles, and social influence. I've found that if you can provide users value in those main areas, the rankings and organic traffic will be in your favor.