Since last December's admission from Google + Bing's search teams regarding the direct impact of Twitter + Facebook on search rankings, marketers have been asking two questions:

  1. What signals are Google + Bing counting?
  2. How much influence do these social signals have on the results?

Over the last few weeks, we've been collecting data and running calculations in an attempt to provide more insight into these answers. Today, I'd like to share some results of that process. But, before we begin, there are some important caveats.

The data we're sharing below examines the top 30 ranking results for 10,217 searches performed on Google in late March (after the Panda/Farmer update, using top suggested keywords in each category from Google's AdWords data). It compares the features that higher-ranking results have and lower-ranking results lack. Since the standard error numbers are very, very tiny, we can be fairly confident that these correlation values would apply to Google results as a whole (i.e. if we were to run these correlations on 100K, 1 million or 1 billion results, we'd expect nearly the same correlations).
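For readers who want to see what "correlation with higher rankings" and "standard error" could look like in practice, here's a minimal sketch. It is not necessarily the method behind the numbers in this post (the methodology comes with the full report); it assumes a Spearman rank correlation computed per search and then averaged across searches. The function names and the toy data are hypothetical.

```python
import math
from statistics import mean, stdev

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation, or None when either input has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def mean_correlation(serps):
    """serps: one list per search of (position, metric_value) pairs.

    Returns (mean correlation, standard error of that mean).
    Needs at least two searches with a defined correlation.
    """
    per_query = []
    for results in serps:
        # Negate positions so that a better ranking (position 1) is the
        # largest value; a positive correlation then means "more of the
        # metric goes with ranking higher".
        better = [-pos for pos, _ in results]
        metric = [value for _, value in results]
        rho = spearman(better, metric)
        if rho is not None:
            per_query.append(rho)
    avg = mean(per_query)
    std_err = stdev(per_query) / math.sqrt(len(per_query))
    return avg, std_err

# Toy data (made up): three searches, four results each, (position, shares).
toy_serps = [
    [(1, 40), (2, 30), (3, 20), (4, 10)],
    [(1, 10), (2, 20), (3, 30), (4, 40)],
    [(1, 40), (2, 30), (3, 20), (4, 10)],
]
avg, std_err = mean_correlation(toy_serps)
```

With only three toy searches the standard error is large; it shrinks with the square root of the number of searches, which is why a sample in the thousands can produce very small standard errors.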

However, this does not mean we can be confident that what we're measuring are actually ranking factors having a direct influence. Let's use an analogy about dolphins to help illustrate:

Dolphins + Correlation
image credit: alfonsator on Flickr

Thus, our first caveat is - correlation is NOT causation - the features we show below may indeed be directly influencing Google's ranking algorithm, but they also may just be artifacts or features that high ranking pages tend to have (though, we do know from their public statements that at least some data from these sources is influencing the results).

It's also true that our analyses will not be nearly as sophisticated as whatever Google + Bing are doing with the data, so while we look at raw numbers from APIs, the search engines may have arrangements enabling them to look far deeper into the signals that make a tweet or share authentic - in particular the "author authority" metric they mention in the linked interview above. Thus, the second caveat is that results presented here are likely overly simplistic. A big takeaway for marketers should, thus, be - even if you're sure that a social metric is highly influential, spamming the heck out of it is probably a dumb way to try manipulating the rankings.

With those out of the way, let's look at some data!

Correlation of Link Metrics vs. Social Signals

How well do metrics like the quantity of shares on Facebook, Tweets on Twitter or Google Buzz shares correlate with higher rankings in the top 30 results in Google's web search results?

Correlation of Social Factors w/ Higher Google Rankings

In June of 2010 we ran a similar analysis and found the highest correlated metrics to be exact match .com domain names and # of linking root domains to the ranking page. Exact match domains have fallen substantially (in both prominence and correlation) - but we'll save that analysis for another blog post - while link metrics have remained fairly static in their correlation to higher rankings in Google. As of late March, the data is showing an unlikely new leader - shares of a URL on Facebook!

Naturally, this data shocked us. I presented at SMX Elite in Sydney last week on this and, prior to showing the slide, asked the audience, by show of hands, who believed Facebook to be more influential in Google's rankings than Twitter. Not a single person raised their arm. When data's this surprising (and particularly when the rest of the data from the analysis - much of it available here - matches our expectations), we want to look deeper.

Is Facebook Share Data Available for Enough Pages to Be Significant?

My first reaction was to ask Dr. Matt Peters, SEOmoz's in-house data scientist conducting this analysis, if the results were skewed by a few search results where Facebook shares just happened to be present in the top results. His response...?

More data:

Percent of Results Where Social Data Was Present

Link data was present for nearly every result we examined (99.9%+), which is to be expected, but social data? Of this magnitude? Even for plenty of weird, uninteresting queries? Shocking. If you had asked me to guess, I would have said we'd find Facebook share data on maybe 5-10% of the results - 61% is mind-boggling. It challenges a lot of my assumptions about how far social data really could take web search (e.g. see this video from April of last year in which I proclaim there's no way Facebook search could replace Google search), especially considering the relative newness of Facebook's Open Graph project.
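The "percent of results where data was present" check is simple to reproduce on your own result sets. Here's a small sketch; the field names are hypothetical, and it assumes a metric counts as "present" when the API returned a count greater than zero (this post doesn't spell out the exact definition).

```python
def coverage(results, metric):
    """Fraction of results whose `metric` count is present (not None and > 0)."""
    if not results:
        return 0.0
    present = sum(1 for r in results if (r.get(metric) or 0) > 0)
    return present / len(results)

# Toy data (made up): four results with hypothetical field names.
toy_results = [
    {"facebook_shares": 3, "tweets": 0},
    {"facebook_shares": 0, "tweets": 5},
    {"facebook_shares": None},
    {"facebook_shares": 12, "tweets": 1},
]
fb_coverage = coverage(toy_results, "facebook_shares")
tweet_coverage = coverage(toy_results, "tweets")
```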

Are Social Correlations Merely the Result of Overlap with Link Signals?

My next guess was that Facebook Shares' correlation was simply a matter of being a good predictor of links. Surely, pages that earn lots of Facebook shares also earn lots of good links. As before, Dr. Peters had some analysis to help answer the question.

Correlation of Social Metrics, Controlling for Links

In this chart, we examine the correlations of social data, controlling for links (in this case, specifically # of linking c-blocks). And yet, we still see a remarkable positive correlation between Facebook shares and higher rankings. Twitter, on the other hand, drops dramatically, potentially signalling that its influence as a direct signal may not be as strong (though we must keep in mind this data is not causal).
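One common way to "control for" a third variable is a first-order partial correlation, which asks how two variables relate once the part of each explained by the third is removed. The sketch below uses that textbook formula; it's an assumption that something like it applies here, since the exact procedure isn't described in this post. The variable names and toy numbers are made up.

```python
import math
from statistics import mean

def pearson(x, y):
    """Pearson correlation, or None when either input has zero variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)

def partial_correlation(x, y, z):
    """Correlation of x and y, controlling for z.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    Returns None when any pairwise correlation is undefined or when x or y
    is perfectly correlated with z.
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    if r_xy is None or r_xz is None or r_yz is None:
        return None
    denom = (1 - r_xz ** 2) * (1 - r_yz ** 2)
    if denom <= 0:
        return None
    return (r_xy - r_xz * r_yz) / math.sqrt(denom)

# Toy data (made up): both "social" and "rank_score" are driven by "links",
# plus noise that is unrelated to links and to each other.
links = [1, 2, 3, 4]
social = [2, 1, 2, 5]
rank_score = [0, 5, 0, 5]

raw = pearson(social, rank_score)
controlled = partial_correlation(social, rank_score, links)
```

In this toy case the raw correlation is positive only because both variables track links; once links are controlled for, it falls to zero. A metric that keeps a strong correlation after this step, as Facebook shares do in the chart, is not simply standing in for link counts.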

Takeaways from this Data

While we can't say for certain whether these numbers mean that Facebook strongly influences Google rankings, I personally have some big learnings and opinions to share:

  1. Social Metrics are Well Correlated with Higher Rankings
    To me, correlation alone is interesting because I want my sites/pages to be similar to the pages that rank higher in Google, irrespective of whether those traits are directly measured in the algorithm. Earning tweets + Facebook shares also correlates well with earning links, and these services send direct traffic on their own - ignoring them at this point seems foolish.
  2. Testing the Direct Impact of Facebook Shares on Google is Imperative
    We've already observed several remarkable results from testing Twitter's impact. Facebook should be next on the list for many search marketers.
  3. I Need to Learn More About How to Earn Facebook Shares
    Given the potential importance and the obvious direct impact (traffic from and visibility on Facebook itself), I, and probably many web marketers, need to examine successful strategies and brainstorm new ways to earn sharing activity from Facebook's massive user base.
  4. Shares Might Be More Valuable than Likes
    In Facebook's own environment, a "like" of content will show up on your own "Wall" and in "Most Recent" (a new feature as of last week), but it rarely shows in "Top News" where most users scan and click. If that alone isn't reason to encourage sharing vs. liking, the data above certainly is (at least to me).
  5. Twitter May Be Less Powerful than I Thought
    The correlation data and the presence of tweets in SERPs were lower, in comparison to Facebook, than I would have expected. It could be that in cases like those of our experiments, where many influential Twitter users shared a URL in close temporal proximity, Google takes it as a signal, yet for standard search rankings, it's not as powerful. We'll definitely keep testing and watching, but my expectations for tweets correlating with rankings, after controlling for links, were higher, and thus the results were somewhat surprising.

It's up to you how to interpret this data, but whether you believe (or have tested) the causality of Facebook/Twitter or not, all of us in the SEO sphere should be carefully watching the social space and Google's social efforts.


For those interested, here's the full presentation on correlation + opinion data shared at SMX Elite last week:

Looking forward to a vibrant discussion and, hopefully, some testing (and reports back) of Facebook's influence on Google's rankings :-)

p.s. When the full search ranking factors report is released in the weeks to come, we'll also be providing our methodology and a raw dump of data so anyone can reproduce and double-check our results.