Earlier this year, Danny Sullivan of Third Door Media asked me if SEOmoz could put together some data comparing ranking elements of Google against those of Bing to help illustrate the potential biases SEOs might face when optimizing for the two engines. Today at SMX Advanced in Seattle, I presented the following data, compiled by our own Ben Hendrickson with help from the entire SEOmoz engineering team (particularly Phil & Chas on the Linkscape side). The results I'm sharing match those in the presentation, with a bit more detail added in for those interested.
Rather than include the entire slide deck, I've taken the charts, graphs and data directly from the presentation so those of you seeking to convince clients or motivate internal teams can use them in your own presentations. But, before we begin with the data, I'd like to share a few critical notes about this research that shouldn't be ignored.
Goals of the Correlation Data Research
With this research, we hope to accomplish three big things:
- Add a new source of data to SEOs' understanding of how Google & Bing rank web pages
- Bring more science to SEO through a repeatable, peer-reviewed dataset
- Provide recommendations based on our own interpretations AND open the data for interpretation by others as well
Further research, including causation analysis through more sophisticated ranking models and possibly more correlation analysis of other factors, is certainly among our goals as well.
Methodology
- We collected results for 11,351 search queries from both Google & Bing, using Google AdWords suggest data for the various categories (you can see these keywords yourself via Google's AdWords tool)
- We looked only at the first page of results (which typically included 10 results, but sometimes contained more or fewer). We ignored all non-standard results (meaning universal or vertical results such as video, images, local or "instant answers")
- The correlations relate to higher/lower positional ranking on page 1 of the search results
- We controlled for search results where all (or none) of the results matched the metric. Thus, for example, if we were looking for correlation with .gov domains and no results in the set included a .gov domain, we didn't use that SERP for that dataset.
- We've used Spearman's correlation coefficient, as it is the standard (and in our opinion, best choice) for ranked datasets. You can read more about this selection via Ben's comments here and here.
This is a methodology very similar to the one we used for our recent research on Google PageRank correlation.
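To make the last three bullets concrete, here's a minimal toy sketch (our own illustration, not SEOmoz's actual code) that computes a mean per-SERP Spearman coefficient for a binary metric, skipping SERPs where the metric is constant across all results. The sign convention is an assumption: we negate position so that a metric concentrated at the top of page 1 comes out as a positive correlation.

```python
def _ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def _spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def mean_serp_correlation(serps):
    """serps: one list of 0/1 metric flags per SERP, best rank first."""
    coefficients = []
    for metrics in serps:
        if len(set(metrics)) < 2:      # all or none matched: skip this SERP
            continue
        # Negate position so a metric concentrated at top ranks comes out
        # positive (an assumption about the study's sign convention).
        goodness = [-p for p in range(1, len(metrics) + 1)]
        coefficients.append(_spearman(goodness, metrics))
    return sum(coefficients) / len(coefficients) if coefficients else 0.0
```

For example, a SERP whose top two results match the metric and whose bottom two don't produces a strongly positive coefficient, while an all-matching SERP is dropped entirely, as the methodology above describes.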
Understanding Correlation Significance
The correlation numbers we show range between -0.2 and 0.35, where a perfect correlation would be 1.0 and no correlation would be 0.0.
The standard error for each result set is also included, but tends to be so low in most cases that displaying it on the bar graph would make it nearly invisible. This is thanks to the large number of results collected - we've got very high confidence in the statistical significance of these.
Correlation ≠ Causation
It's long been held in statistical analysis that even very high correlations do not necessarily mean that one variable causes the other. People holding umbrellas don't cause rain. Ice cream sales don't cause hot weather.
The more I wear suits, the more I speak on panels about SEO. Does it therefore follow that wearing suits gets me onto panels about SEO?
It's critical to know that the data below, like data from other types of SEO tests, requires careful consideration and analysis. Parsing a bigger correlation as a direct sign that one should do X or Y more would be a fallacy.
Understanding Negative Correlation
In the research below, you'll see a few data points where the correlation is actually negative, meaning that when we saw the element, it tended to predict lower placement in the results, rather than higher. For example:
The data for URL length shows that longer URLs are negatively correlated with ranking well. This isn't particularly shocking, and it probably is wise to limit the length of our URLs if we want to perform well in the engines. However, the second data point on .com TLD extensions shouldn't necessarily suggest that using .com as your top-level domain extension will actually negatively affect your rankings, but merely that all other things being equal, .com domains didn't perform as well in the dataset we observed as other domain extensions.
As we go through each set below, we'll try to explain our thinking, but certainly invite you to draw your own conclusions from the data.
As we've seen in the past, when more sophisticated ranking models are introduced, using machine learning against the search results, we often find that previously negative correlations turn out to be positive (or neutral) ranking factors.
That's it! Let's dive into the data.
Query Matching in the Domain Name
Our interpretation and conclusions:
- Exact match domains appear to continue their powerful level of influence in both search engines, though I think many SEOs will be surprised to see Google actually has a higher correlation with ranking exact match domains higher (when they appear on page 1 of the results) than Bing.
- Hyphenated exact matches certainly appear to be less influential, though they're more frequent (Google: 271 results contained these vs. Bing: 890)
- Just having keywords in the domain name has substantive positive correlation (Thus, for example, if I wanted to rank for the word "dog," the domain mydog.com would fit with this correlation point)
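As a hypothetical illustration of the three buckets above (exact match, hyphenated exact match, and partial keyword match), here's one plausible classifier. The study's actual matching rules aren't published, so treat the details (stripping only the final TLD label, substring test for partial matches) as assumptions:

```python
def domain_match(query, domain):
    """Classify how `domain` (e.g. 'mydog.com') matches `query`."""
    name = domain.lower().rsplit('.', 1)[0]        # strip the TLD label
    words = query.lower().split()
    if name == ''.join(words):                     # e.g. buydogs.com
        return 'exact'
    if name == '-'.join(words):                    # e.g. buy-dogs.com
        return 'hyphenated'
    if any(w in name for w in words):              # e.g. mydog.com for "dog"
        return 'partial'
    return 'none'
```

Under this reading, `mydog.com` counts as a partial match for the query "dog," matching the example in the last bullet.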
Exact Match Domains by TLD Extension
Our interpretation and conclusions:
- If you're aiming for exact match, a .com extension is the way to go. Others aren't nearly as well correlated.
- Bing does seem to appreciate non-dot-com exact matches more than Google, though not tremendously (especially in the case of .org)
Keywords in Subdomains
Our interpretation and conclusions:
- Keywords in subdomains aren't nearly as powerful as in root domain names
- Bing may be rewarding subdomain keyword usage less than they have historically, though the result counts suggest that subdomain matches show up on Bing's page one much more frequently (Google: 673 vs. Bing: 1,394)
On-Page Keyword Usage
Our interpretation and conclusions:
- The alt attribute of images is interesting - our research last year found this as a peculiarity and it would appear to still be potentially useful in both engines (definitely worth some testing)
- Placing keywords in your URL string has some correlation with rankings on Google, though this is certainly a case where the "copy/paste" of URLs may be biasing this due to the accompanying anchor text benefits
- Note the placement of the "0" axis - some of these are negatively correlated, though not massively. All of the correlations are in a fairly narrow zone here.
- Everyone seems to be optimizing their title tags these days (appeared in Google: 11,115 vs. Bing: 11,143). Differentiating here is hard.
- Overall, simplistic on-page optimization doesn't appear to be a huge factor.
Link Counts & Link Diversity
Our interpretation and conclusions:
- Links are still likely a major part of the algorithms. These numbers are among the highest we observe with any single metric.
- Bing may be slightly more naive in their usage of link data than Google, but appear to have improved since last year.
- Diversity of link sources remains more important than raw link quantity.
- Correlation numbers this high say good things about Linkscape's Index - way to go engineering team!
TLD Extensions
Our interpretation and conclusions:
- This data gives us more reason to believe Google's webspam chief, Matt Cutts, when he says .gov, .info and .edu are not special cased and don't receive special bonuses or penalties to rankings
- The .org TLD extension is surprising - do these sites earn more links? Do they have less spam? Perhaps they tend to be less commercial and have an easier time garnering references? In any case, we're happy to be SEOmoz.org!
- Don't forget about the exact match data from above - .com is still probably a very good thing (at least own it if you're using a different extension)
Length of Domain, URL & Content
Our interpretation and conclusions:
- Shorter URLs are likely a good best practice (especially on Bing)
- Long domains may not be ideal, but don't seem awful
- Raw content length seems marginal in correlation, which fits with Matt Cutts' advice from the Google I/O panel - "Don’t overfill your page with text for the sake of search engines. They don’t need a dissertation to decide to rank it highly; they want what the users want – for your site to be useful and informative."
Website Homepages
Our interpretation and conclusions:
- Bing has the stereotype of ranking homepages much more so than Google, and this appears to hold true in the correlation results - Bing's propensity/preference for higher rankings on website homepages is roughly double Google's (note that we included site.com/, site.com/index.*, site.com/default.* and site.com/home.* in these numbers)
Anchor Text Link Matches
Our interpretation and conclusions:
- Many anchor text links from the same domain likely don't add much value
- Anchor text links from diverse domains, however, are one of our highest correlated metrics
- Bing seems more Google-like than in the past on handling exact match anchor links
Features w/ the Highest Correlation
Our interpretation and conclusions:
- Link attributes as a whole have much higher correlation with rankings than on-page or domain related elements
- Exact match is still a powerful influencer
- Google and Bing are remarkably similar - building two different sites/pages to separately target the two engines would appear to be a waste of energy
- Bing seems to be moving much closer to Google over time; although we didn't measure all of these results precisely last year, the similarity of the two has dramatically increased (of course, it's also possible that Google is getting more Bing-like, though this doesn't fit with our personal experiences)
As with previous studies, I look forward to your analysis, hypotheses and data requests in the comments. Ben & I will both try to dive in to reply as we're able over the next few days.
While I appreciate the effort that has been put into this research, there are some very big caveats that should be called out so that the data isn't taken out of context.
Reliance upon the keyword suggestion tool is likely to provide popular searches, skewing toward head terms and not tail terms. These terms are also the most likely to be optimized because of their popularity and -- hence -- inherent value to the top listed businesses. This skewing is exacerbated by only looking at first page results. The heavy skewing is borne out by the number of optimized title tags: over 96% of the results were optimized for the search phrase. This is kind of a "duh" result: they wouldn't rank well (and thus be counted here) if they didn't have optimized title tags. Think about new clients you take on: what's the very first thing you look at on their site?
What does this skewing mean? The analysis is evaluating factors that differentiate between optimized sites, which is not the same as what matters for high ranking. Note that this doesn't say the two are necessarily different, but some of the conclusions that can be drawn from this risk being over-generalized. For instance, there is basically no correlation between KWs in title and ranking, but this doesn't mean that titles aren't important. Another example: the data says that correlation between ranking and keywords in the text is relatively low. This makes sense for large sites with abundant content about very popular and competitive topics: everyone probably has at least some mention of the keywords in their pages, so there may be too much consistency to determine a correlation, or the domains may have so much authority that the specific phrase usage doesn't have to be high. To extend this to less competitive and long-tail terms is absurd, as it is very often the case that very simple on-site optimization has an impact.
Let me use an analogy that might make this clearer: If you're comparing very high end sports cars, you may find only a weak correlation between top speed and engine size because they'll all have big engines, and at high speeds other factors like aerodynamics and weight become extremely important. You might infer that engine size isn't that important for high speed. Now extend your data set to include commuter and compact cars. You'll quickly see a very strong correlation between engine size and top speed: you can't go fast without a reasonably big engine.
This study is comparing sports cars. It's comparing Ferraris and Porsches. It's looking at what the top sites do relative to each other and how that differs between the two search engines. It doesn't necessarily say how you turn your Yugo into a 911.
I say all this as "buyer beware" for this study. Bear in mind the context of the data that's being presented. A lot more might be gleaned about its applicability and uncertainty if we knew more about the data set, such as the keyword list.
Couldn't agree more. Not trying to diminish the effort that was put into this project (I do appreciate the time that you all took to create this), but a study that analyzes extremely complex algorithms must have a huge data set covering all bases of comparison. Now, this is understandably difficult (which is probably why it hasn't been done), but just try to take any statistics you see with a grain of salt; there are a ton of factors that could be skewed with such complex data.
Very astute observation--everyone needs to remember that this study was meant to analyze the two tracks, not even the cars--meaning, it's looking for differences between Bing and Google. The small differences between cars (between the top sites) help us discern how each track is different (is there a big difference in what makes a site rank well in Bing vs. Google?). If there were a big difference, then we would need to start talking about how to change our cars or maybe even get a second car tuned for the new track. Gratefully, it looks like we don't need to worry about having two sites or performing extra work on our sites that helps rankings in one search engine but is irrelevant (or harmful!) in the other.
Yes - excellent point!
This data is really looking at keywords that SEOs care about - commercial terms, things that people want to rank for, that draw substantive search traffic and likely attract ads.
We've been asked to look at long tail keywords as well, and certainly want to work towards that soon.
BTW - Anyone can get the keyword list. It's just the suggested keywords for each section from Google AdWords public tool. We very much wanted to make this a "repeatable" study.
@Rand -- It's about *some* of the keywords SEOs care about, and arguably those that some clients should ignore. For example, I would not advise a small restaurant to optimize on "Mexican food."
It would also be helpful for you to explicitly provide the list. The results of tools may/will change. Is there really a good reason *not* to provide the list?
Yes, but you would probably ask your client to rank for "city, state mexican restaurant" or something of the like. However, what you're looking at IS a long-tail search, or at least a mid-tail search.
Mexican Food may be a fantastic search term, however, for a major recipe site that earns on a CPM basis.
I actually think there are better reasons to focus on the short tail in a study like this...
1. It pits the most competitive niches against one another, which means there is a depth of SEO techniques in use that we can compare. It is highly possible for a long-tail search that not a single top 10 site will use the keyword in the title, meta or any anchor text, making it nearly impossible to determine the efficacy of those factors.
2. It avoids several nuanced algorithms, like geographically influenced search results, which could easily cause unneeded distortion in the correlative outcomes.
3. By covering short-tail terms, they needn't generalize the results as far because they can be confident that the keywords touch a substantial percentage of real world searches.
There are myriad reasons to use short-tail over long-tail in a study like this.
I'll email it to anyone who wants a copy (email me at [email protected]). To get a feel for it, here are 10 queries randomly selected ("cat queries.txt | sort -R | head -n 10"):
baby cribs
car seat guidelines
refinance mortgage rates
retirement cards
foreign exchange
cookie bouquet
audi tt diesel
sunday times wine club
custom business cards
physician employment contract
Off-topic comment... it's "funny" to see "refinance mortgage rates" and "audi tt diesel" in the same list... it would be ironic if the two were somehow related.
I want to encourage every attempt to make SEO studies as academic as possible (part of which means they should be repeatable)! This makes the world of SEO much more concrete and tangible. It also brings forward the possibility of lecturing students with it.
However, it's part of the magic that we don't know exactly everything :)
It's frustrating to see that even though Rand explained how correlation != causation, the comments on this post make it obvious that 90% of your readers completely ignored (or don't understand) this important distinction. Unfortunately, I think your "interpretations and conclusions" content is the source of the confusion. For example, under the chart "Keywords in Subdomains," you conclude that "keywords in subdomains aren't nearly as powerful as in root domain names." This statement strongly implies that {keywords_in_root_domains} is a ranking signal that causes a page's rankings to change, and furthermore, that its effect on rankings is more profound than that of the {keywords_in_subdomains} ranking signal. In reality, it's entirely possible (albeit unlikely) that Google doesn't even look for keywords in a document's URL.
I guess what I'm saying is...it seems like you guys merely harvested the low-hanging (i.e. simple, obvious) interpretations and hand-fed them to your readers. A more interesting and thought-provoking approach might have been to ask/answer the question: what might be causing the observed correlation between rankings and [a given metric]? For example, after observing the correlations in the "Exact Match by TLD Extension" graph, one might conclude something obvious like: "If you're aiming for exact match, a .com extension is the way to go. Others aren't nearly as well correlated." ...but intuitively, that doesn't make any sense. If you're designing a ranking algorithm, why would you program it to favor .com exact matches over .org exact matches?
Instead, I would try to explain the correlation without mentioning the obvious. For example: .com domains are preferred by most registrants because a .com makes the domain name easier for consumers to remember. As a result, the .com version of any given domain name is typically the first version to be registered, and therefore it typically has the most inbound links (because it's the oldest and/or because it's the "official" version). Also, when a registrant registers other versions of a given domain name (.org, .net, .info, etc.), it is often because (1) the .com version was already taken, and/or (2) the registrant is trying to monetize traffic generated by the official .com domain (which they don't own). The content hosted on these "knockoffs" tends to be of much lower quality, resulting in fewer backlinks. Lastly, the non-.com versions are more likely to have "exact-match competitors," since the .com version is probably already ranked.
One last thing I'll mention: it's impossible to conclude (or confidently theorize) anything meaningful from the charts when the metric isn't clearly defined. The most notable examples are in the "On-Page Keyword Usage" graph. This data could represent any of these:
- Total number of individual keyword matches (integer)
- Total number of exact phrase matches (integer)
- Page element contains individual matches? (boolean)
- Page element contains exact phrase matches? (boolean)
Despite my criticisms, I really enjoyed this post...especially the comment exchange between Ben and various readers. :) -SEO Mofo
You've put the shortcomings of such results far better than I could (as I have tried to explain in previous similar blog posts here). I agree that it is good that (finally) some context was given at the start of this post (which was lacking in previous similar efforts), but I totally agree with you that it appears to have been ignored or not understood by the majority of respondents here, thus placing too high a value on such research.
Man between Rand, Darren and Bludge, I am getting a major education in statistics. Thanks guys.
I miss you Darren.
Call me?(pathetic plea)
Fantastic data as always Rand. Fan-fricken-tastic...
I am particularly excited to see that the .org bias still exists. We ran a small controlled experiment back in 2008 that pointed out this bias, but we lacked the correlation data to back it up on a wide scale.
The original study is here and my quick follow up is here.
The on-page data you mention is clearly pointing to the reality that as more and more competition enters the search space, low-hanging fruit like on-site optimization is necessary but not sufficient.
In the case of Alt tags, I think they are simply more evidence of a more devoted search engine optimization campaign. Moreover, alt tags are (this is an assumption) more likely to be assigned to primary images (like a giant pic you placed in the middle of your blog post). Perhaps there is a relationship between image-centered content and link love. We certainly see that with infographics in the link bait space.
Just some thoughts.
Me too! I am also curious as to how this relates to backlink data for .org sites vs non-.org sites. Especially when you factor in that .orgs, by their nature, tend to attract more links from a variety of root domains than more commercial sites.
Probably the best and most important insights into Bing ranking factors I've ever had the pleasure of witnessing. Pure awesome!
Great post! I appreciate all the research that went into this. I'm just trying to figure out how anyone ever gets to a point when they have time to optimize for individual engines. I'm just trying to make Google happy! :)
I agree - who would create 2 separate sites for Bing and Google? Would love to get insight from SEOs who are actually doing this - I doubt there are any, but you never know.
Unless the factors diverged to the point of contradiction, it would not make sense to launch sites that are optimized for each individual engine.
This can be the case for creating throw-away sites that use grayer techniques to rank in nascent search engines (like Bing's predecessors MSN or even Live), but it would likely never make sense as a long-term strategy.
Unsurprisingly, I love this. Did I miss the link to the data? (Ref: "Bring more science to SEO through a repeatable, peer-reviewed dataset").
I'm currently using some tips from Ben and the amazing free lectures published by Stanford to learn some machine learning techniques, so I'm keen to try stuff out (despite currently being pretty much at the "hello world" stage!).
What of the intermediate data would be most useful to share, and what format would be best?
I think you could fairly closely repeat these statistics without any data from us. That said, I imagine if we provided some intermediate data it would save folks effort.
Who is teaching the machine learning classes you are watching? Do you have a link?
Ben
P.S.
The process to repeat this without data from us is roughly: pull down the suggested keywords in every second-level category from Google AdWords; for each of the queries (after removing duplicates), hit google.com and bing.com to find the URLs ranking on the first page; fetch the HTML of those pages; compute mean Spearman's correlation coefficients on whatever you want to measure; toss into Excel to make charts.
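A runnable skeleton of that recipe, with the network steps stubbed out (the function names here are placeholders to plug your own AdWords, SERP, and page fetchers into, not a real API):

```python
def collect_flag_lists(fetch_keywords, fetch_serp, metric):
    """fetch_keywords() -> iterable of queries (duplicates allowed);
    fetch_serp(query)  -> page-one URLs in rank order;
    metric(query, url) -> 0/1 flag for the factor being measured.
    Returns one list of flags per SERP, ready for Spearman correlation."""
    serps = []
    for query in dict.fromkeys(fetch_keywords()):   # de-duplicate, keep order
        urls = fetch_serp(query)
        if urls:
            serps.append([metric(query, url) for url in urls])
    return serps
```

With toy stubs in place of real fetchers, the de-duplication and per-SERP flag collection behave as described:

```python
serps = collect_flag_lists(
    lambda: ["baby cribs", "baby cribs", "cookie bouquet"],
    lambda q: ["https://a.com/x", "https://b.com/y"],
    lambda q, u: int("a.com" in u),
)
# two SERPs survive de-duplication, each flagged [1, 0]
```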
Yeah - you're right that it's repeatable - but I guess it'd be useful to have the keywords and ranking data / features of those pages to save us having to go and gather that again. Is that something you can share?
The machine learning class is CS229 by Andrew Ng. I love how in the second lecture, he carefully explains how to take the transpose of a vector and then by lecture 3 is hammering through partial differentiation of matrix functions...
I also have been playing with svm-learn and got the O'Reilly book "programming collective intelligence"...
My guess is you are going to need a much larger dataset to do multivariate tests like that..
Go ahead. Spoil my fun. ;)
Spoil? More data = excuse to write YAS (yet-another-scraper)
I can share that.
It will be a little work to get it out of my super sophisticated key-value database into a CSV. By "sophisticated database" I mean a directory of JSON files named after the sha1 of keys. By "a little work", I mean a couple of hours.
If you ping me after SMX advanced I'll make it for you.
Ben
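The export Ben describes (a directory of JSON files named after the sha1 of their keys, flattened into a CSV) might look something like this sketch; the file layout and column names are guesses, not his actual code:

```python
import csv
import hashlib
import json
import os

def export_to_csv(store_dir, keys, out_path):
    """Look up each key's sha1-named JSON file and write one CSV row per key."""
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['key', 'value'])
        for key in keys:
            digest = hashlib.sha1(key.encode('utf-8')).hexdigest()
            path = os.path.join(store_dir, digest + '.json')
            with open(path) as f:
                # store the JSON payload verbatim in the value column
                writer.writerow([key, json.dumps(json.load(f))])
```

The appeal of this layout is exactly what Ben jokes about: there's no database server at all, just files, so the "couple of hours" is mostly deciding which fields to flatten.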
I'll bug you for datasets when my skillz get a little better. Don't worry for now. Thanks buddy. I'll connect with you on email.
This may seem really stupid to all of you, but I'm a little confused about exact match anchor text, so bear with me for a second if you can. When I am creating an anchor text link, should I try to mirror exactly what someone might use for their search terms? If a phrase is used for an anchor text link and someone only uses two of the words within that phrase for their search, would that be an exact match? I'm thinking the best way to explain this might be by example, so which of the following represents the best usage of an anchor text link?
a.) "Atlanta lofts and apartments"
b.) "loft, apartment, and studio rentals"
c.) "lofts in atlanta"
d.) "lofts"
I've been wondering about this for a while, so if anyone can clear it up, thanks ahead of time!
Spencer
Yes, exact match anchor text means that you write the exact query term, for example if the search term is "lofts in atlanta" then you will write the same exact anchor text.
In a phrase match, just part of the anchor text is found in the search term; for example, you write "loft" in the anchor text and you rank for "lofts in atlanta."
Broad match is when the anchor text is, for example, DVD and you rank for CD; the last one is not very common in SEO but very common in AdWords.
WoW... without words.
This study is really, really useful, especially if we consider how Bing is going to increase its presence in the market due to the Yahoo merge (and if we remember that from the 21st of June it will be available on the iPhone thanks to iOS 4).
I think I'm going to experiment with all this information by creating 9 websites with .org, .com and .es domain names (1 with an exact match keyword domain name, 1 with part of the main keyword and 1 hyphenated), all about the same topic (a very, very niche one in order to see results faster), and see whether I get a practical confirmation. The .es is there to see how your study holds up with country-code (ccTLD) domain names.
What I appreciate is that many of these things were "feelings" we were having and that you and your great team confirmed with scientific methodology.
And, finally, I liked a lot your false syllogisms:
The more I wear suits, the more I speak on panels about SEO. Does it therefore follow that wearing suits gets me onto panels about SEO?
Obviously not... but if we make the same phrase using Yellow Shoes instead of Suits... then the syllogism is going to be true ;-)
I'd be interested to hear the results of your controlled experiments with different TLDs.
We've found it hard to get enough data with controlled experiments to get conclusions with a small enough margin of error to be useful. Of course, such an experiment would really show causation.
Hey Ben, I'll try to drag up these results...
<a href='https://www.thegooglecache.com/white-hat-seo/google-showing-bias-towards-org-tlds/ '>https://www.thegooglecache.com/white-hat-seo/google-showing-bias-towards-org-tlds/</a>
Actually, it was such an easy study to run, might as well try again. Basically, buy 10 .orgs, 10 .coms and 10 .nets and, if you want, 10 .infos
Put 10 subdomains on each, so basically you are running 100 separate tests.
Generate random gibberish words for title and body text, and a random test term that will occur in the exact same location in the body text as every other (lets say the 4th word, where each of the previous 3 words are the same length).
Use Google Webmaster Tools to get them all indexed with sitemaps, varying the order with which you upload the sitemaps.
Wait and record.
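The gibberish-generation step above might be sketched like this; the word lengths, word counts, and the fixed 4th-word slot for the shared test term are hypothetical details filling in the outline:

```python
import random
import string

def make_test_page(test_term, seed):
    """Random-gibberish title and body, with `test_term` as the 4th body
    word; the three preceding words are fixed-length so the term sits at
    the same byte offset on every generated page."""
    rng = random.Random(seed)
    word = lambda n: ''.join(rng.choice(string.ascii_lowercase) for _ in range(n))
    title = ' '.join(word(rng.randint(4, 9)) for _ in range(4))
    lead = ' '.join(word(6) for _ in range(3))      # same length on every page
    filler = ' '.join(word(rng.randint(3, 8)) for _ in range(50))
    body = lead + ' ' + test_term + ' ' + filler
    return f"<html><head><title>{title}</title></head><body>{body}</body></html>"
```

Seeding per page keeps each test page reproducible while still varying everything except the test term's position, which is the variable the experiment is trying to hold constant.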
Thanks for the suggestion!... I want to remark that I'd like to run this experiment in order to also check territorial (country-code) domain names, as I work mostly for the Spanish and Italian markets... and also because the country ccTLDs are somehow missing from the study (surely because it was conducted in the USA), and to see how Bing in particular performs with non-English languages.
Lovin' it! Great stuff, Rand.
To your point,
but merely that all other things being equal, .com domains didn't perform as well in the dataset we observed as other domain extensions.
Which domain extension performed "best"? We did a (much more informal and far less rigorous) internal experiment on this same variable back in 2008, and we found the same thing. Our experiment showed .net outperformed the .com, .us and .org in terms of PageRank.
Great info. Just goes to show that while it's great to attend conferences, you can follow the conference via Tweets and blog posts to get the real scoop.
While all of you are sitting at SMX, I'll be over at Namejet buying up all of the dropping .ORG domains.
During the panel, we speculated that Wikipedia may be a big cause of that .org skewing :-)
You could pull wikipedia out of the dataset and re-run, perhaps? And then, when we show Google how skewed it is, maybe they will pull wikipedia out of their dataset too.
One question, were these rankings pulled before or after May 1st?
After. They were all pulled last week.
Nice article, is there an updated version of this for one year later? I would like to see how these numbers have changed in the last year.
We always knew that Bing and Google use different algorithms. This is the first article that highlights the differences based on solid investigative work. Well done!!
Will there be a similar presentation done at this year's SMX Advanced comparing post-Panda Google and 1-year older Bing? If not I would love to see an update to this!
Nothing mentioned about Toolbar PageRank. It would be interesting to see how much of a bellwether the little green bar would be for rankings at Bing.
Thanks for sharing this real time Rand, and thanks Ben et al. for the numbers crunching.
Like a few of the others above, I've concentrated solely on the 800 Kilo gorilla (Google) and only occasionally at best considered Bing. I know I ought not be single engine focused, but with time being a major factor I have to economize somehow.
I'll be reviewing this data in my leisure time, so I'm hoping to get insight that will reform my wayward myopic ways and allow me to include Bing in my calculations.
How very, very intereschting....
Especially liked this point:
"Don’t overfill your page with text for the sake of search engines. They don’t need a dissertation to decide to rank it highly; they want what the users want – for your site to be useful and informative."
I think many sites have the tendency to overload... who wants to sit down and read through a whole bunch of text? (My eyes are burning just thinking about it!) The world now has something like a three-second attention span. More pictures and graphs and fewer words make for a site that people will likely stop at and stay a while, especially if they sit in front of the computer for more than eight hours every day.
:)
Ben,
It does seem rather arbitrary, and it may violate some statistical assumptions. T-tests and ANOVA are a perfect fit for categorical variables such as "Exact Match Domain" (EMD). For instance, the mean rank for EMDs was lower (M = 2.30, SD = 0.31) than the rank for non-EMDs (M = 2.92, SD = 0.35) (t = 7.42, p < .001).
There seems something fundamentally unsound about taking the mean of "Yes" and "No," or "True" and "False." Assigning numeric values to such values can be appropriate when using something like a Likert Scale, but even that is debated. I cannot recall ever seeing a case of a binomial variable being used in the calculation of correlation in an academic paper, and all of my statistics textbooks indicate that ANOVAs and t-tests should be used in such situations.
It's probable that in this case the results would not change if you were to recalculate using t-tests. However, there are certainly situations where assigning an arbitrary binary value to a binomial variable would be inappropriate.
---
Edit:
Source
As it turns out, the rank biserial correlation coefficient can be used in situations where one variable is a binomial and the other consists of ranks. I learn something new every day :-) I would still make the case that it is much simpler, more interesting, and more statistically sound to use a t-test.
T-tests are only valid if the samples are independent and identically distributed. Positions of results matching some definition (like having an exact match domain) are not independent, and thus it wouldn't be valid to do a t-test on them.
To see this, consider the positions of two results, A and B, in the same SERP. If A's position is #1, then there is zero chance B's position is #1. Hence their positions are not independent.
...
There is not anything wrong with mapping true and false to arbitrary numbers and applying statistics to that. Beyond requirements that are well known, like samples being IID, there are no limitations on the types of random variables statistics applies to. Functions of random variables (like one that takes a boolean value and outputs a number) are themselves random variables, and so are fair game to apply statistics to.
Ben
Relevant quote from the guy who defined the concept of a nominal variable:
"…the use of numerals as names for classes is an example of the assignment of numerals according to rule. The rule is: Do not assign the same numeral to different classes or different numerals to the same class. Beyond that, anything goes with the nominal scale."
So I think he would strongly endorse my use of 1 and 0 for true and false as valid.
The quote came by way of Wikipedia.
https://en.wikipedia.org/wiki/Level_of_measurement
Ben
PS
Talking about this is interesting.
Enjoying the conversation as well.
From the same Wikipedia article:
Spearman's correlation coefficient requires calculation of the mean (M) of both variables. One cannot calculate the mean of a nominal variable any more than one can calculate the mean of cats and dogs.
You raise a good point about the lack of independence in your data making a traditional student t-test difficult. There are other types of t-tests that can be used that do not assume independence in samples. Another solution would be to simply treat each keyword as a single data point rather than each ranking.
Getting around that problem is exactly why I transformed these nominal variables into real-valued variables.
great presentation of the data.
I'd add from my own personal experience with Bing - pay close attention to duplicate content within your site. Make sure any crappy URL structures are redirected properly so they resolve to only one URL.
I absolutely believe their link valuation methods have gotten better as well. I had a few links out there that rode that thin line - Bing ranked the site very well for at least 4 months, then tanked it last month for the exact anchor text search of the culpable link.
Other than that, I'd just recommend that everyone install the Bing SEO Toolkit.
I know I'm almost a year late in seeing this, but thanks for a great and thorough article. Lots of interesting stats.
Wow - great research and presentation, so much useful data here to go through and study!
Thanks for sharing the results!
So, according to this, content not only seems to no longer be king, it seems as if it's barely a factor. Granted, to earn the proper links you'll need solid content, but from a SE perspective, they're barely worried about what's on-site?
I've known it's been "all about quality links" for awhile, but not quite to this extent.
Radical post. Thanks for all the research.
While content may not be easily manipulated to quickly gain rankings, as it was in the past, content is still really important.
As pointed out by spinnakerguy's comment, all these sites were probably fairly well optimized. I would guess they didn't compare the ranking of a well written page with a thin affiliate page. I think if this comparison were made, content could be shown to be a much more important factor.
I would argue that content is still king in that it is what earns the high quality links that you want. Without quality content, how would you get valuable links (without paying for them)?
I think I'd agree with Geoff here. Not having the right content on a page still doesn't make sense from an SEO perspective.
But, if a SERP is already fairly well optimized, then having KWs on a page doesn't make a huge difference compared to other metrics. At least that's how I read the data, also because I have some good experience in cases where "just" content would help rank a page higher.
so... much... information... hard... to... compute... ;) Excellent post!
Bing may pass internal link juice differently than Google. In the past I noticed that Bing's Webmaster tools rated non-optimized pages in the "Top 5 Pages" above my keyword pages. This changed after I started adding some deep links and optimized anchor text backlinks to my keyword pages. Now my top 5 pages are the home page and four keyword pages. It seems as if my external actions shaped how they judged my internal pages. Although I may just misunderstand what Bing means with the Top 5 Pages.
Do you think the SE's distribute internal link juice according to external factors (links, anchor text, etc.)? I hope this makes sense.
Fantastic research and an excellent post as always guys
There's a tremendous amount of data here, and for that, my thanks. I think the differences and opinions voiced in the comments underscore just how far we still have to go. From my perspective, SEO is a means to an end. It can help to start a conversation between company and prospective customer, but once the introduction is over the content has to be there. Otherwise, you'll merely succeed in driving up the number of hits to a site, page, etc. without converting that traffic into revenue.
WOW- Awesome analysis and presentation... I loved it. Thanks for sharing...
Our site appears on page 1 of Bing and Yahoo for several queries, but rarely makes it past page 3 for Google. Your article is interesting but it doesn't explain what we are seeing.
Thanks for the informative article anyway - interesting
It's really interesting to see that Bing actually puts more emphasis on # of linking root domains to the URL.
I've actually learned a lot about Google's search factors not just only the relation between Google vs Bing's search factors from this blog post. Great post!
Great post! Great research and presentation. I learned many new things.
It's nice to see such a visual representation of the differences between Google and Bing. Very easy to read and digest.
The alt attribute is an interesting item...hhmmm. Maybe I should use these more :)
Wow Great Analysis...I saw this post come up the other day and I've had it set as my homepage so I wouldn't forget to read it haha.
I was really surprised that Google actually has a higher correlation with ranking exact match domains higher..
When clients ask for some of the differences in the SE's I always list this is more of a priority for bing..not anymore
Question about the On-Page Keyword Usage...So this data should lead us to believe that there is a negative correlation for KWs in H1 and Title tags?...You found this predicts lower placement in the results? Or am I interpreting that wrong?
I'll have to add this one to the bookmarks...
I've been bookmarking wayyy too many posts...
Nice insights :) watery in statistical terms but relevant in principle.
Was wondering if the KW analysis performed included alphanumeric characters and/or proximity (2n) correlations within website homepages, URLs and on-page elements.
Thank u for taking the time to put this together and especially for leaving it open for peer-to-peer interpretation/discussion. ftw!
Liked Rand's suit! But I liked the comparison between domain ranking differences in Google and Bing. I think it is a really helpful article to get an insight into how you can choose your domains for optimum quick results.
Good work Rand and SEOmoz
Interesting article but perhaps even more interesting points raised on the statistical analysis, some of which seem to be valid. Hopefully this discussion will lead to a positive and useful outcome, beneficial to those involved in SEO work.
Definitely a stiff competition taking place between these two search giants... I enjoyed reading your take on this battle!
Brilliant research. So useful!!! Thank you guys!
( Small spelling error on line 387: "Bind seems more Google-like [...]" )
Looking forward to comparing this data with next year's! :D
Great article and insight in comments. I have several exact match geo-targeted keyword domains that rank #1 that are 10+ years old. My question to the experts… How important is the age of the domain name in determining search engine ranking?
In 2009, SEOmoz estimated that registration and hosting data represented about 6.91% of Google's ranking algorithm, keeping in mind that every SERP is different; for some SERPs, domain age doesn't matter at all.
Around the same time, @seosmarty reasons that domain age isn't very important at all. The age of the link profile is.
The Hobo also reasons that domain age doesn't matter much. Rather, what matters is what you do with the domain over time.
Nowadays, I'd estimate that the age of a domain name is, on average, less than 2% of the ranking algo, and increasingly less important. I agree with Ann that it's the link profile of a domain that matters, not the registration data. However, this metric, I'd guess, is variable across SERPs; in some it's likely irrelevant, and in others it might have more relevance (but still little, compared to link popularity).
If you're going to be "squatting" on a domain, I'd suggest publishing 5-10 pages of relevant text-driven content in the near future, and building a few key links, so you can take advantage of having some history in your link profile.
"As we grow old… the beauty steals inward." Ralph Waldo Emerson
Hi Folks, Yes, it's a fascinating and much appreciated set of data. Thanks.
I've had a hard time finding specific details about the things that Bing and Yahoo value and devalue. The big mystery I have on my hands is a site that ranks #6-10 on Google for its primary keyphrase, and it never breaks 80 on Bing or Yahoo. I can't find any logic to it. Unfortunately, no "aha's" from this article. Can anyone make any suggestions about how I could sleuth this out? I've tried using more and fewer instances of the keyphrase in various places on the site, and it never seems to make much difference. The only noteworthy observation I made is that my Yahoo rankings sunk (to the level of Bing) when I failed to renew my Yahoo Business Listing at $300 per year. Thanks for any ideas!
Thanks Rand for actually providing some good information at an Advanced SEO conference. It is frustrating when I go to a conference that I paid good money for to hear the speakers closing line/sales pitch "if you want more information call my consulting firm at XXX-XXXX"
Granted I now realize that even though I paid for the conference the speakers were not paid to be there... So I paid to get their "Free" version.
Sorry for the rant.
Awesome info. And I really despise the fact that Yahoo gives so much weight to domains with exact matches... of course since they are selling their search to Bing that won't be as much of an issue here soon.
Very useful information and great post!
I'm wondering one thing though, as this was posted on June 8th, 2010. How much impact have all of big G's Penguin & Panda updates had on the results presented in this post?
I'm referring to the 'TLD Extensions' chart in particular. From what I've understood from different posts, it's a bit controversial whether .org should give more 'juice' than .com today. But then again, it is near the end of 2012, so maybe this post does not reflect the current state of big G's algorithms.
I would appreciate it if the author could shed a bit of light on this. Thanks again for the very deep analysis.
Just finished reading! I learned a lot about the ranking elements of Google and Bing through this post. Thanks again! This was posted last year, yet, I still don't know most of the ranking elements of both search engines. O__O
Once again, thanks thanks and thanks! :D
Another killer post
Cheers Rand!
Here are a few important points which I would like to summarize and give my interpretation of. As always, correct me if I am wrong:
High correlation numbers for link count and diversity, which signifies the importance of links from varied sources for both search engines.
I'm surprised with the .net version a bit and even more surprised to see .org extensions having good correlation points.
Google giving higher correlation numbers for exact match domains is a surprise; I felt it would be the other way round.
Bing is bad for long tail keyword searches unless you have an exact match domain (homepage is king for Bing)
Linking root domains matter less to Google when compared to Bing. Google may have a slighter inclination towards links in content and number of links rather than Bing.
Shorter URLs are likely a good best practice (another reason why homepages rank so well on Bing if you got your URL right)
Link attributes as a whole have much higher correlation with rankings than on-page or domain related elements (once again link building is the king)
"all other things being equal, .com domains didn't perform as well in the dataset we observed as other domain extensions." - have you normalized TLD amounts, so that it would be fair to point that out?
"Everyone seems to be optimizing their title tags these days (appeared in Google: 11,115 vs. Bing: 11,143). Differentiating here is hard." - doesn't that mean you really should optimize title tags, since almost no results lack KWs in the title tag?
"Links are still likely a major part of the algorithms" - since having 30 title tags, for example, is quite silly and has no effect, while it is possible to have 30 thousand links, maybe links are not that major a part of the algorithm, but they are the only one where you can keep adding more and really do better.
"Link attributes as a whole have much higher correlation with rankings than on-page or domain related elements" - again, it's no use (and nonsense) to have 30 title tags, 1000 KWs on a page, or 15 exact-match KWs in one domain name, while it really is possible and useful to have many thousands of links; so links may not be a more important ranking factor than the others within the algorithm itself, but they are the one we are able to manipulate.
These results suggest that there's not much difference between Google and Bing from an SEO perspective.
My site's pretty well optimised for Google, but I'm nowhere in Bing.
I get 120x the traffic from Google than from Bing; Bing is 0.8% of search traffic.
That suggests there are some very big differences, not the small ones you are finding?
Are my figures out of line with what others are seeing from Bing?
It seems to be driven by Bing being, well, rubbish. Look at the quality of the results for a search for "iphone 4" on the UK Bing site:
https://www.bing.com/search?q=iphone+4
Maybe one out of 10 is relevant!
Was the ranking data pulled after Google's May Day update? From what I've read from others who have experienced ranking decreases, the bonus for exact match .com domains has been drastically reduced.
It was. I started collecting data for these charts last Monday.
Love it, great work guys. I have to ask though: How does one calculate a correlation with a nominal variable such as TLD? The correlation equation requires computing the mean of both variables, and I don't see how it is possible to calculate the mean of a binomial such as "Exact Match Domain." It seems that a t-test or ANOVA would be more appropriate in these situations.
Thanks Sean!
Boolean features I take as 0 if false and 1 if true, although this is arbitrary. Using 7 and 42 would give the same answer. After one converts it to ranked indices it will be the same (as in Spearman's), or in Pearson's the larger numerator will be exactly balanced out by a larger denominator.
Binary t-tests and ANOVA don't work well on ranked data like this. How would you formulate it into binary IID trials?
Ben
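Ben's point that the choice of numbers is arbitrary (any order-preserving encoding of a boolean yields the same Spearman's rho) is easy to check numerically. Here is a minimal stdlib-only sketch with made-up SERP data; the `ranks`, `pearson`, and `spearman` helpers are my own illustration, not anything from SEOmoz's actual pipeline:

```python
def ranks(values):
    """1-based ranks with ties averaged, as Spearman's rho requires."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average rank for the tied run
        for k in range(i, j + 1):
            r[order[k]] = avg_rank
        i = j + 1
    return r

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))

# Hypothetical page-1 SERP: positions 1-10 and a boolean feature per result
positions = list(range(1, 11))
has_feature = [True, True, False, True, False, False, True, False, False, False]

rho_01 = spearman(positions, [1 if f else 0 for f in has_feature])
rho_arb = spearman(positions, [42 if f else 7 for f in has_feature])
print(rho_01 == rho_arb)  # the encoding doesn't matter, only the ordering
```

Because Spearman's rho only looks at ranks, and any monotone recoding of true/false leaves the ranks unchanged, the two coefficients come out identical.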
This is brilliant, I am so grateful that you guys have the time and manpower to put this all together. Off to re-read it and digest a little further. Thanks again.
Love it,
many thanks guys, you've helped me decide there is no great worth in optimising for Bing separately to Google! Saves me a whole bunch of trouble :)
Thanks for the insight! It's always good to know where the focus is, site-structure wise.
This was a great chart not just for the Bing comparison but also for Google ranking factors as well. A good report before SEOmoz does its 2010 ranking factors report in full force.
Sadly, I think they leapfrog the years they publish that report, and it was just published last year.
But then again, since they are now not consulting maybe they'll be able to make it an annual?...Rand?
Rand - great session today and great blog post too. I've had a chance to go back over your data and thought of a couple things that might be interesting to look at (since you asked). It may be that this is too much detail for your methodology, but how about measuring what constitutes too much keyword usage? For example, is it better to put your keyphrase in the title tag twice? What about the body text - how much keyword density is too much? Is more better, worse, or does it not matter? What about the benefit (or lack thereof) of short anchor text with your keyphrase vs. longer anchor text (i.e. keyword density in the anchor)? Just some thoughts for you ...
Great presentation. Thanks for sharing it on such an immediate basis. Truly Real Time.
Need to read it in detail to actually comment.
Regarding the last point
Bing seems to be moving much closer to Google over time; although we didn't measure all of these results precisely last year, the similarity of the two has dramatically increased (of course, it's also possible that Google is getting more Bing-like, though this doesn't fit with our personal experiences)
I think both are becoming truly search engine like.
But since a long time search engine has been synonymous to Google.
Great data collection and very useful, thanks !
how long (max) would you say an internal page URL should be, Rand? I've always kept it in the ballpark of or below what the meta title is
Thanks Rand, good stuff.
Will now go get my glasses, a strong coffee and work through the finer details!
Cheers
I like the TLD extension section and the thoughts on .org domains.
Lately, as I see more and more .org domains showing up with strong content, I have wondered at the reason. Is it correlation or causation?
We have recently thrown around the idea of starting a new online information source as a .org instead of a .com but were not able to put any hard data behind our decision. Obviously we are going to scoop up both domains, but the question still remains as to which one will be the primary brand.
I think it's the psychology behind .org that is the most beneficial as .com is commerce, and org is organization which implies credibility vs someone selling to you. All the same most people assume a site is .com instead of .org so if you have both for a redirect to the .com that's the ideal situation. If you haven't gone through the Google sandbox stage it wouldn't hurt to try the .org out. It's kind of ironic I was just talking about this yesterday with a friend.
Wow...great post....my head hurts
Thanks for taking the time to collect and analyze this data. But I'm still going to focus on optimizing for Google....Can't imagine the time to optimize for multiple SE
I think over time there's going to be a lot more equilibrium as Google increases the amount of variables in their algorithm to the point where it's not going to be as blatant as it has been. As we've seen in other posts on here the H1 tag doesn't have as much significance as we'd like to have thought. All the same everything counts, and as long as we know what search uses to find relevancy we'll be on topic.
Nice post! Especially because it triggers a discussion :-) And I really like Ben's comments!
Does the search engine Bing give us PageRank for our website also? I have a domain, ClickToSeo; this one is parking Google AdSense for domains. If I add a domain for AdSense and I want to do SEO for this domain, is there any problem?
you'll have to elaborate a little more about what you mean
When you used Google AdWords, at what level in the category hierarchy did you collect keywords? Also, when you downloaded keywords from each category, did you keep the default sorting option (by relevance), or change it to something else?
I didn't play with the sorting options.
I went to the second level down.
Interesting. If that is the case, then the argument that your sample only consisted of very popular terms does not necessarily hold up. The default sorting option is by "relevance" to the category. When I replicated your methods, many of the keywords were actually fairly low volume.
I understand Sean pulled down 800 results for all of the top level categories and got a number of queries rather similar to the number we used. Presumably my comment above reflects a bad memory of what I did, and I must have actually used the top level categories. If I really had used the second level categories, we should have a lot more queries.
Super great post, I've been scratching my head a lot lately thinking about Bing. This clears a lot up, thanks!
Under Anchor Text Link Matches
"Bind seems more Google-like than in the past on handling exact match anchor links" is Bind supposed to be Bing?
In the Exact Match Domains by TLD Extensions chart, the data reveals that .com is more highly correlated as an exact match than other extensions. In the event that the .com is taken, what would be a better alternative: a hyphenated version of the .com with the same keywords, or an exact match of another extension?
For example, if domainkeywords.com is taken, what would the best alternative be? domain-keywords.com or domainkeywords.ca
domain-keywords.com if you have global aspirations, domainkeywords.ca if you want to keep it in Canada.
Jeremy.
how can anyone get to a global version when there is usually a local version of the site.. I think Google still needs to re-tweak their recent changes... say you look up travel in Australia. All I get is .au sites.. I want to travel out of Australia so I want global sites...
So so much info to process. This post reaffirms that one can still overlook Bing and work to please just Google. I have seen the average traffic share from Google still hanging around 20% and I think it's more to do with Bing's market share than Bing's love for my site.
Great great post.
Cheers,
Pulkit
Thanks for the great analysis. I hope that Bing continues to become more Google-like. I cannot imagine it going the other way.
Super post! I love to see all that data and try to pull some sense out of it...
Thanks Rand for the effort..
I think this kind of information will become more valuable in the future if Bing gets a bigger market share, but as of now, Yahoo & Bing's shares are just too small; Google pretty much owns the market. We are ranked very similarly for the same keywords on both Google & Bing/Yahoo and see very little traffic from the latter two.
The future is possibly less than 6 months away :) Think changes are in play just after the holidays.
I think that the Microsoft giant is still a viable threat to Google. Frankly, I would prefer to see a balance of power.
Does any of this data really change anything? Matt Cutts says good content is all you need. People will link to you because of that. So write good optimized content and the web is your oyster. Right? Hello?
Getting crazy between following the tweets from SMX Advanced and reading the post... gotta get more multitasking!