This past week during the SMX Advanced conference in Seattle, I presented some correlation data alongside Janet Driscoll-Miller, Sasi Parthasarathy of Bing & Matt Cutts of Google. Matt in particular was quite vocal in expressing a desire to see additional data points from our research, primarily around the prominence/visibility of particular elements in the results. This post is intended to help make that available.
I must say that I don't agree with Matt on the importance of the raw visibility/counts over the ranking correlations. My feeling is that SEOs in these spaces are more interested in answering the question - "what features predict a result will rank higher vs. lower on page 1?" - rather than the more straightforward - "does this feature appear more frequently on page 1 at Google or Bing?" However, I certainly agree that both are relevant and interesting.
If you're trying to wrap your head around how to understand this prominence/visibility data vs. our earlier data on the correlation with rankings, here's how we'd best describe it:
- Correlation w/ rankings data helps to answer the question, "when this feature appears in results on the first page of Google/Bing, who ranks it higher and by what amount?" Those correlation numbers were derived by looking at the likelihood that a result would rank above another when it contained the target attribute.
- Visibility/prominence of an element helps to answer the question, "is this element more likely to appear on the first page of Google's/Bing's results?" This simply looks at the number of times we saw a result (or multiple results) ranking on page 1 containing the target attribute.
We're looking at the latter one in this post, but before we dive in, there are a few critical items to understand:
- This isn't correlation data and there's no standard error or deviation numbers here. It's simply how many times we saw the element in the results we gathered, divided by the total number of results (SERPs or URLs depending on the chart) to get a percentage.
- This data comes from page 1 of 11,351 sets of search results, gathered from Google's AdWords categories. This means the terms and phrases vary somewhat in search quantity (from sub-100 searches per month to tens or hundreds of thousands) but generally have a commercial focus and intent. They generally don't include brand names, long tail phrases or vanity/name searches. Overall, we picked them because they're precisely the kinds of queries most SEOs care about when they're doing competitive SEO for their companies and clients. We also ignored the second result in a SERP from the same domain to avoid effects of indented results (which was important for our earlier statistics, but not those in this post).
- The results were collected the week of May 31st and thus include post-"Mayday" update SERPs and likely results from after the "caffeine" launch as well (though Google did not announce exactly when that rollout occurred; it may not have much bearing, as caffeine is supposedly an infrastructure change rather than an algorithmic one).
- Each feature contains two pie charts, one showing the percentage of SERPs that contained at least 1 URL with the feature and another showing the percentage of total URLs across all results (102,296 for Google and 109,966 for Bing - note that the number of standard web results shown on page 1 fluctuates from SERP to SERP). These are labeled as "(feature) in SERPs" and "(feature) in URLs," respectively.
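As a concrete sketch of how those two percentages are derived (using made-up data, not our actual crawl), the calculation is just two counts over the collected SERPs:

```python
# Hypothetical sketch: each SERP is a list of booleans, one per URL,
# True when that URL has the feature (e.g. an exact match domain).
serps = [
    [True, False, False],
    [False, False],
    [True, True, False, False],
]

# "(feature) in SERPs": SERPs with at least one matching URL.
serps_with_feature = sum(1 for urls in serps if any(urls))
# "(feature) in URLs": matching URLs out of all URLs in all SERPs.
total_urls = sum(len(urls) for urls in serps)
urls_with_feature = sum(sum(urls) for urls in serps)

pct_serps = 100.0 * serps_with_feature / len(serps)
pct_urls = 100.0 * urls_with_feature / total_urls
print(f"{pct_serps:.1f}% of SERPs, {pct_urls:.1f}% of URLs")
# prints "66.7% of SERPs, 33.3% of URLs"
```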
In gathering this data, we didn't optimize for sharing it in this fashion. In fact, Ben & I both feel that if we wanted to do it this way, we should gather the first 3-5 pages of results, not just the 1st page. That way, one could compare the counts on page 1 with the counts on page 2. However, since we've got the data and Matt, Sasi and several other folks expressed interest, we're sharing anyway. Hopefully in the future we can do more on this front.
Let's dive in!
Exact Match Domains
These are domains that precisely matched the keywords in the query - e.g. for the query "dog collars" only a domain that matched *.dogcollars.* would be included.
You can see that Bing has slightly more exact match domains appearing in at least one result of the SERPs we collected and in the overall count of results (all the URLs from all the SERPs).
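For illustration only (this isn't our classification code), a check along these lines can be sketched in a few lines; the host parsing here is deliberately simplified, and a real version would want a public-suffix list:

```python
def is_exact_match_domain(query, host):
    """True when the domain label is exactly the query keywords run
    together, e.g. 'dog collars' matches *.dogcollars.* hosts."""
    keywords = "".join(query.lower().split())
    labels = host.lower().split(".")
    # Simplification: treat the second-to-last label as the domain
    # name; real code would consult a public-suffix list.
    return len(labels) >= 2 and labels[-2] == keywords

print(is_exact_match_domain("dog collars", "www.dogcollars.com"))   # True
print(is_exact_match_domain("dog collars", "shop.mydogcollar.net")) # False
```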
Exact Match .com Domains
Similar to exact match domains, exact match .com domains had to contain the exact query in the domain name and have a .com TLD.
Again, Bing showed a slight preference for displaying results from these sites in the SERPs and URLs we observed.
Exact Match .net Domains
As above, but replace ".com" with ".net."
The totals are much closer for URLs with .net exact matches, but Bing shows a preference in the SERPs count.
Exact Match .org Domains
In the .org TLDs, we start to see a bit of what we observed in the ranking correlation data:
This is the first exact match domain TLD where Google actually had more SERPs containing a result of this type. Bing, however, had a tiny fraction more URLs with this feature.
Exact Hyphenated Match Domains
One of Matt Cutts' complaints centered around how Google vs. Bing handled exact hyphenated match domains. When we observed them in ranking correlations, it appeared that, when Google listed them, they would rank them higher than Bing did when they appeared on that first page of results. However...
As I called out in the presentation and the prior post, Bing has quite a few more SERPs where exact hyphenated match domains appear, and somewhat more URLs, too. This is another data point that should make us all think carefully about the fallacy of presuming correlation = causation. Bing might have a preference for exact hyphenated match domains, but the ranking correlations suggest to me there's more going on here - maybe something to do with anchor text, or where those types of sites tend to get links, or something else we haven't considered?
It's critical to keep in mind that we're just looking at individual factors here - not trying to explain why they exist or correlate (at least, not in the data).
Results that Include All Keywords in the Domain Name
Here we looked for domains that contained the keyword query in the domain, even if the match wasn't exact. For example, mydogcollar.com would now match for the phrase "dog collar."
Again, it's Bing that shows a higher number of these types of domains in their results.
Results that Include All Keywords in the Subdomain Name
We've previously shown some data suggesting that subdomains might have some ranking influence, but not as much as root domains (this was done using our rank modeling / machine learning process). Here's some raw data on the number of times we observed keyword matching subdomains:
Perhaps not surprisingly, Bing again is showing more of these results in their SERPs and individual URLs.
.com Domains
For this feature and all the TLDs below, we're just looking at any URL that has the domain extension.
It looks like Bing has very slightly more .coms in their results vs. Google.
.org Domains
Let's see what happens for .org domains, recalling Google's apparent preference for them in the ranking correlations.
Oddly, Bing again seems to have more .org pages in the SERPs and URLs.
.net Domains
URLs with .net probably won't surprise you much:
Yet again, Bing is showing slightly more than their Googly competitor.
.edu Domains
Recall how, in the correlation data, the numbers were small(ish) but negatively correlated? Let's see what the number of results shows:
True to the stereotype, Google is slightly ahead on number of .edu domains in the SERPs & URLs.
.gov Domains
Given the previous charts, this one likely won't surprise you:
Google has more .edus and more .govs, too.
Keywords in the Title Element
Not surprisingly, nearly every set of SERPs had at least one result where the title tag contained the keywords:
Bing shows up with more results whose title tags match the keywords. One thing worth mentioning is that we didn't observe the titles the engines chose to show, but rather the page titles from the results themselves. Hence, if a result was showing a DMOZ title or a brand title (which Google will sometimes insert), we ignored those and just used the title element on the page itself.
Keywords in the URL
This one actually surprised me, if only because there were even fewer results with keywords in the URL than in the title!
Bing again has more results with keyword-matching URLs, though remember that some of that is probably from keyword matching domains, too.
Keywords in the H1
The ranking correlations suggested that the H1 tag isn't much of a differentiator, yet lots of people still swear by them:
The results bear out that this is a much less frequent feature than keyword-matched URLs or titles for those ranking on page 1. Bing seems to show more of them than Google, though.
Keywords in the Alt Attribute
Alt attributes looked interesting last fall when we collected ranking information and once again proved worth a look in the correlation data from SMX Advanced. Let's see what the raw counts show:
Bing is showing slightly more of these, but if the positive correlation means something, these numbers certainly suggest there's lots of opportunity left for good alt attribute practices.
Homepages
Who lists homepages vs. deep pages in the results more?
My word! It's Google by a good margin. Bing's showing of internal pages actually surprises me a bit, though perhaps that's an old stereotype I need to abolish.
And with that, we're done!
One important point to notice is that I've not included data on link results, as these would be hard to interpret and likely non-useful. Every page of results had pages with links to them and nearly every individual ranking URL also had links (a good sign for Linkscape's index, but not super valuable as a data point). There were a few other data pieces like this that wouldn't make sense here (keyword prominence in the body tag, word tokens in the body tag, domain name length, etc) and have thus been excluded.
I've done less analysis on these results in general, as I think the data is a bit less ideal for the purpose, but it's still interesting and hopefully, illustrative of general prominence. I look forward to seeing your interpretations and discussion!
p.s. If you email Ben at SEOmoz dot org, he will send you a TSV containing, for each query, the metrics for each result that we used in these posts. You can also find raw results in a public Google spreadsheet doc here. Feel free to play around and let us know if you see anything else cool and interesting.
Bit of an aside, but a comment came up in the criticism of the original post that was essentially: "Who cares about Bing?". I know there's been a lot of debate, but let's accept the search-engine share numbers from comScore for now. Bing is at about 12%. Now, let's jump ahead to Q4 or so when the Bing algo will power Yahoo (still two separate sites, but the same SEO rules). Yahoo is currently at about 18% share. Add them up: 30% share, almost 1/3 of the market.
I don't know about anyone else, but I think 30% is well worth paying attention to. Any good SEO should be interested in what's going on with Bing. The good news is that Bing SEO doesn't seem to be too huge a departure from Google, but we should still be taking an active interest in how it works.
I'm still skeptical regarding the alt tag. My guess is this: let's say that the community as a whole generally thinks there are 25 factors that affect rankings. Most everyone knows about keywords in the title, so they put keywords in the title. Fewer know about keywords in the meta description, but it's still a high number, so they do it. Even fewer know about H1s, and even fewer than that about alt tags, and so on.

The important question here is what percentage of the OTHER important tactics do webmasters who put keywords in the title use, vs. webmasters who put keywords in the alt text? My guess is that if you get to the level of granularity of manipulating alt text, you have reached a level of sophistication and dedication where you are probably hitting a lot of the other factors that those who just fix the title tags are not.

Moreover, nearly all CMSes available these days make title and meta data manipulation very easy, and put the title into an H1 tag on the page. Alt text is left unmodified. My guess is that getting the alt text right is more indicative of conscientious webmasters than of alt text impacting rankings.
Russ - I think that's a great interpretation, but remember that this data isn't causative, it's just correlation, so we're not trying to make that claim.
However... Looking back to the work from last fall where we did build a causative model, alt text did appear to have some positive impact, even controlling for other factors, albeit slight (and the margins for error in that work were much higher).
However, I think your suggestion is certainly possible. The other one that strikes me is that sites that use good alt attributes are often not well SEO'd, but do often have legal requirements around the use of those elements because they work with government or educational institutions. This often means they're both non-commercial (which people like to link to) and tied to high value sites that may well link to them (.govs and .edus and others associated with them).
Having hypotheses about the source of the correlation is great, because it means more deep thinking about the issue and other data points we can look at or control for in the future (or you could do yourself!). I'd just warn about being cautious to presume that any theory is definitive without running some numbers to back it up.
It is perhaps of interest to therefore compare search results against a 'standard' corpus (as standard as the web gets). This would perhaps make it clearer when a feature is particular to the results set as opposed to just something that everyone is doing as good practice.
It's something I've thought of for a while for this type of experimentation, but sadly work gets in the way of my playtime and I've not been able to further consider how such an experiment would be done.
Nice follow-up guys. I ran a couple of chi-square tests (which I believe is the appropriate statistic here), and most of the Google/Bing differences you found are significant. One or two of them were not, but it is pretty easy to identify them looking at the graphs. Bravo!
Thanks for doing the job, I just wanted to ask Rand about it, because this is crucial.
Thanks!
Mostly for fun, here is a way with simpler math that also understates how much is significant. We make an upper bound on the error using the normal approximation to the binomial distribution and produce a maximum confidence interval that way. That overstates our uncertainty, but can give people a rough idea.

Some quick and conservative math says SERP percentages about 1.3% apart, or URL percentages about 0.43% apart, can be called different with 95% confidence.

Here is my work:

variance of the count = v = n * p * (1 - p)

standard error of the proportion = e = sqrt(v) / n = sqrt(p * (1 - p) / n)

percentage standard error = s = 100 * e

Take p = 0.5, which gives the highest standard error.

For the counts based on SERPs (n = 11,351) we get s < 0.47%, so each percentage has a 95% confidence interval of +/- 0.92%, and two percentages need to be about 1.96 * sqrt(2) * s ≈ 1.3% apart before we can call them different.

For the counts based on each URL (n >= 102,296) we get s < 0.16%, a 95% confidence interval of +/- 0.31%, and a difference threshold of about 0.43%.

More careful math could show that more differences are significant, but at least this confirms Sean's point: if it looks clearly different, it is.
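For anyone who wants to verify that worst-case normal-approximation bound, here's a quick sketch (my own notation, not code from the study):

```python
import math

def max_pct_standard_error(n, p=0.5):
    """Standard error, in percentage points, of a proportion estimated
    from n Bernoulli observations; p = 0.5 gives the worst case."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)

for label, n in [("SERPs", 11351), ("URLs", 102296)]:
    s = max_pct_standard_error(n)
    ci = 1.96 * s                    # 95% interval for one percentage
    diff = 1.96 * math.sqrt(2) * s   # 95% threshold for a difference
    print(f"{label}: s={s:.2f}%, CI=+/-{ci:.2f}%, diff threshold={diff:.2f}%")
# SERPs: s=0.47%, CI=+/-0.92%, diff threshold=1.30%
# URLs: s=0.16%, CI=+/-0.31%, diff threshold=0.43%
```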
The confidence intervals are definitely important, but comparing two will underestimate the probability that the difference is due to chance. Perhaps the most obvious reason is that they each use a separate p value (p < .05). Thus, the odds that both individual means are within the confidence interval is equal to 0.95 * 0.95 = 0.90.
However, using Pearson's Chi-square test, we can calculate the probability that each difference is due to chance. For instance, the differences found between Google and Bing in Exact Match .com Domains are significant for both URLs and SERPs.
On the other hand, the differences found between Google and Bing for Exact Match .net Domains and Exact Match .org Domains are not significant for either URLs or SERPs.
To give you an example of a result that borders on significance, the difference found between Google and Bing in Keywords in H1s in SERPs (0.40%) is significant at p < .05, but not at p < .01.
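For anyone who wants to reproduce this kind of test, a 2x2 Pearson chi-square needs only the raw counts; the counts below are hypothetical, since the post reports percentages only:

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table:
                feature present | feature absent
        Google:        a        |       b
        Bing:          c        |       d
    """
    n = a + b + c + d
    expected_factor = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / expected_factor

# Hypothetical example: 1,500 of 11,351 Google SERPs vs. 1,650 of
# 11,351 Bing SERPs showing some feature.
chi2 = chi_square_2x2(1500, 11351 - 1500, 1650, 11351 - 1650)
# The critical value for 1 degree of freedom at p < .05 is 3.841.
print(f"chi2 = {chi2:.2f}, significant: {chi2 > 3.841}")
# prints "chi2 = 8.29, significant: True"
```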
I thought of an assumption in computing the error bounds in my comment above that is not entirely met.
To consider the URL counts as coming from a binomial distribution we need to assume that the URLs are IID. Because the URLs are grouped in SERPs, this isn't entirely true.
Still, the assumption is close to being met, and otherwise the math is quite conservative.
Ben, I still think that the independence assumption is met because we're not concerned with the actual value of the URL. For instance, let's say that we are measuring the frequency of homepages in SERPs. It is true that the URL in position one cannot also be in position two. However, we don't care about the value taken by the URL, only whether or not it is a homepage. We have no reason to believe that, for any given query, if a homepage is in position one, a homepage is less likely to also be in position two. Thus, in the context of what we are measuring, independence holds.
You could certainly make the case that something like domain name is not independent of ranking. Because of QDD, we have reason to believe that if Wikipedia.org ranks first for a query, it is less likely to rank second. I think the key is that independence applies only to the variables that we are measuring and comparing. Even though https://www.example.com/ cannot take both the first and second positions for a query, its presence in position two has no effect on most measurements that we take about position one.
Where is Yahoo in all this? I guess they are done.
Nice comparison Rand.
Thanks,
Emil
The premise of the session that Danny Sullivan asked me to prepare this research for was that Bing is going to be powering Yahoo!'s results by the end of the year, thus giving us only 2 algorithmic engines. Hence we showed Bing vs. Google (though in reality, I presume Bing will be incorporating at least parts of Yahoo!'s technology into their own ranking systems, so the new Bing/Yahoo! merger may have quite different numbers once in place).
Is it also possible that Yahoo! will retain some of its algorithmic magic and rerank the Bing results that they provide?
Actually, Yahoo! and Bing have been quite straightforward that Bing's algorithmic results and Yahoo!'s will exactly match once the transition is finalized.
I think it is safe to say that Yahoo honors exact matches even more. This is just my gut speaking... but a lot of other marketers have said the same.
I think the results in this post are better at comparing how .com's fare on Google compared to how they fare on Bing, but they are much less good at comparing how .coms fare compared to how .orgs fare.
(I mean this point not just for .com and .org, but more generally this new data is good for comparing engines together and less good for comparing features to each other)
Ben, Rand mentioned that we could get the actual raw data by emailing you, but I do not see an email in your profile. What is it, please?
Thanks.
It is my first name at seomoz.org.
You can find it on my profile page here:
https://www.seomoz.org/team/ben
Ben, thanks. You (and other mozzers) should consider adding a link to your employee profiles in your user profiles.
Emailing now.
Hi Rand,
I was one of your harsh critics in the Sphinn debate. I just wanted to thank you for clarifying the data in these posts. I do apologize for the harsh words. There are times when you just should not push publish, and it is too late afterwards. I hope you accept the apology.
No apologies necessary Mert! I just hope the data can be valuable to you or your clients in some way and help bring credibility to back up the recommendations you make to them. :-)
Holy pie charts Batman!
Thanks for sharing all of this data- it's always interesting to see further explanations when someone challenges you on your data.
Rand
Thanks for coming out with this additional information from the original research. I think this does a better job at comparing data in a cleaner way than the previous article.
At the same time however, the statement "The ranking correlations suggested that the H1 tag isn't much of a differentiator, yet lots of people still swear by them:" is off the mark.
If the majority of the results you studied are sites that were optimized by people who don't happen to use the H1, that in and of itself is not a legitimate basis to claim that the H1 isn't a differentiator. Sure, your data might IMPLY this, but why even include such a statement here?
You're not comparing sites that use it to sites that don't given all other factors being equal. Since you're not making that comparison, such a claim only pollutes the information provided, and only confuses the matter given all your initial disclaimers.
Just my opinion...
Alan - well, I think again we might disagree, though perhaps you're just parsing language differently than I would.
When I say the previous correlation data suggests that H1s are not particularly beneficial, I believe that's a pretty accurate statement (I'm not saying it means that or equals that or definitely is that, just "suggests"). We also have the ranking models data from the machine learning work that suggests the same thing, but from a causative model, not just a correlation one (in that process, we controlled for other variables when analyzing).
Thus, when I say that correlation data suggests it's not a great differentiator but that lots of sites/pages are still employing it, I think that's an accurate statement. Again, maybe we just interpret the meaning of "suggests" differently. I see that and the word you chose - "implies" - as synonyms.
Rand,
The more I think about it, the more I have to agree with the issue of parsing words / interpreting. I tend to become a stickler for interpretation of word usage being based on my own internal filters rather than through a detached observer view and often forget I do so...
Either way, the data overall is fascinating and something that most of us don't look at on our own.
So, in laymen's terms, what are the biggest takeaways from this data? I think that I'm able to make sense of it all, but I wanted to make sure I was on the same page as everyone else.
EDIT: Let me rephrase... How will this data affect your day to day SEO strategy? Furthermore, is there any aspect of this data that makes you say, "Holy Smokes... I had no idea!"?
These are my opinions, but the takeaways I get are:
As compared to the prior data on ranking correlations, this is, as I mentioned, less interesting to me as an SEO, but perhaps equally interesting as an observer of the search space and someone who's curious about whether, how and how much Google & Bing are different.
I'm digging the research posts! Does anyone think that non-.com domains (.net or .org) or hy-phen-at-ed sites receive more scrutiny with B and G? Starting with the assumption that I'm reading the data correctly (which may not be wise), it appears that those three types are much more targeted to their keywords than exact match .com's. This could mean that those are niche sites, or that they are more likely to be manipulative. With .org following the same trend, I personally think it's more a matter of supply and demand and not likely manipulation. Fewer people use .org and .net, so a higher percentage of the chosen .org and .net domains will likely be more targeted toward specific keywords. In regard to hyphenated domains, if you could get the unhyphenated domain you likely would... unless you sell hyphens (www.we-sell-hyphens.com). Thoughts?
EDIT: By "digging" I didn't mean I was submitting the post to DIGG :-D
Amazing research. Thank you.
Additional points of interest would be long tail verses head, geo and high competition keywords verses low.
Agree... I started to pay more attention to Bing when I noticed that the conversion rate from Bing was higher, in percentage terms, than from Google (and Yahoo!).
Maybe it's just a case of small numbers, but if someone could provide data about ROI correlation between G and B, I think it would be useful.
Ah... and now go read my YOUmoz post ;)
I was just speaking with a client and explained that they get a 9% higher conversion rate from Bing and an 11% lower bounce rate. However, 80% of their traffic is from Google. Their site gets more traffic from Wikipedia than Bing :-(
EDIT: They also rank better in Bing than Google.
Wow... mind-expanding stuff, though I think it needs several re-reads to get the real "meat" of the article.
I wonder how all this will look in a year or so as the Google/Bing match develops...
This may be stupid of me, but I haven't invested any time in Bing or Yahoo because Google seems to have most of the traffic anyway, at least for now. All the SEO info out there is targeted to how Google ranks, too. Anyone else think the same?
Great post. Thanks for providing all that data.
Do you think it would make sense to break that data down by the number of words in the queries? I would think that there might be a difference in correlations for single-word and two-word queries.
Best,
Markus
Remember p < .05.

Okay, can you show work for H0: m1 = m2 and Ha: m1 ≠ m2 for your hypothesis?

Don't forget to go through all the steps. It is one thing to just use your B.S.; I would like to either reject or fail to reject your evidence.
Rand - This was such a great presentation. Excellent timing for this research with the merger of Yahoo! and Bing SERPs coming soon.
Nice research results layout Rand. To echo Dr. Pete above, when the Yahoo/Bing single engine search results coalesce then there will be a 30% contender to Google's 70%.
What that means to me is that I will now be pushing my clients harder to optimize for both (Google and Bing) engines. And thanks to your excellent work above, I'll have something concrete (not to mention snazzy with all the pie charts) to point my clients to.
Thanks.
Thanks for the summary. To be honest with you, the other post was a bit hard to follow; visuals are better, especially for people like me who sneak blog reading while they're at work.
I find the correlation research to be very helpful. Though most clients are interested in ranking within Google, there has been an increase in interest in Bing, and I do expect this interest will continue to grow. Though I have not made a huge effort to optimize for Bing it is good to know the effect that optimization for Google will have in Bing and how to estimate performance within this search engine.
As for the H1 relevancy, I am finding that H1 tags seem to have a higher impact when searching for long tail words. When I am doing comparative ranking research, I find that pages are ranking for words that are only visible in the H1 tag, and not the title, URL, alt, etc. When I optimize pages I always organize keywords into primary and secondary terms, expecting that they will perform differently. It is my secondary terms, placed in H1 tags, that are affecting page rankings, though this may be because they are less competitive rather than, as you have indicated, because H1 tags have higher relevancy. I am continuing to monitor this and am looking to see if there is a correlation with keyword difficulty.
I optimize for Google, and rank better in Bing.
Thank you for that post Rand.
I suppose you got the data from google.com and bing.com. I would love to hear that you'll run some similar tests for the European market, though I understand that, with all those different countries, it would be much more effort...
Petra
Algo - I think looking across some different countries would be a great study, actually! I'd be most interested to see how/if the ranking elements differ from region to region, or whether the biases are purely based on region-specific indicators like language, address, TLD extension and hosting IP. Great suggestion!
again provde - > again proved
Great follow up post...
I really enjoyed the first one but found myself referencing the beginning of the article throughout to remind myself how to interpret the data accurately.
This time around...it felt a little more direct and easy to understand. I love hardcore analytics as much as the next guy (sigh) but I am one of those people more interested in answering the question "what features predict a result will rank higher vs lower on page 1"
Thanks Rand for this great comparative analysis. I was looking for some comparative analysis to check the influence of subdomains on rankings in Bing and Google. In my case, one of my subdomains ranks on the front page of Google and Yahoo for some very competitive keywords, but when it comes to Bing, it is not in the top 100. So does that mean Bing prefers root domains and gives little weight to subdomains?
Also, what is your stand on single page websites? Some of my single page website rank on front page of Google on competitive keywords, but they are nowhere in Bing.
Great post Rand!
Very interesting information here; I really enjoy looking at statistics (I come from engineering). Apart from the Bing vs. Google analysis (which shows me that I have to pay more attention to Bing from now on), I found very useful things about how web developers use on-page SEO elements. A good point, as you said, is the use of keywords in titles and URLs.
A post to take in consideration for the next months at work :)
Wow, so many charts!
which Goole - > which Google
I still see from webmaster tools and such that Bing is much slower at indexing and ranking the sites I am working with.
The positions in Google are not that bad, so it is a bit concerning, especially after this study, to see that Bing should actually be more favorable to the sites I work with based on the data above. :-(
Fascinating research - I'm still trying to wrap my head around it all, but it's helpful to note the similarities between Google and Bing these days.
(PS typo: couts -> counts - who needs spell check when you have picky, detail-conscious readers like us :) ? )