Wow. Today has been interesting: I woke up to the news that Bing has been copying Google's search results, and I ended the day watching a live webcast debate between Google, Bing and Blekko over on BigThink.

This post wraps up some of my thoughts and insights from the news and the discussion, because I think the search engines dropped lots of very interesting tidbits and hints. For a more complete blow-by-blow account, check out the live-blogged coverage on Search Engine Land.


Although there was talk of other things in the discussion, the two main points of interest in my eyes were:

1) Bing "Cheating" by Copying Google's Results

Danny Sullivan does a phenomenal job of explaining the issue over on SEL, so I'm not going to rehash the details, but the conversation between Matt Cutts and Harry Shum from Bing got pretty heated. Matt is usually very calm on these kinds of panels, and this was the most heated I've seen him for a while (the last time I saw him this annoyed was at SMX Advanced 2008, when paid links were a hot topic).

Matt clearly pointed the finger at Bing and accused them of copying results. Harry's answer was a little evasive but essentially boiled down to "we both do it". While both Google and Bing clearly use user data to influence rankings, Matt did say (I'm paraphrasing here, but the sentiment is correct) "we categorically deny that Google uses clicks on Bing's website to influence Google results".

The discussion then descended into a debate about HOW Google and Bing get their data, the most obvious sources being the Bing toolbar and the Google toolbar. This part turned into a bit of finger-pointing: Matt accused Bing of sneaking its toolbar onto users' PCs via IE, while Bing essentially responded that no one reads the T&Cs anyway, so what does it matter (a pretty weak argument!). I could write a whole post about this, but let's stay on topic, shall we?

The conversation boiled down to this: yes, Bing uses data about user behaviour on Google as a ranking signal, but (according to Bing) the keywords Google used in its test were outliers, and Bing does not simply copy results. An official blog post from Bing reiterates this position.

So where does this leave us? What excites me most is that people are starting to talk about how user data might affect rankings. I've long suspected that it does, but there has been real division within the industry. Rand even did a Whiteboard Friday a year ago essentially saying that user data isn't much of a signal. One of his arguments is that usage signals are easily gamed, but it's clear that Google are watching these things closely.

Personally, I really hope this sparks more discussion, and more transparency from the search engines, about how usage data influences rankings.

TL;DR

  1. Both Google and Bing use user data as a ranking signal
  2. Bing uses data about user behaviour on Google; Google says it doesn't use data from Bing
  3. Google (or maybe just Matt?) are pissed off about it.

2) Is Demand Media Spam?

The second question boiled down to, "should Google ban Demand Media from the index?". I'll paraphrase the responses here:

  • Google - no, we assess quality at the page level and through algorithmic updates
  • Bing - the only reason this spam exists is AdSense (weak argument, Bing!)
  • Blekko - yes, and we already have.

Wait, what? Blekko really came into their own on this question - revealing lots of very interesting information. Blekko said that they have banned many content farms from their index as a result of enough people marking URLs from their domains as spam. TechCrunch broke the news this morning. The top 20 sites banned are:

ehow.com, experts-exchange.com, naymz.com, activehotels.com, robtex.com, encyclopedia.com, fixya.com, chacha.com, 123people.com, download3k.com, petitionspot.com, thefreedictionary.com, networkedblogs.com, buzzillions.com, shopwiki.com, wowxos.com, answerbag.com, allexperts.com, freewebs.com, and copygator.com.

Rich Skrenta from Blekko went on to point out that, by analysing massive data sets (I missed where exactly the data came from; does anyone know?), Blekko can see that the total number of URLs getting visits from search engines is in the region of half a million, compared to the hundreds of billions of URLs that search engines know about. Rich used this data to argue that someone searching for health-related content should land on one of the top 50 health sites, where the content is written by medical professionals. Later in the discussion, Rich talked about how Blekko wants to bring a Wikipedia-style level of control to web search by letting anyone create a slashtag of niche sites (one example he gave was "gluten free").

Matt countered this by pointing out that if you search for decormyeyes (the glasses merchant that got a lot of press a few weeks ago) on Blekko, you don't get the merchant's website, which suggests a poor user experience.

I think that specific example is kind of moot: we're talking about a niche query for a banned domain. More interesting, in my eyes, is the question of what you do with Demand Media. I don't have the answer (otherwise I'd be rich!). I think Blekko's approach is interesting but will ultimately fall short, because I don't believe restricting queries to a certain subset of sites is the right approach: people want to be able to find forum posts, blog posts and so on, even on authoritative topics. Remember that user intent can vary wildly between two users, even for the same search query. I think Google's approach of providing a sample of results for different intents (QDD, "query deserves diversity") is the better one here.

In essence, however, there was nothing new from Google on the topic of content farms and Demand Media. The only news is that Google are developing a Chrome extension that lets you block certain sites from your personal search results (and shares that data with Google). It should be released soon; apparently Matt has a working copy on his laptop.

Bing, on the other hand, dropped in a fascinating comment. While discussing how you might algorithmically determine an author's level of experience, they suggested that the authority of a piece of content might be tied to its author, independently of the site. I don't think this is necessarily that new; after all, the concept of citations in Google Scholar has been around for ages. But it got me thinking: especially with social data playing more of a role, I wonder whether we'll see personal brand authority being passed (somehow?!) to the content people write. For example, if you all retweet this post, perhaps the next blog post I write for Distilled will have slightly more authority than it would have otherwise. Could this be how we solve the problem of trusted sites rolling out millions of pages of low-quality content?

Incidentally, this makes the humans.txt protocol a little more interesting...

TL;DR

  1. Blekko deals with Demand Media by banning it (not scalable?)
  2. Google have developed a Chrome extension that lets you block sites from your own search results
  3. Bing blamed Google for causing spam with AdSense (weak sauce argument...)
  4. Bing hinted that perhaps author authority is a factor independently of domain authority

Wrapping Up

Well, it's been a rollercoaster of a day. Personally, I don't think this news is that revolutionary (Matt McGee has written a good article about why it's not as big a deal as we've been making out), but I do think we'll see a lot more public discussion of user data, how it's collected and how it influences rankings, which is a good thing in my eyes.

In closing, I'd like to give Danny a massive pat on the back: I think the level of journalism in the original article was world class. Keep up the good work, Danny.