Wow. Today has been interesting - I woke up to the news that Bing copies Google search results and I've ended my day watching a live cast debate between Google, Bing and Blekko over on BigThink.
This post wraps up some of my thoughts and insights from the news and the discussion because I think there were lots of very interesting tidbits and hints from the search engines. For a more complete blow-by-blow account check out the live blogging coverage from SearchEngineLand.
Although there was talk of other things in the discussion, the two main points of interest in my eyes were:
1) Bing "Cheating" by Copying Google's Results
Danny does a phenomenal job of explaining the issue over on SEL so I'm not going to re-hash the details, but the conversation got pretty heated between Matt Cutts and Harry Shum from Bing. Matt is typically very calm on these kinds of panels and this was the most heated I've seen him for a while (the last time I saw him mad like this was at SMX Advanced 2008 when paid links were a hot topic).
Matt clearly pointed the finger at Bing and accused them of copying results. Harry's answer was a little elusive but essentially boiled down to "we both do it". While clearly both Google and Bing are using user data to influence rankings, Matt did say (and I'm paraphrasing here but the sentiment is correct) "we categorically deny that Google uses clicks on Bing's website to influence Google results".
The discussion then descended into a debate about HOW Google and Bing get their data - the most obvious data sources being the Bing toolbar and the Google toolbar. The discussion here became a bit of a finger-pointing exercise, with Matt accusing Bing of sneaking the toolbar onto users' PCs via IE, while Bing responded by essentially saying no one reads T&Cs anyway so what does it matter (a pretty weak argument!). I could write a whole post about this but let's stay on topic shall we?
The conversation boiled down to the fact that yes, Bing uses user data on Google as a ranking signal - but that these keywords were outliers and that Bing does not just copy results. An official blog post from Bing reiterates this position.
So where does this leave us? The thing that most excites me here is that people are starting to talk about how user data might affect rankings. This is something I've long suspected influences rankings but there's been real division within the industry. Rand even did a Whiteboard Friday a year ago essentially saying user data isn't much of a signal. One of Rand's arguments is that usage signals are easily gamed - but it's clear that Google are watching these things closely.
Personally, I really hope this starts more of a discussion and more transparency from the search engines about how usage data influences rankings.
TL;DR:
- Both Google and Bing use user data as a ranking signal
- Bing uses data about Google, Google doesn't use data from Bing
- Google (or maybe just Matt?) are pissed off about it.
2) Is Demand Media Spam?
The second question boiled down to, "should Google ban Demand Media from the index?". I'll paraphrase the responses here:
- Google - no, we look at page level and algorithmic updates to determine quality
- Bing - the only reason there is this spam is because of AdSense (weak argument Bing!)
- Blekko - yes, we have.
Wait, what? Blekko really came into their own on this question - revealing lots of very interesting information. Blekko said that they have banned many content farms from their index as a result of enough people marking URLs from their domains as spam. TechCrunch broke the news this morning. The top 20 sites banned are:
ehow.com, experts-exchange.com, naymz.com, activehotels.com, robtex.com, encyclopedia.com, fixya.com, chacha.com, 123people.com, download3k.com, petitionspot.com, thefreedictionary.com, networkedblogs.com, buzzillions.com, shopwiki.com, wowxos.com, answerbag.com, allexperts.com, freewebs.com, and copygator.com.
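A minimal sketch of how a ban driven by user spam reports might work, assuming a simple per-domain count of flagged URLs and an arbitrary threshold. This is not Blekko's actual implementation; the function names and the threshold are hypothetical and only illustrate the idea of "enough people marking URLs from a domain as spam":

```python
from collections import Counter
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL, with a leading 'www.' removed."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def banned_domains(spam_marked_urls, threshold):
    """Hypothetical rule: ban a domain once `threshold` or more of its URLs
    have been marked as spam by users."""
    counts = Counter(domain_of(u) for u in spam_marked_urls)
    return {domain for domain, n in counts.items() if n >= threshold}


def filter_results(result_urls, banned):
    """Drop every result whose domain is on the banned list."""
    return [u for u in result_urls if domain_of(u) not in banned]
```

With a threshold of 2, two spam marks against ehow.com URLs would ban the whole domain, and every ehow.com URL would then be dropped from any result set passed through `filter_results`. A real system would presumably weigh who is doing the marking and how often, which this sketch ignores.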
Rich from Blekko went on to make the point that, analysing massive data sets (I missed where from exactly? anyone know?), we can see that the total number of URLs getting visits from search engines is in the region of half a million, compared to the 100s of billions of URLs that search engines know about. Rich used this data to say that if someone's searching for health related content they should land on one of the top 50 health sites where the content is written by medical professionals. Later on in the discussion, Rich talked about how Blekko wants to bring a Wikipedia-style level of control to web search by letting anyone create a slashtag of niche sites (an example he gave included "gluten free").
Matt countered this by saying that if you search for decormyeyes (the glasses merchant that got a lot of press a few weeks ago) on blekko you don't get the website, suggesting that this is a negative user experience. Compare the following:
- https://blekko.com/ws/decormyeyes (decormyeyes.com nowhere to be found)
- https://www.google.com/search?q=decormyeyes (decormyeyes.com #1 - but likely banned from other queries)
- https://www.bing.com/search?q=decormyeyes (decormyeyes.com #1 - but likely banned from other queries)
I think that specific example is kind of moot. We're talking about a niche query for a banned domain. More interesting in my eyes is the question of what you do with Demand Media. I don't have the answer (otherwise I'd be rich!). I think Blekko's approach is interesting but ultimately will fall short since I don't believe that restricting queries to a certain subset of sites is the right approach - people want to be able to find forum postings, blog posts etc even about authoritative topics. Remember that user intent can vary wildly between two users, even for the same search query. I think Google's approach here is to provide a sampling of results for different intents (QDD - query deserves diversity).
In essence however there was nothing new from Google on the topic of content farms and Demand Media. The only news is that Google are developing a Chrome extension to allow you to block certain sites from your personal search results (and share that data with Google). This should be released soon; Matt has a working copy on his laptop apparently.
Bing, on the other hand, dropped in a fascinating comment. While talking about how you might go about determining algorithmically the level of experience of the author there was a suggestion that the authority of a piece of content might be tied to the author independently of the site. I don't think this is necessarily that new, after all the concept of citations in Google Scholar has been around for ages, but it got me thinking that especially with social data playing more of a role I wonder if we'll see personal brand authority being passed (somehow?!) to the piece of content they write. So for example if you all retweet this post, next time I write a blog post for Distilled perhaps that page will have slightly more authority than it would have otherwise. Could this be how we solve the problem of trusted sites rolling out millions of pages of low quality content?
Out of interest - this makes the humans.txt protocol a little more interesting....
TL;DR
- Blekko deals with Demand Media by banning them (not scalable?)
- Google have developed a Chrome add-on that allows you to block sites from your own search results
- Bing blamed Google for causing spam with AdSense (weaksauce argument.....)
- Bing hinted that perhaps author authority is a factor independently of domain authority
Wrapping Up
Well it's been a rollercoaster day. Personally I don't think this news is that revolutionary (good article by Matt McGee here about how it's not as big as we've been making out) but I do think we'll see a lot more public discussion of user data, how it's collected and how it influences rankings which is a good thing in my eyes.
In closing - I'd like to give Danny a massive pat on the back, I think the level of journalism in the original article was world class. Keep up the good work Danny.
Tom - first off, let me just say thanks for writing up a post so quickly and covering so much of the material. I hope lots of mozzers go thumb this up; it's more than deserved (especially since it's after hours over in London).
Second - some thoughts I had about Google accusing Bing of using search data on Google:
All in all, an exciting day in search and a great post. Thumbs to Tom!
After all the privacy issues with Buzz and Street View (especially in Europe), I can't help but find it somewhat ironic that Google is complaining about THEIR data being used by others :)
Ah! You have just made me laugh out loud (I so agree!)
Ok that is pretty darn funny. :)
I agree. Cheating is a pretty ridiculous slander to toss. Really, Google tracks people with a toolbar, tracks on-page behavior, and tracks sites with Google Analytics. For any business it pays to know where your competitors are and what they are doing. Bing isn't spoofing Google; they are pulling data from a website, in this case Google. And that is a street that goes both ways: Google scrapes data from a large number of sites.
Google is making a complaint that is analogous to Walmart complaining because K-mart carries the same off-brand mayonnaise. All sides get to see all of the public facing content of the competitors and use it to make decisions. On that note: Go Blekko for taking a stand on spam directories. Since Google isn't taking real action, instead using their PR team to wag the dog, I think we should put some of the attention back on Google's failure to fulfill on the promise to clean up their search results.
It was very interesting to watch. I felt slightly uncomfortable in parts due to it being so aggressive!
On the Demand Media (eHow.com) spam content farms:
I hate to bang on about this but I hate spam. It is horrible. The quicker Google address it, the better.
And Tom...good work!
I'm not sure I get the whole eHow furore. They're generating unique content on a wide range of topics to try and drive visitors and then monetize those visitors via AdSense. Yes, the quality of the content isn't always brilliant, but I don't really consider it spam.
If they're generating natural links into the domain (which presumably they are to rank well) then some people must be finding the content useful and Google's algorithm is rewarding them accordingly.
Unless I've missed an important part of the puzzle?
eHow are the epitome of all that is bad about content farms. They have their own algorithms that detect popular search queries, both short and long term, and then pay extremely low sums for freelance copywriters (mainly students and other non-qualified workers) to produce low quality and often incorrect 300 word articles. The sole aim of which is to generate income via advertising. Not good for either users or search engines.
See https://en.wikipedia.org/wiki/Content_farm
I think I have probably spoken far too much about how much I detest eHow recently. I probably should stop soon…!
Before I was aware of SEO and content farming, I was one of these students writing for ridiculously low sums (something around 5 euro per 300 word post), only I made the mistake of putting a small amount of effort into the content. They are certainly not upfront about what the content is being used for and what it is driving.
Have you actually read an eHow article? It's not an 800-word essay on the topic - it's five points with barely a sentence in each point. Even a ten year old could come up with better quality articles than eHow.
It's a bit rich Google complaining about someone using their data. I think Yelp and TripAdvisor would find that rather funny (https://searchengineland.com/review-sites-rancor-rises-with-prominence-of-google-place-pages-62980). Also, a bit rich Google complaining about Bing being sneaky in installing their toolbar in IE. Has Google always been clear about the data it collects through the toolbar? Is it clear about what it collects through Chrome? Is it clear what it collects through Google Analytics? Webmaster tools? AdSense installs?
Sorry but Matt Cutts getting self-righteous on this front drives me nuts.
I guess if Demand Media used MSN ads instead of AdSense, then it wouldn't be considered spam, would it?
The Blekko content farm blanket bans sound nice in theory, but don't work well in practice. Try out some simple "how to" type queries that long-tail content sites actually deliver on, as I have here.
There are many content farms Blekko failed to ban, but adding them in would only cause Blekko to serve up an even larger number of "no results found" pages.
I'm having a hard time understanding why Bing blaming Google for causing spam with adsense is a "weaksauce argument". If adsense didn't exist, neither would Demand Media. It's exactly why they haven't been banned outright. Google is sending them tons of organic traffic, which just so happens to make Google A LOT of money.
Aaron Wall wrote a great article on this: https://www.seobook.com/paid-content-new-paid-link-0
If AdSense didn't exist, then there would be a bunch of affiliate links.
It's not about AdSense, it's about the fact Google gives a big importance to domain authority so it's better to build 1 domain with all kinds of topics on it than put each topic on different domain.
Thanks Tom for the roundup and incisive thoughts - these live events often happen at the most unfriendly euro timezone hours ;)
For me the whole shift in emphasis towards the author authority element seems one of the more interesting that I hope will outlast the spat and dust between the search giants over cheatgate and may have resonance beyond SEOland.
Efforts like humanstxt.org bring attention to it - although I'm not sure it could evolve to an actual protocol - and it seems like the author authority is an area other services are actively considering. Look not just at PeerIndex and Klout for social scoring, they have just further consolidated their API into a new version of Seesmic. I'm sure Twitter will do some variant themselves of authority this year and further mainstreamise the entire influence economy. Even nascent Quora has rapidly evolved to openly debate the merits of some sort of People Rank.
How much weight author authority is assigned by search in the future is largely down to Google and Facebook, which is why the success of the former's anticipated social effort is so keenly awaited and why the latter's direction is so pivotal.
It's not surprising that Bing uses Google data at all. Pure and simple, from an information retrieval perspective, for a ranking algorithm to return truly relevant results, it has to have an incredible amount of data. The more data you have, the more relevant and efficient your ranking algorithm will be.
This is the perennial struggle that academia has in the world of search engines. No one has as much data as Google, and by deduction, no one can have a ranking algorithm as intuitive and complex as Google (for its intended application).
For Bing to be able to compete at purely the ranking algorithm level, it has to be able to get access to more data. Obviously Bing could overtake the market with a change in user behaviour, like the social shift from Myspace to Facebook.
Bing and Google essentially make sure that they don't have a very diverse search experience (in terms of search results). It's an unsaid fact that they both use the same core ranking algorithms and do small tweaks to them.
On Demand Media and eHow, I think that content websites will eventually die down if they progress the way eHow is progressing at the moment.
eHow needs to be informative and should have very strict sanity checks to keep the content clean and spam free.
Can't blame them for using this data. Probably saved them a hell of a lot of time. With G's aggressive, less than open tactics of collecting their own information (by almost any means) I'm sure no one has any real sympathy here.
Author Authority is a very interesting concept.
I'm curious to see if this develops...
I was rather looking forward to a write up by Rand on this subject the day I read the SEL article.
I'm very impressed with your write up though. Thank you for taking the time to cover this big event in search marketing.
Take away: Cheap 101 content produced in India for $1-$5 an article will no longer have value... if they acted on the discussion.
I would really enjoy this. It is really sad how, for small clients, the most efficient way to get links from article directories is to deliver absolutely terrible cheap content. We have done comparison tests of the link value passed from painstakingly creating great content and submitting to ezines vs. 101 drivel, and both worked just as well. The only difference was the time and cost (about 10 to 1 at minimum). Of course that changes when you refer to a publishing platform like SEOmoz. The takeaway here is that if the sole purpose of the publishing is to obtain link value, the resource devoted to the article need only be strong enough to pass minimum editorial standards.
This does not address issues of tarnishing your brand by distributing terrible content. The focus here is on obtaining link value. I know someone will mention that great content will obtain more links, but only in the right circumstances and on the right platform.
Thank you for the summary.
I will keep an eye on Blekko!
"Bing copies Google Search Results"
Hahaha! This makes me think of what I heard from Dan Kennedy one time about Burger King's strategy for success. They ride along on all the research & due diligence work McDonald's puts in to find PRIMO locations and then build a store 400-500 feet away. Seems like Bing's been studying the King!
Maybe Blekko's way is extremely different but I guess it's good that they decided on such a move and only time will tell who chose the best option, if there is any "best option".
Anybody got any idea how Bing is accessing Google data? Are they simulating a user by feeding queries into some bot they have written and scraping the results, or are they accessing the data some other way?
I really appreciate the thorough recap and your comments, Tom. There is a lot here... and a lot that strikes me as things we'll be hearing more about in the future. I find the bit about using user reputation as a ranking signal to be really interesting, and I'm eager to see what (if anything) happens in the content farm space in the coming weeks and months. Google's timing of accusing Bing of copying them came off as petty and sophomoric. If they have a legitimate claim, that's fine - but this was an insultingly obvious attention grab in my opinion.
Thanks Tom, really appreciate the wrap up, nicely done.
I read the TechCrunch article about Blekko's move, and what I really wasn't able to figure out was, why did they ban buzzillions? My knowledge about that site is very shallow, but until now I thought that buzzillions is a great review site (even though I never really used it, I just tried to pull some ideas from it). So if somebody is into it, please enlighten me.
Great summary of the whole "Bing cheats" thing. I was too busy to read the SEL post, although I'm sure Danny did a fantastic job covering it.
This is really a good post. Google is complaining about their data being copied. Very much appreciated post. Lots of information.
Thank you, Tom, for the great summary.
Will we be getting an option to add Blekko to our keyword rankings? :)
Excellent recap!
I have ambivalent thoughts about Blekko banning Demand Media from its index. It's probably an advantage now, but in the long run more of a negative.
For a search engine to ignore user data would be madness. We might not like the invasion of privacy but from their perspective it is a goldmine. Seeing how users behave on a site is one of the best quality/relevance signals possible.
There is a good reason Google has released Chrome and Analytics!
I’m not surprised that Bing is looking closely at the results created by Google. Copying the best elements of your competitors is nothing new!
Wow... really dense post this one, Tom. I will need more time to take in all the info and digest it.
But one thing - a sort of sensation - I felt when seeing Rich silent during Matt's J'Accuse against Bing and Bing's self-defence: two dinosaurs fighting while the small mammal is almost hiding... let's see if the 2012 prophecy was about Search Engines :)
More later about Author factor, as at first it can have also negative consequences imho.
Based on most of the recent discussion, Demand Media seems to be the only name that is being bandied about. However, in an example of how broadly distributed content farms have become, I just stumbled across a small scale ring of interconnected blogs that are ranking well on Google and Bing for a product category that is only moderately competitive. In conducting a reverse IP lookup, I discovered that "There are 1,394 domains hosted on this IP address." Gee, do you think that should be a signal that there might be something going on that merits further investigation? Suspicious minds might wonder if Google is really terribly motivated to take down these content farms that were built to generate AdSense revenue.
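For anyone who wants to try the "many domains on one IP" check on their own list of sites, here is a rough sketch. It assumes you already have a domain-to-IP mapping from some lookup tool; the function names and the cut-off are hypothetical, and the addresses in the usage note are reserved documentation IPs rather than real hosts:

```python
from collections import defaultdict


def domains_per_ip(domain_to_ip):
    """Group domains by the IP address they resolve to."""
    groups = defaultdict(set)
    for domain, ip in domain_to_ip.items():
        groups[ip].add(domain)
    return groups


def crowded_ips(domain_to_ip, min_domains):
    """Return {ip: domain_count} for IPs hosting at least `min_domains`
    domains. The cut-off is an arbitrary, illustrative choice."""
    return {
        ip: len(domains)
        for ip, domains in domains_per_ip(domain_to_ip).items()
        if len(domains) >= min_domains
    }
```

For example, `crowded_ips({"a.example": "192.0.2.1", "b.example": "192.0.2.1", "c.example": "192.0.2.2"}, 2)` flags only `192.0.2.1`. Bear in mind that shared hosting legitimately puts lots of unrelated sites on one address, so a high count is at most a hint that something merits a closer look.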