My previous posts and articles have tended to be too technical, so going forward I will simply provide the pieces that are relevant to the discussion and point to sources for learning more.
After carefully reflecting on the article claiming to prove that Google is using behavioral data in its search rankings, I did some digging and came up with some slightly different conclusions.
First, let me state that I do think they use all the data they collect (or will collect) from search query logs, Google Analytics, Google AdSense, the Google Toolbar, browser extensions, DoubleClick, FeedBurner, etc. to improve both their ranking algorithms and their ad-targeting technology. That, in my opinion, is the reason they offer all of these tools for free: the data they collect is far more valuable. It is so valuable that Ask is even considering selling its own data.
(Let me give you a little background: a few days earlier, Visio posted his reaction to something he read on Google's Official Blog, which he argued proves that they use behavioral data in rankings.)
WebGeek correctly quotes Google's Official Blog:
"Similarly, with logs, we can improve our search results: if we know that people are clicking on the #1 result we’re doing something right, and if they’re hitting next page or reformulating their query, we’re doing something wrong. The ability of a search company to continue to improve its services is essential, and represents a normal and expected use of such data."
I am really glad to start seeing efforts such as Visio's, and I hope there will be a lot more to come. We should all encourage other SEOs to perform experiments, do research, and publish their findings for peers to review. This is exactly what we can learn from the scientific community; peer review is why it has the credibility it does.
Although I haven't been an active participant in the SEO community, I've been reading several blogs for years and I feel great respect for all of the experts. One thing I've always wanted to see is more facts (research papers, patents, etc.) and experiments to back up the claims. Search engines are black boxes, so giving advice based on opinion is inevitable and necessary. However, one of the most difficult things for people trying to learn SEO is all of the contradictory information they find online about the same topic. One suggestion I would like to propose is the creation of a site where we try to put together all of the SEO insights, but backed with sources (papers, patents, experiments, etc.). We can use an open source license for the content. The idea is to use this as a reference where we can link to and prove our points.
Now, let me explain my different conclusion about Visio's findings.
I think Google and other search engines use behavioral data for relevance feedback. What is relevance feedback? From Wikipedia:
Relevance feedback is a feature of some information retrieval systems. The idea behind relevance feedback is to take the results that are initially returned from a given query and to use information about whether or not those results are relevant to perform a new query. We can usefully distinguish between three types of feedback: explicit feedback, implicit feedback, and blind or "pseudo" feedback.
Relevance feedback simply means considering the input of actual searchers to determine whether the ranking formulas are producing the best results. There are different ways to collect this information, and depending on how it is collected, the feedback can be explicit, implicit, or blind. These lecture notes provide a nice explanation of the process; please read them for a more detailed look at the topic. It is very interesting.
Please note that relevance feedback is used to tweak the parameters of the ranking formulas, not as an additional factor in the equations.
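To make the mechanics concrete, here is a minimal sketch of the classic Rocchio formula, the textbook example of relevance feedback covered in IR lecture notes like the ones above. The weights alpha, beta, and gamma are the usual textbook defaults; none of this is anything Google has disclosed.

```python
import numpy as np

def rocchio(query_vec, relevant_docs, nonrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio relevance feedback: move the query vector toward
    the centroid of documents judged relevant and away from the centroid
    of documents judged non-relevant. Inputs are term-weight vectors."""
    rel_centroid = np.mean(relevant_docs, axis=0)
    nonrel_centroid = np.mean(nonrelevant_docs, axis=0)
    adjusted = alpha * query_vec + beta * rel_centroid - gamma * nonrel_centroid
    return np.maximum(adjusted, 0.0)  # negative term weights are conventionally clipped

# Toy example over a three-term vocabulary:
q = np.array([1.0, 0.0, 0.0])
relevant = np.array([[0.9, 0.8, 0.0]])     # results the searcher judged good
nonrelevant = np.array([[0.0, 0.0, 1.0]])  # results the searcher rejected
print(rocchio(q, relevant, nonrelevant))   # [1.675 0.6 0.]
```

Note how the feedback adjusts the retrieval machinery itself rather than acting as a per-document ranking factor; that distinction is the crux of my argument.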
Does Google use it?
From the Future Work section of the paper describing Google's original search engine:
...However, other features are just starting to be explored such as relevance feedback and clustering...
Google's well-known use of quality raters to improve its search results is a clear confirmation that Google already uses relevance feedback in its systems. This type of relevance feedback is explicit feedback: trusted searchers are presented with different sets of search results for the same query, and they select the ones that, based on their judgment, include the most relevant results. It usually takes several iterations to get the results right.
Now, to the really interesting part. Implicit feedback attempts to infer searcher intent by observing user behavior, and it is carefully documented in the lecture notes. This is an excellent use for the intelligence data from Google Analytics (and other Google properties).
Think about this. They have so much information on us and on our sites that they can get pretty close to what we are thinking.
Bounce rates, repeat visits, retention, etc. are the best indication of whether a search result was good or not.
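To illustrate what aggregating implicit feedback might look like, here is a toy sketch. Every field name and weight is invented for illustration and has nothing to do with Google's actual schema or systems.

```python
from dataclasses import dataclass

@dataclass
class Visit:
    """One search click-through, as a web analytics tool might log it."""
    url: str
    dwell_seconds: float    # time spent on the site after the click
    pages_viewed: int
    returning_visitor: bool

def implicit_satisfaction(visits, min_dwell=30.0):
    """Aggregate a crude per-URL satisfaction score: a bounce (short,
    single-page visit) counts against a result, while repeat visitors
    count for it. The weights are arbitrary; only the idea matters."""
    totals = {}
    for v in visits:
        bounced = v.pages_viewed <= 1 and v.dwell_seconds < min_dwell
        signal = -1.0 if bounced else 1.0
        if v.returning_visitor:
            signal += 0.5
        score, count = totals.get(v.url, (0.0, 0))
        totals[v.url] = (score + signal, count + 1)
    return {url: score / count for url, (score, count) in totals.items()}

visits = [
    Visit("example.com/a", 5.0, 1, False),   # quick bounce: bad sign
    Visit("example.com/b", 240.0, 6, True),  # engaged repeat visitor: good sign
]
print(implicit_satisfaction(visits))  # {'example.com/a': -1.0, 'example.com/b': 1.5}
```

Aggregated over millions of searches, scores like these would tell the engineers which queries are underserved, without any single site's data deciding its own fate.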
Now, here is how my conclusions differ from Visio's:
1. I don't think the implicit feedback information is being used as a factor in the ranking formula. I think they use aggregate information to tweak the equation variables. When too many queries are not giving the best results, then they may alter the ranking formula.
2. I don't think clicking on results has a direct effect on the ranking formula for the non-personalized Google ranker. This would be very risky for them to do, because it would leave the door open for manipulation.
3. I don't think a few websites' behavior information will have a drastic impact on the rankings. Changes to the ranking formula affect many, many sites.
As The New York Times reported: Mr. Singhal often doesn’t rush to fix everything he hears about, because each change can affect the rankings of many sites. “You can’t just react on the first complaint,” he says. “You let things simmer.”
I've been blogging for close to three weeks and I have to admit that I am really enjoying it; I did not expect it to be so addictive. I want to thank Rand and the SEOmoz team for giving me the great opportunity to share my thoughts via comments and Youmoz posts, and the SEO community for referencing my posts and for the excellent feedback I've been receiving. You guys rock!
Google is ultimately going to become a massive recommendation engine.
But I wonder if they are just making it harder for themselves to be leaders in 'fresh' content as well. When top search results are weighted on hundreds, if not thousands, of indicators, won't it be more difficult for newer content to gain ground?
New content will not have collected as much feedback, implicit or explicit, as older content, though it may be much more relevant to that day's search....
Hey, Bud.
I think in terms of "fresh" content Google uses the "Query Deserves Freshness" model for rankings. I would assume they cross-reference their data (user search data, Google Trends, news) to pick up on what is currently the "hot" topic, and can then tweak the algo to deliver more timely, relevant pieces regardless of the lack of feedback on them. In an instance like this I would assume the relevance of the domain and its historical performance would be a huge factor.
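Something like this toy adjustment captures the idea; every name and number here is invented purely for illustration, not taken from anything Google has published:

```python
import math

def qdf_boost(doc_age_hours, query_hotness, half_life_hours=48.0):
    """Toy 'Query Deserves Freshness'-style multiplier: when a query is
    trending (hotness near 1.0), newer documents get a bigger boost; for
    evergreen queries (hotness near 0.0) the boost disappears."""
    freshness = math.exp(-doc_age_hours / half_life_hours)  # 1.0 when brand new
    return 1.0 + query_hotness * freshness

print(qdf_boost(doc_age_hours=2, query_hotness=0.9))    # ~1.86: hot query, fresh page
print(qdf_boost(doc_age_hours=500, query_hotness=0.9))  # ~1.00: hot query, stale page
```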
While I've read about that indicator, query deserves freshness, I think it still creates a barrier to fresh content. Think of it this way: in order to determine a 'hot' topic, Google will have to rely on trusted domains; popular, yes, but older domains. I immediately start to think of examples of topics in the news: they appear very quietly on blogs, bounce around for days, if not weeks, and then suddenly make headlines on CNN...
The issue, though, isn't whether you'll be able to find fresh content on Google; I'm sure you will, in time. But a search engine, or a platform we don't even regard as a search engine yet (like Delicious or Technorati once were), that uses less historical data to determine rankings may surpass Google in delivering truly hot and unheard-of content.
Hamlet,
I'll second the kudos above! Another interesting and enjoyable post.
Along with your article's actual subject, you've of course hit on one of the biggest challenges of the industry: the breadth and variance of knowledge. An even bigger challenge: while some of the information out there is outright wrong, much of it may be correct, even two conflicting points at once. Something I think we are all discovering more and more is that there isn't just one set of "rules," but rather different applications of rules depending on industry, competition, supply of information, spamminess, etc.
What I especially liked about this post is that it reiterates that everything doesn't revolve around rankings. It is easy to look at every data point the SEs are able to touch as somehow having a plus or minus impact on rankings, but obviously data can be used in many ways for more things than just ranking, especially at the micro level: this site versus all these other sites.
identity,
As usual, thanks for your excellent contributions to the discussion.
You are right that two conflicting points could both be correct under the right circumstances. My point is that making it clear to the newbie where an insight comes from would avoid confusion and help him or her tremendously.
Google confirmed something similar in the NYT article when they talked about QDF (Query Deserves Freshness). If they can use a model to identify topics that are hot and use different criteria to rank those, they can definitely do the same by industry, competition, etc. Remember, they own AdWords and know what makes money (the spammers' target).
It is huge that the search engines may rank websites whose user experience leads to visitors staying on the site. Furthermore, if you use Google Analytics, they could grab the length of stay, the number of pages visited, and a lot more user information that could determine whether a visit was "successful" for the search.
It's more than obvious Google didn't offer all those tools for nothing. They are free to use, but Google is gathering lots of data from everyone, gaining much more than it would have by offering Webmaster Central, Analytics, etc. for a monthly subscription or under any other payment model.
They called it free, and now they have so much user data in their hands that they're surely going to use it in their search system.
I believe they already do.
I believe they use it in highly sophisticated ways. One reason a "clicking the SERPs" experiment can fail is that the number of clicks is not the concern. Instead, WHO clicks through and how long they stay is much more telling. With personalized data Google can probably figure out that I don't know a damn thing about knitting, so me clicking into a knitting site and backbuttoning out might be treated very differently from a person who visits knitting sites a lot, and who maybe has claimed a popular one in Google Webmaster Tools.
Google can limit their processing load by identifying what a person knows about and then using them as an invisible rater for a limited number of themes.
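A toy sketch of that weighting idea (the authority scores, default weight, and function name are entirely hypothetical):

```python
def weighted_click_signal(clicks, user_topic_authority):
    """Weight each user's click feedback by how much they appear to know
    about the query's topic, so an expert's verdict counts more than a
    stranger's. 'clicks' holds (user, topic, satisfied) tuples."""
    total, weight_sum = 0.0, 0.0
    for user, topic, satisfied in clicks:
        w = user_topic_authority.get((user, topic), 0.1)  # strangers count a little
        total += w * (1.0 if satisfied else -1.0)
        weight_sum += w
    return total / weight_sum if weight_sum else 0.0

authority = {("alice", "knitting"): 0.9}  # Alice reads knitting sites constantly
clicks = [("alice", "knitting", True), ("bob", "knitting", False)]
print(weighted_click_signal(clicks, authority))  # 0.8: Alice's vote dominates
```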
See how complex it can be? Overlap the influence of on-page factors, links, and visitor data, and you see how fast the number of signals mounts. I think it has reached a point where trying to figure it out is pointless. Just spend that time building a great site. People who pay no attention to Google have some of the best sites.
EGOL,
What you are saying makes perfect sense. I was checking my reading trends in Google Reader and guess what? I bet Google had those trends for a while, long before they made them available to us. They know far more than we think they do.
Things are getting so complex that I have to agree that algorithm chasing is a losing proposition. It is definitely easier to focus on creating great content and buzz.
The benefit of learning more about search engines, IMHO, is to better prepare ourselves for the challenges ahead. Personalized search seems to be one of the biggest ones.
EGOL: I do agree. The visitor is ultimately king, right alongside content. If your site gets 10 unique visitors a day, they all browse for an hour, and all 10 bookmark the site, that is a clear incentive for Google to lift the site up, because Google will interpret it as 100% trust and valuable information.
EGOL: I just read your comment. Very true. It makes a lot of sense that some users would have more authority. As you say, I believe that if you make a superb site you will do OK, because Google is looking for relevance.
I blogged about Google Analytics a few months back. I agree entirely with the sentiment that Google acquires services like Urchin and FeedBurner for one reason only: more control.
It would be foolish to think that the data collected doesn't already influence rankings in some shape or form.
I conducted a little experiment a while back with two brand new 'long tail' domains that I knew I could rank for easily. I attached Analytics to one and not the other.
Both domains had the keyword in the URL and were optimized and coded in the same way. Each domain name had one letter appended to the keyword in the same position, to make the experiment as fair as possible.
I made no attempt to publicise either site and built no links.
The results were interesting: the site WITHOUT Analytics installed ranked number 1 for the keyword, while the site WITH Analytics ranked in position 8. Could Google be seeing NO traffic and ranking the site down accordingly? Does Analytics harm your chances of ranking a new domain highly?
To take things a step further, I then installed Analytics on the other site and left them both for a week to 'bed in'.
I took the position-8 site and started to artificially increase its traffic via search engine clicks, through proxies (to change my IP address) and so forth.
Within the next 2 weeks the two sites had swapped leadership on Google.
Does this mean that Google saw the traffic data and boosted that site's rankings, while seeing that the previously number-1 site had no traffic and ranking it down accordingly?
While this is by no means conclusive evidence and please don't take it as such, it was a useful experiment. It swayed me towards thinking that Google do indeed use their traffic data in the algo.
In case anyone is interested in digging into the academic foundation of relevance feedback, check out this survey: https://www.dcs.qmul.ac.uk/~mounia/CV/Papers/ker_ruthven_lalmas.pdf
If you want to know more about the long and glorious history of Information Retrieval: https://www.cs.cornell.edu/courses/cs630/2006sp/ (scroll down to 2/28/06)
When I was studying all this stuff in school (thanks, Prof. Lee) it was all pretty straightforward. Which is to say it's really complicated, but with a long history in research. So Google et al. are almost certainly using some of this stuff. Buried in those research papers are empirical studies which suggest as much.
A great article. I've seen many posts on this topic and appreciate the details you have listed here. Nice explanation.
This post gives me a lot of information :)
Thank you very much for this information.
Interesting and well-thought-out post, Hamlet. Glad to see people are taking my research and coming up with their own conclusions.
However, because of the research I did in the experiments (and continue to do; I have been actively testing this for a while now), I know for a fact that the data is used as a factor.
I respect your opinions and theories, but they are all "I think..." situations. If you believe this, please do an experiment to prove it. I would be very happy to join with a few SEOs to do a much larger-scale experiment on this.
I realize some of you don't see the experiment I performed, and the research I continue to do, as proof that this factor exists. That could be because you think the experiment was too small, or maybe I didn't do the best job of explaining it. Again, however, any attempt at larger-scale testing will be fully supported by me.
Also, there is data/information I can't give out which, I can say, does prove the existence of this factor.
One last thing. I took a client's site which had a very, very low bounce rate and which did not use Google Analytics. Upon adding it, there was a large increase in Google traffic. I have also tried the opposite approach, taking sites with high bounce rates which had never used Google Analytics and then adding it; the result every time was a large decrease in Google traffic. So try it out: if you don't use Google Analytics, find out your bounce rate using another analytics tool, and if it is low (preferably under 50, though 30 or lower would see a big change), then add Analytics and tell me if you don't see an increase in traffic.
I have also analyzed sites that have lost traffic, and a good many turned out to have high bounce rates; not all of them, but quite a few. Upon fixing this (that's hard), the rankings slowly returned, and the traffic with them.
Again, I respect your ideas, but please do your own testing based on this. I would love to see what happens.
Visio,
Thanks for your comment.
Please note that I am not questioning the validity of your experiments. In fact I am confirming your findings. The main difference is my interpretation of what happens inside the Google black box.
Input: Page, Links, Usage Data
Output: Improved results
Your conclusion is that the Usage Data is part of the equation. My conclusion is that the Usage Data is used to tweak the equation variables (Relevance Feedback).
The results for both conclusions are the same: Improved Results.
Now, the relevant information retrieval research I provided favors my conclusion. There are obvious reasons for search engines not to use this data directly in their formulas.
If the factor were part of the ranking formula, the effect would be immediate, as is the case for the personalized ranker.
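To restate the two interpretations in code, here is a sketch. The factor names, weights, and update rule are all invented for illustration; the real process is a black box.

```python
from dataclasses import dataclass

@dataclass
class Page:
    content_score: float
    link_score: float

def score_as_ranking_factor(page, weights, usage_signal):
    """Visio's interpretation: usage data enters the formula directly,
    as one more per-page factor."""
    return (weights["content"] * page.content_score
            + weights["links"] * page.link_score
            + weights["usage"] * usage_signal)

def score_with_relevance_feedback(page, weights):
    """My interpretation: the formula sees only page and link factors;
    usage data never appears per page..."""
    return (weights["content"] * page.content_score
            + weights["links"] * page.link_score)

def retune_weights(weights, aggregate_feedback, rate=0.05):
    """...instead, aggregate behavioral data periodically nudges the
    weights themselves, shifting the results for every site at once."""
    return {k: w + rate * aggregate_feedback.get(k, 0.0)
            for k, w in weights.items()}
```

In the first version a site's own usage data moves its own score immediately; in the second, usage data only shifts the weights for everyone, which matches the delayed, many-sites-at-once effects we observe.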
No, I understand what you're saying. In a way the idea seems logical, because from what we have been able to gather (inside scoop ;-P), this data was supposed to be used to help news topics get to the top of Google faster, so Google has the news as soon as it happens.
Maybe Matt could share a little more of his intuition with us and then we could get a better picture :P
Visio,
I am glad we can agree on something.
Thumbs up to you!
Here's the problem... we might be able to do experiments to prove the impact of on-page optimization or links, because everyone knows those elements play a role in the rankings. However, when Google uses this type of "invisible criteria," coming up with a test to "prove" it becomes almost impossible, because the experiment cannot be controlled. The one we discuss here was done in plain view of many people (including Google) who can act independently and with great variation. So we have little to no idea of what is really taking place. It might look like clicking the SERPs does something, but that could be totally wrong; some entirely different factor(s) could be at play.
I think that there is a lot of sniggering going on in the Plex these days. They know that they now have invisible criteria, and they have probably been building and testing them for a while. Read their patents or Bill Slawski's summaries of them. Step by step they are taking away the opportunity for manipulation.
Plus, now you will need the combination of traditional SEO plus user embracement to take and hold top rank for an important term. A million backlinks is not enough anymore.
As always EGOL, you bring a wonderful viewpoint into any discussion, crafting intelligent responses that acknowledge and give respect to all parties on both sides of the fence.
The only thing that could make it better is seeing more of you here ;)
EGOL,
I agree it is almost impossible to do controlled experiments when there are so many variables at play that we don't know about. However, part of this "invisible criteria" is clearly documented in their original research paper (the Future Work section) and in their well-known use of quality raters (explicit relevance feedback).
It would be natural to assume they use the intelligence data they collect from their web properties for implicit relevance feedback.
I agree... so we now have...
... and who knows what other data (AdSense, anyone?) could be blended into this formula to rank a site. The variables are now so numerous and squishy that it is almost impossible to pinpoint exactly why a site ranks where it does.
The part that I enjoy is that it is now becoming easier for a good site to climb the SERPs without an enormous load of links and harder for a crap site to hang onto high SERPs based only on optimization and a big load of links.
This is the age for usability people and analytics experts to make some of the money that previously would have gone into building junk links.
I think it is very important that we realize there is an ongoing shift in the skills necessary to do successful SEO work: from 80% technical and 20% marketing to 80% marketing and 20% technical.
Due to my technical background, I find myself lately reading a lot of books, blogs, etc. trying to play catch up with things such as website usability, effective copywriting, website stickiness, viral marketing, etc.
I think the next battle in the search engine wars will be won by those that provide the visitors what they are searching for, keep them on the site the longest, and get them to come back.
Great post Hamlet. I really enjoyed it and thought deeply about your words. Thank you.
I especially like your rationale of Google being selective about using personalized data for ranking.
This would be an important way to validate the information.
In addition, they are probably confirming the information. For example, is rapid link acquisition being backed up by user behavior?
When the validated information confirms it, they know they have ranked properly.
Thanks, EGOL.
I agree with your remarks. I personally think that as search engines become increasingly sophisticated, we will need to shift most of our attention to the end user. Getting the visitors to the page will not be enough.
"One suggestion I would like to propose is the creation of a site where we try to put together all of the SEO insights, but backed with sources (papers, patents, experiments, etc.). We can use an open source license for the content. The idea is to use this as a reference where we can link to and prove our points."
I'd also love to see something similar. I always find "lab tests" interesting (there's a certain "dark team" that published their findings... well, once or twice), and I found them most interesting reading.
I wonder if anything in this field can truly ever be "backed up" 100% though.
Burgo,
I would like to hear what others think about this. Do you guys think it is possible/worth it?
I read that article too.
Future Of Google Search
IMO, the danger of utilizing user data to manipulate search results was articulated perfectly in Rand's interview with Michael Gray. He used himself as an example, pointing out that he performs hundreds of searches a day and often clicks on URLs for 'off' reasons. It was a great interview which made sense of where the pitfalls of personalized search lie.
https://www.seomoz.org/blog/the-smx-diaries-the-michael-gray-interview
One issue I haven't seen brought up relative to the idea of utilizing user data to manipulate search results: antitrust, or something that smells a lot like it.
Particularly if they're using feedback via data from Google Analytics, there's a possibility of viewing that as 'rewarding' users of their service if the data ends up improving a site's performance across the search universe. This would be considered anti-competitive on a number of levels relative to other analytics providers.
It'd be a hard, messy case that only a few providers would have the ability to take on, and it'd be nasty to prove in any way, given that the algo is proprietary/trade secret/etc.
Great post, Hamlet.
I agree 100% that Google is using user data to shape its rankings; it just makes sense. Google is in the business of giving users what they are looking for in as few clicks as possible. If you came to Google to run a search and had to click through 15 search results, would you be likely to come back? Now, if that same query gave you the answer in the number 1 spot, that would make for a much better user experience. It only makes sense that if they are so focused on the user experience on their end, they would be just as interested in the user experience at the site they send their visitors to.
They are also always finding ways to exceed user expectations, which in turn brings repeat visits. For example, do a search for a UPS tracking number. The first result in Google will take you right to your tracking info instead of you having to go to UPS.com, select your country, enter your tracking number, check the "I agree" box and hit submit. They just saved you a few page loads and clicks - I bet you'd use it again.
This is what has caused Google to go from a noun to a verb in record time.
Now the really clever SEO will discover how to manipulate the user data to force Google to give them the rankings they want :D
I bet the battle will never end :-)
Nope... not as long as we find new ways to fight! :)
Good post... well thought out and presented... gave it a thumbs up.
Unfortunately, it will most likely get overlooked by most readers due to its title/headline.
Look at the article you're referencing, "Proof Google is Using Behavioral Data in Rankings". You know that headline will bring in some clicks. It was moved from the Youmoz section to the SEOmoz blog (even with its flawed testing and logic). The mozzers aren't stupid... they know this type of headline and article will stir up some controversy and bring in some links.
I'm no expert copywriter... far from it. I just hate to see a good post sit on the sidelines because of a bad headline.
Kurt,
Thanks for your comment and thumbs up :-)
I agree with your observation. I was so focused on the content of the post that I completely forgot about the title. My bad.
I requested a title change from the mozzers. Hopefully they will accept it.
I'm going one better and promoting this to the main blog. This is excellent insight, Hamlet, and certainly worthy of reaching the entire audience :)
Rand,
Thank you! It is an honor that you are validating my insights. You are a leader I look up to and for whom I feel a great deal of respect.
Can you remove my first comment? It's no longer relevant since the title has changed, and I can't edit it.
lol... someone gave me a thumbs down on it.
Kurt,
Don't worry. I think I know who thumbed us down. I don't think it had anything to do with your comment about the title.
Don't fret the quiet minority. I think plenty of people have gotten a lot out of this, regardless of whether they agree or disagree... sometimes the greatest thing we can do is push an idea out there that prompts discussion; many more nuggets of gold often come of that!
You'll get thumbs down no matter how good what you write is. Some people here just like thumbing down.
It's sad, but it'll happen in any community where there's voting. Some people want to be better, but rather than contributing, they'd rather spend time knocking everyone else down a peg.
You want to see some REAL thumbing down, look at the numbers for Rebecca or myself.
Sure the better the post the more chance of it being seen by someone who won't like it. I only give thumbs down to stupid comments and so far never to a post. But don't sweat it.
Thumbs down isn't necessarily because folks here are mean; our thumbing system isn't designed to mean "I don't like this post/comment/person," it's designed to say "This does/doesn't provide value to me as a consumer of SEOmoz content."
Thus, we use it as a way to gauge engagement and appreciation (or lack thereof) and to help us know what types of content to continue producing.
Now, granted, that hasn't been entirely successful, but hopefully over time, more and more of the community will view it this way.
Diggers come here and do it for sport.
"Diggers come here and do it for sport." -- Yeah. Digg has kind of changed the way a lot of people (at least, those who use Digg) vote on items. They just make it into a popularity contest ...
This is one of the gems of YOUmoz. "Relevance feedback", bookmarked.
Great to see you contributing more within the community, I've found myself agreeing with many of your points in this and other articles.
I think they use aggregate information to tweak the equation variables. When too many queries are not giving the best results, then they may alter the ranking formula.
This is often done on a micro level when we see manual alterations to key term SERPs.
Keep blogging! :D
visser,
Thank you for your kind words. Comments like yours encourage me to keep on blogging!
fyi... I made the above comment when the post was still on Youmoz... a few days ago.
Hamlet has updated the headline/title since then.
Very interesting article and discussion. This is the type of info that helps us get on the right path with the basics of SEO.
So we're going to have to add user behavior to our traditional SEO strategies: we need users to stay longer on our sites, navigate more pages, and keep coming back. What would our strategy be?
SEO Practices,
I guess that is what we are taking home from all this.
The best and easiest way to make people stay on your site, search your site, and come back to your site is to make your site informative or fun, so even if visitors don't turn into a "sale" they are happy enough to want to return again... and not left with a bent-over-the-table-with-no-vaseline feeling. So much more goes into proper SEO than just the traditional tactics like links, content, blah, blah, blah; at least if you are in it for the long haul. Google will always change to fight us because they are evil ;p, but if we have a solid foundation built, then all we have to do is slap on some new paint and weather the storm.
Unless you are a large-scale company with a well-known brand name, think about your consumers and what they want to see, and what will interest them enough to come back, more so than converting them into a dollar sign.
Corporations are so heartless anyways ya know...Power to the little Guy!!