This weighting system for more ambiguous or multi-intent queries can't be proven, but it doesn't seem unlikely to me. Let me walk through a quick example to help illustrate the concept. Let's say you're performing a search for "GDP."
Why, Googlebot, what a fascinating idea. What kind of results might this produce?
So - because a lot of searchers express a preference for more diverse results than just those pages that ordinarily would "make the cut," Google provides an extra helping hand to pages they feel help to satisfy those searchers. This data could be gleaned from lower CTRs in the SERPs, greater numbers of query refinements, and even a high percentage of related searches performed subsequently.
Of course, when Google wants to get really serious about disambiguation, they go a different route. Check out these SERPs for the term "application:"
These "horizontal line," disambiguation style results appear on many searches where Google thinks that the searcher is probably seeking something that their query isn't producing. They're especially likely to appear for very general search phrases.
My personal experience has been that a more subtle "Query Deserves Diversity" (QDD) algorithm does exist, at least to some degree. It makes a lot of sense, both logically and for searcher satisfaction, in results like:
- Company names (where folks might want to get positive and negative press, as well as official company domains)
- Product searches (where e-commerce style results might ordinarily fill up the SERPs, but Google tries to provide some reviews and non-commercial, relevant content)
- News & political searches (where it might be prudent to display "all sides" of an issue, rather than just the left/right-wing blogs that did the best job baiting links)
p.s. Just out of curiosity, does anyone know why, on that first search result I showed for "GDP," the second result from BEA.gov isn't indented?
Rand,
After working heavily on online reputation management campaigns for the past year, I personally believe in the existence of the QDD (I call it the "diversity filter"). I see pages rank for company and personal names that have no link strength, no authority, no obvious signals of relevance or quality, no "nothing" except that they appear starkly different in tone or intent than the rest of the content on the page.
I am completely convinced that this exists - having seen "negative" but non-deserving search results time and time again, with no other explanation for it.
I won't claim to know if it exists, but if it does I would think that G would limit the human interaction as much as possible. I'd imagine it working something like this:
User: [cars]
Google: What on earth are you looking for? Okay, I know millions of people have searched for this before, and n% of people have immediately followed up a search for [cars] with a search for [x] and [y]. There's an n% probability that you are looking for [x] or [y], so I'll mix up the results a little with these things.
User: Thanks G, I was looking for information about the movie Cars.
As far as taking advantage of this...I guess if it exists and G is diversifying queries, I'd imagine they'd try and diversify with other quality results. So, just keep doing what you're doing.
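To make the follow-up-search idea above concrete, here is a minimal sketch. It assumes a hypothetical log of search sessions (each session is just an ordered list of queries) and estimates how often [cars] is immediately followed by each refinement. The function names, the sample sessions, and the 30% threshold are all illustrative assumptions, not anything Google has confirmed.

```python
from collections import Counter

def followup_probabilities(sessions, query):
    """Estimate P(next query | query) from hypothetical search sessions."""
    followups = Counter()
    total = 0
    for session in sessions:
        # Look at each consecutive pair of queries in the session.
        for current, nxt in zip(session, session[1:]):
            if current == query:
                followups[nxt] += 1
                total += 1
    if total == 0:
        return {}
    return {q: n / total for q, n in followups.items()}

def refinements_to_mix_in(probabilities, threshold=0.3):
    """Return follow-up queries common enough to justify mixing in their results."""
    return sorted(q for q, p in probabilities.items() if p >= threshold)

# Made-up sessions, for illustration only.
sessions = [
    ["cars", "cars movie"],
    ["cars", "used cars"],
    ["cars", "cars movie"],
    ["cars", "car insurance"],
    ["cars"],
]

probs = followup_probabilities(sessions, "cars")
print(refinements_to_mix_in(probs))
```

With this toy data, half of the follow-ups to [cars] are [cars movie], so that is the only refinement that clears the threshold and would be a candidate for mixing into the results.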
It's interesting to speculate on these things, but in the end I think that's all we can really achieve - speculation. We can find/cite examples that support our theories, but running queries in Google isn't equivalent to a scientific test. This is an environment with factors well beyond our control and even our ability to verify their existence. There are a number of things going on that we simply cannot see or measure - and therefore can never truly understand.
With that in mind, and while I understand you don't claim to have proof of this "Query Deserves Diversity" algo, I find "personal experience" hinting to the existence of a "more subtle" algorithm to be little proof at all. I appreciate speculation of this sort for what it is, and it is part of the reason I'm a PRO member of SEOmoz and visit the blog religiously. I guess I'm just not terribly convinced or concerned with this based on the examples given and argument made.
ps Rand, it looks like that result for "gdp" is a lost page. I've seen this before - the title text in the SERP being all lower case is a first clue. Since the page isn't found and there is no title tag for the link to be based on, this text is taken from anchor text in links pointing to this page (at least from what I've seen).
Looks like the BEA could use an in-house SEO - this page should have been 301'd.
The first link for BEA.gov is blocked in robots.txt; I think that's the reason that the title is in lower case.
This is the new way G treats pages in robots.txt
Hmm, you may be right, but where would it be pulling the "gross domestic product" text from? I think it's from anchors.
I'm guessing they added this line to the robots.txt after a site restructure of some kind.
Obviously a 301 redirect would have been better than this - I've seen pages that were blocked by robots.txt show up this way for more than six months. It also probably has something to do with the fact that their custom 404 is not recognized by Googlebot as such.
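For anyone who wants to check whether a given URL is blocked the way described above, here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt rules and the example.gov URLs are hypothetical stand-ins, not the actual BEA.gov file or paths.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real BEA.gov rules are not shown in this thread.
rules = """\
User-agent: *
Disallow: /old/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# Hypothetical URLs: one under the disallowed path, one elsewhere on the site.
old_page_allowed = parser.can_fetch("Googlebot", "https://www.example.gov/old/gdp.htm")
new_page_allowed = parser.can_fetch("Googlebot", "https://www.example.gov/national/gdp.htm")

print("old page crawlable:", old_page_allowed)
print("new page crawlable:", new_page_allowed)
```

A blocked URL can still be known to the engine through links pointing at it, which is consistent with the anchor-text explanation for the lower-case title.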
There is nothing "new" about the way a robots.txt blocked page is treated.
I assume that the reason it isn't indented has something to do with the redesign of the site as well. It would help to know when the site was redesigned, as well as whether the old GDP page is the same as the new GDP page. Google might have duplicate content issues figuring out which to rank, and this threw off their 'indented' result.
However, I really think it may have been a fluke because I also see it as an indented result on different datacenters.
Since you ask Rand,
Yes, I have noted Subject Diversity and I have seen (or imagined) a 2nd type of diversity which I'll call Type Diversity. This is when you find different types of websites in the top ten. For example, information that explains the subject, a commercial site that sells subject services, and a school that teaches people to provide subject services.
I believe separation can be accomplished algorithmically. I imagine Google uses their word matching technology, what they use in the AdWords Keywords Tool which tries to identify unique word groups.
I use Google Adwords and this reminds me of the behavior I see when checking ad placement.
My theory on this behavior is based on probabilities. When a user types in a search there is a certain probability that any given search result will be precisely what they are looking for. As we go further down the page, the confidence in this probability tends to diminish to the point where more 'random' choices could be exactly what a random searcher IS looking for.
If a value is assigned to a correct search result for a given user then the value over say one thousand SERPs for the same search phrase may be greater when including these lower results. Over time they could probably get pretty accurate and determine which percentage of a given search phrase is looking for certain non standard results.
David
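The probability argument above can be sketched with a toy model. It assumes every searcher has exactly one intent, that a page "satisfies" a searcher if at least one result matches that intent, and that the intent shares below are known; all of these numbers and names are invented for illustration.

```python
def satisfied_per_thousand(page, intent_share_pct):
    """Searchers (out of 1000) whose intent appears at least once on the page."""
    covered = set(page)
    pct = sum(share for intent, share in intent_share_pct.items() if intent in covered)
    return 1000 * pct // 100

# Hypothetical share of searchers (in percent) behind each intent for one phrase.
intent_share_pct = {"standard": 70, "alternate": 20, "news": 10}

# Ten results that all serve the dominant intent...
uniform_page = ["standard"] * 10
# ...versus a page that gives the last two slots to less common intents.
diverse_page = ["standard"] * 8 + ["alternate", "news"]

print(satisfied_per_thousand(uniform_page, intent_share_pct))
print(satisfied_per_thousand(diverse_page, intent_share_pct))
```

In this simplified model the uniform page serves 700 of every 1000 searchers and the diversified page serves all 1000, which is the sense in which the total value over many SERPs can rise when lower slots go to non-standard results.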
QDD seems to be a strong factor in searches for physician names. For instance, our agency works with a father and a son with a similar name. When general searches are made for a physician's name, Google shows a variation of listings for both physicians. If that is the case, the big question is: what are the best practices for separating search rankings for companies or physicians with nearly identical names?
I have no idea about the reason for the lack of indentation, but I have seen it many times now. Maybe a new algo on the way?
This really is either fun or a pain in the butt. Diversity algo meh.
Some little experiment we did back in November 2010, just in case someone is interested.
https://www.firstrate.co.nz/blog/tag/query-deserves-diversity/
However, I guess now QDD has become an accepted fact?
I'm seeing non-indented second results a lot - I particularly notice it on my own sites, so I know there is no reason for either result to be less valued than the other (i.e., it's not 404ing)
I'd love to know why, too.
I've had senior Google engineers assure me that Google has a diversity component, precisely because I criticize Google for its lack of support for exploratory search (which isn't the same as diversity, but is related).
I believe they are at least trying for some kind of diversity. But of course it's hard to see clearly inside the black box.
My opinion is that the only time this 'diversity' algo is being used is on the searches that show the horizontal line you mention.
Otherwise, any diversity is determined by the regular ranking algorithm (which looks at the freshness you mention).
For example, a search for the term 'fantasy' shows a diversity of results but I think they are shown in the order that the 'regular algorithm' would rank them and not due to any specific 'diversity' ranking tweak.
hmmm
I have seen that too. But how can you define the terms that trigger this QDD, if it's present? Because we have seen such behavior in nearly all non-commercial terms. If the term you are targeting is a general, informational type, then chances are you will get these results.
As an additional side note, a possible (if a stretch) reason why the 2nd result is not indented is that the first BEA.gov result is a de facto 404 page, though not correctly tagged as such or given a timed redirect.
I'm not convinced Rand, with the first example anyway. I'd have to see many more examples in addition to your GDP search to spot a pattern.
The second example is much more obvious and I wonder if it takes the suggestion below the horizontal rule from previous bulk search volumes... for example, if the word APPLICATION is often teamed with the word JOB for a significant portion of searches it occurs in.
What's interesting to me is the level of human editing in the SERPs - I mean does someone actually assign alternative diversification to popular queries? Or does Google just take the 2 top search queries that contain the word and put them together?
What do you reckon?
David Lindop
I don't think they would have to use human intervention - unless you count the searchers.
When a searcher forgoes the first (and second, and third) page results, and then clicks on something that is farther down it would indicate that the higher results don't seem to be what they are looking for in that particular case.
So those deep results that are patterned as getting clicked on in such cases are likely to appeal to a more "diverse" search intent.
I'm sure there are other criteria that could apply as well, that's just the one that seems simplest to me.
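The click-depth signal described above could look something like the following sketch. It assumes a hypothetical click log of (query, rank clicked, URL) tuples and flags URLs ranked beyond the first page that are clicked repeatedly. The log format, the URLs, and the thresholds are assumptions for illustration only.

```python
from collections import Counter

def deep_click_candidates(click_log, query, min_rank=11, min_clicks=2):
    """URLs ranked below page one that searchers for `query` keep clicking."""
    counts = Counter(
        url for q, rank, url in click_log if q == query and rank >= min_rank
    )
    # most_common() sorts by click count, highest first.
    return [url for url, n in counts.most_common() if n >= min_clicks]

# Made-up click log: (query, rank of the clicked result, clicked URL).
click_log = [
    ("application", 1, "https://example.com/software-application"),
    ("application", 14, "https://example.com/job-application"),
    ("application", 14, "https://example.com/job-application"),
    ("application", 15, "https://example.com/job-application"),
    ("application", 23, "https://example.com/college-application"),
    ("cars", 12, "https://example.com/cars-movie"),
]

print(deep_click_candidates(click_log, "application"))
```

Here only the job-application page is clicked often enough from a deep position to be treated as serving a different intent than the top results.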
I don't think it's human interaction. Google works to avoid this type of action at all costs, as it would lead to upkeep and tons of additional manpower.
Google, like any other web resource, wants to be useful/sticky, so it automatically triggers "diverse" related search terms, similar to their suggested keyword lists at the bottom of each page. And much like the keyword suggestions present on other search engines like Yahoo.
I think the diversity is part safety net, and part a way to keep you interested and satisfied to ensure you come back.
I think this analysis confuses things. If there is only one algo (no matter how long and complicated or subdivided), this one algo deals with all the queries.
"Last year we made over 450 improvements to the algorithm," said Udi Manber in a recent interview. "The algorithm".
Multi intent queries must be determined mostly by CTR on the SERPs. That's why Google insists on data mining (toolbar, GAnalytics, iGoogle...). Those statistics even show up on Webmaster Tools for webmasters to know which keywords gave the most clicks.
In your GDP example - that isn't a multi intent query, IMO, it's more double meaning. "Application" can be multi intent, and therefore Google tries to find out what you mean, but shows a best guess first.
In the future multi intent might be dead in the water, due to iGoogle-like pages, Facebook search history, etc...
I agree - GDP probably wasn't the best example. I should have thought a little harder about some that better illustrated the issue.
The 2nd result is indented at the Google server I'm pulling results from https://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLF,GGLF:2005-34,GGLF:en&q=gdp at 74.125.19.103
I have seen it with many queries in the past. Can’t think of any off the top of my head. I have not seen it with "short tail" queries though. Things like "cars" for instance.
Even looking at the word "cyclones" doesn’t give me any fresh blog posts or news stories ranking in the top ten in the organic SERPs where I would have expected to see some.
Sometimes when I search for stock quotes in Google by symbol, I get an initial set of results that do not auto-populate the stock quote. Then if I hit "search" again or press enter, it pops up.
I find that interesting...simply by pressing the search button consecutively you get different results. It has happened several times, but I can't remember what symbols trigger it.
Most symbols tend to just pull the quote right away, but some don't until you hit enter or search twice. I got UBS to toggle between showing the stock quote and not showing it by hitting enter.
Additionally, when searching for a broad topic, often if you press enter or search again "related searches" or image results will appear at the top of the SERPs.
So you can fine tune by hitting enter repeatedly.
I have never been called a conspiracy theorist before, but what I write here may tag me with just such a title. My gut feel is that there is an algorithm that recognizes searches on personal names and peppers the results with some nasty-grams. On the assumption that the hypothesis is true, Google appears to often display hastily written and maliciously posted libel pages (sometimes with ten-year-old content) and pegs them to third or fifth search ranking (give or take). Try as you may, Google will not move the negative listing from its place, irrespective of the page's lack of credibility or GoogleJuice. Victims of this algorithm and the defamation it often propagates are left with an immovable negative listing on the first search page.
Recently some very negative pages libeling some new clients have achieved high search results almost overnight; these observations fit the hypothesis. The pages in question are full of negative words that this hypothetical algorithm seems to devour (scumbag, scammer, rip off, ponzi, dishonest etc). Furthermore, it appears that some domains dedicated to libel, gossip and extortion have been designated by Google as a “diversity site” by default. I have made first-hand observations, and I am quite sure that it exists. Bear in mind that I am not an SEO practitioner, although I do work with an excellent team. I am more involved with administration and customer relations. We have had a few successes against this hypothetical adversary, and we are refining our methods further.