I'm more than a little skeptical of mainstream media articles about the search engines. With so many terrible experiences - inaccuracy, bias, shallow information, agenda-based reporting - it's easy to see why. However, today I'm thrilled to see an article from Saul Hansell in the NY Times that's not only impeccably well-written, but informative even to those of us most deeply inside the search industry. The article - Google Keeps Tweaking Its Search Engine - is quite possibly the best mainstream media article about Google, or modern search technology, in the last 5 years.
There are several big takeaways for search marketers, so let's dive right in:
Mr. Singhal is the master of what Google calls its “ranking algorithm” — the formulas that decide which Web pages best answer each user’s question. It is a crucial part of Google’s inner sanctum, a department called “search quality” that the company treats like a state secret. Google rarely allows outsiders to visit the unit, and it has been cautious about allowing Mr. Singhal to speak with the news media about the magical, mathematical brew inside the millions of black boxes that power its search engine.
Google values Mr. Singhal and his team so highly for the most basic of competitive reasons. It believes that its ability to decrease the number of times it leaves searchers disappointed is crucial to fending off ever fiercer attacks from the likes of Yahoo and Microsoft and preserving the tidy advertising gold mine that search represents.
It's nice to hear that Google feels much the same way I do about search quality - in particular, that the current competitive advantage is primarily about the relevance of results. We're also getting a peek at a Googler that we've never met before (at least, outside the 'plex). I'm guessing that Mr. Singhal is now receiving quite a few emails to every possible variation of his name @ google.com (poor guy).
Any of Google’s 10,000 employees can use its “Buganizer” system to report a search problem, and about 100 times a day they do — listing Mr. Singhal as the person responsible to squash them.
“Someone brings a query that is broken to Amit, and he treasures it and cherishes it and tries to figure out how to fix the algorithm,” says Matt Cutts, one of Mr. Singhal’s officemates and the head of Google’s efforts to fight Web spam, the term for advertising-filled pages that somehow keep maneuvering to the top of search listings.
Some complaints involve simple flaws that need to be fixed right away. Recently, a search for “French Revolution” returned too many sites about the recent French presidential election campaign — in which candidates opined on various policy revolutions — rather than the ouster of King Louis XVI. A search-engine tweak gave more weight to pages with phrases like “French Revolution” rather than pages that simply had both words.
The Google bug system reminds us that behind all the magic, human beings toil to ensure quality, compare individual results and make tweaks based upon the best aggregate changes. The short paragraph about the French Revolution, if accurate, gives some insight into the fact that the algorithm is not uniform - not even close. Individual queries get individual attention - so next time you're stumped because Google's formula for some new term you're optimizing doesn't match up against your experiences from the past, you may simply be dealing with a different set of criteria.
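If the description is accurate, the fix probably looks something like a phrase boost. Here's a minimal sketch of that idea in Python - the function, weights and example scores are entirely my invention for illustration, not Google's actual formula:

```python
# A minimal sketch of a phrase-boost fix, assuming a made-up scoring
# function - this is NOT Google's algorithm, just the general idea:
# pages containing the exact phrase outrank pages that merely contain
# both words somewhere.

def phrase_aware_score(query: str, page_text: str,
                       term_weight: float = 1.0,
                       phrase_bonus: float = 2.0) -> float:
    """Score a page for a query, boosting exact phrase matches."""
    text = page_text.lower()
    terms = query.lower().split()

    # Base score: one point per query term found anywhere on the page.
    score = term_weight * sum(1 for term in terms if term in text)

    # Bonus when the full phrase appears verbatim, e.g. "french revolution"
    # rather than "french" and "revolution" scattered across the page.
    if query.lower() in text:
        score += phrase_bonus

    return score

election_page = "French candidates opined on a policy revolution in the campaign"
history_page = "The French Revolution ended with the ouster of King Louis XVI"

print(phrase_aware_score("French Revolution", election_page))  # 2.0
print(phrase_aware_score("French Revolution", history_page))   # 4.0
```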
But Mr. Singhal often doesn’t rush to fix everything he hears about, because each change can affect the rankings of many sites. “You can’t just react on the first complaint,” he says. “You let things simmer.”
So he monitors complaints on his white board, prioritizing them if they keep coming back. For much of the second half of last year, one of the recurring items was “freshness.”
Freshness, which describes how many recently created or changed pages are included in a search result, is at the center of a constant debate in search: Is it better to provide new information or to display pages that have stood the test of time and are more likely to be of higher quality? Until now, Google has preferred pages old enough to attract others to link to them.
But last year, Mr. Singhal started to worry that Google’s balance was off. When the company introduced its new stock quotation service, a search for “Google Finance” couldn’t find it. After monitoring similar problems, he assembled a team of three engineers to figure out what to do about them.
Hmmmm... Google not showing fresh results, eh? Sounds mighty familiar, no? We at SEOmoz, and most of the rest of the informed SEO world, have been talking about this for the last few years - in particular, since March of 2004, when the infamous "sandbox" first reared its ugly head. It's nice to get confirmation and feel the vindication this transparency brings, but there's also a lesson to be learned - Google isn't perfect, and they often look inward first. The note that this problem wasn't addressed until the query "Google Finance" didn't return Google Finance is strong evidence that Google is like many other companies - things don't get fixed unless the folks internally feel the pain of the problem. Thus, next time you want to fight with the Google engineers about what you feel is inequitable treatment in the SERPs, the best way to do it might be to illustrate how the problem affects Google products.
Mr. Singhal introduced the freshness problem, explaining that simply changing formulas to display more new pages results in lower-quality searches much of the time. He then unveiled his team’s solution: a mathematical model that tries to determine when users want new information and when they don’t. (And yes, like all Google initiatives, it had a name: QDF, for “query deserves freshness.”)...
...“What do you take us for, slackers?” Mr. Singhal responded with a rebellious smile.
THE QDF solution revolves around determining whether a topic is “hot.” If news sites or blog posts are actively writing about a topic, the model figures that it is one for which users are more likely to want current information. The model also examines Google’s own stream of billions of search queries, which Mr. Singhal believes is an even better monitor of global enthusiasm about a particular subject.
As an example, he points out what happens when cities suffer power failures. “When there is a blackout in New York, the first articles appear in 15 minutes; we get queries in two seconds,” he says.
Mr. Singhal says he tested QDF for a simple application: deciding whether to include a few news headlines among regular results when people do searches for topics with high QDF scores. Although Google already has a different system for including headlines on some search pages, QDF offered more sophisticated results, putting the headlines at the top of the page for some queries, and putting them in the middle or at the bottom for others.
In the SEO world, we're all familiar with the new onebox results that pop up with news results, and now we've got a bit of backstory on them. I also suspect that, although it wasn't mentioned in the article, there may have been some tweaking of the organic listings to help support more freshness in the results themselves. Google's still favoring a lot of old results, but of the thousand or so queries we monitor internally and for clients, there are at least some indications that a freshness boost exists.
Another big takeaway here is the thought process about how temporal data and query analysis happens at the 'plex. The level of awareness about searcher satisfaction is certainly impressive, and so is the exceptionally fast timeline for fixes (at least, some fixes - in SEO, we've got our own examples of tortoise-speed implementation). What the article says, though, is that Google can determine, by examining blog posts and news articles, which topics and queries might be getting "hot," and return more "fresh" results for those queries. This fits in precisely with how smart SEOs advise on "escaping" from the sandbox - get lots of link love and lots of people talking about you, i.e. become newsworthy.
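To make the QDF concept concrete, here's a toy sketch of how a "hotness" score might be computed from the two inputs the article mentions - news/blog activity and the query stream. Every name, weight and threshold below is an assumption for illustration; the article doesn't reveal the real model:

```python
# A toy illustration of "query deserves freshness" (QDF) as described in
# the article: compare current news/blog activity and query volume for a
# topic against its historical baseline, and only favor fresh pages when
# the topic is "hot". All names, weights and thresholds are invented.

def qdf_score(current_mentions: float, baseline_mentions: float,
              current_queries: float, baseline_queries: float) -> float:
    """Rough hotness score: how far activity exceeds its usual level."""
    mention_ratio = current_mentions / max(baseline_mentions, 1.0)
    query_ratio = current_queries / max(baseline_queries, 1.0)
    # The article suggests the query stream is the better monitor of
    # enthusiasm, so this sketch weights it more heavily.
    return 0.4 * mention_ratio + 0.6 * query_ratio

def wants_fresh_results(topic_stats: dict, threshold: float = 3.0) -> bool:
    """Decide whether to mix news/fresh pages into the regular results."""
    return qdf_score(**topic_stats) >= threshold

# A blackout in New York: articles appear in 15 minutes, queries in seconds.
blackout = dict(current_mentions=500, baseline_mentions=20,
                current_queries=90_000, baseline_queries=3_000)
print(wants_fresh_results(blackout))  # True - surface fresh results
```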
As Google compiles its index, it calculates a number it calls PageRank for each page it finds...
...Mr. Singhal has developed a far more elaborate system for ranking pages, which involves more than 200 types of information, or what Google calls “signals.” PageRank is but one signal. Some signals are on Web pages — like words, links, images and so on. Some are drawn from the history of how pages have changed over time. Some signals are data patterns uncovered in the trillions of searches that Google has handled over the years...
...Increasingly, Google is using signals that come from its history of what individual users have searched for in the past, in order to offer results that reflect each person’s interests. For example, a search for “dolphins” will return different results for a user who is a Miami football fan than for a user who is a marine biologist. This works only for users who sign into one of Google’s services, like Gmail...
...Once Google corrals its myriad signals, it feeds them into formulas it calls classifiers that try to infer useful information about the type of search, in order to send the user to the most helpful pages. Classifiers can tell, for example, whether someone is searching for a product to buy, or for information about a place, a company or a person. Google recently developed a new classifier to identify names of people who aren’t famous. Another identifies brand names...
...These signals and classifiers calculate several key measures of a page’s relevance, including one it calls “topicality” — a measure of how the topic of a page relates to the broad category of the user’s query. A page about President Bush’s speech about Darfur last week at the White House, for example, would rank high in topicality for “Darfur,” less so for “George Bush” and even less for “White House.” Google combines all these measures into a final relevancy score.
The sites with the 10 highest scores win the coveted spots on the first search page, unless a final check shows that there is not enough “diversity” in the results. “If you have a lot of different perspectives on one page, often that is more helpful than if the page is dominated by one perspective,” Mr. Cutts says. “If someone types a product, for example, maybe you want a blog review of it, a manufacturer’s page, a place to buy it or a comparison shopping site.”
Wow... OK - 200 signals of quality (we've covered a lot of the big ones here), a classification system that attempts to determine query intent, and an automated system to determine diversity. That's a lot of confirmation about what many have only theorized until now. I'm not going to go into detail about each of these - I invite you to do so in the comments - but I'll certainly be writing about them sometime in the near future.
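For those who like to see ideas in code, here's a hedged sketch of two of those confirmations - combining weighted signals into a final relevancy score, and a last-pass diversity check over the top 10. The signal names, weights and the max-per-type rule are all placeholders I've invented; the real system has 200+ signals we can't see:

```python
# A hedged sketch of two ideas from the article: (1) the final relevancy
# score is a blend of many weighted "signals" (PageRank being just one),
# and (2) a last pass checks the top results for "diversity". The signal
# names, weights and the max-per-type rule are placeholders I invented.

def relevancy(signals: dict, weights: dict) -> float:
    """Final score = weighted sum of per-page signal values."""
    return sum(weights.get(name, 0.0) * value
               for name, value in signals.items())

def diversify(ranked_pages: list, top_n: int = 10,
              max_per_type: int = 3) -> list:
    """Fill the top slots while capping how many results share one page
    type (e.g. review, manufacturer, store, comparison-shopping site)."""
    chosen, counts = [], {}
    for page in ranked_pages:
        kind = page["type"]
        if counts.get(kind, 0) < max_per_type:
            chosen.append(page)
            counts[kind] = counts.get(kind, 0) + 1
        if len(chosen) == top_n:
            break
    return chosen

weights = {"pagerank": 2.0, "topicality": 3.0, "freshness": 1.0}
pages = [
    {"url": "review-a", "type": "review",
     "score": relevancy({"pagerank": 0.6, "topicality": 0.9}, weights)},
    {"url": "store-b", "type": "store",
     "score": relevancy({"pagerank": 0.8, "topicality": 0.5}, weights)},
]
pages.sort(key=lambda p: p["score"], reverse=True)
print([p["url"] for p in diversify(pages)])  # ['review-a', 'store-b']
```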
SMX starts tomorrow, and between Lisa Barone, Andy Beal & SERoundtable, I think there's going to be a heap of coverage. I've asked the mozzers covering it (Jane & Rebecca) to do their best to be as thorough and thoughtful as possible - they'll try to present you with as much signal as possible, covering most of the "advanced" topics, rather than disgorging everything from every session. Meanwhile, I'll be on active duty - presenting, networking, listening, learning - and doing my best to bring back valuable information as well.
And yes, Rebecca's making comics...
Good post. To be fair, I think "find a way to relate the problem so it directly affects your audience" works for anybody, not just Google.
You're right - we're all human! I think it can be easy to feel like Google is a huge cyborg (see Rand's robots) rather than being a collection of people trying to solve problems. With how close to omniscient Google can appear, it often feels hard to believe the robot doesn't already know about our pain - but of course things have to be prioritised within that - there are always many things clamouring for attention.
Damn you, Matt Cutts, and your "sound reasoning"
Great find and makes a lot of sense.
You create your core set of rules, then you develop a number of tunings that can be tweaked based on current circumstances... all of this, of course, is a moving target. You have various teams work on the various rules. Once they are in place, do you really need to know the "entire" formula, or is it like plugging in modules? This one gives a plus 1 for this event, that one takes away 2, etc.
True and it fits in nicely with the idea of many open source applications. I've never gone through all the code in WordPress to see everything under the hood, but I can still write and add plugins to improve the overall blog.
Firefox is another example. I have no idea what the code behind it looks like, but I could still write an extension to make it better.
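To make the plugin analogy concrete, a modular ranker might look something like the sketch below - purely illustrative, with invented rule names and adjustments:

```python
# Purely illustrative: the "plug in modules" idea from the thread above.
# Each rule is a self-contained adjustment (+1 for this, -2 for that)
# that can be added, removed or re-tuned without anyone needing to hold
# the entire formula in their head. Rule names are invented.

from typing import Callable

ScoringRule = Callable[[dict], float]

def exact_phrase_rule(page: dict) -> float:
    return 1.0 if page.get("has_exact_phrase") else 0.0

def spam_penalty_rule(page: dict) -> float:
    return -2.0 if page.get("looks_spammy") else 0.0

class ModularRanker:
    """Keeps a pluggable list of scoring rules, WordPress-plugin style."""

    def __init__(self) -> None:
        self.rules: list = []

    def register(self, rule: ScoringRule) -> None:
        self.rules.append(rule)

    def score(self, page: dict) -> float:
        # No single person needs the "entire" formula - just the modules.
        return sum(rule(page) for rule in self.rules)

ranker = ModularRanker()
ranker.register(exact_phrase_rule)
ranker.register(spam_penalty_rule)
print(ranker.score({"has_exact_phrase": True, "looks_spammy": True}))  # -1.0
```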
Here's a random question: who are these people Googling a power outage? Shouldn't their power be . . . y'know . . . out?
(Nice meeting you yesterday! Thank Mystery Guest again for saving my life, please!)
Probably friends, relatives, neighbors and reporters (natch) in neighboring areas trying to get info.
Reading the article made me wonder for a moment whether Google is laying the groundwork for an ever increasingly difficult task - increasingly difficult to the point of becoming insurmountable, perhaps in a relatively short amount of time, with their current approach. Hey, I don't know enough about what Mr Singhal and his team do, but here's an idea to generate some thought:
You start with a rule, a simple rule, a good rule that ranks pages. It does a good job and is intuitive to understand. Mortals (albeit really, really smart mortals) can think through the implications of tweaking a few parameters, the weights in the calculations. Any small change can be managed with ease because the system is simple.
Then you realise that one rule isn't enough. It's too crude, too blunt, and all that's needed to fix some major shortcomings is to introduce a second rule. Now the system is more complicated. Two interacting rules must be considered with every change in parameter. It's still possible to understand how changes will affect the rankings of all the millions and millions of pages ranked, but it is more difficult.
As each successful adjustment and rule change and rule addition and new algorithm and special case and "French Revolution" and "Apple versus apple" is embedded, the system becomes more complex, less intuitive. And the number of people who can understand and appreciate it dwindles rapidly. Every change requires more thought, more consideration. The risks are greater - the system is already so finely balanced that it is that much easier to tilt it off balance. At some point, maybe the marginal cost of making any change comes so close to the marginal benefit that changes are no longer possible, and the State of Search Stagnates.
Some of the other avenues they seem to be exploring, based on analysing information about individual users' web habits from things like Gmail or the Google Toolbar, may be less susceptible. This is a new source of information, rather than just increasingly complex tweakings and additions to an existing system. However, the insights obtained from these new data sources must still be integrated into the search results for you and me when we search "technical business advantage" and hope to find something useful.
From the article:
“People still think that Google is the gold standard of search,” Mr. Battelle says. “Their secret sauce is how these guys are doing it all in aggregate. There are 1,000 little tunings they do.”
Of course, they are working on something huge.
Here's what Larry Page said earlier (on TV):
"People always make the assumption that we're done with search. That's very far from the case. We're probably only 5 percent of the way there. We want to create the ultimate search engine that can understand anything ... some people could call that artificial intelligence."
"a lot of our systems already use learning techniques".
"The ultimate search engine would understand everything in the world. It would understand everything that you asked it and give you back the exact right thing instantly," saying, "You could ask 'what should I ask Larry?' and it would tell you."
The post and the video.
So, in the future, G should even be able to find your keys (obscenity warning).
Compared to that, the current state of affairs shows us that Google is, indeed, way, way behind their goals.
Why can't G develop an AI, anyway? They have quite a handful of PhDs out there.
Thanks for the links, Yuri. AI would be the ultimate search engine. It could even be tied to personalized search, as we each get our own AI that learns what we want.
I think we're still quite a ways off from seeing it, but it would make for one really good search engine.
From what I have gathered, G already uses some artificial intelligence techniques. It was said either in one of the links in my prev. comment or somewhere else.
While there's a long way to go until something complete (as in finding exact answers), there already are certain algorithms that work in that direction.
From this point of view, it is pretty clear that there's a long way to go in search (only 5% done, as mentioned previously - somewhere).
"Google's still favoring a lot of old results, but of the thousand or so queries we monitor internally and for clients, there's at least some indications that a freshness boost exists."
Yeah, I've noticed that one too. One of the joys of having *that* many SERPs to watch on a daily basis.
I have seen how Google displays other options to the searcher - not with every result, but they must be working on it to give the searcher the most accurate result. For instance, when searching for drugs, Google shows other options like:
Searches related to drugs: illegal drugs, bad drugs, types of drugs, pictures of drugs, marijuana, drug abuse, cocaine, drug facts
That should be done for every search; maybe they will do it in the future.
I caught this article this morning via Digg and was impressed as well by the quality, relevancy and lack of fluff in it. It restored a little of the faith I'd lost in Google. I'm quite impressed by how quickly they appear to react to problems in the results.
Perhaps that was part of the point. With the concerns around user data and privacy in search, and about public image generally, I have to imagine even Google needs to make a few deposits in the image department.
This paints a good picture to the public that Google is going to great lengths to serve them (the reference to the Google ad model was very brief) and puts a human face on the algo.
I've noticed, for the keywords in my little corner of e-commerce, that Google definitely favors older content, even if it is less relevant or less helpful than new content. It's very frustrating, to say the least. Can I hope that "freshness", as a concept, makes its way through all of Google's search results, and not just some?
Yahoo! is more than willing to bump up more relevant results over older ones; I believe that this is attractive to people and pulls users away from Google.
I read the article too. This is interesting material. Thanks for the SEOmoz insight on the article as well.
This was a great find. It gives a good perspective and is well written. Some good and interesting comments to read as well.
I think recent changes to the Google algo are less drastic, and new ideas are integrated systematically and much more slowly than in the past. Perhaps that makes sense too - you do not make major changes to a recipe that works.
Which is more amazing: that a mainstream article did a good job writing about something search-related, or that Google opened up the 'plex and gave everyone a look inside?
Great coverage Rand. I read the NY Times every day, and yet somehow I still managed to miss this article until a friend pointed it out to me on Monday.
There's so much good in the article that it's hard to say which part was most informative. I particularly liked getting a glimpse of the process as a whole, as well as the leaning toward understanding query intent to better present results.
Saul Hansell from the New York Times and Kevin Delaney from the Wall Street Journal have both written extensively on the search industry with some great reporting - definitely worth a read when you see one of their names in the byline.
This article was particularly good.
I do kinda wonder how the story came about - i.e. did the NYT pitch it or did Google approach them?
BTW, the NYT owns about.com, which presumably has search as a major, major traffic generator.
It's good to see that the GooglePlex is full of humans and not just whirring computers. The glimpses are great and really useful. But the fact that the algo is tweaked daily means that even these glimpses won't help anyone see into the algo and crack it.
Just keep building good content with plenty of link juice.
Nice find Rand! How did you stumble upon that gem?
Glad to see we are nicely on track when it comes to industry assumptions about Google search - this article confirmed some long-standing ones.
Pity Singhal couldn't have listed some other examples of signals! ;)
I don't blame you for being skeptical of mainstream media when it comes to search stories (or any story for that matter). I think they've proven time and time again that their motivations are ratings - not actually getting the facts right.
That being said, it was a well-written and informative piece... kinda weird, huh?
Great tip on relating problematic queries to Google products!
In regards to "query intent", do you think it would be considered cloaking to show visitors hitting your landing page a relevant banner or offer based on the referring keyword that would otherwise not be displayed on a direct load? Or would it be a better strategy to show a default banner that could be changed based on the referring keyword so that a direct load view also sees a banner?
Hi Kwyjibo,
I'm not sure what the 'right' answer is to your question, but in the past I've always erred on the side of caution and gone with showing a banner to everyone, varying which banner appears based on the keyword used.
If you start serving content which simply isn't there to direct-traffic then it looks a little like cloaking to me.
The last point is that if you know they are direct traffic, then you can still display targeted ads (just not by keyword). No sense missing out on advertising simply because they came straight there. You still have a whole heap of information on the user you can use.
Tom is probably right, but I'm thinking as long as you showed the same ad to a search engine as you would to a real person there wouldn't be an issue. Of course a spider likely isn't following keywords into your site so they probably don't see the ad at all.
I would think it's fine, but it might be better to play it safe where potential cloaking is concerned.
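For anyone wanting to try the "default banner, swapped by keyword" approach discussed above, here's a rough server-side sketch. Every visitor - direct loads and crawlers included - gets a banner, so nothing is served to searchers that direct visitors never see; only which banner varies. The banner mapping and helper names are invented for illustration:

```python
# A rough server-side sketch of the cautious approach discussed above:
# every visitor (and every crawler) gets *a* banner; only which banner
# varies with the referring keyword. The banner mapping and helper names
# are invented; search referrers of this era carried the query in a "q"
# (or Yahoo-style "p") parameter.

from urllib.parse import urlparse, parse_qs

BANNERS = {
    "running shoes": "banner-shoes.png",
    "trail gear": "banner-trail.png",
}
DEFAULT_BANNER = "banner-default.png"

def referring_keyword(referrer: str):
    """Extract the search phrase from a search-engine referrer, if any."""
    if not referrer:
        return None
    params = parse_qs(urlparse(referrer).query)
    values = params.get("q") or params.get("p")
    return values[0].lower() if values else None

def pick_banner(referrer: str) -> str:
    keyword = referring_keyword(referrer)
    # Direct loads and unknown keywords fall back to the default banner,
    # so the page never shows search visitors content that direct
    # visitors (or spiders) would miss entirely.
    return BANNERS.get(keyword, DEFAULT_BANNER)

print(pick_banner("http://www.google.com/search?q=running+shoes"))
# -> banner-shoes.png
print(pick_banner(""))  # direct load -> banner-default.png
```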
The article gives a sneak preview of what goes into Google's search journey. There are lots of milestones to pass before Google finds its way into the mind of the searcher. Yet it seems like Google will keep breaking new barriers as a search engine.
Great post, Rand, about an article I wouldn't otherwise have found. I think the stuff about diversity in the results is one of the most interesting elements for me. It's not something I have yet considered in my work - that you might also need to be significantly different from the other top results in order to rank.
Having said that, as a search engine user I find they're spot on - I do often want to find the top few of each kind of result (e.g. reviews, vendors, manufacturers). Interesting stuff.
As you say, nice to get some confirmation on certain aspects of search development the community have discussed for some time. The QDF score sounds interesting and I'll be keeping my eyes peeled for any more info on this in the future.
Thanks Rand.
Looking forward to Rebecca's comic too, as I can't make the SMX :0(
Nice post Rand, glad you found that article it was fascinating. I hope we hear more from Mr Singhal in the future.
It was good to get confirmation of how to get 'fresh' sites and posts into the SERPs.
I found the whole tweaking of the algorithms for individual keywords, and the individual SERPs for people logged in, fascinating too. I'm yet to see any significant changes in my personal SERPs, even though by now Google must know everything about me.
Very interesting read, Rand. Nice to see Google (Amit) taking ownership of problems such as broken queries like the "French Revolution" search. It would be good to see other occurrences of this happening - maybe a top 10 list of broken Google queries.
:-)
Wow, thanks for that article. You're right... it's amazingly well-written and informative. Having been involved with search for a few years now, one thing I've learned is that it's one of the fastest-evolving marketing mediums. However, I never knew that Google's "search-quality team makes about a half-dozen major and minor changes a week". Thanks again! Very well worth the read, and I'll probably make a post about the article on my internet marketing blog, citing seomoz too.
- Casey
Hey Rand, I also was quite impressed with the article and wrote about it on my blog this morning, as well... There's a nice little bonus towards the end. :)
- Casey