Rand has talked before about the need to think like a search engineer when you are doing what we do. So a week or two ago, I came up with a theory about how I would look at things if I were a search engineer and set out to prove it.

I was wrong.

My theory went something like this:
If I were a search engineer, I would want an algorithm to determine my results. I would, however, validate these results with human input for at least the highest-volume search queries. For the very highest volume queries in the world, I would hope, by now, to have got it "right" - and that at least the first page of results output from my algorithm would be exactly what I wanted it to be.
With a corollary that:
If this isn't the case, I would strongly consider hand-editing the top results for these huge volume phrases while I worked on the algorithm in order that my search engine worked as well as possible in the meantime.
There is a big question, which I might come back to another time, over what the "right" answer should be for very high volume generic queries (for generic queries there can often be far more than 10 pages good enough to be on the first page, and choosing between them requires more knowledge about the searcher than you can possibly have). To be clear here - I'm not talking about which results should be top under the current algorithm, but rather which pages should be top when thinking from scratch like a search engineer.

To test my theory, I decided to look at the search results for poker-related terms. I think poker's been on my mind since my brother (who won an award this week - congratulations, bro) took me to a casino for my birthday so I could lose money...

I picked three phrases of varying search query volume:

  • poker
  • free online poker
  • rakeback (a term related to poker affiliates)
And then analysed the top set of results over a week-long period.

My null hypothesis was that the highest volume phrase ('poker') would be very static through the week. Either the results are actually hand-edited behind the scenes (in which case there is very little chance that they would be edited daily) or the engineers are happy with their algorithm (and, again, trying to think like a search engineer, for a generic search like this, what factors would cause you to change your mind from day to day about the top set of results?).
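The kind of stability I was expecting can be put into numbers with a simple overlap measure between consecutive days' result sets. A minimal sketch (the domains and the `overlap` helper are hypothetical, purely for illustration - this is not what I actually ran):

```python
def overlap(day_a, day_b):
    """Fraction of day_a's results that also appear in day_b's results."""
    return len(set(day_a) & set(day_b)) / len(day_a)

# Hypothetical top-5 snapshots on consecutive days: near-total churn,
# with only one page surviving from one day to the next.
monday  = ["a.com", "b.com", "c.com", "d.com", "e.com"]
tuesday = ["v.com", "w.com", "x.com", "y.com", "a.com"]

print(overlap(monday, tuesday))  # → 0.2
```

Under my hypothesis, a hand-edited (or engineer-approved) top set for a huge phrase should score close to 1.0 day after day; anything near 0 means the set is being rebuilt almost from scratch.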

I have couched a lot of this in scientific language, but I'm not trying to claim my little test was perfect. There are a lot of factors that can spoil it, but taking care to minimise as many of these as I could, below are some charts that show what I found.

These are charts of rankings over time: the x-axis is time (from the 28th October to 6th November this year) and the y-axis is ranking at Google.com (gl=US). Each of the lines (or points in some cases) is a different page (generally different website - there were no examples of different pages off the same site swapping for each other in the results sampled).
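Building charts like these from daily snapshots just means pivoting the data: from "ordered list of pages per day" to "series of (date, rank) points per page", with gaps on days a page fell out of the sampled set. A rough sketch of that transformation (the dates and domains are made up for illustration):

```python
from collections import defaultdict

def rank_series(snapshots):
    """snapshots: list of (date, ordered list of URLs), best-ranked first.
    Returns {url: [(date, rank), ...]} with 1-based ranks; a URL simply
    has no point for days it fell outside the sampled results."""
    series = defaultdict(list)
    for date, urls in snapshots:
        for rank, url in enumerate(urls, start=1):
            series[url].append((date, rank))
    return dict(series)

snapshots = [
    ("28 Oct", ["a.com", "b.com", "c.com"]),
    ("29 Oct", ["b.com", "a.com", "d.com"]),
]
print(rank_series(snapshots)["a.com"])  # [('28 Oct', 1), ('29 Oct', 2)]
```

Each entry in the returned dict is one line (or isolated point) on the charts below.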

I haven't labelled the points and lines because this isn't about whether I see the same results as you or whether they are still ranking (or even about tactics or underhanded techniques). I think the patterns are what is interesting:

poker ranking graph

Anyway, you can see how wrong I was.

The 'poker' search, which I thought would act as though it were hand-edited over a short timescale like this (even if it isn't actually), in fact behaved differently to my prediction in two ways:

  1. The results changed almost entirely on each of the first three days. I still find this hard to believe. As a search engineer, what (in the absence of news, which wasn't in evidence during the course of this week) could cause you to want to change practically the whole set of results for such a high volume search phrase from day to day?
  2. Even with the same set of results in the latter part of the week, there were some pretty significant movements.
free online poker ranking graph

The 'free online poker' search behaved far more like I was expecting for a high volume search phrase. It shows evidence of being algorithmic: the pinpoint result that dropped in on the fourth day subsequently went into free fall and now ranks somewhere in the 60s, which suggests it got there via some kind of manipulation (I haven't looked into what, and for the purposes of this analysis, I don't think it matters - I don't think that it came in via a hand edit). Apart from that, the rankings are fairly stable with gradual changes and few surprises.

rakeback ranking graph

I like the pattern of the 'rakeback' search results, the serenity of the top three with chaos below. Obviously I wouldn't like it much if I were number 4, but that's a different story. Given the range of insights above, I'm not sure that this graph actually tells us all that much, but since I gathered the data, I thought I'd include it for completeness.

So what can we learn from this and feed back into my initial assumptions to correct them and see where we end up? I'd love to hear others' thoughts in the comments, but the things I have come up with are:

  • Methodology: I am the first to admit that this is not a scientific study (to understand what is really going on, I would need to see referral data for all the top results - not something I have access to for high volume searches).
  • Testing: even if I were correct and the results behaved as though hand-edited for the top-volume search queries, given Google's data-driven nature, they would want to test variants to see whether those satisfied their users more.
  • Spam: perhaps the algorithm is 'nearly right' but still susceptible to attacks such as the one we see in the single data point in the 'free online poker' results.
  • News: obviously, when a query deserves freshness (QDF) the results are going to be shaken up regularly. I don't think that is the case in any of these examples, as none of the results coming or going were particularly timely.
What do you think? I'd love to hear your thoughts in the comments.