While search engine representatives and white hatters (the whitest of the white hatters) say that having great, link-worthy content and links is enough to get high rankings, there are many sites with these traits that do not get listed for the words that matter (the ones that send serious traffic). If it were so easy and every page that deserved a high ranking had it, there would be no need for us -- SEOs.

The reality is that search engines are far from perfect. They face significant challenges trying to decipher our intentions and find the pages that best match our queries.

Here are some of the reasons why search engines don't return 100% relevant results all the time:

1. Relevance is subjective. This is the biggest problem. You can do a search for 'coffee' in Canada and find the Tim Hortons website as the most relevant result. Makes sense, as that's the most popular coffee chain in Canada, but for somebody in Seattle, Starbucks might be the most relevant result. You can do a search for '49ers' looking for the football team, while a historian may be looking for research material on the California Gold Rush. And you might even do a search today for 'bones' trying to find where to buy your dog a treat, but tomorrow do that same search looking for an episode of the TV series 'Bones' that you missed the night before.

How can a search engine disambiguate such searches? Mind reading would be an excellent approach :-)

So far the best approaches the search engines have come up with are the use of human quality raters and personalized search. The better the search engines profile the searcher, the higher the chances of producing relevant results. Personalization, of course, raises a lot of privacy concerns.

2. Natural language searches. A MySQL database engine can precisely return all the relevant records given the query 'select first, last from employee where last = "Smith";'. There is a formal syntax and no ambiguity. A search engine, on the other hand, receives 'who has smith as last name in chicago' or 'smith last name chicago'. The query is in natural language -- our language. There are many different ways to say the same thing; there is context, there are human idiosyncrasies, and so on. The searcher component of a search engine must disambiguate the query and translate it into a more formal form before looking it up in the index, as in the toy sketch below.
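
To make that concrete, here is a minimal sketch (my own toy example, not any engine's real pipeline) of the kind of normalization a searcher component might do before hitting the index:

```python
# Toy query normalization: lowercase, drop stopwords, keep lookup terms.
# The stopword list is invented for this example.
STOPWORDS = {"who", "has", "as", "in", "the", "a", "is", "of"}

def normalize_query(raw_query: str) -> list[str]:
    """Reduce a natural language query to bare index lookup terms."""
    return [t for t in raw_query.lower().split() if t not in STOPWORDS]

# Both phrasings collapse to the same terms:
print(normalize_query("who has smith as last name in chicago"))
# ['smith', 'last', 'name', 'chicago']
print(normalize_query("smith last name chicago"))
# ['smith', 'last', 'name', 'chicago']
```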

3. Poor queries. Many searchers don't know how to express what they want in the real world, and are even worse at asking a search engine. They call the vacuum cleaner a 'sucker' and then can't find cleaning services online. Worse yet, they misspell words, making the problem even more 'interesting' for search engines.
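
Here is a hedged sketch of 'did you mean' style correction, using Python's standard difflib as a stand-in; real engines build far more sophisticated correction models from their query logs:

```python
import difflib

# A tiny vocabulary invented for the example; in practice this would be
# the engine's full term dictionary.
VOCABULARY = ["vacuum", "cleaner", "cleaning", "services"]

def suggest(word: str) -> str:
    """Return the closest known word, or the word itself if none is close."""
    matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.6)
    return matches[0] if matches else word

print([suggest(w) for w in "vaccum cleanning services".split()])
# ['vacuum', 'cleaning', 'services']
```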

4. Synonymy. This is another challenge. Some words have the same meaning, like 'car' and 'automobile'. When you do a search, you would like to get pages that contain your exact words, plus pages that contain other words meaning the same thing, as long as they are relevant to your search. Say you do a search for 'monkey'. You would want your results to include pages that contain 'monkey', but perhaps also the words 'chimpanzee' or 'ape'. If you were a bit more strict, you would not want pages that say 'chimpanzee' because, although a chimpanzee is a primate, it is not a monkey. These details don't pass through the minds of most searchers, but search engines have a hard time because of them.
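
A very rough sketch of query expansion with synonyms (the word lists here are hand-picked for illustration; real engines learn these relationships at scale):

```python
# Map a term to other terms that may mean the same thing.
SYNONYMS = {
    "car": ["automobile"],
    # 'chimpanzee' is deliberately absent for 'monkey': a chimpanzee
    # is a primate, but strictly speaking not a monkey.
    "monkey": ["primate"],
}

def expand_query(terms: list[str]) -> list[str]:
    """Return the original terms plus any known synonyms."""
    expanded: list[str] = []
    for term in terms:
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return expanded

print(expand_query(["monkey"]))  # ['monkey', 'primate']
```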

5. Polysemy. Some words change meaning depending on the context in which they are used. For example, if you search for 'wood' you might want pages about the material that comes from trees, or you might mean an area covered with trees (the woods). Without the right context, it is hard even for a human to tell. Imagine how hard it is for a search engine!
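
One illustrative (and heavily simplified) way to attack polysemy is to score each sense of an ambiguous term by how many of its signature words show up in the rest of the query; the senses and signature words below are invented for the example:

```python
# Invented sense inventory for the ambiguous term 'wood'.
SENSES = {
    "wood": {
        "material": {"furniture", "plank", "carpentry", "grain"},
        "forest": {"hiking", "trail", "trees", "acres"},
    }
}

def guess_sense(term: str, context: set[str]) -> str:
    """Pick the sense whose signature words overlap the context most."""
    senses = SENSES.get(term, {})
    if not senses:
        return "unknown"
    return max(senses, key=lambda s: len(senses[s] & context))

print(guess_sense("wood", {"furniture", "grain"}))  # material
print(guess_sense("wood", {"hiking", "trail"}))     # forest
```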

6. Imperfect performance. To follow up on my previous post about relevance feedback, let me introduce a couple of related concepts that better explain this problem: precision and recall.

Precision and recall are metrics used by information retrieval researchers to evaluate the performance of search engines. No matter how sophisticated the ranking algorithm is, at the end of the day what really matters is whether the user likes the results or not. Precision measures how good the search engine is at returning only relevant results: the more irrelevant results, the lower the precision. Recall, on the other hand, measures how good the search engine is at returning all the relevant results (which assumes the researcher knows how many relevant results there are): the more relevant results missing from the search, the lower the recall.
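
In code, the two definitions are just set arithmetic (assuming, as noted, that the full set of relevant documents is known):

```python
# precision = |relevant & retrieved| / |retrieved|
# recall    = |relevant & retrieved| / |relevant|

def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 8 of the 10 returned results are relevant, out of 20 relevant pages total:
print(precision_recall(set(range(10)), set(range(2, 22))))
# (0.8, 0.4)
```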

Ideally, a search engine would identify all the relevant documents without returning any irrelevant ones (100% precision and 100% recall). In practice this turns out to be unattainable, as precision and recall tend to pull against each other.

Empirical studies of retrieval performance have shown a tendency for precision to decline as recall increases -- the classic trade-off between precision and recall.

Fortunately, most searchers care more about precision, especially in the top ten results. Few of us search past the first couple of result pages (SERPs). Relevance feedback via quality raters is an excellent way to improve precision: raters select the documents that are most relevant to a search, and that information can be used to refine the original search and yield better results for most users.
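
One classic way to fold that feedback into a query is Rocchio's method; this is my own sketch of the textbook technique, not necessarily what any engine actually runs:

```python
def rocchio(query, relevant_docs, irrelevant_docs,
            alpha=1.0, beta=0.75, gamma=0.15):
    """Nudge a query vector toward documents marked relevant and away
    from the rest. All inputs are term -> weight dicts."""
    terms = set(query)
    for doc in relevant_docs + irrelevant_docs:
        terms |= set(doc)

    def centroid(docs, term):
        return sum(d.get(term, 0.0) for d in docs) / len(docs) if docs else 0.0

    return {t: alpha * query.get(t, 0.0)
               + beta * centroid(relevant_docs, t)
               - gamma * centroid(irrelevant_docs, t)
            for t in terms}

updated = rocchio({"coffee": 1.0},
                  relevant_docs=[{"coffee": 0.8, "espresso": 0.6}],
                  irrelevant_docs=[{"coffee": 0.2, "beans": 0.9}])
print(updated)  # 'espresso' gains weight; 'beans' goes negative
```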

7. Spam. Search engines identify relevant pages by means of 'quality signals' -- metrics that can be deduced from web pages by automated means. The challenge for search engines is that once black hat SEOs identify those signals, they can start to fake them. I think that over time it will get harder and harder to fake quality signals, but it will never be impossible. It is easy for a human to spot spam, but much harder for a computer.

Why is it important to know all this?

This subject is important because it proves an interesting point: although the search engines don't want to admit it, they need us (SEOs). As I mentioned above, relevance is subjective. Do you want to take a passive approach and hope the quality raters deem your website relevant for the searches they think matter? Or do you want to take an active role: identify the best keywords, include them in your content and incoming links, and carefully study the websites that already rank high (the web authorities) to see how you can do the same? Personally, I prefer the active role.