The Economist online has a fascinating article about a software project run by the Dupuy Institute called the Tactical Numerical Deterministic Model (or TNDM). The software predicts the outcome of military events, from long-term wars or insurgencies to single battles to the effectiveness of particular weapons, camouflage or other gear. It is relevant to our industry because, apart from its purpose, it acts in many ways like a search engine - seeking out hundreds of thousands of variables in order to make the most accurate predictions and forecasts. The article likens it to a weather forecasting system, also an apt comparison and one that made me reflect on some of the foundations of search technology.
The TNDM relies on data from thousands of previous conflicts, from World War II to Vietnam to Bosnia, to test its inputs for accuracy. Search engines are forced to rely on less historically grounded and fixed data, and so must use the subjective measurement of human relevancy to predict the effectiveness of their own models. This means that instead of being able to match the results of a query to hard, factual data from the past, search engines must try to estimate what "most users" want.
One of the most fascinating parts of the TNDM is the statistical thoroughness with which a conflict is evaluated - the sheer number of inputs is remarkable, and a big part of what enables its accuracy.
The TNDM's predictive power is due in large part to the mountain of data on which it draws, thought to be the largest historical combat database in the world. The Dupuy Institute's researchers comb military archives worldwide, painstakingly assembling statistics which reveal cause-and-effect relationships, such as the influence of rainfall on the rate of rifle breakdowns during the Battle of the Ardennes, or the percentage of Iraqi soldiers killed in a unit before the survivors in that unit surrendered during the Gulf war...
...To model a specific conflict, analysts enter a vast number of combat factors, including data on such disparate variables as foliage, muzzle velocities, dimensions of fordable and unfordable rivers, armour resistance, length and vulnerabilities of supply lines, tank positions, reliability of weapons and density of targets. These initial conditions are then fed into the mathematical model, and the result is a three-page report containing predictions of personnel and equipment losses, prisoner-of-war capture rates, and gains and losses of terrain.
This type of "fixed" information is largely lacking from the realm of web search. Implementing it, or indeed even finding a correlation, may seem hard, but the devil's in the details.
Where modern search engines have failed is on the input side. Users of search engines have not been trained to enter data, and indeed they shouldn't be, but the simplicity of a 1-3 keyword search yields vague, underspecified queries and thus undesirable results. One great way to perceive the effectiveness of the TNDM is to note how precisely it asks for data on both sides of the processing - input and retrieval.
If search engines could find ways to get better data on what their users want, they, too, could deliver more carefully structured, finite results. But the answer may not be in increasing the length of queried terms (as AskJeeves has done) or in personalizing results to users based on their own historical data (as Google, Yahoo! & MSN are trying).
Instead, the answer may be in how you ask searchers for what they want. I note that searches at sites like Amazon or Gettyone use a single extra select box to let users choose what they want when they search. The addition of a box like this (or, at the very least, the option to use one) could be a huge contributing factor in improving the quality of web search results. The only question is whether search engines have already hooked their users on the simplicity of keyword-only searches. If so, making a user population change its ways won't be easy... Perhaps the PhDs have better ideas up their sleeves.
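To make the select-box idea concrete, here is a minimal sketch of a keyword search that accepts an optional category, much like the extra box on a shopping or stock-photo site. Everything in it is an assumption for illustration: the `Doc` type, the three toy documents, and the `search` function are hypothetical and do not describe how any real engine works.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    title: str
    category: str


# Hypothetical toy index: the same keyword means different things per category.
DOCS = [
    Doc("Jaguar XK8 buyer's guide", "cars"),
    Doc("Jaguar habitat and diet", "animals"),
    Doc("Jaguar guitar setup tips", "music"),
]


def search(query: str, category: Optional[str] = None) -> List[str]:
    """Return titles containing every query term, optionally within one category."""
    terms = query.lower().split()
    hits = []
    for doc in DOCS:
        # The select box acts as a hard filter applied before keyword matching.
        if category is not None and doc.category != category:
            continue
        if all(term in doc.title.lower() for term in terms):
            hits.append(doc.title)
    return hits


print(search("jaguar"))             # all three documents match
print(search("jaguar", "animals"))  # ['Jaguar habitat and diet']
```

The one-word query is ambiguous on its own; the single extra input removes the ambiguity without asking the user to type a longer query.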
I think it'd be neat if a search engine could have a select box that dynamically updates with new categories as you type in your query, so that it could match your query better than a static select box.
So if you type dogs into the search box, then the select box would have "dog food", "dog breeds", "dog toys"... and so on and so forth...
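A rough sketch of how such a dynamic select box might be fed, assuming a fixed list of known categories and simple prefix matching. The `CATEGORIES` list and the `suggest_categories` function are made up for illustration, and the plural handling is deliberately crude; a real system would more likely draw on query logs or a taxonomy.

```python
from typing import List

# Hypothetical category list; a real engine would derive this from its own data.
CATEGORIES = ["dog food", "dog breeds", "dog toys", "cat food", "door hardware"]


def suggest_categories(partial_query: str, limit: int = 5) -> List[str]:
    """Return categories that start with what the user has typed so far."""
    q = partial_query.strip().lower()
    if not q:
        return []
    # Crude singularization so that "dogs" still matches "dog ..." categories.
    stem = q[:-1] if q.endswith("s") and len(q) > 1 else q
    return [c for c in CATEGORIES if c.startswith(stem)][:limit]


print(suggest_categories("dogs"))  # ['dog food', 'dog breeds', 'dog toys']
print(suggest_categories("do"))    # also includes 'door hardware'
```

Calling this on every keystroke would let the box narrow as the query grows: "do" still offers "door hardware", while "dogs" leaves only the dog categories.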