I was inspired by Xan Porter's recent post on Evaluating Web Page Quality. Search engines have their own evaluation signals, but as they work to improve quality, it's only natural to pursue more human-centered metrics to refine results as far as possible. The research she points to - Less is More: Probabilistic Models for Retrieving Fewer Relevant Documents - offers a terrific framework for how humans might evaluate documents:

Intrinsic Features:

  • How accurate is the information presented?
  • How biased or unbiased is the data?
  • How believable is the content?
  • How credible is the source?

Contextual Features:

  • Is the information relevant to the user's query?
  • Does the information add value to the subject?
  • Is the work recent enough to be of value?
  • Is the source thorough in its presentation?
  • Is an appropriate amount of information provided?

Representational Features:

  • Can the material be interpreted in different ways?
  • How easy or difficult is the material for a user to understand?
  • Does the document state the information concisely?
  • Is the source consistent?

Accessibility Features:

  • Is the document accessible?
  • Does the content present security risks?

Of these, which can search engines currently measure? I'd guess they have signals for many, but solid metrics only for accessibility, security risks and, possibly, readability... There's a long way to go.
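
To make the readability point concrete, here's a minimal sketch of the sort of metric a search engine could compute automatically over a document: the classic Flesch reading-ease formula. The syllable counter below is a crude vowel-group heuristic (real systems are more sophisticated), and the sample text is invented for illustration.

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count runs of consecutive vowels,
    # treating a trailing silent 'e' as non-syllabic.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text: str) -> float:
    # Flesch reading ease: higher scores mean easier text.
    # 206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words)
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (syllables / n_words)

sample = "The cat sat on the mat. Comprehensibility varies enormously between documents."
print(f"Reading ease: {flesch_reading_ease(sample):.1f}")
```

Scores in the 60-70 range are usually considered plain English; dense academic prose lands much lower. It's a blunt instrument, but it shows how at least one of the representational questions above can be reduced to a cheap, crawlable signal.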