I've recently been thinking more about some ranking signals that I dismissed in the past (perhaps foolishly). Some of these the engines have previously disavowed, while others don't get the attention or discussion they potentially deserve. My list includes:

  • Mentions of a domain / brand name - particularly in sources that the engine has classified as "news." I suspect we'd find a reasonable correlation and probably plenty of examples of domains that begin ranking once they earn these mentions.
  • Nofollow links from trusted sources - by running a bit of analysis across the domains on the web, engines could see, quite simply, who links to very good pages/domains and with what level of consistency. From there, it's an easy step to simply "count" those nofollowed links as followed or treat them similarly to the mentions above. This metric already gets a lot of attention, and our correlation data, at least, suggests that a high number of nofollowed links/linking root domains does correlate with better rankings.
  • LinkedIn + Twitter profile links - since these sites (and likely others like them) are used primarily by real humans, most of whom can't afford to have a spammy site seen by potential employers/networkers, these links are likely golden for search engine use.
  • Traffic patterns via aggregated Google Analytics data - if the search quality team received a list of domains that sent/received traffic and the relative quantity levels, I suspect they could put this to use as a methodology to sort the spam from the real sites (spam tends not to send out traffic, nor receive it from a diverse range of good sites). It would also be an incredibly tough metric to game - how do you draw down lots of referral traffic from many unique high value sites (directly - most ads would get filtered) without actually being interesting and worth visiting?
  • Mobile visits, check-ins and interaction - Though still tough to determine/track compared to some other metrics, I'm thinking that a local business or relevant website only gets clicks and interactivity from mobile browsers/devices if it's highly relevant and useful. This could be another solid way to filter spam and get data for local/maps types of rankings (presuming the engines had access to the data at scale... can you say Android/Windows Mobile?) :-)
  • Links and references in Gmail - Again, it's unlikely Google's actually reading our email, but certainly the search quality team could get a list of the number and diversity of references to sites used in email (much the same way Gmail delivers "personalized" ads based on the content of emails).
  • Content that garners comments/UGC - if real people are actively participating on a site around unique content, I'd wager that content is likely the type engines would want to rank. Things like comment RSS feeds, trackbacks and content uniqueness analysis could all be leveraged to help sort.
  • Rich media present on site and around the web - Spammers don't make a lot of unique graphics, images and photos. Likewise, they don't film original video, post podcasts, build Flash elements, or upload Excel spreadsheets, graphically heavy PDFs or the like. Real websites run by real people and businesses do. Since the engines already have the indexing and segmentation capacity, there's nothing to stop them from examining the data as a quality signal.
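
To make the nofollow idea from the list above a little more concrete, here is a minimal sketch of how that kind of analysis might look. It is purely an illustration, not a description of what any engine actually does: the function names, the thresholds (`min_links`, `min_good_ratio`), the set of "known good" domains and the toy link data are all assumptions made up for the example.

```python
from collections import defaultdict


def trusted_sources(links, good_domains, min_links=3, min_good_ratio=0.8):
    """Return source domains that consistently link to known-good domains.

    `links` is an iterable of (source, target, is_nofollow) tuples.
    The thresholds are hypothetical, chosen only for illustration.
    """
    total = defaultdict(int)
    good = defaultdict(int)
    for source, target, _nofollow in links:
        total[source] += 1
        if target in good_domains:
            good[source] += 1
    return {
        source
        for source, n in total.items()
        if n >= min_links and good[source] / n >= min_good_ratio
    }


def counted_links(links, trusted):
    """Count links per target, crediting nofollowed links only from trusted sources."""
    counts = defaultdict(int)
    for source, target, nofollow in links:
        if not nofollow or source in trusted:
            counts[target] += 1
    return dict(counts)


# Made-up data: (source domain, target domain, is_nofollow)
links = [
    ("bigwiki.example", "goodsite.example", True),
    ("bigwiki.example", "university.example", True),
    ("bigwiki.example", "newsite.example", True),
    ("bigwiki.example", "museum.example", True),
    ("spamblog.example", "pills.example", True),
    ("spamblog.example", "newsite.example", True),
    ("spamblog.example", "casino.example", False),
]
good_domains = {"goodsite.example", "university.example", "museum.example"}

trusted = trusted_sources(links, good_domains, min_links=3, min_good_ratio=0.75)
link_counts = counted_links(links, trusted)
print(trusted)
print(link_counts)
```

In this toy data, the nofollowed link to newsite.example only counts when it comes from the source with a consistent record of linking to good sites; the same nofollowed link from the spammy source is ignored.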

I'm not saying that Google/Bing are definitely using these, but I'd suspect that all of them have practical applications in improving search quality and relevancy. And, by running correlations and analysis of these data points ourselves (where possible), we may be able to learn more about what makes a site "look natural" and rank-worthy to the engines, particularly since so much of my email and our Q+A of late seems to be worried about false positives.
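
For anyone who wants to try running a correlation like this themselves, a rough sketch follows. It computes a Spearman rank correlation between ranking position and a single metric using only the standard library. The numbers are invented for illustration (they are not real correlation data), and the helper names are hypothetical; a statistics package would normally do this for you.

```python
def average_ranks(values):
    """Rank values 1..n (smallest first); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Invented example: ranking positions for one query and, for each result,
# a made-up count of linking root domains that use nofollow.
positions = [1, 2, 3, 4, 5]
nofollow_root_domains = [40, 25, 30, 10, 5]

rho = spearman(positions, nofollow_root_domains)
print(round(rho, 3))
```

Because position 1 is the best ranking, a negative value here means that higher values of the metric go along with better rankings. As always, a correlation across a handful of results says nothing about causation; in practice you'd average over many queries.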

I'm curious - any other factors you think fit this pattern/system?