At work today, Matt noted that he found Digg's algorithm far more interesting than Google's. I was shocked: after all, Digg's algorithm isn't nearly as complex as Google's, and the site isn't nearly as widely used. Still, given Digg's rising popularity in the tech space, I could at least empathize with why he might feel that way. I also took it as a challenge to list all the elements that might plausibly go into the algorithm at Digg, Reddit, Netscape, Shoutwire, or other social news voting sites. Let's see how I do:

BTW, I'm going to use a lot of Digg-specific terminology, even though I'm referring to all of the sites above.

  • Number of votes over time
    • Uses a floating target based on relative levels of popularity (see Timing of submission, below)
    • A given number of votes in a very short period (if not manipulative) is a stronger signal than the same number of votes spread over a longer period
  • Domain of link
    • Has it previously had content submitted? If so, did that content receive votes, get marked as spam/lame, make the front page, etc.?
    • Has the domain been manually or automatically flagged as manipulative?
  • Profile of submitter
    • Have they submitted high quality stories in the past?
    • Have they submitted spam/lame stories in the past?
    • How many friends do they have? This could make it harder or easier to get a story Dugg (harder if they have thousands of friends, but possibly easier if they have at least a few).
    • How many submissions have they made? What is their success rate?
    • How long has the member been around? Brand-new accounts could be a clear sign of spam.
  • Profiles of voters (as above)
  • Timing of submission
    • If few stories have recently made the front page, either in a given category or overall, a story is more likely to get there with fewer votes
    • If there has been a high number of recent submissions, the opposite may be true
    • Time of day: if 50 people all Digg a story at 3:00 a.m., that might be a red flag
  • Similarity to other links (is it a duplicate?)
  • Source of votes
    • From the same IP address or IP block
    • From the same geographic region (one that isn't a hotspot for Digg users)
    • From the same group of users that voted on previous content from the same domain or set of domains
    • From a group of users who aren't regular participants/voters
  • Manual review as it hits the homepage
    • Many Digg users may not realize it, but every story that hits the front page gets a manual editorial review, which may result in the story being pulled. This often happens with content the editors feel is marketing-focused, paid for with marketing dollars, or pushing a marketing agenda.
    • Reddit does this, too, but it's not instantaneous
    • Netscape used to do it, but some have speculated that the level of oversight fluctuates
    • As a quick example, Brian Clark (of Copyblogger) had this post hit Digg's homepage last week for a scant minute or so before the editors pulled it.
  • Number of comments
    • Could potentially be used to detect patterns, though I've seen plenty of Dugg stories with very few comments, so this might not be a strong signal
  • Number of views
    • An abnormally high ratio of views to Diggs could mean that people aren't fans of the content
    • In my opinion, this is a weak signal; down votes or lame/spam reports would likely carry more weight in bringing down a story
  • Down votes
    • Digg doesn't have them as such, but Reddit does and surely uses them as an influential factor
    • Digg, Netscape and Shoutwire all use flag systems which could be similarly interpreted
  • Referral source of votes
    • I suspect that Digg tracks how users normally reach story pages (through friends, via direct links, via email/type-in, etc.)
    • If an abnormally high number of visitors reached a Digg page by an uncommon route (for example, with no referring URL, possibly signifying a mass email or IM link), Digg might want to discount those votes
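To make the "number of votes over time" and "timing of submission" ideas concrete, here's a minimal Python sketch. It's my own guess, not Digg's code: every name and number in it (the one-hour `half_life`, the `base` threshold of 40, the `expected` count of 10 recent promotions, the clamping range) is a hypothetical assumption. Each vote decays exponentially with age, and the promotion bar floats up or down depending on how many stories recently made the front page.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    user_id: str
    timestamp: float  # seconds since some fixed epoch


def velocity_score(votes: list[Vote], now: float, half_life: float = 3600.0) -> float:
    """Sum of votes, each discounted by its age (hypothetical one-hour half-life).

    A vote cast `age` seconds ago counts 0.5 ** (age / half_life), so 30 votes
    in the last few minutes outweigh 30 votes spread over the past day.
    """
    return sum(0.5 ** ((now - v.timestamp) / half_life) for v in votes)


def promotion_threshold(recent_promotions: int, base: float = 40.0, expected: int = 10) -> float:
    """Floating target: fewer recent front-page stories means a lower bar.

    The ratio is clamped to [0.5, 2.0] so the bar never collapses or explodes.
    """
    ratio = min(max(recent_promotions / expected, 0.5), 2.0)
    return base * ratio


def should_promote(votes: list[Vote], now: float, recent_promotions: int) -> bool:
    """Promote when the decayed vote score clears the current floating bar."""
    return velocity_score(votes, now) >= promotion_threshold(recent_promotions)
```

With these made-up defaults and five recent promotions (a bar of 20), 30 votes arriving at the same moment score 30 and clear it, while 30 votes spread evenly over the preceding two hours score roughly 16 and don't.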
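The submitter and voter profile questions could be boiled down to a single reputation number. The sketch below is purely hypothetical (the smoothing prior, the spam penalty and the 30-day ramp for new accounts are my assumptions, not anything Digg has published). It smooths the front-page success rate so a brand-new user isn't judged on 0-for-0, scales it down by the share of submissions flagged as spam/lame, and ramps new accounts up over their first month. The same score could be used to weight each voter's Digg.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    submissions: int         # stories submitted
    front_page: int          # submissions that made the front page
    flagged_spam: int        # submissions marked spam/lame
    account_age_days: float  # time since registration


def reputation(
    user: UserProfile,
    prior_rate: float = 0.05,    # hypothetical site-wide front-page rate
    prior_weight: float = 20.0,  # how many "virtual" submissions the prior counts as
    ramp_days: float = 30.0,     # hypothetical probation period for new accounts
) -> float:
    """Higher means a more trustworthy submitter or voter (illustrative only)."""
    # Smoothed success rate: a user with no history starts at prior_rate, not 0/0.
    success = (user.front_page + prior_rate * prior_weight) / (user.submissions + prior_weight)
    # Share of past submissions flagged as spam/lame (0 if nothing submitted yet).
    spam_rate = user.flagged_spam / user.submissions if user.submissions else 0.0
    # New registrants ramp up linearly until they are ramp_days old.
    age_factor = min(user.account_age_days / ramp_days, 1.0)
    return success * (1.0 - spam_rate) * age_factor
```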
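The "source of votes" checks (same IP block, missing referrer) can be sketched as a discount applied to a story's votes. Again, this is a guess under stated assumptions: I'm treating an "IP block" as an IPv4 /24 network, and the 20% block share, the 30% no-referrer baseline and the halving penalty are all made-up numbers. Grouping by network uses Python's standard `ipaddress` module.

```python
import ipaddress
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoteEvent:
    user_id: str
    ip: str                  # voter's IPv4 address
    referrer: Optional[str]  # referring URL of the visit, None if absent


def ip_block(ip: str, prefix: int = 24) -> str:
    """Collapse an address to its network (assumed /24), e.g. 10.1.2.3 -> 10.1.2.0/24."""
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))


def max_block_share(votes: list[VoteEvent]) -> float:
    """Fraction of votes coming from the single busiest IP block."""
    if not votes:
        return 0.0
    counts = Counter(ip_block(v.ip) for v in votes)
    return max(counts.values()) / len(votes)


def no_referrer_share(votes: list[VoteEvent]) -> float:
    """Fraction of votes whose visit had no referring URL (mass email/IM?)."""
    if not votes:
        return 0.0
    return sum(1 for v in votes if not v.referrer) / len(votes)


def vote_weight(votes: list[VoteEvent], block_limit: float = 0.2,
                referrer_baseline: float = 0.3) -> float:
    """Multiplier applied to each vote; each suspicious signal halves it (hypothetical rule)."""
    weight = 1.0
    if max_block_share(votes) > block_limit:
        weight *= 0.5
    if no_referrer_share(votes) > referrer_baseline:
        weight *= 0.5
    return weight
```

Fixed cutoffs are the simplest choice here; comparing against what's normal for similar stories would likely be more robust.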

In a wonderful irony, the Digg website appears to have crashed tonight (a likely cause is the new redesign, which Neil details at SELand).

So, what do you think? Are there other elements you'd consider having in your own social media voting site? Any obvious ones I neglected to mention?