At work today, Matt noted that he found Digg's algorithm far more interesting than Google's. I was shocked - after all, Digg isn't nearly as complex or widely used as Google, but with its rising popularity in the tech space, I could, at least, empathize with why he might feel that way. I also took it as a challenge to expose all the possible elements that might be in an algorithm at Digg, Reddit, Netscape, Shoutwire or other social-news-voting sites. Let's see how I do:
BTW - I'm going to use a lot of Digg-specific terminology, despite the fact that I'm referring to all of the sites above.
- Number of votes over time
- Uses a floating target based on relative levels of popularity (as mentioned in timing below)
- Any number of votes in a very short period (if not manipulative) is stronger than the same number of votes over a longer period.
- Domain of link
- Has it previously had content submitted? If so, did that content receive votes, get marked as spam/lame, make the front page, etc?
- Has the domain been manually/automatically flagged for being manipulative
- Profile of submitter
- Have they submitted high quality stories in the past?
- Have they submitted spam/lame stories in the past?
- How many friends do they have? This could make it harder or easier to get a story Dugg (harder if they have thousands of friends, but possibly easier if they have at least a few)
- How many submissions have they made? What is their success rate?
- How long has the member been around? New registrants could be a clear sign of spam
- Profiles of voters (as above)
- Timing of submission
- If a low number of stories have recently made the front page in a given sector or overall, the story is more likely to get on top with fewer votes
- If a high number of stories have been submitted recently, the opposite may be true
- Time of day - if 50 people all tag a site at 3:00am, that might be a red flag
- Similarity to other links (duplicate)
- Source of votes
- From the same IP address or IP block
- From the same geographic region (that's not a hotspot for Digg users)
- From the same group as has voted on previous content from a domain or string of domains
- From a group of users who aren't regular participants/voters
- Manual review as it hits the homepage
- Many Digg users may not realize it, but all stories to hit the frontpage get a manual, editorial review that may pull the story. This often happens with content the editors feel is marketing-focused, driven by marketing dollars or has a marketing agenda.
- Reddit does this, too, but it's not instantaneous
- Netscape used to do it, but some have speculated that the level of oversight fluctuates
- As a quick example, Brian Clark (of Copyblogger) had this post hit Digg's homepage last week for a scant minute or so before the editors pulled it.
- Number of comments
- Potentially could be used to detect patterns, though I've seen a lot of Dugg stories that had very few comments, so this might not be a great signal
- Number of views
- An abnormally high ratio of views with few Diggs could mean that people aren't fans of the content
- In my opinion, this is a low signal, and down votes or lame/spam would earn more weight in bringing down a story
- Down votes
- Although Digg doesn't specifically have them, Reddit does and surely uses them as an influential factor
- Digg, Netscape and Shoutwire all use flag systems which could be similarly interpreted
- Source of votes (referral path)
- I suspect that Digg would follow how users normally reach pages (through friends, via direct links, via email/type-in, etc.)
- If an abnormally high number of folks came via an uncommon method to a Digg page (for example, with no referring URL, possibly signifying a mass email or IM link), Digg might want to discount the value of those votes
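To make a few of the guesses above concrete, here is a rough Python sketch of how vote timing, vote source and a floating target might fit together. Everything in it is my own assumption for illustration: the `Vote` fields, the two-hour half-life, the halving discounts and the clamp on the threshold are made-up values, not anything Digg, Reddit or Netscape has confirmed.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    minutes_after_submit: float  # when the vote arrived (hypothetical field)
    ip_block: str                # e.g. first three octets of the voter's IP
    has_referrer: bool           # False might hint at a mass email or IM link


def story_score(votes, half_life_minutes=120.0):
    """Sum vote weights: fast votes count more, clustered or referrer-less votes count less."""
    seen_blocks = {}
    score = 0.0
    for v in votes:
        # A vote loses half its weight for every half-life that passes after submission.
        weight = 0.5 ** (v.minutes_after_submit / half_life_minutes)
        # Each additional vote from the same IP block is worth half the previous one.
        repeats = seen_blocks.get(v.ip_block, 0)
        weight *= 0.5 ** repeats
        seen_blocks[v.ip_block] = repeats + 1
        # Votes that arrive with no referring URL are discounted (assumed factor).
        if not v.has_referrer:
            weight *= 0.5
        score += weight
    return score


def promotion_threshold(base, recent_front_page_count, typical_front_page_count):
    """Floating target: a quiet period lowers the bar, a busy one raises it."""
    if typical_front_page_count <= 0:
        return base
    ratio = recent_front_page_count / typical_front_page_count
    # Clamp so the bar never drops below half or rises above double the base.
    ratio = max(0.5, min(2.0, ratio))
    return base * ratio
```

A story would then be promoted when `story_score(votes)` reaches `promotion_threshold(...)`. The point of the sketch is only that ten votes from ten unrelated voters can outweigh ten votes from one IP block, which matches the intuition in the list.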
In a wonderful irony, the Digg website appears to have crashed tonight (a likely cause could be the new re-design, which Neil details at SELand).
So, what do you think? Are there other elements you'd consider having in your own social media voting site? Any obvious ones I neglected to mention?
Netscape banned a URL I was submitting from but still kept the story on the main page. So my guess is that they look closely enough to flag sites, but even after judging them bad they keep them in the mix.
Odd huh.
While many of these items certainly play an important role in how digg works, what I said yesterday was that digg (like google) was a black box. Sure, we can speculate about how it works but the real gems are in the intricacies of the algorithm.
For instance: Is it better to have five of your friends digg your stories right after they are submitted, or to let the digg users digg them naturally? What if those friends are "in real life" friends but your digg accounts aren't linked as friends? Should I submit stories at 8am instead of 3am? Why do certain stories get X diggs over Y time and make it to the homepage, while others with more diggs than X in less time than Y do NOT make it to the homepage?
All of these questions I could provide speculative answers to, much like how we all speculate about how Google ranks pages. This doesn't change the fact that both systems are still black boxes.
Interviews with Kevin Rose provide some information about how Digg works, but ultimately the details are kept secret.
https://blogs.zdnet.com/web2explorer/?p=109
https://www.marketingshift.com/2006/9/diggs-ke...
This unofficial FAQ provides some good information, but again I'm not sure how much of this is evidence and how much of it is just more speculation. https://www.seopedia.org/tips-tricks/social-me...
That's not really what the story was about. It certainly wasn't Digg specific... it could apply to links, bookmarks, whatever. The post was about fulfilling human needs with great content.
Spammy, I know.
Unfortunately, I made fun of the narrow obsessions that Diggers seem to have. With that crowd that's likely all it took to get boomed.
This story that I submitted today got enough votes to become popular; however, it was flagged before it became popular.
Obviously Digg doesn't like people talking about it. And that's the problem with Digg: it is not 100% user opinion.
One thing that I think Digg takes into account is the number of people that mark a story as My#1 before it hits the homepage. Logically, if a dozen or so people mark a story as their #1, it means that the story is usually pretty good.
Rand, that's Neil's great post on SEL, not Danny's. :)
Holy crap, you're right! I didn't even know Neil was writing for SEL! (runs off to edit his post)
This was his first post and a good one at that.
Another measure may be if someone tries to submit the story after it has already been submitted. This may be worth more than a single digg.
I think that an algo similar to this (but much simpler) could be used to rank content on the homepage or category pages of retail sites or info sites. I do it manually now but it probably could easily be automated taking data from logs, shopping cart, profit margin spreadsheet.
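A simple version of that idea might look like the Python sketch below. The field names (`views`, `orders`, `margin`) and the weights are hypothetical stand-ins for whatever the logs, shopping cart and margin spreadsheet actually contain; each signal is scaled against the best product so no single number dominates.

```python
def rank_products(products, weights=(0.2, 0.5, 0.3)):
    """Order products best-first by a weighted mix of views, orders and margin."""
    if not products:
        return []
    w_views, w_orders, w_margin = weights
    # Scale each signal against the top product; "or 1" avoids dividing by zero.
    max_views = max(p["views"] for p in products) or 1
    max_orders = max(p["orders"] for p in products) or 1
    max_margin = max(p["margin"] for p in products) or 1

    def score(p):
        return (w_views * p["views"] / max_views
                + w_orders * p["orders"] / max_orders
                + w_margin * p["margin"] / max_margin)

    return sorted(products, key=score, reverse=True)
```

The weights are the part that would need tuning by hand: a store that cares mostly about profit would push more weight onto `margin`, for example.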
Another factor may be how many homepage appearances a domain has gotten in the past.
Take this submission:
https://digg.com/tech_news/Digg_Unveils_New_Fe...
9000+ Diggs and over 400 comments and not even One day old....
But look at how many Diggs it took to get to the homepage (this was from Kevin Rose's blog, which is consistently Dugg). This was taken from Diggtrends (which archives current homepages).
So, even making it to the homepage 3 1/2 hours after being submitted, it still took 85 votes to validate it.
About the "bury" button: I'm pretty sure all you need is 10 or so bury votes and your story is gone. That makes things really susceptible to voting cabals.
Vote velocity: The rate at which you are acquiring new votes. For example a breaking news story that gets lots of votes very quickly will get to the homepage with a lower number of votes. This stood out during the "james kim" period.
Netscape, that thing is locked down tighter than a maximum security prison during a visit from the governor. If you're not part of the "old boy network" there, anything that gets lots of votes and should get to the homepage gets a full cavity search with a 36" infrared and ultraviolet borescope. If anything you have ever submitted has been within a 3 mile radius of anyone who ever thought about something that might possibly be perceived as, related to or otherwise similar to spiced luncheon meat, your story will get locked in a New York minute. Penalties will be swiftly administered and overly harsh.
Graywolf wrote: "Vote velocity: The rate at which you are acquiring new votes. For example a breaking news story that gets lots of votes very quickly will get to the homepage with a lower number of votes. This stood out during the "james kim" period." I beg to differ.
The James Kim story got to the homepage fast, because it got to the tipping point of diggs (~60) VERY VERY fast. I don't think they have a part of their algo that says...well, if the story got 30 diggs in 2 min, it's front page material! That wouldn't work for a lot of reasons.
Jeff - I somewhat agree with Michael that vote velocity can matter. In your example, it's more of a warning flag for an editor to review, but it could still make the homepage very quickly with very few Diggs.
Although - according to some folks (https://parislemon.blogspot.com/2006/11/diggs-...), it can be frustratingly slow at times to reach that front page.
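The two positions in this exchange can be written down as two different promotion rules. The Python sketch below is purely speculative: the bar of 60 comes from the rough figure mentioned above, while the 30-minute window, the one-vote-per-minute rate and the 50% discount are values I made up to show the difference.

```python
def promoted_fixed(vote_minutes, tipping_point=60):
    """Fixed tipping point: only the total number of votes matters."""
    return len(vote_minutes) >= tipping_point


def promoted_velocity(vote_minutes, tipping_point=60, window=30,
                      fast_rate=1.0, discount=0.5):
    """Velocity-adjusted rule: a fast start lowers the number of votes needed."""
    # Count votes that arrived within the first `window` minutes.
    early = sum(1 for m in vote_minutes if m <= window)
    rate = early / window  # votes per minute during that window
    bar = tipping_point * discount if rate >= fast_rate else tipping_point
    return len(vote_minutes) >= bar
```

Here `vote_minutes` is a list of minutes-after-submission, one entry per vote. Under the fixed rule, a breaking story reaches the homepage quickly only because it collects its 60 votes quickly. Under the velocity rule, it gets there with fewer votes. From the outside the two are hard to tell apart, which is part of why this stays a black box.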
Yeah "great post" but comparing Google's algorithm to Digg's?
And who is this Matt fellah and what does he know about algorithms?
I personally would not want the Google algorithm to mirror a lame voting system.
Please explain further, thanks.
Aaron,
There is no comparison between Google & Digg's algorithm here. Read it again and you will see that they are only speaking about the Digg type algorithms and in no way saying that Google should adopt something similar.
This is a tough one for you Aaron (and I really don't know you well, so excuse my frenchy impoliteness):
Why are you always playing the devil's advocate on every blog post I see on SEOmoz and SEObook... I mean, it just feels you bash everything ;)
I still love you, don't worry.
Aaron - I'm Matt and I was saying to Rand that the secrets behind the digg algorithm are far more interesting to me right now than the secrets behind the google algo. It has nothing to do with using the digg voting system in Google. RTFA.
How about the topic of the post? Referring to the Copyblogger example above, landing on the first page with a story on how to make the first page of Digg, marketing content or no, will probably be much harder than anything else. It probably falls under the manual review, but could be flagged algorithmically.
One thing I am surprised they don't track (at least as far as I can see from looking at Digg's code) is the number of people who actually click through to the site they are voting on. Seems like an easy way to spot people who are just voting for a friend's story.
This may or may not be a good measure. I have "dugg" many articles that look interesting. I'm marking them to read later, not "voting" for them.
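For what it's worth, tracking this would not take much. The Python sketch below assumes a hypothetical log of `(voter, clicked_through)` pairs; as far as the comments above can tell, Digg does not record this, and the `min_votes` and `max_rate` cutoffs are arbitrary.

```python
from collections import Counter


def click_through_rate(votes):
    """votes: list of (voter, clicked_through) pairs. Returns {voter: rate}."""
    totals = Counter()
    clicks = Counter()
    for voter, clicked in votes:
        totals[voter] += 1
        if clicked:
            clicks[voter] += 1
    return {voter: clicks[voter] / totals[voter] for voter in totals}


def blind_voters(votes, min_votes=5, max_rate=0.2):
    """Voters with enough history who almost never visit what they vote for."""
    totals = Counter(voter for voter, _ in votes)
    rates = click_through_rate(votes)
    return sorted(voter for voter, rate in rates.items()
                  if totals[voter] >= min_votes and rate <= max_rate)
```

Because some people digg a story to read it later, a low rate is better treated as a weak hint than as proof, which is why the sketch only looks at voters with several votes on record.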