Although humans power social media, it is algorithms that provide the frameworks that make user input useful. As the countless social sites online show, finding the correct mix of participation and rules can be extremely difficult. Below are some of the algorithms that, when combined with the right people, have proven successful.
Hacker News:
Formula:
(p - 1) / (t + 2)^1.5
Description:
Votes divided by age factor
p = votes (points) from users.
t = time since submission in hours.
1 is subtracted from p to negate the submitter's own vote.
age factor is (time since submission in hours plus two) to the power of 1.5.
Source: Paul Graham, creator of Hacker News
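As a rough illustration, the formula above can be written as a small Python function. This is only a sketch of the formula as stated here, not Hacker News's actual code; the function name and the `gravity` parameter name are made up for this example.

```python
def hn_rank(points, hours_since_submission, gravity=1.5):
    """Score a story as (p - 1) / (t + 2)^1.5.

    points: total votes, including the submitter's own vote
    hours_since_submission: age of the story in hours
    gravity: the exponent on the age factor (1.5 in the formula above;
             the name "gravity" is an assumption for this sketch)
    """
    votes = points - 1                                  # negate the submitter's vote
    age_factor = (hours_since_submission + 2) ** gravity
    return votes / age_factor


# A story with 9 points that is 2 hours old: 8 / 4^1.5 = 8 / 8
print(hn_rank(9, 2))
# The same 9 points after 14 hours: 8 / 16^1.5 = 8 / 64
print(hn_rank(9, 14))
```

Because the age factor grows faster than linearly, a story needs votes to keep arriving just to hold its position.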
Reddit:
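Formula:
f(ts, y, z) = log10(z) + (y * ts / 45000)
where x is up votes minus down votes, y is the sign of x (1, 0 or -1), and z is |x|, or 1 when x is 0.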
Description:
First of all, the time 7:46:43 am on December 8th 2005 is a constant used to determine the relative age of a submission. (It is likely the time the site launched, but I have not been able to confirm this.) The time the story was submitted minus the constant date is ts. ts works as the force that pulls the stories down the frontpage.
y represents the relationship of up votes to down votes.
45000 is the number of seconds in 12.5 hours. This constant is used in combination with y and ts to "water down" votes as they are made farther and farther from the time the article was submitted.
log10 is also used to make early votes carry more weight than late votes. In this case, the first 10 votes have exactly as much weight as votes 11 through 100.
Source: code.reddit.com, Redflavor.com and Hacker News user Aneesh
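Here is a minimal Python sketch of the description above, assuming the formula is log10(z) + (y * ts / 45000), which is how a commenter on this post quotes it. The same commenter gives a different placement of y, (log10(z) * y) + (ts / 45000), so both variants are shown. The function names are made up for this example, and treating the constant date as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from math import log10

# 7:46:43 am on December 8th 2005 (assumed here to be UTC)
EPOCH = datetime(2005, 12, 8, 7, 46, 43, tzinfo=timezone.utc)


def _parts(ups, downs, submitted):
    ts = (submitted - EPOCH).total_seconds()   # age constant subtracted from submit time
    x = ups - downs                            # net votes
    y = 1 if x > 0 else (-1 if x < 0 else 0)   # sign of the net votes
    z = max(abs(x), 1)                         # keeps log10 defined when x is 0
    return ts, y, z


def score_as_posted(ups, downs, submitted):
    """Formula as this post states it: log10(z) + y * ts / 45000."""
    ts, y, z = _parts(ups, downs, submitted)
    return log10(z) + y * ts / 45000


def score_as_commented(ups, downs, submitted):
    """Variant given in the comments: log10(z) * y + ts / 45000."""
    ts, y, z = _parts(ups, downs, submitted)
    return log10(z) * y + ts / 45000


submitted = EPOCH + timedelta(seconds=90000)   # 25 hours after the constant date
print(score_as_posted(10, 0, submitted), score_as_commented(10, 0, submitted))
print(score_as_posted(0, 10, submitted), score_as_commented(0, 10, submitted))
```

The two variants agree whenever up votes outnumber down votes and only differ for stories with a zero or negative net score.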
StumbleUpon:
Formula:
(Initial stumbler audience / # domain) + ((% stumbler audience / # domain) + organic bonus – nonfriend) – (% stumbler audience + organic bonus) + N
Description:
The initial stumbler "power" (audience of the initial stumbler divided by the number of times that stumbler has stumbled the given domain) is added to the sum of all the subsequent stumblers' powers.
Subsequent stumbler power is ((Percentage of audience stumbler makes up divided by the number of times the given stumbler has stumbled the domain) + a predetermined power boost for using the toolbar - a predetermined power drain if stumblers are connected) + (% of the stumbler audience + a predetermined boost for using the toolbar)
N is a "safety variable" so that the assumed algorithm is flexible. It represents a random number.
Source: Tim Nash at The Venture Skills Blog (2007). Please see his blog post for more in-depth information.
Del.icio.us:
Formula:
Points = (Amount of times story has been bookmarked in the last 3600 seconds)
Description:
Rank on Del.icio.us Popular is determined by comparing points. Points represent the number of times a story has been bookmarked in the last hour. The higher the rate, the higher the points. Every bookmark counts as one point.
3600 is the number of seconds in one hour.
Source: Based on my extended observations of Del.icio.us Popular
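The counting rule above is simple enough to sketch in a few lines of Python. This is only an illustration of the observed behavior described here, not Del.icio.us code; the function names and the input format (a dict mapping each story to its bookmark timestamps in seconds) are assumptions made for this example.

```python
import time

WINDOW_SECONDS = 3600  # one hour


def delicious_points(bookmark_times, now=None):
    """Count bookmarks made within the last 3600 seconds; each counts as one point."""
    if now is None:
        now = time.time()
    return sum(1 for t in bookmark_times if 0 <= now - t <= WINDOW_SECONDS)


def rank_popular(stories, now=None):
    """Order story keys by points, highest first.

    stories: dict mapping a story URL to a list of bookmark timestamps (hypothetical format)
    """
    if now is None:
        now = time.time()
    return sorted(stories, key=lambda url: delicious_points(stories[url], now), reverse=True)


now = 10_000
stories = {
    "a": [9_990, 9_000, 5_000],  # the bookmark at 5_000 is older than an hour
    "b": [9_999],
}
print(rank_popular(stories, now))
```

Widening `WINDOW_SECONDS` is the only change needed to test a longer window.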
Digg is different. The company is a lot less transparent than the above-mentioned companies. It is fearful of being gamed and in response has created a secretive algorithm that appears to be far more complex than its competition's.
At a minimum I expect that Digg's algorithm takes into account the following factors:
- Submission Time
- Submission Category
- Submitter's Digg authority
- Submitter's website-wide activity
- Submitter's friends and fans
- Subsequent digger's authority
- Subsequent digger's friends and fans
- Subsequent digger's geo location
- Subsequent digger's HTTP referer
If you have any other advice or thoughts that you think are worth sharing, feel free to post them in the comments. This post is very much a work in progress. As always, feel free to e-mail me or send me a private message if you have any suggestions on how I can make my posts more useful. All of my contact information is available on my profile: Danny. Thanks!
I originally titled this: Algorithms Exposed! or the Reason Danny is Lonely on Friday Nights :)
Some additional things to keep in mind on a few of these....
With StumbleUpon - three very important factors are categories, tags and reviews. Curious about what tags or categories to focus on? Check out the tag cloud here.
With Digg - one of the most important factors is the domain's authority. Authority in terms of being an established, recognized source for the Digg community (CNN, Ars, Engadget, BBC, etc).
Another thing that always plays an important role is comments. Comments not only play into the algo but they also go a long way in helping to shape how the story is perceived. I can't stress this enough. More reading on this in this excellent case study by Rebecca here at SEOmoz or this column I wrote on Search Engine Land.
I've 'seen' profiles just under 4 months old climb to the top ranks of Digg by simply being truly active in the community (and having really great content).
It's not about gaming the system as much as it is about just being active in the community. Adding value to the community and truly engaging with them seems to work the best. I don't feel it's about the math or the factors. I feel it is about choosing to spend some resources on building a true social media profile that people want to associate with . . . similar statements could be made about SEOmoz. Take Sean for example . . . in well under a year, he's become the most popular person on SEOmoz (non-employee). How did he do it? By spending a lot of time and energy to PARTICIPATE.
Danny is another great example . . . the time and energy he puts forth in creating these 'advanced' posts has made him quite the up-and-comer on the SEOmoz staff within this community.
It's dedication and commitment that really matters . . . though I must admit . . . this is, yet another, kick ass article by Danny!!
Payne
I could not agree more, Mr. Payne. I would love to have 3-4 hours per day to dedicate towards commenting on SEOmoz and being active in a few social networks. I am not sure how all of these people do it. I have to work for clients and our own company 10 hours a day and have a family. Whew!
It's amazing how much all the social mediaites and mozzers put out. I asked Rand at a conference one time, how do you find the time to put out such well thought out posts and run a company and have a relationship? You just have to be a really really good writer. And probably work a lot and be really effective when you do work!
For me . . . it's what I do 'for fun'. I honestly would prefer to hop on SEOmoz and either make a complete fool of myself or pump out some great content than say . . . watch a baseball game, play xbox, etc. It truly is an enjoyment for me.
As for Rand, I believe he is just wicked smart and it doesn't take him long to post a great article. He has a lot of support from his fiancée Geraldine and, like most successful people, he surrounds himself with people that encourage and support him. Now, keep in mind, Rand is constantly 'working' but I would hope it doesn't feel like 'work' to him. I know much of what I do doesn't feel like work to me.
Brent D. Payne
Wow, that was a lot to absorb at 7:30am :) One thing that's been striking me lately about social media sites, and that all of these formulas back up, is the newness factor. Sites seem to reward pages that are "hot" and get a lot of attention quickly, but what about those pages that have real staying power and people come back to time and time again? It seems to me that the latter are some of the best resources, and social media does a very bad job of recognizing those resources.
Of course, you could argue that those resources are the ones that people build links to over time and have the most power in the SERPs, which is probably true. I'm just not sure if the "what have you done for me lately?" approach of social media is really helping us find the best content. As a culture in general, we really overvalue what happened in the past 24 hours.
Most social media users are plugged in every day and are pretty current on what's interesting to them. They can go to their profile and easily find something that they read or commented on if that article is no longer where it was the last time they accessed it. And you're right, there is a high value for recent news, but I think that's kind of the point. Bring people back and make them stay connected in order for social media advertisers to get more exposure.
That said, I've been disconnected by travel and inaccessibility twice recently, only to find out about my killer Nalgene bottles and Tim Russert's passing through non-social media avenues. SM is great for what it is, but I don't think the intention was to create an archival resource.
Sounds like you may have a good idea for your own version of an SM site though Dr.
You're absolutely right, Tim; the whole point of most social media sites is to distribute news and generally new and interesting material, and there's nothing wrong with that. I think my reaction may be a broader one and is a bit related to the election coverage this year. It seems like the media in general is so desperate to find news (especially now that we have 24-hour news channels and blogs) that we treat what happened today as the only thing that's worthy of attention. That trend worries me a little.
I agree to a certain extent. Again, it all comes down to money if you want to get really broad. Nothing is ever going to replace that internal filter that humans have, and we're certainly going to need to use it more and more as this now-or-never trend continues in the world of making money off of news and current events.
I probably give the average information consumer too much credit in this area (even though I generally give the average person very little credit), but it seems to me that savvy with regard to how, when and where, someone chooses to consume information correlates well with the ability to filter/censor and take that information for what it's worth.
To me, this ability also indicates some sort of predisposition to fact checking (things that don't seem right), and using more archival sources for finding older material. To your point, even Google is trending towards valuing new information more highly, but at least they settle out their SERPs in favor of more established and Google-worthy listings over time.
Great Discussion! Will be interesting to witness this in person over time.
E=WTF(2)
Really interesting post, although given my lame maths skills I'm gonna have to come back and have a read tonight to work out how they actually work!
just a thought...
People may also consider taking the time they spend on figuring out these algorithms and spending that time on creating quality content.
My thoughts exactly. And comments definitely play a big role in the success of a social media campaign on any of these sites, specifically with Digg in my experience. Creating quality content and putting earnest effort into promotions is still the best way to succeed.
There are some systems worth spending time learning how to game (Google) and others that don't really require it (Social Media).
The algorithms exposed! This must be the reason you're at home on a Friday night, lol. Very good breakdown would love to see what some of the social media guys have to say though.
More great food for thought, Danny!
When someone submits a story to one of these sites, their decision to do so is completely their own, but from that point on, it's the combination of human behavior and the algorithms that drives the story's progression. Almost like a ping-pong ball going back and forth - votes, comments, ranking changes, more votes (or not), etc.
At this point, these social media sites are still dependent on human beings to take the first step and submit content. The day that Digg figures out how to find articles that are interesting all on its own...that's going to be really something.
And in a way, that's what Google does already when it serves up the SERPs, which makes the connection between search engines and social media sites clearer to me than it's ever been before.
The only teeny-tiny suggestion I can make to you is to add a "so what?" concluding paragraph to your posts. You've obviously spent an enormous amount of time researching these posts, and I would highly value your opinion on what it all means (especially in the realm of SEO). Sure, you might just be speculating in an educated manner, but that's all any of us are doing most of the time anyway.
Danny, your posts are excellent. This one is no exception.
SEOMoz is lucky to have you.
Reddit launched in June 2005, so the date must have some other meaning. Someone's kid born maybe?
At any rate, thanks for compiling this information!
That's the date/time of the last time they reset all article rankings. They had to do a few resets when reddit launched due to some updates and fixes.
Alright,
Next week I'll be expecting the complete algo from goog. And then a supplementary post every time it changes.
Thanks Danny!
Edit: Here's a legitimate project suggestion, or maybe you could just point me in the right direction. I'm looking for a good breakdown of the differences between the old and new .js googalytics scripts. Any thoughts?
At the moment, there is no difference in the Google Analytics tracking scripts, except the newer one can be built on for newer features... or so Google say.
I appreciate entries like this but I have to take exception to
"algorithms, the quintessential example of all that is not human"
On the contrary, math and logical thinking are among the greatest achievements of humankind. Where would we be without algorithms?
That statement is not meant as offensive, I am just noting what I believe to be the general public's perspective.
Hollywood loves using this idea as a plot device. HAL in 2001, Auto in WALL-E. Even Ask.com agrees.
Maybe the article is old, but the article will remain always here for all who want to learn about Social Media... Thanks for sharing ...
Nice post!
However, I am doubtful of the 3600s thing regarding del.icio.us.
Based on my observation, it's somewhere between 1hr - 2hr.
ahem.... 3600 seconds IS 1 hr cough....
Great post Danny, interesting to see how transparent some of them were!
Thx captain O!
But what I meant was it's not exactly 1hr but even up to 2hrs.
Cheers
Freshness plays a large role in Google's algorithm for social media bookmarks and profiles. Alan Rothstein
This formula implies that as more time passes on a post it becomes more valuable. Assuming we have the same up and down votes for a story, so that log10(z) is constant and y is still constant, the more ts we have, the higher the y * ts / 45000 value, which implies a higher ranking.
Does this make sense or there is something that I'm totally missing?
Danny,
Excellent Post! I will have to go through the algo's again at a later time. Keep up the great work you contribute to SEOmoz. Now it is time to jump in the pool and bake in the 110 degree phoenix sun.
Since the SEO people have been more obsessed lately with getting *something* on the front page, and less concerned with what it actually is, perhaps now would be the time to discuss whether you're more likely to have something be rated highly by working to make interesting and creative content or by trying to game the system, because, let's face it, that's the whole point of the post, right? I'm just saying it now so I can point to this when it eventually happens: gaming the system destroys its usefulness. Eventually reddit and stumbleupon are going to be just as bad as digg, full of stupid, self-promotional crap that no one except other spammers wants to read. The real people will be long since gone to another site, until y'all find us there and start spamming again. This doesn't have to happen. You could all get together and decide to blacklist/bury the people who exhibit the worst spammy behavior, so that actual users will still want to use the sites, instead of always trying to find somewhere the spammers haven't reached yet. But I know this won't happen. No one has ever been able to get people to simply agree to stop polluting and wasting any resource, be it water or airwaves. The only solution that has ever kinda worked is regulation, so get ready for it, suckas, 'cause it's coming and you're the ones that brought it down upon your own heads.
Your description of the Reddit algorithm seems funky to me.
First, there are typos in the formula for z. The absolute value symbols don't belong in the conditions and are superfluous in the value if x>=0.
Second, is it really the case that a negative value for x yields z=1, but an x of zero yields z=0? An x of zero should give a z higher than (or at least equal to) a negative x.
Third, your description of the log function's purpose doesn't match the equation. The log function isn't applied to any variable that tracks the time of the vote.
It's not surprising that Paul Graham's formula is the simplest.
Maybe the original post changed in the meantime? But now, from the formula,
z=1 if x=0
and final F has the same sign as the rating so for positive ranking it has positive value and for 0 ranking it has value of 0 and for negative it is negative.
I see y=0 if x=0, not z=0 if x=0.
And I looked a bit on code.reddit.com, but couldn't find these formulas.
This post is wrong on the Reddit algorithm. The actual algorithm is:
f ( t, y, z ) = ( log10(z) * y ) + ( t / 45000 )
While this post states:
f ( t, y, z ) = log10(z) + ( y * t / 45000 )
The "y" is misplaced.
Sources:
https://github.com/reddit/reddit/blob/master/sql/f...
https://bibwild.wordpress.com/2012/05/08/reddit-st...
Math hurts my head :( There's a reason I became a web designer ye know.. :P
Great post though, good for you for figuring all that out.
Killer post Danny... I'm really loving the stuff you're putting out on here. The point about "subsequent digger's geo location" is something I haven't even considered before.
Dude absolutely loving this post - well done Danny!
But Danny, it's Wednesday...
He posted it on Wednesday to have more chances to get dugg. ;-)
And it worked! Congratulations!
you r clever , umm like me :P
The algorithm from StumbleUpon doesn't make any sense. I mean, for example, the "Organic Bonus" doesn't count at all.
Take a look:
(Initial stumbler audience / # domain) + ((% stumbler audience / # domain) + organic bonus – nonfriend) – (% stumbler audience + organic bonus) + N
= (Initial stumbler audience / # domain) + (% stumbler audience / # domain) + organic bonus – nonfriend – % stumbler audience - organic bonus + N
= (Initial stumbler audience)/ # domain + (% stumbler audience)*(1/ # domain - 1) – nonfriend + N
The rest makes some sense, if they want to protect the community from spammers.
I wasn't going to comment since the article is so out of date, but just to clarify the organic bonus was related to each person, not a general bonus and so does not cancel out ;) if you read the full article it would have made more sense. That and some LaTeX as you can clearly see it was simplified for the blog to the point of being unusable as anything but a guide.
We may well release the complete math at some point now that our accuracy has dropped below the 60% mark.