INTRO From Rand: Although my grandfather, Si Fishkin, has attended many industry events, provided coverage and helped with premium Q+A as well as some of our consulting work, this is his first post on the SEOmoz blog. Si was in town for the Thanksgiving holiday and generously contributed some time to the blog. Please welcome him!
Last week, Rand posted a video and some diagrams that I believe may be misleading or open to misinterpretation. To help make PageRank clearer, I've enlisted his help to construct some diagrams that should explain the issue succinctly.
First, a simple and general explanation of PageRank:
For those who are curious, the original PageRank formula is documented here, and I also like Ian Rogers' PageRank explained, here. Below, I've shown how pages acquire PageRank:
Next, a look at the ability of pages to pass PageRank:
In order to understand PageRank deeply, a few examples follow, moving from simple to slightly more complex:
In the original PageRank formula, link weight is divided equally among the number of links on a page. This may not hold true today, but is still valuable to understanding the original intent. Next, a more complex example that shows PageRank flow back and forth between pages that link to one another:
Finally, an example showing how PageRank can be "leaked." This diagram more accurately illustrates the concept Rand attempted to describe. The leak is not occurring due to a "leaky bucket" scenario, but rather, because PageRank that could be flowing to pages on the site is now lost to Wikipedia:
The PageRank "leak" concept presented a fundamental flaw in the algorithm once it became public. Like Pandora's box, once those creating pages to rank at Google investigated PageRank's founding principles, they would realize that linking out from their own sites would cause more harm than good. If a great number of websites adopted this philosophy, it could negatively impact the "links as votes" concept and actually damage Google's potential.
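To make the leak concrete, here is a minimal Python sketch of the original formula as given in the Brin and Page paper, PR(A) = (1 - d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where C(T) is the number of links on page T. The three-page site, the page names, the damping factor d = 0.85, and the fixed iteration count are all assumptions made for illustration; they are not taken from the diagrams in this post, and Google's current calculation may well differ.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all pages.

    links maps each page to the list of pages it links to.
    d and iterations are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pr = {page: 1.0 for page in pages}  # every page starts at 1.0
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each linking page passes its PageRank divided equally
            # among all of the links it contains.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


def site_total(pr, site_pages):
    """Sum the PageRank held by the pages that belong to the site."""
    return sum(pr[page] for page in site_pages)


# Hypothetical three-page site (names are made up for this example).
SITE_PAGES = ["home", "about", "products"]

closed_site = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home"],
}

# Same site, but the home page also links out to Wikipedia,
# which does not link back.
leaky_site = {
    "home": ["about", "products", "wikipedia"],
    "about": ["home"],
    "products": ["home"],
}

closed_total = site_total(pagerank(closed_site), SITE_PAGES)
leaky_total = site_total(pagerank(leaky_site), SITE_PAGES)
print(f"internal PageRank without the outbound link: {closed_total:.2f}")
print(f"internal PageRank with the outbound link:    {leaky_total:.2f}")
```

Under these assumptions the site's three pages hold about 3.0 in total when every link stays internal, and about 1.52 once the home page also links to Wikipedia. Nothing drains out of a bucket; the home page's vote is split three ways instead of two, and the share sent to Wikipedia never comes back.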
Rand and I both tend to believe that it is likely Google has changed and refined the PageRank algorithm many times. However, familiarity and comfort with the original algorithm is certainly a responsibility for those who practice optimization of Google results. As a caveat, I've included this graphic that Rand created several months ago for the blog to help show that while PageRank may present one-way links as passing value, other concepts certainly exist.
Several resources that proved valuable during my investigations into PageRank include:
- Google Page Rank Whitepaper from Ian Rogers
- Braintiques' Chapters on Google & PageRank
- The original Google paper - Anatomy of a Large-Scale HyperTextual Web Search Engine
- An impressively detailed PageRank Calculator from WebWorkshop
I am currently working with Rand on another blog post about using nofollow to control the flow of PageRank. I hope to have that entry up soon.
UPDATE: I've noted that many in the comments seem to be confused about the relationship between the information in this blog post and the PageRank Google shows in their toolbar. I have NOT studied the toolbar PageRank - this information relates only to the PageRank formula. The rounding and inconsistency of updates to toolbar PageRank make it very difficult (perhaps impossible) to connect with the formula.
This is an excellent explanation.
This is in line with my critique of Rand's post. Only direct or indirect circular references can indirectly affect the PageRank of the linking page.
I believe this issue was addressed when they added the teleportation matrix E to the equation. See "Deeper Inside PageRank". This modification to the basic model was introduced to address the problem presented by dangling nodes, but serves the same purpose for whole websites that don't link out. Basically what it does is it creates artificial links so that the 'random surfer' doesn't get stuck when following links. Part of the time he will follow links and part of the time he will 'teleport' to a new place at random.
Good point Hamlet - the teleporting stuff is needed to achieve true stochastic convergence, I think. Without it, a random 'sink' will gather all the page rank on each run of the algorithm with probability 1.
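A minimal Python sketch of the teleportation idea discussed in the two comments above. This is the probability-normalized "random surfer" variant rather than the original formula from the post: a page with no outlinks is treated as if it linked to every page, and with probability 1 - alpha the surfer jumps to a page chosen uniformly at random. The graph, the value alpha = 0.85, and the function name are assumptions made for illustration.

```python
def random_surfer_rank(links, alpha=0.85, iterations=100):
    """Power iteration with uniform teleportation and a dangling-node fix.

    links maps each page to the list of pages it links to.
    alpha is the probability of following a link (assumed value).
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Rank sitting on pages with no outlinks is spread evenly over
        # all pages, so a sink cannot absorb everything.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        new_rank = {
            page: (1 - alpha) / n + alpha * dangling / n for page in pages
        }
        for src in pages:
            targets = links.get(src, [])
            for target in targets:
                new_rank[target] += alpha * rank[src] / len(targets)
        rank = new_rank
    return rank


# Hypothetical graph: "sink" receives a link but links to nothing.
graph = {"a": ["b"], "b": ["a", "sink"], "sink": []}
ranks = random_surfer_rank(graph)
for page, value in ranks.items():
    print(f"{page}: {value:.3f}")
```

The ranks always sum to 1 here, and the sink ends up with a share of the total rather than all of it, which is the behavior the teleportation term is meant to guarantee.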
That is correct, Will. I found a better explanation here
Fascinating stuff... I would have expected f to bear some relation to the pr distribution that we are iterating towards (but maybe that would result in non-convergence - I haven't thought through the maths properly).
Originally, they used a uniform distribution for E, but they also mentioned another key use for the teleportation matrix: to generate personalized PageRanks (see section 6 of the PageRank paper). I believe that was the approach taken by the Kaltix guys.
Yep. Nice approach. It just gets your head in knots ;)
Good stuff. Thanks Si - great to see you weighing in with some technical analysis.
I'm toying with algorithms similar to page rank on small, closed subsets of the web (for analysis similar to Rand's whiteboard Friday about finding the source of a story) and the iterations are what's killing me at the moment. Every time I play with something like this, I get more and more amazed at how much is achieved on the fly when you carry out a search query (I know PR is pre-calculated, but still)...!
Also check out the research paper just released on maximizing page rank through "outlink optimization" - the first version of which is here
https://arxiv.org/PS_cache/arxiv/pdf/0711/0711.2867v1.pdf
I don't finger-trace the lemmas and proofs most of the time, but it points out some things I didn't know, and that seem related to discussion above.
I would love to know if Si agrees with this method of optimization.
woah. that pdf is scary. maths wise I think Si's post is just about the limit my puny brain can deal with. Would love to see you do a youmoz post on 'outlink optimisation for dummies'...
Thank you Si, great article. This is the way I understand PR as opposed to the leaking bucket theory.
Another reason, beyond the math, that I believe PR cannot be truly "leaked" (leaky bucket) is as follows.
Say PR did actually "leak". I am assuming that the amount of PR that was leaked would be determined by a certain percentage, or a sliding scale percentage based on the actual pages PR value. If that is the case, then no matter how many links are on a page, the same amount of PR would be leaked away from the page. Therefore, unless there are zero links on a page (internal or external), then the page ends up with the same amount of PR because it leaked a certain amount, just divided by X links.
Now say it was a sliding scale and the more links, the more PR is leaked from the bucket? If that was the case, the whole, "how many links on a page determines the value of PR a link gives" theory becomes a lot less important, if not completely debunked. Because if the more links a page has, the more PR it gives out, the amount of PR being sent to each link is not diminished (as much?).
Could the above be totally incorrect, and the same with the math? Sure. But I have yet to see any documentation or study that shows the PR formula has been modified to actually leak. Nor have I seen any documentation or studies showing that the PR calculation for outbound links is on a sliding scale based on the number of outbound links or amount of links on a page. If that were the case, what would be the cap? 2 links would not get 0.5 PR of 1.0, instead it would pass 0.6 PR to each out of 1.2 where 1 link would get 1.0 of 1.0 PR?
As you said, other qualifiers have been added and that's where the big difference lies from the original formula.
+1 insightful.
It's interesting that people seem to accept that the toolbar PR and the internal-google-sooper-sekrit PR are different. Or, more accurately, that they differ in a non-linear fashion.
Perhaps PR was the gunman on the grassy knoll.....
-OT
Welcome, Si... and great post! I love stuff like this. Sometimes, when I try to explain the more complex elements like the PageRank formula, my attempt to clarify things leads me to go in circles and end up confusing everyone, including myself. Your explanation coupled with the diagrams makes it very easy to follow. Looking forward to your future posts!
What an excellent article. I would love to see similar treatment or comparison with Yahoo's algo for example
my head hurts. you really took all of the fun out of coming back to work after a long weekend. ;-)
just kidding. great post but i'll just take your word for it. the math is all greek to me.
Very helpful and insightful post and an interesting examination of the flow of link values. Thanks to Si for taking the time to put this together with Rand during the holiday! Can't wait to read your follow up to the 'nofollow' post, this is a very interesting subject.
Any updates to the algorithm since this was originally posted?
Si!
How's my fellow China traveling buddy! Great post, I totally agree. I also agree with Megan's comment above - it would be useful to have a follow-up article to explain what PR is used for...
It's all surprisingly technical. I like to learn about this kind of stuff though, so thanks!
Harry
Thank you for this interesting post, Mr. Fishkin. Your post about the nofollow stuff is eagerly awaited. Rand's last post about link juice was really surprising for me because I didn't know I could be harming my sites by having many links to other external sites... I hope your post will clear this topic up even more. Thanks again!
Great Post.
These explanations with the diagrams surely pass full PR into our heads.
Looking forward to the other post of "nofollow to control the flow of PageRank".
Sir Si should write more on the blogs sharing his valuable knowledge.
:)
Si,
Very nicely done. I'm a big fan of the KISS (Keep It Simple Stupid) principle, especially when I'm reading at 1:30am! You've taken a very complex subject and presented it in what I believe to be the most simple and articulate manner possible, and with all due respect to Mr. Ian Rogers, have provided a framework that I can actually use in explaining PR flow to customers. You've done a mitzvah for the SEOmoz community! Thank you for taking the time.
Very useful post about pagerank.
Very well written.
Thanks Si,
An excellent explanation. Two things make it my favourite treatment of PR and possibly my favourite post on SEOmoz so far. (Although I have certainly not read them all.)
1) It deals with a subject with a mathematical and systematic basis at its 'root' that, through its practical application and the filter of our (or my) perception, has become an opaque, confusing web 'metric' of questionable meaning or value on a toolbar. You have given me a clear understanding without needing to coin a new label or buzz-word to simplify it back beneath its original and important complexity.
2) It, unfortunately but necessarily, leaves the value of PR open to question. I have found it all too easy to read other explanations of things like this in the SEO world and have an "Aha!" moment only to find that with time, re-consideration or further reading I have the "oh..." moment that turns what I thought I understood on its head or drops it back into the 'confusing' bucket.
Also, the diagrams are excellent. Thank you to whoever helped you produce them in such a clear manner.
Thank you very much. If there were a 'BIG thumbs up' button I'd be pressing it now :)
Very good explanation! I didn't know that pagerank leaking can affect internal ranking... thx!
I have just added a new tool to check all Google DataCenters for PageRank
How much of this article is still relevant today? Being it was written almost 3 years ago, have the algorithms changed since then? Does it still matter?
Thanks!
I still refer to this post today! For looking at the singular, naive metric of PageRank, I think it's pretty top notch still (though certainly there are lots of other features in Google's rankings).
This article, while somewhat interesting, is of little relevance. What have you learned from it? To nofollow? Everybody does that already.
I've learned that this page simply describes the mechanism of PageRank; it's not just a link that you got from a high PR page. As for DoFollow/NoFollow, it's up to the webmaster
p.s. I know, maybe I'm late in commenting (2009) but there was a bad traffic jam and it's great that I recently joined SEOmoz :)
I never knew what passable pagerank even was! Thanks for helping me understand that
Woo, this seems awesome. A really nice article with a good explanation, especially of the formula :) Thanks for the post
Great information and graphics. Even for an individual who reads this with no prior knowledge of SEO, it is presented in such a way as to make sense for all levels of expertise.
Thanks Si Fishkin, you really did disclose useful information here. Before reaching your post, all I knew about PageRank was that it is useful for identifying which sites have high PR, but from the post I learned how PageRank works and that it also counts the number of links on our web pages.
Great Post, Great Explanation :)
I would like to publish the pictures from this post on my Italian blog about SEO and Web Marketing (the idea is to collect them in a slideshow with an Italian translation).
Can I proceed to make a slideshow?
Let me know!
Thank you so much :)
Marco
Hi, I'm a fresh newbie in this kind of SEO stuff. But I started to write a blog. My first post was dated 15 Feb 2010, and on 3 Apr 2010 the Google Toolbar showed that my blog got PR 3/10. I really love this, but the more important thing is I need to know what really happened to my blog. What is PageRank? How does it work? I've searched around the net hoping that I can get a clear explanation on this matter.
In Feb 2008 I built a product selling site and it got PR 3/10 also, not long after it was built. The problem is I really don't know which of the things I have done made the PR change.
This post helped me get the idea, not completely, but it does help me. So thank you very much indeed.
That was the most helpful article for page rank that i have ever read!
Thank you so much for this. But I have a question: is what we know about PageRank the original formula that Google invented? Or does Google let us know only what it wants us to know?
I guess no one can answer that :p
Page C is a leak as well because it is a "dangling" page
Si, what evidence do you have to back this up? I don't mean to ask what evidence you have that other unrelated algorithms are also in use and may be of equal or greater importance. What evidence do you have that the link weight passed from a page is no longer equally divided among the number of links on the page? Other than nofollow links (which likely are ignored from a PageRank standpoint) I haven't seen anything that would make me conclude that they've changed this element of the PageRank calculation.
Are you suggesting that if you only have one external link on the page that you will bleed off less PR than if you have multiple external links? Does this mean it no longer matters what the number of external links on a page is, from a PR standpoint? What about pages with > 1,000 links? Would they pass, in aggregate, more PR than the page has?
While I would agree that there are other algorithms at work, many of which are critical to producing the SERPs, I do not believe that such a fundamental change has been made to the way that PR is calculated.
I don't have any data to share here, but just thought I'd share my reading of Si's comment. I'm not sure he is saying that this *has* changed - just that it *could* have changed. Given how much the algorithm is tweaked all the time, I think it's a fair bet that there could be some additional factors at work.
Si, I cannot wait till you do the nofollow post, I have an easter egg for you.
Thank you for clarifying what pagerank and passable pagerank is.
For a few days we've been having a discussion here about the usefulness of PR. Some here say it is only a gadget, nothing more. Your article clearly defines what PR is all about. I want to thank you for this, and therefore I included a link to this article for our Dutch-speaking friends who are not convinced of the importance of PR. The comment is in English, the blog in Dutch.
https://blog.motionmill.com/google-page-rank-naar-beneden-herzien/#comments
Wow, what a wonderful explanation. My eyes usually gloss over when I see an image with arrows on it, but your diagrams and explanations were great, thanks.
Si that was an awesome article.... guess I know where Rand gets his web savvy from.... though his mother is another major influence....
Hi, thank you for an interesting presentation and the authentic supportive documents.
But I disagree with your opening statement that the original PR formula may be flawed.
The original PR formula thought of by Sergey Brin and Lawrence Page is a democratic process.
I present to you an antithesis: that people are flawed, not the original PR formula.
The OPRF was designed by utopian academics to distribute wealth around in a democratic process, but the sociological mindset today is different from what the original designers intended.
We are becoming a society of hoarders, holding on to power instead of sharing it with others and having it come back to us, as the OPRF was designed to do.
If we examine societies in history, all that have flourished and vanished have collapsed on themselves from greed for power.
Maya, Roman, Babylonian!
If we do not share PR we will destroy what we love so dear. That is Life!
So donate PR, donate Blood, donate Knowledge, donate Love!
And you will get back in multitudes.
When you give a PR link to Wikipedia you are supporting the editors who collect knowledge and disseminate it to society.
Giving PR to Wikipedia is giving PR to society! You give PR to society, society will give back to you 1,000 times more. You will have an educated consumer, not a clicking zombie!
Except wikipedia won't return the favor as it nofollows the very reference links that give it any (perceived) credibility.
I see the situation differently. Arguing that the utopian formula was flawless fails to acknowledge its intended purpose. The original formula was intended to accurately organize and decipher the internet. The internet, like people, is disorganized, conflicted, and impassioned. The Google Guys should have realized that a technological entity as diverse as the internet would need to be stringently reviewed in order to organize it fairly.
The internet is an entirely different entity than the people who supplement it. Trying to make up for the weaknesses of people fails to compensate for the internet's own unique downfalls. Thus the formula, according to its original intention, was flawed.
Igor as always I enjoy the refreshing perspective you bring to the blog. Keep posting so I can keep disagreeing. ; )
A Guru like you talking about a topic like this... that means that Pagerank really matters?
Rand, do you believe that PR is important to get good positions at Google?
Cedano, see my reply to Dustin, above. The distinction between toolbar PR and real PR is crucial here. Real PR (or whatever they call the concept internally, now) is definitely still important.
Rand you changed! New face, new name - what's next?
(Just messing with you Will ;) )
Thanks for the article. I always try and get high PR backlinks to make sure I get a high PR.
Wonderful and very detailed article on pagerank, I just don't know if any of it matters. It is pagerank and pagerank is basically meaningless unless you are trying to get advertisers to buy in to your site.
Well, there you go. You've just given one of many good reasons why Pagerank is not meaningless.
I think you are confusing toolbar PR with the PR element of the underlying algorithm - something that is widely believed to still be a very important part of the ranking algorithm.
I totally agree. Sometimes I even think that toolbar PR is meant to puzzle the webmasters :)
A lot has been said about how Google's PageRank isn't valid anymore, but it's still one of the factors that helps elevate a web page up through Google's search results. I appreciate that PageRank has many criteria to its algorithm, but if I have one criticism (or should I say a suggestion?), it is that I'd like to have seen how anchor text affects things...
I think it would be useful to have a follow-up article to explain what PR is actually used for. I still see a lot of people obsessing over it (meaning toolbar PR, these people aren't aware of the difference) and not focussing on their actual search results. This would probably involve a clarification of the differences between "toolbar PR" and internal PR.
It would be nice if there were different names for those two things - calling them both PR is sure to get people confused...