This week Rand gets a bit technical with a whirlwind walkthrough of how the search engine link graph works. Fully-qualified domains, pay-level domains, internal juice, external juice, domain trust, etc. It's all explained in this fast and furious installment of Whiteboard Friday...enjoy!
SEOmoz Whiteboard Friday - How the Link Graph Works from Scott Willoughby on Vimeo.
Here's a link to Michael Gray's post that I mentioned in the video.
Am I the only one who has serious issues with Vimeo? I click play, nothing happens, I click play, repeat ad nauseam... (and yes, I wait for it to buffer)
I know that there was a reason you guys chose Vimeo, but can I humbly suggest that you create a YT version of these as well? Quality may not be as good, but it's a lot more stable (IMO). Using something like TubeMogul means it shouldn't even take any more time.
The problem with YouTube is size and time limits...anything over 10 mins has to be broken into smaller chunks, which means a whole 'nother batch of editing, rendering, and encoding time, which is honestly the bulk of the time effort for these videos.
TubeMogul does look pretty cool actually, I may give it a shot...thanks for the suggestion.
I like it in Safari. It loads faster than the old YT version and MUCH faster than Google Video. But I'm on a Mac... so that might not help you, ciaran...
Rand,

So if I understand this correctly... Domain PageRank is theoretically calculated from a web graph where:

- Each node represents a pay-level domain. In other words, all pages from a given domain are "compressed" into a single node.
- Each edge represents a link from domain A to domain B. In other words, all links from all pages on domain A to all pages on domain B are "compressed" into a single edge.

Is that an accurate understanding? I don't understand how the results of that calculation would be useful data.
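For concreteness, here's roughly how I picture that compression -- a minimal sketch with made-up URLs and a naive pay-level-domain grouping (a real engine would obviously do this far more carefully, e.g. handling co.uk-style suffixes):

```python
from urllib.parse import urlparse

# Page-level link graph: source URL -> set of target URLs (toy data).
page_graph = {
    "http://siteA.com/page1": {"http://siteB.com/article", "http://siteA.com/page2"},
    "http://siteA.com/page2": {"http://siteB.com/article"},
    "http://siteB.com/article": {"http://siteA.com/page1"},
}

def pay_level_domain(url):
    """Very rough stand-in for pay-level-domain extraction."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host

# Collapse: every page on a domain becomes one node; all links between two
# domains become a single edge (internal links are simply dropped here).
domain_graph = {}
for source, targets in page_graph.items():
    src_dom = pay_level_domain(source)
    for target in targets:
        dst_dom = pay_level_domain(target)
        if src_dom != dst_dom:
            domain_graph.setdefault(src_dom, set()).add(dst_dom)

print(domain_graph)  # e.g. {'sitea.com': {'siteb.com'}, 'siteb.com': {'sitea.com'}}
```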
Well - if you're a search engine and you want to calculate which domains are more important than others for things like:
Likewise, as an SEO, you'd need this data to figure out how important a potential domain is in comparison to others, whether your "domain" has made/lost progress in relation to competitors, etc. (and a million other things that accurate link graph information would help inform).
Rand, I agree that "importance" could potentially be used to determine the outcome of those situations, but what I don't understand is why a search engine would calculate importance at the page level (e.g. Google's PageRank) and then sacrifice that page-level granularity by combining that data into a domain-level lump sum. In other words... I can't think of a practical use for domain-level PageRank that page-level PageRank wouldn't be better at. Using your examples:

- If there's duplicate content, rank the page with the highest PageRank.
- If new content appears, rank that content just like you would for any other page.
- If new content appears, recrawl that page according to its PageRank and update frequency (just like you would for any other page).
- If content exists on two domains targeting the same keywords and there's not much difference in page-level factors, the page with the highest PageRank should rank first.

If I were Google... I would be very hesitant to program domain-level signals into my algorithm, because they simply can't be relied on with the same certainty that page-level signals can. In other words, I would never want a bad page (from a good domain) to outrank a good page (from a bad domain). Google puts their users first, and a user doesn't want to see bad search results... regardless of what that domain's other pages are doing.
Just because a page has low PageRank doesn't mean it's "bad". It might not yet be popular, but that doesn't mean it's not good content. In the absence of direct editorial endorsement, there's something to be said for new content coming from an authoritative source.
It can be quite easy to manipulate page-level PageRank (imagine creating many pages on the same domain which all link to a single target page), but slightly more difficult to manipulate domain-level PageRank (imagine creating an equal number of pages on domains you have to pay to register). There's some research which theoretically and empirically backs this up.
@Nick... I don't mean "bad" as in low PageRank--I mean "bad" as in low value/quality according to Google's average user. In other words... Google doesn't want to show its users crappy search results, just because those results came from domains that have historically provided quality content. Sure, the correlation between "good pages" and "good domains" exists, but it isn't strong enough to actually program into Google's ranking algorithm. There would be too much noise.

Regarding page-level PageRank manipulation... you're right, it IS quite easy, but there are alternative methods for preventing the abuse you described. Google doesn't index every URL it comes across--each one has to meet certain quality criteria. We can assume that the mass-production of webpages for PageRank manipulation would result in low-quality/duplicate content that Google wouldn't put on its link graph in the first place. Rate of content generation could be another signal of abuse.

Hey BTW, Nick... how's your Friday night going? Crazy party we got goin' on in here, eh? LOL!
Darren - I don't mean to be rude, but I think it would be very presumptuous to think that we could guess, without looking at the data, which potential metrics would and wouldn't be valuable. As Matt Cutts once said, "if we thought we could get a better algorithm using phases of the moon, we would." I honestly can't picture domain-level link popularity metrics not being valuable at some point or another. Especially when we do know (and have heard from search reps) the importance of having a good "domain" (not just good pages), it seems strange to actively ignore potential concepts of domain-level metrics.
I'm with Darren on this one... About this party we're having. I hope you guys had some beers while we had that virtual night at the bar ;)
"...I think it would be very presumptuous to think that we could guess, without looking at the data, which potential metrics would and wouldn't be valuable."

I don't find that statement to be rude, but I do find it difficult to accept. Isn't that basically what SEO is all about? I mean... hell, that's what I thought we were doing right now: discussing what we know... to figure out what we don't know. I don't know if search engines use domain-level link metrics or not--that's why I'm asking you and Nick to provide more details about this theory, so I can decide if I want to change my guess from "no, they don't" to "yes, they do."

That aside... I'd like to mention that my original comment wasn't doubting the existence of all domain-level link metrics--it was only doubting the usefulness of your Domain PageRank metric (as I understood it). Somehow that comment led to me defending the need for educated guesses in SEO. I'll return the focus to my original comment now.

My argument is that major search engines do NOT use a domain-level link graph to determine the importance (or relevance) of a single page. Or in other words, I do NOT believe that a bad page (from a good domain) would outrank a good page (from a bad domain) merely because the bad page has a good host name.

Reasoning that supports my guess: There would be a substantial increase in cost associated with adding a domain-level link graph (or any domain-level metric, for that matter) to a search engine. It would require the search engine to store redundant data and to perform redundant calculations (since no major search engine is going to replace its page-level data). Considering how incredibly anal Google is about cost/performance... there's just no way they would build this into their search engine unless it provided heuristics they couldn't get from page-level data (I'm still waiting for an example, BTW). The only reason I can think of to use a domain-level link graph is if a search engine can't afford a page-level graph. For example, here is an excerpt from section 6.1 of the TrustRank paper:

"To evaluate our algorithms, we performed experiments using the complete set of pages crawled and indexed by the AltaVista search engine as of August 2003. In order to reduce computational demands, we decided to work at the level of web sites instead of individual pages. (Note that all presented methods work equally well for either pages or sites.) We grouped the several billion pages into 31,003,946 sites, using a proprietary algorithm that is part of the AltaVista engine. Although the algorithm relies on several heuristics to fine-tune its decisions, roughly speaking, all individual pages that share a common fully qualified host name become part of the same site. Once we decided on the sites, we added a single link from site a to site b if in the original web graph there were one or more links from pages of site a pointing to pages of site b."

There are 2 notable points in that excerpt:

1.) Despite having access to AltaVista's proprietary search engine data, the researchers had to build their own Domain PageRank-style link graph, which subtly suggests that AltaVista wasn't using that metric. Yes, the paper is from 2003, so this doesn't really have anything to do with a current search engine like Google, but I think it shows that domain-level link metrics are NOT as obviously useful as one might assume.

2.) The researchers include a reason for working at the domain level, which suggests that this low-cost strategy falls outside the norms of conventional web document IR. In other words, the researchers' counterintuitive choice to NOT use page-level granularity was one that required an explanation.
Rand this is very logical. Excellent explanation.
Correct me if I am wrong - it's not the numeric PageRank of a website that matters, but the hidden domain-linkage attributes behind the green bar that Google uses to decide which domain to rank higher.
At the same time, if we were to estimate the weighting Google gives on-page factors (LSI, h1, title, etc.) versus off-page factors (TrustRank, domain-level PR, etc.) when deciding which page to rank higher, what would it look like? I mean, which one would rank higher -
Page of a Domain with Strong On-page Factors - a domain with good internal linking, rich and original content following all the rules of Latent Semantic Indexing, but a weak domain-level PR?
Page of a Domain with Strong Off-page Factors - a domain with weak on-page factors and less topical content, but strong trust and strong domain-level PageRank?
An ideal balance of both - In this case what would be an ideal balance for Search Engines? - 50-50 or 30-70..?
Great video, very refreshing to see all the stuff summed up in 10mins.
@Rand
Still confusing after going through all these comments; I will be waiting for the blog post to get a good hold on it.
 Thanks for the wonderful information.
I'd like to come back to the difference between Domain Level PageRank and Domain Juice... here's how I understood what Rand said in the video:
Domain Level PageRank - bundling all content on the domain into an equivalent single document, with this document assuming the entire domain's link profile.
Domain Juice - the sum of all the individual PageRanks of all pages on the domain.
As far as I can see, both Domain Level PR and Domain Juice are just functions of all the inbound links into the pay level domain.
Take this analogy - you're asked to weigh 50 lumps of plasticine. Now you can either weigh all the individual pieces and sum them, or you can bundle it all up together and weigh the combined mass. Am I missing something, or is this the only difference between the above measures?
EDIT: Forgot to add, excellent video. I'd love to see more advanced stuff like this :)
Luke - maybe I can explain it this way.
Let's say that tons of pages are linking to you from tons of different domains, but none of them are particularly high PageRank pages. They are, however, from very important domains. Your Domain Level PageRank, in this instance, might be very high, while your Domain Juice would be considerably lower.
Thus:
If the numbers are very close (Domain PageRank and Domain Juice), you've got a balance between important pages and important domains linking to you. If, however, they're out of balance, chances are you either have a high concentration of links from just a few domains (giving you high Juice relative to Domain PageRank), or you've got a very diverse domain link profile but the links often come from not-very-important pages on those domains (giving you low Domain Juice vs. Domain PageRank).
Does that make sense? Maybe I should try to illustrate this on the blog next week.
Rand, that does make more sense - cheers.
So let me get this right. Domain Juice - By summing all the PageRanks on your pay level domain, you get an idea of the PageRank of the pages linking to you. High numbers mean you're being linked to by important pages.
Domain Level PR - By effectively assigning the PageRank of all pages on the site to every page, a kind of 'implicit' PageRank, or domainwide importance indicator is generated. Lots of links from pages with high numbers mean you're being linked to by important domains.
I'm sure there's an excellent illustration to be drawn on the subject, but in the mean time...
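In the meantime, here's a toy numerical sketch of the two measures as I understand them -- the domains, links, and damping factor below are all made up, and this is only a concept illustration, not how any engine (or mozRank) actually computes things:

```python
def pagerank(graph, damping=0.85, iters=50):
    """Simple power-iteration PageRank over a dict of node -> list of out-links."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        rank = new
    return rank

# Hypothetical page-level link graph spanning three pay-level domains.
pages = {
    "a.com/1": ["b.com/1", "a.com/2"],
    "a.com/2": ["b.com/1"],
    "b.com/1": ["c.com/1"],
    "c.com/1": ["a.com/1", "b.com/1"],
}

def domain_of(page):
    return page.split("/")[0]

# "Domain Juice": sum the page-level PageRank of every page on the domain.
page_pr = pagerank(pages)
juice = {}
for page, pr in page_pr.items():
    juice[domain_of(page)] = juice.get(domain_of(page), 0.0) + pr

# "Domain-Level PageRank": collapse pages into their pay-level domains,
# then run PageRank on that much smaller domain-to-domain graph.
domain_edges = {}
for src, targets in pages.items():
    for t in targets:
        if domain_of(src) != domain_of(t):
            domain_edges.setdefault(domain_of(src), set()).add(domain_of(t))
domain_pr = pagerank({d: sorted(ts) for d, ts in domain_edges.items()})

print("Domain Juice:         ", juice)
print("Domain-Level PageRank:", domain_pr)
```

The two numbers track related but different things, which is exactly the kind of divergence Rand describes above when a site's links come from a few strong domains versus many weak pages on diverse domains.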
Hey rand - The more I think about this (domain PR/Juice), the more attractive the notion of one of your illustrations becomes! Great vid though - Great content and very well delivered.
Hmm. I'm surprised this was considered 'confusing' by people. It's just like the Electoral College.
Although Yahoo! and Google have slightly different algos, it's simple: quality links rule the game. :)
Right - managed to get it to work in IE. Great stuff Rand, and already sent to all our designers & devs as a whistle-stop tour of links in SEO.
Only really got one niggle - green ink on the whiteboard really doesn't show up that well!
Great videos, really enjoyable and well explained, Thank you! LT
Hi Scott, I've got a question: the URL metrics of a specific domain tell me that mozRank is e.g. 6.47 and that 18.27% of mR is passed on. Now when doing the math, 18.27% x 6.47 gives me 1.18 rank points being passed on. But when I sort, e.g. on the Dashboard, according to "Top 5 Links to the URL www.xyz.com (ordered by mozRank passed)", I get sites ranking higher despite the sum of (mR x mR passed) being lower. What is the explanation for that? Thank you for helping here.
Regards, JC
The % mozRank passed values are confusing. We're conflating a few things here because mozRank has a logarithmic scale.
You should focus on the % not as a % you can use, but instead as a relative measure of the importance of a link.
If one link says it passes 15% and another says it passes 5%, then the former is 3 times as powerful as the latter. But the actual % doesn't have the significance it should.
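As a quick illustration (the 6.47 and 18.27% figures come from the question above; everything else is just illustrative):

```python
# Because the displayed mozRank is on a logarithmic scale, multiplying it by
# "% passed" doesn't yield a usable number of "rank points".
naive_points = 6.47 * 0.1827   # ~1.18 -- arithmetic on a log-scaled value,
                               # which is why the sorted list won't match it

# Treat the percentages as relative weights instead: comparing two links by
# the ratio of their "% passed" values is the meaningful comparison.
link_a_passed = 0.15
link_b_passed = 0.05
print(f"link A is roughly {link_a_passed / link_b_passed:.0f}x as powerful as link B")
```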
Sorry for the confusion.
Google's Guidelines for Quality Raters give a little insight... Is anyone here a quality rater for Google?
Thanks for the video, Rand!
I am just not sure about the bad neighbourhood stuff.
Does Google have those kinds of filters, and how do they work?
Great video, well explained and easy to understand, even if not easy to implement.
Great video Rand, you sure know how to talk to people in regular terms. You should drop some F bombs in the videos to spice em up :)
f bomb fridays.
Great Whiteboard Friday. Confusing not in the least and will definitely help as I have to have this conversation with our IT early next week. Thanks again Rand for making my life easier
Hey Rand,
I really enjoyed this video. I was wondering where you came up with the idea of Google's starting with a top X (100, 500, 1,000) sites and using the measure of how many links away from those sites another site resides. Is it in a patent? Just a hypothesis? Did a little birdie whisper it in your ear?
It makes a lot of sense to me, but I was just wondering where you heard this or how you figured it out.
Willy - I think this is actually just common practice in the IR field when building/talking about creating a search engine. You have to start crawling somewhere, and all the papers we've ever seen on the topic suggest that crawlers tend not to choose "random" sites, but instead compile a list of relatively trusted, authoritative (according to human intuition) sites as a starting point.
Yahoo!'s research paper on TrustRank also notes the fact that starting from this trusted crawl, you can see that 1 jump (level of links) away from these domains incurs a low level of spam, 2 jumps incurs a higher level, 3 jumps starts to get significant, etc. It's the principle TrustRank is based on :)
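If it helps, here's a bare-bones sketch of that "hops from a trusted seed set" idea -- the seed list and link graph are purely hypothetical, since the real seed sets and thresholds aren't public:

```python
from collections import deque

# Hypothetical domain-level link graph: domain -> domains it links to.
links = {
    "trusted-seed.org": ["news-site.com"],
    "news-site.com": ["blog.example.com", "trusted-seed.org"],
    "blog.example.com": ["spammy-directory.info"],
    "spammy-directory.info": [],
}
seeds = {"trusted-seed.org"}

def hops_from_seeds(graph, seed_set):
    """Breadth-first search: minimum number of link hops from any seed."""
    dist = {s: 0 for s in seed_set}
    queue = deque(seed_set)
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Sites 1 hop out tend to be clean, 2 hops less so, 3+ increasingly spammy.
print(hops_from_seeds(links, seeds))
# {'trusted-seed.org': 0, 'news-site.com': 1, 'blog.example.com': 2, 'spammy-directory.info': 3}
```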
@WillyF... Trust Rank
@Darren @Rand - Thanks. Those are exactly the answers that I was looking for.
Hmm...I'm not understanding the difference between Domain-Level PageRank and Domain Juice. Anyone care to go into more detail for me?
Let me see if I can essssplain on a high level.
Domain-level page rank is based on paid domains that link to one another...
example
hp.com linking to nbcolympics.com = how domain level page rank is built up.
Domain Juice is what you get if you add up the PR of all the pages within a paid domain.
example
hp.com/printer (PR 4) + hp.com/cameras (PR 2) + hp.com/copier (PR 3) = domain juice of 9
clear as mud, right?
Exactly right, Levi - though I'd use the phrase "pay-level domain" rather than "paid domain" which could be confused with "paid links." :)
You are absolutely right.
From now on I shall call them all pay-per-domains. That should clear things up. HA!
Man, the more I think about it the more 'tired head' I get.
I think my confusion may be along the lines of what Darren Slatten commented about further down the page.
Thanks Rand,
I've been debating on building a link web analysis for our internal sites and this pretty much put the nail in the coffin.
In other news, what you mention about distance from highly trusted domains is a great reminder of why you shouldn't disregard a link relationship just on the number of links another site/page has. Look at who those few in-bound links are and what kind of trust you're looking at before dismissing it.
Good vid, I will have to watch it a few times more and maybe one or two of those at half speed to get the most out of it, but very informative.
Off topic, but this got me thinking, most of us know that people don't really read on the net or like the info in short bits, but what are the stats on videos and their viewership. What is the video attention span?
We tend to see a fairly precipitous drop-off when videos run over 10 minutes. Also, we tend to see lower viewership when they're under 6 minutes (which is surprising).
Great video Rand,
I didn't find it all that confusing. I did find it to be a great glimpse at what search engines might be doing to rank our sites.
Fantastic video Rand. As you rightly point out, when we think/talk about understanding search engine algorithms our focus is so often zoomed in on one of those tiny little pages you so carefully drew on the whiteboard (great illustrations btw!) that it's great to zoom out and see it in terms of the big picture.
It was worth staying an extra 10 mins on a Friday evening to watch this! My brain has now reached a higher level of consciousness and I will not just be looking at my own pint of beer tonight but the connections between all the pints in all the pubs in town (I may even make some external links while I'm at it)...
Rand, great video on the bigger picture of pages and domains. Too often I get focused on page by page concepts with out thinking about the whole as it's made up of all the page parts.
Great presentation Rand. It wasn't confusing at all the way you explained it. I like watching these presentations because I find trying to explain these things to clients very challenging. You're definitely helping out heaps. Thanks as always! This will be a good one for the next client who is positive they want to start a microsite on a separate domain (and why they shouldn't do it ... obviously!)
Great job Rand as always. You explained the bigger picture of linking really well. I like how you broke things out from pay-level domains and so forth.