The long awaited Linkscape index update is here. We've gotten a lot of feedback, we've heard about a few success stories and we have a few thoughts from the development side to share with you.
First, we've included about 38 billion URLs, from about 230 million sub-domains (e.g., twopieceset.blogspot.com) inside about 48 million second-level domains (e.g., *.blogspot.com). As Live's Nate Buggia recently pointed out, there's a Netcraft survey which suggests that there are ~75 million "active" domains. So we're certainly reaching a scale which gives us a comprehensive view of the web. While 38 billion URLs is not double the previous index's nearly 30 billion, the growth reflects deeper crawling of URLs and domains we had already indexed. Of more interest is that we now have about 450 billion links, more than double our previous index of approximately 170 billion.
We're also making the top 3000 links available per URL and domain in advanced reports. These links are also filtered so that no more than 10 links from any domain are shown. This dramatically increased volume and diversity of links gives you the opportunity to see many more of the top links along many dimensions (mozRank, mozTrust, etc.). And the anchor text analysis is much more representative of your presence on the web as a whole.
To illustrate the variations in our link counts, consider these sites and pages. You can see that, almost across the board, we know about substantially more links for any given site or page, and we have used this broader view of the web to update mozRank. The small general decline in mozRank indicates that we've spread mozRank across more pages. In general, we've found a higher correlation between our latest data and Google's toolbar PageRank when excluding penalized sites.
You should note that because our index has grown substantially, these additional links and changes to mozRank do not reflect new links created on the web, but rather existing links we've newly discovered. It would be unwise to compare link counts from the old index with counts from the new one. Instead, comparisons should be confined to metrics for sites and pages drawn from this latest update. This artificial update effect will diminish as we refine our processes and reach the end of the beta period.
How does this benefit SEOs?
A bigger and fresher index means:
- Greater accuracy in link counts and domains
- Greater representation of what the search engines see and how they might interpret and use the data
- More accuracy in mozRank & mozTrust, leading to better data comparisons & analysis
- More fresh data that helps understand what's happened in the recent past
Up to 3000 links per URL in the report means:
- Know about more links that point to you, so you can request anchor text changes, conduct better self analysis, or fix links that are broken
- Reverse competitive strategies more comprehensively to analyze how they're winning
- Find links that you could possibly acquire from your competitors
- Get better anchor text distribution data
URL normalization means:
- Link counts aren't biased by sites and pages that create duplicates
- Our data more closely matches the major search engines, which also perform this stripping and canonicalization
Limiting to ten links per domain means:
- You can see a wider variety of links from different domains
- 3,000 links will show you at least 300 unique linking domains (often many more)
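The per-domain cap is simple to reason about in code. Here's a minimal sketch of how such a filter might work; the function name, input shape, and the assumption that links arrive pre-sorted by importance are mine, not a description of Linkscape's actual implementation:

```python
from collections import defaultdict

def top_links(links, limit=3000, per_domain_cap=10):
    """Select up to `limit` links, keeping at most `per_domain_cap`
    links from any single source domain. `links` is assumed to be a
    list of (domain, url) pairs pre-sorted by importance (e.g.,
    mozRank, descending)."""
    seen_per_domain = defaultdict(int)
    selected = []
    for domain, url in links:
        # Skip links from domains that already hit the cap.
        if seen_per_domain[domain] >= per_domain_cap:
            continue
        seen_per_domain[domain] += 1
        selected.append((domain, url))
        if len(selected) == limit:
            break
    return selected
```

With a cap of 10 and a limit of 3,000, the selected set must span at least 300 distinct domains, which is where that guarantee comes from.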
Here's a quick list of some of the things people have used Linkscape reports for:
- Analyze their link counts, mozRank, mozTrust against those of pages ranking above them in the SERPs
- Look at anchor text distribution and numbers to see why a site might be ranking where it is for a given term/phrase
- Reverse competitor links to find sources they can get themselves
- Look at the relative value of particular links based on the juice they pass and the quality of their domains/pages
- Compare mozRank to PageRank to see if there is a large discrepancy (often indicating a penalty if mR is much greater than PR)
- Use link counts in conjunction with traffic data from sources like Compete, Hitwise, Quantcast, etc. to see how link numbers, mozRank, mozTrust, etc. map to traffic
Speaking of link counts, there are a lot of ways to interpret links, especially when you're adding them up. Here are a few thoughts about how we count links:
- We do not double count duplicate links from and to the same page. For instance, we don't consider the two links to our homepage in our header and footer as separate links.
- We do a great deal of URL normalization. We strip common URL parameters (e.g., SESSID, jsessionid, redirect, etc.) and remove any resulting duplicate links.
- We do not collapse the source and target of 301s, 302s, meta refreshes, etc. for the purposes of link counts. Of course, we do pass the properties (e.g., mozRank) of the source to the target.
That last point has been a controversial design decision and has led to some confusion. To get a full view of links to a page, you should run reports for several versions (e.g., www and non-www). However, one advantage of this approach is that it lets you analyze your link profile at a very fine granularity. For instance, we can see who's linking to "https://moz.com/web2.0", "https://web2.0awards.org", and "https://web2.0awards.com" separately from each other. This helps us to understand our marketing efforts and quantify the contribution of each of these different URLs which point to the same content. Also, if we wanted to remove the 301 and rebrand one of these domains, we'd have some idea of where we would be starting from. We do list 301s as single links in advanced reports for the destination of the redirect.
Unfortunately, this makes some of our link counts look smaller than you might see from some other tools. Because we're consistent within our tool, you can compare the numbers you see for different pages to get a relative sense of popularity. But you can't, unfortunately, directly compare our numbers to other tools. I suppose this is the sort of thing you come to discover in any beta ;)
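To make the normalization and de-duplication rules above concrete, here's a toy sketch. The parameter blacklist and function names are illustrative assumptions; Linkscape's actual normalization rules are more extensive and not public:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical blacklist of session-style parameters to strip.
STRIP_PARAMS = {"sessid", "jsessionid", "phpsessid", "sid"}

def normalize(url):
    """Strip session-style query parameters, drop fragments, and
    lowercase the host, so duplicate URLs collapse to one form."""
    scheme, netloc, path, query, _frag = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if k.lower() not in STRIP_PARAMS]
    return urlunsplit((scheme, netloc.lower(), path, urlencode(kept), ""))

def count_links(source_target_pairs):
    """Count each (source, target) link once after normalization,
    mirroring the rule that duplicate links from and to the same
    page are not double-counted."""
    return len({(normalize(s), normalize(t)) for s, t in source_target_pairs})
```

Under this scheme, a link to `page?jsessionid=123` and a link to `page` from the same source count as one link, which is exactly why our totals can look smaller than tools that skip this step.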
It turns out that most of the technological challenges with the back-end revolve not around scaling our data collection, but rather around processing and serving data. So we back-end developers have been very busy rewriting our processing pipeline and completely distributing our API architecture, which is why this update took so long to get out the door. You guys probably care about this work because of our substantially improved performance for our PRO toolbar, which we're also publicly announcing today! I'll let Danny tell you more about the toolbar, but both of these back-end changes should support our API product and help us provide you with much more frequent index updates.
We'll probably see quite a few other changes to the product both visually and in terms of the data throughout Linkscape's beta period. Obviously we'd like to continue to improve our coverage of the web while keeping the quality, relevance, and freshness of our data equally impressive. If you have any feedback, feel free to post comments on our feedback thread. We always appreciate it, and I hope some of you can see that some of your feedback has made its way into this update.
Fantastic work there, but how is the update rate on the old index? Are your crawlers mainly working on indexing new links and discovering new sites, or do they also recrawl the old ones to check for updates?
Great question. This update mostly adds new pages, and doesn't do much to improve the freshness of old pages. The current internal plan (this could change) is for the next update to mostly improve the freshness of pages already crawled, and then for the crawl after that (and crawls going forward) to provide a more balanced mix of priorities (new pages, freshness).
Choosing what to crawl, and when to crawl it, is a fairly interesting problem that we are still learning a lot about doing better.
That's good to hear, and I can understand that choosing what to crawl can be a headache. However, I see that a lot of the sites that didn't return any results before are now returning results - which is fantastic! Makes my job a whole lot easier.
This is brilliant - I am late to commenting here, but just wanted to add that when Ben says something is an "interesting problem", I think he means "really really hard".
Yes, this is great. I've just run a report, and I saw an improvement in the link data for my website.
I tried to read the 'page-level metrics' and I couldn't read each category. Could you maybe make that picture bigger?
Great news guys! I'm excited. Awesome!!!
I'm still missing a feature where we can suggest URLs we want crawled.. Any chance that might come ?
I think that for the foreseeable future, no. Linkscape operates similarly to the major search engines and crawls URLs it comes across rather than having sites submitted to it.
Kenneth - yeah, what Rebecca said, although we are looking at ways to allow for more on-demand crawling in future versions.
I'm glad to see the update was completed! I also can't wait to see where this tool goes in the future, as it has already helped me in so many ways. I love pulling up my competition in Linkscape and seeing what efforts they have put towards SEO.
https://www.majesticseo.com/research/competitors-analysis.php ^_^
Great work. If you could also cover some more geo-locations, such as Australia and the UK, that would be great - just a focus on .com.au, .org.au...
This update has made two of my websites, for which I couldn't get any results before, work. It's such a great tool; I love it. My only request would be to make it friendlier to multiple geolocations: UK, France, Spain, Italy... but that might be too much to ask.
Nope - not too much to ask. We expect that by Q2-Q3 of 2009, we should have very good coverage worldwide, in virtually every language and geography.
Apparently the issue above is not an issue at all...
I'm not sure if you guys have information on this, but are MozRank and MozTrust two different analyses of the link profile?
Like, is MozRank similar to Google and MozTrust similar to Yahoo? Or is it all closely related to Google?
I'm asking this because I'm confused about how some sites have LESS Trust than Rank in most cases, but when you get up near 6's and 7's, the sites tend to have more Trust than Rank.
Maybe I should ask site support...either way, I'll give this a shot for a few days. :)
This is a great question.
mozRank is a measure of global link popularity. By this we mean that every page gets a chance to vote for the pages it links to. We believe this is very important for indexation and also has an impact on rankings.
If you check, you can see that we have a high correlation with Google Toolbar PageRank, although we're expressing more digits of precision and don't include penalties, which have different effects on rankings and indexation in different situations.
mozTrust is a measure of global link trust. By this we mean that each page is treated very differently when voting. We have a set of "trusted seeds" which we believe are very reliable and trustworthy. The pages they link to are also considered trusted to a lesser extent, and the pages those pages link to are trusted to an even lesser extent.
We believe comparing mozTrust to mozRank can give you an idea of how hard a page or site is pushing its link building strategy. A large disparity may indicate black-hat techniques, although not always. Completely white-hat sites often do not need to worry about mozTrust.
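To make the "trust decays with distance from seeds" idea concrete, here's a toy TrustRank-style propagation sketch. This is purely illustrative - the function, damping value, and iteration scheme are my assumptions, not the actual mozTrust computation:

```python
def moz_trust_sketch(links, seeds, damping=0.85, iters=50):
    """Toy trust propagation: unlike PageRank's uniform starting
    distribution, trust mass originates only at a seed set and decays
    as it flows along links. `links` maps each page to the list of
    pages it links to."""
    pages = set(links) | {t for ts in links.values() for t in ts}
    seed_mass = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(seed_mass)
    for _ in range(iters):
        # Teleportation returns mass to the seeds, not to all pages.
        nxt = {p: (1 - damping) * seed_mass[p] for p in pages}
        for src, targets in links.items():
            if targets:
                share = damping * trust[src] / len(targets)
                for t in targets:
                    nxt[t] += share
        trust = nxt
    return trust
```

On a chain seed -> a -> b, this yields strictly decreasing trust, which matches the intuition above: each extra hop away from a trusted seed costs you.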
Thanks Nick. So, sounds like I was right. MozRank=Google, MozTrust=Yahoo! You just can't officially say it. :P Haha. Thanks for clearing things up for me.
It would surprise me if there's any major search engine that isn't considering metrics like both of these. Google did put out the PageRank paper, and there is a paper out of Yahoo! Research about the TrustRank algorithm (which may use some similar intuition to mozTrust).
So it may not quite be a Google/Yahoo! split here.
FYI: It still states the data was last collected 2+ months ago.
Yeah - that's a bug we're working on. Should be fixed soon, along with a couple others around inaccurate numbers for DJ and mR passed.
congrats on a job well done! i'm really excited to start comparing new reports against previous advanced link intelligence reports. thanks for all your hard work!
Let me also point out that older versions of the early beta toolbar will no longer work. There are several recent versions which will work. I suggest that you update your toolbar to the most recent version.
Congratulations to the team - this looks really impressive! I particularly like the 3000 links and no more than 10 from one domain thing - this sounds like it will give much better data to work with on a day to day basis.
I look forward to playing around with it!
My congratulations! But unfortunately I've been experiencing some problems with generating an Advanced Report in Linkscape within the last few hours. Are these things connected?
They very well could have been - there was a period of time when some portion of the reports were failing after we rolled out, and this could be what you noticed. Are you still seeing any problems?
Unfortunately yes, when I'm trying to run an Advanced Comparison Report, I'm redirected to page https://www.seomoz.org/linkscape/compare/advanced that is absolutely empty and credit is not charged.
Great work guys. I've been waiting for this so it's great to see it's now here. Going to run my reports again to see how things have improved. The previous reports gave some great results so I'm sure this will now be even better. Keep going. The quality of the index is what will add value to this tool above anything else.
I've been using Linkscape for a while now and love it; I can say the same about the toolbar.
One thing that has me concerned (forgive me if this has already been addressed) is the amount of data we're giving you when using these tools. It's obvious which sites are ours by looking at the data we're pulling with our member ID. Also, when we surf, is this data also being collected (of course)? I guess you have to have a lot of trust in SEOmoz to use Linkscape and the toolbar. I guess my question is: how can we feel safe using the tools?
I'm not on the Moz staff but have been a part of this community pretty avidly recently. I trust Moz a lot. Given that, what should I be worried about? Like, what are your concerns, solitude?
I'm not asking the questions, but I like to know others' concerns since I'm getting into making FF plugins. Thx.
Hey Joshua I hope you don't mistake my question as having an untrustworthy feeling about how the data is handled that seomoz collects, I'm a member here for a reason. I think the group here does great things for the community and I've pointed that out a number of times with other people in the industry.
My concerns were more aligned with talks I had with Jon Henshaw at SEO Raven; he also has a kick-ass product, but at the same time people using it are providing a lot of information about the sites that they do SEO for, which he acknowledged, saying the topic came up at Pubcon (Vegas). Even though both sites house this information, I was more interested in the fact that another SEO or person on staff would have data about which sites I own and how _ ... etc. etc., and with Moz's tools I think it's more of an issue of who looks at the info. For instance, Jon's taken measures to answer a number of questions; to quote a conversation we had, he said:
"The things we've did from the very beginning, other than making sure our data was secure in general, was severely limit access to client data in Raven to a very small group of trusted people, none of which do any type of SEO or other marketing work for Sitening. Another change that we recently made was to remove me completely away from any Sitening work. As of a few weeks ago, I'm 100% Raven and I don't have any contact with any Sitening client work or projects. The other thing we did was assign a dedicated lead developer that is also 100% Raven and is for all intensive purposes, completely separated from Sitening. Lastly, we've been discussing for a while creating a legal separation of Raven from Sitening."
I think he's done a lot to help the users feel comfortable using his product, and I was only asking what measures SEOmoz is taking to do the same?
Hmm. That is a good point well made. Thanks for the clarification. :) Now you got me asking it too. :D
solitude & Joshua - We do keep a lot of information about the reports that are run so you can see and access your past reports. However, we don't sell or re-use that data in any way other than self-analysis (to figure out how many reports certain types of users are running, what kinds of sites people care about, etc.). In terms of sharing the data: we've never done that, and I can't imagine we ever would - that would be a clear violation of trust.
You can check out the privacy policy and terms of use for both SEOmoz PRO and the toolbar, but I can also try to have Sarah and the tech crew put together something in more friendly language for the future. Thanks for the suggestion!
Congratulations, I look forward to testing the new improvements.
Congratulations SEOmoz team on the update.