Howdy gang! As promised, last night we launched our 45th Linkscape index. You'll find new data in Open Site Explorer, the Mozbar and the PRO Web App, as well as in our API. We've also started to address some of the challenges discussed in prior Linkscape updates, which I'll cover below.
Here are the metrics for this month's update:
- 44,210,612,409 (44.2 billion) URLs
- 452,126,131 (452 million) Subdomains
- 104,185,923 (104 million) Root Domains
- 360,491,328,983 (360 billion) Links
- Followed vs. Nofollowed
  - 2.21% of all links found were nofollowed
  - 58.95% of nofollowed links are internal, 41.05% are external
- Rel Canonical - 10.12% of all pages now employ a rel=canonical tag
- The average page has 77.47 links on it (down from 80.08 last index)
  - 65.23 internal links on average
  - 12.24 external links on average
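A quick sanity check on the figures above, sketched in Python: the internal and external per-page averages add up to the overall 77.47, and the nofollow percentages can be turned into rough absolute counts. Those counts are estimates derived from the rounded percentages, not numbers reported by the index.

```python
# Figures copied from the index metrics above.
total_links = 360_491_328_983
nofollow_share = 0.0221              # 2.21% of all links are nofollowed
internal_share_of_nofollow = 0.5895  # 58.95% of nofollowed links are internal
avg_internal = 65.23
avg_external = 12.24

# Per-page averages: internal + external should equal the overall average.
avg_total = round(avg_internal + avg_external, 2)

# Approximate absolute counts implied by the (rounded) percentages.
nofollowed = total_links * nofollow_share
nofollowed_internal = nofollowed * internal_share_of_nofollow
nofollowed_external = nofollowed - nofollowed_internal

print(f"average links per page: {avg_total}")
print(f"nofollowed links: ~{nofollowed / 1e9:.2f} billion")
print(f"  internal: ~{nofollowed_internal / 1e9:.2f} billion")
print(f"  external: ~{nofollowed_external / 1e9:.2f} billion")
```

That works out to roughly 8 billion nofollowed links in the index, about 4.7 billion internal and 3.3 billion external.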
As I noted in the September index update, we've had some serious issues when crawling deeper on large domains and encountering binary files containing code that our crawler recognizes and treats as a link. To help with this, we applied a blacklist to this index that blocks a large number of the files folks had reported to us (we estimate ~40% of the binary files are now removed). However, we know there are still more than a few of these in the link database, so we'll keep cranking away on solutions to remove them all. We hope to have them reduced in the next index (November) and nearly eliminated by the December index. If you're ever curious about the next or previous updates, you can always see data for them on our Linkscape calendar.
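For readers wondering what a URL-based blacklist looks like in practice, here is a minimal Python sketch, assuming links arrive as (source URL, target URL) pairs and that the filter keys on the source file's extension. The extension set is hypothetical: the post doesn't list what Linkscape actually blocked, so it only includes formats mentioned in this thread (.xdelta, Flash, JPEG) plus a few common binaries. `is_blacklisted` and `filter_links` are illustrative names, not part of any SEOmoz API.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical blacklist of binary file extensions (illustrative only).
BINARY_EXTENSIONS = frozenset({
    ".xdelta", ".swf", ".jpg", ".jpeg", ".png", ".gif",
    ".exe", ".zip", ".gz",
})


def is_blacklisted(url: str) -> bool:
    """Return True if the URL path ends in a blacklisted extension.

    Only the path is checked; the query string and fragment are ignored.
    """
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower()
    return ext in BINARY_EXTENSIONS


def filter_links(links):
    """Drop (source_url, target_url) pairs whose source is a blacklisted file."""
    return [(src, dst) for src, dst in links if not is_blacklisted(src)]


print(is_blacklisted("https://ftp.gnu.org/gnu/m4/m4-1.4.4-1.4.5.xdelta"))
```

An extension check is cheap, but it can't catch binary content served from URLs without a telltale extension, which is one reason a filter like this can only be a partial fix.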
I'm excited to announce that we're also just a couple of months away from showing historical Linkscape metrics data in the web app. In the next 60(ish) days, we'll be launching a tab in the Link Analysis section showing topline link metric history for your campaign's site and its competitors. There's also tons more good stuff coming to the App before year's end, but I'll save those announcements for other posts.
But, perhaps the biggest win with this index is the full functionality now available through the domain "drilldown" feature in OSE:
You can now click on any domain in the "linking domains" view to see a list of all the URLs we found from that particular site pointing to the page/domain in question. It's a UX upgrade that, IMO, completes the clean, usable experience inside OSE and provides a view that marketers consistently want to see. Many thanks to the Linkscape + OSE teams for getting that included.
As always, if you've got feedback about our link data or the latest index, please leave a comment. Our engineers take suggestions very seriously. Thanks much!
p.s. I'd incorrectly labeled this as the "November" update, when it's obviously still the middle of October... Doh! Fixed in title, but URL will be a reminder of my not-so-smart move.
Awesome. I didn't even know it was a bug; I just thought it was a trick the competitor knew that I didn't. I was trippin'. :)
I use OSE every day. Thanks!
Great news with the historical link data. Probably the feature I am missing the most in the software.
I look forward to seeing how your historic index will hold up against that of Dixon's at Majestic SEO!
Just to be clear - we won't initially be offering the same functionality that MJ has (where they show all the links by date and a count of those links), but we will have the top-line metrics for root and subdomains over time for comparison purposes. In the future, we hope to do lots of cool things like showing links "gained" and "lost," as well as sorting by who in a given sector has gained or lost links over time.
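To make the "gained" and "lost" idea concrete: assuming each index snapshot can be reduced to a set of (source URL, target URL) pairs, the comparison is a pair of set differences, and a per-host tally of the result gives the kind of gainers/losers sort mentioned above. This is a minimal Python sketch under those assumptions, not how Linkscape implements it; `Link`, `diff_links` and `net_change_by_target_host` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from typing import NamedTuple
from urllib.parse import urlparse


class Link(NamedTuple):
    source: str  # URL of the linking page
    target: str  # URL being linked to


def diff_links(previous: set[Link], current: set[Link]) -> tuple[set[Link], set[Link]]:
    """Return (gained, lost) links between two index snapshots."""
    gained = current - previous  # present now, absent before
    lost = previous - current    # present before, absent now
    return gained, lost


def net_change_by_target_host(previous: set[Link], current: set[Link]) -> list[tuple[str, int]]:
    """Net links gained (+) or lost (-) per target host, biggest gainers first."""
    gained, lost = diff_links(previous, current)
    net: Counter[str] = Counter()
    for link in gained:
        net[urlparse(link.target).hostname] += 1
    for link in lost:
        net[urlparse(link.target).hostname] -= 1
    # most_common() with no argument sorts all entries, negatives included.
    return net.most_common()
```

Storing full link sets per index is expensive at this scale (hundreds of billions of links), which is presumably why starting with top-line metrics over time is the simpler first step.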
Oh, I can already see how lost links report could be useful to my link building team.
Really like the idea of the history feature; it'll be useful for matching when rankings drop to when links have dropped off.
I totally agree with Thomas, and great job Moz team. I'm really looking forward to playing with the domain drilldown feature as well. This is something I've been wanting an easy way to do for a while (though it's not too hard with a little Excel magic). This unexpected little gift is great!
Keep up the good work!
Hi,
I've noticed that a lot of the backlinks I've created over the past few weeks have made it into this new index, which is good.
However, I've noticed that my competitive domain analysis has not been updated with these new links; it's still showing the old number of links and domains.
Another thing I'm curious about, although not exactly relevant to this index, is my Domain Authority.
My Domain Authority is showing as a measly 23, yet I'm ranking 3rd for a competitive keyword where the sites on either side of me are in the 40-55 range.
Would their Domain Authority be a better indication of my own, which I presume is out of date?
Finally I'd like to add that I was very impressed with my trial subscription and I am now a fully fledged member.
Good work guys!
Thrilled to have you as a member!
In the web app, the new index data may not yet have propagated. It should only be 12-24 hours behind, but could be cached longer. I'll look into that. In terms of DA - plenty of lower-DA sites outrank higher ones; the correlation's decent, but certainly not huge (suggesting that Google's algo is quite a complex beastie).
YEAH! Yet again your index has failed to crawl the BBB sites! Woo Hoo! (Sarcasm)
Thanks for the update, Sir Rand! I just wanted to ask you one question: what's the difference between your OSE and Majestic SEO's Site Explorer? I'm asking so I can get a clear picture of OSE versus Majestic's Site Explorer.
Just to share my experience: I watch SEOmoz and Majestic metrics for several projects, and Majestic shows much higher numbers (LRDs, links) than SEOmoz. Most probably SEOmoz's system filters much more than Majestic's. But to work out what's best for my projects, I combine SEOmoz, Majestic, link stats and SERPs. Mixing these "ingredients" can determine an SEO strategy that will prevail and accomplish its goals.
Thanks, Damirv. I got the answer I expected. Thanks a lot, sir!
Very good, the API is already working!
Since this latest crawl, my authority went up to 22 for a couple of days, then went back to the previous 17. Also, no new links have appeared (Google is showing them in Webmaster Tools), and some of them were showing in the update before last (August)! It seems strange things have been happening since your outage.
High five for SEOmoz \m/ it's simply awesome!
Looks like a nice update, can't wait to see what other new features you have lined up for the future!
Thanks to SEOmoz for the drilldown feature!
Roll on historical data. :-) As always, great job SEOmoz.
HomeFinder.com has some very strange results. It looks like SEOmoz is picking up links from files (rather than from web pages).
For example: https://www.opensiteexplorer.org/domains?site=www.homefinder.com
The 2nd domain is GNU.org and the source of the "link" according to LinkScape is this file:
https://ftp.gnu.org/gnu/m4/m4-1.4.4-1.4.5.xdelta
This seems very odd.
Question for those more knowledgeable than me... Are links from FTP locations regularly crawled and indexed by Google / Bing / etc.?
Yep - those are the binary files that we've not yet been able to filter out. You should see fewer of them in this index than in the prior one, but we expect it to be a problem for another 1-2 indices before we can fully clear them out.
As far as how search engines treat them... Tough to know. Since we're now ID'ing the suckers, we could try to run some correlation analysis, but my personal guess would be that they're ignored by Google/Bing.
Good to know this is getting sorted. I keep finding strange files, and what's odd is that when I view the source, I can't find any link references or domain mentions at all. Why would it show up as a backlink?
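On the question of how a file with no visible link or domain mention can register as a backlink, here is one way it can happen, sketched in Python; it is an illustration, not a description of Linkscape's crawler. An extractor that pattern-matches markup in raw bytes without checking whether the content is binary will accept any byte run that happens to look like `href="..."`, and a relative URL is resolved against the fetched file's own address, so the target domain never has to appear in the file. The NUL-byte sniff is a common binary-detection heuristic, assumed here for illustration; `looks_binary` and `extract_links` are hypothetical names.

```python
from __future__ import annotations

import re
from urllib.parse import urljoin

# Naive pattern: anything in the raw bytes that looks like href="..." or href='...'.
HREF_RE = re.compile(rb'href\s*=\s*["\']([^"\'\s<>]+)["\']', re.IGNORECASE)


def looks_binary(body: bytes, sniff_len: int = 1024) -> bool:
    """Heuristic: treat content with a NUL byte near the start as binary."""
    return b"\x00" in body[:sniff_len]


def extract_links(page_url: str, body: bytes, skip_binary: bool = True) -> list[str]:
    """Extract absolute link URLs from a fetched document's raw bytes."""
    if skip_binary and looks_binary(body):
        return []  # refuse to parse binary files at all
    links = []
    for match in HREF_RE.finditer(body):
        raw = match.group(1).decode("latin-1")
        # Relative URLs resolve against the fetched file's own URL, so the
        # target domain never has to appear in the bytes themselves.
        links.append(urljoin(page_url, raw))
    return links


# A made-up binary blob that happens to contain an href-like byte run.
blob = b"\x00\x01\x02junk href='help/' \xff\xfe more"
print(extract_links("https://www.example.com/media/intro.swf", blob, skip_binary=False))
print(extract_links("https://www.example.com/media/intro.swf", blob))
```

With the binary check switched off, the made-up blob yields a "link" to a help page on the file's own host; with it on, the file is skipped entirely.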
Cool looking forward to the bug fixes and certainly the historical data, much coolness!
Wow, this is a sick update. Bravo SEOmoz, Bravo.
Also, why do you guys only update the index every month or so? The only reason I use Yahoo Site Explorer over OSE on occasion is because the index updates on a daily basis.
It's primarily the technical challenge of the scale. Our crawling takes ~2 weeks, processing of the data (including calculating metrics like mozRank, mozTrust, anchor text views, drilldowns, Page and Domain Authority, etc.) is another 2 weeks. Hence, our indices are released monthly.
Our first priority is improving index size and quality. From there, we'll be working hard on speed. Honestly, though, I suspect it will be sometime in Q3 or Q4 of 2012 before we get dramatic freshness improvements. The search engines have a much larger budget, far more engineers, and dramatically more hardware easily accessible. Majestic also produces much faster updates, but it doesn't do metric calculations, as much processing on link views/sorts/filters/etc., or filtering/de-duping/canonicalization. It's a trade-off to be sure, but we think we can eventually get to a point where we can have our cake and eat it too.
Having used both SEOmoz and Majestic, I agree with how you go about things - I'd much rather have excellent-quality data that's updated every few weeks than a massive data set that's much harder to analyse. I've tried to stick with Majestic a couple of times and so far have ended up coming back to SEOmoz within a few weeks.
Same here. Majestic did hint to me that they were working on a metric better than ACRank, but even after they roll that out, their only advantage will be frequency.
Historical data should be great to test =) Thanks for the speedy updates SEOmoz =)
SEOmoz, you made me happier today.
Nice update, Rand. You guys are really on a roll lately. The advanced reporting and URL comparison features are just two of the newer features that our team is really enjoying. Can't wait for the historical data; it's the one feature I still use Majestic for.
Hi, it's great news for me. Can you tell me any new techniques for decreasing a site's Alexa rank? I'd also like to know new techniques for increasing website visitors. My website's Alexa rank and visitor numbers are now going down.
I think the Facebook metrics are kind of wrong... or out of date. At least they are on my site.
Can you provide details?
Also - sorry for all the thumbs down on this thread. Seems we've got someone abusing the system :(
Hi, first, thank you for your response; second, my English is not so good; third, I loved your presentation at EXPON 2011 in Brazil, I've seen all of it! Now the bug! I think the bug is in the Facebook likes.
1 - Go to https://www.opensiteexplorer.org/links?site=www.soprojetos.com.br and you will see around 58 Facebook likes.
2 - Go to https://developers.facebook.com/docs/reference/plugins/like/, enter https://www.soprojetos.com.br, and you will see around 376 likes.
3 - Go to https://www.soprojetos.com.br and click Like on the Facebook like box; it would make me very happy :D
I think the two numbers are the same metric. Am I wrong? Thank you for your time.
Intrigued as to how it works - I've noticed OSE is picking up some new links, but other, older links from quality sources - such as Wikipedia - are yet to be seen by OSE. Any idea why?
I thought that the latest crawl would eliminate the "fake" links that crept into the results because of your new technique for crawling deeper? On any site I look at, if I check the referrals to its help page, for example, I see links that just don't exist, coming from Flash files or JPEGs. This makes it very difficult to use OSE results properly.
Is there any news on when this will get fixed as it has been a problem for months now?
Yeah - it's frustrating to be sure. My apologies. We basically blacklisted everything anyone ID'd by URL extension this round, but we think that's only killed about 40% of them. In the next index ~Nov 15, we hope to be somewhat better, and then in the Dec. index have it fully fixed (but Linkscape's complexity makes it tough to commit).
It's a tough job, no doubt about it! Can you imagine the problems that Google were having when they were Moz's age?