The Big Data Team at Moz has just been through one of its hardest periods to date in producing an index (and we've had some nightmarish doozies in the past). In June, after our funding from Foundry & Ignition, we started running 4 simultaneous indices on Amazon while also starting to set up our new hybrid cloud data center in Virginia (our own processing plus Amazon storage). This tactic mostly worked until August, when 4 indices collapsed on us due to a high rate of Amazon EC2 disk failures. Naturally, this makes us want to move off the cloud more quickly, and it also meant our bill was higher than ever: nearly $700,000.

But, despite those 4 lost indices, today we have one that survived. It's now available through Open Site Explorer, the Mozbar, the PRO web app campaigns, and the API.

Here are the metrics for this latest index:

  • 64,023,562,478 (64 billion) URLs
  • 1,282,691,523 (1.2 billion) Subdomains
  • 148,634,588 (148 million) Root Domains
  • 651,894,828,133 (651 billion) Links
  • Followed vs. Nofollowed - 2.28% of all links found were nofollowed
    • 55.53% of nofollowed links are internal
    • 44.47% are external
  • Rel Canonical - 13.74% of all pages now employ a rel=canonical tag
  • The average page has 71 links on it
    • 60.74 internal links on average
    • 10.87 external links on average
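For anyone who wants to turn the percentages above into rough absolute link counts, here is a minimal sketch of the arithmetic. The figures are copied from the list above; the `share` helper and the variable names are ours for illustration and are not part of any Moz tool or API. Because the published percentages are rounded, treat the results as approximations only.

```python
# Figures copied from the index metrics above.
TOTAL_LINKS = 651_894_828_133
NOFOLLOW_PCT = 2.28            # % of all links found that were nofollowed
NOFOLLOW_INTERNAL_PCT = 55.53  # % of nofollowed links that are internal


def share(total, pct):
    """Return pct percent of total, rounded to the nearest whole link.

    Hypothetical helper for illustration only.
    """
    return round(total * pct / 100)


# Split all links into nofollowed and followed.
nofollowed = share(TOTAL_LINKS, NOFOLLOW_PCT)
followed = TOTAL_LINKS - nofollowed

# Split the nofollowed links into internal and external.
nofollowed_internal = share(nofollowed, NOFOLLOW_INTERNAL_PCT)
nofollowed_external = nofollowed - nofollowed_internal

# The inputs are rounded percentages, so print approximate values in billions.
for label, count in [
    ("followed", followed),
    ("nofollowed", nofollowed),
    ("nofollowed internal", nofollowed_internal),
    ("nofollowed external", nofollowed_external),
]:
    print(f"{label}: ~{count / 1e9:.2f} billion links")
```

The external count is taken as the remainder rather than computed from 44.47%, so the two nofollowed buckets always add back up to the nofollowed total.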

And the following correlations with Google's US search results:

  • Page Authority - 0.34
  • Domain Authority - 0.24
  • MozRank - 0.20
  • Linking Root Domains - 0.24
  • Total Links - 0.20
  • External Links - 0.24

The Big Data Team knows you want fresher and larger indices, and they have been dedicating all their time to delivering them. To that end, we have moved to Amazon’s new crazy-fast cluster compute machines, which have 6 times the computing power of the machines we were using, allowing us to reduce the hardware needed to process an index! Fewer boxes in the processing cluster should mean fewer failures. We are also continuing down the path of standing up our own hardware in our cloud data center in Virginia. Huge thanks to the team for aggressively working on the processing and on our own solution.
 
As always, we appreciate and look forward to your feedback!