It's that time again - the November Mozscape index is now available! Check out the fresh Mozscape data in Open Site Explorer, the Mozbar, PRO campaigns, and the Mozscape API.

The November Mozscape index is launching a few days later than scheduled. A miscalculation in the amount of crawl data initially included, combined with the fact that our crawlers are extremely efficient, led to our first index attempt this month being about twice the size of our goal of 77 billion URLs. Without this miscalculation, we would have hit our original release date of 11/5, but restarting the index pushed the release back a few days.

Another hiccup we ran into this month was the time it took to process 76 billion URLs. It took a bit longer than our previous October index, which was only 55 billion URLs. This became glaringly apparent in one specific step of our index processing. Periodically throughout processing, we checkpoint the files that have been processed so we can roll back if something catastrophic occurs (a machine failure, file corruption, etc.). With the larger index this month, these checkpointing steps were taking noticeably longer; in some cases, it took days to checkpoint some of the larger steps. Thanks to Martin and Brandon, two of the genius engineers on the Mozscape team, we came up with a solution that drastically reduced the time spent checkpointing. With Martin's update to the processing software, the time spent in some of these steps was cut from days to just minutes! Once again, taking a step back brought the Mozscape team two steps forward.
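For readers unfamiliar with the idea, here is a minimal sketch of checkpoint-and-rollback in Python. It is not the Mozscape processing software, and it is not the fix Martin and Brandon shipped (this post doesn't describe that); the function names, the directory layout, and the `links.txt` file are all hypothetical, chosen only to illustrate the concept.

```python
import shutil
import tempfile
from pathlib import Path


def checkpoint(work_dir: Path, checkpoint_root: Path, step: str) -> Path:
    """Copy the current working files into a per-step checkpoint directory."""
    target = checkpoint_root / step
    if target.exists():
        shutil.rmtree(target)  # replace any stale checkpoint for this step
    shutil.copytree(work_dir, target)
    return target


def rollback(work_dir: Path, checkpoint_root: Path, step: str) -> None:
    """Restore the working files from the checkpoint taken at `step`."""
    source = checkpoint_root / step
    shutil.rmtree(work_dir)
    shutil.copytree(source, work_dir)


def demo() -> str:
    """Checkpoint once, simulate corruption, roll back, and return the restored text."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp) / "work"
        checkpoints = Path(tmp) / "checkpoints"
        work.mkdir()
        data = work / "links.txt"  # hypothetical intermediate file
        data.write_text("a -> b\n")
        checkpoint(work, checkpoints, "step-1")
        data.write_text("corrupted")  # simulate a failure after the checkpoint
        rollback(work, checkpoints, "step-1")
        return (work / "links.txt").read_text()
```

A naive full copy like this gets slower as the data grows, which is the general kind of cost that makes checkpointing a large index expensive.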

The Mozscape team continues to make significant progress on finalizing our private cloud solution in Virginia. We are on track to have indices produced in both the AWS cloud and our own private cloud by the end of the year. Now that a test index has completed successfully, the first Mozscape index is in progress in our own private cloud. It's an exciting achievement for the Mozscape team!

Here are the metrics for this latest index:

  • 76,734,608,461 (76 billion) URLs
  • 776,343,422 (776 million) Subdomains
  • 134,499,372 (134 million) Root Domains
  • 878,838,592,381 (878 billion) Links
  • Followed vs. Nofollowed
    • 2.69% of all links found were nofollowed
    • 56.69% of nofollowed links are internal
    • 43.31% of nofollowed links are external
  • Rel Canonical - 13.65% of all pages now employ a rel=canonical tag
  • The average page has 71 links on it
    • 61.28 internal links on average
    • 10.13 external links on average
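
As a quick sanity check on the link figures above, the short Python sketch below recomputes the per-page average from its internal and external parts and turns the nofollow percentages into approximate link counts. The function names are ours, and the derived counts are estimates from the rounded percentages in the list, not numbers reported by the index.

```python
TOTAL_LINKS = 878_838_592_381   # total links in the index (from the list above)
NOFOLLOW_SHARE = 0.0269         # 2.69% of all links were nofollowed
INTERNAL_NOFOLLOW_SHARE = 0.5669
EXTERNAL_NOFOLLOW_SHARE = 0.4331


def average_links_per_page(internal_avg: float, external_avg: float) -> float:
    """Sum the internal and external averages to get the per-page average."""
    return internal_avg + external_avg


def estimated_nofollow_counts() -> dict:
    """Estimate nofollowed link counts from the published (rounded) percentages."""
    nofollowed = TOTAL_LINKS * NOFOLLOW_SHARE
    return {
        "nofollowed": nofollowed,
        "internal": nofollowed * INTERNAL_NOFOLLOW_SHARE,
        "external": nofollowed * EXTERNAL_NOFOLLOW_SHARE,
    }


if __name__ == "__main__":
    print(f"average links per page: {average_links_per_page(61.28, 10.13):.2f}")
    for name, value in estimated_nofollow_counts().items():
        print(f"{name}: about {value / 1e9:.1f} billion links")
```

The two averages add up to roughly 71, matching the headline figure, and the nofollow share works out to somewhere in the low tens of billions of links.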

And the following correlations with Google's US search results:

  • Page Authority - 0.36
  • Domain Authority - 0.19
  • MozRank - 0.24
  • Linking Root Domains - 0.30
  • Total Links - 0.25
  • External Links - 0.29

This histogram shows the crawl date and freshness of results in this index:

Crawl histogram for November Mozscape index

The freshest data in this index is from October 16th (when processing began), and a good portion of the link data is from late September to mid-October. Some link data dates back to about mid-September, but the majority of this index comes from the first few weeks of October. As we continue to reduce the time it takes to process an index, this freshness will keep improving!

Another exciting announcement is our new App Gallery, which launched a few weeks ago. Check out all of the great tools our users are building on top of our Mozscape data. If you have a free tool that you would love to see on this page, submit a request to have it added to the gallery - we'd love to hear about it!

As always, we'd love your feedback. Hope to hear from you in the comments, where the big data team will be reading and responding as usual.

P.S. Remember that if you're ever curious about when Mozscape is updating, you can check the calendar here. We also maintain a list of previous index updates with metrics here.