Over the last few months our engineering team has been working feverishly on a new index. Our crawlers were extremely successful, but we ran into a few bumps in the road along the way. Those proverbial bumps shifted our expected launch date from the beginning of April... to today. It's not all bad, though.

The new index went live today, and it's big. 159 billion URLs big.
 
 
That said, we learned an important lesson in all of this -- maybe bigger isn't better. Our community has voiced its opinion, and we agree: consistent index launches are essential.
 
That's why we've dialed our crawlers back to a manageable size to address the community's number one concern: releasing indexes on a reliable schedule. Our overarching goal is to grow the index over the long term while updating our processing architecture to maintain a reliable release schedule, starting with our next index, which will launch in the coming weeks.
 
Before jumping into the juicy details of this index, I wanted to first point out that the other news (ahem) we announced today will play a large role in alleviating these delays in the future. Plans include both growing our team of engineers to solve these complex scaling problems and investing in the necessary computing resources to consistently produce a larger index. Money can't do everything, but it sure doesn't hurt when it comes to index consistency. :)
 
Here Are The Latest Stats
  • 159,751,604,443 (159 billion) URLs
  • 1,114,893,161 (1.1 billion) Subdomains
  • 153,439,996 (153 million) Root Domains
  • 1,768,519,682,804 (1.7 trillion) Links
  • Followed vs. Nofollowed
    • 2.47% of all links found were nofollowed
    • 64.05% of nofollowed links are internal
    • 35.95% are external
  • Rel Canonical - 11.13% of all pages now employ a rel=canonical tag
  • The average page has 82.90 links on it
    • 71.75 internal links on average
    • 11.15 external links on average
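For anyone who wants to sanity-check these figures, here is a minimal Python sketch. It only re-derives numbers from the list above: the constant and function names are made up for illustration, and the absolute counts are estimates because the published percentages are rounded.

```python
# Figures copied from the stats list above.
TOTAL_LINKS = 1_768_519_682_804
NOFOLLOW_SHARE = 0.0247            # 2.47% of all links were nofollowed
NOFOLLOW_INTERNAL_SHARE = 0.6405   # 64.05% of nofollowed links are internal
NOFOLLOW_EXTERNAL_SHARE = 0.3595   # 35.95% of nofollowed links are external
AVG_INTERNAL_LINKS = 71.75
AVG_EXTERNAL_LINKS = 11.15
AVG_LINKS_PER_PAGE = 82.90


def nofollow_breakdown(total_links=TOTAL_LINKS):
    """Estimate absolute nofollowed-link counts from the rounded percentages."""
    nofollowed = total_links * NOFOLLOW_SHARE
    return {
        "nofollowed": nofollowed,
        "internal": nofollowed * NOFOLLOW_INTERNAL_SHARE,
        "external": nofollowed * NOFOLLOW_EXTERNAL_SHARE,
    }


def averages_are_consistent(tolerance=0.005):
    """Check that internal + external averages add up to the per-page average."""
    total = AVG_INTERNAL_LINKS + AVG_EXTERNAL_LINKS
    return abs(total - AVG_LINKS_PER_PAGE) < tolerance


if __name__ == "__main__":
    for name, count in nofollow_breakdown().items():
        print(f"{name}: about {count / 1e9:.1f} billion links")
    print("averages consistent:", averages_are_consistent())
```

By this estimate, roughly 43.7 billion of the links in the index are nofollowed, and the internal and external averages do add up to the 82.90 links per page.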

A Few Caveats
 
I know, I know -- there's always got to be a catch. The index isn't 100% up to speed right off the bat, but it's close. To get the new index out, we had to make a few sacrifices. Namely, our Anchor Text call will still be served from the old index, so any Anchor Text information you request will be slightly dated. The legacy Anchor Text funnel will only hang around for 6-8 days from now, until we roll out a refresh to the index. Then all will be back to normal.
 
What's This Mozscape Stuff?
 
Finally, you may have noticed Linkscape's shiny new API pages and snazzy new name. The short of it is that Linkscape is now Mozscape, both to better scale our API naming conventions and to refresh the brand. Along with that refresh came a drastic increase in speed (the API rate limit on the paid levels jumps from one request every 10 seconds to 10 requests per second), some great case studies of folks using our data, and a simplification of our pricing model. It's just the beginning of the big plans we have in store.
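Going from one request every 10 seconds to 10 requests per second is a hundredfold increase. If you want your own scripts to stay under the paid limit, a small client-side throttle is one option. The Python sketch below is illustrative only: the `Throttle` class is hypothetical, it is not part of any Mozscape client library, and it makes no API calls itself.

```python
import time


class Throttle:
    """Space out calls so that at most `rate` of them start per second.

    Hypothetical helper for illustration; not part of any Mozscape library.
    """

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate   # seconds between consecutive calls
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None        # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._clock()
        self._next_allowed = now + self.min_interval
        return delay


if __name__ == "__main__":
    paid = Throttle(rate=10)             # 10 requests per second
    for i in range(5):
        paid.wait()
        # Your own request code would go here.
        print(f"request {i} allowed at {time.monotonic():.2f}")
```

Call `wait()` right before each request. The clock and sleep functions are injectable only so the pacing logic can be tested without actually waiting.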
 
Enjoy the updates, and if you've got any questions about the new index or Mozscape, drop me a line in the comments or via email.
 
 

Index update 5/9/12 (from Carin)

The full index is now live! This index contains exactly the same data as the 5/1 release, but includes the updated Anchor Text views.