Happy Holidays! The December Mozscape index is now live, and you will see fresh Mozscape data in Open Site Explorer, the MozBar, PRO campaigns, and the Mozscape API.
The Big Data team was hoping to provide a special holiday treat by launching two indices in one month again, but unfortunately processing was bitten by a full machine failure. We've had really good luck running Mozscape processing on the larger, high-compute AWS machines, but sadly, just a few days before the index was complete, an entire machine failed, which forced us to re-run a few steps. Even with the failure, the December index is a few days earlier than our scheduled release date of December 27th - a pre-holiday treat for everyone!
In even bigger Big Data news - our private cloud is fully up and running in Virginia and we are about 25% done with our first production-ready index! If all goes well, we'll be releasing the first Mozscape index created in our own private cloud in mid-January. What a way to bring in the new year!
Here are the metrics for this latest index:
- 78,671,787,078 (78 billion) URLs
- 687,827,137 (687 million) Subdomains
- 136,539,340 (136 million) Root Domains
- 917,094,026,686 (917 billion) Links
- Followed vs. Nofollowed
  - 2.32% of all links found were nofollowed
  - 56.69% of nofollowed links are internal
  - 43.31% are external
- Rel Canonical - 14.07% of all pages now employ a rel=canonical tag
- The average page has 72 links on it
  - 61.38 internal links on average
  - 10.45 external links on average
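To put the percentages into absolute terms, here is a rough back-of-the-envelope sketch. It assumes the percentages apply directly to the totals listed above and that "pages" corresponds to the URL count; neither is confirmed in the post, so treat the results as approximations.

```python
# Totals and percentages quoted in the index metrics.
total_urls = 78_671_787_078
total_links = 917_094_026_686

nofollow_share = 0.0232            # share of all links that are nofollowed
nofollow_internal_share = 0.5669   # share of nofollowed links that are internal
nofollow_external_share = 0.4331   # share of nofollowed links that are external
canonical_share = 0.1407           # share of pages with a rel=canonical tag

# Assumption: each percentage applies to the matching total above.
nofollowed_links = total_links * nofollow_share
nofollowed_internal = nofollowed_links * nofollow_internal_share
nofollowed_external = nofollowed_links * nofollow_external_share
# Assumption: "pages" is treated as equal to the URL count.
canonical_pages = total_urls * canonical_share

# The two per-page averages should add up to roughly the quoted 72 links.
avg_links_per_page = 61.38 + 10.45


def billions(n):
    """Format a count as billions with two decimals."""
    return f"{n / 1e9:.2f} billion"


print("Nofollowed links:     ", billions(nofollowed_links))
print("  internal:           ", billions(nofollowed_internal))
print("  external:           ", billions(nofollowed_external))
print("Pages with canonical: ", billions(canonical_pages))
print("Average links per page:", round(avg_links_per_page, 2))
```

The internal and external averages sum to 71.83, which rounds to the 72 links per page quoted above.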
And the following correlations with Google's US search results:
- Page Authority - 0.36
- Domain Authority - 0.19
- MozRank - 0.24
- Linking Root Domains - 0.30
- Total Links - 0.25
- External Links - 0.29
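The post does not say how these correlations were computed. One common choice for comparing a metric against ranking positions is Spearman's rank correlation, so the sketch below assumes that method. The `page_authority` and `serp_position` values are made up for illustration and are not Mozscape data.

```python
def ranks(values):
    """Return 1-based ranks, largest value first. Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman's rho using the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical example: a metric for five results and their search positions.
page_authority = [62, 55, 58, 40, 47]
serp_position = [1, 2, 3, 4, 5]

# Position 1 is the best result, so negate positions to make "higher is better"
# on both sides before ranking.
rho = spearman(page_authority, [-p for p in serp_position])
print(f"Spearman correlation: {rho:.2f}")
```

A value of 1 would mean the metric orders the results exactly as the search engine does, and 0 would mean no monotonic relationship.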
The histogram for the freshness of the index's crawl data shows a pretty high volume of fresh crawl data coming from the middle of November. This index has data dating back as far as the end of October, but a large volume of the data was crawled from the middle to the end of November.
We'll be keeping an eye on things over the holiday, so send us your feedback - we always love to hear your thoughts! And remember, if you're ever curious about when Mozscape is updating, you can check the calendar here. We also maintain a list of previous index updates with metrics here.
WOW!! 78 billion URLs and 136 million root domains! How do these numbers look against the likes of Majestic and Ahrefs?
The most interesting fact is "14.07% of all pages now employ a rel=canonical tag", which looks quite high to me. Can you check what % of the total domains had pages with a rel=canonical tag?
It might be that one big site with millions of pages is using the tag and inflating the percentage?
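The concern above can be illustrated with a toy example. This is a minimal sketch on made-up data (the domain names and counts are hypothetical, not Mozscape figures) showing how a page-level share and a domain-level share can diverge when one large site accounts for most of the tagged pages.

```python
from collections import defaultdict

# Hypothetical toy crawl: (root_domain, page_has_rel_canonical).
# One big site with 20 pages, 14 of them tagged, plus 80 one-page sites
# that do not use the tag at all.
pages = (
    [("bigsite.example", True)] * 14
    + [("bigsite.example", False)] * 6
    + [(f"small{i}.example", False) for i in range(80)]
)


def page_level_share(pages):
    """Fraction of all pages that carry a rel=canonical tag."""
    return sum(1 for _, has_tag in pages if has_tag) / len(pages)


def domain_level_share(pages):
    """Fraction of root domains with at least one tagged page."""
    uses_tag = defaultdict(bool)
    for domain, has_tag in pages:
        uses_tag[domain] = uses_tag[domain] or has_tag
    return sum(uses_tag.values()) / len(uses_tag)


print(f"Page-level share:   {page_level_share(pages):.2%}")
print(f"Domain-level share: {domain_level_share(pages):.2%}")
```

In this toy data 14% of pages carry the tag but only one of 81 domains uses it, which is why the per-domain figure would be a useful complement to the per-page one.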
I know I haven't got around to half of the stuff I need to take care of on my sites. I'm in the 'never finish, always in progress' category of webthusiasts, but if there are so many people using the rel=canonical tag, doesn't that mean there are a lot of sites containing similar or duplicate content on their pages?
I've noticed a fair bit of new data that isn't showing in OSE.
For example, links to one of my sites (DA 51) from several pages on Huffingtonpost.com published 2 October (PA 51) and 10 November (PA 48).
Only a couple of examples, but there are others, and these are leading me to doubt the recency and therefore relevance of OSE data.
I am also experiencing a similar thing, but the links in question are also not showing in other tools such as Ahrefs. Could be an issue of the bots not crawling deep enough, though that is unlikely to be the case with Huffingtonpost.com of course.
Hey Mark,
Sorry to hear your new data isn't showing up in OSE. There could be several reasons for this - are these pages deeper on your site? In order to get a diverse snapshot of the web, we do have to prioritize what is being crawled and sometimes that means crawling less deep into some domains. You do have a relatively high DA which means we would crawl deeper into your site, so it's interesting these are being missed. If you want to send me some specific page examples, we can investigate further!
If you're not comfortable posting these pages on the forum, feel free to email me directly at [email protected]. The team is on vacation for the holiday, but we can look into this after the New Year holiday.
Thanks!
Carin
Did not seem to work that well - it completely missed data dating back over a month.
Can you let us know more about what data is missing? Either let us know in a comment here, or with an email to [email protected]. Thanks!
Love data!
Thanks for all the hard work MozTeam. I sure do love fresh data! :)
Thanks for the present. I'm glad that Moz is back on track with their indices.
My dream is to have one every week :)
MC Day post cheers!
Looks very interesting, thanks for providing the great, fresh data.
Have lots of fun Enjoy.. Happy Holidays!!!
Nice Christmas present from SEOmoz :)
Great stuff!!!! Thanks SEOmoz for sharing very useful data.
I really like this and was hoping for a fresh data set. Thanks for your post.
This is a very impressive set of data. Thanks for this post.
Carin, I have seen that every month Mozscape provides really impressive data because the data is really transparent. There are minor changes in the first point, but all the others are the same, so hats off to you, Carin, for providing this data with transparency.
Followed vs. Nofollowed (Nov)
2.69% of all links found were nofollowed
56.69% of nofollowed links are internal
43.31% are external
Followed vs. Nofollowed (Dec)
2.32% of all links found were nofollowed
56.69% of nofollowed links are internal
43.31% are external
This data is interesting..
Fab news! I check my site's Authority every day, even though I know it only changes monthly :C
Any news regarding making the Authority updates more frequent? Even twice a month?
Hey Yoav,
We're working toward more frequent updates in the new year - our goal is to update all the Mozscape index metrics more frequently, especially the counts and authority metrics. It's the top goal for Big Data right now!
This is awesome! Not sure about this but I was under the impression that Google does not want to be scraped anymore (even though they scrape)... Are these sites being crawled by a separate (private) index?
Hey there!
Mozscape index data is collected by crawling the web - this data isn't collected by scraping Google. Hope that helps clarify!
Thanks,
Carin
Whoooohooo love the fresh data! Excited for your private cloud to get up and running in January too!
Love the fresh data! Thanks Moz Team!
Cool! Very helpful. Thank you and Merry Christmas!