It's that time of the month again! Mozscape's index is updated with fresh link information and Open Site Explorer, the Mozbar, PRO campaigns, and the Mozscape API all have new data.
This update featured some interesting work from our engineers, who've switched to new machines on AWS, which appear to be more stable and able to process our indices (unfortunately, they're about 4X the cost, but given our prior challenges with EC2, this is worth the price). We're still experimenting with our hosted cloud solution in Virginia, and we're hopeful that an index will be produced there by the end of the year. We'd originally hoped for earlier, but it's proving to be hard... Very hard :-)
Here are the metrics for this latest index:
- 55,852,954,589 (55 billion) URLs
- 1,017,642,509 (1 billion) Subdomains
- 114,764,072 (114 million) Root Domains
- 594,059,681,856 (594 billion) Links
- Followed vs. Nofollowed
  - 2.23% of all links found were nofollowed
  - 56.47% of nofollowed links are internal
  - 43.53% are external
- Rel Canonical - 12.98% of all pages now employ a rel=canonical tag
- The average page has 71 links on it
  - 64.15 internal links on average
  - 9.16 external links on average
And the following correlations with Google's US search results:
- Page Authority - 0.34
- Domain Authority - 0.24
- MozRank - 0.19
- Linking Root Domains - 0.24
- Total Links - 0.20
- External Links - 0.24
I've also got a histogram to show the crawl date and freshness of results in this index:
As you can see, most of the index was crawled in the beginning to middle of September, and there's ~1/3rd of the index from the latter half of August as well. Thus, this index will reflect links that occurred during that timeframe. As processing cycles get faster, the data becomes fresher, but we've got a lot of work to do to get there (thankfully, we also have a great team working on the issue).
As always, we'd love your feedback. Hope to hear from you in the comments, where our big data team will be reading and responding as usual!
p.s. Remember that if you're ever curious about when Mozscape is updating, you can check the calendar here. We also maintain a list of previous index updates with metrics here.
Hmmm, most metrics seem to be the same, but Followed Links appears to have dropped by 9-25% for some larger sites I monitor. Does that mean OSE is storing fewer stale/dead links now?
This index is about 9 billion URLs smaller than our prior September index: https://www.seomoz.org/blog/september-mozscape-update-is-live, so on average, seeing raw link counts drop ~8.5% wouldn't be surprising.
Thanks for sharing this. I think it would be awesome if you give such guidelines with each update. Easier to refer clients to your statement :)
You can calculate off the analytics page: https://www.seomoz.org/api/updates for any prior index. Just take the previous index size divided by the current one to see macro level growth/shrinkage.
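The calculation described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the prior index size is an assumption (the current URL count plus the "about 9 billion" difference mentioned earlier in the thread), not an official figure from the updates page.

```python
def index_ratio(previous_size: int, current_size: int) -> float:
    """Previous index size divided by the current one (above 1 means the index shrank)."""
    return previous_size / current_size


def percent_change(previous_size: int, current_size: int) -> float:
    """Growth (positive) or shrinkage (negative) of the index, in percent."""
    return (current_size - previous_size) / previous_size * 100.0


# URL count for this index, taken from the metrics in the post.
CURRENT_URLS = 55_852_954_589

# ASSUMPTION: the prior index is approximated as "about 9 billion URLs" larger.
# Substitute the exact figure from the updates page if you have it.
PREVIOUS_URLS_APPROX = CURRENT_URLS + 9_000_000_000

ratio = index_ratio(PREVIOUS_URLS_APPROX, CURRENT_URLS)
change = percent_change(PREVIOUS_URLS_APPROX, CURRENT_URLS)

print(f"previous / current ratio: {ratio:.3f}")
print(f"index size change: {change:.1f}%")
```

A ratio above 1 (or a negative percentage) means the index shrank, so lower raw link counts for a given site are expected. Individual sites can move by more or less than this macro-level figure.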
How much smaller is the OSE index going to get?
We don't plan to go below the 40 billion mark, and we may go as high as 70-75 billion in the next few updates. Long term, the plan is to go back to 150B+ URLs, but we need better quality metrics around what we crawl and faster processing before we can get there.
Looking forward to that day. It would be fantastic to rely on just OSE. At the moment I am having to collate data from OSE, Majestic, AHRefs, Webmasters etc. and merge it into one spreadsheet, as I am sure a lot of others are. It's a bit of a pain, but I would prefer to have it this way and know that you are improving the metrics. Long term they will be much more beneficial to most SEOs, I would imagine...
To be fair, on most of the sites I am working on, AHRefs and Majestic aren't picking up that many more links than OSE at the moment anyway.
That's been my experience, too.
Unfortunately, for the sites I work on, Ahrefs and Majestic pick up 25% to 50% more. Sucks having to pay for two of the same services. But I prefer OSE's reporting and metrics, which is why I'm hanging on for now.
I use them too (well Majestic anyway). I'm still getting a problem of many links that A) don't exist when spot-checked and B) come from sites/pages that I'm not sure have any impact (lots of duplicates and lots of stuff that doesn't seem to be in Google's index). But yes, I'd agree that 5-10% of those links are real and are ones OSE should have that we've missed, and that sucks. Promise we'll keep working to get there.
That's great.
RE: Majestic... it helps to filter out "mentions" and "deleted" in their reports to get a better picture.
I also use Majestic and GWT's link count monthly (in addition to OSE), simply so I don't have to depend on one data source. I think it provides a better top-level picture, too.
Crawling the web is a gigantic task. That has to be frustrating but very interesting and rewarding. Keep it up!
This is why I love SEOmoz. Always keeping it fresh!
Yeah, that's almost certainly the case. As the index changes size, the relative amount of links seen for any given domain/page will change in approximate proportion. This is the same reason why you can grow your link count, but Google Webmaster Tools might show fewer because they're not indexing as many.
In terms of Domain Authority - this is also relative, so if the rest of the web is growing their link quality/quantity more quickly than your site, you'll generally go down in proportion (the reverse also being true). I wouldn't worry much about a point or two, though. If you compare your metrics to those of 3-4 key competitors on a regular basis, this can help give you a baseline over multiple indices and time.
Last week I tried using Open Site Explorer but it kept redirecting me to the SEOmoz homepage for some reason. I tried using Chrome, Firefox and even a proxy. Really screwed me over that day because I needed some urgent info :) Does anyone have a solution in case it should happen again?
Good data, no doubt, Rand. But I am missing you on Whiteboard Friday. Please come back soon with some useful stuff. Have a nice day.
I always love an update to OSE and the Mozscape API. I'm kind of like a kid on Christmas :)
Hey buddy... I am also experiencing this... :)
Great to see the updates. But why is there a drop around September?
We weren't crawling quite as quickly for that one day :-) Think it may have been a partial outage/loss of some of our nodes.
Great work Rand! SEOmoz is gonna conquer the whole world soon ;) gj
Great update, SEOmoz, as helpful as last month's. Great to see these latest index metrics. Waiting to see your experiments and the next interesting updates to the index metrics.
Is this what initially messed up the data for some large sites like Twitter showing a DA of 8?
It looks fixed - good to see a fast update. :)
No, that was actually due to a bug in a separate software update that was rolled out on Friday. Sorry about that, but the bug has been fixed and hopefully won't happen again.
Ahh thanks Brandon! Glad to hear it's fixed. Although comparing my site to Twitter was at least fun for ONE day. :)
I'm just stunned, every time a new Mozscape index is released, that it is at all possible to emulate Google. Keep up the good work.
Fantastic, we can see all the data on our new site now!