The Big Data Team at Moz has had one of the hardest periods to date producing an index (and we've had some nightmarish doozies in the past). In June, after our funding from Foundry & Ignition, we started running 4 simultaneous indices on Amazon while also starting to set up our new hybrid (our processing and Amazon storage) cloud data center in Virginia. This tactic mostly worked until August, when we had 4 indices collapse on us due to a high rate of Amazon EC2 disk failures. Naturally, this makes us want to move off the cloud more quickly, and it also meant our bill was higher than ever - nearly $700,000.
Here are the metrics for this latest index:
- 64,023,562,478 (64 billion) URLs
- 1,282,691,523 (1.2 billion) Subdomains
- 148,634,588 (148 million) Root Domains
- 651,894,828,133 (651 billion) Links
- Followed vs. Nofollowed - 2.28% of all links found were nofollowed
  - 55.53% of nofollowed links are internal
  - 44.47% are external
- Rel Canonical - 13.74% of all pages now employ a rel=canonical tag
- The average page has 71 links on it
  - 60.74 internal links on average
  - 10.87 external links on average
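For readers who want absolute numbers rather than percentages, the shares above can be turned into rough link counts. This is only a back-of-the-envelope sketch: the constants are copied from the list above, the function name is made up for illustration, and the published percentages are rounded, so the results are approximate.

```python
# Figures copied from the index metrics above.
TOTAL_LINKS = 651_894_828_133
NOFOLLOW_SHARE = 0.0228           # 2.28% of all links are nofollowed
NOFOLLOW_INTERNAL_SHARE = 0.5553  # 55.53% of nofollowed links are internal
AVG_INTERNAL_PER_PAGE = 60.74
AVG_EXTERNAL_PER_PAGE = 10.87


def nofollow_breakdown(total_links, nofollow_share, internal_share):
    """Return (nofollowed, internal, external) link counts as floats.

    Hypothetical helper: it just multiplies the published shares out.
    """
    nofollowed = total_links * nofollow_share
    internal = nofollowed * internal_share
    external = nofollowed - internal
    return nofollowed, internal, external


nofollowed, internal, external = nofollow_breakdown(
    TOTAL_LINKS, NOFOLLOW_SHARE, NOFOLLOW_INTERNAL_SHARE
)
avg_links_per_page = AVG_INTERNAL_PER_PAGE + AVG_EXTERNAL_PER_PAGE

print(f"Nofollowed links: about {nofollowed / 1e9:.1f} billion")
print(f"  internal:       about {internal / 1e9:.1f} billion")
print(f"  external:       about {external / 1e9:.1f} billion")
print(f"Average links per page: {avg_links_per_page:.2f}")
```

With these inputs the nofollowed total comes to roughly 14.9 billion links, and the two per-page averages sum to 71.61, which matches the "71 links" headline figure.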
And the following correlations with Google's US search results:
- Page Authority - 0.34
- Domain Authority - 0.24
- MozRank - 0.20
- Linking Root Domains - 0.24
- Total Links - 0.20
- External Links - 0.24
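The post does not say how these correlations are calculated. One common way to compare a metric against ranking position is Spearman's rank correlation, so here is a minimal sketch under that assumption. The sample positions and Page Authority values are invented purely for illustration and are not taken from the index.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks.

    Does not handle constant inputs (zero variance in the ranks).
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical example: one search result page with five results.
positions = [1, 2, 3, 4, 5]            # 1 = top result
page_authority = [62, 48, 55, 40, 41]  # made-up metric values

# Negate the position so that "higher metric ranks better" comes out positive.
rho = spearman([-p for p in positions], page_authority)
print(f"Spearman correlation for this toy result page: {rho:.2f}")
```

In a real study the per-query values would be averaged over many keywords, which is why a figure like 0.34 for Page Authority can still be meaningful even though it looks small.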
Super Awesome News! :)
Really curious about the cost of the server farm - w/ all of the disk failures is it less expensive to go through Amazon rather than making the capital investment and having ownership of the farm? Or, does the design & maintenance make it too complicated & expensive to roll out in house?
Not at all. Prior to our funding from Foundry, we couldn't have quite closed that gap, but since May/June, we've been working on getting new hardware installed in a hybrid cloud facility in Virginia and moving processing to that location. We ran a test index there last week, but encountered some bugs, so it's probably 6-10 more weeks before we can kick off our first index at that facility. Once we do, we can move processing off Amazon permanently, and focus our spend with them on stuff they're good at (parallelization of API serving, hosting, etc).
I'm sorry, but did you say $700,000 beans a year? There goes my Mozscape knockoff idea ;)
Actually, that $700,000 is just for a month.
Hence my comment about bringing that stuff in house! $700,000.00 (cent$ added for effect) - that is a whole heckuva lotta beans!
Most definitely! We're working on making that happen. :)
I was complaining about my co-lo costs earlier. I'm just going to shut up now.
sparagi, close! That number was for the month of August.
I just fell off my chair. Sat back down again and fell off again.
Crazy, right!?
Same here, Nick! I mean, just look at the never-ending zeros!
My campaigns in Pro are definitely still showing old data :(
When do you guys think it will update across the board? Thanks!
Ditto. Old here. Boo!
Hi! We're looking into that right now and will update when we have a little more info. So sorry about that!
The data in your campaigns should all be updated now. Thanks!
You guys are doing a truly bang-up job as always Moz! I hope all of our props are worth somewhere around 700K a month!
The marketing, SEO, and related universes love us some data and the service you provide is indispensable for this. I am excited to see how the service evolves as you develop the ability to move away from Amazon.
Keep up the great work SEOmoz!
Thanks Philip! We all really appreciate that. :)
I hope the data is more reliable now and there is less down time on OSE.
You have posted really valuable updates this month. Brilliant effort put in by the SEOmoz team.
Just checked out OSE and when running a query it shows Last index update: 8/14/12 ?
Is this correct?
Forgot to add I am still seeing old data in the web app. When will this be updated?
The same goes for me. Every time I check a site on opensiteexplorer.org it shows "Last index update: 8/14/12". I've tried a number of times with different sites and it's the same every time.
Yes the date is incorrect, however the data has been updated. Sorry about that!
The data in the web app has been updated!
I also see "Last index update: 8/14/12" in OSE, and it also shows old data in my web app...
The date is updated in Open Site Explorer now as well! Sorry about that confusion!
The date is incorrect, but the index is fresh. We'll get that date corrected this morning. Sorry for the confusion!
Oh well, that's the rest of my day spoken for then. Good stuff! :)
No wonder the PRO tools and OSE haven't been working properly these last couple of weeks, then... hope you fix it soon.
Sorry you had to spend so much, but I am glad it's working.
Would love to see a new post detailing the correlations with actual search results, also how OSE / search result correlations have changed over time.
The correlations have been mostly consistent in our keyword set over time, with some minor fluctuations depending on the size/freshness of the index. This is something we plan to look at in detail near the end of the year/early next year as we ramp up for the 2013 version of the Ranking Factors. We'll be sure to report on it then.
I really love this tool.
Rel Canonical - 13.74% of all pages now employ a rel=canonical tag
Wow - that is pretty quick uptake.
I was just checking out the updates and found that the data (PA and DA) shown on the SEOmoz toolbar is different from the data on Open Site Explorer. What is the reason behind this? And which data should I consider for my site?
They both pull from the Mozscape API, so the numbers should be the same. Are you sure the two numbers you're looking at are the same? If so, would you mind emailing help[at]seomoz.org with the URL so we can check it out? Thanks!
Thanks, Jennita, for the response... I think I should wait one more day to know the actual numbers, as I found some fluctuation in them. BTW the site is www.taaza.com
Thanks for sending over the site! We're going to use it to do some research on why those numbers are different right now. We really appreciate the help!
Hello again! It looks like the numbers are matching now. We had an initial caching issue. If you wouldn't mind double checking though and letting me know I'd appreciate it. :)
Yes, the numbers are matching now. Thanks Jennita for your support :)
Excellent!
Great news and will love to see more posts like this.
Uncontrollable downtime and data loss are among the challenges of working in the virtual space. But as you say, you are learning and it will get better. I expressed interest in one of your open positions and put my best foot forward, and it took a while for some of the communication to happen, but you guys did get back to me (thanks Sierra!), which is better than no response at all :). Wish I knew Ruby as well as I do Java, but I still have time for that; I'm just getting started.
That's a lot of links! It means you've started to reach further out into the galaxy and are no longer focusing only on the middle.
Great news!
Some tools that use the API stopped working overnight! ;(
Still the future looks bright.
Which tools?
Thanks for updating the data in parallel with Google's algorithm updates.
I now see fresh data across all apps. Thanks!