** Latest update: Monday, October 10, 2011 11:30am PDT: Historical crawl data has now been completely restored! Please email [email protected] if you have any issues. Thanks :)
**Update Friday, September 30, 2011 10:36am PDT: CRAWL SERVICE IS LIVE! We have turned on crawl service in the PRO app and the Test Crawl tool. Campaigns will have the most recent crawl data, however, historical data will be spotty as it filters in over the next week. THANK YOU ALL SO MUCH FOR YOUR PATIENCE! The SEOmoz community is truly amazing!
**Update Friday, September 30, 2011 8:10am PDT: We are activating the front end of the crawl service today after working through a small hiccup last night due to the missing historical data in campaigns. This was resolved last night and we should be able to turn crawl service back on today. The back end crawl service has been working properly the past few days so campaigns will see their most recent data, however, historical data will be spotty and will filter in over the next week.
**Updated: Tuesday, September 27, 2011 9:30am PDT
**Updated: Monday, September 26, 2011 11:30am PDT
Howdy folks! I wish I was writing you with better news, but in the spirit of TAGFEE, we want you to be as informed as possible about your PRO membership:
Due to a major PRO web crawler service outage that occurred on Friday evening, crawler-related PRO features (link analysis and crawl diagnostics) are currently disabled. However, rankings, on-page optimization, and all tools except Crawl Test are functional.
It's our best estimate that we will have all service functionality restored by Thursday, Sept. 29; however, full historical crawl data will not be available until Monday, Oct. 10. We will be doing our very best to beat these estimates.
So what the *bleep* happened!?
Amazon turned the lights out on us. Well, not exactly; I’ll explain. We host a number of our web applications on Amazon Web Services (AWS). For many of these hosts we pay a fixed rate per hour; however, AWS offers an alternative billing model called spot instance pricing. Spot instance pricing is a method for purchasing excess computing power from AWS at a respectable discount. Everybody wins: we get a great price for the hundreds of computers we use daily, while AWS is able to sell a resource that’s otherwise just sitting around idle.
But the use of spot instance pricing comes with a risk: the computers hosting your services are only allocated to you as long as there is still excess capacity and no one else is willing to bid more for those hosts than you are. If someone comes along offering to pay more, then AWS may revoke your hosts without any warning, leaving you to rebuild your services from scratch. This is not so bad if you can ensure you have enough computers left to still service requests… and therein lies the problem.
Our Mistake
The contract of spot instance pricing is quite clear: your servers may be arbitrarily taken from you, so you must be strategic about its usage. We unfortunately did not apply good strategy to our PRO web crawler configuration. Almost all of our service hosts were spot instances allocated with a dangerously low bid price (e.g. $2.00/hour), and they were all clustered within the same AWS availability zone (more on this later).
So we put ourselves at risk with a low bid price, excessive use of spot instance pricing, and a poor distribution of hosts across AWS availability zones. We bet that there’d be little to no chance that AWS would reclaim our spot instance hosts but we bet very wrong. At approximately 6 PM PST, AWS terminated approximately 50% of our active spot instance hosts in the PRO crawler service cluster. Around this same time, the going spot instance price shot up to $2, our maximum bid price, which triggered this culling of our service hosts.
Losing half our hosts wasn’t entirely catastrophic, however bad. In fact, it was a salvageable situation but then it got much worse. At 9PM PST we lost all of our service hosts that were spot instances (> 90%). The going spot instance price had jumped to $2.51/hour at this time, and given most of our hosts were bid at the price of $2/hour we effectively forfeited all rights to our previous claims. Our service wasn’t broken; it was just plain gone!
This pretty graph from AWS accurately documents the spot instance pricing timeline for the day in question:
Three practices that could have prevented this from occurring:
1) Use a spot instance price that is commensurate with the value of the service.
If a host was very critical to service functionality, we should’ve bid a much higher price than $2/hour. Using the spot instance pricing chart as a guide, we should have used a bid of at least $3/hour to ensure a better chance of avoiding host reclamation by AWS.
Could we have predicted this optimal bid price? Likely not. Regardless, we should’ve bid what we thought the continued functioning of our service was worth. I think it’s easy to appreciate we now find that value much higher than the $2/hour we’d originally bid.
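To make the bid-versus-price relationship concrete, here is a minimal sketch in Python. It assumes a simplified rule in which a host is reclaimed as soon as the going spot price exceeds its bid; in the incident above roughly half our hosts were already lost when the price merely reached our bid, so real behavior is less forgiving than this model. The function name and the first two prices in the series are hypothetical; only the $2.00 and $2.51 figures come from the timeline described above.

```python
def first_reclaim_index(bid, prices):
    """Return the index of the first price that exceeds the bid, or None.

    Simplifying assumption: a host is reclaimed only when the going
    spot price rises strictly above the bid.
    """
    for i, price in enumerate(prices):
        if price > bid:
            return i
    return None


# Going spot price in $/hour over time. The first two values are
# hypothetical; 2.00 and 2.51 are the prices mentioned in the post.
prices = [0.80, 0.85, 2.00, 2.51]

for bid in (2.00, 3.00):
    idx = first_reclaim_index(bid, prices)
    if idx is None:
        print(f"bid ${bid:.2f}/hour: host kept for the whole series")
    else:
        print(f"bid ${bid:.2f}/hour: host reclaimed at price ${prices[idx]:.2f}/hour")
```

Under these assumptions the $2/hour bid is outbid once the price hits $2.51/hour, while a $3/hour bid rides out the same spike.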
2) Distribute hosts across multiple availability zones.
The initial increase in spot instance price occurred in one availability zone (us-east-1c), with the secondary increases in us-east-1c and us-east-1d. Had we spread our bets across multiple availability zones, we could have weathered this price volatility with at least half of our service hosts intact, even at the bid price of $2/hour.
Although we were aware that prices could vary by availability zone, we did not use this to hedge our bets more effectively.
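As a rough illustration of that hedge, the sketch below spreads a fleet round-robin across zones and counts what survives a price spike confined to some of them. The host count of 100 and the zones us-east-1a and us-east-1b are hypothetical, and the model assumes every host in a spiked zone is reclaimed while hosts elsewhere are untouched.

```python
from collections import Counter
from itertools import cycle


def spread_hosts(n_hosts, zones):
    """Assign hosts to zones round-robin; return a Counter of zone -> host count."""
    return Counter(zone for zone, _ in zip(cycle(zones), range(n_hosts)))


def surviving_fraction(placement, spiked_zones):
    """Fraction of hosts left if every host in a spiked zone is reclaimed."""
    total = sum(placement.values())
    alive = sum(n for zone, n in placement.items() if zone not in spiked_zones)
    return alive / total


# us-east-1c and us-east-1d are the zones named above; the other two
# zones and the fleet size are assumptions for illustration.
zones = ["us-east-1a", "us-east-1b", "us-east-1c", "us-east-1d"]
spiked = {"us-east-1c", "us-east-1d"}

clustered = Counter({"us-east-1c": 100})  # everything in one zone
spread = spread_hosts(100, zones)         # 25 hosts per zone

print("clustered:", surviving_fraction(clustered, spiked))
print("spread:   ", surviving_fraction(spread, spiked))
```

With an even spread over four zones and a spike in two of them, half the fleet survives, which matches the "at least half" estimate above.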
3) Use a mix of on-demand and spot instance pricing.
On-demand priced hosts use a different pricing strategy where you agree to pay a fixed amount per hour to AWS but in return you get certain guarantees about your host claim, most notably it won’t be arbitrarily taken from you due to demand. Had we diversified our portfolio between on-demand and spot instance pricing we could’ve ensured at least minimal functionality of our service in the worst case while enjoying some good amount of cost savings in the best case.
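The trade-off can be sketched with a little arithmetic. In the Python below, the fleet size and both hourly prices are hypothetical placeholders (expressed in whole cents to keep the arithmetic exact), not our actual rates; the point is only that each on-demand host raises the guaranteed floor at a known extra cost.

```python
def fleet_plan(n_hosts, n_on_demand, on_demand_cents, spot_cents):
    """Summarize a mixed fleet: guaranteed hosts and hourly cost in cents.

    On-demand hosts are treated as never reclaimed; spot hosts are
    treated as cheaper but liable to vanish entirely in the worst case.
    """
    n_spot = n_hosts - n_on_demand
    return {
        "guaranteed_hosts": n_on_demand,
        "spot_hosts": n_spot,
        "hourly_cost_cents": n_on_demand * on_demand_cents + n_spot * spot_cents,
    }


# Hypothetical prices: on-demand at $2.00/hour, typical spot at $0.80/hour.
for n_on_demand in (0, 25, 100):
    plan = fleet_plan(100, n_on_demand, on_demand_cents=200, spot_cents=80)
    print(
        f"{n_on_demand:3d} on-demand: "
        f"${plan['hourly_cost_cents'] / 100:.2f}/hour, "
        f"worst case {plan['guaranteed_hosts']} hosts left"
    )
```

With these made-up numbers, an all-spot fleet is the cheapest but can drop to zero hosts, while moving a quarter of the fleet to on-demand costs more per hour and keeps 25 hosts running no matter what the spot market does.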
As with any critical investment you have to be strategic about minimizing your downside; we will do this moving forward.
So where are things?
To be frank, we are absolutely mortified that we’ve had to disable such an indispensable product feature as crawl diagnostics, especially when this service outage was otherwise avoidable. We are literally working day and night to re-enable the PRO app crawler service. Currently, we are rebuilding the API servers, the underlying NoSQL data store (Cassandra), and the various processing and crawling hosts. We are being very careful as we do this to avoid the previous mistakes, being strategic about diversifying pricing type (on-demand vs. spot-instance), distributing across availability zones and using a very competitive spot-instance bidding price.
Most of the aforementioned service components are pretty easy to restore, but we have one unfortunate problem that will somewhat delay full restoration of the service: the terabytes of data generated by the hundreds of thousands of crawls we’ve executed over the last nine months. We must load this data from our backups (securely stored in AWS S3) into our NoSQL data store, something that by no means can be done quickly.
Being perfectly transparent, this is an operation that could take the full duration of a week. We certainly don’t want to make anyone wait a full week just to see data that’s already a week out of date, so we plan to be a bit more clever with this service restoration, choosing the most optimal path to populate our database while also ensuring we preserve our weekly crawl cycle. Do we have all the solutions in place to achieve these goals? Not immediately, but we are making great progress and I’m very confident we will have more optimistic projections about service restoration in the next several days.
Ok, so how exactly does this affect me again?
As a PRO member you can still:
- Create new campaigns
- Check your rankings
- Manage keywords
- Check your on-page SEO
- Run reports
- Check your backlinks & traffic data
- Use Open Site Explorer
- Watch webinars
- Ask & answer questions in PRO Q&A
For the next week you won't be able to access:
- Crawl diagnostics for any of your campaigns
- PRO Dashboard will show 0 pages crawled
- SEO Web Crawler in Research Tools
Also, as a reminder, none of the data is lost; we just need time to rebuild so we can access it.
In the meantime, rest assured that we are doing everything we can to get your PRO functionality back up and running like it’s meant to be. We realize that many of you rely on this data to optimize your company’s and clients’ sites, and want to return service ASAP so you can continue to do what you do so well. Thank you for hanging in there with us as we learn from our mistakes.
I'm sure you're all wondering about your data - here are a few answers to those burning questions:
- What happened to my data? No worries - your data has not been lost and is safe. You'll be able to see your full historical data by October 10.
- What has been affected, exactly? When can I use those features again? Crawl Diagnostics is currently unavailable. You'll be able to access the feature and new data by September 29 (but historical data will be unavailable until October 10).
Wow: this is an excellent use/instance of TAGFEE. Thanks for being so honest, upfront, and direct. As a paying PRO customer, I really appreciate this uncompromising detail, transparency, and accountability.
Good luck with the fixes, give yourselves a break, and rock on!
I guess that’s the great part… communities build and grow on trust and transparency, and the kind of honesty shown by the SEOmoz team is really commendable; one should learn from it!
Totally amazing transparency shown here. That's the trust / relationship that grows support and business. You don't see enough of it these days. Thanks SEOmoz!! I'll be here when you get back because of this.
Thank you all - every one who has offered their words of support. We couldn't ask for a finer community.
I'm all for trust, transparency and honesty. In fact this blog post has been incredibly refreshing.
However, if I was an SEOmoz pro member (and I was), I'd certainly be a little confused as to the whereabouts of any compensation? And I'm not even referring to money here, how about another SEOmoz live?! :D
I just wanted to add one quick thing - just because the crawl information is not available, does not mean we lost it - we actually have all of that data - it just isn't accessible at the moment.
And as Bryce said, we are working to try and be sure that all crawls that were supposed to happen this week do so on schedule.
We are truly very sorry for the inconvenience, but I assure you many team members worked late into the night (and early morning) and will be working the weekend and on until this is completely resolved.
Thanks for being so patient, and we will do better in the future.
Another thing that could have been done is having a fallback setup (rackspace??). Most clients would understand the AWS outage as a genuine reason, but some may not. How does seomoz plan to compensate for such an eventuality?
Shit happens. Still the best service around
Hear, hear :)
Thank you Russ and Al. Your support is generous and we're very, very grateful.
So true, we live and learn and move forward, learning from the experiences. That's what we call life. I look forward to getting the more up-to-date information; the historical isn't as important. I just feel for the team that has to work nearly 24/7 to get this back up and running again. Good luck!
completely true... we have all had web issues that are out of our hands at one point or another in our career. Thanks for keeping us informed!
I agree Russ, but still would be nice to get some compensation back from this. I needed that data rather badly for some of my clients, so now had to say that their reports will be late...which is kind of tarnishing our reputation a wee bit.
Still, dig on SEOMoz, you're my boys (and girls). x
Great work, that sucks it happens, thanks for the post explaining what went down.
Absence makes the heart grow fonder
Wow, this post is also a good lesson about AWS and running large data services on it! At least I learnt something :)
I'm not a PRO member and thus not affected by this outage, but this post was completely informative about an area of AWS I don't (yet) dabble in. TL;DR I concur.
It seems that we could share a lot more about our experiences with running large data services in AWS. We've been doing it for a very long time and there are some fascinating and somewhat novel techniques we've employed that I think worth sharing with the community at large. Stay tuned...
Well... the only "good news" is that problem had been caused by too many crawl requests, from what I understood... a growing business accident somehow.
I am confident you will solve this issue soon... the irony is that recently Majestic SEO also had a similar problem (and wrote a post about it); again, maybe it is not the kind of news we love to hear, but it is an indicator of a growing industry.
I think that in some closed room, there have been high fives as well. A server outage in most cases is a sign of the business growing.
Although, the techies have a hard time recovering after such a blow. Red Bull and Caffeine supplies need to be multiplied :-)
You guys rock, honest, straight to the point when there is a mistake, you provide excellent customer service.
Rand you should be a very proud man with an unbelievable team, wow I dream to have a team like yours and I am working on it...
Pure Awesomnessss...
Agreed - I'm amazed every day by the quality of people I get to work with.
Big thanks to Bryce, AK and all the engineers who sacrificed their weekend to work on this.
Hi everyone!
I just wanted to drop a note with a quick update.
First, thank you to those of you who have been so supportive. This is a huge mess-up on our end, but we still appreciate your kind words.
Second, for those of you who are upset, we are truly sorry. We are working on some creative ideas to provide some extra incentives to compensate those who were affected. We are considering things like unlimited crawls using the standalone tool, or even increasing the limit on the number of pages for a while so people can catch up.
And finally, I want to assure you that the team here is working non-stop to get the crawls back online. There was a flurry of emails at 3am and all of us are treating this as the most important priority at the moment.
Thanks again for your patience, and for everyone's support and understanding. As Rand said earlier, we had been doing this for years without issue, and never realized it was such a risk. We will not be taking any risks like this going forward, even if that means increased costs on our end.
-kate
I appreciate the transparency and all, but using spot instances for critical services that your customers are paying for is just a bad idea. There's absolutely no guarantee that any price you put on the servers will keep the price from going above it. Which means, if you continue to use spot instances for these services (which *your* customers are paying for, mind you), you are basically guaranteeing that you will be down again in the future.
While I do like the SEOmoz PRO interface, it gives me pause whether I have faith in their ability to provide me a reliable service going forward.
After this experience, I agree. However, prior to it, we'd been successfully using spot for 3+ years, so it was a surprise on our part and a change on Amazon's, too. We could have been more prepared, and we'll be thinking hard about ways to prevent this type of situation everywhere.
Thanks sporcle.
I want to know who it was that out bid you!? Do you know who they are?
Good question there. I'm sure deep in the bowels of Amazon there are data to answer that question however it's something not likely to be shared with us. I somewhat suspect there may even be some automation at the bidding side of things causing this mayhem. Regardless we will be looking deeply into this issue, engaging Amazon directly. I hope to share my findings with all you fine folks, I know there's many out there who would love to know how to run things as cheap as possible in the cloud without incurring the same issues we have.
This post will go a long way to easing the pain. I often show clients SEOmoz posts - not just from an SEO point-of-view, but with specific regard to ethics online.
I've kept a lid on it for the past few days, believing that someone else would surely say what needs to be said here, but it seems that is not to happen, so as my Grandmother would have said I think it's "time to get down to tin tacs":
When it comes to the growing list of people suggesting that some kind of "compensation" should be required, it's way past time to consider a few basic facts:
If we are worth our salt as SEO's, we ought to be aware of the concept of short term pain for long term gain. Understanding the long term consequences of the short term gain some expect should be simple enough.
In the end, we all make a decision about how we choose to conduct ourselves, but if your choice is to whinge, whine and thump the desk, please at least make an effort to keep it civil and constructive.
Here endeth the rant.
Sha
Sha,
I think it's all about setting expectations. I don't disagree with anything you said IF those expectations were set before the bill was paid.
If downtime for 10 days a month were part of a service level agreement, would people sign up? I think we have those kinds of expectations when it's a free service, but as soon as you begin paying a bill that expectation changes... dramatically. The clients we work for for free don't yell at us about deadlines... but the ones who pay expect us to meet them UNLESS we set other expectations.
SEOmoz has SURELY been a great partner to me personally and I don't ever want them to fail. I want them to succeed so much that I'd reinvest any compensation. I don't need compensation... I need reports :)
Doug
Anyone who's ever worked in IT will be familiar with this kind of event. I used to work for a Top 5 UK University and a UK Police Force area and we certainly felt this kind of pain.
What's important is what you do after it; you must be transparent (done) and learn from it (done).
Well done guys you've shown your class; if you were working for McDonalds, you'd be wearing 5-stars on your badge right now :-)
If all businesses chose this way to handle mistakes and errors in judgement all our lives would improve. On reading the post, my first thought was, "Not sure I would be putting out that much info." Interestingly, by doing so SEOmoz makes me feel more included as a customer.
As a customer, I get the most unhappy when I am lied to or ignored. By giving me the nitty gritty details and accepting responsibility for the error, you make me feel like I matter. I think most people feel that way when a business goes to lengths to let them know when an issue has arisen.
As far as those wanting compensation, their request is not without merit; for me, given the other services I have been using during the downtime that I would have been using anyway I would guess I would be owed around $3.00 at max. Given the assistance I have received over the last year or so by virtue of these tools and the assistance of staff at SEOmoz.....consider it a wash.
Thanks for the updates.
Just added this update to the post: It's our best estimate that we will have all service functionality restored by Thursday, Sept. 29., however full historical crawl data will not be available until Monday, Oct. 10. We will be doing our very best to beat these estimates.
Can't tell you how much I appreciate everyone's understanding here. Just the same, we are trying to employ every trick in the book to get this up faster for y'all. Very sorry for the terrible inconvenience.
Thank you so much for the update! This explains why I had some trouble working on the weekend. I figured it was a subtle reminder from the Moz team to take the weekend off. ;)
Thanks for the info. I missed the part that said, "because you will not be able to utilize all of the data you've been paying for and because this may cause you an inconvenience with clients, we will be discounting your monthly rate by...." I'll go through and read it again.
I hear you Mike. We're talking about ways to compensate, but a cash refund/discount probably isn't something we can afford at this point. We run very lean as a startup, but we'll see if there's a service/data/tool credit we can provide when the crawl is back up and operating.
I hear you that you can't give cash back, but I think some kind of offer would help. Maybe a sneak preview of some new feature or something like that.
The tricky part for you is probably people like me, still in their trial period. I think I'd like to keep going with the Pro membership, but it's a little hard to tell because, especially here at the beginning, it's the crawl diagnostics that are the most important. When my trial period comes to an end I will only have been able to see about a week's worth of that data, so it's hard to tell for sure if this service is a good idea for me.
This post, and your general approach, make me think it probably is, but there are probably a bunch of folks like me a little bit on the fence about it.
I say this just to be transparent about it from my perspective.
I do wish you all the best.
A perfect example of how I want to run my business. I wish everyone would take SEOmoz's example on this. Can you imagine a world run by SEOmoz?
!!!!SEOMOZ FOR PRESIDENT!!!
Do it, Ben! Run your business this way, too. One company at a time, we'll change the face of business around the world.
Thanks for your transparency. You should turn this into another blog post about how to turn a bad situation around. Lesson learned - tell the truth, people will understand.
Wow. Another glimpse of the mind blowing complexity of what goes on just to provide us with the service every day.
Thanks for explaining it all so we really have an understanding of what's happening. Honestly, the complexity of it all blows my mind!
Big thank you to everyone for all the extra time you're putting in to get things sorted.
Hope it all goes smoothly and Roger can recover his composure with no ill effects :)
Sha
We will soon share more details of the inner workings of our services. We hope to really blow some minds with the specifics of what it takes to build and operate these services in the cloud.
Thanks for your well wishes. Roger really appreciates it. :)
While I appreciate all the support you folks are getting in this thread... and while I appreciate you being open and honest about the outage, it doesn't change the fact that I was going to deliver reports and work on my clients' sites tonight and now I can't. I'm not going to scream and yell at you, but I can assure you that some of my clients won't be as understanding as the folks above.
Again, I'm not trying to be a jerk... but I am a paying customer and it appears that you're admitting above that you put the service you're providing me at risk to save a few bucks. This isn't some free tool we're using, we do actually pay for it. Do we just get a "I'm sorry I messed up!" and we live with that?
I can personally live with it - especially since I'm a small "PRO" member. If I was spending a lot more with you, I think I'd be curious about my next bill, though.
Just trying to speak the truth here. It can't all be unicorns and rainbows.
Couldn't agree more. It's great that SEOMoz are open and transparent but it does seem like there's an admission that corners have been cut and the best we can hope for is a "we're sorry".
Would be great to know what's going to happen with regard to some kind of sweetener to this increasingly bitter pill?
We are working on options, but need to make sure that we don't make things worse/put more load at a delicate time. We certainly do want to provide some form of compensation, though, and will have a plan there soon.
hey Rand, hang in there, I know you are doing your best to sort it out for us.
Best, @gunshotdigital
Yet another fine example of the SEOmoz transparency (like Rand's detailed account of the VC journey). One thing I didn't understand: "Our service wasn’t broken; it was just plain gone!" What does it look like from the admin console? You log in and none of your files and scripts can be seen?
Good question. AWS provides a very nice web-based admin console from which you can monitor and manage your cloud resources. When managing EC2 resources from the console you're provided a paginated list of all the computers you've currently allocated, where each machine is listed alongside its status, represented as a state description (e.g. running), and a corresponding green/red icon. As you can guess, it's not entirely welcome to see a long column of red icons adjacent to the word "terminated" as we did Friday night. Ugh.
I can understand the cognitive dissonance with the concept of "gone" here. Literally there was nothing to log on to; the service hosts disappeared like they'd never existed, their respective hard drives re-formatted and passed on to another user. This is life in the cloud; even the very concept of "possession" is ephemeral and vague. That's why strategy is so important with the cloud: there's very little assurance of anything, but that's the price you pay for the vast power you may otherwise wield.
I just wanted to say thank you for your honesty on this issue. Despite the glitch I feel even better about SEOMoz.org than ever.
I agree. @gunshotdigital
Shouldn't the crawl analysis be up again by now?
The transparency is much appreciated. I wish larger organizations had the cojones to own up to mistakes when things go awry. I'm bummed not to have my crawl data for the next week, but I can plan around it.
Thanks for being so upfront and honest. Not to mention thorough. That's refreshing. A+ SEOMOZ.
Yes thanks for the UpDate!
I'll echo the comments in this section and thank you for the quick response/transparency.
Although I do rely on this data almost every day, I can definitely understand server issues, and the fact that you are upfront about it and addressing it lets me know you care about how it affects us.
Being a newbie (3 months into my paid subscription), I have just started supplying my clients with their reports, so I was a little down in the mouth to see that certain features were not functioning and wouldn't be for some time.
However, having taken the time to read the posting and explanation, and being a business owner myself, I am totally refreshed by your absolute truth and honesty and the explanation of your current situation. It is a breath of fresh air for a larger IT company to use plain English to explain a situation, and also the rationale behind your choices.
I believe you have just won a customer for life here, I am here to stay.
-Ryan
GO SEO MOZ
1. Might I suggest having better (or any at all) BCP/DRP planning in place in general? I think you'll find that the old adage 'it's easier to ask for forgiveness than permission' is actually patently false, especially in business.
2. I'm disappointed mostly because I signed up for a 1 month trial on September 20th, and if this is the service... I'm not that impressed. And I can assure you that I really wanted to be super duper impressed enough to justify a significant monthly expense.
3. With that said, I totally appreciate the honesty and full explanation of what's going on, because you can't always account for any potential problem. However, if you retain the services (even part time) of a business continuity planner/risk analyst you might be able to better foresee and prevent risks to your service.
I will try to overlook this in deciding whether to commit to the monthly Pro service, as I respect the integrity of owning up to your own mistakes rather than ignoring them, giving a cursory BS explanation, or distributing the blame elsewhere. So thanks for the post! :)
We also have to watch the budgets pretty closely and I've been a Pro member for 5-6 months now (I think) and this is the first issue that has arisen and it's only affecting one main area of the service. So try not to be put off too much by this and maybe give it a go for one month after your free month, see how you get on. You're not tied in, so the flexibility to stay or cancel is still yours.
It's not great timing for you, but equally it's not great for seoMoz and at least you're not paying for the month where the issue has cropped up, so there are still things to be grateful for. :) Enjoy your trial!
Thanks Martin - really appreciate the support. PRO is going to not only get more stable, but even more valuable and useful in the next 3-4 months :-)
Thanks Amber - certainly appreciate you giving us a trial and can understand that this isn't putting our best foot forward. We will definitely learn from this and be more solid/stable/reliable in the future.
I've been a Pro Member going on 2 years and this is the first serious tool outage that I can remember.
I think you hit it on the spot - a mistake was made, but Moz fully owned up to it, apologized and then explained what they were doing to fix it. You can't really ask for more.
Mistakes will always be made - by companies large and small. The real measure of value is in how they deal with it - in this regard I think SEOMoz has acted brilliantly.
The best risk management consultant in the world can't see every eventuality - sometimes shit just goes wrong, and you have to fix it.
No other toolset out there provides the same kind of value (at least that I've seen), so I'd definitely keep with it.
SEOMOZ has done a lot for me. I have no complaints waiting for 1 week so the problem is fixed. You can take 1 month for me! I'll substitute the inferior GWT for the moment.
Although this was a negative experience for SEOmoz, I am now educated in this area. Your mistake was a positive learning experience for all of us. I use AWS!
Thanks for your honesty and the detailed information!
I have got to say I really appreciate your honesty about the problem. I hope that a timely fix goes in and the service commences business as usual. SEOmoz is a great service, the best in the industry by far, but the Pro service comes at a premium price. I do consider that this issue could have been avoided; I for one expect the service to be available 24/7?
We really appreciate TAGFEE, and how you're handling this situation. Thanks for being so open with everyone.
Thank you for the detailed information.
Bruce, you need a profile picture!
Done! :)
Bummer, I hate when tech issues happen and especially when I am working with new clients and waiting on the results. @Rand glad you mentioned the options and the immediate test feature.
Thing is this: I know at least 3 or 4 active Internet peeps with businesses that had issues with Amazon. Most do pretty consistent business online. Certainly makes me think twice about using them.
Wishing SEOMoz peeps speed, large amounts of caffeine, and a big thank you party.
@amberbailey I did the trial, left and then came back. Awesome resources and despite the glitch I think you'll find good value.
But as @ulpakitty mentioned, I am relying more and more on the services--so big inconvenience and I do hope you find some way to compensate paid users since start up or not, small businesses rely on you!
As always, great transparency - thanks for the update :)
Thanks for the update. Is the SEOmoz toolbar affected as well? I have been experiencing issues since Friday and still am.
Fair play for being so honest and upfront about this, cock ups do happen, after all we are only human. If only Financial Institutions were as transparent!
Yes indeed, 'cock ups' do happen. lol
I'm impressed about how upfront SEOMOZ is being. Fine example of crisis management.
I'm not a pro member anymore so I feel like I'm sticking my nose in other people's business (apologies), however I couldn't help but point out three prominent things about this post.
1. SEOmoz are incredibly transparent and honest
2. It's unfortunate that any form of compensation was not mentioned (in the OP)
3. Complaints about SEOmoz lacking a backup plan, coming from people who lack a backup plan of their own, are ironic
I'd like to know where we're at on this as well.
The latest update being from the 27th doesn't really make me confident that everything is being done to fix this. I want to see my reports, I'm waiting for that and I don't know how hard you're working, probably very hard, but how could I know if there's no update?
Anyway, thanks for listening.
Hey! We ran into a small hiccup yesterday getting the UI working without users' historical data, but we found a solution last night and are finishing testing this morning. We should have things turned back on later today!
The back end crawl service has been collecting data the past few days and this will be visible in your campaigns, however, we are still collecting all of the historical data so graphs may look a bit spotty as it filters in over the next week.
Thanks so much for your patience, we all realize what an inconvenience this is and we are working hard to get this back online for you!!
Thanks,
Carin
Thank you for the update Carin,
This does not mean 5pm Pacific time, correct?
My customers' month ends at 1pm Pacific (5pm East Coast in the US). I need to have already reported and met with them after they've reviewed.
Not complaining, but I do not see this as possible using the results of the Crawl Tool this month.
Respectfully,
Ian
Hey Ian - totally understood!
I can assure you it is the first priority for everyone this morning to have this deployed as soon as possible today - and by today, I mean as early as possible in the day and not 5pm Pacific :)
Thanks,
Carin
Genuinely grateful. I did not doubt it and only wish I could help.
:)
THANK you for the efficient updates this morning Carin.
Ian
Very informative post and a great example of how to handle a crisis. Our company offers web hosting for a large number of clients and we experienced our servers going down last year. My sympathies to the SEOmoz team!
Sorry to moan, but this is actually not acceptable guys. We're bearing with you, but this is now 2 months in a row that we have had problems with the SEOMoz PRO service that have affected our ability to deliver our monthly SEO reports to our clients!
Last month, you were checking google.com instead of google.co.uk in our ranking reports by mistake. This caused clients to get upset as they thought their rankings had dropped.
This month we can't access the Crawl Stats.
With so many other tools out there, and with us only in our second month of using SEOMoz, we're starting to wonder how good the service really is. Should we stay? Or what will happen in month 3...
OMG! I'm freekin out! How could this happen? What am I going to do?
KIDDING!
I'm happy that your organization is built to scale, that you have the resources and intelligence to turn this issue into a positive. Feel sorry for the tech team who missed a perfectly good Seattle weekend! Good luck.
I know stuff goes wrong guys but that is a major oversight! I had an important report to run on my client's website for Wednesday. It has just gone live and I need to check for errors. Because of this I won't be able to run the report until next week I bet. There are only so many times you can say to a client "sorry my software is not working" before you look stupid. This will be the second time in as many weeks I have uttered that line.....
Totally hear you dude. We will make sure this experience makes us more reliable in the future, and that this type of issue doesn't happen again.
If you have Google Webmaster Tools set up you can get some crawl stats...
I have never in my life seen a company admit total liability for a mistake like this.
I have been a member for several months now and as the historical data will be recovered, I have to say hats off to you for your honesty and dedication to pull this around. I know some will say it's been a disaster for them, but in all fairness this was one element of the service and you've probably done yourself the world of good by just taking it on the chin.
I for one won't be going elsewhere. Life is a learning process. If companies learn from their mistakes, they come out at the end of it in a better position.
To all who are on trials, stick in there, SEO Moz are one of the good guys.
Steve
Transparency is king - well, availability is king of kings though ;)
Thanks for your honesty, it helps keep customers calm. I passed on your learnings about AWS to two startups. Hopefully they will learn from this challenge.
Keep on going! All the best /v
We have client deliverables due end of week, this outage puts us in a bad spot.
Rufo, they have to be transparent now. There was no stability in the platform choice, no redundancy, and a code error pushed out to the production server.
Management bet a key service on an auction bid, which, given the current situation, would in many other companies be a career-changing event for someone.
My hat's off to the team that has to clean up the mess; maybe three days at Rosario, families included?
Hi everyone! Thanks so much for your patience with this issue and waiting for historical data. I'm happy to post our latest update:
** Latest update: Monday, October 10, 2011 11:30am PDT: Historical crawl data has now been completely restored! Please email [email protected] if you have any issues. Thanks :)
Good to know there is a backup.
Hi, it has happened again today. Tried adding three domains or subdomains and keep getting this message:
Roger has detected a problem:
We have detected that the root domain ---.com does not respond to web requests. Using this domain, we will be unable to crawl your site or present accurate SERP information.
Is there another outage?
We're not aware of an outage. The help team doesn't monitor these older blog posts, so the best thing to do would be to go to our help page at https://www.seomoz.org/help and click the "contact our help team link" and they'll be able to help you tomorrow.
Hi folks! For all of you following this post I wanted to make sure you saw the latest update that the crawl service is back up!
Update Friday, September 30, 2011 10:36am PDT: CRAWL SERVICE IS LIVE! We have turned on crawl service in the PRO app and the Test Crawl tool. Campaigns will have the most recent crawl data, however, historical data will be spotty as it filters in over the week. THANK YOU ALL SO MUCH FOR YOUR PATIENCE! The SEOmoz community is truly amazing!
Hey Jen,
Thanks for the update!
Hope everyone there will now get a chance to at least take a breath and relax a little for the weekend.
Amazing works both ways you know :)
Sha
Yoo-hoo!!!! SEOmoz - how's it going? Is the crawl playing nice or still giving you grief?
Hope you guys are well -
best regards,
@gunshotdigital
I am currently working on an app and was doing a reliability study when I came across this article.
I think the author of this post should consider the Google App Engine instead.
Check out
https://www.youtube.com/watch?v=rgQm1KEIIuc
I'm sorry to complain, I hope you understand, but what happened to this?:
"It's our best estimate that we will have all service functionality restored by Thursday, Sept. 29"
I don't know what time it is in the US, I guess 9am or 10am but it is certainly the 30th already... Some updates would be greatly appreciated.
Great informative post. Any chance I could get a couple of extra weeks on my trial?
Yeah! - Glad to see the update this morning.
I just signed up for PRO on Monday, specifically for the Crawl Diagnostics, and have been anxiously awaiting results.
any way we can turn it up to super speed? lol
Any update on the Page Crawl status?
I typically have already reported to my clients and there is no status update. I'll have to use other services until this is repaired, as my customers move forward based on the success they see in my reports.
My customers will not understand the details of what is happening (no updates or estimates). Please provide something within the hour here. Sorry this is so frustrating within your team this morning.
Ian
Very pretty, this was a really wonderful post. Thank you for the information you provided.
Stuff happens.
Count me in the camp of folks impressed with the transparency. Having not only the original explanatory blog posting, but also the participation of the top leadership of the company in this discussion thread reflects very, very well on the SEOmoz team and brand. At least imho.
I am sure there are plenty of bags under the eyes over at the MOZplex. Nice work on giving it to your user base straight.
In the meantime, any suggestions as to toolbars that have any type of SERP overlay and a replacement for page and domain authority?
I like to use these to visualize SERPs for clients and I have a meeting tonight, but I obviously can't use SEOmoz for this one. Any suggestions?
Hi Spencer - the mozBar and SERPs comparison tools should be working fine! This just affects web crawls. We did have ~50 minutes of slow access to the Linkscape API, which might have caused some issues with toolbar data yesterday, but that should be all fixed.
I am confident you'll fix the problem soon. Staying up day and night to fix things is something I can surely relate to. Come on. YOU CAN DO IT !!
This happened at the WORST possible time, however I have to respect this level of honesty and transparency. I can't stay mad at you. Thanks guys and gals!
Good work SEOmoz on coming out with a full explanation and letting the customers know exactly what is happening when it is happening =)
Hopefully we will see Open Site Explorer back this week =)
Guys, thanks for the reporting! I am sold on your honesty and happy to wait till the problem is resolved. I wish all commercial companies out there adopted this ethical approach and recognised their mistakes from time to time. After all, we're all humans.
Plus, the tip about how AWS sells excess computing power, and the lessons to learn from it, is also a great takeaway for enterprise folks in this forum.
Thanks for the transparency! Wasn't expecting that level of detail about the outage.
Great read. Thanks for sharing.
Your honesty policy is what makes SEOMoz an extremely successful business.
If every business took a leaf out of your book, society would be all the better for it.
Many thanks, and learnt about AWS too!
Will happily wait for resolution.
Transparency is much appreciated. Hopefully all of us learn something and plan better.. Good luck guys...be patient and believe in the SEOmoz team, moz will be back on track soon...:)
It Happens! Thanks for the transparency.
Cheers
Thanks for the updates Bryce and you too Rand.
Much appreciated.
Sha
Thanks for being so open and honest with us. These tools are still great and well worth the price tag - so thank you for the hard work and all the best with getting things sorted! :)
Hey guys,
I decided to test the free one month trial just a few days ago. Will you take your mistake into consideration and prolong the free trial period accordingly?
Thanks
Hi polyniki,
Welcome to the community :)
(...looks like we were typing at the same time - my comments below were not directed at you)
But I do guess being so honest about it is good. It's the least you could do really to try and keep people from asking for refunds.
Years of free subscriptions would not cover what this is costing me right now. I just love being on a work trip, in the board room of my most important client, bring up the site crawl report to cover that only to see some douchy bug crawler gif. Oops. I'm sure the President of the company would LOVE to book another meeting with me.
CLOUD FAIL on your part. At least you made the mistake because you were being cheap and lazy and not actually coding something that didn't work.
As an IT expert I don't even want to know why this would take you more than 24-48 hours to fix but whatever.... Good Luck on all that.
Anyone have recommended providers with better service?
<edited inappropriate language>
@Sorbie - PREPARATION FAIL on your side, mate. Presenting live IT stuff without a backup strategy shows a lack of experience. I made this mistake, too. Back in the early '90s of the last century...
And flaming the good guys won't do anything good. EPIC FAIL.
@SEOmoz: Thanks for transparency. Keep on fighting!
So I am not to rely on the internet any more? I'm supposed to export 3,000+ page errors and display that in what exactly, when I already have a system to do that?
Stop defending and acting like you know anything.
<edited inappropriate language>
And, by the way, how can I create an offline version of data I cannot get to?
You didn't get the point, nvm. EoM.
Zero reward for transparency. This is 100% expected in today's world and to give more props than a "thanks for letting me know" is unnecessary.
Sorbie you obviously rely heavily on the great tools SEOmoz has to offer, and you don't even know of any other tools??? Yet here you are raving and ranting about how YOU had no back up data, no other tools to run the crawl, and a very bad relationship with your client. Grow up and take responsibility. SEOmoz made a mistake but at least they have owned up to it like an adult. You know Sorbie, this may just be Karma.
Great work Rand/SEOmoz
Sorbie - my sincere apologies. We're working hard to get this fixed as you know and we're changing our architecture so it can't happen again. Until then, here are some tools I can recommend:
- www.seomoz.org/blog/crawler-faceoff-xenu-vs-screaming-frog Dr. Pete covered both Xenu and Screaming Frog here
- www.google.com/webmasters Google's free crawl info
Those three may be worth checking out, and in the next few days, we'll have the crawls repaired here and a few weeks later, all historical data restored, too.
I'm in the same boat sorbie is in but, it seems, much less upset about it. This is a huge inconvenience for me and will definitely have an effect on my bottom line. I have had to reach into my bag-o-tricks to scramble for the data I need. I like the transparency but unfortunately feel that this is more of a Netflix style response. If you've seen the South Park where BP tries to drill into the moon for oil, it's kind of like that. I don't want apologies, I want my data! Or at least some type of compensation for the service I'm paying for and have come to rely on.
Now that I've finished venting I wish to clarify the only reason I'm upset is because of how heavily I have come to rely on the services that you provide. You guys do great work and make the lives of SEO folks like myself much, much easier.
@sorbie: Y ya mad bro. Life is shiney
One more point I should have added - once we have the crawl service back up (hopefully next 48 hours), you can run an "on-the-spot" crawl using https://pro.seomoz.org/tools/crawl-test and get results back faster than waiting for the PRO app to get through your site in the queue. I realize you may not be able to wait that long, but I should have added that to my earlier comment as well.
@sorbie You're definitely on the IT side of things and not content or user relations. You have completely valid points, but when you address them in a way that warrants your message being edited for inappropriate language (twice) it makes people not take you seriously. Yes, this sucks (especially for your situation as you described it). Yes, pro members should get some kind of compensation. But it is what it is. If you don't like the service, try another one (but is there anything really comparable? eh...).
Didn't see some of these comments back in September... no one will likely read this but for what it's worth Rand did connect and compensate - it was not really necessary but was a very nice thing to do and I thank them for being impossibly attentive. Super-huge props to MOZ on that.
I'm actually on the relations side - that's why it's most uncomfortable to go into a meeting and whip out a blank report. Backups you say? I'm fully aware, but we need to treat the web like it's 2012 and not 1997. This was a big issue for the MOZ, it's not like an internet connection went down. Still, I lean on the side of needing to trust my services will be up as opposed to spending the time constantly keeping everything in triplicate.
I also DO NOT recall or use inappropriate language so I have no idea what was edited.
All in all, the MOZ is pretty good. Mostly I feel like this service collects data from other mostly-free sources, so while I like what the MOZ does in general it's not a miracle - but we appreciate the organization of information. 100% of what the MOZ does might represent a small percentage of what we throw at SEO overall but it's an important percentage and does save us time.
Still, the information we use takes much exporting and reshuffling of data so we actually add a lot more muscle to this after the fact because of all the tools, functionality and cross-over sources it doesn't include - some of which might be impossible because it's simply industry related or proprietary methods no one wants public, or they can't get to the source data. I'm not about to start listing my methods or ways to improve MOZ in this post!
Next time I will take my business issues directly to the MOZ and keep my attitude off the discussion board since it just seems to upset people and I don't think anyone needs to waste their time defending the MOZ.
Hi Sorbie,
Since it's been quite a while since this happened, I don't recall exactly what was edited from your comments. We do remove comments that don't follow our community etiquette.
Thanks!