Drive Failures Affecting Some Customers' Rankings and Reports

Comments 33

Gianluca Fiorelli

2012-05-02T16:54:16-07:00

Thanks for the info and update.

Just one question, which I know is not only mine: the CSV exports from OSE have been getting really slow in the last few days, sometimes so slow that I find myself deleting my requests.

Is this issue, which honestly is quite painful, especially when you have to audit sites, related to this problem, or is it caused by something else? Something like this also happened a couple of months ago, but I was told it was related to an outstanding volume of link export requests.

Thank you

(Since this is a problem that others have had too, I preferred to raise it on this post instead of through the feedback form.)

gfiorelli1 edited 2012-05-02T16:55:23-07:00
5 1

- Aaron Wheeler
 
 2012-05-02T18:06:25-07:00
 
 Hey Gianluca! The OSE export slowness is an unrelated known issue that we're looking into. It's been going on for a couple of days and we're not yet sure of the root cause, but once we find out, speeding it back up will be a top priority! Thanks for asking. I know it's really painful to have to wait a long time.
 
 2 2
 
 - A-W
 
 2012-05-03T00:29:33-07:00
 
 Oh, so there is an OSE export issue? I was trying to export data for a site yesterday and it hung for quite a long time. I started to doubt my browser and thought there might be some issue with my net connection.
 
 I hope you find the fault and fix it. Best of luck!
 
 2 1
 
 - Fusion Unlimited
 
 2012-05-03T03:43:37-07:00
 
 Couple of days? More like since the new interface.
 
 2 0
 
 - Carin Overturf
 
 2012-05-03T08:27:58-07:00
 
 Adding an update:
 
 The slowness on OSE was caused by an unbalanced load across our API cluster after launching the new index on Tuesday. The spike in traffic brought to light a small configuration change that needed to be made.
 
 Our engineers were able to resolve this around 5pm PST last night, so you should see better page load and export times today!
 
 There was a separate, unrelated issue on Sunday night when our machine writing and exporting the CSVs fell over. Our ops team recovered it quickly but, unfortunately, there was a big backlog to work through, which caused some weird behavior on reports in flight.
 
 If you're still having any issues with reports not finishing up, let us know and we will look into them!
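To illustrate how a small configuration value can unbalance load across a cluster, here is a minimal sketch in Python. It is not the actual SEOmoz setup: the comment does not say what the configuration change was, so the weighted round-robin scheme, the node names (`api-1` and so on), and the weights are all hypothetical.

```python
from collections import Counter
from itertools import cycle


def distribute(requests, weights):
    """Assign `requests` requests to nodes using weighted round-robin.

    `weights` maps a (hypothetical) node name to a positive integer weight.
    Returns a Counter of how many requests each node received.
    """
    # Expand each node into as many slots as its weight, then cycle over them.
    slots = [node for node, weight in weights.items() for _ in range(weight)]
    counts = Counter()
    for _, node in zip(range(requests), cycle(slots)):
        counts[node] += 1
    return counts


# Hypothetical misconfiguration: one node carries a much larger weight.
skewed = distribute(1200, {"api-1": 4, "api-2": 1, "api-3": 1})
# After the (assumed) fix: equal weights.
even = distribute(1200, {"api-1": 1, "api-2": 1, "api-3": 1})

print("skewed:", dict(skewed))
print("even:  ", dict(even))
```

With the skewed weights a single node receives most of the traffic, so a traffic spike overloads it while the other nodes sit mostly idle. Equalizing the weights spreads the same number of requests evenly.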
 
 2 0
 
 - A-W
 
 2012-05-03T23:59:36-07:00
 
 Thanks for the quick updates!
 
 1 0
 
RodneyRiley

2012-05-03T06:43:17-07:00

Your experience with SSDs is very interesting. I hadn't realised that they were so tightly bound by the number of read/write cycles; I'd always taken those figures to be a 'rough guide'. Your mistake will stop me making the same one in a few months when we change our network drive over to a RAID NAS box. SEOmoz teaching me stuff I didn't know again :-)

3 0

PeterAlexLeigh

2012-05-03T03:40:36-07:00

I genuinely find it incredible that a mechanical drive can outlast an SSD! But anyway, thanks for the update.

I've been experiencing extreme problems with OSE as well - I end up deleting about 90% of the reports I'm running because they're just hanging for 1+ days.

Hope you get it fixed soon!

3 0

Thomas McElroy

2012-05-03T10:32:21-07:00

Update for 5/3/2012- Monday's rankings are collected and Tuesday's are progressing as expected. We have started reprocessing the custom reports, and we expect them to be completed and back to normal by Saturday. We will update this post again tomorrow morning.

3 0

- Thomas McElroy
 
 2012-05-04T11:59:30-07:00
 
 Update for 5/4/2012- Monday's and Tuesday's rankings are collected and Wednesday's are over 80% complete. April monthly reports and last week's reports are being re-created more slowly than normal, and we expect them to be completed and back to normal by Sunday night.
 
 We will update this post again Monday.
 
 1 0
 
 - Thomas McElroy
 
 2012-05-07T11:24:08-07:00
 
 Update for 5/7/2012- Rankings are back on track and up to date. April monthly reports and last week's weekly reports have been re-generated.
 
 We are back to normal. Thanks for your patience over the last week.
 
 1 0
 
bhennings

2012-05-03T11:12:30-07:00

Drive failure? Hey.. that's one of our keywords!

Congrats on the funding. It's great to know that SEOmoz will continue to grow.

2 0

Matt Beswick

2012-05-02T16:19:00-07:00

Lovely way to recover from the hangover of celebrating the investment round! ;) You guys are really under it at the moment and from what I can see the root cause of this is storage that keeps failing...

Out of interest, why isn't it just a case of using RAID storage to make sure you have ample redundancy? (i.e. is it down to cost, i/o, etc.?).

2 0

- Erica McGillivray
 
 2012-05-02T16:23:41-07:00
 
 Indeed. :) One of our engineering team's major projects right now is fixing our current storage problem.
 
 1 0
 
- Thomas McElroy
 
 2012-05-02T16:52:04-07:00
 
 Great question! I had the same thought when the first problem occurred. We hadn't been using RAID in these machines because the data is redundant across the cluster, and we don't need RAID for performance because we are using SSDs. Interestingly enough, had we been RAIDing the drives together, we would likely have had the same problem. :-(
 
 The issue lies in using many SSDs with the same load as redundant drives for each other: they will all fail at the same time. SSDs have a very bounded number of R/W cycles until they fail. In a normal RAID configuration, they would get the exact same number of R/W usages, leading to failures at nearly the same time (which was our problem).
 
 Fundamentally, the storage architecture we were using was designed for spinning disk drives. We wanted it to be faster, so we used SSDs. However, that introduced a reliability risk: our load is very balanced among the servers, so the SSDs (unlike spinning disks) all "wore out" at the same time.
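The correlated wear-out described above can be sketched with a toy model in Python. Everything here is an assumption for illustration, not SEOmoz's data: each drive has a fixed write budget, wear is driven only by writes, real endurance varies by about 2% around the rated value, and the workload numbers are invented.

```python
import random


def failure_days(daily_writes, endurance, seed=0):
    """Return the day on which each drive exhausts its write budget.

    daily_writes: writes per day for each drive (hypothetical units).
    endurance:    rated write budget per drive, in the same units.
    """
    rng = random.Random(seed)
    days = []
    for writes in daily_writes:
        # Assumption: true endurance is within +/-2% of the rated value.
        budget = endurance * rng.uniform(0.98, 1.02)
        days.append(int(budget // writes))
    return days


def spread(days):
    """Gap in days between the first and the last drive failure."""
    return max(days) - min(days)


# Perfectly balanced load: every drive sees the same writes per day.
balanced = failure_days([100] * 6, endurance=100_000)
# Uneven load: the same kind of drives, but with different write rates.
uneven = failure_days([60, 80, 100, 120, 140, 160], endurance=100_000)

print("balanced failure days:", balanced, "spread:", spread(balanced))
print("uneven failure days:  ", uneven, "spread:", spread(uneven))
```

In this model the balanced drives all fail within a few weeks of each other, which leaves little time to replace one before the next goes. Drives under uneven load fail over a span of years. This is why redundancy across identically loaded SSDs gives less protection than it appears to.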
 
 7 0
 
cdigital

2012-05-06T21:58:22-07:00

The end-of-month reports should come out on the first day of the new month. Stop selling new subscriptions and clean up your systems before you grow any more. It's all well and good to keep bringing on new customers, but servicing existing customers should come first.

2 0

AimHomeServices

2012-05-02T16:08:58-07:00

Thanks for the heads up, just noticed this as I had my email informing me of my new keyword rankings but once logged in was seeing old rankings from 26th April. Hope you fix it soon!

2 0

- Erica McGillivray
 
 2012-05-02T16:24:14-07:00
 
 You're welcome. It should be fixed by the end of this weekend.
 
 2 0
 
Highland

2012-05-03T13:38:42-07:00

I am hereby blaming this on Penguin (it seems like a good bandwagon to jump on at the moment). Curse you, Google, for taking away our SEO tools!

For those wondering about SSDs and failures, you need to understand that SSDs are just large Flash RAM arrays, just like USB thumb drives. They are designed to be semipermanent (meaning that, unlike their regular RAM brethren, they don't need power to store data) and, because they're not spinning platters, they are vastly faster than HDDs. The tradeoff is that they have a limited life span. So for something that is high I/O but needs speed, you're looking at replacing SSDs on a regular basis.

Technical stuff https://www.storagesearch.com/bitmicro-art1.html
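A common back-of-the-envelope way to estimate that life span is to divide the total data a drive can absorb (capacity times rated program/erase cycles) by the data written per day, scaled by write amplification. The Python sketch below uses that rule of thumb. The function name and the example figures are hypothetical and do not describe any specific drive or the drives in this incident.

```python
def estimated_lifetime_days(capacity_gb, pe_cycles, daily_writes_gb,
                            write_amplification=1.0):
    """Rough SSD lifetime estimate in days.

    capacity_gb:         usable capacity of the drive.
    pe_cycles:           rated program/erase cycles per flash cell.
    daily_writes_gb:     data the host writes per day.
    write_amplification: extra internal writes per host write (>= 1.0).
    """
    if daily_writes_gb <= 0 or write_amplification <= 0:
        raise ValueError("write rate and write amplification must be positive")
    total_writable_gb = capacity_gb * pe_cycles
    effective_daily_gb = daily_writes_gb * write_amplification
    return total_writable_gb / effective_daily_gb


# Hypothetical drive: 256 GB, 3000 P/E cycles, 500 GB written per day,
# write amplification of 2.
days = estimated_lifetime_days(256, 3000, 500, write_amplification=2.0)
print(f"estimated lifetime: {days:.0f} days")
```

The estimate ignores wear-leveling imperfections and over-provisioning, so treat it as an order-of-magnitude guide for planning replacements rather than a prediction.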

2 0

Lee Jackson

2012-05-03T10:15:33-07:00

Appreciate the transparency, problems happen but thumbs up for your efforts to resolve them!

2 0

Louis Marcus

2012-05-08T11:56:28-07:00

What's happening with the anchor text in OSE? It still shows a notice that the data is from Feb.

1 0

- Erica McGillivray
 
 2012-05-08T12:43:18-07:00
 
 In order to fix a problem and ship the index sooner rather than later, we decided not to update the anchor text this time around. We figured that our community would rather have fresh data with old anchor text than no update at all. Currently, the anchor text is scheduled to update between 4/30 and 5/9. I apologize for any problems this may have caused you.
 
 2 0
 
 - Louis Marcus
 
 2012-05-08T14:45:02-07:00
 
 Perfectly understood!
 
 1 0
 
BWIRic

2012-05-11T09:05:22-07:00

Thanks for the update - will get round to looking over the massive index soon.

1 0

Jeff Downer

2012-05-03T08:32:59-07:00

Thanks for the update. I was trying but with no joy. I had no worries though, I knew you folks were on top of it. It's one of the bumps on the road when you're dreaming big, don't stop.

1 0

Sha Menz

2012-05-05T20:42:34-07:00

Hi Thomas,

Thanks for keeping us up to date on what is happening.

To be honest, with all the Google rollouts and volatility in the SERPs during the past couple of weeks, the new timing of reports for one particular client was actually a Godsend :)

Sorry it caused a lot of extra work for you guys and anxiety for some, but I'll happily grab the silver lining which will give me updated rankings exactly when I need them this week!

Sha

1 0

ErikJohnson

2012-05-06T07:11:40-07:00

Hello,

I was wondering if this issue extends to the crawler diagnostic tests as well?

I created a campaign last week and made the necessary changes based on what the crawler found. However, when the crawler ran again this week, there was no change whatsoever in the reports. It is totally flat.

More specifically, with the duplicate content and duplicate title errors, is there something I have to do so that rogerbot sees these changes? Most if not all missing page titles and duplicate page titles are definitely fixed. Does rogerbot crawl fresh each week, or does it crawl old pages that it indexed before? I was expecting the number to change a little at least.

Just wondering if these reports are delayed as well, or if there is something I am missing so that rogerbot does not continuously crawl old or removed pages from the site.

Thanks!

1 0

- Sha Menz
 
 2012-05-06T22:07:02-07:00
 
 Hi Erik,
 
 Best to send an email direct to the Help Team if you have any problems with your campaigns. Send email to help at seomoz.org and be sure to give them the Campaign number (in the URL when inside your campaign) or the domain so they can take a look at it for you.
 
 Hope that helps
 
 Sha
 
 1 0
 
 - ErikJohnson
 
 2012-05-08T12:31:07-07:00
 
 Thank you Sha. I appreciate the info.
 
 1 0
 
58phases

2012-05-03T09:17:01-07:00

With the recent round of funding and the growth you guys are experiencing, I'd love to see some serious money invested in technology. Recently it seems like a lot of services are down, delayed or slow. It would be really awesome to see fewer hiccups.

1 0

- Anthony Skinner
 
 2012-05-03T11:35:47-07:00
 
 We are in complete agreement. Before the funding, we started adding staff to reduce customer-impacting issues. Further, we are in the process of building our own data center at a co-location facility, which will give us better control over the quality of hardware.
 
 Anthony_Skinner edited 2012-05-03T12:04:56-07:00
 1 0
 
FFTCOUK

2012-05-04T10:37:23-07:00

This is our first week since signing up, and I am not impressed. I cannot understand why you have no backup facility for such an event. All serious businesses run a risk assessment and what-if scenarios, which you seem to have failed to do.

1 2

- Casey Henry
 
 2012-05-04T10:54:45-07:00
 
 Thanks for your comment, FFTCOUK. Our engineering team is working hard to improve the process by which we collect and store rankings for all our customers. We understand that an outage like this can affect our customers, and that's why we are working hard to solve the issue and prevent it from happening again.
 
 1 0
 
