Data Storage

The Sidekick Failure and Cloud Culpability

miller60 writes "There's a vigorous debate among cloud pundits about whether the apparent loss of all Sidekick users' data is a reflection on the trustworthiness of cloud computing or simply another cautionary tale about poor backup practices. InformationWeek calls the incident 'a code red cloud disaster.' But some cloud technologists insist data center failures are not cloud failures. Is this distinction meaningful? Or does the cloud movement bear the burden of fuzzy definitions in assessing its shortcomings as well as its promise?"

  • Re:Management (Score:5, Insightful)

    by sopssa ( 1498795 ) * <sopssa@email.com> on Monday October 12, 2009 @10:32AM (#29718659) Journal

    As always, cloud computing/hosting/whatever is a vague term, used like any other buzzword. I just see it as a platform where resources are allocated automatically and the underlying system takes care of keeping them available.

    The same failure points are there. You're just handing the trust and the management to someone else. Even if they do have backup plans and certain levels of redundancy, it can always fail. Cloud computing isn't something magical.

    “Similarly datacenters fail, get disconnected, overheat, flood, burn to the ground and so on, but these events should not cause any more than a minor interruption for end users. Otherwise how are they different from ‘legacy’ web applications?”

    That's because they aren't. The system is just managed by someone else, and it's managed for thousands of people at the same time, so it's cheaper. Kind of like what Akamai has been doing for a long time with their content delivery network - it's cheaper for the providers because they don't have to build the infrastructure themselves, and it's cheaper for Akamai because they do it for so many clients.

  • by davidwr ( 791652 ) on Monday October 12, 2009 @10:37AM (#29718717) Homepage Journal

    If you can't trust your outsourcing partner, replace them or bring the work in-house.

  • by iamacat ( 583406 ) on Monday October 12, 2009 @10:45AM (#29718831)

    Just like people lose their stuff on personal hard drives when not backed up, they will lose cloud data when not backed up. Both kinds of computing have merits, and long-term persistence of data is not automatic with either. Most people do not place THAT high a value on backups of their cell phones; they typically sync with a PC anyway. But any business that doesn't have weekly, reliable, offsite backups of their fundamental assets should be sued by shareholders/customers for irresponsibility, whether they use the cloud or not.

  • by dFaust ( 546790 ) on Monday October 12, 2009 @10:50AM (#29718901)

    Personally, I always interpreted cloud computing as software running on some number of boxes, where that number can fluctuate without it mattering (obviously there are performance implications depending on the overall load and the number of boxes, but one box going down doesn't inherently bring down the system). One nice thing is that these boxes can be geographically distributed as well - so when one data center gets nuked, the others are safe. Now, I realize geographic distribution isn't a requirement, but even still, the press release says the data loss is due to a "server failure." Not a data center failure, but the apparent failure of a single server.

    So is this really even "the cloud"? Does that mean that Geocities was "the cloud" or that every web host out there is "the cloud" because they've got my data running on a single machine? I certainly never interpreted it that way, but I'm no expert on the matter. It seems like if this data was in "the cloud" that it could have all been retrieved off of another machine somewhere. Perhaps for some customers those other machines might not yet be completely synced with very recent updates, but that would affect a small amount of data for a subset of customers.

  • Assumptions (Score:5, Insightful)

    by eagl ( 86459 ) on Monday October 12, 2009 @10:52AM (#29718925) Journal

    Just because you're paying someone to store your data doesn't mean they care about that data as much as you do... That's one of the two big problems with cloud computing that can't be solved by technology. First, nobody cares about your data as much as you do. Second, nobody will protect your data (i.e. control its distribution and prevent unauthorized changes) to the level you find appropriate.

    It's usually a good idea to avoid using broad generalities (like I just did), but it seems like, in general, it would be a bad idea to let someone else be the sole keeper of anything even remotely important or sensitive. There are exceptions, but those seem to be internal to a company (i.e. the company runs its own cloud and has all employees use it), or military/government applications where centralized security and backup can keep user errors from becoming a real danger to the organization beyond "help, I lost my email!".

  • by trybywrench ( 584843 ) on Monday October 12, 2009 @10:53AM (#29718933)
    I think the key here is whether it was only T-Mobile's data that was lost, or whether every customer of the "cloud" was affected. If it was only T-Mobile's data, then the issue is T-Mobile's backup policy; if it was "cloud"-wide, then it's an issue with the "cloud" provider. In either case, I don't think you can paint the entire "cloud" concept as unstable. Cloud computing is really just a dynamic datacenter, with all the usual weak links and issues present in a traditional metal datacenter.
  • When you cut through the "cloud" and look into the center of things, you see that the so-called modern "cloud" computing environment is a giant computer (or computers), surrounded by high-powered priestly geeks, doling out resources to everyone, completely centralized. The priests have some new tricks to entertain the masses with, but there's nothing fundamentally different between cloud computing and IBM's vision of computing in the 1960s.

  • No true scotsman (Score:5, Insightful)

    by vadim_t ( 324782 ) on Monday October 12, 2009 @10:59AM (#29719007) Homepage

    This is awfully convenient. Something that at least to my eyes looks a lot like a cloud crashes. Cloud pundits announce:

    "if it loses your data - it's not a cloud".

    So if Amazon's S3 ever fails horribly and loses everybody's data, then it wasn't a cloud either.

  • by FauxReal ( 653820 ) on Monday October 12, 2009 @11:02AM (#29719053)

    But some cloud technologists insist data center failures are not cloud failures. Is this distinction meaningful?

    Do you think the customer will want to argue semantics with you after you've lost their data?

  • Re:Management (Score:5, Insightful)

    by dkf ( 304284 ) <donal.k.fellows@manchester.ac.uk> on Monday October 12, 2009 @11:03AM (#29719057) Homepage

    Well, there is one difference. Cloud computing and virtual servers are to computers what keychains are to keys: they enable you to lose everything at once.

    It's not really a difference. With home-grown datacenters you still have that risk unless you do something like building multiple redundant buildings in different locales and managing some kind of replication and backup strategy. But then all of that stuff is the same with going to a Cloud provider, except you're not having to futz around with the physical facilities yourself.

    There's no magic. All we're seeing is stupid people getting burned because they didn't use basic due diligence.

  • Re:Management (Score:4, Insightful)

    by dFaust ( 546790 ) on Monday October 12, 2009 @11:03AM (#29719067)
    But if Akamai loses a server, I don't have to repopulate the gigs of data they're hosting for me - it's not lost, it's just no longer on that particular server that died. That's exactly why I consider Akamai to be "the cloud" and why it doesn't seem like Danger was. Especially with an infrastructure like Akamai's or Google's, where things are geographically distributed, you just don't hear about servers dying, and you might not even hear about data centers dying (unless it places an unusually high burden somewhere and causes performance issues - but you don't hear about data loss as a result).
  • by Anonymous Coward on Monday October 12, 2009 @11:05AM (#29719079)

    I don't think that has anything to do with it; at least not for me. My main concern with cloud computing is trust. Do I trust someone other than myself to not fuck up and lose all my data? For critical data, the answer is no. If somebody is going to fuck up and lose all my data, it's going to be me. I don't know if all the data on a Sidekick would qualify as critical, but it would certainly be annoying as fuck to lose it all.

  • by rickb928 ( 945187 ) on Monday October 12, 2009 @11:08AM (#29719103) Homepage Journal

    I'm a TMO subscriber, and I love them, so this is painful. And my sister-in-law is a longtime Sidekick user, so she's in a special agony.

    But T-Mobile is in a potentially no-win situation. They obviously have to believe Danger/Microsoft when they say they have good processes to avoid and recover from such failures. They didn't, and now TMO is probably going to take the hit. On one hand, they should - if the service is important, take responsibility and ensure it's managed properly. On the other hand, they had good assurances, so hey, how much is enough?

    BlackBerry users, you should take note. RIM differs only in scale. And, you hope, depth of resilience. Not that RIM hasn't had outages, though no total failure yet.

    TMO may have to tell their Sidekick users to be prepared for the inevitable restore, work with Danger/Microsoft to re-establish service (even though TMO doesn't provide the service, D/M does), and of course offer some monetary compensation, no matter how inadequate.

    And maybe offer them shiny new myTouch3Gs to give the disillusioned Sidekick users an option with a marginally better track record.

    No, wait, that isn't right. I've had to wipe my G1 every update, and some apps don't have a way to save data. They just don't.

    I'm glad I never got on the Sidekick train, but I have no hope that this won't some day hit me. Do you suppose the next major Sidekick update will include data backup? :)

  • by RR ( 64484 ) on Monday October 12, 2009 @11:11AM (#29719141)
    This is a service run by Microsoft. Microsoft is a bit hostile to consumers. It would be ironic and sad if Microsoft's failure to maintain the Sidekick service gets blamed on the faceless "Cloud" and it hurts Microsoft's competitors.
  • predictably doomed (Score:4, Insightful)

    by jipn4 ( 1367823 ) on Monday October 12, 2009 @11:15AM (#29719181)

    Danger held your data hostage from the start and didn't provide backup. Then, when Microsoft took them over, it was clear that they were going to mess with the service and servers. No backup + Microsoft mucking with the servers = kiss your data goodbye.

    But that's no more an indictment of hosted services or "cloud computing" than a Windows BSOD is an indictment of desktop computing. Microsoft screwed up, and quite predictably, too.

  • by John Hasler ( 414242 ) on Monday October 12, 2009 @11:16AM (#29719191) Homepage

    Just define away your problems. ROFL.

  • by Prototerm ( 762512 ) on Monday October 12, 2009 @11:16AM (#29719193)

    Why on Earth would you trust your valuable data (and if it wasn't valuable to you, why keep it in the first place?) to someone else, someone who doesn't answer to the same people you do? I have always thought that "the cloud" is an epic fail waiting to happen. As a concept, it makes no sense. It's a scheme worthy of Professor Harold Hill himself.

    You want your data safe? You want it backed up properly? Don't want to lose it? Then put it on your own hardware and take care of it yourself. Don't leave it to someone else to save your bacon when something goes wrong. Because, in the end, they don't care about you. You're just a monthly fee to them, and the agreement/contract/whatever you signed with them absolves them of all responsibility.

  • Sort of (Score:3, Insightful)

    by Kirby ( 19886 ) on Monday October 12, 2009 @11:22AM (#29719255) Homepage

    Well, any time you're storing data in a central place, you have a greater consequence of failure. That's a downside of "cloud computing", or of any web application that stores data in a central database.

    The alternative approach is for everyone to have a local copy of their data, which will be lost by individuals all the time, but not by everyone all at once.

    Obviously, if you have a server that's a single point of failure for your company, and you botch a maintenance, something went very wrong. And not having a backup - it seems strange for a company that's been around the block a few times and has big resources behind it. You have to write this off as more of a specific failure and not a failure of the concept of storing data on a remote server.

    I do have a good friend that works for Danger - I really don't envy the week he must be having.

  • by sopssa ( 1498795 ) * <sopssa@email.com> on Monday October 12, 2009 @11:22AM (#29719259) Journal

    Tip: If you want to link to a specific part of a YouTube video, you can append #t=1m3s etc. to the URL, e.g. http://www.youtube.com/watch?v=kcFUDvTFokg#t=1m40s [youtube.com]

    Also, adding &hd=1 gives the HQ/HD version.
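
    For anyone who wants to script that tip, here is a minimal sketch in Python that builds such a link from a plain seconds count (the youtube_link_at helper is made up for illustration, not any YouTube API):

    # Build a "start at this time" YouTube link of the form described above:
    # video URL + "#t=<minutes>m<seconds>s", optionally with "&hd=1" appended.
    def youtube_link_at(video_id, total_seconds, hd=False):
        minutes, seconds = divmod(total_seconds, 60)
        url = "http://www.youtube.com/watch?v=" + video_id
        if hd:
            url += "&hd=1"  # request the HQ/HD version, as noted in the tip
        return url + "#t={0}m{1}s".format(minutes, seconds)

    print(youtube_link_at("kcFUDvTFokg", 100))            # ...#t=1m40s
    print(youtube_link_at("kcFUDvTFokg", 100, hd=True))   # ...&hd=1#t=1m40s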

  • Re:Cloud Failure (Score:3, Insightful)

    by slim ( 1652 ) <john.hartnup@net> on Monday October 12, 2009 @11:24AM (#29719283) Homepage

    I know my songs, videos, and other important files are backed up across triple drives. I don't know if the same would be true if I stored them online, and this major failure of Sidekick demonstrates I'm right not to trust them.

    That depends entirely on the online storage service you use. If your contract says the files are backed up across triple drives, then you've a right to expect that they are. If your contract doesn't say that, then you shouldn't expect it. Simple.

    Now, I'd argue that any cloud service worthy of the name ought to have very robust mirrored storage. But since there's no legal definition of the word, you'd better read the contract.

  • by Attila Dimedici ( 1036002 ) on Monday October 12, 2009 @11:28AM (#29719335)
    I wish I had mod points, because that is the best summation of "cloud" computing I have read yet. Every few years some technological development causes this computing paradigm to be brought up as the "new thing" in computing. Every time this happens, there are all these people talking about how it is the "wave of the future" and how all computing will go that way. After a few years, people realize that it has the same limitations that caused it to be rejected before, except in those limited industries and applications where it is a good idea. Of course, most of those were already using the old way of doing things and only move to the new implementation where it is an improvement over the old way.
    There are always companies who massively push this "new" approach because it is a great way to guarantee a steady income stream.
  • Re:Management (Score:4, Insightful)

    by QuantumRiff ( 120817 ) on Monday October 12, 2009 @11:38AM (#29719479)

    True, but how much more money and brain power does Google have to invest in datacenter design and disaster recovery than your local college?

    Seriously... I worked at one. All our stuff was on "next day parts" from Dell. We had a single internet connection to the campus, a single Linux-based sendmail email server, etc.

    Granted, I had tapes up the wazoo, and could retrieve any file for the past X years, but downtime is still downtime.

    Then you have Google, with multiple sites, multiple connections, replication, Load balancers, etc.

    Not only do they have more to invest, but when they call up a vendor and say "we are Google, we have an outage, and we need some things from you" I bet those vendors jump a little faster than when a local school IT guy calls them up..

  • Re:Management (Score:5, Insightful)

    by wickerprints ( 1094741 ) on Monday October 12, 2009 @11:44AM (#29719577)

    To be fair, Sidekick users didn't have a viable means to back up their personal data that was being pulled from Microsoft/Danger servers. I don't think it's reasonable to expect the users to find some hack or unofficial method to copy all their data from their devices. The only blame they could be assigned is that they bought the service being sold. Your criticism would be valid for, say, iPhone users, since the user has a backup stored on their computer. But no such functionality exists for the Sidekick, as far as I am aware.

    And as to who is really being burned here.... Obviously not Microsoft/Danger. Microsoft doesn't give two shits about this, since their acquisition of Danger in 2008 was really about cannibalizing their talent for Windows Mobile 7, as the Pink project has shown. Danger is just a shell of its former self--the damage was done long before this latest failure, which I think was an inevitable consequence of the acquisition. The ones who got burned are T-Mobile (for trusting Microsoft to manage Danger, and Danger to maintain a proper backup solution), and of course, the consumers.

    The real issue, of course, is that data is always at risk of being lost no matter how, where, or in what amount it is stored. The passage of time guarantees it. But people want to believe in the existence of certainties, in the notion that if something has a 99.9999% reliability, then we can effectively ignore the minuscule probability of failure. But failures happen all the time and there is no such guarantee. We need to rid ourselves of this delusion that data can somehow be made "safe," that risk can be ignored when made small. Cloud computing is just the flavor of the day.

    I knew someone who worked at Danger years ago when the company was still fairly new. It was, at the time, an amazing technology. There was nothing like it. They had so much going for them, and there was a lot of good talent working there. One thing that impressed me was how they solved the problem of mobile web browsing. At the time, mobile web browsing seriously sucked ass. It was not only slow, but many sites simply would not load. Danger solved that by re-parsing the sites on their servers so that pages would look good and function properly on your mobile device. It was the best solution until mobile OSes and hardware became powerful and complex enough to support full browsing; and even then, the UI needed to be tightly integrated before browsing became efficient instead of tedious. It's sad to see such a pioneering company wither on the vine.

  • by Anonymous Coward on Monday October 12, 2009 @11:45AM (#29719599)

    Nothing can correct something like this, which involved an error propagating to the backups.

    If you can instantly corrupt your "backups", they're not backups. If you're doing anything serious and you don't have offline copies at a remote location going back at least a few months, you're doing something very wrong.
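
    To make "going back at least a few months" concrete, here is a rough sketch in Python of the kind of retention the parent implies - keep recent backups densely and older ones sparsely. The specific tiers (daily/weekly/monthly) and cutoffs are illustrative assumptions, not anyone's actual policy:

    from datetime import date, timedelta

    def backups_to_keep(backup_dates, today=None):
        """Pick which backups to retain: every one from the last week,
        one per ISO week for roughly the last three months, and one per
        month beyond that."""
        today = today or date.today()
        keep = set()
        seen_weeks, seen_months = set(), set()
        for d in sorted(backup_dates, reverse=True):   # newest first
            age_days = (today - d).days
            if age_days <= 7:
                keep.add(d)                            # daily tier
            elif age_days <= 90:
                iso = d.isocalendar()
                week = (iso[0], iso[1])                # (year, week number)
                if week not in seen_weeks:
                    seen_weeks.add(week)
                    keep.add(d)                        # weekly tier
            else:
                month = (d.year, d.month)
                if month not in seen_months:
                    seen_months.add(month)
                    keep.add(d)                        # monthly tier
        return keep

    # Example: a year of nightly backups collapses to a few dozen kept copies.
    nightly = [date.today() - timedelta(days=i) for i in range(365)]
    print(len(backups_to_keep(nightly)))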

  • by Locutus ( 9039 ) on Monday October 12, 2009 @11:49AM (#29719653)
    Microsoft gutted Danger and left it on life support, but all the while they led their customers (T-Mobile and users) to believe Danger was thriving and doing fine. Wow, doesn't that sound like Paulson in early 2007 stating that the banking system was just fine? The difference: Paulson really was clueless, while Microsoft knew darn well they'd pulled most of Danger's developers over to their project Pink.

    This is what should be up in lights with flares and fireworks - not anything about how good or bad cloud computing is. But once again, there is Microsoft at the wheel, and yet the press is saying "pay no attention to that man behind the curtain".

    And this interest in tying this to cloud computing sounds eerily familiar, since I just read how Steve Ballmer was bashing IBM for not running their business correctly - basically, for paying too much attention to software and cloud computing - and he's all amped up about this right when yet another Microsoft failure proves how bad they are at it. Could be spin control, so watch for more of the same if it is.

    LoB
  • by natoochtoniket ( 763630 ) on Monday October 12, 2009 @11:51AM (#29719681)

    There is one difference.

    In previous decades, for the most part, the company that operated the computing center considered the data to be valuable, and took great care to prevent data loss. They knew that the hardware could fail, and so they made multiple copies of each data file. They did backups, and they checked and tested the backups. Most even stored some copies off-site to hedge against the possibility of catastrophic loss of the entire data center.

    At the present time, many young people have never seen data loss. Many people do not realize that hardware failure is even possible. If they make backups, they rarely check or test the fidelity or reliability of those backups. Those same people are administering data center operations. Managing the disk farm, replacing failed mirrors, and making backups of customers' data are all activities that are part of the service. As far as many of the MBA types are concerned, all of those are just costs to be minimized.

    A single disk might have an MTBF of 30 years. But a system that uses ten thousand disks will have an MTBF of about a day. (On average, a disk will fail somewhere in the system every day - a rough calculation is sketched after this comment.) RAID systems do not eliminate the issue, because simultaneous disk failures are possible. And a power-supply failure, fire, explosion, software failure, or an employee can kill a whole bunch of disks all at once.

    In my own organization, I want to know where my data is. I want mirrored disks to minimize the operational effects of common hardware failure, and off-line/off-site backups so we can stay in business after an uncommon failure. I want to review the backup schedule. I want regular verification of backup status. I want periodic audits of the backups, to be sure they really exist and that they can really be read. And, when the data is vital to the continuance of my business, that verification and auditing must not be outsourced.

    Whenever your MBAs want to cut the cost of doing backups, you really should check with the underwriter of your business-continuation insurance.
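
    A back-of-the-envelope version of the disk arithmetic above, in Python, assuming independent failures and using the numbers the parent quotes:

    # A fleet's mean time between disk failures is roughly the single-disk
    # MTBF divided by the number of disks, assuming independent failures.
    single_disk_mtbf_years = 30
    disk_count = 10000

    fleet_mtbf_days = single_disk_mtbf_years * 365 / disk_count
    print("expect a disk failure roughly every %.1f days" % fleet_mtbf_days)
    # -> roughly every 1.1 days, i.e. about one dead disk per day system-wide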

  • Re:Management (Score:3, Insightful)

    by StuartHankins ( 1020819 ) on Monday October 12, 2009 @12:00PM (#29719819)

    To be fair, Sidekick users didn't have a viable means to back up their personal data that was being pulled from Microsoft/Danger servers. I don't think it's reasonable to expect the users to find some hack or unofficial method to copy all their data from their devices.

    Absolutely correct. Wish I had mod points.

  • Re:Management (Score:5, Insightful)

    by BrokenHalo ( 565198 ) on Monday October 12, 2009 @12:05PM (#29719869)
    This all comes back to the thrust of the OP: whether the apparent loss of all Sidekick users' data is a reflection on the trustworthiness of cloud computing or simply another cautionary tale about poor backup practices.

    The simple truth, of course, is that it is both. And the only solution here is the old one: if you want something done properly, you will have to do it yourself. If your data, documents or whatever are in any way important to you, you should not be relying on anyone else to keep them safe. Simple as that, and no excuses.
  • by Anonymous Coward on Monday October 12, 2009 @12:20PM (#29720061)

    It's symptomatic of the times, I think. "Git 'R Dun!" means provide the superficial essentials as fast and as cheap as possible and don't sweat the details. Once it's running, forget about filling in the missing stuff - all it does is make the product more expensive; and besides, "nothing like that's ever going to actually go wrong". Pat yourself on the back because you're so "productive". Rinse, repeat on next product.

  • Re:Management (Score:3, Insightful)

    by Eskarel ( 565631 ) on Monday October 12, 2009 @12:22PM (#29720081)

    That's really a rather idiotic statement.

    If your data is important, then you take its storage seriously. Sometimes that means you host it yourself; sometimes it means you get someone else to host it for you. You don't host your critical data yourself if you can't afford the staff and infrastructure to support it, and if you've already got the staff and infrastructure, you don't pay someone else to do it.

    The important thing is that you take it seriously. That means contracts with your data storage provider with exactly what backup and restoration services they're promising and penalties for failing to meet those promises. It means full disaster recovery plans and proper due diligence including understanding what kind of outages you can afford and what it's going to cost you to keep outages under that value.

    There's nothing inherently more safe about storing your own data or inherently unsafe about having someone else do it. In most cases the person who stores the data and the person who actually needs it are different anyway. What is unsafe is trusting someone else to look after your data without checking up that they actually are, be they internal or external.

  • Re:Management (Score:2, Insightful)

    by Anonymous Coward on Monday October 12, 2009 @01:14PM (#29720739)

    Although I agree to some extent with your argument, the fact remains that it sure didn't work out in this case!

    Sometimes a company will so thoroughly fall in love with the savings of server consolidation that they fail to implement (and especially TEST) their nifty backup and failover infrastructure. You might think that a company that SELLS cloud-style services would be at the forefront of robust testing. Evidently not in this case. Competent datacenter sysadmins are an endangered species.

    Sidekick users would have been better off with self-service USB backup to a laptop. Even if some of them neglected or screwed up their backups, they would have (at worst) the same scenario they have now. There is something to be said for being in charge of your own destiny.

    This is the problem I have with outsourced cloud-style services. Because so much risk has been consolidated into a single service provider, they either run a spectacular operation or they fake it until disaster strikes. In the end, there is no guarantee that what they do will perform any better than a homebuilt Linux box.

  • by James McP ( 3700 ) on Monday October 12, 2009 @01:40PM (#29721171)

    "Real" cloud computing is supposed to be based on a mesh of geographically diverse, redundant servers each carrying various subsets of the data. Think RAID5 for servers, with each partition located in a different part of the world and on different networks.

    Which means it is nothing more than an internet-based service with five nines of reliability and availability.

    However, it is an *expensive* internet-based service, so it needs a new moniker. But without a "Cloud Computing Consortium" that owns the trademark "cloud computing" and enforces correct usage, there's nothing to prevent everyone and their dog from using the term incorrectly for any "always connected" application.

    The problem with all this is that it's almost impossible for an end user to know for sure whether someone really has a proper cloud application until something fails. If there is a total failure of a site and no one notices, you've got a working cloud. If people lose data or functionality, you don't.
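
    A toy illustration, in Python, of the "RAID5 for servers" analogy above: several equal-sized data shards plus one XOR parity shard, each imagined to live at a different site, so that losing any single site is recoverable from the others. Real systems use fancier erasure codes and replication; this is only a sketch of the principle:

    from functools import reduce

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    # Three equal-length data shards, each imagined to sit in a different
    # data center, plus an XOR parity shard stored at a fourth site.
    shards = [b"user mail  ", b"user photos", b"user notes "]
    parity = reduce(xor_bytes, shards)

    # Simulate losing the second site and rebuilding its shard from the rest.
    survivors = [shards[0], shards[2], parity]
    recovered = reduce(xor_bytes, survivors)
    assert recovered == shards[1]
    print(recovered)   # b'user photos'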

  • by John Whitley ( 6067 ) on Monday October 12, 2009 @02:44PM (#29722271) Homepage

    Heck, I know folks who've lost entire well-known (hobbyist) web-portals some years back due to provider server failures. It was a harsh lesson for those involved. So much for the provider's backup policies. The real solution is to have multiple copies of the data, ideally in different formats. For example, when I was in grad school the University had (for the time) a huge email installation, basically full email hosting for the entire institution. The server and storage spec was excellent -- a big SAN-like dual storage array that could handle failures at multiple levels, including one entire half of the storage system. Turns out they got hit by a nasty filesystem corruption bug, which nuked the whole array. Oops. Their bacon was saved because they also had regular verified tape backups (IIRC, it took many, many weeks to fully restore archived mail to the cluster).

    These problems really have little to do with the computing models involved. There's a misperception that the "cloud" provides some sort of data robustness beyond what mere mortals can accomplish, but the reality is that valuable data just needs more copies. Perhaps their backup strategies are layered and awesome, but you never really know where the weak links are. One remote service provider really only ever counts as one copy. And so it's useful to consider a service like GitHub [github.com]. The fundamental model of the service is to encourage folks to share and copy their data around, because that's a prime goal of the supporting software: git [git-scm.com]. If a git-based service goes down, there should be many copies of the repository data, and the various users will regroup, republish, and move on. No single user has to be overly conscious of maintaining lots of backups, because copying is the basic working model.

    There's a lesson there for those of us working in software: design for subversive backup, where critical data is backed up/synced/secured as a normal part of day-to-day workflow. Make sure that failure in any one point doesn't induce the others to similarly fail or become corrupt. Think through and verify the recovery schemes. Imagine that it's your data going down the tubes...
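
    One low-tech way to make copying "a normal part of day-to-day workflow", in the spirit of that last paragraph, is to push every commit to all of a repository's configured git remotes. A minimal sketch in Python - the idea of wiring it into a post-commit hook or cron job, and of having a second remote at all, are assumptions on my part; only the standard git remote/git push commands are used:

    import subprocess

    def push_everywhere(repo_path="."):
        """Push all branches to every remote configured for the repository,
        so a failure at any one hosting provider still leaves full copies
        elsewhere."""
        remotes = subprocess.run(
            ["git", "-C", repo_path, "remote"],
            capture_output=True, text=True, check=True,
        ).stdout.split()
        for remote in remotes:
            subprocess.run(["git", "-C", repo_path, "push", "--all", remote],
                           check=True)

    if __name__ == "__main__":
        push_everywhere(".")   # e.g. call this from a post-commit hook or cron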

  • Re:Management (Score:3, Insightful)

    by mcrbids ( 148650 ) on Monday October 12, 2009 @05:04PM (#29724307) Journal

    Don't confuse downtime (e.g. a server powered down) with a catastrophic failure like this one (total, irretrievable data loss).

    Your school was a far better place (apparently) than MS/Danger. Although downtime was more likely with your single sendmail server, you would still expect about 99.9% to 99.99% uptime year on year. That equates to about 4 hours of downtime per year on average, which is definitely down in the 'minor inconvenience' range for a school.

    And your risk of catastrophic failure with all the (verified?) tapes is near zero. Sounds like a good solution to me!
