Server Failure Destroys Sidekick Users' Backup Data

Posted by timothy on Sunday October 11, 2009 @05:29AM from the oh-well-enough-said dept.

Expanding on the T-Mobile data loss mentioned in an update to an earlier story, reader stigmato writes "T-Mobile's popular Sidekick brand of devices and their users are facing a data loss crisis. According to the T-Mobile community forums, Microsoft/Danger has suffered a catastrophic server failure that has resulted in the loss of all personal data not stored on the phones. They are advising users not to turn off their phones, reset them or let the batteries die in them for fear of losing what data remains on the devices. Microsoft/Danger has stated that they cannot recover the data but are still trying. Already people are clamoring for a lawsuit. Should we continue to trust cloud computing content providers with our personal information? Perhaps they should have used ZFS or btrfs for their servers."

This discussion has been archived. No new comments can be posted.

"they should have used ZFS or btrfs" (Score:5, Insightful)

by Manip ( 656104 ) writes: on Sunday October 11, 2009 @05:34AM (#29709757)

This seems a rather silly point to make. I know this is Slashdot and we have to suggest Open Source alternatives but throwing out random file systems as a suggestion to fix poor management and HARDWARE issues is some place between ignorant and silly.
Perhaps they should have had at least mirrored or striped RAID, with an off-site backup every week or so?

Re:"they should have used ZFS or btrfs" (Score:5, Insightful)

by timmarhy ( 659436 ) writes: on Sunday October 11, 2009 @05:46AM (#29709795)

retarded comments like that are the reason these zealots aren't taken seriously in the enterprise.
i'd hazard a guess that the offsite backups were corrupted as well somehow or were silently failing.

Re:Backups? (Score:5, Insightful)

by TheSunborn ( 68004 ) writes: <mtilsted.gmail@com> on Sunday October 11, 2009 @05:47AM (#29709799)

Or this was really a software error, and the backup servers in another datacenter just copied the faulty data/delete command.
They should really be far too big to have all their data stored in a single datacenter with no offsite backup. (Or they should have an entry on thedailywtf.com)

It's The Backups Stooped (Score:5, Insightful)

by tres ( 151637 ) writes: on Sunday October 11, 2009 @05:57AM (#29709837) Homepage

This is an issue of irresponsibility. Plain and Simple. The company responsible for maintaining the data should -- at the very least -- have had some full system backup from last month. If they had some old backup somewhere at least you could chalk it up to systems failure or bad backup tape or bad admin or something.
But the fact that there is no backup anywhere indicates brazen negligence on the part of everyone responsible for the data. Everyone who had a part in designing the system and managing the system is culpable. The most ridiculous part of this is the over-reliance on server-side data storage by the sidekick designers.

Re:A server failure? (Score:4, Insightful)

by Hadlock ( 143607 ) writes: on Sunday October 11, 2009 @05:58AM (#29709847) Homepage Journal

Reportedly sidekicks are thin clients, other than making phone calls, everything on the phone is saved on the server side. Which is a special kind of retarded, in today's world where a blackberry performs all the same functions, and provides a local backup feature. But yeah as for the backups, all your backups are worthless if your data backup code is flawed, and nobody ever checks the backup tapes. When MS bought the service, they probably changed the location the servers were in, plugged everything back in, and kept going. I imagine a project like that would be on a short timetable, and "checking to see that the backup tapes are really being backed up to" is low on the priority list when the service is already live.

WTF (Score:5, Insightful)

by ShooterNeo ( 555040 ) writes: on Sunday October 11, 2009 @06:08AM (#29709883)

This is unbelievably bad. The real problem is: why aren't there incremental off-site backups to another server farm? A weekly binary difference snapshot would have made this failure less catastrophic.
Ultimately, with a complex application like this, you can't guarantee 100% that the code doesn't have a bug in it that could result in loss of user data. You can be ALMOST sure it won't, but 100% is not possible with current analysis techniques. (Even a mathematical proof of correctness wouldn't protect you from a hacker.)
But a properly done set of OFFLINE backups, stored on racks of tapes or hard disks in a separate physical facility: you can be pretty sure that data isn't going anywhere.

Re:"they should have used ZFS or btrfs" (Score:5, Insightful)

by sopssa ( 1498795 ) * writes: <sopssa@email.com> on Sunday October 11, 2009 @06:12AM (#29709901) Journal

Exactly, this can be a software bug too, and that could easily destroy or corrupt backup data as well. I really doubt this service was ran without backups.
The type of filesystem has nothing to do with this.

Re:"they should have used ZFS or btrfs" (Score:5, Insightful)

by Znork ( 31774 ) writes: on Sunday October 11, 2009 @06:47AM (#29710045)

I really doubt this service was ran without backups.
Knowing 'enterprise' backups I'd bet there was at least a backup client installed and running. However, I'm equally sure that the backups were, at best, tested once in a disaster recovery exercise and were otherwise never verified.
Further, responsibility would probably be shared between a storage department, a server operations department and an application management department, neatly ensuring that no single person or function is in the position to even know what data is supposed to be backed up, what limitations there are to ensure consistency (cold/hot/inc/etc), to monitor that that's actually what does happen and that it keeps happening as the application and server configuration evolves.
Backups of dubious value do not seem to be a rarity in enterprise settings.

Re:See it as an opportunity (Score:4, Insightful)

by AnotherUsername ( 966110 ) writes: on Sunday October 11, 2009 @06:52AM (#29710055)

Now is the opportunity for opensource to show what it's good for. Someone whip together a small app to extract all info from the Sidekick, put it up on sourceforge for FREE and you have tons of goodwill for OSS. Of course, the app should be Linux-only, thus forcing all Sidekick users to install Ubuntu...

Thus eliminating any goodwill that would have been gained...

Really, if you think that open source is a viable option for the masses, you shouldn't care which operating system a powerful application like the one you describe is on. If you really care about using open source for goodwill, releasing it simultaneously on all operating systems should be your goal. How is forcing people to use Ubuntu via software applications any different from Microsoft forcing people to use Windows via software applications?

Re:"they should have used ZFS or btrfs" (Score:3, Insightful)

by Anonymous Coward writes: on Sunday October 11, 2009 @06:58AM (#29710079)

Repeat after me, you haven't got backups unless you've tested RESTORES.

Thin client: Android, too? (Score:3, Insightful)

by KlaymenDK ( 713149 ) writes: on Sunday October 11, 2009 @07:12AM (#29710135) Journal

Reportedly sidekicks are thin clients, other than making phone calls, everything on the phone is saved on the server side. Which is a special kind of retarded
Isn't that also how Android works?
I mean sure, the apps and such are on internal flash, but it's a different story for your "important" data such as email or contacts list. Heck, as I've learned, one can't even read one's existing ("synced") email without a working web connection. How they can call that "syncing", and what it's doing besides simple header indexing, is beyond me.
This is another reason I am loath to trust "the cloud" -- if I know I can be self-sufficient (in a data accessibility context), that's going to be much better than storing things on a corporate server and hoping that said corporation is not going to, um, fall from the sky.

RIP Sidekick (Score:5, Insightful)

by drinkypoo ( 153816 ) writes: <drink@hyperlogos.org> on Sunday October 11, 2009 @07:13AM (#29710145) Homepage Journal

With all the competition in the smartphone market today, this is probably an unrecoverable error. If they manage to recover the data then they will come off as heroes for having the courage to tell their customers promptly. Otherwise they just look like what they are: incompetent. No great loss, though.

Irresponsibility to EPIC proportions. (Score:3, Insightful)

by MrCrassic ( 994046 ) writes: <<li.ame> <ta> <detacerped>> on Sunday October 11, 2009 @07:25AM (#29710175) Journal

HOW THE HELL DO THEY NOT HAVE OFF-SITE TAPE BACKUPS????
So essentially, everybody's Sidekick backup data, which is apparently critical should they ever lose power, was all concentrated on A SINGLE SERVER? I hope they at least say their tape backups caught fire and their replicated server died on the same day too...
Their retention lines are going to be hot this Columbus Day weekend! The iPhone is getting cheaper...

Re:"they should have used ZFS or btrfs" (Score:5, Insightful)

by petes_PoV ( 912422 ) writes: on Sunday October 11, 2009 @07:35AM (#29710209)

It's not a backup unless you can prove it will restore. Until then it's just a waste of tape, or disk, and time.
The point about backups is not to tick the box saying "taken backup?" but to provide your business / customers / whatever with a reliable last resort for restoring almost all their data. If you don't have 100% certainty that it will work, you don't have a backup.
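
One way to "prove it will restore" is to actually restore the backup into a scratch directory and compare checksums against the source. The following is only a minimal sketch under stated assumptions: the backup is a gzip-compressed tar archive, the data is a directory of ordinary files, and the archive is stored outside that directory. The function names `make_backup`, `file_digests` and `verify_backup` are hypothetical and not taken from any real backup product.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def make_backup(source_dir: Path, archive: Path) -> None:
    """Write source_dir into a gzip-compressed tar archive (hypothetical helper)."""
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." stores paths relative to source_dir, not absolute paths.
        tar.add(source_dir, arcname=".")


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_backup(source_dir: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare it to the source."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive) as tar:
            tar.extractall(scratch)
        return file_digests(Path(scratch)) == file_digests(source_dir)
```

A check like this only shows that the archive restores and matches the data at the time of the check. It says nothing about whether the application can actually start from the restored files, which is why a full restore test is still needed.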

Huh? (Score:3, Insightful)

by msauve ( 701917 ) writes: on Sunday October 11, 2009 @07:36AM (#29710213)

"incremental..."weekly binary difference"

Uh, those would do nothing in this case, where it appears the entire DB has been lost. You need a regular full backup, or diffs and incrementals are just cruft. It appears they don't even have that, since there's no talk of restoring to month (or ?) old data.
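
The dependency described here can be sketched in a few lines: a restore needs the most recent full backup as its base, plus every incremental taken after it. This is an illustrative model only. The `Backup` record and the `restore_chain` function are hypothetical names, and backups are identified simply by a day number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backup:
    day: int   # day number on which the backup was taken
    kind: str  # "full" or "incremental"


def restore_chain(backups: list[Backup], target_day: int) -> list[Backup]:
    """Return the backups needed to restore to target_day, oldest first."""
    usable = sorted(
        (b for b in backups if b.day <= target_day), key=lambda b: b.day
    )
    fulls = [b for b in usable if b.kind == "full"]
    if not fulls:
        # Incrementals only describe changes; without a base they restore nothing.
        raise ValueError("no full backup available: incrementals alone cannot be restored")
    base = fulls[-1]
    later = [b for b in usable if b.kind == "incremental" and b.day > base.day]
    return [base] + later
```

With a full backup on day 0 and incrementals on days 1 and 2, restoring to day 2 needs all three. With only the incrementals, the function raises, which is the situation the comment describes.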

Re:Irresponsibility to EPIC proportions. (Score:3, Insightful)

by AHuxley ( 892839 ) writes: on Sunday October 11, 2009 @07:48AM (#29710259) Journal

"Back him up, boys!"
T-Mobile says, "but I thought you were going to back us up!"
Robbie says, "We didn't get rich buying a lot of servers, you know!"

Re:"they should have used ZFS or btrfs" (Score:2, Insightful)

by cupantae ( 1304123 ) writes: <maroneill&gmail,com> on Sunday October 11, 2009 @09:11AM (#29710629)

When I read that you had quoted "I really doubt this service was ran without backups," I twitched and the thought
I know it's bad grammar, but let's just ignore it, please
was loud in my ears. I was so relieved when I saw that you weren't mentioning it. I don't know what this makes me, but it happens all the time. I'm definitely bothered by poor grammar and spelling, but I want no one to ever point it out.

Re:"they should have used ZFS or btrfs" (Score:3, Insightful)

by Antique Geekmeister ( 740220 ) writes: on Sunday October 11, 2009 @09:46AM (#29710779)

I've had something like that happen. The recovery system for a partner had never been tested with a _full_ recovery, only with recovering a few selected files. But because someone decided to get cute with the backup system to pick and choose which targets got backed up, individual directories each got their own backup target. Thousands and thousands of them. And the backup system had a single tape drive, not a changer.
The result was that to restore the filesystem, the tapes had to be swapped in and out to get the last full dump, then the incremental dump, of _each_ of the thousands of targets. Fortunately for them, I managed to liberate an under-used tape library, but the incredible amount of time having the tape drive grind back and forth to find the different targets on each tape was also incredibly nasty. We helped them find other solutions for that issue, but it was nasty to clean up. And unfortunately for them, they didn't _have_ a large enough repository to have tested the full restoration procedure.
The point is that "random checks" are not enough. You have to actually do a full test, once a year. This is also why I despise people who sell monolithic, "high availability" storage systems that are not partitioned enough to create a mirror of your active data anywhere.

You assume Danger used a MSFT platform (Score:4, Insightful)

by xswl0931 ( 562013 ) writes: on Sunday October 11, 2009 @11:05AM (#29711131)

Looking at the timeframe that Danger was acquired by MSFT and that the Danger OS was likely based on NetBSD (http://en.wikipedia.org/wiki/Danger_Hiptop), it's more likely that Danger was still using NetBSD as their Server Software and this was merely a process issue. Blaming it on the "Microsoft Platform" without any real data is just spreading FUD.

Re:It's The Backups Stooped (Score:5, Insightful)

by 1s44c ( 552956 ) writes: on Sunday October 11, 2009 @11:54AM (#29711391)

But the fact that there is no backup anywhere indicates brazen negligence on the part of everyone responsible for the data. Everyone who had a part in designing the system and managing the system is culpable. The most ridiculous part of this is the over-reliance on server-side data storage by the sidekick designers.
I will bet you there were good people -SCREAMING- to fix the backups, implement and test failover and all sorts of other good things. In my experience things like this are due to management refusing to spend money fixing problems that have not lost customers yet.

Re:Claimed information from the inside (Score:2, Insightful)

by Anonymous Coward writes: on Sunday October 11, 2009 @12:14PM (#29711521)

This doesn't mean the data is "gone", it means that most likely a bunch of disks with user data have had their metadata changed and perhaps a bit of new data has overwritten them. Reformatting drives or changing the RAID configuration doesn't delete data, it just makes it inconvenient to access it. Unless their SAN is designed to magically write zeros over every disk within a few minutes of a configuration change, at least some data is still there. How hard it is to access it depends on how much support they can get from the people who designed the storage system (file system, database, or a raw object store of some kind).

Autorestore - multiple birds one stone. (Score:3, Insightful)

by Colin Smith ( 2679 ) writes: on Sunday October 11, 2009 @01:17PM (#29711887)

To the standby or testing system. Our staging/testing systems all run yesterday's production data, restored from the most recent backup.
If your backups don't work then neither will your test/staging server... Which will be noticed.
What do you get?
* Backups tested every day.
* A test/staging/standby system identical to the production.
* Something the business can run all the crappy queries they like against without affecting the production system.
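
A nightly job along these lines might look like the sketch below. It assumes backups are gzip-compressed tar archives named `backup-YYYY-MM-DD.tar.gz` (so that sorting by name sorts by date) and that staging is a plain directory. Both the naming scheme and the function names `latest_backup` and `refresh_staging` are assumptions made for illustration, not a description of any particular setup.

```python
import shutil
import tarfile
from pathlib import Path


def latest_backup(backup_dir: Path) -> Path:
    """Pick the newest archive; ISO-dated names sort chronologically."""
    archives = sorted(backup_dir.glob("backup-*.tar.gz"))
    if not archives:
        raise FileNotFoundError(f"no backups found in {backup_dir}")
    return archives[-1]


def refresh_staging(backup_dir: Path, staging_dir: Path) -> Path:
    """Rebuild staging from the newest backup; any failure raises."""
    archive = latest_backup(backup_dir)
    # Open first: a corrupt archive raises tarfile.ReadError before staging is wiped.
    with tarfile.open(archive) as tar:
        if staging_dir.exists():
            shutil.rmtree(staging_dir)
        staging_dir.mkdir(parents=True)
        tar.extractall(staging_dir)
    return archive
```

The point of the design is that nothing is swallowed: a missing or unreadable backup makes the job fail, so a broken backup shows up as a broken staging refresh the next morning instead of during a disaster.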

The value of data (Score:5, Insightful)

by symbolset ( 646467 ) writes: on Sunday October 11, 2009 @01:49PM (#29712067) Journal

Granted, this isn't cheap, but our data isn't either.

Microsoft bought Danger for half a billion dollars. Current estimates of the value of this data are roughly... half a billion dollars, plus a little. There's little doubt that in addition to destroying the entire value of the acquisition they've created a connection between "Microsoft", "Danger" and "data loss". In their release T-Mobile isn't being shy about tying those things together. Not good. That's going to have impacts even for some completely unrelated cloud-based products like Azure [microsoft.com].
Somebody's about to get a really awkward performance review.

Re:As if millions... (Score:2, Insightful)

by davester666 ( 731373 ) writes: on Sunday October 11, 2009 @02:31PM (#29712301) Journal

Is it really 'backup' data?
From the sounds of it, each Danger phone loads its data from the 'cloud' whenever it's powered on, and syncs the data as it changes. To me, this makes the 'cloud' the live data store, and the phone just the local cache...

Re:You assume Danger used a MSFT platform (Score:3, Insightful)

by xswl0931 ( 562013 ) writes: on Sunday October 11, 2009 @06:46PM (#29713905)

You assure us anonymously without any proof? Of course.

Re:"they should have used ZFS or btrfs" (Score:2, Insightful)

by Anonymous Coward writes: on Sunday October 11, 2009 @11:50PM (#29715485)

Nice background, but all useless when the problem they had was morons upgrading the SAN firmware without a proper backup...

Re:"they should have used ZFS or btrfs" (Score:2, Insightful)

by kiwi-backup ( 1648301 ) writes: on Monday October 12, 2009 @02:29AM (#29716109) Homepage

Backup is expensive. A disaster recovery exercise is very expensive and brings no extra value to the customer. Managers want more value for the customer to get more money, not extra expense. It's very hard for the security team to get time for this kind of thing.
