Don’t Blame T-Mobile, Blame Bad Planning

UPDATE (Oct 20, 2009): Microsoft and Danger have since been able to recover all the lost data, which is good.  It seems that someone jumped the gun on the whole "unrecoverable" thing.  It still brings to light how important it is to back up your data.

On a side note, last night, while moving from one backup system to a new one, I accidentally deleted 10 years of photos.  Sadly, during my attempt to create a fat finger recovery system, I fat fingered my data.  Though I have some offsite and replicated copies, I did lose all of my photos from my summer in Europe and from two of my cousins' weddings. =(

Just goes to show how important it is to adequately back up your data!


There’s no question that there are a lot of angry Sidekick users this week.  T-Mobile has now confirmed that all data for their Sidekick users has been completely lost and is unrecoverable.  Understandably, their customers are pissed.

Most tech-savvy users understand that service outages can and do happen.  A bad router or a DoS attack can easily take down a service.  Total data loss, on the other hand, has never been and never will be acceptable.

Customers were sold on the idea that their data was safely backed up on T-Mobile’s backend, and now their trust in T-Mobile, and possibly in technology itself, is completely shattered.  But is it really T-Mobile’s fault?  I would contend it’s not, and that the majority of the user hatred should be directed at the IT managers of Microsoft and Danger, the creator of the Sidekick.

Now I cannot say I know anything about any of these companies since I have had no interactions with any of them.  Everything I know is from reports, comments and other assumed information from the fallout.

In data storage, there is simply one law that every IT staff member must follow at all times...never lose the data.

It’s a simple concept that anyone can understand.  If your business involves storing data, you have to protect the data!  There are many ways to store and protect it, all of which vary in cost and complexity.  Though storage itself does have a cost, how does that cost compare to losing your customers’ data?  T-Mobile is learning that cost quickly and painfully.

It is simply irresponsible and unethical not to protect data by every means you can.  As Danger has now learned, Murphy’s Law applies to everyone...always.  Apparently, Danger’s management deemed it too costly, or of no value, to provide adequate systematic backups of their customers’ data.  They put all their trust in the very systems they developed, without any plan for true failure.  Even the greenest IT staffers know the danger of trusting a single point of failure.  I can’t believe that the IT staff would blindly place their trust in any one system, so in my opinion, the responsibility for these decisions must fall on the management team itself.

As an IT administrator, I’ve learned that every IT system needs two action plans for data recovery.  These action plans were obviously not implemented by Danger, and I hope this information is useful to IT staff who are planning their data deployments.

The Act of God Plan

While data itself exists as electronic bits, it physically must be stored somewhere.  This physical location is subject to the same dangers as any building.  Do you have a plan in case your building burns to the ground?  Do you have a plan if a tornado rips apart your data center?  Do you have a plan if the state of California falls into the sea?

These situations, while remote, are actual concerns that IT departments must address.  Store your data in multiple physical locations!  More distant locations will minimize the risk from a single destructive force, but even having your data in two separate campus buildings is safer than one.  While renting a second data center has a cost, what is the cost of losing your customers’ data?  How much does your brand suffer when your users lose all their trust in you?
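As a rough sketch of what "multiple physical locations" looks like at the smallest possible scale, here is a bit of Python that copies a file into several directories and verifies each copy against a checksum.  The function names are my own, and I am assuming each destination directory is a mount point backed by storage at a different site.  A real deployment would use proper replication tooling, but the principle is the same: a copy you have not verified is not a backup.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into every destination directory and verify each copy.

    Each destination is assumed (for illustration) to live on storage in a
    different physical location.
    """
    expected = sha256_of(source)
    verified = []
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        copy = directory / source.name
        shutil.copy2(source, copy)  # copies contents and file metadata
        if sha256_of(copy) != expected:
            raise IOError(f"checksum mismatch for {copy}")
        verified.append(copy)
    return verified
```

Calling replicate(Path("users.db"), [Path("/mnt/site_a"), Path("/mnt/site_b")]) would leave a verified copy at each site; the file name and both paths here are made up.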

Fat Finger Recovery Plan

IT staffers are generally human, and as such, they can and do make mistakes.  There’s a very small difference between deleting one old user account and deleting ALL user accounts.  While it’s best to avoid and minimize these incidents, they can happen and you have to be prepared.

Database transactions, data replication, and progressive snapshots are all simple ways to protect yourself from these situations.  While rollbacks are time consuming, data loss or reconstruction is much worse.  Management needs to consider this factor when planning their deployments.  Again, protecting data may come at significant cost, but how can you ethically not?  Is saving a few hundred thousand dollars worth the risk of alienating a million users and the potential lawsuits that will follow?
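To make the database transaction idea concrete, here is a minimal sketch using Python's built-in sqlite3 module.  The guarded_delete helper and its max_rows limit are my own invention for illustration, not anything Danger or T-Mobile used.  It runs the delete inside a transaction and rolls everything back if the statement touched more rows than the operator expected, which is exactly the "one account versus ALL accounts" mistake.

```python
import sqlite3


def guarded_delete(conn, sql, params=(), max_rows=1):
    """Run a DELETE inside a transaction and undo it if it hits too many rows.

    max_rows is the number of rows the operator expects to remove at most.
    """
    try:
        # With sqlite3's default settings, a DELETE implicitly opens a
        # transaction that stays open until commit() or rollback().
        cursor = conn.execute(sql, params)
        if cursor.rowcount > max_rows:
            raise ValueError(
                f"refusing to delete {cursor.rowcount} rows (limit {max_rows})"
            )
        conn.commit()
        return cursor.rowcount
    except Exception:
        conn.rollback()  # nothing is deleted if anything went wrong
        raise
```

A call such as guarded_delete(conn, "DELETE FROM users WHERE name = ?", ("old_account",)) goes through, while a WHERE clause that accidentally matches the whole table raises an error and leaves the data untouched.  The users table and its name column are hypothetical.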

Now I would not say that T-Mobile is completely blameless in this situation.  T-Mobile management should have had the forethought to ask their partners how they were handling the data, and to pressure them about it.  As a manager, you must think about the risks to your company, both internal and external.  If an external company is handling your mission-critical data, you have an ethical and moral responsibility to ask them about their data security and protection.

T-Mobile is in a tight bind now.  While they don’t shoulder the entire blame for the Sidekick data fiasco, in their users’ minds, they are the sole target of the wrath.