
To Purge Or Not To Purge Your Data

Posted by CmdrTaco
from the i-much-prefer-the-binging-part dept.
Lucas123 writes "The average company pays from $1 million to $3 million per terabyte of data during legal e-discovery. The average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up — so a 5,000-worker company will pay out $1.25 million for five years of storage. So while you need to pay attention to retaining data for business and legal requirements, experts say you also need to be keeping less, according to a story on Computerworld. The problem is, most organizations hang on to more data than they need, for much longer than they should. 'Many people would prefer to throw technology at the problem than address it at a business level by making changes in policies and processes.'"
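
The summary's $1.25 million figure can be checked with a few lines of Python. This is a minimal sketch, not the article's method: the function names are made up for illustration, and the two models are assumptions about what "five years of storage" means. The flat model charges each year's new data once; the cumulative model assumes every year pays again for everything kept so far.

```python
def flat_backup_cost(workers, gb_per_worker_year, usd_per_gb, years):
    """Assumed flat model: each year's new data is charged once."""
    return workers * gb_per_worker_year * usd_per_gb * years


def cumulative_backup_cost(workers, gb_per_worker_year, usd_per_gb, years):
    """Assumed cumulative model: each year pays for all data kept so far."""
    gb_years = sum(gb_per_worker_year * y for y in range(1, years + 1))
    return workers * gb_years * usd_per_gb


# Figures from the summary: 5,000 workers, 10GB per year, $5 per GB, 5 years.
print(flat_backup_cost(5_000, 10, 5, 5))        # 1250000
print(cumulative_backup_cost(5_000, 10, 5, 5))  # 3750000
```

Only the flat model reproduces the quoted $1.25 million, so that appears to be the reading behind the number.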

  • Re:hmm (Score:5, Interesting)

    by MrMr (219533) on Thursday September 18, 2008 @10:34AM (#25054801)
    The top 500 company I worked for did just the opposite: Destroy all data in case a legal issue comes up.
    They called it 'desk cleanout day', and unless you were an official dedicated contact on a particular subject you were to wipe all correspondence more than a year old.
    (There were also other grades of information, but erase after a year was the default).
  • My last job (Score:3, Interesting)

    by dj245 (732906) on Thursday September 18, 2008 @10:50AM (#25055067) Homepage
    My last job had some files from the 1890s. The company had moved from New York to New Jersey to Houston in all that time. I can't imagine that material would ever need to be used, or would be called up during a legal investigation. Even if it were, would the authorities penalize a company for files that were that old??? At some point, everything is trashable or museum material.

    This company occasionally needed blueprints from the 1930s/1940s (Great Lakes ships), but none of their ships went back much further than that.
  • Communicate less (Score:3, Interesting)

    by Yvanhoe (564877) on Thursday September 18, 2008 @10:51AM (#25055083) Journal
    In a world where backup takes money, a law that says to companies "keep every communication backed up" is saying essentially the same thing as "communicate less".
  • Re:Easier to keep (Score:4, Interesting)

    by COMON$ (806135) * on Thursday September 18, 2008 @10:52AM (#25055099) Journal
    What I want to know is how these numbers are broken down. $5 per gigabyte to back up? Maybe if you factor in the cost of a robotic library. Considering that tapes currently run about $30 a pop for 800GB and that I am on a 12 month rotation, I still don't come NEAR that price. $1.25 million for a 5000 person company? What kind of company? A 10GB average is about 9GB over my average user here. Even when I worked at a larger company, we still weren't even breaching a 700MB average INCLUDING e-mail.

    Lovely scaremongering, but what did they mean by legal e-discovery? The time it takes to sort through the data or what?
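
For scale, the tape numbers in this comment work out as follows. This is a rough sketch that covers media only; it leaves out drives, a robotic library, offsite storage, and labor. Reading "12 month rotation" as twelve retained full copies is an assumption, and the function name is illustrative.

```python
TAPE_PRICE_USD = 30.0      # from the comment
TAPE_CAPACITY_GB = 800.0   # from the comment
QUOTED_USD_PER_GB = 5.0    # from the story summary


def media_cost_per_gb(copies_retained=1):
    """Tape media cost per GB of data when several full copies are kept."""
    return TAPE_PRICE_USD / TAPE_CAPACITY_GB * copies_retained


single = media_cost_per_gb()
# Assumption: a 12 month rotation means one retained full copy per month.
monthly_sets = media_cost_per_gb(copies_retained=12)

print(f"one copy: ${single:.4f}/GB")
print(f"twelve copies: ${monthly_sets:.2f}/GB")
print(f"quoted figure is roughly {QUOTED_USD_PER_GB / monthly_sets:.0f}x the media cost")
```

Even with twelve full copies the media cost stays well under a dollar per gigabyte, so a $5 figure would have to be mostly hardware and labor.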

  • by brunes69 (86786) <slashdot AT keirstead DOT org> on Thursday September 18, 2008 @11:17AM (#25055503) Homepage

    My 10GB mail box in outlook, when mirrored to my local hard drive in MBOX format, automagically becomes 2 GB - and that's before compression and attachment pruning.

    I have no idea what the hell Outlook is doing on the server, if it is just storing things in multiple formats at once or if it is just mis-calculating all the space, but that is one hell of a difference.

  • Re:Easier to keep (Score:4, Interesting)

    by Chrisq (894406) on Thursday September 18, 2008 @11:18AM (#25055511)
    We went paperless, and when application forms, etc. arrive they are scanned and stored. Examination of the data showed that very often people would print out all the existing information on a customer and add it to the pile sent for scanning.

    The result: look up a customer and you would find some files scanned half a dozen times.
  • Re:Easier to keep (Score:4, Interesting)

    by BobMcD (601576) on Thursday September 18, 2008 @11:29AM (#25055719)

    you'll need to filter your 'customer communications' from your 'shopping lists'

    Actually, I thought it was a fairly common legal tactic to make the data as difficult to actually find as possible, without revealing too much to the other side.

    "They want records from three years ago? Send a truck with printouts of all the files we have, that'll keep them busy..."

    Does anyone know that this is no longer the case?

  • Re:Easier to keep (Score:3, Interesting)

    by Geoffrey.landis (926948) on Thursday September 18, 2008 @11:42AM (#25055959) Homepage

    The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping or training staff to organize their data and retain only that which is necessary .

    There, fixed that for you.

    According to the original article ("The average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up"), the cost of backups is fifty dollars a year per employee.

    So if an average employee costs the company $100 per hour (including overhead), then if "training staff to organize their data and retain only that which is necessary" takes more than half an hour per year, it's more cost effective to archive the junk than it is to train the employees to sort it.
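
The break-even point in this comment is a one-line division. Here is a small sketch of it; the function name is made up, and like the comment it counts backup cost only and ignores any e-discovery exposure.

```python
def break_even_hours(gb_per_year, backup_usd_per_gb, loaded_hourly_rate):
    """Hours of training per year that cost as much as a year of backups."""
    annual_backup_cost = gb_per_year * backup_usd_per_gb
    return annual_backup_cost / loaded_hourly_rate


# 10GB per year at $5/GB, employee costing $100 per hour including overhead.
hours = break_even_hours(10, 5.0, 100.0)
print(f"{hours * 60:.0f} minutes per employee per year")
```

Any training that takes longer than this costs more than the backups it would save, under these assumptions.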

  • Re:Easier to keep (Score:5, Interesting)

    by cmause (903686) on Thursday September 18, 2008 @11:53AM (#25056165)
    There used to be a sort of gentlemen's agreement between attorneys to not dig in to electronically stored information (ESI). That was back when everything important ended up on paper anyway, which was discoverable.

    As time went on, fewer things ended up on paper, but the rules of discovery didn't evolve. That was the time of backing up a U-Haul full of printed out copies of every file, e-mail, etc. that a company had. The opposition then had to dig through mounds of trash in the hope of finding that one incriminating document.

    Then attorneys got more savvy, and under the so-called Rule 26 (of the Federal Rules of Civil Procedure), the attorneys would agree on the format of ESI to be exchanged. In December 2006, the Federal Rules of Civil Procedure changed to directly address ESI and electronic discovery.

    Now, in litigation, parties may still get obnoxious amounts of data, but it's electronic. Once it's processed and converted (usually to TIFFs with extracted text, but sometimes PDF), attorneys can do what amounts to a Google search through the files and find what they want pretty quickly. In fact, paper documents are usually scanned and OCRed so they can be handled and searched in the same manner.

    Actually, I thought it was a fairly common legal tactic to make the data as difficult to actually find as possible, without revealing too much to the other side.

    "They want records from three years ago? Send a truck with printouts of all the files we have, that'll keep them busy..."

    Does anyone know that this is no longer the case?

    So no, it's no longer the case. But the first guy who did it must have thought he was pretty funny.

  • Mod parent way up! (Score:4, Interesting)

    by khasim (1285) <brandioch.conner@gmail.com> on Thursday September 18, 2008 @12:13PM (#25056463)

    Congratulations. You're the first person I've seen who understands that.

    Accounting understands the need to close one year and open the next. They have processes for what is carried over and how it is identified.

    Yet no other department (or application) understands the need to close old data and archive it.

  • Re:hmm (Score:3, Interesting)

    by Lumpy (12016) on Thursday September 18, 2008 @01:23PM (#25057677) Homepage

    That was a common company-wide AT&T policy: wipe everything after 60 days. All email was to be deleted after 60 days. Creating a PST file on your desktop was a fireable offense, and we swept corporate PCs for PST files on a regular basis.

    It really did not stop anyone from keeping info. Many managers simply printed out the emails and kept them in files. One IT manager we let go had 3 years of email printed and stored in file cabinets in his office. It was insane.

  • Re:Easier to keep (Score:2, Interesting)

    by euri.ca (984408) on Thursday September 18, 2008 @01:50PM (#25058193) Homepage

    Let's not get snippy here, but I think the consensus is that:

    • $5/GB is reasonable (or low) for hardcore backups like the source tree, accounting records (anything where you have a person verifying that it's there is super expensive by default)
    • 90-100% of what any typical user makes (the 5GB/year figure) doesn't (or at least shouldn't) make its way into the expensive storage. But it might anyway, because your options for backing up email easily are limited.

    Of the 30 gigs of things I've put on this laptop this year, maybe 100 megs have been checked in to CVS (and the expensive backups). I doubt accounting and HR have generated another 4.9 gigs on me this year.

  • by Doc Ruby (173196) on Thursday September 18, 2008 @02:36PM (#25059071) Homepage Journal

    Let's say your corp is more than 50% likely to go through "e-discovery" once every 10 years. Each worker will generate 10GB * 10 years = 100GB. Backing up the whole growing data pile (pairing the balancing ends of the accumulation for half the accumulation years) is 101GB * 5 = 505GB, which at $5/GB is $2525. Add about $2M/TB * 505GB = $1.01M, for a total of $1,012,525 per worker. Times at least 0.50 probability, that is at least $506,262 average predictable cost per employee.

    One approach is to keep much less data. But when you keep less data, you have to guess right every time what data you'll need later. If your process discards data that's valuable later (but lost) it better be worth less than the amount you save. That's too hard to know, which is one reason companies keep all the data, and figure it out later.

    A better approach is just to cut that $1-3M/TB e-discovery cost. Of course, the best way is to avoid being investigated, but one has less than 100% control over that, especially from inside the IT department. A much better way to do it is to better inventory the data stored as you go along accumulating it, in the terms in which a later e-discovery would search it. Which also can have the benefit of making the info in the data more available in the normal course of business, which can make that data's increased value (and lowered costs of searching it) worth the entire process. The cheaper possible e-discovery would be just a bonus.

    What really gets me is how these economics are the true cost of storage. A 1TB drive costs $120, and maybe a better 1TB in a 100% redundant RAID costs $250. But it really costs something like $300,000 over its lifetime (probably replaced every 3 or so years, across the 10 years I analyzed). If IT spent a few hundred hours a year streamlining the navigation of all that data, at a cost of a few tens of thousands of dollars, divided across all those employees, the entire org's IT operations would be much more economical, when the large cumulative risk of e-discovery costs is factored into the true cost.
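
The per-employee estimate at the top of this comment can be laid out as a short calculation. The sketch below first reproduces the comment's own inputs, then tries an alternative reading. The alternative is an assumption, not the commenter's method: it sums the yearly backlog in 10GB steps (10 + 20 + ... + 100 = 550 GB-years) and applies the e-discovery rate only to the 100GB actually held after ten years. The function name and the decimal terabyte are also assumptions. As in the comment, the probability multiplies the whole total.

```python
GB_PER_TB = 1000  # decimal terabyte, which matches 505GB -> $1.01M in the comment


def expected_cost(backup_gb_years, discovery_gb, backup_usd_per_gb=5.0,
                  discovery_usd_per_tb=2_000_000.0, p_discovery=0.5):
    """Probability-weighted backup plus e-discovery cost per employee."""
    backup = backup_gb_years * backup_usd_per_gb
    discovery = discovery_gb * discovery_usd_per_tb / GB_PER_TB
    return p_discovery * (backup + discovery)


# The comment's inputs: 505GB for both the backup and the discovery terms.
as_stated = expected_cost(backup_gb_years=505, discovery_gb=505)

# Alternative reading (assumption): yearly 10GB steps, discovery on 100GB held.
yearly_sum = sum(10 * year for year in range(1, 11))
alternative = expected_cost(backup_gb_years=yearly_sum, discovery_gb=100)

print(f"as stated:   ${as_stated:,.2f}")
print(f"alternative: ${alternative:,.2f}")
```

The result is dominated by the e-discovery term in both readings, which supports the comment's point that cutting the cost per terabyte searched matters far more than the price of the disks.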
