To Purge Or Not To Purge Your Data

Posted by CmdrTaco
from the i-much-prefer-the-binging-part dept.
Lucas123 writes "The average company pays from $1 million to $3 million per terabyte of data during legal e-discovery. The average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up — so a 5,000-worker company will pay out $1.25 million for five years of storage. So while you need to pay attention to retaining data for business and legal requirements, experts say you also need to be keeping less, according to a story on Computerworld. The problem is, most organizations hang on to more data than they need, for much longer than they should. 'Many people would prefer to throw technology at the problem than address it at a business level by making changes in policies and processes.'"
This discussion has been archived. No new comments can be posted.

  • Easier to keep (Score:5, Insightful)

    by Geoffrey.landis (926948) on Thursday September 18, 2008 @09:18AM (#25054505) Homepage
    The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping.
  • Re:Easier to keep (Score:5, Insightful)

    by Daimanta (1140543) on Thursday September 18, 2008 @09:22AM (#25054573) Journal

    True, proper archiving takes huge amounts of time since it adds overhead to your operation.

    In an ideal world, everything that you store is automatically labeled and old data will automagically be purged. But storing all kinds of shit is just that much easier. It also doesn't help that data storage is so dirt cheap. 1TB can be bought for around $100 if I am not mistaken. It doesn't pay to kill old useless stuff you have floating on your hard disk.

  • Huh? (Score:5, Insightful)

    by qoncept (599709) on Thursday September 18, 2008 @09:22AM (#25054575) Homepage
    $250k a year for a 5000 employee company? To put it in perspective, if the average employee at this company is making $60k a year, this company will be paying $1.5 billion in salaries over the same 5 years. To be fair, I think the estimated cost from the article is very much underestimated. But while corporate storage costs more than you'd think, and companies are definitely storing a whole bunch of data they don't need, what about the costs of reviewing and purging that data? That is straight up time, whether it's reviewing existing data or spending the time to create guidelines for which data to keep. And time costs money. More than storage.
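
A rough sketch of the comparison above, using only the figures quoted in the summary and in this comment (10 GB per employee per year, $5 per GB, 5,000 employees, a $60k average salary). The constant and function names are illustrative and do not come from the article.

```python
# Figures quoted in the story summary and the comment above.
EMPLOYEES = 5_000
GB_PER_EMPLOYEE_PER_YEAR = 10
BACKUP_COST_PER_GB = 5      # dollars
AVERAGE_SALARY = 60_000     # dollars per year (the commenter's assumption)
YEARS = 5


def backup_cost(years: int) -> int:
    """Total backup spend in dollars for the data generated over `years`."""
    return EMPLOYEES * GB_PER_EMPLOYEE_PER_YEAR * BACKUP_COST_PER_GB * years


def salary_cost(years: int) -> int:
    """Total salary spend in dollars over `years`."""
    return EMPLOYEES * AVERAGE_SALARY * years


storage = backup_cost(YEARS)
salaries = salary_cost(YEARS)
print(f"backup: ${storage:,}  salaries: ${salaries:,}  ratio: {storage / salaries:.4%}")
```

On these numbers the backup bill is $1.25 million against $1.5 billion in salaries, which is under a tenth of one percent of payroll.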
  • Re:Easier to keep (Score:4, Insightful)

    by Sobrique (543255) on Thursday September 18, 2008 @09:24AM (#25054597) Homepage
    Add to that legal requirements of retention - you'll need to filter your 'customer communications' from your 'shopping lists'. That's what actually makes this a nuisance - the possibility that there will be legal action in 5 years time, that you'll need to fight.

    Yes, less data need to be kept, but first there needs to be a _massive_ re-education of the 'data packrat' culture that the users of it have.

  • by arth1 (260657) on Thursday September 18, 2008 @09:27AM (#25054659) Homepage Journal

    10 GB of data per user, sure.
    10 GB of user data, no way.
    Assuming 300 work days per employee per year (at roughly eight hours a day), that would mean that the average employee creates 1.2 kB of data per second.

    The only way this could be true is if you count data that isn't user generated, and they count the total data storage for the company and divide it by employees.
    If so, users deleting their e-mails won't have much of an effect.
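
The per-second figure above can be checked with a few lines. This is a minimal sketch: the 300 work days come from the comment, while the eight-hour day and decimal gigabytes are assumptions made here to reproduce the estimate.

```python
GB = 10**9  # decimal gigabyte, an assumption


def bytes_per_second(gb_per_year: float,
                     work_days: int = 300,
                     hours_per_day: float = 8.0) -> float:
    """Average data rate while at work, in bytes per second."""
    working_seconds = work_days * hours_per_day * 3600
    return gb_per_year * GB / working_seconds


rate = bytes_per_second(10)
print(f"{rate / 1000:.2f} kB/s")
```

With those assumptions the rate comes out a little under 1.2 kB per second for every working second of the year, which is consistent with the comment.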

  • Re:Easier to keep (Score:5, Insightful)

    by sunking2 (521698) on Thursday September 18, 2008 @09:28AM (#25054679)

    Cheaper to keep. Every hour I waste cleaning house costs more than it does to keep it stored. Storage continues to get cheaper, salaries typically don't. Sure, that $1.25M is a big scary number. But nothing compared to the salaries/benefits at a 5000 person company. Now you can argue the cost of data retrieval goes way up because chances are it'll take a hell of a lot longer to find, but that's a different argument altogether and you can just as easily question what the cost of not being able to recover something that was cleaned by accident is.

  • by paulhar (652995) on Thursday September 18, 2008 @09:29AM (#25054703)

    Apps aren't really designed with this in mind. They don't come at the problem from a "document lifecycle" perspective but from a "document creation" one.

    This is generally because data has a variable lifespan. Let's take an email as part of a project as an example. As the author I may decide that the email isn't needed after a week so set an expiry of 1 week. But you, as the recipient, may take that email and turn that into several tasks, so for you the email is much more important and you'll want to keep it for much longer.

    Users aren't really going to be good at making these decisions unless some application continually bombards them with "go check the status of these 1000 documents you've got".

  • Re:hmm (Score:3, Insightful)

    by NoisySplatter (847631) <noisysplatter@nOspam.gmail.com> on Thursday September 18, 2008 @09:31AM (#25054751)
    It's not so much that you want your company to have a leg to stand on, it's that you don't want your legal opposition to get their foot in the door. Innocent until proven guilty, remember?
  • Re:Easier to keep (Score:3, Insightful)

    by zappepcs (820751) on Thursday September 18, 2008 @09:38AM (#25054869) Journal

    The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping or training staff to organize their data and retain only that which is necessary.

    There, fixed that for you. Meta-tags and other efforts might change this in the future, but until there is a generalized understanding of things that should be archived and things that should not, and a better way to store, find, retrieve, and utilize company data, there will be tons of data saved that really should not be. Humans are like that.

  • Email Attachments (Score:5, Insightful)

    by whisper_jeff (680366) on Thursday September 18, 2008 @09:40AM (#25054903)
    I don't know what most major companies' policies are regarding backing up emails (just back up the text or back up emails plus attachments) but, as but one example, I'm sure this would be an easy spot for most companies to dramatically reduce the amount of storage space required. Most business communications I see from corporate personnel have various attachments on every email - things like logos, custom backgrounds, etc. Forget getting rid of all the unnecessary attachments - getting rid of the "look at my pretty email that looks like a page from a spiral-bound notebook with my company logo at the bottom" images, and the hundreds and thousands of duplicates of those images, would reduce storage requirements, bandwidth requirements, and probably make corporate communications look more, you know, professional. So many emails are filled with unnecessary garbage and, if that's being backed up, that garbage can get costly.

    Then again, I'm biased - I believe email should just be pure text. Perhaps that's a sign that I'm now old...
  • by PainKilleR-CE (597083) on Thursday September 18, 2008 @09:45AM (#25054989)

    Additionally, there are many businesses that don't understand their data retention requirements beyond 'we need to keep some data for 10 years', so instead of compartmentalizing their data and saying 'keep this for 10 years, that for 5 years, and purge this every year and that every 3 months', they just keep everything. Further, if they have a data retention requirement for 3 years or 10 years, they might wait longer before purging it just because it's easier to keep it than it is to go find and remove the 5 or 12 year old data.

    I only recently organized some data being maintained by the company I work for that was basically divided into 'archived' and 'live' data, logs generated by a many-user application. The 'archived' data went back 4 or 5 years with no easy distinction between data that was many years old and data that was generated in the most recent archive. Now at least the data is sorted by date (and being archived by date), so that when someone decides on how long we want to keep it (they can't seem to make up their mind, and while everyone seems to agree that we don't need data from 2005 and earlier, no one's willing to say I can delete it, either), it won't be hard to dump the older data at least on an annual or semi-annual basis.
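
The 'keep this for 10 years, that for 5 years' idea, combined with archives sorted by date, could be sketched roughly as below. The category names, retention periods and sample archives are all hypothetical, and years are approximated as 365 days; a real schedule would come from the business and its legal requirements.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: category -> how long to keep it.
RETENTION = {
    "contracts": timedelta(days=3650),  # roughly 10 years
    "financial": timedelta(days=1825),  # roughly 5 years
    "app_logs": timedelta(days=365),    # roughly 1 year
    "scratch": timedelta(days=90),      # roughly 3 months
}


def is_expired(category: str, created: date, today: date) -> bool:
    """True if an archive of this category and date is past its retention period."""
    return today - created > RETENTION[category]


# Made-up dated archives, as (category, creation date).
archives = [
    ("app_logs", date(2005, 6, 1)),
    ("app_logs", date(2008, 8, 1)),
    ("contracts", date(2001, 1, 15)),
]
today = date(2008, 9, 18)
expired = [a for a in archives if is_expired(a[0], a[1], today)]
print(expired)
```

Only the 2005 log archive is flagged here. The point of the design is that once data is labeled by category and date, the purge decision becomes a lookup rather than a manual review.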

  • Re:Easier to keep (Score:5, Insightful)

    by daeg (828071) on Thursday September 18, 2008 @09:49AM (#25055045)

    The bigger problem is that you will fight different battles. If you're fighting a sales rep that sold your clients to a competitor, you want as much ammunition as possible. If a client is suing you for incorrect information relayed 8 years ago and you're probably guilty, you want as little information as possible.

  • by Chris Mattern (191822) on Thursday September 18, 2008 @10:10AM (#25055371)

    Unfortunately, writable DVDs are not an acceptable archive medium, and a stack of disks with written labels is not an indexing solution that will scale beyond one person.

  • Re:Easier to keep (Score:3, Insightful)

    by TheRaven64 (641858) on Thursday September 18, 2008 @10:21AM (#25055561) Journal
    The $5 presumably includes the physical media, the backup operator's time spent configuring the system, the hardware for performing the backup, and the safe, secure, off-site storage costs. 10GB per year is a lot more than I produce - my PhD was only 1.5GB in total, including temporary files (build cruft and so on), with only 210MB needed for the subversion repository (176MB after bzip2) - the bzip2'd repository of my book (including all text and code examples) is only 4.6MB. My mail folder is only 3GB, and that contains over ten years of email messages (and would compress very well).

    On the other hand, I don't use Word, which manages to make single-page documents that are more or less plain text take up a few MBs. If you're in a company where everyone sends Word document attachments as emails instead of plain text (I've seen it done[1]) then you could probably generate 10-20MB of data per day from around 5KB of actual content, and backing this up might be cheaper than educating your users. Assuming some other work as well as emails this can easily get to 10GB.

    [1] Even worse was my publisher, who sent me a scanned version of a contract as a Word document. A PNG of the same image was around 100KB, while the Word document was 5MB and contained nothing other than the image. A lot of people just treat Word documents as a default container format for any content.

  • by mkcmkc (197982) on Thursday September 18, 2008 @10:28AM (#25055695)

    I did a back-of-the-envelope calculation on just this question in 2004, and estimated that file deletion was not productive unless we could do it at a rate of at least 17MB per minute (of labor). Four years later the threshold is probably at least 45MB per minute.

    Generally, this means that if we can blow away whole disks or huge directories of data, it may pay off. Users going through their files one by one is usually an absolute waste.
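
A break-even threshold of this kind can be sketched as labor cost per minute divided by storage cost per megabyte. The inputs below ($60 per hour of labor, and the $5 per GB quoted in the story summary) are illustrative assumptions, not the commenter's own inputs, so the result is not expected to match the 17MB or 45MB figures.

```python
def break_even_mb_per_minute(hourly_labor_cost: float,
                             storage_cost_per_gb: float) -> float:
    """MB that must be deleted per minute of labor for cleanup to pay for itself."""
    labor_per_minute = hourly_labor_cost / 60
    storage_per_mb = storage_cost_per_gb / 1000  # decimal GB, an assumption
    return labor_per_minute / storage_per_mb


# Hypothetical inputs: $60/hour of staff time, $5/GB of backed-up storage.
threshold = break_even_mb_per_minute(60, 5)
print(f"{threshold:.0f} MB per minute")
```

Cheaper storage or more expensive labor pushes the threshold up, which is why the commenter expects it to keep rising over time.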

  • Re:Easier to keep (Score:3, Insightful)

    by vvaduva (859950) on Thursday September 18, 2008 @10:29AM (#25055725)
    Well, I did not RTFA in detail but it does not seem to address key regulations like HIPAA and SOX which put hard numbers on data retention. So whether or not it's expensive, you have to do it if you want to be legit. If the issue is discovery, a sound archival system will eliminate expenses related to discovery and would allow one to provide requested information very quickly and efficiently. I say let the legal people fight discovery requests and unless you have something to hide, stick with the requirements for archival and retention. The argument "the less you keep the less they ask for" is simply stupid. In certain SOX-related situations, even the appearance of impropriety will come back to bite you, so I always tell folks to do the right thing, by running your business properly, identifying document types correctly and sticking to regulatory requirements as much as possible.
  • by RyansPrivates (634385) on Thursday September 18, 2008 @10:43AM (#25055979)
    I definitely see where you're coming from, and you SHOULD be right. However, this goes to the heart of the article: most companies are OVER-retaining their data. Backing up things that shouldn't be backed up, and retaining things beyond legal requirements or indefinitely.

    Additionally, even though we may not agree on the figures, we definitely agree that storage costs have exponentially decreased. This has led to the trend to just keep adding storage, as opposed to actually going through what is being stored and for how long.

    Like I stated in another post, this problem needs to be attacked from a business policy angle, not merely from a technological capacity (pun fully intended).
