To Purge Or Not To Purge Your Data
Posted by CmdrTaco on Thu Sep 18, 2008 10:15 AM
from the i-much-prefer-the-binging-part dept.
Lucas123 writes "The average company pays from $1 million to $3 million per terabyte of data during legal e-discovery. The average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up — so a 5,000-worker company will pay out $1.25 million for five years of storage. So while you need to pay attention to retaining data for business and legal requirements, experts say you also need to be keeping less, according to a story on Computerworld. The problem is, most organizations hang on to more data than they need, for much longer than they should. 'Many people would prefer to throw technology at the problem than address it at a business level by making changes in policies and processes.'"
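The summary's $1.25 million figure is straight multiplication: 10 GB × $5/GB = $50 per employee per year, × 5,000 employees × 5 years. Below is a minimal Python sketch of that arithmetic. It assumes, as the figure seems to, that each year's 10 GB is paid for once and is not re-billed for every year it stays on disk. The function name `backup_cost` is ours.

```python
# Back-of-the-envelope backup cost using the summary's figures.
# Assumption (ours): each year's new data is billed once at the per-GB rate,
# not re-billed for every later year it is retained.

def backup_cost(employees: int, gb_per_employee_year: float,
                dollars_per_gb: float, years: int) -> float:
    """Total cost of backing up each year's newly generated data once."""
    return employees * gb_per_employee_year * dollars_per_gb * years

total = backup_cost(employees=5_000, gb_per_employee_year=10,
                    dollars_per_gb=5, years=5)
print(f"${total:,.0f}")  # $1,250,000
```

If the data were re-billed every year it is kept, the total would be higher. The headline number should therefore be read as a lower bound under this one-time-cost model.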
Easier to keep (Score:5, Insightful)
Re:Easier to keep (Score:5, Insightful)
True, proper archiving takes huge amounts of time since it adds overhead to your operation.
In an ideal world, everything that you store is automatically labeled and old data is automagically purged. But storing all kinds of shit is just that much easier. It also doesn't help that data storage is so dirt cheap: 1TB can be bought for around $100 if I am not mistaken. It doesn't pay to kill old useless stuff you have floating around on your hard disk.
Re:Easier to keep (Score:4, Insightful)
Yes, less data needs to be kept, but first there needs to be a _massive_ re-education of the 'data packrat' culture among its users.
Re:Easier to keep (Score:4, Interesting)
you'll need to filter your 'customer communications' from your 'shopping lists'
Actually, I thought it was a fairly common legal tactic to make the data as difficult to actually find as possible, without revealing too much to the other side.
"They want records from three years ago? Send a truck with printouts of all the files we have, that'll keep them busy..."
Does anyone know whether this is no longer the case?
Re:Easier to keep (Score:5, Interesting)
As time went on, fewer things ended up on paper, but the rules of discovery didn't evolve. That was the era of backing up a U-Haul full of printed-out copies of every file, e-mail, etc. that a company had. The opposition then had to dig through mounds of trash in the hope of finding that one incriminating document.
Then attorneys got more savvy, and in the so-called Rule 26 conference (under Rule 26 of the Federal Rules of Civil Procedure), the attorneys would agree on the format of ESI (electronically stored information) to be exchanged. In December 2006, the Federal Rules of Civil Procedure were amended to address ESI and electronic discovery directly.
Now, in litigation, parties may still get obnoxious amounts of data, but it's electronic. Once it's processed and converted (usually to TIFFs with extracted text, but sometimes PDF), attorneys can do what amounts to a Google search through the files and find what they want pretty quickly. In fact, paper documents are usually scanned and OCRed so they can be handled and searched in the same manner.
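The "Google search" over extracted text relies, at its simplest, on an inverted index: a map from each word to the documents that contain it. Below is a toy Python sketch of that idea. It is not how any particular review platform works, and the document IDs and contents are hypothetical.

```python
# Toy inverted index over text extracted from processed documents.
# Document IDs and contents are hypothetical.
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results

docs = {
    "DOC-0001": "Q3 pricing memo: hold the discount until the audit closes.",
    "DOC-0002": "Lunch order for Friday: pizza and salad.",
    "DOC-0003": "Re: pricing, audit team wants the memo by Monday.",
}
index = build_index(docs)
print(sorted(search(index, "pricing audit")))  # ['DOC-0001', 'DOC-0003']
```

Building the index costs one pass over the corpus. After that, each query only touches the posting sets for its terms, which is why a large production stays searchable once it has been converted and its text extracted.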
Actually, I thought it was a fairly common legal tactic to make the data as difficult to actually find as possible, without revealing too much to the other side.
"They want records from three years ago? Send a truck with printouts of all the files we have, that'll keep them busy..."
Does anyone know whether this is no longer the case?
So no, it's no longer the case. But the first guy who did it must have thought he was pretty funny.
Re:Easier to keep (Score:5, Insightful)
Cheaper to keep. Every hour I waste cleaning house costs more than keeping the data stored. Storage keeps getting cheaper; salaries typically don't. Sure, that $1.25M is a big scary number, but it's nothing compared to the salaries and benefits at a 5,000-person company. Now, you can argue that the cost of data retrieval goes way up because it will probably take a hell of a lot longer to find anything. But that's a different argument altogether, and you can just as easily ask what it costs when something cleaned out by accident can't be recovered.
Re:Easier to keep (Score:4, Interesting)
Lovely scaremongering, but what did they mean by legal e-discovery? The time it takes to sort through the data or what?
Re: (Score:3, Insightful)
Yes--deleting costs money! (Score:5, Insightful)
I did a back-of-the-envelope calculation on just this question in 2004, and estimated that file deletion was not productive unless we could do it at a rate of at least 17MB per minute (of labor). Four years later the threshold is probably at least 45MB per minute.
Generally, this means that if we can blow away whole disks or huge directories of data, it may pay off. Having users go through their files one by one is usually an absolute waste.
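The threshold the comment describes is a break-even rate. Cleanup pays only if the storage cost it avoids exceeds the labor it consumes, so the rate is (labor $/minute) ÷ (storage $/MB). Below is a minimal Python sketch of that reasoning. The inputs are hypothetical ($60/hour fully loaded labor, and the summary's $5/GB treated as the total cost of keeping a gigabyte), not the commenter's 2004 figures.

```python
# Break-even deletion rate: cleanup only pays if the storage cost it avoids
# exceeds the labor cost of doing it. Inputs below are hypothetical.

def break_even_mb_per_minute(labor_dollars_per_hour: float,
                             storage_dollars_per_gb: float) -> float:
    """MB that must be deleted per minute of labor for cleanup to break even.

    storage_dollars_per_gb is the assumed total cost of keeping 1 GB
    (decimal, 1,000 MB) for the rest of its life.
    """
    # (L / 60 $/min) / (S / 1000 $/MB) = 1000 * L / (60 * S)
    return 1000 * labor_dollars_per_hour / (60 * storage_dollars_per_gb)

def cleanup_pays_off(mb_freed: float, minutes_spent: float,
                     labor_dollars_per_hour: float,
                     storage_dollars_per_gb: float) -> bool:
    """True if the achieved deletion rate beats the break-even rate."""
    rate = mb_freed / minutes_spent
    return rate > break_even_mb_per_minute(labor_dollars_per_hour,
                                           storage_dollars_per_gb)

print(break_even_mb_per_minute(60, 5))      # 200.0
print(cleanup_pays_off(500, 30, 60, 5))     # False: a user pruning mail
print(cleanup_pays_off(50_000, 10, 60, 5))  # True: wiping a 50 GB share
```

With these made-up inputs, one person hand-pruning mail is nowhere near the threshold. Deleting a whole share clears it easily, which matches the comment's conclusion.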
Re: (Score:3, Insightful)
The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping, or to train staff to organize their data and retain only that which is necessary.
There, fixed that for you. Meta-tags and other efforts might change this in the future, but until there is a generalized understanding of things that should be archived and things that should not, and a better way to store, find, retrieve, and utilize company data, there will be tons of data save
Re:Easier to keep (Score:5, Insightful)
The bigger problem is that you will fight different battles. If you're fighting a sales rep that sold your clients to a competitor, you want as much ammunition as possible. If a client is suing you for incorrect information relayed 8 years ago and you're probably guilty, you want as little information as possible.
My last job (Score:3, Interesting)
This company occasionally needed blueprints from the 1930s/1940s (Great Lakes ships), but none of their ships went back much
Re:Easier to keep (Score:4, Interesting)
The result: look up a customer and you would find some files scanned half a dozen times.
Huh? (Score:5, Insightful)
10 GB user data? Not likely (Score:5, Insightful)
10 GB of data per user, sure.
10 GB of user data, no way.
Assuming 300 eight-hour work days per year, that would mean the average employee creates about 1.2 kB of data per second.
The only way this could be true is if they count data that isn't user-generated, i.e. take the company's total data storage and divide it by the number of employees.
If so, users deleting their e-mails won't have much of an effect.
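The 1.2 kB/s figure can be reproduced if you assume an 8-hour work day and decimal gigabytes (10^9 bytes). With binary GiB (2^30 bytes) it comes to about 1.24 kB/s. Below is a minimal Python sketch. The 8-hour day and the constant names are our assumptions; the comment states only the 300 work days.

```python
# Assumptions (ours): decimal GB and an 8-hour work day.
BYTES_PER_GB = 10**9
SECONDS_PER_WORKDAY = 8 * 60 * 60

def bytes_per_work_second(gb_per_year: float, workdays_per_year: int) -> float:
    """Average bytes created per second of working time."""
    return gb_per_year * BYTES_PER_GB / (workdays_per_year * SECONDS_PER_WORKDAY)

rate = bytes_per_work_second(gb_per_year=10, workdays_per_year=300)
print(f"{rate / 1000:.2f} kB/s")  # 1.16 kB/s
```
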
Re: (Score:3, Funny)
Assuming 300 eight-hour work days per year, that would mean the average employee creates about 1.2 kB of data per second.
Top posting and absence of editing by Microsoft Outlook users engaged in a brief inter-departmental discussion could easily account for that volume.
Is that what you meant by "isn't user generated"?
It's not the storage... it's the apps (Score:4, Insightful)
Apps aren't really designed with this in mind. They don't come at the problem from a "document lifecycle" perspective but from a "document creation" one.
This is generally because data has a variable lifespan. Let's take an email that is part of a project as an example. As the author, I may decide that the email isn't needed after a week and set it to expire in one week. But you, as the recipient, may turn that email into several tasks, so for you the email is much more important and you want to keep it for much longer.
Users aren't really going to be good at making these decisions unless some application continually bombards them with "go check the status of these 1000 documents you've got".
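One way an app could take the lifecycle view this comment asks for is to attach a retention period to each holder's copy rather than to the message itself. That way the author's one-week expiry never deletes the recipient's copy. Below is a minimal Python sketch of that idea. `Copy` and `purge` are hypothetical names, not part of any mail client's API.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Copy:
    """One holder's copy of a message, with that holder's own retention period."""
    holder: str
    received: date
    keep_for: timedelta

    def expired(self, today: date) -> bool:
        """True once this holder's retention period has run out."""
        return today >= self.received + self.keep_for

def purge(copies: list[Copy], today: date) -> list[Copy]:
    """Keep only copies whose holder-specific retention has not run out."""
    return [c for c in copies if not c.expired(today)]

sent = date(2008, 9, 1)
copies = [
    Copy("author", sent, timedelta(weeks=1)),      # author: done after a week
    Copy("recipient", sent, timedelta(days=365)),  # recipient: it spawned tasks
]
remaining = purge(copies, today=date(2008, 9, 18))
print([c.holder for c in remaining])  # ['recipient']
```

This moves the decision to the holder, but each holder still has to choose a period. Without sensible defaults, that leads back to the "bombarding" problem the comment describes.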
Mod parent way up! (Score:4, Interesting)
Congratulations. You're the first person I've seen who understands that.
Accounting understands the need to close one year and open the next. They have processes for what is carried over and how it is identified.
Yet no other department (or application) understands the need to close old data and archive it.
Email Attachments (Score:5, Insightful)
Then again, I'm biased: I believe email should just be pure text. Perhaps that's a sign that I'm now old...
Communicate less (Score:3, Interesting)
easy solution (Score:3, Funny)
Put everything on one disk drive, unRAIDed. When it fails, problem solved. Voilà, built-in obsolescence.
Re: (Score:3, Insightful)
Re:hmm (Score:5, Interesting)
They called it 'desk cleanout day', and unless you were the official dedicated contact for a particular subject, you were to wipe all correspondence more than a year old.
(There were also other grades of information, but erase after a year was the default).
Re: (Score:3, Insightful)
Additionally, there are many businesses that don't understand their data retention requirements beyond 'we need to keep some data for 10 years', so instead of compartmentalizing their data and saying 'keep this for 10 years, that for 5 years, and purge this every year and that every 3 months', they just keep everything. Further, if they have a data retention requirement of 3 years or 10 years, they might wait longer before purging it just because it's easier to keep it than it is to go find and remove the
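The compartmentalized approach in this comment amounts to a retention schedule: a table mapping each record category to how long it is kept, checked by a periodic job. Below is a minimal Python sketch. The category names are hypothetical, the periods echo the comment's examples, and years are approximated as 365 days. Sending unclassified records to "review" instead of purging them is our design choice, not something the comment prescribes.

```python
from datetime import date, timedelta

# Hypothetical retention schedule: record category -> retention period.
# Periods echo the comment's examples; category names are made up.
# Years are approximated as 365 days (leap days ignored).
RETENTION = {
    "contracts": timedelta(days=10 * 365),
    "invoices": timedelta(days=5 * 365),
    "general_email": timedelta(days=365),
    "temp_reports": timedelta(days=90),
}

def decide(category: str, created: date, today: date) -> str:
    """Return 'purge', 'keep', or 'review' for one record."""
    period = RETENTION.get(category)
    if period is None:
        return "review"  # unclassified data goes to a human, not the shredder
    return "purge" if today - created > period else "keep"

today = date(2008, 9, 18)
records = [
    ("contracts", date(2000, 1, 1)),
    ("invoices", date(2002, 6, 1)),
    ("general_email", date(2008, 1, 2)),
    ("temp_reports", date(2008, 5, 1)),
    ("uncategorized", date(1999, 1, 1)),
]
print([decide(cat, created, today) for cat, created in records])
# ['keep', 'purge', 'keep', 'purge', 'review']
```
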
Re: (Score:3, Insightful)
Unfortunately, writable DVDs are not an acceptable archive medium, and a stack of disks with written labels is not an indexing solution that will scale beyond one person.