To Purge Or Not To Purge Your Data

Posted by CmdrTaco
from the i-much-prefer-the-binging-part dept.
Lucas123 writes "The average company pays from $1 million to $3 million per terabyte of data during legal e-discovery. The average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up — so a 5,000-worker company will pay out $1.25 million for five years of storage. So while you need to pay attention to retaining data for business and legal requirements, experts say you also need to be keeping less, according to a story on Computerworld. The problem is, most organizations hang on to more data than they need, for much longer than they should. 'Many people would prefer to throw technology at the problem than address it at a business level by making changes in policies and processes.'"
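The $1.25 million figure in the summary can be reproduced with a short calculation. This is only a sketch: it assumes a flat model in which each year's new data is paid for once at $5 per gigabyte, which is one reading of the summary, and the function name is made up for illustration.

```python
# Figures quoted in the story summary.
EMPLOYEES = 5_000
GB_PER_EMPLOYEE_PER_YEAR = 10
BACKUP_COST_PER_GB = 5  # dollars
YEARS = 5


def backup_cost(employees, gb_per_year, cost_per_gb, years):
    """Flat model (assumption): every gigabyte created is paid for once."""
    return employees * gb_per_year * cost_per_gb * years


total = backup_cost(EMPLOYEES, GB_PER_EMPLOYEE_PER_YEAR, BACKUP_COST_PER_GB, YEARS)
print(f"${total:,}")  # $1,250,000
```

Under this reading the yearly cost is $250,000. If old data had to be backed up again every year, the total would be higher.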

  • Easier to keep (Score:5, Insightful)

    by Geoffrey.landis (926948) on Thursday September 18, 2008 @10:18AM (#25054505) Homepage
    The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping.
    • Re:Easier to keep (Score:5, Insightful)

      by Daimanta (1140543) on Thursday September 18, 2008 @10:22AM (#25054573) Journal

      True, proper archiving takes huge amounts of time since it adds overhead to your operation.

      In an ideal world, everything that you store is automatically labeled and old data is automagically purged. But storing all kinds of shit is just that much easier. It also doesn't help that data storage is so dirt cheap: 1TB can be bought for around $100, if I am not mistaken. It doesn't pay to kill old useless stuff you have floating around on your hard disk.

      • Re:Easier to keep (Score:4, Insightful)

        by Sobrique (543255) on Thursday September 18, 2008 @10:24AM (#25054597) Homepage
        Add to that the legal requirements of retention - you'll need to filter your 'customer communications' from your 'shopping lists'. That's what actually makes this a nuisance - the possibility that there will be legal action in 5 years' time that you'll need to fight.

        Yes, less data needs to be kept, but first there needs to be a _massive_ re-education of the 'data packrat' culture among the users.

        • Re:Easier to keep (Score:4, Interesting)

          by BobMcD (601576) on Thursday September 18, 2008 @11:29AM (#25055719)

          you'll need to filter your 'customer communications' from your 'shopping lists'

          Actually, I thought it was a fairly common legal tactic to make the data as difficult to actually find as possible, without revealing too much to the other side.

          "They want records from three years ago? Send a truck with printouts of all the files we have, that'll keep them busy..."

          Does anyone know that this is no longer the case?

          • Re:Easier to keep (Score:5, Interesting)

            by cmause (903686) on Thursday September 18, 2008 @11:53AM (#25056165)
            There used to be a sort of gentlemen's agreement between attorneys not to dig into electronically stored information (ESI). That was back when everything important ended up on paper anyway, which was discoverable.

            As time went on, fewer things ended up on paper, but the rules of discovery didn't evolve. That was the time of backing up a U-Haul full of printed-out copies of every file, e-mail, etc. that a company had. Now the opposition had to dig through mounds of trash in the hope that they would find that one incriminating document.

            Then attorneys got more savvy, and in the so-called Rule 26 conference (the name refers to Rule 26 of the Federal Rules of Civil Procedure), the attorneys would agree on the format of ESI to be exchanged. In December 2006, the Federal Rules of Civil Procedure changed to directly address ESI and electronic discovery.

            Now, in litigation, parties may still get obnoxious amounts of data, but it's electronic. Once it's processed and converted (usually to TIFFs with extracted text, but sometimes PDF), attorneys can do what amounts to a Google search through the files and find what they want pretty quickly. In fact, paper documents are usually scanned and OCRed so they can be handled and searched in the same manner.

            Actually, I thought it was a fairly common legal tactic to make the data as difficult to actually find as possible, without revealing too much to the other side.

            "They want records from three years ago? Send a truck with printouts of all the files we have, that'll keep them busy..."

            Does anyone know that this is no longer the case?

            So no, it's no longer the case. But the first guy who did it must have thought he was pretty funny.

            • by kmac06 (608921)

              attorneys can do what amounts to a Google search

              No, they can do a search. Why compare it to Google?

    • Re:Easier to keep (Score:5, Insightful)

      by sunking2 (521698) on Thursday September 18, 2008 @10:28AM (#25054679)

      Cheaper to keep. Every hour I waste cleaning house costs more than it does to keep it stored. Storage continues to get cheaper, salaries typically don't. Sure, that $1.25M is a big scary number. But nothing compared to the salaries/benefits at a 5000 person company. Now you can argue the cost of data retrieval goes way up because chances are it'll take a hell of a lot longer to find, but that's a different argument altogether and you can just as easily question what the cost of not being able to recover something that was cleaned by accident is.

      • Re:Easier to keep (Score:4, Interesting)

        by COMON$ (806135) * on Thursday September 18, 2008 @10:52AM (#25055099) Journal
        What I want to know is how these numbers are broken down. $5 per gigabyte to back up? Maybe if you factor in the cost of a robotic library. Considering that tapes currently run about $30 a pop for 800GB and that I am on a 12-month rotation, I still don't come NEAR that price. $1.25 million for a 5000-person company? What kind of company? A 10GB average is about 9GB over my average user here. Even when I worked at a larger company, we still weren't even breaching a 700MB average INCLUDING e-mail.

        Lovely scaremongering, but what did they mean by legal e-discovery? The time it takes to sort through the data or what?

        • My 10GB mail box in outlook, when mirrored to my local hard drive in MBOX format, automagically becomes 2 GB - and that's before compression and attachment pruning.

          I have no idea what the hell Outlook is doing on the server, if it is just storing things in multiple formats at once or if it is just mis-calculating all the space, but that is one hell of a difference.

          • by ckaminski (82854)
            What do you use to MBOX your outlook data, if I may ask?
            • by jgrahn (181062)

              What do you use to MBOX your outlook data, if I may ask?

              I cannot say what he does, but if the Sexchange server is open for IMAP, you can telnet to it and pass an IMAP command to dump everything in RFC 822 format. It ends up very close to mbox format; it might even have a _From line.

        • Re: (Score:3, Insightful)

          by TheRaven64 (641858)
          The $5 presumably includes the physical media, the backup operator's time spent configuring the system, the hardware for performing the backup, and the safe, secure, off-site storage costs. 10GB per year is a lot more than I produce - my PhD was only 1.5GB in total, including temporary files (build cruft and so on), with only 210MB needed for the subversion repository (176MB after bzip2) - the bzip2'd repository of my book (including all text and code examples) is only 4.6MB. My mail folder is only 3GB,
          • by jgrahn (181062)

            10GB per year is a lot more than I produce - my PhD was only 1.5GB in total, including temporary files (build cruft and so on), with only 210MB needed for the subversion repository (176MB after bzip2)

            210MB is a lot. That's as large as my CVS repository, which I have added to daily for ten years or so, and which contains lots of external data too (a copy of The Great Gatsby in troff format is in there somewhere).

            On the other hand, I don't use Word, which manages to make single-page documents that are more

        • Re: (Score:3, Informative)

          by guruevi (827432)

          1) This is the average. Your company might have 700MB/user, in my organization, it's close to 1TB/user/year that gets added. We're doing medical imaging.

          2) It's not just tape libraries. The cost for D2D2T or D2D2D (what we're doing) goes way up compared to a 'simple' backup scheme. Especially if you're like us and require multiple gigabit streams, disk storage can't be just 4 cheap SATA disks in RAID5. We have 2 storage arrays with 14 drives each for general access and another storage array with 10 SATA dis

          • by COMON$ (806135) *
            I know it is an average; what I was getting at is whether this is an average across all businesses or what.

            When I worked in enterprise environments, my cost went up for backups but cost per GB went down. In general that is the rule I have found, in larger environments my cost per MB goes down significantly not up.

            My point boils down to this, general stats like they have above are useless because we have environments like yours where you do medical imaging, and environments like mine where we do a mixture of marketing

        • by geekoid (135745)

          Jeez, when did you last work at a large company?
          We easily get close to 10 GB per person, and we are reasonably vigilant about it.
          Then you have the total cost of the backup: the drive (not as cheap as a home drive, but still cheap), the person receiving it, the people to put a drive in, the process of managing the disk arrays, the NAS, the backing up, and insurance. Plus normal overhead.

          Legal e-discovery is time consuming because it needs humans involved. People may be trying to hide what they are doing in a ma

      • by mkcmkc (197982) on Thursday September 18, 2008 @11:28AM (#25055695)

        I did a back-of-the-envelope calculation on just this question in 2004, and estimated that file deletion was not productive unless we could do it at a rate of at least 17MB per minute (of labor). Four years later the threshold is probably at least 45MB per minute.

        Generally, this means that if we can blow away whole disks or huge directories of data, it may pay off. Users going through their files one by one is usually an absolute waste.
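The break-even idea above can be sketched as a ratio of labor cost to storage cost. The inputs behind the 17MB-per-minute estimate are not given in the comment, so the labor rate and storage price used here are hypothetical placeholders and the result is not meant to reproduce that figure.

```python
def break_even_mb_per_minute(labor_cost_per_hour, storage_cost_per_gb):
    """Deletion rate (MB per minute of labor) at which the storage saved
    is worth exactly as much as the labor spent deleting it.

    Assumes 1 GB = 1000 MB and that a deleted megabyte saves its full
    storage cost; both are simplifications.
    """
    labor_cost_per_minute = labor_cost_per_hour / 60
    storage_cost_per_mb = storage_cost_per_gb / 1000
    return labor_cost_per_minute / storage_cost_per_mb


# Hypothetical inputs: $60/hour of labor, $5/GB of managed storage.
rate = break_even_mb_per_minute(labor_cost_per_hour=60, storage_cost_per_gb=5)
print(f"Deleting pays off only above about {rate:.0f} MB per minute")
```

Because the threshold is labor cost divided by storage cost, it rises as storage gets cheaper, which matches the comment's point that the bar moved up between 2004 and 2008.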

        • by ckaminski (82854)
          Currently filesystems track the following:

          Creation time
          Last Access Time
          Last Modified Time

          If we also had a

          Last backed up time/scanned time

          that virus scanners and backup software could use instead, then you can track last-access to eliminate files that haven't been opened by end-users in a particular time period for permanent offsiting or removal. Making today's complex HSM architectures easier to implement or not necessa
          • by whoever57 (658626)

            Currently filesystems track the following:

            Creation time
            Last Access Time

            Access time tracking is routinely turned off to improve performance of filesystems.
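A rough sketch of the "find files nobody has touched" idea from this sub-thread, using the timestamps filesystems already expose. The function name and the 365-day threshold are illustrative assumptions. As the reply above points out, access times are unreliable on volumes where atime tracking is disabled or relaxed, so this sketch falls back on the later of access time and modification time.

```python
import os
import time


def stale_files(root, max_idle_days, now=None):
    """Return paths under root not accessed or modified for max_idle_days.

    Caveat: with atime tracking off (e.g. a noatime mount), st_atime is not
    updated on reads, so only the modification time carries information.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            last_used = max(st.st_atime, st.st_mtime)
            if last_used < cutoff:
                stale.append(path)
    return stale


if __name__ == "__main__":
    for path in stale_files(".", max_idle_days=365):
        print(path)
```

A backup or virus scan that reads every file would reset the access time on systems that do track it, which is the problem the proposed separate "last scanned" timestamp is meant to avoid.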

    • Re: (Score:3, Insightful)

      by zappepcs (820751)

      The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping or training staff to organize their data and retain only that which is necessary.

      There, fixed that for you. Meta-tags and other efforts might change this in the future, but until there is a generalized understanding of things that should be archived and things that should not, and a better way to store, find, retrieve, and utilize company data, there will be tons of data save

      • Re: (Score:3, Interesting)

        The problem is that it's easier to just archive the cruft stuff than it is to go through it all and figure out what's worth keeping or training staff to organize their data and retain only that which is necessary.

        There, fixed that for you.

        According to the original article ("The average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up"), the cost of backups is fifty dollars a year per employee.

        So if an average employee costs the company $100 per hour (including overhead), then if "training staff to organize their data and retain only that which is necessary" takes more than half an hour per year, it's more cost effective to archive the junk than it is to train the employees to sort it.
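The half-hour figure follows directly from the numbers in the comment. A minimal worked version, where the $100 per hour loaded cost is the comment's own assumption and the function name is invented for illustration:

```python
def break_even_training_hours(gb_per_year, backup_cost_per_gb, loaded_cost_per_hour):
    """Hours of training per employee per year that cost as much as
    simply backing up everything that employee produces."""
    yearly_backup_cost = gb_per_year * backup_cost_per_gb
    return yearly_backup_cost / loaded_cost_per_hour


hours = break_even_training_hours(
    gb_per_year=10,            # from the article summary
    backup_cost_per_gb=5,      # dollars, from the article summary
    loaded_cost_per_hour=100,  # dollars, the commenter's assumption
)
print(hours)  # 0.5
```

The comparison ignores retrieval and e-discovery costs, which other commenters argue are the larger part of the bill.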

        • by Eivind (15695)

          Even that is only true if data-storage costs are constant -- or if employee data grows in parallel with the fall in costs. Which seems unlikely.

          Storing something for a year costs more like half of storing it forever, because storage costs drop like a lead balloon and data grows.

          If I were to delete EVERY file in my home directory that is more than 3 years old, I'd save 15% of the space used. If I were to delete every file more than 5 years old, I'd save 4% of the space used.

          Which frankly ain't worth it.
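The "a year costs about half of forever" claim works out exactly if the price of storing a given amount of data halves every year, since the yearly costs then form a geometric series. The halving rate is an assumption chosen to match the comment, not a measured figure.

```python
def cost_of_keeping(first_year_cost, yearly_price_factor, years):
    """Total cost of keeping the same data for a number of years when the
    per-year storage price is multiplied by yearly_price_factor each year."""
    return sum(first_year_cost * yearly_price_factor ** k for k in range(years))


one_year = cost_of_keeping(1.0, 0.5, years=1)
long_run = cost_of_keeping(1.0, 0.5, years=50)  # 50 years stands in for "forever"
print(one_year / long_run)  # very close to 0.5
```

With a slower decline the first year is a smaller share of the total: a price factor of 0.8, for example, puts the first year at roughly one fifth of the long-run cost.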

    • Re:Easier to keep (Score:5, Insightful)

      by daeg (828071) on Thursday September 18, 2008 @10:49AM (#25055045)

      The bigger problem is that you will fight different battles. If you're fighting a sales rep that sold your clients to a competitor, you want as much ammunition as possible. If a client is suing you for incorrect information relayed 8 years ago and you're probably guilty, you want as little information as possible.

      • Re: (Score:3, Insightful)

        by vvaduva (859950)
        Well, I did not RTFA in detail but it does not seem to address key regulations like HIPAA and SOX which put hard numbers on data retention. So whether or not it's expensive, you have to do it if you want to be legit. If the issue is discovery, a sound archival system will eliminate expenses related to discovery and would allow one to provide requested information very quickly and efficiently. I say let the legal people fight discovery requests and unless you have something to hide, stick with the require
    • My last job (Score:3, Interesting)

      by dj245 (732906)
      My last job had some files from the 1890's. The company had moved from New York to New Jersey to Houston in all that time. I can't imagine that material would ever need to be used, or would be called up during a legal investigation. Even if it were, would the authorities penalize a company for files that were that old??? At some point, everything is trashable or museum material.

      This company occasionally needed blueprints from the 1930s/1940s (great lakes ships), but none of their ships went back much
    • ...whilst policies and procedures often solve a lot of things in a cleaner, more common sense manner there are unfortunately far too many people lacking common sense.

      Throwing hardware at it guarantees it'll be done; expecting people to follow policies and procedures will likely leave you with a 50% success rate in ensuring the correct data is kept/binned, and that's if you're lucky.

      The world as a whole would be so much more efficient if we could get people to follow policies and procedures or at least the co

    • Exactly. If it takes me two hours per week to sort through every bit of my data and decide what to pitch, that cost has to be compared to the archival cost to decide whether it is a worthwhile endeavor.

      Of course, at my office, we just bought a server and a controller with 16 SATA ports, filled the sucker up with off-the-shelf 500GB disks, and built a 7TB RAID6 using Linux software RAID. The whole job only cost about $2k, and we no longer waste any time deciding what to delete and what to keep.

    • I find it surprising that this issue is simplified to the cost of storage. As others noted, who cares about the cost per employee for storage? What's much more important is the cost of information retrieval. I'd like to make a comparison with paper storage, because much research has been done there to cut costs. So putting aside physical storage costs, if you store all the crap for a while, the storage just becomes a black hole where nothing can be found again. Storing crap is human nature, a "what if I need this docume
    • by Malevolyn (776946)
      Wouldn't it be cheaper to just buy a company-wide subscription to all those porn sites than it would be to spend $1.25 million every year backing up all the saved pictures and videos from preview pages?
  • Huh? (Score:5, Insightful)

    by qoncept (599709) on Thursday September 18, 2008 @10:22AM (#25054575) Homepage
    $250k a year for a 5000-employee company? To put it in perspective, if the average employee at this company is making $60k a year, this company will be paying $1.5 billion in salaries over the same 5 years. To be fair, I think the estimated cost from the article is very much underestimated. But while corporate storage costs more than you'd think, and companies are definitely storing a whole bunch of data they don't need, what about the costs of reviewing and purging that data? That is straight-up time, whether it's reviewing existing data or spending the time to create guidelines for which data to keep. And time costs money. More than storage.
    • by TubeSteak (669689)

      what about the costs of reviewing and purging that data? That is straight up time, whether it's reviewing existing data or spending the time to create guidelines for which data to keep.

      Right now, the-way-things-are-done is to save it all and pay for it.
      You can train employees to change the-way-things-are-done.

      The learning curve is expensive, but the general idea (aspirational, as with anything corporate) is that once everyone figures out the policies, time is used more efficiently and the 'cost' goes down.

      And time costs money. More than storage.

      Can I see the report that verifies your assertions?
      You did have someone study the long term costs and give you hard numbers, didn't you?
      A company isn't going to fsck around their multi-m

    • To put this into perspective, we have PRA requests for all sorts of "data" that we are supposed to keep. It has become almost a full time job going through all the crap to find what the PRA requests are asking for.

      And we're a SMALL school district.

  • by arth1 (260657) on Thursday September 18, 2008 @10:27AM (#25054659) Homepage Journal

    10 GB of data per user, sure.
    10 GB of user data, no way.
    If assuming 300 work days per employee, that would mean that the average employee creates 1.2 kB of data per second.

    The only way this could be true is if you count data that isn't user generated, and they count the total data storage for the company and divide it by employees.
    If so, users deleting their e-mails won't have much of an effect.
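The 1.2 kB per second figure can be checked with a few lines. The 8-hour working day and the decimal definition of a gigabyte are assumptions; the comment only states the 300 work days.

```python
def bytes_per_work_second(gb_per_year, work_days, hours_per_day):
    """Average data creation rate during working hours, in bytes per second.
    Uses decimal units (1 GB = 10**9 bytes)."""
    work_seconds = work_days * hours_per_day * 3600
    return gb_per_year * 10**9 / work_seconds


rate = bytes_per_work_second(gb_per_year=10, work_days=300, hours_per_day=8)
print(f"{rate / 1000:.1f} kB per second")  # 1.2 kB per second
```

That is far more than anyone types, which supports the point that the 10 GB must include data the user did not personally create.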

    • by cashman73 (855518)
      Only 10 GB?!?! Pfft! Amateurs,...

      I've been in my current position almost a year now, and I've already generated about 1/2 a terabyte of data; and that's only the stuff I've decided is worth keeping (I've probably generated several terabytes in reality),... Of course, I'm probably not your average office worker -- my data is mostly monte carlo simulations of proteins, on the order of millions (some in the billions) of steps long. Some of the largest trajectories are 45 GB (yes, that's one file).

      • 10GB of original data is easy, and it doesn't take a year, just a week or two. Today and yesterday, I measured physical properties of a lot of output from a particular industrial process (just one plant in a factory, and I only recorded measurements of a few instruments). This only gave me a few hundred MB of raw data, but it will result in several GB of data after analysis. This is all original data, and this is a normal amount of output. I regularly fill several DVDs with this sort of archive data.

        Of cour

    • You're obviously not writing software, doing CAD work, or any kind of computational modeling. It's easy to have that much data -- my source tree alone is 2GB.
      • by afabbro (33948)

        You're obviously not writing software, doing CAD work, or any kind of computational modeling. It's easy to have that much data -- my source tree alone is 2GB.

        And what about our colleagues in the porn production industry? I mean, one hour of hi-res MPEG is a lot of megabytes. Multiply it by the number of, ah, employees...

    • Re: (Score:3, Funny)

      by value_added (719364)

      If assuming 300 work days per employee, that would mean that the average employee creates 1.2 kB of data per second.

      Top posting and absence of editing by Microsoft Outlook users engaged in a brief inter-departmental discussion could easily account for that volume.

      Is that what you meant by "isn't user generated"?

    • They count more than just the stuff you typed as "user data." For example, Linux admins download ISOs, lawyers download PDFs, Windows admins download patches, service packs, and malware cleaning tools, and sales people download porn. All this data is used by the users and must be archived.

  • by paulhar (652995) on Thursday September 18, 2008 @10:29AM (#25054703)

    Apps aren't really designed with this in mind. They don't come at the problem from a "document lifecycle" perspective but from a "document creation" one.

    This is generally because data has a variable lifespan. Let's take an email as part of a project as an example. As the author, I may decide that the email isn't needed after a week and so set an expiry of 1 week. But you, as the recipient, may take that email and turn it into several tasks, so for you the email is much more important and you want to keep it for much longer.

    Users aren't really going to be good at making these decisions unless some application continually bombards them with "go check the status of these 1000 documents you've got".

    • Re: (Score:3, Informative)

      by ubercam (1025540)

      Users aren't meant to be making those decisions, the Records Management department should be... that is if you even have one! If you leave everything up to the users, you WILL have a cluster fuck of records.

      I work in Records Management at a large company with many different divisions in diverse fields. RM is completely left up to us. We manage well over 10,000 boxes and there's only 3 of us. We alone determine when something is to be destroyed (but require authorization from dept heads to be shredded), how

    • Mod parent way up! (Score:4, Interesting)

      by khasim (1285) <brandioch.conner@gmail.com> on Thursday September 18, 2008 @12:13PM (#25056463)

      Congratulations. You're the first person I've seen who understands that.

      Accounting understands the need to close one year and open the next. They have processes for what is carried over and how it is identified.

      Yet no other department (or application) understands the need to close old data and archive it.

      • by radarsat1 (786772)

        Yet no other department (or application) understands the need to close old data and archive it.

        Is this significantly different from tagging a release in a version control system?

      • by inKubus (199753)

        Well, ERP solutions try to assign other units to "resources" (not just money) and store them in a subledger somewhere. And BPM systems are trying to do that with everything else.

      • by afidel (530433)
        Ha, we are working towards an archiving solution for our ERP system and accountants are just as bad as anyone. A simple date based approach will NOT work in an ERP system, you need a tool which understands the relationship between objects in the system and which only performs an archive if all related objects fall into the archive period. Plus books are relatively simple from an archive perspective, they have a legally defined life, most ad hoc data is not so neatly categorized.
    • There should be enough local cache for every user to have access to every document they could possibly create, unless you are working at a movie company. Given proper indexing, it should be possible for users to find what they need.

      Storage is cheap enough for this to work, even if some documents are slow (compressed, maybe combined as deltas with other very similar documents) or very slow (have to pull from tape or something). But again, all of that which an average user needs should be cacheable on their o

  • For example, financial institutions are required to keep data for longer periods for legal purposes as well as traceability (during investigation of fraud or other kinds of crimes). The banks I worked for had a legal requirement to keep data in 2 places at least 15 km apart, with all kinds of protection against fire and intrusion.

    A good manufacturing company would keep data for a longer period not only to comply with ISO standards, but to trace manufacturing defects and a good evidence of past history for insuran

    • Re: (Score:3, Insightful)

      by PainKilleR-CE (597083)

      Additionally, there are many businesses that don't understand their data retention requirements beyond 'we need to keep some data for 10 years', so instead of compartmentalizing their data and saying 'keep this for 10 years, that for 5 years, and purge this every year and that every 3 months', they just keep everything. Further, if they have a data retention requirement for 3 years or 10 years, they might wait longer before purging it just because it's easier to keep it than it is to go find and remove the

  • It just doesn't make sense to expend the limited political capital of the IT department to nag people into cleaning up their folders. If you're in a small company, and can more than double your server storage for $1000, instead of pissing off 25 people, you'll spend the money, and so will the CFO. I should know, we've done it more than once over the past 10 years.

    It's far better to spend a few $K than to waste literally weeks of time trying to sort things out, especially when you need sales to be selling a

    • by cowscows (103644)

      Exactly. I've worked at my current company for about three years. It'd take me a few days at least to go through all the documents that I've created since I've been here. The cost of storing all those documents is significantly less than the billable hours that my company would have to give up for me to spend those days sorting paper. Not to mention the fact that I can't imagine having the luxury of a few days without having to worry about projects/clients/etc and have the time to focus on sorting through sta

    • by geekoid (135745)

      Well, if you have done it more than once at some tiny shit hole company, I guess that's the way to do it...

  • Email Attachments (Score:5, Insightful)

    by whisper_jeff (680366) on Thursday September 18, 2008 @10:40AM (#25054903)
    I don't know what most major companies' policies are regarding backing up emails (just back up the text or back up emails plus attachments) but, as but one example, I'm sure this would be an easy spot for most companies to dramatically reduce the amount of storage space required. Most business communications I see from corporate personnel have various attachments on every email - things like logos, custom backgrounds, etc. Forget getting rid of all the unnecessary attachments - getting rid of the "look at my pretty email that looks like a page from a spiral-bound notebook with my company logo at the bottom" images, and the hundreds and thousands of duplicates of those images, would reduce storage requirements, bandwidth requirements, and probably make corporate communications look more, you know, professional. So many emails are filled with unnecessary garbage and, if that's being backed up, that garbage can get costly.

    Then again, I'm biased - I believe email should just be pure text. Perhaps that's a sign that I'm now old...
    • by xgr3gx (1068984)
      Hmm, maybe I should stop doing my weekly email of the "Monkey Drinking his own pee" video to 200 people in my department.
      I guess that might explain all the SAN storage requests for our email archive servers.
    • by Vancorps (746090)
      For those of us with NetApp SAN storage we use A-SIS to dedup all of those files so we only store them once even if they are referenced in one hundred locations. This dramatically reduces storage requirements at the cost of cpu cycles at night while it scans all the new files and determines if any are duplicates.
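The general idea of storing identical attachments only once can be illustrated with a file-level sketch that groups files by content hash. This is not how NetApp's A-SIS works internally (it is a storage-system feature); the function names here are made up for the example.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_groups(root):
    """Group files under root that have byte-identical contents.
    Returns a list of path lists; each list holds two or more duplicates."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in duplicate_groups("."):
        print(len(group), "copies:", group[0])
```

Each group could be stored once and referenced from every location, which is where the space saving for repeated logos and signature images would come from. The hashing pass is the CPU cost the comment mentions.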
  • average employee generates 10GB of data per year at a cost of $5 per gigabyte to back it up...

    I cry nonsense in the statement above.

    I put a 25 cent blank DVD into the DVDwriter of my PC. Then I copy the entire contents of my 'C:\backup' folder onto this DVD. I start the program, and go do something else. Total dedicated time: 2 minutes

    When the DVD write is done, I write a label code on the DVD (date, employee, backup number) and put the disk back on the stack in the file cabinet. Total dedicated time: 2

    • Large corporations back up servers on tape. Good tapes and tape drives are expensive. Including support, maintenance and replacement costs, $5 per gigabyte probably isn't that bad.
    • Re: (Score:3, Insightful)

      by Chris Mattern (191822)

      Unfortunately, writable DVDs are not an acceptable archive medium, and a stack of disks with written labels is not an indexing solution that will scale beyond one person.

    • by Vellmont (569020)


      My salary and benefits: @ $18/hr
      time used on backup: 0.067 hrs
      My cost per gigabyte of backup: $1

      And you backed it up a total of once. The cost of $5 is likely a yearly cost (as the volume is yearly). Backups are usually done once a day. Your yearly costs would be in the hundreds of dollars per gigabyte.

      • My salary and benefits: @ $18/hr
        time used on backup: 0.067 hrs
        My cost per gigabyte of backup: $1

        You haven't counted overhead. First, there is your personal overhead. Do you talk to your co-workers in the hall? Get coffee on company time? Go to the bathroom? Fill out time sheets to account for what you do all day? Read memos telling you that you have to fill out time sheets? Read your e-mail? Post comments to slashdot at 10:47AM on a workday? Only robots are 100% efficient in their use of time.

        And then there is company overhead-- your computer, pens, paper, copy machine, office, lighting, se

    • by geekoid (135745)

      That is a completely ignorant example of needing to back up 1000s of people, and billions of transactions.

  • Used to be records were kept on paper,
    paper was kept in boxes,
    and boxes were dated MM/YY.

    I came into the office one fine 1998 January 02,
    and the hallway was stacked full of boxes dated 01/94,
    02/94, 03/94, etc.

    Company policy was discard records after three years,
    so all records from 1994 were on their way to the dumpster.

    • by cashman73 (855518)
      Used to be records were kept on paper, paper was kept in boxes, and boxes were dated MM/YY.

      So THAT explains why they kept moving Milton's desk (image [dereksemmler.com])! I guess all those TPS reports take up space!

  • The major cost of purging is the manpower and downtime. Therefore it's easier to keep the stuff, possibly with occasional housekeeping if your schema isn't as scalable as it should be. While the legal and tax requirements (which vary from country to country) have a limited lifetime, there are always possibilities, such as legal defences, where old data may be needed. These uses will not require the performance (and cost) of enterprise class storage: speed, redundancy, administration, warranties. So migrate i
  • Communicate less (Score:3, Interesting)

    by Yvanhoe (564877) on Thursday September 18, 2008 @10:51AM (#25055083) Journal
    In a world where backup takes money, a law that says to companies "keep every communication backed up" is saying essentially the same thing as "communicate less".
    • by BobMcD (601576)

      Or communicate less in writing - I personally have had this policy for a long time. If I worry that a question, comment, concern, etc might not reflect well on me in the future, I walk into my boss's office and ask out loud (with the door shut.) If I want the communication to be recorded for all eternity I use email...

  • by circletimessquare (444983) <circletimessquare AT gmail DOT com> on Thursday September 18, 2008 @10:55AM (#25055143) Homepage Journal

    put everything on one disk drive, unRAIDed. when it fails, problem solved. voila, built in obsolescence

  • Business Intelligence software just may make use of the data. While a lot of businesses are STUPID in their use of BI software, at some point the company either dies or gets a clue and does some BI analysis on its data.
    You actually can do some amazing things with BI. Say, for example, you are storing time card data from employees and you want to check the effectiveness of managers. With, say, 20 years of time card data and employee records of which manager is which, you just may find a correlation

  • litigation hold (Score:2, Informative)

    Any record destruction policy must include a "litigation hold". A litigation hold means that record destruction must stop when litigation is anticipated or pending. But in a complex enterprise, it is tricky to know what litigation the enterprise anticipates. It was the trickiness of litigation hold that led to the demise of Arthur Andersen. The risks associated with litigation hold give enterprises incentive to store lots more records. --Ben http://hack-igations.blogspot.com/2008/07/document-discovery- [blogspot.com]
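
    A minimal sketch of how a litigation hold might gate a purge job. Everything here is an assumption for illustration: the Record type, the "matters" tags, the three-year retention period, and the rule that a record is held when it is tagged with any matter currently on hold. A real policy would come from counsel, not from this snippet.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    record_id: str
    created: date
    matters: frozenset = frozenset()  # hypothetical tags tying a record to legal matters


def cutoff_date(today: date, retention_years: int) -> date:
    """Return the date before which records are past retention."""
    try:
        return today.replace(year=today.year - retention_years)
    except ValueError:
        # today is Feb 29 and the target year is not a leap year
        return today.replace(year=today.year - retention_years, day=28)


def purge_candidates(records, today, retention_years, held_matters):
    """Records past retention that are not under any litigation hold."""
    cutoff = cutoff_date(today, retention_years)
    held = frozenset(held_matters)
    return [
        r for r in records
        if r.created < cutoff and not (r.matters & held)
    ]


records = [
    Record("box-0194", date(1994, 1, 15)),
    Record("box-0694", date(1994, 6, 1), frozenset({"matter-A"})),
    Record("box-0396", date(1996, 3, 1)),
]

# With matter-A on hold, only the untagged 1994 record may be destroyed.
to_destroy = purge_candidates(records, date(1998, 1, 2), 3, {"matter-A"})
print([r.record_id for r in to_destroy])
```

    The tricky part described above is not this filter but knowing what belongs in held_matters, since that depends on what litigation the enterprise anticipates.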
  • Look at how people deal with email. I've got coworkers that have every single email (including mailing lists they've subscribed to) they've ever sent or received since they started (~8yrs ago). They've probably got 20GB of email on their laptop. Now we only allow 100MB of server based email storage, so that helps on the server side, but we're still backing up this guy's laptop.

    On the datacenter side, we had a database corruption about 10 years ago so we implemented snapshots, and then snapshots of those sn

  • IANAL. This is why most companies spend some money developing a retention policy and planning its implementation. It requires a bit of time from every employee to decide if a piece of information is something that requires short term, long term or permanent storage but if you get people into the habit of sorting things like email into folders that reflect the company retention policies (which need to be pretty clear and well planned both from an IT and a legal perspective) then you can reduce the cruft you

  • What about throwing company policies at a technology problem?

    Hypothetically (never happens in the real world of course), what if there was a document management server, samba dropbox, where all documentation for deliverables is kept in portable excel 2003 format? What if content identification is done by creating folders with "project" and "project"_old naming conventions, hyperlinking is done in excel (because html is complicated), and so on ad nauseam for the automated process called "company policy"?

  • by Doc Ruby (173196) on Thursday September 18, 2008 @02:36PM (#25059071) Homepage Journal

    Let's say your corp is more than 50% likely to go through "e-discovery" once every 10 years. Each worker will generate 10GB * 10 years = 100GB. Backing up the growing data pile (pairing the balancing ends of the accumulation for half the accumulation years) is 101GB * 5 = 505GB, which at $5:GB is $2525, plus about $2M:TB * 505GB = $1.01M, for a total of $1,012,525 per worker. Times at least 0.50 probability, that is at least $506,262 average predictable cost per employee.

    One approach is to keep much less data. But when you keep less data, you have to guess right every time what data you'll need later. If your process discards data that's valuable later (but lost) it better be worth less than the amount you save. That's too hard to know, which is one reason companies keep all the data, and figure it out later.

    A better approach is just to cut that $1-3M:TB e-discovery cost. Of course, the best way is to avoid being investigated, but one has less than 100% control over that, especially from inside the IT department. A much better way to do it is to better inventory the data stored as you go along accumulating it, in the terms in which a later e-discovery would search it. Which also can have the benefit of making the info in the data more available in the normal course of business, which can make that data's increased value (and lowered costs of searching it) worth the entire process. The cheaper possible e-discovery would be just a bonus.

    What really gets me is how these economics are the true cost of storage. A 1TB drive costs $120, and maybe a better 1TB in a 100% redundant RAID costs $250. But it really costs something like $300,000 over its lifetime (probably replaced every 3 or so years, across the 10 years I analyzed). If IT spent a few hundred hours a year streamlining the navigation of all that data, at a cost of a few tens of thousands of dollars, divided across all those employees, the entire org's IT operations would be much more economical, when the large cumulative risk of e-discovery costs is factored into the true cost.
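
    The per-worker arithmetic in this comment can be sketched as below. This is an illustration, not the poster's own calculation: the function names, decimal terabytes, and the "variant" inputs are my assumptions. The first call reuses the comment's 505GB figure for both backup and e-discovery; the variant assumes the pile grows 10GB a year (10 + 20 + ... + 100 = 550 GB-years of backup) and that e-discovery only touches the 100GB on hand after 10 years.

```python
GB_PER_TB = 1000  # decimal terabytes, an assumption


def accumulated_gb_years(gb_per_year, years):
    """Total gigabyte-years backed up if the pile grows by gb_per_year annually."""
    return sum(gb_per_year * year for year in range(1, years + 1))


def expected_cost(backup_gb_years, discovery_gb, backup_per_gb=5,
                  ediscovery_per_tb=2_000_000, probability=0.5):
    """Probability-weighted cost per worker, following the comment's structure."""
    backup = backup_gb_years * backup_per_gb
    discovery = discovery_gb * ediscovery_per_tb / GB_PER_TB
    return probability * (backup + discovery)


# The comment's own figures: 505GB used for both backup and e-discovery.
as_posted = expected_cost(505, 505)

# Variant (assumption): 550 GB-years of backup, e-discovery over 100GB held.
variant = expected_cost(accumulated_gb_years(10, 10), 100)

print(as_posted, variant)
```

    Either way the e-discovery term dominates the backup term by orders of magnitude, which is the point the comment is making about the true cost of storage.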

