Dell Says 90% of Recorded Business Data Is Never Read
Barence writes "According to a Dell briefing given to PC Pro, 90% of company data is written once and never read again. If Dell's observation about dead weight is right, then it could easily turn out that splitting your data between live and old, fast and slow, work-in-progress versus archive, will become the dominant way to price and specify your servers and network architectures in the future. 'The only remaining question will then be: why on earth did we squander so much money by not thinking this way until now?'" As the writer points out, the "90 percent" figure is ambiguous, to put it lightly.
This is new? (Score:5, Interesting)
Good argument for tape? (Score:4, Interesting)
This is one reason I like tape: the drives are expensive, but the tapes are $30-$50 (LTO-4 is $30 by mail order). So having an autochanger move all the rarely used data onto tape is likely the most efficient way to get data into long-term archiving. Even better is making sure that 2-3 sets of tapes are used (one onsite, one offsite).
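A rough media-cost sketch for the argument above. It uses the $30 LTO-4 price quoted here and LTO-4's 800 GB native (uncompressed) capacity, and it ignores the drive, the autochanger, and offsite storage fees. The 10 TB archive size and the function name are hypothetical.

```python
import math

LTO4_NATIVE_GB = 800   # LTO-4 native (uncompressed) capacity per cartridge
PRICE_PER_TAPE = 30    # USD, the mail-order price quoted above

def tape_media_cost(archive_gb: int, sets: int) -> int:
    """Cartridge cost only (no drive or autochanger) for `sets` full copies of the archive."""
    tapes_per_set = math.ceil(archive_gb / LTO4_NATIVE_GB)
    return tapes_per_set * sets * PRICE_PER_TAPE

# Hypothetical 10 TB archive kept as 3 sets: 13 tapes per set, 39 tapes in total
print(tape_media_cost(10_000, 3))  # 1170
```

Media cost scales linearly with the number of sets, so keeping an extra offsite set is cheap compared with the drive itself.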
Of course, hard disks by themselves may seem cheaper, but they are not a true archival medium. There are so many parts in an HDD, and each of them (bearings, heads, spindle, motor, controller card) is a point of failure.
With HDD capacities no longer growing as exponentially as they did last decade, it would be nice if tape companies would not just catch up with 2-3TB native tape offerings, but also offer drives at a lower price so home and SOHO users could use them for long-term storage. I'm sure that if someone offered a consumer-level tape drive with decent capacity for $500, a lot of small businesses would buy it, especially if it came with decent backup software (Retrospect, Backup Exec, Amanda, bru, or a similar utility). Since some tape drives are even bootable (some HP offerings have a section of the tape that emulates a boot CD or DVD), such a drive would be ideal for bare-metal recovery, even by nontechnical users. Pop in the tape, boot the machine, type in the encryption key, select where the data should be restored to, walk off for a bit, and it is done.
Even though the SAN companies have said tape is going to die, tape will be with us until another form of media (perhaps super-inexpensive flash media [1]) is as reliable as tape and can be put in an Iron Mountain case and sent offsite for safekeeping for decades on end. Only optical comes close to tape for long-term archiving.
[1]: I can see someone making semi-smart flash media: it is put in a specific case and shipped to an offsite warehouse, and the warehouse plugs the cases into 5-12 VDC. Then, over time, the circuitry on the flash drives periodically checks the stored flash media for damage or bit rot, corrects errors by rewriting blocks, and periodically moves good blocks to ensure a high signal-to-noise level on all media. Of course, this requires power, while tapes can happily sit in a climate-controlled warehouse and still be recoverable.
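The scrub-and-refresh idea in the footnote can be sketched as below. This is a hypothetical design, not an existing product: it assumes each block is mirrored and carries a CRC32 checksum, so a corrupt copy can be detected and rewritten from a good one. Real flash controllers rely on error-correcting codes rather than plain mirroring; the mirror just keeps the sketch short.

```python
import zlib
from dataclasses import dataclass

@dataclass
class Block:
    # Hypothetical layout: two replicas of the same data plus a CRC32 of the original contents
    copies: list
    crc: int

def make_block(data: bytes) -> Block:
    return Block(copies=[bytearray(data), bytearray(data)], crc=zlib.crc32(data))

def scrub(blocks: list) -> dict:
    """One pass: verify every replica, repair bad replicas from a good one,
    and rewrite (refresh) all replicas of each recoverable block."""
    stats = {"repaired": 0, "refreshed": 0, "lost": 0}
    for block in blocks:
        good = [r for r in block.copies if zlib.crc32(r) == block.crc]
        if not good:
            stats["lost"] += 1  # every replica is corrupt: unrecoverable
            continue
        source = bytes(good[0])
        for i, replica in enumerate(block.copies):
            if zlib.crc32(replica) != block.crc:
                stats["repaired"] += 1  # counts bad replicas, not blocks
            block.copies[i] = bytearray(source)  # rewrite good and bad replicas alike
        stats["refreshed"] += 1  # counts blocks
    return stats

blocks = [make_block(b"archive block %d" % i) for i in range(4)]
blocks[1].copies[0][0] ^= 0xFF  # simulate bit rot in one replica
print(scrub(blocks))  # {'repaired': 1, 'refreshed': 4, 'lost': 0}
```

Rewriting good blocks as well as bad ones is the "refresh" step from the footnote; it is also why the scheme needs the warehouse power that tape does not.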
Re:Which 90% ? (Score:2, Interesting)
I work for a large resource company and we collect loads of data... some of which is valuable today and some of which will be valuable tomorrow. Interestingly, what is of value tomorrow depends on how mature our data consumption is today.
So we collect the data not because it's of value today, but because we might analyse it tomorrow in a new way.
In other news... (Score:3, Interesting)
...at least 70% of the crap you store in your house isn't really needed, either. Do you really ever LOOK at the pictures hanging on the walls? Are you sure you're going to read every book you own, again?
Re:Which 90% ? (Score:4, Interesting)
If each piece of data has a 90% probability of never being read again...
and you discard only 10 pieces out of 100, or out of 1 billion, whatever...
The probability that none of those 10 pieces would ever have been needed again is 0.9^10 ≈ 0.349, i.e. about 34.9%.
In other words, there is roughly a 65% chance that at least one discarded piece will be needed, which means that you keep all of your data.
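The arithmetic above can be checked in a few lines of Python. Like the comment, it assumes each discarded piece is independently left unread with probability 0.9; that independence is the simplifying assumption doing all the work.

```python
def p_none_needed(p_unread: float, n_discarded: int) -> float:
    """Probability that none of n independently discarded pieces is ever needed."""
    return p_unread ** n_discarded

for n in (1, 10, 100):
    p = p_none_needed(0.9, n)
    print(f"discard {n:>3}: P(none needed) = {p:.3f}, P(at least one needed) = {1 - p:.3f}")
```

For n = 10 this prints 0.349 and 0.651; by n = 100 it is practically certain that something you threw away will be wanted.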
Caveats:
Solutions: (Score:5, Interesting)
a) Forbid *unmanaged* storage of documents. If the question "Where is the most up-to-date version of this document stored?" can be answered systematically and easily, then people can delete the crap from their laptops.
b) Forbid attachments in internal mail. If the latest version can be easily found, including the revision history, a link to that revision is worth *more* than the current state of the document. Most of the space in my inbox is taken up by totally useless attached documents.
c) Forbid the use of formats unsuitable for storing a given kind of information. (Where I work, they use PowerPoint/Word files for electronic forms.)
d) Provide a good archiving and backup service. Besides the quality improvement that comes from using a service, this also prevents the 100th unsystematic copy of some data (forbid such copies explicitly).
e) Thin clients. Store the data on a server. Deduplicate.
f) I would expect that most of the documents in a company can (and should) be stored in a database.
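Point (e)'s deduplication can be sketched as content-addressed storage: each file's bytes are hashed with SHA-256, and identical contents are stored once no matter how many mailboxes or laptops hold a "copy". The `DedupStore` class and the file names are hypothetical, and many real systems deduplicate at the block or chunk level rather than whole files.

```python
import hashlib

class DedupStore:
    """Hypothetical content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}  # SHA-256 hex digest -> file contents
        self.names = {}  # user-visible file name -> digest

    def put(self, name: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # contents stored only the first time they appear
        self.names[name] = digest
        return digest

    def get(self, name: str) -> bytes:
        return self.blobs[self.names[name]]

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())

# The same attachment mailed to 100 people
store = DedupStore()
report = b"Q3 sales figures..."
for i in range(100):
    store.put(f"inbox/user{i}/report.doc", report)
print(len(store.names), "names,", len(store.blobs), "blob,", store.physical_bytes(), "bytes stored")
```

A hundred names point at one stored blob, which is exactly the waste described in (b) and (d) being reclaimed.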
Why is this reported as news? (Score:1, Interesting)
Folks, hierarchical storage has been discussed in one form or another since the '70s (probably much earlier, but I'm not that old). Everybody and their mother already has some implementation of archival media.
As for 90% of the data never being read, I beg to differ. Data is sliced and summarized many times in its lifetime (and sometimes those summaries need to be refreshed to include new dimensions or details), even if there's nobody really looking closely at the finest grain. But if you throw away the oft-unused detail, how can you re-summarize?
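The point about re-summarizing can be made concrete with a toy example: once detail rows are rolled up by region, a new question by product cannot be answered from that summary alone. The records and field layout below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical fine-grained detail rows: (region, product, amount)
DETAIL = [
    ("north", "widgets", 120),
    ("north", "gadgets", 80),
    ("south", "widgets", 200),
    ("south", "gadgets", 50),
]

def summarize(rows, key_index):
    """Total the amount column, grouped by the field at key_index."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)

by_region = summarize(DETAIL, 0)   # {'north': 200, 'south': 250}
# A new dimension needs the detail rows again; by_region alone cannot produce this.
by_product = summarize(DETAIL, 1)  # {'widgets': 320, 'gadgets': 130}
print(by_region, by_product)
```

Both summaries add up to the same 450, but only the detail rows can be cut both ways.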
And one warning to all (mainly Dell): try telling the judge that you deleted important evidence of your wrongdoing because it was "dead weight", and let me know how that goes.
Having said all that, vendors apparently are only now becoming aware of the need for automated deprecation: moving unused data to slower/cheaper storage and fetching it back efficiently when needed, from memory to local disk to network storage to slower/cheaper network storage to tape.
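A minimal sketch of that kind of age-based tiering (hierarchical storage management), assuming a simple policy: an item moves down one tier each time its idle time exceeds the current tier's threshold, and any access brings it straight back to the fastest tier. The tier list follows the comment; the day thresholds and names are hypothetical.

```python
from dataclasses import dataclass

# Tiers from the comment, fastest and most expensive first
TIERS = ["memory", "local disk", "network storage", "cheap network storage", "tape"]
# Hypothetical idle thresholds (days) for leaving TIERS[i]; the last tier has no threshold
DEMOTE_AFTER_DAYS = [1, 30, 180, 365]

@dataclass
class Item:
    name: str
    tier: int = 0
    last_access_day: int = 0

def rebalance(items, today):
    """Demote each item tier by tier while its idle time exceeds the current tier's threshold."""
    for item in items:
        idle = today - item.last_access_day
        while item.tier < len(DEMOTE_AFTER_DAYS) and idle > DEMOTE_AFTER_DAYS[item.tier]:
            item.tier += 1

def access(item, today):
    """Fetch the item back to the fastest tier and record the access."""
    item.tier = 0
    item.last_access_day = today

old_logs = Item("old-logs", last_access_day=0)
report = Item("current-report", last_access_day=395)
rebalance([old_logs, report], today=400)
print(TIERS[old_logs.tier], "|", TIERS[report.tier])  # tape | local disk
access(old_logs, today=400)
print(TIERS[old_logs.tier])  # memory
```

Real HSM products also weigh recall cost and file size; promoting straight to the top tier on access is just the simplest choice for a sketch.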
Re:Cost of storage (Score:3, Interesting)
Not true; a lot of data is harvested automatically these days. And if you're getting the data by having the customer fill something out, then you're not paying for the typing.
Re:Which 90% ? (Score:3, Interesting)
The problem is that so much data is made available without anyone ever considering how useful it might be. At least we've come some way in the last 20 years:
Back in the '70s and '80s I worked at many sites where mainframe ops used to clear tonnes of fanfold paper every day. This is why we had separate printer rooms: a bank of 6 or 8 barrel-printers belting out 132 columns of text at 1800 lines/minute created sacksful of dust.
Most of that rubbish was never read in any depth - it was physically impossible to do so before it became out of date, so most of that paper went straight to the shredders, which often shared space with the printers that created the stuff in the first place. I used to have fantasies about lining up the shredders directly behind the printers to save everybody the trouble of distributing the printouts.