
Dell Says 90% of Recorded Business Data Is Never Read

Posted by timothy
from the sounds-lowball-no-matter-the-methodology dept.
Barence writes "According to a Dell briefing given to PC Pro, 90% of company data is written once and never read again. If Dell's observation about dead weight is right, then it could easily turn out that splitting your data between live and old, fast and slow, work-in-progress versus archive, will become the dominant way to price and specify your servers and network architectures in the future. 'The only remaining question will then be: why on earth did we squander so much money by not thinking this way until now?'" As the writer points out, the "90 percent" figure is ambiguous, to put it lightly.
  • by drinkypoo (153816) <martin.espinoza@gmail.com> on Saturday July 10, 2010 @08:12AM (#32859388) Homepage Journal

    People always bitch that they have to pay for Microsoft (or whatever) Office's features because they only use 5% of its functionality. But you buy all those features at once because you don't know which ones you will need in the future. Data warehousing is the same way. If you start taking data offline, you'll inevitably turn out to need exactly that data. That's why analyses of very large data sets are performed before archiving.

    But what is really wanted is a way to cluster the database servers, with old data automatically cycled to the slowest, most remote nodes, and with the most frequently-altered data heavily replicated and aggressively synchronized.
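    That tiering wish can be sketched in a few lines. This is a toy model, assuming a simple last-access threshold; the class name, the two-tier split, and the promote/demote policy are all made up for illustration:

    ```python
    from datetime import datetime, timedelta

    class TieredStore:
        """Toy two-tier store: 'hot' (fast, replicated) vs 'cold' (slow, remote).
        Records untouched for longer than `max_age` are demoted to the cold tier."""

        def __init__(self, max_age=timedelta(days=90)):
            self.max_age = max_age
            self.hot = {}   # key -> (value, last_access)
            self.cold = {}  # key -> value

        def put(self, key, value):
            self.hot[key] = (value, datetime.now())

        def get(self, key):
            if key in self.hot:
                value, _ = self.hot[key]
                self.hot[key] = (value, datetime.now())  # refresh access time
                return value
            if key in self.cold:
                # Promote on access: cold data that gets read belongs in the hot tier.
                value = self.cold.pop(key)
                self.put(key, value)
                return value
            raise KeyError(key)

        def demote_stale(self, now=None):
            """Cycle old data to the cold tier (the slow, remote nodes)."""
            now = now or datetime.now()
            stale = [k for k, (_, t) in self.hot.items() if now - t > self.max_age]
            for key in stale:
                value, _ = self.hot.pop(key)
                self.cold[key] = value
    ```

    A real system would do the demotion asynchronously and replicate the hot tier, but the access pattern (read refreshes, cold hits promote) is the core of it.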

  • Re:which 90% (Score:5, Informative)

    by Koby77 (992785) on Saturday July 10, 2010 @09:00AM (#32859600)
    I worked in a call center, and I can definitely believe that 90% of the data is never read again. However, when a customer is calling back (and is angry!), you don't have time on a live call to wait to see what's up with the account. Also there can be some litigious aspects, and a lot of information was recorded for C.Y.A. purposes. Again, you never know which part is needed for C.Y.A. purposes, but that 10% sure is valuable.

    So yeah, we needed to store ALL the account information, and we needed fast access to ALL of it ALL the time.
  • Re:which 90% (Score:2, Informative)

    by bwintx (813768) on Saturday July 10, 2010 @09:41AM (#32859770)

    Like the Coca Cola exec who remarked that he was pretty sure half of his advertising budget was wasted, he just wasn't sure which half.

    FWIW, and pointing this out only because I've seen this quote referenced so many times over the years...

    John Wanamaker, a 19th century entrepreneur, Lord Leverhulme, founder of consumer goods giant Unilever, and Franklin Winfield Woolworth, the founder of Woolworth's, have all been credited with the quote: "I know that half of my advertising is wasted. I just don't know which half."

    -- Citation [businessop...dideas.com]
    -- Google search [google.com]

  • by mbone (558574) on Saturday July 10, 2010 @10:48AM (#32860108)

    Tapes are not archival storage either. Either way, archival storage is a system, not a medium.

    I hope you are reading all of those tapes on a 5 year cycle, and writing new ones with the recovered data. I also hope you are making sure that the humidity and temperature are strictly controlled at all times in the tape storage room.

  • by vrmlguy (120854) <`samwyse' `at' `gmail.com'> on Saturday July 10, 2010 @12:26PM (#32860588) Homepage Journal

    I also hope you are making sure that the humidity and temperature are strictly controlled at all times in the tape storage room.

    That's why the OP said to use Iron Mountain [wikipedia.org]. They maintain the humidity and temperature at all times in their storage rooms.

    It costs a little extra, but if you want long term storage, rent some underground space. According to http://mic.imtc.gatech.edu/preservationists_portal/presv_costcompare.htm [gatech.edu], underground storage costs can get as low as $2/year per cubic foot (not including relocation, initial filing charges, retrieval & re-file charges) if you're buying four delivery trucks worth of space.

  • by mlts (1038732) * on Saturday July 10, 2010 @01:41PM (#32860968)

    Five-year cycles are close enough. In business, with laws like Sarbanes-Oxley, FERPA, HIPAA, PCI-DSS, and many others, if a business puts it on tape (where the maker says the archival life is in decades), drops it off at Iron Mountain, and has a documentable chain-of-custody system, then should an audit happen and some tapes be unreadable, they are off the hook. Management can look at the auditor and say that any missing data was stored in multiple places, and if anything is lost to tape failures/bit rot over time, shit happens. The audit ends with the company passing, and life goes on. Fifty-year audits are different (anything aerospace related needs a 50-year audit trail), but tape drives are more than enough to deal with the 7 years that most regulations require.

    Things are different if the data is worth keeping, versus sticking it on a tape to languish in a bucket offsite until the 7 years are up. For data worth keeping, it needs to be stored multiple places, and checked for issues every so often. Most businesses have multiple SANs, one at the main data center, one offsite and both are synced to deal with this. It is expensive, but it ensures that data doesn't "rot".

  • Re:Which 90% ? (Score:2, Informative)

    by BrokenHalo (565198) on Saturday July 10, 2010 @01:58PM (#32861058)
    We used to do something similar in a very manual process by keeping the most frequently accessed Oracle data on the leading edge of the disk platters.

    I haven't really kept up to date with HDD technology in recent years, but there was a time when some operating systems (Data General's AOS/VS, for example) allowed you to keep your most frequently accessed files (or even records in a database) around the middle of the disk platter, on the principle that the heads spent more time on average around the middle than at the extremities. Bear in mind that this was in the days when such a drive would typically hold 700MB of data, and of course that this principle has no value if you partition that drive.

    Having said that, I remember testing this at the time when I was sysmgr at a large DG site, and didn't find any conclusive evidence as to the value of this concept, so ended up ditching it as more trouble than it was worth.
  • Re:Which 90% ? (Score:4, Informative)

    by alexhs (877055) on Saturday July 10, 2010 @02:16PM (#32861142) Homepage Journal

    For any given sample, 1/10th of them will be necessary.

    I'm sorry, but you're wrong. That's not how statistics works.

    Let's play heads or tails.
    Each toss has a 50% chance of being heads.
    According to you, for any number of tosses, 50% of them will be heads. In other words, you're saying that there is a 100% chance that half of them will be heads.

    For a sample of two tosses, that would mean a 100% probability of one head(s) and one tail(s).
    I hope that you see how this is wrong. You would actually have 50% probability of one head and one tail, 25% probability of two heads, 25% probability of two tails.

    For a sample of size n, with a 10% probability that any given piece of data is necessary, the correct formula says that the probability of at least one element of the sample being necessary is 1-(0.9^n), which quickly approaches 1 (100%) as n increases.

    Now, a MUCH more useful set of data is probability over time. 1/10 within 10 years? 5 years? 1 week?

    It depends on what you mean by probability over time. What I can tell you is that as more time elapses, the probability of an element being necessary (more precisely, of having been necessary) increases. The 90% never read is supposedly over an infinite span of time (that's what "never" means, right?).
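    Both claims above (the 50% chance of one head and one tail in two tosses, and the 1-(0.9^n) formula) are easy to check numerically; a minimal sketch, assuming independent events:

    ```python
    from itertools import product

    # Exact enumeration for two fair coin tosses: P(one head AND one tail).
    outcomes = list(product("HT", repeat=2))  # HH, HT, TH, TT
    p_mixed = sum(1 for o in outcomes if set(o) == {"H", "T"}) / len(outcomes)
    print(p_mixed)  # 0.5 -- not the "100%" the parent's reasoning would imply

    # P(at least one of n records is ever needed) when each is needed with p = 0.1.
    def p_at_least_one(n, p=0.1):
        return 1 - (1 - p) ** n

    for n in (1, 10, 50):
        print(n, round(p_at_least_one(n), 3))  # climbs toward 1 as n grows
    ```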

  • Re:Which 90% ? (Score:3, Informative)

    by afidel (530433) on Saturday July 10, 2010 @05:26PM (#32862036)
    Look for auto-tiering; most of the newer products from EMC now support it. The technology is OS-agnostic because it is done at the block level. Compellent and Isilon are two other vendors I'm familiar with that do auto-tiering.
  • by afidel (530433) on Saturday July 10, 2010 @07:27PM (#32862942)
    Oracle's way ahead of you; it has had programmatically partitioned tables for quite some time. Queries don't need to be altered: if they call for data outside the active table's range, the archive table(s) are used automatically.
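    A toy model of that kind of range partitioning, for flavor. This is an illustrative sketch, not Oracle's actual mechanism; the class and method names are hypothetical:

    ```python
    import bisect
    from datetime import date

    class PartitionedTable:
        """Toy date-range partitioning: rows land in a partition based on their
        date, and queries scan only the partitions overlapping their range, so
        archive partitions stay untouched unless a query actually needs them."""

        def __init__(self, boundaries):
            # boundaries: sorted cutoff dates; partition i holds rows with
            # boundaries[i-1] <= row_date < boundaries[i].
            self.boundaries = boundaries
            self.partitions = [[] for _ in range(len(boundaries) + 1)]

        def _index(self, d):
            return bisect.bisect_right(self.boundaries, d)

        def insert(self, row_date, row):
            self.partitions[self._index(row_date)].append((row_date, row))

        def query(self, start, end):
            """Partition pruning: only partitions overlapping [start, end]
            are scanned; the caller's query never changes."""
            lo, hi = self._index(start), self._index(end)
            return [row for part in self.partitions[lo:hi + 1]
                    for d, row in part if start <= d <= end]
    ```

    The point mirrored from the comment: `query` has no idea which partitions are "active" and which are "archive"; the routing is automatic.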
