
Dell Says 90% of Recorded Business Data Is Never Read 224

Posted by timothy
from the sounds-lowball-no-matter-the-methodology dept.
Barence writes "According to a Dell briefing given to PC Pro, 90% of company data is written once and never read again. If Dell's observation about dead weight is right, then it could easily turn out that splitting your data between live and old, fast and slow, work-in-progress versus archive, will become the dominant way to price and specify your servers and network architectures in the future. 'The only remaining question will then be: why on earth did we squander so much money by not thinking this way until now?'" As the writer points out, the "90 percent" figure is ambiguous, to put it lightly.
  • Which 90% ? (Score:5, Insightful)

    by mbone (558574) on Saturday July 10, 2010 @08:10AM (#32859380)

    I could believe the 90% number. There is plenty of data sitting around in case it is needed. Some of it will be needed. Much of it won't be. How do you predict which is which?

  • From the article:

    Opportunity too good to pass up

    It was just about then that one of my favourite bargain-hunting websites turned up a device called the CORAID EtherDrive. Take a look at the product range at CORAID, but don’t spend too long on it.

    That's the same device from a story I submitted yesterday [slashdot.org]. I hope they don't plan on getting a Z-Series running ZFS.

  • which 90% (Score:3, Insightful)

    by marmusa (557884) on Saturday July 10, 2010 @08:11AM (#32859386)
    Which 90% though? Like the Coca-Cola exec who remarked that he was pretty sure half of his advertising budget was wasted; he just wasn't sure which half.
  • Re:Which 90% ? (Score:5, Insightful)

    by eldavojohn (898314) * <eldavojohn@gmFREEBSDail.com minus bsd> on Saturday July 10, 2010 @08:15AM (#32859404) Journal

    I could believe the 90% number. There is plenty of data sitting around in case it is needed. Some of it will be needed. Much of it won't be. How do you predict which is which?

    Yeah, as someone who has implemented a few auditing solutions where I work, I must confess that it seems that 99% of the data we archive is never looked at again. A lot of it is kept due to policies and is only used after something goes dreadfully wrong. If the policies are well thought out, the metrics can be collected as the data is written instead of needing to search across the data.

    I think their "90% dead-weight rule" is really a misnomer as you could probably claim that 90% of Google's indexing is never read but we all know that it's the potential that data holds that makes it so valuable and necessary. If Google knew every future possible search then they could delete the data they will never use ... but how do they know they will never use it? How do I know that the auditing data will never have a use--by new metric or incident investigation? The truth is simply that you don't.
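    The write-time aggregation idea above can be sketched roughly like this (a minimal Python illustration; the class and field names are hypothetical):

```python
from collections import defaultdict

class AuditLog:
    """Append-only audit log that aggregates metrics at write time,
    so routine reporting never has to scan the archived records."""

    def __init__(self):
        self._records = []                        # the rarely-read archive
        self._count_by_action = defaultdict(int)  # metrics kept hot

    def write(self, user, action):
        # The record is archived once; the running metric is updated inline.
        self._records.append((user, action))
        self._count_by_action[action] += 1

    def report(self):
        # Cost depends only on the number of distinct actions,
        # not on the size of the archive.
        return dict(self._count_by_action)

log = AuditLog()
log.write("alice", "login")
log.write("bob", "login")
log.write("alice", "delete")
print(log.report())  # {'login': 2, 'delete': 1}
```

    The archive itself is only ever read during an incident investigation; the running counters answer the routine questions.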

  • by shoppa (464619) on Saturday July 10, 2010 @08:16AM (#32859406)

    Interesting that this seems to have been written up as a "hardware" or "storage" topic.

    The problem is that IT people dream up all these "write-only" applications that record data without any rational plan for what the data might actually be used for in the business.

    For example, some people worry about privacy when they go to the grocery store and know that all their purchases are being tracked by their loyalty card, or worry that the big bad US government is tapping all the E-mail.

    In fact, I'm 100% sure that some IT geek had a wet dream years ago about recording everybody's purchases, E-mails and phone calls, and it's being done every which way.

    The true "IT application" issue is that there is no real business need for this data 99.999% of the time. It gets recorded, probably gets staged off to tape, maybe indexed in some giant table, and then ... sits there for years with no actual need for it.

    I'm sure the IT geeks who dreamed up the technical ability to record all this stuff, thought they were hot shit when they came up with it. Oh, man, those IT architects were just having a big go-round whipping this problem in scalability. In their heads, they were gonna record everything on disk, then go home and fuck the prom queen.

  • by sirwired (27582) on Saturday July 10, 2010 @08:17AM (#32859412)

    Automated Hierarchical Storage Management has literally been around for decades. It may be new-ish on low-end crap x86 servers, but for, say, mainframe users, it isn't new at all.

    What is new is the range of available implementation choices. When your tier choices are between enterprise disk and enterprise tape, you are biased towards keeping data on disk; there are still use cases for HSM with only high-end disk and tape, but they aren't as compelling. Now, with lower-cost disk available, you have a cheap disk tier too, with fairly reasonable access time.

    SirWired
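    A crude sketch of the kind of tiering policy HSM automates, now that a cheap-disk tier sits between enterprise disk and tape (the thresholds and tier names here are made up for illustration):

```python
def pick_tier(days_since_last_read, reads_per_month):
    """Toy HSM-style placement policy: hot data stays on fast disk,
    warm data moves to cheap disk, cold data migrates to tape.
    Thresholds are invented for illustration only."""
    if reads_per_month > 10 or days_since_last_read < 7:
        return "enterprise_disk"   # hot: frequent or recent reads
    if days_since_last_read < 90:
        return "cheap_disk"        # warm: cheap, but quickly accessible
    return "tape"                  # cold: slow, but cents per gigabyte
```

    A real HSM product would also migrate data back up when access patterns change; this sketch only shows the placement decision.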

  • by 1u3hr (530656) on Saturday July 10, 2010 @08:32AM (#32859480)
    People always bitch that they have to pay for Microsoft (or whatever) Office's features because they only use 5% of its functionality. But you buy all those features at once because you don't know which you will need in the future.

    Bullshit. That's true only if you've never used a word processor in your life. If you have, you know what you use, and you can read the descriptions of other features to decide if you want them.

    And this is a pointless analogy, because if in the future you decide you do need the 3D porn embedding, you can upgrade to get it. If you don't back up some of your data, you can never change your mind if you find you need it 10 years later.

  • by mikael_j (106439) on Saturday July 10, 2010 @08:33AM (#32859486)

    The problem is that IT people dream up all these "write-only" applications that record data without any rational plan for what the data might actually be used for in the business.

    These plans mostly come into being because we "IT people" (read: developers) know that the "business people" love changing the specs, and they'll blame us if they want to start using data they didn't ask us to save and we tell them we can't save data retroactively (really, they'll basically blame the developers for not being able to time-travel). This is why we'd rather save everything than not save enough.

  • by DerekLyons (302214) <fairwater&gmail,com> on Saturday July 10, 2010 @08:42AM (#32859536) Homepage

    The problem is that IT people dream up all these "write-only" applications that record data without any rational plan for what the data might actually be used for in the business.

    Seems to me that the IT folks shouldn't be making these decisions (what data to capture and store) any more than they should be deciding what to stock for the Memorial Day sale.

  • by icebraining (1313345) on Saturday July 10, 2010 @08:53AM (#32859578) Homepage

    No, I think Office features are different; everyone only uses 5%, but each person uses a different 5%.

  • Over 92% of fire extinguishers will never be used, we could probably save a bit of space by having the unneeded ones stored off-site, or in less accessible corners of the garage.

    Slightly more seriously, we can certainly answer this question posed by the linked article easily: "why on earth did we squander so much money by not thinking this way until now?" The answer is: because you are a moron. Anyone who has given even a moment's thought to storage has known this, either implicitly or explicitly, for a long time. So whoever's included in your "we," Steve Cassidy, is just profoundly stupid. I think that quite easily explains why you all squandered so much money by not thinking about this. Next question?

  • by Anonymous Coward on Saturday July 10, 2010 @09:04AM (#32859616)

    But what is really wanted is a way to cluster the database servers, with old data automatically cycled to the slowest, most remote nodes, and with the most frequently-altered data heavily replicated and aggressively synchronized.

    George Santayana: "Progress, far from consisting in change, depends on retentiveness. When change is absolute there remains no being to improve and no direction is set for possible improvement: and when experience is not retained, as among savages, infancy is perpetual. Those who cannot remember the past are condemned to repeat it."

    The concept and implementations of hierarchical storage management (http://en.wikipedia.org/wiki/Hierarchical_storage_management [wikipedia.org]) are several decades old in the mainframe world. Why did "we squander so much money by not thinking this way until now"? Because "we" are savages/infants who refuse to retain experience.

  • by drinkypoo (153816) <martin.espinoza@gmail.com> on Saturday July 10, 2010 @09:11AM (#32859628) Homepage Journal

    Bullshit. True only if you've never used a wordprocessor in your life before. If you have, you know what you use. And you can read the description of other features to decide if you want them.

    That doesn't make it unreasonable to purchase a lighter word processor with fewer features, but I for one would not want to support a word processor where you buy access to individual toolbar buttons. And if I'm doing database reporting (for which I have been paid in the past), I would not want to have to request that pieces of data be reloaded into the database so I can perform analyses. Further, if I have to do a year-by-year analysis, I do not want to have to load and unload data sets, crunching one year at a time. I want to build one report that goes forth and executes subreports to produce year-by-year reports without me having to sit at my desk and watch Crystal Reports grinding.

  • Re:Which 90% ? (Score:3, Insightful)

    by sco08y (615665) on Saturday July 10, 2010 @09:12AM (#32859632)

    I think their "90% dead-weight rule" is really a misnomer as you could probably claim that 90% of Google's indexing is never read but we all know that it's the potential that data holds that makes it so valuable and necessary.

    Another problem is figuring out _why_ data isn't used before archiving it. Is it not useful, or are the tools not in place to use it?

    If companies decide that the x% least used data will be shoved away in the attic, then "x% of data isn't useful" becomes a self-fulfilling prophecy.

  • So what? (Score:5, Insightful)

    by davidbrit2 (775091) on Saturday July 10, 2010 @09:26AM (#32859696) Homepage
    And if you didn't have that 10% that is eventually needed, you'd be totally screwed. Do we really need to play the 20/20 hindsight game every time somebody thinks of something like this?
  • Exactly. (Score:4, Insightful)

    by brusk (135896) on Saturday July 10, 2010 @09:54AM (#32859824)
    I wasted money on a dictionary that has tens of thousands of words but have only ever looked up a few hundred. I should have bought one that just had the words I would actually need.
  • by mbone (558574) on Saturday July 10, 2010 @10:50AM (#32860128)

    Well over 99% of all lifeboats are never used.

  • Re:Which 90% ? (Score:3, Insightful)

    by jeffmeden (135043) on Saturday July 10, 2010 @11:58AM (#32860420) Homepage Journal

    Bingo. The first thing I thought of is "sure 90% goes to waste but you don't know *which* 90% until after the fact"...

    Is Dell working on a patent to send information back from the future about what stored data is never used again? I just hope they don't stumble on the Slashdot comment archives, the future-tubes would be clogged indefinitely.

  • by Dcnjoe60 (682885) on Saturday July 10, 2010 @12:07PM (#32860478)

    Rate of access does not equal importance of data. How important are, say, dental records or DNA? To the majority of people, probably not too important. However, in law enforcement, they could be very important. The US military has DNA records on all of its members. However, unless you are dead and they are trying to identify your body, 99% of it is just stored and never used.

    Medical records are stored and unlikely to be used on a regular basis; however, for someone coming into the emergency room at the local hospital with chest pains, quick and timely access to those records may be vital.

    What the author seems to be proposing, however, is that records be stored on the basis of how often they will be needed (needed frequently: high-speed storage; once in a blue moon: slow or offline storage). In reality, data should be stored based on the cost associated with it not being available when needed.

    Using the medical example, patient data has a high cost of not being available when needed (death). Payroll information, though needed somewhat frequently, has a lower cost if not available (an employee having to wait for the information). As such, the metric should not be how often the data is accessed, but how vital quick access is.
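    That metric can be sketched as a placement function driven by cost-of-unavailability rather than access frequency (all thresholds, dollar figures, and tier names here are hypothetical):

```python
def placement(cost_if_unavailable, accesses_per_month):
    """Toy placement policy per the comment above: data whose absence is
    very expensive goes on fast online storage regardless of how rarely
    it is read. Thresholds are invented for illustration."""
    if cost_if_unavailable >= 1_000_000:
        return "fast_online"       # e.g. patient records: life-critical
    if accesses_per_month > 100:
        return "fast_online"       # hot working set
    if cost_if_unavailable >= 1_000:
        return "nearline"          # e.g. payroll: a delay is tolerable
    return "offline_archive"       # cheap, slow, rarely missed
```

    Note how rarely-read but life-critical data (high cost, zero accesses) still lands on the fast tier, which is exactly where a frequency-only policy would get it wrong.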

  • The problem is-- (Score:3, Insightful)

    by Chris Mattern (191822) on Saturday July 10, 2010 @12:08PM (#32860484)

    If you can't figure out which 10% you'll need later, you can't use this fact to cut down on your data storage.

  • Re:Which 90% ? (Score:2, Insightful)

    by Anonymous Coward on Saturday July 10, 2010 @12:23PM (#32860568)

    the metrics can be collected as the data is written instead of needing to search across the data.

    Yet if you are only ever going to look at it once, why bother optimizing for that case? I have also seen cases where doing this loses some other piece of information. As in my example below, right now you may only care about the total time at a drop-off, but at some future point you may care more about when it started and ended. So be careful what you prune.

    Having implemented a few systems myself, one of the first questions I ask is "how do you want to archive the data?". Most people get a deer-in-the-headlights look. Large databases affect performance in the long run, so prune your data. In many cases this is worth doing: you made bad decisions in the past, the data is gone after the regulatory period, and there are no records of what happened. In many cases pruning data is a good decision, as there is no data to support that you did something wrong 20 years ago, even though you have fixed the issue now. Is it morally right? No. Good business sense? Sometimes.

    Many companies are data hoarders. They glom onto data and never let it go. They *might* make a report someday, but their culture will never let it happen, because they do not really care to improve; they merely want to give the impression that they do. Hence the hoarding of data. Without a cultural shift toward actually wanting to improve the way things are, that data is useless; in fact, I would say it is a waste of resources.

    I have seen businesses that truly use these data warehouses to great effect. Then I have seen *MANY* others that collect the data but then don't really do anything with it. It's just a report they can hand to their manager to show "they are doing something". Sure, you can measure things. But are you going to do anything with it?

    It is also about knowing what to ask. Like one I saw: "my drivers are always way late to their last drop-off". Yet the right question was "why is my driver not able to get out of the yard in the morning?" The drivers were making up most of the lost time at the drop-offs during the day, but still ended up way behind by the end of it. The root cause was that 200 people were all starting their shift at the same time and there were not enough dock spaces for trailers, so many drivers stood around waiting to be loaded up. It took someone in the yard looking at the reports to say "what if we shifted half the guys' work shift by a half hour?" It worked. My point? You can sample the hell out of things and have petabytes of data, yet if you do not have people willing and able to ask the right questions of that data, it is useless. Many companies are not willing to do this, as many people see their jobs as "essential" and do not want to jeopardize that essentiality in any way.

  • Re:Which 90% ? (Score:5, Insightful)

    by Mspangler (770054) on Saturday July 10, 2010 @12:43PM (#32860688)

    Note that I'm working from a process control perspective in a chemical plant, but 90% of data written is never read again sounds about right for when things are going well. It's when something goes wrong and you have to figure out what went wrong at exactly what time and what the regulatory consequences were that having all that previously unread data suddenly becomes very interesting indeed.

    And also when you start looking at a system in detail to see if you can increase output, or change a composition, all that usually ignored data becomes very valuable.

  • Re:Which 90% ? (Score:3, Insightful)

    by Cylix (55374) * on Saturday July 10, 2010 @01:04PM (#32860808) Homepage Journal

    I'm afraid they will run into issues if they do. There are already storage providers that will determine what data you are accessing frequently and move said data chunk to the faster storage area. Conversely it will move less frequently accessed data to the slower and cheaper bulk disks.

    It's a nifty optimization/shuffle technique that allows you to mix ssd, sas and sata disks for their various needs. The best part is it is rather auto-magic.

    We used to do something similar in a very manual process by keeping the most frequently accessed Oracle data on the leading edge of the disk platters.

    The problem with all of these approaches is that the data may not be needed now. Hell, I would certainly say that 90% of the data I store is useless, except when they want to roll back to a certain period in the archive's life or we lose a chunk of data. The other half of the time it is just legal requirements that necessitate storing EVERYTHING.
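    The automatic hot/cold shuffling described above might look something like this in miniature (a toy sketch; real arrays track access heat per extent, and every name and threshold here is invented):

```python
class AutoTier:
    """Toy automatic tiering: blocks accessed often are promoted to SSD;
    blocks that go cold drift back down to bulk SATA."""

    def __init__(self, hot_threshold=5):
        self.hot_threshold = hot_threshold
        self.access_count = {}
        self.tier = {}

    def touch(self, block_id):
        # Every access bumps the block's heat; hot blocks go to SSD.
        n = self.access_count.get(block_id, 0) + 1
        self.access_count[block_id] = n
        self.tier[block_id] = "ssd" if n >= self.hot_threshold else "sata"
        return self.tier[block_id]

    def decay(self):
        # Periodically halve the counters so blocks that stop being
        # accessed cool off and are demoted back to SATA.
        for block_id in self.access_count:
            self.access_count[block_id] //= 2
            if self.access_count[block_id] < self.hot_threshold:
                self.tier[block_id] = "sata"
```

    The decay step is the interesting design choice: without it, every block eventually becomes "hot" and the SSD tier fills with stale data.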

  • Re:which 90% (Score:3, Insightful)

    by itwerx (165526) <itwerx@gmail.com> on Saturday July 10, 2010 @01:37PM (#32860940) Homepage

    "...we needed to store ALL the account information, and we needed fast access to ALL of it ALL the time."

    Which is why decent needs analysis is critical. In other situations that would not be the case.

    I must say this line at the end of the article does more to reflect the ignorance of the author than anything else, "...why on earth did we squander so much money by not thinking this way until now?"
    Who is this "we", kemosabe? Smart IT people have been thinking this way since the dawn of computers. Think of the huge storage rooms of archive (not backup!) tapes that were around back in the mainframe days. We might store a higher percentage of it online nowadays, but there's still a brisk market in optical storage arrays, high-speed tape libraries, various utilities for automatic email and database record archiving, etc.

  • Re:Coincidence? (Score:4, Insightful)

    by hairyfeet (841228) <{bassbeast1968} {at} {gmail.com}> on Saturday July 10, 2010 @04:21PM (#32861700) Journal

    Probably SOX and other data required for CYA. I have set up small business networks for quite a few businesses, and while I don't know about 90% I'd say a good 70% of the data they had me set up backup solutions for was stuff they would never break out unless a CYA situation came up like an IRS audit. The simple fact is you have to keep a LOT of stuff to CYA nowadays, and most of that stuff won't be used in any other situation.

    So while I'm not sure about the 90% part at least from my own experience I can believe 70-80% easy. With the possibility of lawsuits (both you suing them for unpaid bills or them suing you because they decide they don't like the work) IRS audits, SOX, there is a whole lot of data that unless a specific set of circumstances come up will be WORN. That is just a part of doing business in the digital age.
