
27 Billion Gigabytes to be Archived by 2010

Posted by timothy
from the if-not-sooner dept.
Lucas123 writes "According to a Computerworld survey of IT managers, data storage projects are the No. 2 project priority for corporations in 2008, up from No. 4 in 2007. IT teams are looking into clustered architectures and centralized storage-area networks as one way to control capacity growth, shifting away from big-iron storage and custom applications. The reason for the data avalanche? Archive data. In the private sector alone electronic archives will take up 27,000 petabytes (27 billion gigabytes) by 2010. E-mail growth accounts for much of that figure."
This discussion has been archived. No new comments can be posted.


  • by Valacosa (863657) on Tuesday January 01, 2008 @06:07PM (#21877100)
    In other words, 27 Exabytes?

    Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible. Scientific notation and SI exist for a reason.
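For what it's worth, the conversion the parent is asking for is a one-liner with decimal SI prefixes (the ones storage vendors use); a quick sketch in Python:

```python
# Decimal SI prefixes, as used for storage capacities
SI = {"kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}

petabytes = 27_000                    # the figure from the article
total_bytes = petabytes * SI["PB"]    # 2.7e19 bytes

gigabytes = total_bytes // SI["GB"]   # 27,000,000,000 -- "27 billion gigabytes"
exabytes = total_bytes // SI["EB"]    # 27 -- i.e. 27 exabytes, as the parent says
```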
    • by mincognito (839071) on Tuesday January 01, 2008 @06:35PM (#21877292)

      Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible. Scientific notation and SI exist for a reason.
      Exactly! For the thousandth time, let's cut out the exaggerated and sensational writing, Slashdot! If I had a dollar for every sensational headline I've read here, not to mention the gazillion overstated comments I read here per day, I'd be a billionaire by now!
    • by phoebusQ (539940) on Tuesday January 01, 2008 @06:37PM (#21877308)
      SI does exist for a reason: to allow for short, precise, descriptive, standardized measurements. However, the point of the numbers in this article is to show how absurdly large this amount of data really is. This isn't a scientific paper, it's a piece of journalism. In that case, there's nothing wrong with using numbers that aren't completely reduced to demonstrate scale.
      • Re: (Score:3, Insightful)

        by SharpFang (651121)
        This isn't a scientific paper, it's a piece of journalism. In that case, there's nothing wrong with using numbers that aren't completely reduced to demonstrate scale.

        No, standard != wrong.

        In this case, there's precisely the same thing wrong with it that there is with all journalism: using specific language constructs to push certain emotional messages along with the information. AKA manipulation.
        • by wed128 (722152)
          I think you mean

          !standard = wrong

          or maybe even

          ~standard = wrong

          but

          (standard != wrong) == wrong
    • by Yez70 (924200)
      Journalists must write at an eighth-grade level or the majority of their readers would not be able to understand them. Of course, an arrogant intellectual such as yourself should maybe just stop reading, so you can be happy.
    • by HappyEngineer (888000) on Tuesday January 01, 2008 @09:59PM (#21878534) Homepage
      Here is my helpful reference page for big numbers [g42.org]. I love big numbers. I'm actually working on a site right now which will help people to visualize big numbers. I can't give out the url yet because it'll be another month or two before it's ready to be seen. But, it'll have many fun options like Cow Stacking and Hamster Canyon.

      Cow stacking is where you select cow as the animal and from earth to moon as the place and you'll see a graphic of cows being stacked to the moon and the number of cows which would be required to complete that stack.

      Hamster Canyon will be where you select a hamster and the Grand Canyon and you'll see a picture of the Grand Canyon filled with hamsters and a number that indicates the total number of hamsters required to fill the canyon.
      • Re: (Score:3, Funny)

        by Anonymous Coward

        Hamster Canyon will be where you select a hamster and the Grand Canyon and you'll see a picture of the Grand Canyon filled with hamsters and a number that indicates the total number of hamsters required to fill the canyon.
        That's much better than Libraries of Congress. Most people haven't even seen the Library of Congress, but who hasn't seen huge piles of hamsters?
    • by SeaFox (739806) on Wednesday January 02, 2008 @12:59AM (#21879500)

      Note to science and tech journalists: please stop stringing together "millions" and "billions" in an attempt to make the numbers seem large, impressive, and incomprehensible.


      Joe Sixpack digests technobabble at whatever level is relevant to him. While few would know what an exabyte is, most would know what a gigabyte is, since they deal with numbers that size on their own computers. I think it's less writing for sensationalism than it is writing in a language your audience will understand.
    • Re: (Score:2, Informative)

      by WaroDaBeast (1211048)
      SI only seems to exist outside the UK and the US -- among ordinary people, that is.
    • You know, a SaganByte of storage. It would have to store Billions of billions of bytes.
  • by thesymbolicfrog (907527) <sloanes.k@g m a il.com> on Tuesday January 01, 2008 @06:07PM (#21877102)
    From the summary:
    "E-mail growth accounts for much of that figure."

    We're archiving spam?
    • by 4D6963 (933028) on Tuesday January 01, 2008 @06:19PM (#21877182)

      We're archiving spam?

      Which raises a question I find interesting: do we check for redundancy when archiving mails, so that we can save a hell of a lot of space on spam (and other automated messages), since spam is by definition essentially the same message sent to a number of people? Also, couldn't correlating stored mails for redundancy allow for better spam identification (although it would be no silver bullet, since legitimate automated messages are often redundant)?

      • Re: (Score:2, Insightful)

        Good spammers use multiple methods of fooling spam scans.

        ~They use pictures of text, instead of text, so it takes more effort to filter based on content.

        ~They use random text at the bottom of their message to give the filter something to read.

        ~They generate random noise to superimpose over the picture. Every batch has a different noise layer.


        I'm sure they do more [IANASB - I Am Not A Spam Bot - so I wouldn't know the details], but the slight differences between what WE would perceive as the same message foil such checks.
        • by 4D6963 (933028)

          OK, so basically you're dismissing my entire idea (which was partly a question -- I mean, why wouldn't it be done to a certain extent already?) just because an unknown (to both of us) ratio of the spam data isn't redundant.

          That would be kind of like saying "Why bother with implementing compressed file systems? Most people fill their disks with files that can't be significantly compressed anyway!" Sure, but you've still got millions of copies of the exact same Nigerian scams out there which are stored without any deduplication.

          • Re: (Score:2, Insightful)

            I'm simply saying, the same thing that stops spam from being blocked in the first place stops your idea from coming to fruition. Millions of almost, but not quite, exactly-the-same Nigerian scams are sent and stored without us being able to accurately check for redundancy. With ~95% of all email being spam, you could make millions if you developed a program/process for CORRECTLY identifying multiple emails that are almost, but not quite, the same email as spam, instead of let's say...
            • by 4D6963 (933028)

              I see, but my idea is more focused on solving the storage problem, and to get around the "95% redundancy" problem my idea was based on cutting messages into blocks depending on whether they're redundant or unique, as described here [slashdot.org].
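The block-level scheme described above can be sketched in a few lines. This is a deliberately simplified toy (fixed-size blocks, a plain dict as the block store); real deduplicating stores use content-defined chunk boundaries so that an insertion doesn't shift every subsequent block:

```python
import hashlib

BLOCK = 64   # bytes per block; purely illustrative

store = {}   # block hash -> block bytes; each unique block is kept once

def archive(message: bytes) -> list:
    """Split a message into blocks and store each distinct block once,
    returning the list of block hashes that reconstructs the message."""
    refs = []
    for i in range(0, len(message), BLOCK):
        block = message[i:i + BLOCK]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)   # an already-seen block costs nothing extra
        refs.append(h)
    return refs

def restore(refs: list) -> bytes:
    return b"".join(store[h] for h in refs)

# Two spam variants: identical body, different trailing garbage
body = b"Nigerian prince urgently needs your help! " * 100
msg1 = body + b"xqzw random tail"
msg2 = body + b"vlug other random tail"
refs1, refs2 = archive(msg1), archive(msg2)

# the shared body is stored once, so the store is far smaller than both messages
stored = sum(len(block) for block in store.values())
```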

      • by goodtim (458647) on Tuesday January 01, 2008 @06:59PM (#21877456) Journal

        Actually, I have a partial answer to this question. As a sysadmin for a Novell GroupWise email system, I can tell you that the actual message data for duplicate incoming messages (such as spam that is sent to many people at the same time) is only stored on disk once. Some sort of "pointer" is used to reference the message in the individual users' mailboxes. Check out the docs [novell.com] if you are interested.

        That said, with about 1400 users (spread across multiple post offices), we have probably about 400 GB of email data. We are able to keep it low by having a 120-day retention policy. After that point, email can be archived locally; otherwise it's deleted. Independent of that, and to comply with regulations and disaster recovery scenarios, email data is backed up and replicated offsite using disk-to-disk backup (eVault [evault.com], in case anyone is interested).

        This gives us the ability to archive email for up to 27 years or something like that (with relatively low storage costs because the disk-to-disk is incremental, storing changes at the per-block level).

        As for Microsoft Exchange, I have not the slightest clue how data is stored.

        • I suspect that the behaviour you're describing is only for the case when multiple deliveries occur via a single SMTP transaction (i.e. multiple RCPT TO commands before DATA) rather than the general case of messages-which-happen-to-be-identical, which is what the OP was positing.

          Either that, or when the sending system sends the same message in multiple transactions (i.e. poor mailer, or a mailer interrupted by a 452 response code) and the messages have the same Message ID header.

          That said, the original poster makes an assumption that identical-looking messages are likely to be indistinguishable.
          • by 4D6963 (933028)

            That said, the original poster makes an assumption that identical-looking messages are likely to be indistinguishable

            No, I make the assumption that identical-looking messages have most of their data in common, and that this common data, even if only a chunk of the message starting and stopping at arbitrary points, could be stored efficiently.

            That means cutting messages into blocks, if it is found that some part has something in common with another one, to store common blocks of data all in one place. Th

            • "That means cutting messages into blocks, if it is found that some part has something in common with another one, to store common blocks of data all in one place."

              Substitute "words" for "blocks" and you will find you have invented a dictionary.
              • by 4D6963 (933028)

                Substitute "words" for "blocks" and you will find you have invented a dictionary.

                Duh, of course by blocks I mean blocks of a significant threshold size. You're just nitpicking ;-)

                • Re: (Score:3, Informative)

                  by TapeCutter (624760)
                  "You're just nitpicking ;-)"

                  Ummm, no. I have a CS degree and 20 yrs experience. What you are talking about is attacking the problem of redundant information [wikipedia.org] by comparing blocks; this has already been 'solved'. ;)
                  • by 4D6963 (933028)

                    Ummm, no. I have a CS degree and 20 yrs experience.

                    And? You were nitpicking anyway... Yay, a Wikipedia link that's barely even relevant! Anyway, maybe that's already been 'solved', but the question is not whether this has ever been solved, but whether it's ever been implemented as such for e-mail storage. But maybe you can tell me what's flawed with my idea of (large-)block redundancy detection for e-mail storage to begin with, instead of rubbing your credibility in my face.

      we check for redundancy when archiving mails, so that we can save a hell of a lot of space on spam

        I could see that helping if the same spam is sent to the clients on your network, but it doesn't account for all the subsequent iterations of the spam.

        YMMV, but I see a lot of spam carrying highly varied introductory garbage (to attempt to fool spam filtering software, of course). Some of my email accounts easily receive 10x as much spam as legitimate email, which would make a redundancy check difficult to apply.

        But if it works for you, then more power to you.
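Exact hashes are indeed useless against that trick, but near-duplicate detection doesn't need exact matches. A rough sketch using Jaccard similarity over 3-word shingles (the 0.5 threshold is an arbitrary assumption, not a tuned value):

```python
def shingles(text: str, k: int = 3) -> set:
    """All k-word windows of the text, lowercased."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str) -> float:
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

pitch = "Buy cheap meds now limited offer click here " * 5
spam1 = pitch + "qwxz blorp garbage"
spam2 = pitch + "vlug snarf other junk"
ham = "Meeting moved to Tuesday at three, bring the quarterly figures please."

# the random tails change every exact hash, but the shared pitch dominates
assert jaccard(spam1, spam2) > 0.5
assert jaccard(spam1, ham) < 0.1
```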

    • Exactly. Outside of companies, e-mail use is declining in favor of IM and text messaging. Due to spam and other factors, I would highly disagree that e-mail will grow that much. With cross-platform IM clients such as Pidgin, the OS is no obstacle to IM, and among young people both IM and text messaging have made e-mail needless.
      • by wed128 (722152)
        So, basically what you're saying is that in South Korea, e-mail is for old people?
    • homeland security.
    • Re: (Score:3, Interesting)

      by LoudMusic (199347) *

      From the summary:
      "E-mail growth accounts for much of that figure."

      We're archiving spam?
      No, we have associates using their email as a file storage device -- sending documents to each other through email rather than just sending an email that says "Your 38MB file is on the file server in /X/here/where/there/document.type".
    • by igny (716218)
      We're archiving spam?

      Archiving is the best way to deal with any unnecessary and unneeded information, spam included. So many times I archived my workfiles with the thought that if I don't open that archive in 12 months, it is all junk and I can just toss it away. I believe my brain is working the same way only faster. What are we talking about again?
    • From the summary:
      "E-mail growth accounts for much of that figure."
      We're archiving spam?

      Ignoring even the spam issue, there's also the issue that Outlook encourages people to include the previous message in its entirety, causing an O(n^2) effect for legitimate message chains; that is, every message in a conversation tends to include all previous messages. This not only increases archival size, but it also causes mailboxes to approach their seemingly arbitrary upper bound on mailbox size much more rapidly.
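The parent's O(n^2) point is easy to make concrete: if every message in a thread quotes all of its predecessors in full, an n-message thread with m bytes of new text per message archives roughly m*n(n+1)/2 bytes. A quick check:

```python
def thread_bytes(n: int, m: int) -> int:
    """Archived size of an n-message thread in which message i carries
    its own m bytes plus full quotes of all i-1 earlier messages."""
    return sum(i * m for i in range(1, n + 1))

# a 20-message thread at 1 kB of new text per message:
quoted = thread_bytes(20, 1_000)      # 210,000 bytes archived
unquoted = 20 * 1_000                 # vs. 20,000 bytes without full quoting
```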

      • Too bad Microsoft doesn't have a research department where some of the many boffins who work for them could solve some of these interesting problems and provide useful technology. Oh, wait, they do. Too bad nothing from Microsoft research ever seems to see the real world. MS Management would never go for selling software that people actually want. Only loser companies who aren't monopolies do that.

        The biggest problem I found with Outlook is that its performance is O(n^2) in the number of messages.
  • by Urger (817972) on Tuesday January 01, 2008 @06:07PM (#21877108) Homepage

    E-mail growth accounts for much of that figure.

    They should have that looked at. A good dermatologist could remove it.
  • Distributed Storage (Score:3, Informative)

    by Anonymous Coward on Tuesday January 01, 2008 @06:08PM (#21877110)
    Some big projects are generating so much data that they have problems dealing with it all.
    For example the Folding@home is implementing a distributed storage mechanism for their data and we'll likely have a new @home project soon - Storage@home.
    http://en.wikipedia.org/wiki/Storage@home [wikipedia.org]
    http://www.stanford.edu/~beberg/Storage@home2007.pdf [stanford.edu]
    http://folding.stanford.edu/English/Papers#ntoc7 [stanford.edu]
  • by Zordak (123132) on Tuesday January 01, 2008 @06:12PM (#21877124) Homepage Journal
    E-mail is the biggest burden on the storage space, and so much of that is garbage (I'm not even talking about spam---most "legitimate" e-mail is garbage). I wonder if there would be appreciable negative repercussions to deleting most of it. It seems like as often as not, all you get from archived e-mails is well-documented and discoverable "smoking guns" when you get sued. What if we just stored less of it? Would it be that bad? How likely is it that you're going to need some random Word document from 1998? Not criticizing---I'd really like to know.
    • Re: (Score:2, Insightful)

      In the U.S., it's the law that a company must retain all electronic documents just in case they ever have to go to court, for whatever reason. IMO, this is one of those very poorly thought-out laws: 1) how do you punish a company for contempt when they can't hand over their e-mails, given that 2) almost nobody currently archives all of their e-mails? Also, how do you prove that you've not deleted any? Plus, how does anybody ever sort through them all during discovery? I pity that law clerk.
      • by kestasjk (933987)
        This already happened when MS lost a bunch of e-mail relating to the IE case, didn't it?
      • It's not ALL companies, as you state in your post. Regulations requiring e-mail archives apply only to publicly traded companies (i.e. those on the stock exchanges). Private companies have no such requirement.

    • by phoebusQ (539940)
      In the US (and I'm sure other places as well), companies are required to archive electronic data.
    • by ZorbaTHut (126196)
      Every once in a while I need to dig out an ancient email from my email repository. I don't have any way of knowing which one ahead of time - sometimes it's something obviously important, sometimes it turns out to be something incredibly unimportant (one of my friends deleted an important Livejournal entry once accidentally, but I'd responded to the entry with a mostly-unimportant comment and Livejournal emails me with the entire entry text when I do that. Surprise! It's important!)

      On top of that, the sheer
    • by kent_eh (543303)
      I seem to recall several recent articles about new data retention laws requiring companies to do just that - store potentially incriminating e-mails for absurdly long periods of time.

      So, to answer your question:

      What if we just stored less of it?

      You might get fined or jailed.
      • by Zordak (123132)
        I know of no such law. I know that the Federal Rules of Civil Procedure require litigants to produce archived data, and I know that litigants can be sanctioned for destroying data in bad faith. The Rules also provide a safe harbor for data destroyed in good faith in accordance with a reasonable data retention policy. So what's reasonable? What is the real probability that a business will have non-litigation problems?
  • by Maskirovka (255712) on Tuesday January 01, 2008 @06:13PM (#21877132)
    article summary:

    Users in a lot of places use their email as a document management system. This is somewhat effective on an individual basis, but in large organizations shared documents get duplicated dozens or even hundreds of times as each user has their own copy. In the next few years, products like SharePoint will alleviate some of that, though storage is cheap enough that it may not be worth the cost to both re-educate users and build the infrastructure for it. A SAN can hold a whole lot of Word documents and PDFs, after all...
    • Users in a lot of places use their email as a document management system. This is somewhat effective on an individual basis, but in large organizations shared documents get duplicated dozens or even hundreds of times

      That's exactly the message of this book [oreilly.com]. Email, although widely used, is neither practical nor effective as a means of disseminating information in a company. And duplication of information is the lesser problem.

      For instance, suppose someone leaves the company, either permanently or on vacation, a

      • Email is good for communication, a company wiki (or other sort of document/information management system that is web-based) is good for knowledge storage/retrieval/transfer.
    • I don't get it. Most large companies have servers that store documents and such; along with that, most computers have 40-120 GB hard drives, and drives up to 1 TB or so can be bought cheap. How are we running out of space in a large company? And why "archive" e-mail that's stored on the computer AND an e-mail server?
      • by leenks (906881)
        Go and work for a large company and find out. You can't use the hard drive in a workstation to store anything other than applications - the machine will (out of necessity) be a standard image that will get blasted from time to time with updates, or when something breaks on the Windows install.

        For enterprise storage, hard drives are not cheap. Yes, you can buy domestic IDE drives for cheap, but check the prices on SAS or "enterprise grade" storage. A large company will have potentially petabytes of data - ba
    • by Znork (31774) on Tuesday January 01, 2008 @06:53PM (#21877416)
      Better article summary:

      Storage vendors want to sell expensive solutions to gullible execs, so they pay analysts to produce credible-sounding FUD scenarios.

      "monthly e-mail traffic at more than 30 million messages, vs. 17 million just one year ago."

      Like, wow. In the meantime 500GB disks cost the same or less than 250GB disks did a year ago.

      "The university settled on an IBM storage infrastructure that will afford the institution 350TB of capacity"

      350TB? 350 disks? Half that many in a year, and a quarter as many in two? That's not really a huge amount of storage. Anymore. It's an amount of storage I could order from my friendly online computer store and have delivered tomorrow.

      The fact is, corporate storage isn't driving the market anymore; the consumer market is. Most people I know have more storage in their home PC than the average server requires. Companies want to save video? Consumers want their PVRs to save the cable-TV stream.
      • by leenks (906881) on Tuesday January 01, 2008 @07:27PM (#21877668)
        More like 1000 or 2000 disks, not 350. 1TB drives haven't really hit the enterprise yet. The biggest SAS drives in use are still 300GB.
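Back-of-the-envelope, the parent's correction checks out (raw capacity only, before RAID overhead and hot spares are counted):

```python
import math

capacity_gb = 350 * 1000          # 350 TB, in (decimal) GB
sas_drive_gb = 300                # the 300 GB SAS drives mentioned above

raw_drives = math.ceil(capacity_gb / sas_drive_gb)   # 1167 drives, raw
mirrored = raw_drives * 2                            # ~2334 with RAID-1
# i.e. squarely in the "1000 or 2000 disks" range once redundancy is added
```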

         
        • I've got an array of web servers we just brought up (10 of them) that are Supermicro boxes with 1TB drives in a RAID1 configuration (2 drives). They're SATA2 instead of SAS, but they're still quick as hell and have to deal with a LOT of daily log files.
        • by bmgoau (801508)
          I work for a wholesale company in Sydney, Australia, and we ship terabytes a week; if we had the order it wouldn't be too big a jump for us to provide 350 terabytes of hard drives. At the same time we're seeing huge sales surges in NAS/SAS units of 1 terabyte and up. For consumers, the 1TB Western Digital World Book is the best example; we sell a lot of those to media and entertainment stores already.

          In only a year the size and value of hard disk drives has increased monumentally, and tomorrow at work I see no sign
  • 2010 (Score:5, Funny)

    by Anonymous Coward on Tuesday January 01, 2008 @06:15PM (#21877156)
    All these archives are yours except Europa. ATTEMPT NO WRITINGS THERE.
  • by jd (1658) <.imipak. .at. .yahoo.com.> on Tuesday January 01, 2008 @06:19PM (#21877190) Homepage Journal
    Things like Libraries of Congress, Libraries of Alexandria, Spams per Square Inch. You know, the units that people have become familiar with. Besides which, are they power-of-two gigabytes or SI gigabytes? Also, how much bandwidth is needed to shift all that data? In the standard Imperial units of Clay Tablets per German Juggernaut per unit of French motorway, naturally.
  • Surprising . . . (Score:4, Insightful)

    by cashman73 (855518) on Tuesday January 01, 2008 @06:20PM (#21877194) Journal
    That 90% of that 27,000 petabyte figure isn't for archiving p0rn,... Although I guess, from the corporate IT perspective, they're not worried about backing up p0rn, since most people probably don't do that at work.

    But it is mostly email they're talking about here, and I bet a HUGE part of this archiving is:

    1. spam
    2. Email forwards that have been sent 1,000 times that still have all the original message headers attached
    3. Non-business-related multimedia emails sent by administrative assistants using the company's email and time to send and receive cutesy messages from/to their family & friends
    4. Business-related powerpoint and multimedia emails by non-techie PHBs that don't know how to transfer such files via FTP, and who are too damn lazy to use a thumbdrive

    Yep! Solve problems 1-3, and you'd vastly decrease the amount of email that you have to archive! I won't complain about #4, since I actually value my job, but it would be nice if more PHBs knew more about tech,...

    • Re: (Score:3, Insightful)

      by houghi (78078)
      About 4. I do not understand management where I am.

      I make several Excel files every week for reporting. They are located on a shared drive. Only extra data is added every Monday, yet instead of putting a link to the files, or the directory, management wants me to send them by email every week to several people.

      Utterly stupid, if you ask me.
      • by ZorbaTHut (126196)
        The directory is backed up and version-controlled, right?

        Because if not, that might be an (admittedly crummy) attempt at a backup system.
        • by igny (716218)
          My way to archive my email is of course version controlled. Every month I just archive my inbox, date it, and send it to myself via email.
  • For Fuck's sake (Score:3, Insightful)

    by Colin Smith (2679) on Tuesday January 01, 2008 @06:30PM (#21877256)
    Just delete the crap.

     
  • 30 million emails went through the pitt.edu email servers last year, and my account there didn't get squat during the Christmas break! I wonder where all the email is going? Although the university is closed anyway, so that might have something to do with it,...

    I suppose if I was crazy enough, I'd post my address here on slashdot to see if we can slashdot Pitt's email servers,... maybe we can turn 30 million messages into 60 million messages. On second thought, I don't want 30 million messages,... ;-)

    • by JustNiz (692889)
      It sounds like someone might be using your servers for sending/forwarding spam. Your system might be telling us all how we can "improve ur p3n1s size" or "help Dr. mbongo from Burkina-faso move $99999999 into your account".
  • by petes_PoV (912422) on Tuesday January 01, 2008 @06:42PM (#21877342)
    E-mail growth accounts for much of that figure

    And a great deal of video archive from CCTV as well I expect.
    The question that arises is how would you index all this?

    • And a great deal of video archive from CCTV as well I expect.
      The question that arises is how would you index all this?


      By time. And then you can go by difference and then by motion.

      You could even have a second pass running that picks out faces and objects. These can then be compared to another database of similar faces and objects. All of these would then also be stored with references back to the original video.

      It can be as simple or as complicated as you want. The technology exists today (and I'm sure is b
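The "index by difference and motion" pass sketched above can be as simple as counting changed pixels between consecutive frames. A toy version (frames as flat pixel lists, threshold chosen arbitrarily):

```python
def motion_frames(frames: list, threshold: int = 10) -> list:
    """Indices of frames differing from their predecessor in more than
    `threshold` pixels -- a crude motion index for archived CCTV footage."""
    flagged = []
    for i in range(1, len(frames)):
        changed = sum(1 for a, b in zip(frames[i - 1], frames[i]) if a != b)
        if changed > threshold:
            flagged.append(i)
    return flagged

# a static 100-pixel scene, with something moving through frame 3
static = [0] * 100
moving = [0] * 50 + [255] * 50
footage = [static, static, static, moving, static]
# frames 3 (motion enters) and 4 (motion leaves) get indexed
assert motion_frames(footage) == [3, 4]
```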
    • Unfortunately the idea was crushed by a ruthlessly greedy band of small-minded bloodsuckers with large legal staffs.

      They called it "Napster."
  • by HockeyPuck (141947) on Tuesday January 01, 2008 @07:02PM (#21877484)
    FTFA:

    Mounting interest in these approaches highlights a pronounced shift away from "big-iron storage" - traditional storage arrays typically composed of custom application-specific integrated circuits, RAID controllers, and fixed-disk and cache-scalability ceilings.
    Now TFA goes on to say customers are turning towards Network Appliance as a company that uses COTS parts and software. They use an intel CPU and FC/GigE adapters from other vendors, but I wouldn't call them 100% COTS. It's not like it's a generic PC built from FRYS with JBOD on the back.

    NetApp is a great company and makes a great product aimed for a specific market segment: Fileservices (NFS/CIFS). I don't see many customers tossing out the EMC DMX, HDS Tagmastore or IBM Shark for a FC enabled netapp array. I also don't see a lot of FICON shops asking netapp to support FICON.

    Now the phase storage mgmt is entering is the 'good enough' phase. Does my organization need the current generation of "high end" arrays? Maybe not. The current generation of midrange, with its better/cheaper $/GB and an increasingly comparable feature set to the high-end arrays, is starting to look more attractive to many customers.
    • Re: (Score:3, Insightful)

      by phoebusQ (539940)
      FTFA, RAID, TFA, COTS, CPU, FC, GigE, FRYS, JBOD, CFS, CIFS, EMC, DMX, HDS, IBM, FC, FICON... 17+ acronyms in one post...that's pretty impressive. Do you kiss your mother with that mouth? :)
      • by HockeyPuck (141947) on Tuesday January 01, 2008 @07:25PM (#21877644)
        FRYS isn't an acronym... :)

        and yes I do.
      • From the F*cking article, Redundant Array of Inexpensive Disks, The F*cking article, Commercial Off-The-Shelf, Central Processing Unit, Fiber Channel, Gigabit Ethernet, Fry's Electronics (not an acronym), Just a Bunch Of Disks, Caching File System, Common Internet File System, Electro-Magnetic Compatibility, DataMining Extensions, Hierarchical Data System, International Business Machines, Fiber Channel (again), FIber CONnectivity. Didn't Read The F*cking Article (RTFA) yet, so some acronyms with more than
        • by HockeyPuck (141947) on Tuesday January 01, 2008 @08:36PM (#21878104)
          We're talking storage (sorry DASD) here... It's all about...

          Hooking up a pair of EMC DMX's (or IBM ESSes, or HDS USPs) over a pair of OC48s for SRDF/PPRC/USR unless you are a zOS shop, then you could run XRC. Since this is a BC/DR plan, we'll run it over FCIP protected by IPSec over a DWDM leased line, which must be protected by a UPSR/BLSR, otherwise in the event of a link failure, the R1s will split from the R2s.

          Then you're SOL.
  • Redundant Data (Score:2, Interesting)

    by tm8992 (919320)
    I wonder how much of this data is really redundant--copies of other data. How many emails can really be unique? How many employees download the same video a hundred times on the company's server? As network speeds increase, it will be less necessary for multiple users to store the same thing (think streaming those videos), so could this really be an exaggeration of future storage requirements? Could a better system be designed to minimize redundancy?
  • that will be lost or stolen as company employees fail to properly encrypt back-ups, leave laptops in their car while running in for a latte or some such? Seriously, though, the article says storage is corporations' number 2 concern. What's number one from this survey? Is it security?
  • Just ZIP up the data to a smaller zip file. Then zip the zip file into an even smaller zip file. Repeat until all your data is compressed into a couple of megs. :-)
  • by PPH (736903) on Wednesday January 02, 2008 @02:31PM (#21884832)
    ... most of this will be documents in formats older than Office 2003 [slashdot.org].
  • ...and to think the human genome is just a puny 800MB.
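For scale, that 800MB figure comes from packing ~3.2 billion base pairs at 2 bits each; the projected archive would hold tens of billions of genomes' worth of data:

```python
bases = 3_200_000_000              # ~3.2 billion base pairs
genome_bytes = bases * 2 // 8      # 2 bits per base (A/C/G/T) -> 800,000,000 bytes

archive_bytes = 27_000 * 10**15    # the article's 27,000 petabytes
genomes = archive_bytes // genome_bytes   # 33,750,000,000 genomes' worth
```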
