27 Billion Gigabytes to be Archived by 2010
Lucas123 writes "According to a Computerworld survey of IT managers, data storage projects are the No. 2 project priority for corporations in 2008, up from No. 4 in 2007. IT teams are looking into clustered architectures and centralized storage-area networks as one way to control capacity growth, shifting away from big-iron storage and custom applications. The reason for the data avalanche? Archive data. In the private sector alone electronic archives will take up 27,000 petabytes (27 billion gigabytes) by 2010. E-mail growth accounts for much of that figure."
So, in other words... (Score:5, Interesting)
"E-mail growth accounts for much of that figure."
We're archiving spam?
how much is surveillance data? (Score:3, Interesting)
And a great deal of video archive from CCTV as well I expect.
The question that arises is how would you index all this?
Re:So, in other words... (Score:5, Interesting)
Actually, I have a partial answer to this question. As a sysadmin for a Novell GroupWise email system, I can tell you that the actual message data for duplicate incoming messages (such as spam sent to many people at the same time) is stored on disk only once. Some sort of "pointer" is used to link the message to the individual users' mailboxes. Check out the docs [novell.com] if you are interested.
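For anyone curious what "stored once, referenced by pointer" could look like, here is a minimal sketch. It is not how GroupWise actually does it (I don't know the internals); the class name, the use of a SHA-256 content hash as the "pointer", and the sample users are all assumptions made up for illustration.

```python
import hashlib


class SingleInstanceStore:
    """Toy single-instance message store (hypothetical, not GroupWise's design)."""

    def __init__(self):
        self.blobs = {}      # content hash -> message bytes, kept once
        self.mailboxes = {}  # user -> list of content hashes (the "pointers")

    def deliver(self, user, message):
        # Identical message bodies hash to the same key, so the body is
        # written only the first time it is seen.
        key = hashlib.sha256(message).hexdigest()
        self.blobs.setdefault(key, message)
        self.mailboxes.setdefault(user, []).append(key)

    def read(self, user):
        # Follow each pointer back to the shared message body.
        return [self.blobs[key] for key in self.mailboxes.get(user, [])]

    def stored_bytes(self):
        # Bytes of message data actually held, ignoring pointer overhead.
        return sum(len(body) for body in self.blobs.values())


store = SingleInstanceStore()
spam = b"Cheap meds, click here!"
for user in ("alice", "bob", "carol"):
    store.deliver(user, spam)
store.deliver("alice", b"Lunch at noon?")
```

Three users received the spam, but `store.blobs` holds a single copy of it; each mailbox only carries the hash. A real system would also need reference counting so a body can be purged once the last mailbox drops it.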
That said, with about 1400 users (spread across multiple post offices), we have probably about 400 GB of email data. We are able to keep it low by having a 120-day retention policy. After that point, email can be archived locally; otherwise it's deleted. Independent of that, and to comply with regulations and cover disaster recovery scenarios, email data is backed up and replicated offsite using disk-to-disk backup (eVault [evault.com] in case anyone is interested).
This gives us the ability to archive email for up to 27 years or something like that (with relatively low storage costs because the disk-to-disk backup is incremental, storing changes at the per-block level).
As for Microsoft Exchange, I have not the slightest clue how data is stored.
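To make the per-block incremental idea mentioned above concrete, here is a rough sketch of finding which blocks changed since the last backup. This is an assumption about the general technique, not eVault's actual implementation; the 4096-byte block size and the function names are invented for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash every fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous_hashes, data, block_size=BLOCK_SIZE):
    """Return {block_index: block_bytes} for blocks that are new or differ."""
    changed = {}
    for index, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(previous_hashes) or previous_hashes[index] != digest:
            changed[index] = block
    return changed


full = bytes(BLOCK_SIZE * 4)        # first backup: four zero-filled blocks
baseline = block_hashes(full)
modified = bytearray(full)
modified[BLOCK_SIZE + 10] = 0xFF    # change one byte inside block 1
delta = changed_blocks(baseline, bytes(modified))
```

Only block 1 ends up in `delta`, so the incremental run ships one block instead of all four. The sketch ignores files that shrink between runs, which a real backup tool would have to handle.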
Re:Wow, welfare for programmers... (Score:4, Interesting)
And what do data-archiving rules have to do with welfare for programmers? Maybe for disk manufacturing firms or data admins, but programmers?
Redundant Data (Score:2, Interesting)
a helpful reference page for large numbers (Score:5, Interesting)
Cow stacking is where you select cow as the animal and Earth to the Moon as the place, and you'll see a graphic of cows being stacked to the Moon along with the number of cows required to complete that stack.
Hamster Canyon is where you select a hamster and the Grand Canyon, and you'll see a picture of the Grand Canyon filled with hamsters along with the total number of hamsters required to fill the canyon.
Re:So, in other words... (Score:3, Interesting)
"E-mail growth accounts for much of that figure."
We're archiving spam?