Data Deduplication Comparative Review
snydeq writes "InfoWorld's Keith Schultz provides an in-depth comparative review of four data deduplication appliances to vet how well the technology stacks up against the rising glut of information in today's datacenters. 'Data deduplication is the process of analyzing blocks or segments of data on a storage medium and finding duplicate patterns. By removing the duplicate patterns and replacing them with much smaller placeholders, overall storage needs can be greatly reduced. This becomes very important when IT has to plan for backup and disaster recovery needs or when simply determining online storage requirements for the coming year,' Schultz writes. 'If admins can increase storage usage 20, 40, or 60 percent by removing duplicate data, that allows current storage investments to go that much further.' Under review are dedupe boxes from FalconStor, NetApp, and SpectraLogic."
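The summary's description of dedupe — split storage into blocks, detect duplicate patterns, and replace repeats with small placeholders — can be sketched in a few lines. This is a toy illustration, not any vendor's implementation: it assumes fixed-size 4 KB blocks and uses a SHA-256 digest as the placeholder, whereas real appliances often use variable-size chunking and more compact references.

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks; appliances often chunk on content boundaries instead

def deduplicate(data: bytes):
    """Split data into blocks, store each unique block once,
    and represent the stream as a list of block references."""
    store = {}   # digest -> block bytes (the unique-block store)
    refs = []    # per-block placeholders, in stream order
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of a pattern
        refs.append(digest)
    return store, refs

def rehydrate(store, refs) -> bytes:
    """Reassemble the original stream from the references."""
    return b"".join(store[h] for h in refs)

# A stream with heavy repetition dedupes down to a couple of unique blocks.
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, refs = deduplicate(data)
print(len(refs), len(store))  # 4 blocks referenced, only 2 stored
assert rehydrate(store, refs) == data
```

The savings Schultz describes come from the gap between `len(refs)` (logical blocks) and `len(store)` (physical blocks actually kept): backup streams, where the same files recur night after night, tend to widen that gap the most.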
Second post (Score:2, Funny)
Same as the first.
I already do this (Score:4, Funny)
After an analysis of a 1TB drive, I noticed that roughly 95% of the bits were 0's, with only 5% being 1's.
I was then able to compress this dramatically. I just record that there are about 7.6T 0's and 400G 1's. The space taken up drops to around 82 bits. Throw in a few checksum bits, and I am still under twelve bytes.
I am not sure what is so hard about this disaster recovery planning. Heck, I figure I am up for a promotion after I implement this.
Re:Don't forget to weigh in the cost (Score:3, Funny)
Satan, is that you?
Cheers,
Re:Wrong layer (Score:3, Funny)
Sounds like what we need is a giant table of all possible byte values up to 2^n length, then we can just provide the index to this master table instead of the data itself. I call this proposal the storage-storage tradeoff where, in exchange for requiring large amounts of storage, we require even more storage. I'll even throw in the extra time requirements for free.