Data Deduplication Comparative Review 195
snydeq writes "InfoWorld's Keith Schultz provides an in-depth comparative review of four data deduplication appliances to vet how well the technology stacks up against the rising glut of information in today's datacenters. 'Data deduplication is the process of analyzing blocks or segments of data on a storage medium and finding duplicate patterns. By removing the duplicate patterns and replacing them with much smaller placeholders, overall storage needs can be greatly reduced. This becomes very important when IT has to plan for backup and disaster recovery needs or when simply determining online storage requirements for the coming year,' Schultz writes. 'If admins can increase storage usage 20, 40, or 60 percent by removing duplicate data, that allows current storage investments to go that much further.' Under review are dedupe boxes from FalconStor, NetApp, and SpectraLogic."
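The block-level scheme Schultz describes can be sketched in a few lines. This is a minimal illustration, not how any of the reviewed appliances actually work: fixed 4 KB blocks and SHA-256 fingerprints are assumptions (real appliances typically use variable-size segments and their own hashing), and `dedupe`/`rehydrate` are made-up names.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed-size blocks; appliances often use variable-size segments

def dedupe(data: bytes):
    """Split data into blocks, store each unique block once, and
    represent the original data as a list of hash placeholders."""
    store = {}         # hash -> block contents (the unique-block pool)
    placeholders = []  # per-block hash references replacing the raw data
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # duplicates collapse to one stored copy
        placeholders.append(digest)
    return store, placeholders

def rehydrate(store, placeholders):
    """Reassemble the original data from the placeholders."""
    return b"".join(store[h] for h in placeholders)
```

If three of four blocks are identical, the pool holds only two blocks while the placeholder list still describes all four, which is where the 20 to 60 percent savings come from.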
Don't forget to weigh in the cost (Score:3, Informative)
The shiny new NetApp appliance that my PHB decided to blow the last of our budget on saves around 30% by using de-dupe; however, we could have had three times the conventional storage for the same cost.
NetApp is neat and all but horribly overpriced.
Re:Don't forget to weigh in the cost (Score:3, Informative)
Was it near the end of the fiscal year? Good department managers know that if they use up their full budget, then it's harder to argue for a budget cut next year. Managers will sometimes blow any excess funds at the end of the year on things like this for that very reason.
Re:Wrong layer (Score:3, Informative)
It's not fully automatic, I assume? Since that would cause a major slowdown.
For manual dedupes, btrfs can do that as well, and part of the vserver patchset (not related to its main functionality) includes a hack that works for most Unix filesystems.
Use ZFS. It offers dedupe, compression, etc. (Score:4, Informative)
ZFS offers dedupe, and is even available in prepackaged NAS distributions such as Nexenta and OpenNAS. You too can have these great features, for much less than NetApp and friends.
Re:Wrong layer (Score:5, Informative)
It is fully automatic, and it's not that much of a slowdown. The reduced I/O might actually provide a performance boost.
Re:Um.. (Score:3, Informative)
AFAIK this is pretty much how every compression algorithm works. No need to give it a fancy name.
The reason it has a different name is to distinguish it from a compressed file system. The blocks of data are not compressed in these systems. Imagine a file system that stores lots of VMware images. In such a system, many files store the same information because the underlying data is operating system files and applications. Even if you compress each image, you will still have lots of blocks with duplicate values.
Deduplication says that the file system recognizes and eliminates duplicate blocks across the entire file system. If a given block has redundant data within it, that redundancy is not removed because the blocks themselves are not actually compressed. This is the difference between a compressed file system and a deduplicated file system. In fact, there is no reason that you could not combine both of these methods into a single system.
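The two techniques compose exactly as the parent describes, which a small sketch can show. This is illustrative only: the two "VM images" are fabricated byte strings, the 4 KB block size is an assumption, and `zlib` stands in for whatever compression a real system would use.

```python
import hashlib
import zlib

BLOCK = 4096  # assumed fixed block size

def split(data: bytes):
    """Cut data into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

# Two hypothetical VM images that share an identical "OS" block
# but each have one unique data block.
os_block = b"\x00kernel\x00" * 512  # exactly one 4096-byte block
image_a = os_block + b"A" * BLOCK
image_b = os_block + b"B" * BLOCK

pool = {}
for blk in split(image_a) + split(image_b):
    # Dedup removes redundancy *across* blocks (the shared OS block is
    # stored once); compression removes redundancy *within* each block.
    pool.setdefault(hashlib.sha256(blk).hexdigest(), zlib.compress(blk))

raw = len(image_a) + len(image_b)            # 4 blocks of raw data
stored = sum(len(v) for v in pool.values())  # 3 unique blocks, compressed
```

Here dedup collapses the four raw blocks to three unique ones, and compression then shrinks each stored block further; neither step makes the other redundant.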
Re:Use ZFS. It offers dedupe, compression, etc. (Score:3, Informative)
Except NexentaStor (3.0.3) has a kernel bug from its OpenSolaris upstream (which has gone away, by the way) that hung our Nexenta test box. Not a very good first impression.
Re:Wrong layer (Score:3, Informative)
The latest stable version of zfs-fuse, 0.6.9, includes pool version 23 which has dedup support. Haven't tried it out yet, though.
http://zfs-fuse.net/releases/0.6.9 [zfs-fuse.net]