Data Deduplication Comparative Review 195
snydeq writes "InfoWorld's Keith Schultz provides an in-depth comparative review of four data deduplication appliances to vet how well the technology stacks up against the rising glut of information in today's datacenters. 'Data deduplication is the process of analyzing blocks or segments of data on a storage medium and finding duplicate patterns. By removing the duplicate patterns and replacing them with much smaller placeholders, overall storage needs can be greatly reduced. This becomes very important when IT has to plan for backup and disaster recovery needs or when simply determining online storage requirements for the coming year,' Schultz writes. 'If admins can increase storage usage 20, 40, or 60 percent by removing duplicate data, that allows current storage investments to go that much further.' Under review are dedupe boxes from FalconStor, NetApp, and SpectraLogic."
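The block-level scheme Schultz describes can be sketched in a few lines. This is a minimal illustration, not how any of the reviewed appliances actually work: fixed 4 KB blocks and SHA-256 fingerprints are assumptions (real appliances typically use variable-size segments and their own hashing), and `dedupe`/`rehydrate` are made-up names.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed-size blocks; appliances often use variable-size segments

def dedupe(data: bytes):
    """Split data into blocks, store each unique block once, and
    represent the original data as a list of hash placeholders."""
    store = {}         # hash -> block contents (the unique-block pool)
    placeholders = []  # per-block hash references replacing the raw data
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # duplicates collapse to one stored copy
        placeholders.append(digest)
    return store, placeholders

def rehydrate(store, placeholders):
    """Reassemble the original data from the placeholders."""
    return b"".join(store[h] for h in placeholders)
```

If three of four blocks are identical, the pool holds only two blocks while the placeholder list still describes all four, which is where the 20 to 60 percent savings come from.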
Don't forget to weigh in the cost (Score:3, Informative)
The shiny new NetApp appliance that my PHB decided to blow the last of our budget on saves around 30% by using de-dupe; however, we could have had three times the conventional storage for the same cost.
NetApp is neat and all but horribly overpriced.
Re:Don't forget to weigh in the cost (Score:3, Informative)
Was it near the end of the fiscal year? Good department managers know that if they use up their full budget, then it's harder to argue for a budget cut next year. Managers will sometimes blow any excess funds at the end of the year on things like this for that very reason.
Re:Wrong layer (Score:3, Informative)
It's not fully automatic, I assume? Since that would cause a major slowdown.
For manual dedupes, btrfs can do that as well, and part of the vserver patchset (not related to its main functionality) includes a hack that works for most Unix filesystems.
Use ZFS. It offers dedupe, compression, etc. (Score:4, Informative)
ZFS offers dedupe, and is even available in prepackaged NAS distributions such as Nexenta and OpenNAS. You too can have these great features, for much less than NetApp and friends.
Re:Wrong layer (Score:5, Informative)
It is fully automatic, and it's not that much of a slowdown. The reduced I/O might actually provide a performance boost.
Re:Um.. (Score:3, Informative)
AFAIK this is pretty much how every compression algorithm works. No need to give it a fancy name.
The reason it has a different name is to distinguish it from a compressed file system. The blocks of data are not compressed in these systems. Imagine a file system that stores lots of VMware images. In such a system, many files store the same information because the underlying data is operating system files and applications. Even if you compress each image, you will still have lots of blocks with duplicate values.
Deduplication says that the file system recognizes and eliminates duplicate blocks across the entire file system. If a given block has redundant data within it, that redundancy is not removed because the blocks themselves are not actually compressed. This is the difference between a compressed file system and a deduplicated file system. In fact, there is no reason that you could not combine both of these methods into a single system.
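The two techniques compose exactly as the parent describes, which a small sketch can show. This is illustrative only: the two "VM images" are fabricated byte strings, the 4 KB block size is an assumption, and `zlib` stands in for whatever compression a real system would use.

```python
import hashlib
import zlib

BLOCK = 4096  # assumed fixed block size

def split(data: bytes):
    """Cut data into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

# Two hypothetical VM images that share an identical "OS" block
# but each have one unique data block.
os_block = b"\x00kernel\x00" * 512  # exactly one 4096-byte block
image_a = os_block + b"A" * BLOCK
image_b = os_block + b"B" * BLOCK

pool = {}
for blk in split(image_a) + split(image_b):
    # Dedup removes redundancy *across* blocks (the shared OS block is
    # stored once); compression removes redundancy *within* each block.
    pool.setdefault(hashlib.sha256(blk).hexdigest(), zlib.compress(blk))

raw = len(image_a) + len(image_b)            # 4 blocks of raw data
stored = sum(len(v) for v in pool.values())  # 3 unique blocks, compressed
```

Here dedup collapses the four raw blocks to three unique ones, and compression then shrinks each stored block further; neither step makes the other redundant.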
Re:Use ZFS. It offers dedupe, compression, etc. (Score:3, Informative)
Except NexentaStor (3.0.3) has a kernel bug from its OpenSolaris upstream (which has gone away, by the way) that hung our Nexenta test box. Not a very good first impression.
Re:Wrong layer (Score:3, Informative)
The latest stable version of zfs-fuse, 0.6.9, includes pool version 23 which has dedup support. Haven't tried it out yet, though.
http://zfs-fuse.net/releases/0.6.9 [zfs-fuse.net]