Data Deduplication Comparative Review
snydeq writes "InfoWorld's Keith Schultz provides an in-depth comparative review of four data deduplication appliances to vet how well the technology stacks up against the rising glut of information in today's datacenters. 'Data deduplication is the process of analyzing blocks or segments of data on a storage medium and finding duplicate patterns. By removing the duplicate patterns and replacing them with much smaller placeholders, overall storage needs can be greatly reduced. This becomes very important when IT has to plan for backup and disaster recovery needs or when simply determining online storage requirements for the coming year,' Schultz writes. 'If admins can increase storage usage 20, 40, or 60 percent by removing duplicate data, that allows current storage investments to go that much further.' Under review are dedupe boxes from FalconStor, NetApp, and SpectraLogic."
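The process described in the summary can be sketched in a few lines. This is only a minimal illustration, not how any of the reviewed appliances work: it assumes fixed 4096-byte blocks and SHA-256 digests as the "placeholders", and the names dedupe, restore, and stored_bytes are made up for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size blocks; real products may use variable-size segments


def dedupe(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks, keep each unique block once, and record a recipe."""
    store = {}   # digest -> block bytes (each unique block stored once)
    recipe = []  # ordered digests: the small placeholders that replace the blocks
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        store.setdefault(digest, block)  # only the first copy of a block is kept
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data by following the recipe."""
    return b"".join(store[digest] for digest in recipe)


def stored_bytes(store, recipe) -> int:
    """Bytes needed after dedupe: unique blocks plus one digest per original block."""
    return sum(len(b) for b in store.values()) + sum(len(d) for d in recipe)


if __name__ == "__main__":
    data = b"A" * BLOCK_SIZE * 10 + b"B" * BLOCK_SIZE * 5
    store, recipe = dedupe(data)
    assert restore(store, recipe) == data
    print(len(data), "bytes in,", stored_bytes(store, recipe), "bytes stored")
```

The savings depend entirely on how repetitive the data is; with no repeated blocks, the recipe is pure overhead.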
Wrong layer (Score:5, Insightful)
Filesystems should be doing this.
Which filesystem should be doing this??? (Score:2, Insightful)
Filesystems should be doing this.
The one on your desktop machine, or the primary NAS storage that you access shared data from, or the backup server that ends up getting it all anyway? You see, this is a shared database problem. If your local filesystem does this, then it has to 'share' knowledge of all the unique blocklets with every other server/filesystem that wishes to share in this compressed file space. De-duplication is a means of compression that works across many filesystems - or at least it can be, if it is properly implemented.
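To make the "shared database" point concrete, here is a rough sketch of one index of unique blocklets shared by several machines. Everything in it is hypothetical (the SharedChunkStore class, the backup function, the 4096-byte chunk size); it only shows why a second machine holding mostly the same data has very little left to send.

```python
import hashlib


class SharedChunkStore:
    """Hypothetical central store: a single index of unique chunks for all clients."""

    def __init__(self):
        self.chunks = {}  # hex digest -> chunk bytes

    def missing(self, digests):
        """Return the digests the store has never seen."""
        return {d for d in digests if d not in self.chunks}

    def put(self, digest, chunk):
        self.chunks.setdefault(digest, chunk)


def backup(store, data, chunk_size=4096):
    """Back up one client's data; return (recipe, bytes actually sent)."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    digests = [hashlib.sha256(c).hexdigest() for c in chunks]
    need = store.missing(digests)  # ask the shared index first
    sent = 0
    for digest, chunk in zip(digests, chunks):
        # Skip chunks the store already had, or that this same backup just sent.
        if digest in need and digest not in store.chunks:
            store.put(digest, chunk)
            sent += len(chunk)
    return digests, sent


if __name__ == "__main__":
    common = b"".join(bytes([i]) * 4096 for i in range(4))  # data both machines hold
    store = SharedChunkStore()
    _, sent_desktop = backup(store, common + b"d" * 4096)
    _, sent_nas = backup(store, common + b"n" * 4096)
    print("desktop sent", sent_desktop, "bytes; NAS sent", sent_nas, "bytes")
```

A filesystem that dedupes only its own blocks never sees the second machine's copy, which is the limitation being described.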
Re:Don't forget to weigh in the cost (Score:4, Insightful)
More disk is still so much cheaper that dedupe really cannot be justified on that front. More disks also mean more IOPS, so reducing spinning platters can be a bad thing.
There are some reasons to go for it, but even with thousands of clients it may or may not be suitable for what you are doing.
Re:Wrong layer (Score:3, Insightful)
OpenSolaris is dead, and there are kernel bugs in the latest version, so good luck with that. I looked at doing it at one time, and due to fears about OpenSolaris I stayed away. I consider myself lucky.
Ya it is (Score:4, Insightful)
Something you start to appreciate when you are called on to do a really high availability, high reliability system is to have features like this. For one thing it reduces the time it takes to get a replacement. Unless a drive fails late at night, you get one the next day. You don't have to rely on someone to notice the alert, place the order, etc. It just happens. Also, like most high end support companies, their shipping time is fairly late so even late in the day it is next day service. What arrives is the drive you need, in its caddy, ready to go.
Then there's just the fact of having someone else help monitor things. It's easy to say "Oh ya I'll watch everything important and deal with it right away," but harder to do it. I've known more than a few people who are not nearly as good at monitoring their critical system as they ought to be. A backup is not a bad thing.
You have to remember that the kind of stuff you are talking about for things like NetApps is when no downtime is ok, when no data loss is ok. You can't say "Ya a disk died and before we got a new one in, another died, so sorry, stuff is gone."
Not saying that your situation needs it, but there are those that do. They offer other features along those lines like redundant units, so if one fails the other continues no problem.
Basically they are for when data (and performance) is very important and you are willing to spend money for that. You put aside the tech-tough guy attitude of "I can manage it all myself," and accept that the data is that important.
Re:Ya it is (Score:3, Insightful)
I mean have the nagios server order the drive without any human intervention.
Also, if it was really critical, you would keep several disks ready to go on site. You know, for when you can't wait for next day. Also, like NetApp, you too can have many hot spares in the volume.
If you have problems with people not noticing or reacting to alerts you need to fire them.
Re:Wrong layer (Score:3, Insightful)
Sweet, thanks for the pointer. I was also concerned about the death of OpenSolaris but it sounds like Nexenta may be just what I want.
Nexenta is built off OpenSolaris and is, therefore, also dead - though it may take longer for the thrashing to stop.
Re:Wrong layer (Score:4, Insightful)
Filesystems should be doing this.
No, block devices should be doing this. Then you get the benefits regardless of which filesystem you want to layer on top.
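A toy model of what dedupe at the block-device layer might look like, under some loud assumptions: an in-memory device, fixed 4096-byte blocks, SHA-256 digests, and a hypothetical DedupBlockDevice class. Whatever filesystem sits on top just reads and writes logical block addresses and never sees the sharing underneath.

```python
import hashlib


class DedupBlockDevice:
    """Hypothetical in-memory block device that stores identical blocks once."""

    BLOCK_SIZE = 4096  # assumption: fixed block size

    def __init__(self):
        self.lba_to_digest = {}  # logical block address -> digest of its content
        self.blocks = {}         # digest -> block bytes (physical storage)
        self.refcount = {}       # digest -> number of LBAs pointing at it

    def write(self, lba: int, data: bytes) -> None:
        if len(data) != self.BLOCK_SIZE:
            raise ValueError("writes must be exactly one block")
        self._release(lba)  # drop whatever this LBA pointed at before
        digest = hashlib.sha256(data).digest()
        if digest not in self.blocks:
            self.blocks[digest] = data
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        self.lba_to_digest[lba] = digest

    def read(self, lba: int) -> bytes:
        digest = self.lba_to_digest.get(lba)
        if digest is None:
            return bytes(self.BLOCK_SIZE)  # never-written blocks read as zeros
        return self.blocks[digest]

    def _release(self, lba: int) -> None:
        old = self.lba_to_digest.pop(lba, None)
        if old is None:
            return
        self.refcount[old] -= 1
        if self.refcount[old] == 0:  # last reference gone: free the physical block
            del self.refcount[old]
            del self.blocks[old]


if __name__ == "__main__":
    dev = DedupBlockDevice()
    block = b"x" * DedupBlockDevice.BLOCK_SIZE
    for lba in range(100):
        dev.write(lba, block)
    print("logical blocks:", len(dev.lba_to_digest), "physical blocks:", len(dev.blocks))
```

The trade-off is that the device knows nothing about files, so it can only match whole aligned blocks.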
Re:Don't forget to weigh in the cost (Score:4, Insightful)
Re:Ya it is (Score:2, Insightful)
Re:Ya it is (Score:3, Insightful)
Developing a monitoring system for a complicated piece of storage that reacts properly to every possible failure mode is a massive undertaking. It will take a lot of time just to figure out everything that you need to monitor, and the possible values for them during normal operation; let alone actually test that your system correctly detects and responds to every possibility.
If your business is providing SAN management/support services, then I can see this as being worthwhile. It's a massive investment in technology and skills amongst your staff, but if that's what you make your money doing, it may well give you a competitive edge.
But if your business is anything else, why are you going to invest so much into something that's really just a background piece of infrastructure? What's your plan for retaining the staff that know how the monitoring system works, and know your storage system in sufficient detail to be able to understand all the things it's checking, etc?
If you really have the expertise on-hand to implement such a thing in a way that you're comfortable relying on, why on earth wouldn't you use them for something more productive that will actually make your business money? Again, if your business is monitoring storage infrastructure, it makes sense. If your business is anything else, why are you spending the time of highly skilled people to implement something you can easily buy off-the-shelf (i.e. a standard support contract)?
Re:Don't forget to weigh in the cost (Score:3, Insightful)