RAID's Days May Be Numbered
storagedude sends in an article claiming that RAID is nearing the end of the line because of soaring rebuild times and the growing risk of data loss. "The concept of parity-based RAID (levels 3, 5 and 6) is now pretty old in technological terms, and the technology's limitations will become pretty clear in the not-too-distant future — and are probably obvious to some users already. In my opinion, RAID-6 is a reliability Band Aid for RAID-5, and going from one parity drive to two is simply delaying the inevitable. The bottom line is this: Disk density has increased far more than performance and hard error rates haven't changed much, creating much greater RAID rebuild times and a much higher risk of data loss. In short, it's a scenario that will eventually require a solution, if not a whole new way of storing and protecting data."
Enlighten me (Score:4, Insightful)
(Certain) RAID (levels) address the issue of potential data loss due to hardware malfunction. How does moving to an Object-Based Storage Device address this issue better? Actually, I don't see how RAID and OSD are mutually exclusive.
Harddisks, not RAID (Score:5, Insightful)
Now that's a stupid article.
It basically says that you can't read a hard disk more than X times before you get an error on some sector, so RAID is dead. That's a logical non sequitur. RAID is a generic technique that also applies to flash memory cards, USB sticks, basically anything you can store data on. The base technique says "given this reliability, you can raise the reliability by adding some redundancy". There's no inherent link to hard disks; they just happen to be what it's used for right now.
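The parent's point, "given this reliability, add redundancy to raise it", can be sketched with back-of-envelope math. The 3% failure figure below is an assumed illustration (not from the article), and failures are treated as independent, which is the classic optimistic assumption:

```python
def p_data_loss_raid5(p, n):
    """Chance that 2 or more of n devices fail within the window
    (RAID-5 loses data then), assuming independent failures."""
    p_at_most_one = (1 - p) ** n + n * p * (1 - p) ** (n - 1)
    return 1 - p_at_most_one

p = 0.03  # assumed chance any one device fails within the window
print(f"single device, no redundancy: {p:.4f}")
print(f"5-device RAID-5:              {p_data_loss_raid5(p, 5):.4f}")
```

The same arithmetic applies whether the devices are spinning disks, flash cards, or USB sticks, which is exactly why RAID isn't tied to hard drives.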
RAID is here to stay (Score:5, Insightful)
Disclaimer: I work for a storage vendor.
> FTA: The real fix must be based on new technology such as OSD, where the disk knows what is stored on it and only has to read and write the objects being managed, not the whole device
OSD doesn't change anything. The disk has failed. How has OSD helped?
> FTA: or something like declustered RAID
Just skimming that document it seems to claim: only reconstruct data, not white space, and use a parity scheme that limits damage. Enterprise arrays that have native filesystem virtualisation (WAFL for example) already do this. RAID 6 arrays do this.
Let's recap. Physical devices, including SSDs, will fail. You need to be able to recover from failure. The failure could be as bad as the entire physical device failing, or as small as a single sector being unreadable. In the former case a RAID reconstruct will recover the data, but you may hit RAID recovery errors due to the raw amount of data that needs to be read. Enterprise arrays mitigate the risk of recovery errors by using RAID 6. They could even recover the data from a DR mirrored system as part of the recovery scheme.
And when RAID 6 carries a high enough risk that it's worth expanding the scheme, everyone will start switching from double-parity schemes to triple-parity schemes, since they're much less expensive in terms of spindle count than RAID 6+1.
One assumption is that, at some point in the future, reconstruction will be a continually occurring background task, just like any other background task that enterprise arrays handle. As long as there is enough resiliency and performance isn't impacted, it doesn't matter if a disk is being rebuilt.
I thought RAID was about spindle count (Score:5, Insightful)
I admit I'm not an expert, but I was under the impression that RAID was mainly about ensuring you a large number of spindles and some redundancy so you can serve data quickly even if a couple of drives fail while the servers are under pressure. Surely you would not rely on a RAID to avoid data loss since you should be keeping external backups anyway?
Re:Hardware RAID is dead (Score:5, Insightful)
First of all, "hardware RAID" is still software, just executed by dedicated circuits. The distinction is kind of moot. For low-cost, low-performance systems, software can run on your main box to perform this task, but for high-end applications you'll want dedicated hardware to take care of it, so your machine can do what it needs to do with more zeal.
So my guess is that you're not working for a storage vendor. I haven't seen many people switch to SW RAID recently. If anything, the Unix world is finally crawling out of its "lvm striping" hole. Most servers anywhere are running on stuff like HP's Proliants, and I don't see customers ship back the SmartArray controllers.
Wrong assumptions (Score:5, Insightful)
The article assumes that when within a RAID5 array a drive encounters a single sector failure (the most common failure scenario), an entire disk has to go offline, be replaced and rebuilt.
That is utter nonsense, of course. All that's needed is to rebuild a single affected stripe of the array to a spare disk. (You do have spares in your RAID setups, right?)
As soon as the single stripe is rebuilt, the whole array is back in a fully redundant state - although the redundancy is now spread across the drive with the bad sector and the spare.
Even better, modern drives have internal sector remapping tables and when a bad sector occurs, all the array has to do is to read the other disks, calculate the sector, and WRITE it back to the FAILED drive.
The drive will remap the sector, replace it with a good one, and tada, we have a well working array again. In fact, this is exactly what Linux's MD RAID5 driver does, so it's not just a theory.
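The repair the parent describes boils down to simple XOR arithmetic. A toy sketch (not the actual md driver code, just the parity math it relies on):

```python
def xor_blocks(blocks):
    """XOR equal-sized byte blocks together (RAID-5 parity arithmetic)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Toy 4-disk stripe: three data sectors plus their parity block.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d0, d1, d2])

# Say the disk holding d1 reports an unreadable sector: rebuild it from
# the surviving disks, then write it back so the drive remaps the block.
rebuilt = xor_blocks([d0, d2, parity])
assert rebuilt == d1
```

No full-array rebuild needed for this case - one stripe's worth of reads, one XOR, one write.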
Catastrophic whole-drive failures (head crash, etc) do happen, too. And there the article would have a point - you need to rebuild the whole array. But then - these are by a couple orders of magnitude less frequent than simple data errors. So no reason to worry again.
*sigh*
Re:Wrong assumptions (Score:2, Insightful)
Even if only a sector in a disk has failed, I'd mark the entire disk as failed and replace it as soon as I could. Maybe I'm paranoid, but I've seen many times that when something starts to fail, it continues failing at increasing speed.
Re:Bogus outdated thinking (Score:5, Insightful)
I admit I haven't RTFA, but I don't quite get your statement "And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive." I can't see how an SSD is a replacement for a raid-5 array. Everyone I know who uses raid-5 uses it for large amounts of storage with a basic level of protection against data loss. I could justify replacing a raid-0 setup with an SSD.
That said, I definitely couldn't afford an SSD that would be able to replace the raid-5 in my PC (4x500GB, usable space of 1.34TB). The largest SSDs listed on ebuyer.com are 250GB @ £360 each; I would need 8 to match my raid-5 setup, which comes to £2880 - probably enough to build 2 reasonable machines, both with a 1.34TB raid-5 using normal HDDs.
Re:Bogus outdated thinking (Score:5, Insightful)
And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.
Huh ? That's like saying show me 3 people who have a nice pair of running shoes and I'll show you 3 guys who can't afford a car.
Re:Bogus outdated thinking (Score:2, Insightful)
I admit I haven't RTFA, but I don't quite get your statement of "And name 3 people you know who run raid-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.", I can't see how an SSD is a replacement for a raid-5 array. Everyone I know who uses a raid-5 uses it for large amounts of storage with a basic level of protection against data loss........
I hope you're not mixing up RAID with a backup.
RAID, when used for protecting your computer, will not protect your data; it just makes your system able to tolerate hard drive failure.
Re:Ask what does Google do (Score:3, Insightful)
A search engine doesn't mind losing data, most of the storage is essentially just a cache or summary of the internet and can be regenerated. That said, Google already have so many mirrors for performance reasons that actual data loss is practically impossible.
Re:ZFS (Score:1, Insightful)
But does it run on Linux?
I wish someone would just make a friggin kernel patch to add real ZFS support to Linux. You can't distribute pre-built Linux kernels with ZFS support due to licensing issues, BUT you could distribute a kernel patch that we can then apply to our kernels and compile ourselves and everything would be OK legally as long as you don't redistribute the patched binaries.
Re:simple idea (Score:5, Insightful)
Enterprise arrays are also very VERY different from what most people know as RAID. Smart controllers, smart drive cages, drives that are a magnitude better than the consumer grade garbage.
The summary talks about how speed has not kept up with capacity. Yes, that is correct for the low-grade consumer junk; enterprise server-class RAID drives are a different story. The 15,000 RPM drives I have in my RAID 50 array here on the database server are insanely fast. Plus, server-class drives don't come in silly unstable capacities like 1TB or 1.5TB; they are an "OMG small" 300GB size, but stable as a rock.
So I guess the question is, Is the summary talking about RAID on junk drives or RAID on real drives?
Re:RAID is here to stay (Score:3, Insightful)
Even this doesn't handle the other side of the scenario...
Buy your box of drives and put them in a RAID-6. Chances are you just bought all of the drives at the same time, from the same vendor, and they're probably all the same model of the same brand. Chances are also very good that they're from the same manufacturing lot. You've got N "identical" drives. Install them all into your drive enclosure, power the whole thing up, build your RAID-6, put it into service.
Now all of your "identical" drives are running off of the same power supply, getting the same voltage. There's likely to be some temperature gradient inside the box, but overall they're all at similar temperatures. They have the same number of POH, the same number of read requests, same number of write requests. In essence, they remain very nearly "identical" through their service life.
Next, let one drive fail. What are your chances of having a second drive failure, especially when you power the RAID down to replace the first failing drive?
That matches the anecdotal evidence I've heard from those who manage this type of thing where I work. RAIDs tend not to have single-drive failures, or at least tend to have "time clustered" drive failures. Plan for it.
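A toy Monte Carlo sketch of that clustering effect. The lifetime distribution, spreads, and rebuild window below are all invented for illustration; the point is only the direction of the comparison:

```python
import random

def second_failure_rate(spread_hours, trials=20000, n=8, window=24):
    """Fraction of trials in which a second drive dies within `window`
    hours of the first failure, for n drives whose lifetimes are drawn
    from a normal distribution centred on 30,000 hours."""
    hits = 0
    for _ in range(trials):
        lifetimes = sorted(random.gauss(30000, spread_hours) for _ in range(n))
        if lifetimes[1] - lifetimes[0] <= window:
            hits += 1
    return hits / trials

random.seed(1)
print("tightly matched lot (spread 100h) :", second_failure_rate(100))
print("well-mixed drives (spread 5000h)  :", second_failure_rate(5000))
```

Drives from the same lot living near-identical lives behave like the tight-spread case, which is why mixing lots (or staggering deployment) is the usual mitigation.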
Re:Harddisks, not RAID (Score:3, Insightful)
RAID is here to stay for a while, no doubt, but it's a response to a series of problems that has problems of its own. You can take 5+1 drives and make an array where one bad chassis slot can indeed take the whole thing out, or you can make a bunch of mirrors at the expense of capacity, or you can stripe one scary, large, fragile volume. In production it's about performance & availability. Realize that the whole data integrity thing is relative and merely an illusion. It's kinda like on Futurama when they had the tanker with 1k hulls. The only solution to the first case is to double the hardware, which is a major investment and recurring cost (rack space/electricity, stamps). Murphy's law tells us that indeed "shit happens", so there are no guarantees.
Although I didn't read the article I suspect it's promoting the cloud paradigm, which is the current ultimate expression of redundancy.
Re:Bogus outdated thinking (Score:5, Insightful)
RAID, when used for protecting your computer, will not protect your data; it just makes your system able to tolerate hard drive failure.
... Which will protect my data when a drive fails.
RAID-5 means that I can have 3x500GB drives with 1TB of usable space, and not have the same worry (total loss of data) that I would if a single 1TB drive failed.
We know it doesn't replace backup. We know it doesn't protect against theft, fire, malicious data destruction etc etc. You do realise who you're talking to, don't you? This is an IT article on Slashdot. Telling people on this thread that RAID isn't a replacement for regular backups is like telling a mechanic that a stick of celery is not a suitable replacement for a piston.
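For what it's worth, the capacity arithmetic in this subthread is the standard RAID-5 formula (usable space = one disk's worth less than the total). A trivial check:

```python
def raid5_usable_gb(n_disks, gb_each):
    """Usable capacity of an n-disk RAID-5: one disk's worth goes to parity."""
    return (n_disks - 1) * gb_each

assert raid5_usable_gb(3, 500) == 1000   # three 500GB drives -> 1TB usable
assert raid5_usable_gb(4, 500) == 1500   # four 500GB drives -> 1.5TB usable
```

(The 1.34TB figure quoted upthread for a 4x500GB array is roughly this same 1.5TB once counted in binary units and filesystem overhead.)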
Solved, but at a price (Score:2, Insightful)
You are absolutely correct in that the mainframe world has dealt with all of the modern recovery issues. But think of the actual USE of storage these days. What used to be a colossal database is now just a bunch of home videos from my camcorder. Not only has the cost of storage dropped to nearly nothing, the threshold for using it has dropped even lower. I'm perfectly willing to commit a few megabytes every time I push the button on my digital camera. I remember college, where my mainframe disk quota was a mere 256K.
Today's challenge is to get mainframe-class recovery without bringing back mainframe-style prices. Some of this is controlled by the way we USE data storage. And then there is all the "savings" we get from server consolidation. Everything we do to consolidate just makes storage management a bigger headache. The trick is to evolve not just the low-level, "invisible" management of storage, but the high-level applications as well. If I don't truly NEED to have 10TB on a single mount point, perhaps I should have multiple volumes, distribute my storage, and find a way to be happy with twenty 500GB volumes instead. The easiest way to avoid the recovery time of a 10TB RAID set is to not build one.
I was in mainframe IT long before RAID was commonplace. We commonly faced limits of 450MB on indexed files, because that's as much as you could get from a hard drive back in the early 1980's. Modern Oracle DBAs must be scratching their heads at all of the tablespace management options that seem so redundant when you have RAID storage. This was the pre-RAID method of storage management, in which database container files could be of any size, mounted anywhere, and utilized in all sorts of creative ways to circumvent the hardware limitations of storage in those days. Today, it represents little more than an opportunity to inadvertently bring out the worst of both worlds by setting up these two storage methodologies in conflict with each other.
Re:RAID concept is fine, it's that HDs are too big (Score:3, Insightful)
"The fundamental problem here isn't the RAID concept, is that the throughput and access times of spinning rust haven't changed much in 30 years."
Uh, there's another bigger problem. The drive error rate (when reading data) hasn't changed that much either while data on a drive has dramatically increased.
When doing a rebuild when you've lost all redundancy a single read error means the rebuild will fail. Increase the size of a drive (while keeping error rates constant) and you increase the likelihood of a rebuild failure.
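The arithmetic behind that, as a back-of-envelope sketch. The 1-in-1e14-bits figure is a commonly quoted consumer-drive unrecoverable-read-error spec (assumed here, not from the article), and errors are treated as independent:

```python
def p_rebuild_ok(tb_to_read, ure_per_bits=1e14):
    """Probability of reading tb_to_read terabytes with zero unrecoverable
    read errors, assuming independent errors at the quoted rate."""
    bits = tb_to_read * 1e12 * 8
    return (1 - 1 / ure_per_bits) ** bits

# Data that must be read to rebuild, for arrays of growing drives:
for tb in (0.5, 2, 6, 12):
    print(f"{tb:5.1f} TB read -> rebuild succeeds with p = {p_rebuild_ok(tb):.3f}")
```

Holding the error rate constant while the terabytes grow pushes the success probability down fast, which is the whole "bigger drives, same error rate" problem in one formula.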
Re:Bogus outdated thinking (Score:5, Insightful)
That's why I prefer software RAID.
SATA vs FC/SAS: grapes and oranges (Score:4, Insightful)
The chart he's using goes from SCSI, to Fibre Channel, to SAS... to SATA. When you go from professional/server interfaces to hobby/desktop ones, of course the rebuild time skyrockets. If you'd written this article a few years ago and slid ATA in as the last data point instead of Fibre Channel, you'd have seen the knee show up then instead of now. How about looking at 2010 and doing the calculations with a 6Gb SAS interconnect and 3TB drives, instead of 1.5Gb SATA and 1TB drives?
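A rough sketch of the calculation involved. Rebuild time is bounded by the drive's sustained media rate rather than the link speed, and the MB/s figures below are my own assumptions, not numbers from the article:

```python
def rebuild_hours(capacity_tb, mb_per_s):
    """Hours to sequentially write a whole drive at a sustained rate."""
    return capacity_tb * 1e6 / mb_per_s / 3600

print(f"1 TB at  75 MB/s: {rebuild_hours(1, 75):.1f} h")   # assumed older-drive rate
print(f"3 TB at 150 MB/s: {rebuild_hours(3, 150):.1f} h")  # assumed newer-drive rate
```

If sustained rates only double while capacities triple, rebuild times still grow, which is why which data points you put on the chart matters so much.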
Re:simple idea (Score:4, Insightful)
You'd need a whole new way of keeping the head off the platter. You'd have a problem with lubricants vaporizing. Heat would be a problem as well.
Re:Worked-around a Long Time Ago (Score:5, Insightful)
Consider Google Docs.
If you have so much data that you're likely to encounter an error when rebuilding your RAID array, I don't think Google Docs is going to cut it.
Re:Dear Seagate, Western Digital, et. al: (Score:3, Insightful)
Without spindle redundancy...
or logic element redundancy...
or power supply redundancy...
or cable interconnect redundancy...
add to that the cost of adding dedicated RAID hardware to every single drive (that's an expensive PLD), and it's no wonder it's not on the market. High cost - no return.
Re:RAID is here to stay (Score:3, Insightful)
Basically you are suggesting someone would make and then sell a disk which could only be read, entirely, 10 times in its entire lifetime?
Well that's easily solved. We won't buy those disks.
Re:fill the drive with helium (Score:3, Insightful)
there are these things called filters.
They work pretty well.