
RAID's Days May Be Numbered (444 comments)

Posted by kdawson on Friday September 18, 2009 @05:15AM from the time-to-try-flit dept.

storagedude sends in an article claiming that RAID is nearing the end of the line because of soaring rebuild times and the growing risk of data loss. "The concept of parity-based RAID (levels 3, 5 and 6) is now pretty old in technological terms, and the technology's limitations will become pretty clear in the not-too-distant future — and are probably obvious to some users already. In my opinion, RAID-6 is a reliability Band Aid for RAID-5, and going from one parity drive to two is simply delaying the inevitable. The bottom line is this: Disk density has increased far more than performance and hard error rates haven't changed much, creating much greater RAID rebuild times and a much higher risk of data loss. In short, it's a scenario that will eventually require a solution, if not a whole new way of storing and protecting data."


simple idea (Score:3, Interesting)

by shentino ( 1139071 ) writes: <shentino@gmail.com> on Friday September 18, 2009 @05:16AM (#29463913)

Don't consider an entire drive dead just because you get a piddly one-sector error.
Just mark it read-only and keep chugging.

Bogus outdated thinking (Score:5, Interesting)

by twisteddk ( 201366 ) writes: on Friday September 18, 2009 @05:28AM (#29463967)

The author says it himself in the article:
"And running software RAID-5 or RAID-6 equivalent does not address the underlying issues with the drive. Yes, you could mirror to get out of the disk reliability penalty box, but that does not address the cost issue."
But he hasn't addressed the fact that today you get 100 times as much disk space for the same cost as you did 10 years ago, when cost was a factor. In real-world projects, storage cost isn't a factor, simply because it's small compared to the other costs of a project that requires storage. So if you want the reliability, go get a mirror. Drive space is dirt cheap.
As for the rebuild times, fine, go buy FASTER drives. I don't see the problem. HP and many other vendors have long been trying to sell combined RAID solutions (like the EVA) where you mix high-capacity drives with high-performance drives (like SATA and SSD).
The only real argument for the validity of this article is personal use of drives/storage. And name 3 people you know who run RAID-5 on their personal PCs, and I'll show you 3 guys who can't afford an SSD drive.
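To put rough numbers on the mirror-versus-parity cost point, here is a small Python sketch of the usable share of raw capacity for textbook two-way mirroring, RAID-5 and RAID-6. It ignores hot spares, filesystem overhead and vendor-specific layouts, and the level strings are just labels chosen for this example.

```python
from fractions import Fraction

def usable_fraction(level: str, drives: int) -> Fraction:
    """Share of raw capacity left for data in a few textbook layouts.

    Ignores hot spares, filesystem overhead, and vendor-specific layouts.
    """
    if level == "raid1":  # two-way mirror: every block is stored twice
        if drives < 2 or drives % 2:
            raise ValueError("two-way mirroring needs an even number of drives")
        return Fraction(1, 2)
    if level == "raid5":  # one drive's worth of parity per stripe
        if drives < 3:
            raise ValueError("RAID-5 needs at least 3 drives")
        return Fraction(drives - 1, drives)
    if level == "raid6":  # two drives' worth of parity per stripe
        if drives < 4:
            raise ValueError("RAID-6 needs at least 4 drives")
        return Fraction(drives - 2, drives)
    raise ValueError(f"unknown level: {level}")

for level in ("raid1", "raid5", "raid6"):
    print(level, usable_fraction(level, 8))
```

For 8 drives that comes out to 1/2, 7/8 and 3/4 of raw capacity respectively; whether that gap matters in practice is exactly what the comment disputes.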

There are always more solutions... (Score:1, Interesting)

by Anonymous Coward writes: on Friday September 18, 2009 @05:37AM (#29464009)

Probably the next meta solution after RAID 6 will be something like ZFS, where the filesystem works not just at the filesystem layer but also at the volume-manager (LVM) layer, so it can log checksums of files and immediately tell if a file got corrupted (and perhaps fix it with some ECC records). One can imagine a filesystem that not only implements a RAID layer but also stores recovery data as filesystem metadata.
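A minimal sketch of the "store checksums, detect corruption on read" idea, using Python's zlib.crc32 on fixed-size blocks. The 4096-byte block size and the function names are hypothetical; ZFS itself stores its own checksums (fletcher4 by default, or SHA-256) in the parent block pointer, and this sketch only detects damage without repairing it.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size for this sketch

def checksum_blocks(data: bytes) -> list[int]:
    """Return a CRC32 for each fixed-size block of data."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]

def find_corrupt_blocks(data: bytes, stored: list[int]) -> list[int]:
    """Recompute CRCs and return the indices of blocks that no longer match."""
    current = checksum_blocks(data)
    return [i for i, (now, then) in enumerate(zip(current, stored)) if now != then]

original = bytes(range(256)) * 64        # 16 KiB, i.e. 4 blocks
stored = checksum_blocks(original)       # written alongside the data
damaged = bytearray(original)
damaged[5000] ^= 0xFF                    # flip one byte inside block 1
print(find_corrupt_blocks(bytes(damaged), stored))  # [1]
```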
Of course, there is always the option of redundant arrays of RAID clusters: say three groups, two data and one parity, or mirrored RAID 5 volumes. You have the usual tradeoffs: the fancier the RAID scheme, the more disks you need, and the more computing you have to do for every bit written to or read from the array.
Long-term solution? A move to something other than magnetic storage. This could be optical, it could be SSD if some advance allows very large density increases, or something unknown. The technology would need a failure rate orders of magnitude better than magnetic storage, at a cost on par with magnetic, for it to completely work. Holographic storage has languished for a while; perhaps as that technology improves, we may see drives using 3D blocks of it replace the old-fashioned spindles.

Hardware RAID is dead (Score:3, Interesting)

by PiSkyHi ( 1049584 ) writes: on Friday September 18, 2009 @05:46AM (#29464049)

Hardware RAID is dead; software for redundant storage is just getting started. I am looking forward to using btrfs so I can have some consistency and confidence in how I deal with any ultimately disposable storage component.
The ZFS folks have been doing it fine for some time now.
Hardware RAID controllers have no place in modern storage arrays, except those forced to run Windows.

Non-issue ... (Score:4, Interesting)

by Lazy Jones ( 8403 ) writes: on Friday September 18, 2009 @05:47AM (#29464053) Homepage Journal

Modern RAID arrays show no dramatic performance degradation while rebuilding. Even with RAID-50/RAID-60 arrays, only a fraction of disk accesses are slower than usual when a single drive is replaced.
For enterprise level storage systems, this is also a non-issue because of thin provisioning.

If you want smaller drives... (Score:5, Interesting)

by asdf7890 ( 1518587 ) writes: on Friday September 18, 2009 @06:05AM (#29464133)

If you want smaller drives to speed up rebuild times then, erm, buy smaller drives? You can get ~70GB 10K RPM and 15K RPM drives fairly readily; they are much smaller than the 500-2000GB monsters, and faster too. You can still buy ~80GB PATA drives as well (I've seen them when shopping for larger models), though you only save a couple of peanuts compared to the cost of 250+GB units.
If you can't afford those but still don't want 500+GB drives because they take too long to rebuild when the array is compromised, and management won't let you buy bog-standard 160GB (or smaller) drives because they only cost 20% less than 750GB units without the speed benefits of the expensive 15K RPM ones, how about using software RAID and only using the first part of each drive? This is easily done with Linux's software RAID (partition each drive with a single 100GB (for example) partition and RAID that instead of the full drive), and I'm sure it's just as easy with other OSs. You'll get speed bonuses too: you'll be using the fastest part of the drive in terms of bulk transfer speed (on most spinning drives the earliest sectors sit on the outer tracks, which hold more data per revolution), and you'll have lower average latency because the heads never need to travel across the full width of the platter. And you've got the rest of the drive space to expand into later if needed. Or maybe you could hide your porn stash there.
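As a best-case estimate of why a smaller partition rebuilds faster, assume the rebuild is limited purely by sequential throughput. The 100 MB/s figure below is a hypothetical placeholder, and the sketch ignores competing I/O and the outer-track speed bonus.

```python
def rebuild_hours(capacity_gb: float, throughput_mb_s: float) -> float:
    """Best-case rebuild time: capacity divided by sustained throughput (decimal GB/MB)."""
    seconds = capacity_gb * 1000 / throughput_mb_s
    return seconds / 3600

# Hypothetical: every drive sustains 100 MB/s and nothing else is using the array.
for size_gb in (100, 750, 2000):
    print(f"{size_gb:>5} GB: {rebuild_hours(size_gb, 100):.1f} h")
```

Rebuild time scales linearly with the partition size, so a 100GB slice of a 2TB drive rebuilds about 20 times faster than the whole drive under these assumptions.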

ZFS, Anyone? (Score:2, Interesting)

by Tomsk70 ( 984457 ) writes: on Friday September 18, 2009 @06:09AM (#29464145)

I've managed to get this going, using the excellent FreeNAS - although proceed with caution, as only the beta build supports it, and I've already had serious (all data lost) crashes twice.
However, the principle is sound, and I'm sure this will become standard before long. The only trouble is that HP, Dell and the like can't simply offer upgrades for existing RAID cards: due to the nature of ZFS, it needs a 'proper' CPU and a gig or two of RAM. Even so, it does protect against many of the problems now besetting RAID (which was never meant to handle modern, gargantuan disk sizes).

Fountain codes? (Score:3, Interesting)

by andrewagill ( 700624 ) writes: on Friday September 18, 2009 @06:11AM (#29464155) Homepage

What about fountain codes [wikipedia.org]? The coding there is capable of recovering from a greater variety of faults.

Re:Worked-around a Long Time Ago (Score:5, Interesting)

by Anonymous Coward writes: on Friday September 18, 2009 @06:22AM (#29464213)

But really, none of that should be necessary for the general case. Storing data in different physical locations is a good but entirely unrelated issue; the main problem of disk reliability is still very much in need of a solution. That's pretty much the point of the article: you can come up with various solutions that move the problem around and give multiple fallbacks for when something goes wrong, but there's still the problem of things going wrong in the first place. I shouldn't need to use 12 separate disks spread across the globe just for basic reliability/redundancy.

Old news (Score:2, Interesting)

by EmTeedee ( 948267 ) writes: on Friday September 18, 2009 @06:22AM (#29464217) Journal

I read that before on Slashdot: Why RAID 5 Stops Working In 2009 [slashdot.org]

Parity declustering (Score:5, Interesting)

by Biolo ( 25082 ) writes: on Friday September 18, 2009 @06:37AM (#29464273)

Actually, I like the parity declustering idea linked to in that article; it seems to me that, if implemented correctly, it could mitigate a large part of the issue. I have personally encountered the hard-error-on-RAID5-rebuild issue twice, so there definitely is a problem to be addressed... and yes, I now only implement RAID6 as a result.
For those who haven't RTFATFALT (RTFA the f*** article links to): parity declustering, as I understand it, is where you have, say, an 8-drive array, but each block is written to only a subset of those drives, say 4. Now, obviously you lose 25% of your storage capacity (1/4) to parity, but consider a rebuild for a failed disk. In this instance only 50% of your blocks are likely to be on the failed drive, so you immediately cut your rebuild time in half, halving your data reads and therefore your chance of encountering a hard error. Larger numbers of disks in the array, or spanning your data over fewer drives, cut this further.
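The 50% figure can be checked with a short sketch, assuming each stripe lands on a uniformly random subset of `span` drives (real declustered layouts use carefully balanced placements rather than pure randomness). The exact fraction works out to span/drives; the simulation just confirms it.

```python
import random
from math import comb

def affected_fraction_exact(drives: int, span: int) -> float:
    """Fraction of span-wide stripes that include one particular drive,
    assuming every span-sized subset of drives is equally likely."""
    return comb(drives - 1, span - 1) / comb(drives, span)

def affected_fraction_simulated(drives: int, span: int, stripes: int, seed: int = 0) -> float:
    """Place each stripe on `span` random drives, fail drive 0,
    and count how many stripes would need rebuilding."""
    rng = random.Random(seed)
    hits = sum(0 in rng.sample(range(drives), span) for _ in range(stripes))
    return hits / stripes

print(affected_fraction_exact(8, 4))              # 0.5
print(affected_fraction_simulated(8, 4, 10_000))  # close to 0.5
print(affected_fraction_exact(16, 4))             # 0.25
```

Doubling the array to 16 drives while keeping 4-drive stripes halves the affected share again, which matches the comment's "larger numbers of disks" remark.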
Now consider the flexibility you could build into an implementation of this scheme. Simply by making the number of drives a block spans configurable on a per-block basis, you could let any filesystem on that array say, on a per-file basis, how many disks to span. Apps and sysadmins could then say that a given file needs maximum write performance, so diskSpan=2, which effectively gives you RAID10 for that file (each block is written to 2 drives, and successive blocks of the file are likely to land on different pairs of drives; not quite RAID10, but close). Where you didn't want a file to consume 2x its size on the storage system, you could allow a higher diskSpan number. You could also allow configurable parity on a per-block basis, so particularly important files can survive multiple disk failures, while temp files could have no parity. There would need to be a rule, however, that parity+diskSpan is less than or equal to the number of devices in the array.
Obviously there is an issue here: the total capacity of the array is not knowable, since files with diskSpan numbers lower than the array's default will reduce the capacity and numbers higher will increase it. This alone might require new filesystems, but you could run today's filesystems on this array as long as you disallowed the per-block diskSpan feature.
This even helps when expanding the array, as there is no longer any need to re-read all of the data in the array (with the resulting chance of encountering a hard error, adding huge load to the system, causing a drive to fail, etc.). The extra capacity is simply available. Over time you would probably want a redistribution routine to move data from the existing array members to the new members to spread the load and capacity.
How about implementing a performance optimiser too, one that looks for the most frequently accessed blocks and ensures they are evenly spread over the disks? If you take into account the performance of the individual disks themselves, you could effectively get hierarchical (tiered) storage, so that one array contains, say, SSD, SAS and SATA drives, and the optimiser ensures that data is allocated to individual drives based on how frequently it is accessed and the performance of each drive. Applications or sysadmins could also indicate to the array which files are more performance-sensitive, influencing where the data ends up as it is written.
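One way to read the optimiser idea is a greedy tiering pass: rank blocks by access count and fill the fastest drives first. The tier names, speed scores and capacities below are hypothetical, and a real optimiser would also weigh migration cost and keep redundancy intact, which this sketch does not.

```python
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    relative_speed: float  # hypothetical performance score; higher is faster
    capacity_blocks: int   # how many blocks this tier can hold

def assign_tiers(access_counts: dict[str, int], tiers: list[Tier]) -> dict[str, str]:
    """Greedy placement: the most frequently accessed blocks fill the fastest tier first."""
    if sum(t.capacity_blocks for t in tiers) < len(access_counts):
        raise ValueError("not enough capacity for all blocks")
    # One slot per block of capacity, fastest tier's slots first.
    slots = [t.name
             for t in sorted(tiers, key=lambda t: t.relative_speed, reverse=True)
             for _ in range(t.capacity_blocks)]
    hottest_first = sorted(access_counts, key=access_counts.get, reverse=True)
    return dict(zip(hottest_first, slots))

tiers = [Tier("SATA", 1.0, 4), Tier("SSD", 50.0, 1), Tier("SAS", 3.0, 2)]
counts = {"a": 5, "b": 900, "c": 40, "d": 1, "e": 300}
print(assign_tiers(counts, tiers))
```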

Re:Fountain codes? (Score:1, Interesting)

by Anonymous Coward writes: on Friday September 18, 2009 @06:44AM (#29464311)

Why fountain codes? Any other erasure code http://en.wikipedia.org/wiki/Erasure_codes [wikipedia.org] will do the job. The parity and Reed-Solomon codes used in RAID are in fact erasure codes.
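To illustrate the point that plain RAID parity is already an erasure code, here is a minimal XOR sketch: with one parity block, any single missing data block can be rebuilt from the survivors. The block contents are arbitrary example data.

```python
def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

data = [b"disk", b"fail", b"ures"]
parity = xor_blocks(data)  # what a single-parity RAID stores

# Erase any one data block: XOR of the survivors and the parity recovers it.
lost = 1
survivors = [blk for i, blk in enumerate(data) if i != lost]
recovered = xor_blocks(survivors + [parity])
print(recovered)  # b'fail'
```

A single XOR parity can only repair one erasure per stripe; tolerating more needs a stronger code such as Reed-Solomon.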

Remembering an article earlier this week: (Score:4, Interesting)

by Chrisq ( 894406 ) writes: on Friday September 18, 2009 @06:44AM (#29464315)

Will scalable distributed storage systems like Hadoop [wikipedia.org] and Google File System take over from RAID?

RAID concept is fine, it's that HDs are too big (Score:5, Interesting)

by trims ( 10010 ) writes: on Friday September 18, 2009 @06:45AM (#29464323) Homepage

As others have mentioned, this is something that is discussed on the ZFS mailing lists frequently.
For more info there, check out the digest for zfs-discuss@opensolaris.org
and, in particular, check out Richard Elling's blog [sun.com]
(Disclaimer: I work for Sun, but not in the ZFS group)
The fundamental problem here isn't the RAID concept; it's that the throughput and access times of spinning rust haven't changed much in 30 years. Fundamentally, today's hard drive is no more than 100 times as fast (in both throughput and latency) as a 1980s one, while it holds well over 1 million times more.
ZFS (and other advanced filesystems) will now do partial reconstruction of a failed drive (that is, they don't have to bit-copy the entire drive, only the parts that are in use), which helps. But there are still problems. ZFS's pathological case results in rebuild times of 2-3 WEEKS for a 1TB drive in a RAID-Z (similar to RAID-5). It's all due to the horribly low throughput, limited maximum IOPS, and high latency of the hard drive.
SSDs, on the other hand, are nowhere near having this problem. They've got considerably more throughput than a hard drive and, more importantly, HUNDREDS of times better IOPS. Frankly, more than for any other reason, I expect the high IOPS of the SSD to sound the death knell of HDs in the next decade. By 2020, expect HDs to be gone from everything, even in places where HDs still have better GB/$. The rebuild rates and maintenance of HDs simply can't compete with flash.
Note: IOPS = I/O operations per second, the number of read/write operations (regardless of size) a disk can service. HDs top out around 350, consumer SSDs do under 10,000, and high-end SSDs can do up to 100,000.
-Erik
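The IOPS argument can be made concrete with a back-of-envelope comparison between a sequential rebuild and one where every block costs a random I/O. The inputs (1 TB used, 100 MB/s sequential, 8 KiB blocks, ~150 random IOPS for a hard drive, 10,000 for an SSD) are assumptions for illustration, not measurements of ZFS.

```python
def sequential_rebuild_hours(used_bytes: float, throughput_bytes_s: float) -> float:
    """Best case: the used data is read in one long sequential sweep."""
    return used_bytes / throughput_bytes_s / 3600

def random_rebuild_hours(used_bytes: float, block_bytes: float, iops: float) -> float:
    """Worst case: every block costs one random I/O (seek plus rotation)."""
    return used_bytes / block_bytes / iops / 3600

TB = 1e12
print(f"HD, sequential: {sequential_rebuild_hours(TB, 100e6):.1f} h")
print(f"HD, random:     {random_rebuild_hours(TB, 8192, 150) / 24:.1f} days")
print(f"SSD, random:    {random_rebuild_hours(TB, 8192, 10_000):.1f} h")
```

Under these assumptions the random-I/O case on a hard drive runs to days rather than hours, while the same access pattern on an SSD stays in the hours range.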

doesn't raid 10 solve this? (Score:3, Interesting)

by davros-too ( 987732 ) writes: on Friday September 18, 2009 @07:13AM (#29464441) Homepage

Um, don't schemes like RAID 1+0 solve the parity rebuild problem? Even in the worst case of a full disk loss, only one disk needs to be rebuilt, and even for a large disk that doesn't take very long. Am I missing something?
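Whether RAID 1+0 sidesteps the problem comes down to how much data a rebuild must read without a second copy to fall back on. A rough sketch, assuming independent unrecoverable read errors at 1 per 10^14 bits (a figure commonly quoted on consumer drive spec sheets) and 1 TB drives in an 8-drive array:

```python
import math

def p_unrecoverable_error(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one unrecoverable read error while reading
    `bytes_read` bytes, assuming independent errors at `ure_per_bit`."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed stably with log1p/expm1
    return -math.expm1(bits * math.log1p(-ure_per_bit))

TB = 1e12
raid10 = p_unrecoverable_error(1 * TB)  # rebuild reads only the surviving mirror
raid5 = p_unrecoverable_error(7 * TB)   # 8 x 1 TB RAID-5: read all 7 survivors
print(f"RAID-10 rebuild: {raid10:.1%}")
print(f"RAID-5 rebuild:  {raid5:.1%}")
```

The mirror rebuild reads far less, so its odds are better, but they are not zero, and a plain mirror has no second redundancy to recover a bad sector on the surviving copy.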

Re:RAID is here to stay (Score:3, Interesting)

by paulhar ( 652995 ) writes: on Friday September 18, 2009 @07:20AM (#29464485)

RAID 1 is much less reliable than RAID 6. Take a typical case: one disk fails completely. You then start to reconstruct, and in a RAID 1 scheme a single sector error will cause the rebuild to fail. Not great.
In RAID 6, you start the rebuild and get a single sector error from one of the drives you're rebuilding from. At that point you've still got a second parity scheme available (the RAID 6 part) that works out what that sector should have been, and the rebuild continues. Then you go back and decide what to do about the drive that had the second error.
A lot of drive failures aren't full head crashes or motor failures, just single-sector, track, or bits-of-dirt-on-the-platter errors. Outside the affected area, the drive can still be read.
With RAID 6 you can lose two disks completely and still access the data. You're still reading from the same ten 10TB disks in your example, and if the RAID 6 implementation is optimal (RAID-DP), you don't have to read additional data from the same physical disks.
In the world you describe with 10TB drives, it sounds like you just wouldn't be able to use the disks at all, since any process that reads from them would kill them. There are a few things that could happen:
1. Disks get more reliable. Hasn't happened much yet, but...
2. We switch to different packaging. Instead of making disks larger, we cram more of them into the same space, similar to CPU cores: same MTBF per disk, but lots of them presented through one physical interface.
3. We change technologies completely. SSD (interesting failure modes there too... needs RAID).
I guess we'll find out in only a few years...
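The second-parity recovery described above can be sketched with the P+Q construction commonly used for RAID-6 (for example, in Linux md): P is plain XOR, and Q weights data block i by g^i in GF(2^8), with generator g = 2 and reducing polynomial 0x11D. NetApp's RAID-DP uses a different row-diagonal parity scheme, so this is not its implementation; the code works a byte at a time and favours clarity over speed.

```python
from typing import List, Optional, Tuple

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a: int) -> int:
    # The multiplicative group has 255 elements, so a^254 is the inverse of a (a != 0).
    return gf_pow(a, 254)

def compute_pq(data: List[bytes]) -> Tuple[bytes, bytes]:
    """P is XOR parity; Q is the XOR of g^i * D_i over all data blocks."""
    p = bytearray(len(data[0]))
    q = bytearray(len(data[0]))
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def recover_two(data: List[Optional[bytes]], x: int, y: int,
                p: bytes, q: bytes) -> Tuple[bytes, bytes]:
    """Rebuild lost data blocks x and y (x != y) from the survivors plus P and Q."""
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    inv = gf_inv(gx ^ gy)
    coeffs = [gf_pow(2, i) for i in range(len(data))]
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        pd, qd = p[j], q[j]
        for i, block in enumerate(data):
            if i in (x, y):
                continue
            pd ^= block[j]                     # leaves D_x ^ D_y
            qd ^= gf_mul(coeffs[i], block[j])  # leaves g^x*D_x ^ g^y*D_y
        # (g^x ^ g^y) * D_x == qd ^ g^y * pd
        dx[j] = gf_mul(qd ^ gf_mul(gy, pd), inv)
        dy[j] = pd ^ dx[j]
    return bytes(dx), bytes(dy)

data = [b"RAID", b"-6 s", b"urvi", b"ves!"]
p, q = compute_pq(data)
damaged = [data[0], None, data[2], None]  # two "failed disks"
print(recover_two(damaged, 1, 3, p, q))   # (b'-6 s', b'ves!')
```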

Re:Worked-around a Long Time Ago (Score:5, Interesting)

by plover ( 150551 ) * writes: on Friday September 18, 2009 @07:42AM (#29464609) Homepage Journal

Actually, storing data in a multiple data center / high availability environment is a completely related issue. The summary above talks of "entirely different paradigms." Cloud storage would be based on multiple data centers, which is entirely different from keeping the only copy on your local drives. In this concept, your machine would have enough OS to boot, and enough hard drive space to download the current version of whatever software you are leasing. Your personal info would always be maintained in the data centers, and only mirrored locally. Have a home failure? Drop in a new part or even a new PC (possibly with an entirely different operating system, such as Chrome), connect to the service, and you're 100% back.
It's no longer a novel concept for the home market. Consider Google Docs. It's not even being sold as "safer than RAID", it's being touted as "get it from anywhere" or "share with your friends". Safer than RAID is just a bonus.
So are we ready to move all our personal information to clouds? I certainly am not, but Google Docs are wildly popular and a lot of people are. I long ago learned that I can't look to myself to judge what the mainstream attitudes are in many things.

RAID 4 has a dedicated parity drive, not 5 (Score:5, Interesting)

by Targon ( 17348 ) writes: on Friday September 18, 2009 @07:43AM (#29464615)

RAID 4 is where you have one dedicated parity drive. RAID 5 avoids that bottleneck by rotating the parity blocks across all the drives in the array. RAID 6 adds a second parity block for increased reliability, but because every write must also update that extra parity block, write speeds are slower.
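The difference between a dedicated parity drive and distributed parity is easiest to see by counting where parity lands across many stripes. The rotation below (parity moving one disk to the left each stripe) is one common RAID 5 layout, assumed here for illustration; controllers use several variants.

```python
from collections import Counter

def raid4_parity_disk(stripe: int, disks: int) -> int:
    """RAID 4: parity always lives on the last disk."""
    return disks - 1

def raid5_parity_disk(stripe: int, disks: int) -> int:
    """RAID 5 (one common rotation): parity moves one disk to the left each stripe."""
    return (disks - 1) - (stripe % disks)

DISKS, STRIPES = 4, 1000
print("RAID 4:", Counter(raid4_parity_disk(s, DISKS) for s in range(STRIPES)))
print("RAID 5:", Counter(raid5_parity_disk(s, DISKS) for s in range(STRIPES)))
```

With RAID 4 every write touches the same parity disk; with rotation the parity writes are spread evenly, 250 stripes per disk here.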
The real key to making RAID 4, 5, or 6 work is that you really need 4-6 drives in the array to take advantage of the design. I wouldn't say it will fall out of favor, though, because solid protection against a single drive going bad really is critical for many businesses. Backups are all well and good for when your system crashes, but for most businesses, uptime is even more critical. So: backups for data, so corruption problems can be rolled back, and RAID 5, 6 or 10 for stability and to avoid having the entire system die if one drive goes bad. Which takes more time: doing a data restore from a backup when an individual application has problems, or having to restore the entire system from a backup, with the potential that the backup itself was corrupted?
That said, web farms and other applications that can get away with a cluster approach instead of a single well-designed machine (or set of machines) have become popular, but there are many situations that make a system with one or more RAID arrays the better choice. The focus on RAID 0 and 1 for SMALL systems and residential setups has simply kept many people from realizing how useful a 4-drive RAID 5 setup would be.
Then again, most people go to a backup when they screw up their system, not because of a hard drive failure. With techs upgrading hardware before they run into a hard drive failure, the need for RAID 1, 4, 5, and 6 has dropped.
I will say this: since a RAID 5 array can rebuild on the fly (it keeps working even if one drive fails), the rebuild time itself does not significantly impact system availability. Gone are the days when a rebuild had to be done while the system was down.

RAID6 with enterprise hardware is reliable (Score:3, Interesting)

by niola ( 74324 ) writes: <jon@niola.net> on Friday September 18, 2009 @07:56AM (#29464713) Homepage

I use RAID6 for several high-volume machines at work. Having double parity plus a hot spare means rebuild time is no worry.
But if you are not a fan, you can always throw something together with ZFS's RAIDZ or RAIDZ2, which also uses distributed parity, but where ZFS checksums every block (and keeps redundant copies of its metadata) to detect and fix data corruption before it becomes a bigger problem.
People using ZFS have been able to detect silent data corruption from a faulty power supply that other solutions would never have found just because of the checksumming process.
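A minimal sketch of checksum-driven self-healing on a two-way mirror: read a copy, verify it against a checksum stored elsewhere, and overwrite any copy that fails. Python's hashlib SHA-256 stands in for ZFS's own checksums here, and the function name is hypothetical; this is not how ZFS structures its code.

```python
import hashlib

def digest(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

def self_healing_read(copies: list[bytearray], expected: bytes) -> bytes:
    """Return the first copy whose checksum matches, and overwrite any bad copies with it."""
    good = next((bytes(c) for c in copies if digest(bytes(c)) == expected), None)
    if good is None:
        raise IOError("all copies failed checksum verification")
    for c in copies:
        if digest(bytes(c)) != expected:
            c[:] = good  # repair the silently corrupted copy in place
    return good

block = b"payroll records"
checksum = digest(block)       # stored separately from the data itself
mirror = [bytearray(block), bytearray(block)]
mirror[0][3] ^= 0x01           # silent corruption on side A (e.g. a flaky PSU)
print(self_healing_read(mirror, checksum))  # b'payroll records'
print(mirror[0] == bytearray(block))        # True: side A was repaired
```

Without the separately stored checksum, a mirror could only tell that its two copies disagree, not which one is right.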

Re:RAID is here to stay (Score:1, Interesting)

by Anonymous Coward writes: on Friday September 18, 2009 @08:23AM (#29464879)

Imagine a disk fails on every 100TB of reads (10^14). You have ten 1TB data disks. Imagine you keep them in perfect rotation so they've spent 10, 20, 30, 40, 50, 60, 70, 80, 90 and 100% of their lifetime. The last disk dies and you replace it with a new drive (0%). To rebuild the drive you read 1TB from each data disk and use whatever parity you need. They've now spent 11, 21, 31, 41, 51, 61, 71, 81, 91 and 1% (your new disk) of their lifetime and you can read another 9TB before you need a new disk.
Except that it doesn't work anywhere NEAR like that. The lifetime of a disk is much, much greater than the read cost of rebuilding a failed drive. You're definitely not spending 1% of its total lifetime. You're not even spending 1/1000th of that 1%. You couldn't measure the difference between having to rebuild that drive and just using the disk as a new one.
The vast majority of hard disk failures are manufacturing issues, not end-of-life issues for an average drive. You don't have a RAID system because the average lifetime of a hard disk is short; you get a RAID system because of the outliers. Every once in a while, a manufacturer puts out a Deathstar, and it's going to fail a month after you put it in. At the same time, you'll have disks in there that are going to keep chugging away for five years straight, and you'll eventually replace them because you want a bigger disk, not because they've failed.

I'm not sure I get it (Score:3, Interesting)

by Joce640k ( 829181 ) writes: on Friday September 18, 2009 @08:27AM (#29464913) Homepage

Is he saying that you can never read a whole hard disk because it will fail before you get to the end?
That's what he seems to be saying, but my hard disks usually last for years of continuous use, so I'm not sure it's true.

Re:simple idea (Score:3, Interesting)

by Coren22 ( 1625475 ) writes: on Friday September 18, 2009 @08:49AM (#29465119) Journal

They aren't talking about drive speeds as much as failure rate:
The bottom line is this: Disk density has increased far more than performance and hard error rates haven't changed much, creating much greater RAID rebuild times and a much higher risk of data loss.
They are saying that the MTBF of drives has not gone up as fast as capacity, and that an unrecoverable read error during a full-drive read (such as a rebuild) is actually quite likely with a modern high-capacity drive. Even the claim that drive speeds haven't gone up is quite accurate: 15K RPM drives have been around for at least 10 years now, and there has been no improvement in rotational speed in that time. Where are my 30K RPM drives?
Also, I have a bit of a problem with your statement about OMG small enterprise drives. Enterprise drives have caught up to consumer drives in size; you can now buy 1TB SAS drives, they are just OMG expensive compared to consumer drives.

Re:simple idea (Score:4, Interesting)

by alva_edison ( 630431 ) writes: <ThAlEdison.gmail@com> on Friday September 18, 2009 @08:55AM (#29465157)

The problem becomes space in the data center. I don't know about you, but we're trying to cram petabytes into existing computer rooms and coming up short. Plus, you don't address Tier 2 or Tier 3 storage, which tends to be on SATA or near-line SAS, both of which have the ridiculous-size problem. Calling 15,000 RPM fast in the data center is also misleading, because we've been at those speeds for a few years now, and 10Gb iSCSI (or FCoE, which bypasses the collision problem) is about to render that untenable. The current solution tends toward storage virtualization (here meaning large amounts of high-speed cache in front of the controllers, and less control over where the controllers allocate space). The future is most likely some kind of grid technology (like XIV from IBM), where every block is on two random drives in the array and only the controller knows where. This means drive rebuilds run at swarm speeds, since the data is equally likely to be pulled from every other drive in the tower.
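The "swarm speed" rebuild can be illustrated with a small simulation, assuming (as the comment describes) that each block is mirrored on two distinct, uniformly random drives. It counts which surviving drive supplies the second copy of each block that lived on the failed one; the drive and block counts are hypothetical, and IBM's actual XIV distribution algorithm is not modelled.

```python
import random
from collections import Counter

def rebuild_sources(drives: int, blocks: int, failed: int, seed: int = 1) -> Counter:
    """Mirror each block on two distinct random drives, then count which surviving
    drive supplies the second copy of every block that lived on `failed`."""
    rng = random.Random(seed)
    sources = Counter()
    for _ in range(blocks):
        a, b = rng.sample(range(drives), 2)
        if failed in (a, b):
            sources[b if a == failed else a] += 1
    return sources

src = rebuild_sources(drives=12, blocks=120_000, failed=0)
print(len(src), "surviving drives share the rebuild reads")
print("fewest / most blocks read from one drive:", min(src.values()), max(src.values()))
```

Because the rebuild reads are spread over all 11 survivors instead of hammering one mirror partner, each drive contributes only a small slice of the work.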

Re:simple idea (Score:5, Interesting)

by paulhar ( 652995 ) writes: on Friday September 18, 2009 @09:02AM (#29465215)

You're not likely to see 30K RPM drives any time soon. At 15K RPM the outer edge of a 3 1/2" platter is already moving pretty fast, and the lion's share of the power consumed by 15K drives goes into counteracting the air buffeting the heads. With 2 1/2" drives we could go faster, but while drives run with air inside them, it's not likely we'll see much in the short term.
It's why CD-ROM speeds haven't gone up since the old days of 52x.
As areal density improves, drives will be able to push out more raw MB/sec, just as DVD is better than CD, but IOPS are not likely to improve dramatically.
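For a back-of-envelope check of rim speeds, linear speed at the platter edge is circumference times revolutions per second. The platter diameters are assumptions: about 95 mm for a full-size 3.5" platter, and 65 mm as a stand-in for the smaller platters many 15K drives use.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # dry air at about 20 °C

def rim_speed_m_s(rpm: float, platter_diameter_m: float) -> float:
    """Linear speed at the platter's outer edge: circumference x revolutions per second."""
    return math.pi * platter_diameter_m * rpm / 60

for rpm, diameter in [(7200, 0.095), (15000, 0.095), (15000, 0.065), (30000, 0.095)]:
    v = rim_speed_m_s(rpm, diameter)
    print(f"{rpm:>6} RPM, {diameter * 1000:.0f} mm: {v:5.1f} m/s "
          f"({v / SPEED_OF_SOUND_M_S:.0%} of the speed of sound)")
```

Aerodynamic drag grows steeply with speed, so doubling RPM costs far more than double the power spent fighting the air, which is the comment's underlying point.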

Dear Seagate, Western Digital, et al.: (Score:5, Interesting)

by ThreeGigs ( 239452 ) writes: on Friday September 18, 2009 @09:09AM (#29465255)

Here's what I want, folks:
A 5.25-inch device with 5 double-sided platters running at 5400 RPM: basically the same size as a desktop CD/DVD drive, à la Quantum Bigfoot.
I want 8 sides of the platters dedicated to data, and the other two sides dedicated to parity (or one parity and the other servo), essentially a self-contained RAID on a single disk.
I want all data heads to write and read simultaneously, in parallel. The idea is to have 64-byte sectors on each data surface, which are recombined into a 512-byte result. 8 heads writing and reading in parallel means HUGE throughput for sequential operations.
It's RAID 5 or 6 on a single disk, although without spindle redundancy.
And I also want a high-performance option: 2 sets of read/write heads 180 degrees apart, which would effectively cut rotational latency in half, making the drive perform more like a 10K RPM drive. With current densities, that's 12TB in the volume of a DVD drive. It solves speed, sector error recovery and capacity issues. The only thing missing is a data bus that can handle the throughput.
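The 180-degree head idea mainly shortens rotational latency (waiting for the sector to come around), not arm seek time. A quick sketch, assuming head sets are equally spaced around the platter and seek time is unchanged:

```python
def avg_rotational_latency_ms(rpm: float, head_sets: int = 1) -> float:
    """Average wait for the target sector to reach a head.

    With one head set you wait half a revolution on average; with `head_sets`
    equally spaced sets, the nearest one is on average 1/(2*head_sets) of a turn away.
    """
    ms_per_rev = 60_000 / rpm
    return ms_per_rev / (2 * head_sets)

print(f" 5400 RPM, 1 head set:  {avg_rotational_latency_ms(5400):.2f} ms")
print(f" 5400 RPM, 2 head sets: {avg_rotational_latency_ms(5400, 2):.2f} ms")
print(f"10000 RPM, 1 head set:  {avg_rotational_latency_ms(10_000):.2f} ms")
```

Two head sets on a 5400 RPM drive give the same average rotational latency as a single head set at 10,800 RPM, which is where the 10K RPM comparison comes from.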

Re:RAID is here to stay (Score:3, Interesting)

by maraist ( 68387 ) * writes: <michael.maraistN ... gmail.n0spam.com> on Friday September 18, 2009 @09:10AM (#29465281) Homepage

I don't understand your failure-rate strategy. First of all, there's no such thing as being 90% or 10% of the way through a disk's life. It's a probability distribution, whose probability is dramatically affected by current events (and somewhat related to historical events). A drive might have a 0.00005% probability of failure at any given moment, but then a large sustained read occurs, which changes the heat and causes voltage fluctuations, so now you're operating at 0.001% probability.

Then a drive dies in hot-swap mode: one drive spins down, then another spins up. This causes massive voltage fluctuations, as well as slight tension on the cabling that causes reflections in the wiring, which increases your probability of failure to, say, 0.02%. (I'm totally making up numbers, but the trends are what's important.)

So the act of powering down/up or hot-swapping intrinsically increases the probability of co-disk failures, unless you have a very expensive system with separate AC/DC converters (i.e. fully decoupled) and, obviously, isolated frames, heat compartments, etc.

BUT, you can mitigate this by having 3+-way redundancy (RAID-1; I honestly don't understand the point of using slower RAID-5/RAID-6 anymore). So when one drive fails, you have addressed the probability of a second failure. There is a geometric reduction in the probability that 3, 4 or 5 drives fail simultaneously. Meaning that even at the peak-risk part of the drive-swap operation, if you have, say, a 2% probability that another drive will fail, then there is a 0.04% probability that two drives will fail simultaneously, 0.0008% that three fail, etc.

This isn't strictly correct, of course, because the probabilities are not fully independent. You have many common components, and thus their probabilities are intertwined. But suffice it to say the probabilities are lower.
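The geometric reduction above is just repeated multiplication when failures are treated as independent, which the commenter acknowledges they are not. Using the commenter's illustrative 2% figure (not a measured rate):

```python
def p_all_fail(p_single: float, drives: int) -> float:
    """Probability that `drives` specific drives all fail in the same window,
    assuming (unrealistically) that their failures are independent."""
    return p_single ** drives

for k in (1, 2, 3):
    print(f"{k} more drive(s) failing: {p_all_fail(0.02, k):.4%}")
```

Correlated causes (shared power supply, same manufacturing batch, same heat) push the real numbers above these independent-failure estimates.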

Now, I say 3+-way RAID-1 because it may be silly to swap out a single drive when one goes bad. The process I would recommend (if you have a sufficiently advanced RAID controller and non-super-expensive disks) is this:

5-way RAID-1 with 2 powered-down disks (thus effectively 3-way RAID-1)
On a drive failure, power up the two disks and initiate their syncing.
Swap out the failed drive and initiate its syncing.

For a brief while, you have 2 valid, 2 semi-valid, and 1 semi-semi-valid drive.

As the drives sync up (this may take over 24 hours), power down the original remaining 2 and remove them.

Recycle the good disks into JBOH (Just a Bunch Of Hardware) clustering, meaning boot disks / log-file disks in, say, RAID-1, swapping out the oldest drive.

You can either buy several 4-way/5-way RAID controllers, or get a single 15-disk RAID controller for under $1k. This lets you have multiple logical volumes and share the spun-down disks, so you're really only using 3 disks per logical volume, though having two logical volumes with bad disks at once does reduce your ideal reliability somewhat. But this gives you 4 volumes, which can be combined into RAID-10. You could build such a system for under $6k with various mixtures of high-end and low-end disks for different partition requirements: boot/OS/linear logging (RAID-1), random-write data (RAID-10).

If the data is super-critical, use block-level master-slave replication. Ideally your application supports direct master-slave or, better yet, multi-master replication.

And if you're doing JBOH clustering, then trivial RAID-1 with 2 or 3 disks (in software RAID) is all you need. Note: I use 3-disk RAID10 on my home Linux machine (that plus a DVD drive fills up my IDE slots); it's a pretty clever technique. Yes, I know virtually all motherboards have hardware RAID these days, but unless they've got an extra 4GB of buffer RAM on them, they're pointless in my opinion; plus they're non-portable (screw transparent Windows support, you can't distinguish disk errors from forced reboots anyway).

Re:RAID is here to stay (Score:3, Interesting)

by LWATCDR ( 28044 ) writes: on Friday September 18, 2009 @09:21AM (#29465399) Homepage Journal

Well, the logical thing IMHO is, after the first year, to put in a new drive and do an array rebuild after making a backup.
Drives are really cheap, and I would keep doing that for as long as the array is in use.
Reuse the old drives in desktops if they are SATA.
It's not perfect, but it keeps you from having an array of old drives in your server.
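To see what the yearly swap buys you, here is a rough model (an assumption, not something the comment spells out) in which every drive starts new and the single oldest drive is replaced at the end of each year. `drive_ages` is a hypothetical helper.

```python
def drive_ages(n_drives, years):
    """Ages (in years, oldest first) after `years` of one-drive-per-year swaps.

    Hypothetical policy model: all drives start new at year 0, and at the end
    of each year the oldest drive is replaced with a new one (age 0).
    """
    ages = [0] * n_drives
    for _ in range(years):
        ages = [a + 1 for a in ages]          # every drive gets a year older
        ages[ages.index(max(ages))] = 0       # swap out the oldest one
    return sorted(ages, reverse=True)
```

For a 4-drive array the ages settle at 3, 2, 1 and 0 years, instead of all four drives reaching the same age (and wear) together.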

Comment removed (Score:3, Interesting)

by account_deleted ( 4530225 ) writes: on Friday September 18, 2009 @09:26AM (#29465457)

Comment removed based on user account deletion

Re:Bogus outdated thinking (Score:3, Interesting)

by Svartalf ( 2997 ) writes: on Friday September 18, 2009 @09:37AM (#29465549) Homepage

RAID5 is not backup. It's resilience against a failure bringing the whole system down.
RAID was originally developed to make what we consider small storage capacities (then massive) affordable and reasonably reliable.
You're using RAID5 in its "intended" use, but an SSD of the same capacity will be inherently MORE reliable (by a factor of how many of those magnetic disks you remove) than your system design right now.
From personal experience with a customer base of literally thousands of enterprise-class servers spread out over many companies, RAID doesn't work QUITE the way people make it out to. We're ripping it out of the equipment and reverting to warm backups instead; the RAID1 design they fielded made the servers unstable.
The field engineer crowd (one of my friends worked in Nortel's field engineering group, and my brother is a manager for an outsourcing company doing a lot of the same work with the same customers...) HATES RAID.
Blow a controller? Better hope you have an identical one in stock. You can't just swap in a different controller of the same brand or pop in a different brand; they all do things ever so slightly differently on the disks.
Blow a disk? Better hope you can get the new drive in there and integrated properly before you lose another.
Disks don't have the reliability we once thought they had.
RAID doesn't do what most people think it does for them.
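The "before you lose another" window can be put in numbers with a simple sketch. It assumes independent disks with a constant failure rate (exponential lifetimes), which is optimistic for drives bought from the same batch; the disk count, rebuild window and MTBF in the usage line are hypothetical.

```python
import math


def p_second_failure(remaining_disks, window_hours, mtbf_hours):
    """Chance that at least one remaining disk fails during the rebuild window.

    Assumes independent disks with exponential lifetimes, so the combined
    failure rate is remaining_disks / mtbf_hours.
    """
    rate = remaining_disks / mtbf_hours
    return 1.0 - math.exp(-rate * window_hours)
```

With hypothetical figures, `p_second_failure(7, 24, 500_000)` is about 0.03%, and the risk grows roughly linearly as the rebuild window stretches, which is why ever-longer rebuilds matter.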

Re:simple idea (Score:3, Interesting)

by denis-The-menace ( 471988 ) writes: on Friday September 18, 2009 @09:37AM (#29465553)

Why not add multiple heads to the same platter?
Keep the disk spinning at 15K but add heads with their own actuator and everything. One could be read-only, the other write-only. Whatever makes sense.

Re:simple idea (Score:5, Interesting)

by Firethorn ( 177587 ) writes: on Friday September 18, 2009 @10:15AM (#29466017) Homepage Journal

Even partial evacuation would help, but you run into the problem that the read heads are designed to use the air to keep them from contacting the platters, so you'd need to replace that effect somehow.
The Space Shuttle and ISS even have special sensors to shut the hard drives down if the air pressure goes too low. Reading about that was how I found out that hard drives are designed to use air.
Not to mention that you're now trying to build an air tight container, but if you're looking at ultra-high performance drives that's less of an issue.
Still, you have to look at how much such a drive would cost, and whether the cost would ever be repaid - if I was looking at investing in such technology I'd be concerned that Flash would outpace my vacuum drives before I got them released. Even if I DO manage to find a niche, would the niche last long enough against flash memory that's getting faster and cheaper so quickly?
For certain data sets and access patterns, flash is already much cheaper than the old RAID options. The best example I saw was a dataset of a few hundred gigabytes that was mostly read-only but accessed so heavily and so randomly that they had to mirror it on 10 hard drives to meet the read demands. One professional-level SSD performed BETTER while costing less than half as much as that setup.
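A back-of-the-envelope sketch of why a random-read workload can need ten mirrored spindles: a disk's random IOPS is roughly bounded by 1 / (average seek + average rotational latency), and each mirror copy can serve reads independently. The 15k RPM and 3.5 ms seek figures are assumptions, not numbers from the comment, and the model ignores transfer time and queueing.

```python
import math


def hdd_random_iops(avg_seek_ms, rpm):
    """Rough random-read IOPS for one disk: 1 / (seek + half a revolution)."""
    rotational_ms = 0.5 * 60_000 / rpm  # average wait is half a revolution
    return 1000.0 / (avg_seek_ms + rotational_ms)


def mirrors_needed(target_iops, per_drive_iops):
    """Disks needed if every mirror copy can serve reads independently."""
    return math.ceil(target_iops / per_drive_iops)
```

With those assumed figures one drive manages about 180 IOPS, so a load of roughly 1,800 random reads per second already needs `mirrors_needed(1800, hdd_random_iops(3.5, 15000))`, i.e. 10 drives, whereas a single SSD has no seek or rotational delay to pay.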

fill the drive with helium (Score:4, Interesting)

by speedtux ( 1307149 ) writes: on Friday September 18, 2009 @10:29AM (#29466203)

Filling the drive with helium should help; the speed of sound in helium is 3x higher than in air, and it offers less resistance.
(Hydrogen would be even better, but it has a tendency to interact with metals in unfortunate ways.)

Re:simple idea (Score:4, Interesting)

by Gothmolly ( 148874 ) writes: on Friday September 18, 2009 @10:37AM (#29466303)

Can you say "instantaneous heat death" ? Vacuum is an excellent insulator.

Re:Dear Seagate, Western Digital, et. al: (Score:3, Interesting)

by adisakp ( 705706 ) writes: on Friday September 18, 2009 @02:24PM (#29469437) Journal

I want 8 sides of the platters dedicated to data
More platters == more mass. Which translates to more power required for the motor, higher energy usage and much more heat generated by the drive. Generating more heat == quicker hardware failures. Also, with bigger / larger / more platters, it's much harder to spin the platters faster. Usually more platters == slower RPM drive speed and much slower seek rates. If you can do fewer, smaller, and lighter platters, you can make the drive spin faster and perform better; this is exactly what the VelociRaptor does with its high-RPM 2.5" format.

Also, using only one side of the platter is often faster and more reliable because the head arm weighs less (1/2 the heads) so they don't have as much mass to impede fast seeking or to cause vibration. Plus you don't have to worry about the alignment on both sides of the platter. This is one reason why the highest speed drives do not necessarily even use both sides of the platter.
It's RAID 5 or 6 on a single disk, although without spindle redundancy.
No, it's not... what happens if the control electronics, the arm actuator, or the spindle motor fail? RAID 5/6 has whole-disk redundancy. You just have data redundancy on the platters, not full hardware redundancy. Also, all these extra components you want to add to the drive will just make it more complicated and add more points of failure, so the drives will actually fail earlier.
And I also want a high-performance option: 2 sets of read/write heads 180 degrees apart, which effectively would cut seek times in half, making the drive perform more like a 10k RPM drive.
Except that it would slow the drive down by making calibration harder and slower. Moving the head arms causes vibration and movement in the drive. One arm would not be able to reliably read while the other was moving unless the drive was spinning slower to begin with.

The moral of your story is that you have some interesting ideas, but believe it or not, most of them have already been tried and rejected well before coming to market because they weren't feasible or reliable or didn't actually result in performance improvements in a cost effective manner.
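The latency arithmetic behind the two-head-set idea, as a sketch: with k head sets spaced evenly around the platter, the average rotational wait drops from half a revolution to 1/(2k) of one. Note that this trims rotational latency, not arm seek time. The 7200 RPM baseline is an assumption.

```python
def avg_rotational_latency_ms(rpm, head_sets=1):
    """Average rotational latency in ms with evenly spaced head sets.

    One head set waits half a revolution on average; k evenly spaced sets
    wait 1/(2k) of a revolution. Arm seek time is not modelled here.
    """
    revolution_ms = 60_000 / rpm
    return revolution_ms / (2 * head_sets)
```

For an assumed 7200 RPM drive, one head set averages about 4.17 ms and two sets about 2.08 ms, the same rotational latency as a 14,400 RPM drive (10k RPM gives 3.0 ms). Seek time and the vibration problems described above are outside this calculation.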

Re:simple idea (Score:3, Interesting)

by pjr.cc ( 760528 ) writes: on Friday September 18, 2009 @02:50PM (#29469785)

Unfortunately, that is mostly a myth.
Having worked in storage for aeons, I can tell you the reality is that the difference between enterprise and "consumer grade rubbish" has very little to do with anything but tolerance. If you picked up a 300G 10k enterprise drive and compared it to the consumer grade rubbish, you'd find nothing different. It used to be the case, way back when, that they were very different, but because consumer grade drives have gotten so much better, it's just not worth the expense of building separate enterprise and consumer drives with slightly different specs. What is different is the acceptable tolerance: when a platter comes off the line, if it's within 2% of its manufacturing tolerances it's OK to use for enterprise, and if it's further out they throw it into consumer. The reality is that most drives are in that "better than 2% tolerance" range, simply because the processes to make them have gotten so good over the years. The point is that when you hit your magic tolerance number, the drive is capable of a 100% duty cycle.
So essentially, the difference between "consumer" and "enterprise" when it comes to the casing, the platters, the heads and the motors is zero. There are a lot of different spec drives out there today, ranging from 146GB (typically the smallest you'll find these days) all the way to 2TB, with speeds from 7200 to 15000 RPM, and enterprise is the only place that uses all of them, but they still come off the same manufacturing line. The drivers behind it all come down to the consumer itself: in enterprise it's often about performance, and with consumers it's about size. Very conveniently, building bigger consumer grade drives typically means improving the performance of a drive in ways that scale straight back to the enterprise. Sure, you won't see many users throwing around 15k RPM drives, but that's more because it's unnecessary.
So why is it that in the mid-to-low server range we find 300GB 15k drives? Because it's a cheap way of getting performance, and that is fairly important at that end of the market, where servers need to be cheap and there's a lot of competition (you know, 1-2RU with 4-8 drives and a RAID card, no SAN).
So what else differs between the two? Interface. In the mid-to-low server range we start talking SAS, and this has more to do with being able to talk to several drives at once (again, not something a lot of consumers do, other than with USB drives perhaps). The SAS interface is quite brilliant because it scales to a larger number of drives than SATA can, and does it very cheaply. It also takes a lot of load off the server when it comes to processing data transfer (for a large number of drives). But in that same space you WILL find SATA drives going up to 2TB (servers often lag consumers in size simply because of certification, not because of anything to do with stability). To call a 1TB drive unstable is rather silly in reality.
Now the BIG end of town: SANs. These days in most SANs you'll find a mix of SATA and Fibre Channel (some do SAS as well; it's uncommon, though that's changing). In the SAN end of town (the big boy game) you'll see it all: 7.2k RPM 2TB SATAs sitting in the same array alongside 146GB 15k RPM Fibre Channel drives, and it's all about trading off storage density/cost against performance. Consider this: ten 1TB SATA drives can (easily) consume an 8Gbps FC interface. OUCH! Now, a lot of SAN arrays start at around 4 FC interfaces and go up to maybe 16, but they'll be supporting literally thousands of drives. A lot of the SAN industry realised some time ago that throwing 2TB SATAs into an array made a lot of sense, because SAN interfaces have grown very slowly in terms of throughput while single HD interfaces have grown very quickly. There are even several very popular arrays that only do SATA, and that was the driver behind "enterprise" grade large-storage drives (i.e. enterprise grade 1TB+ SATA drives). At the server you still get the Fibre Channel performance. The critical difference is that the array does more work
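The FC saturation claim can be checked with quick arithmetic. The sketch assumes the commonly quoted nominal per-direction rates for Fibre Channel generations (8GFC is usually quoted as 800 MB/s) and about 100 MB/s of sustained sequential throughput per 1TB SATA drive; both are approximations, and random I/O would fall far short of this.

```python
import math

# Nominal per-direction throughput in MB/s, keyed by FC generation (Gbps).
FC_NOMINAL_MB_S = {1: 100, 2: 200, 4: 400, 8: 800}


def drives_to_fill(fc_gbps, drive_mb_s):
    """Smallest number of drives whose combined sequential rate fills the link."""
    return math.ceil(FC_NOMINAL_MB_S[fc_gbps] / drive_mb_s)


def aggregate_mb_s(n_drives, drive_mb_s):
    """Combined sequential throughput of n drives streaming in parallel."""
    return n_drives * drive_mb_s
```

Under these assumptions `drives_to_fill(8, 100)` is 8, so ten such drives (about 1,000 MB/s combined) already exceed one 8Gbps port.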

Re:Parent is spot on (Score:1, Interesting)

by Anonymous Coward writes: on Friday September 18, 2009 @04:40PM (#29471209)

I do like mirroring more than this parity stuff.
Let's assume I have, e.g., a RAID10 containing 100TB of data. I lose one disk, and then I lose the second one. CrashBoomBang!
Of course I'm not stupid, and I have a backup of all my 100TB of data. What would traditionally happen now is that I'd need to restore all of my data, as I don't know what was on those disks. That takes ages!
Now comes the magic of ZFS. Because ZFS combines the volume manager and the file system, it can tell me exactly which files are missing. I then replace both failed disks. Using 1TB disks, I don't have to restore 100TB, only up to 1TB. This can reduce the recovery time by up to 100 times.
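The saving is simple proportionality, assuming the damaged files really can be identified. A sketch with a hypothetical sustained restore rate of 200 MB/s and decimal units (1 TB = 10^6 MB); `restore_hours` is an illustrative helper, not a ZFS tool.

```python
def restore_hours(data_tb, restore_mb_s):
    """Hours to copy `data_tb` terabytes back at a sustained rate (1 TB = 10**6 MB)."""
    return data_tb * 1_000_000 / restore_mb_s / 3600
```

At the assumed 200 MB/s, `restore_hours(100, 200)` is about 139 hours versus about 1.4 hours for 1TB, the up-to-100x difference described above.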

Re:fill the drive with helium (Score:3, Interesting)

by dkf ( 304284 ) writes: <donal.k.fellows@manchester.ac.uk> on Friday September 18, 2009 @08:17PM (#29473063) Homepage

Filling the drive with helium should help; the speed of sound in helium is 3x higher than in air, and it offers less resistance.
(Hydrogen would be even better, but it has a tendency to interact with metals in unfortunate ways.)

Thinking about it, methane might be a more practical choice. Yes, it's denser than helium, so the effect won't be anywhere near as strong (the speed of sound in methane is only about 30% higher than in air), but it's also very cheap and available, and won't cause too many problems by interacting with the rest of the drive. Having to seal the drive is an issue, yes, but that's not far off what's needed now; it's imperative that dust is kept out of the platter enclosure anyway...
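For the gas comparison, the ideal-gas speed of sound is c = sqrt(gamma * R * T / M). The sketch below uses textbook approximations for each gas's heat-capacity ratio and molar mass at about 20 °C; it only reproduces the speed-of-sound ratios quoted in the thread and says nothing about drag or head flying height.

```python
import math

R = 8.314  # molar gas constant, J/(mol*K)

# (heat-capacity ratio gamma, molar mass in kg/mol), approximate textbook values
GASES = {
    "air": (1.40, 0.02897),
    "helium": (1.667, 0.004003),
    "methane": (1.304, 0.01604),
    "hydrogen": (1.41, 0.002016),
}


def speed_of_sound(gas, temp_k=293.15):
    """Ideal-gas speed of sound in m/s: sqrt(gamma * R * T / M)."""
    gamma, molar_mass = GASES[gas]
    return math.sqrt(gamma * R * temp_k / molar_mass)
```

This gives roughly 343 m/s for air, about 1,000 m/s for helium (about 2.9x), about 445 m/s for methane (about 1.3x) and about 1,300 m/s for hydrogen.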

