Data Storage Hardware

Are RAID Controllers the Next Data Center Bottleneck? 171

Posted by Soulskill
from the many-varied-pipes dept.
storagedude writes "This article suggests that most RAID controllers are completely unprepared for solid state drives and parallel file systems, all but guaranteeing another I/O bottleneck in data centers and another round of fixes and upgrades. What's more, some unnamed RAID vendors don't seem to even want to hear about the problem. Quoting: 'Common wisdom has held until now that I/O is random. This may have been true for many applications and file system allocation methodologies in the recent past, but with new file system allocation methods, pNFS and most importantly SSDs, the world as we know it is changing fast. RAID storage vendors who say that IOPS are all that matters for their controllers will be wrong within the next 18 months, if they aren't already.'"
This discussion has been archived. No new comments can be posted.

  • Re:distribution (Score:3, Informative)

    by Ex-MislTech (557759) on Saturday July 25, 2009 @12:59PM (#28819651)

    This is correct. Most countries have laws that prohibit exposing medical and other sensitive data to risk by
    putting it out in the open. Some organizations have even moved to private virtual circuits. SANs that keep
    active files on solid state storage work fine: less frequently accessed data moves to disk drives, which are
    nonetheless quite fast, and SAS technology is faster than SCSI tech in throughput.

  • Re:distribution (Score:3, Informative)

    by Ex-MislTech (557759) on Saturday July 25, 2009 @01:01PM (#28819667)

    An example of SAS throughput pushing out 6 Gbps.

    http://www.pmc-sierra.com/sas6g/performance.php [pmc-sierra.com]

  • The second question is, with processors coming with 8 cores, why have some separate specialized controller that handles RAID and not just do it in software?

    I much prefer s/ware raid (Linux kernel dm_mirror), it removes a complicated piece of h/ware which is just another thing to go wrong. It also means that you can see the real disks that make up the mirror and so monitor it with the smart tools.

    OK: if you do raid5 rather than mirroring (raid1) you might want a h/ware card to offload the work to, but for many systems a few terabyte disks are big and cheap enough to just mirror.
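
    The extra work RAID 5 adds over mirroring is parity: each stripe stores the XOR of its data blocks, so any single lost block can be rebuilt from the survivors. Below is a minimal Python sketch of that idea. The function names and the three-block stripe are illustrative assumptions, not how any particular controller or the Linux kernel implements it.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must be the same size")
    # zip(*blocks) walks the blocks column by column (one byte from each).
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Hypothetical stripe: three data blocks plus one parity block.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# Simulate losing the second disk and rebuilding its block.
recovered = rebuild_missing([stripe[0], stripe[2]], parity)
```

    Every write has to keep the parity block current, which is the per-write cost that mirroring avoids.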

  • Not quite (Score:4, Informative)

    by greg1104 (461138) <gsmith@gregsmith.com> on Saturday July 25, 2009 @01:11PM (#28819763) Homepage

    There may need to be some minor rethinking of controller throughput for read applications on smaller data sets for SSD. But right now, I regularly saturate the controller or bus when running sequential RW tests against a large number of physical drives in a RAID{1}0 array, so it's not like that's anything new. Using SSD just makes it more likely that will happen even on random workloads.

    There are two major problems with this analysis though. The first is that it presumes SSD will be large enough for the sorts of workloads people with RAID controllers encounter. While there are certainly people using such controllers to accelerate small data sets, you'll find just as many people who are using RAID to handle large amounts of data. Right now, if you've got terabytes of stuff, it's just not practical to use SSD yet. For example, I do database work for a living, and the only place we're using SSD right now is for holding indexes. None of the data can fit, and the data growth volume is such that I don't even expect SSDs to ever catch up--hard drives are just keeping up with the pace of data growth.

    The second problem is that SSDs rely on volatile write caches in order to achieve their stated write performance, which is just plain not acceptable for enterprise applications where honoring fsync is important, like all database ones. You end up with disk corruption if there's a crash [mysqlperformanceblog.com], and as you can see in that article, once everything was switched to relying only on non-volatile cache, the performance of the SSD wasn't that much better than the RAID 10 system under test. The write IOPS claims of Intel's SSD products are garbage if you care about honoring write guarantees, which means it's not that hard to keep up with them after all on the write side in a serious application.
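
    One rough way to see whether fsync is being honored is to time a loop of small writes, each followed by fsync. The Python sketch below does that; the function names and default sizes are assumptions made for illustration. A spinning disk that really commits every write to the platter cannot do much better than one same-block commit per rotation, so a rate far above that bound hints that some cache is acknowledging writes early. It cannot prove durability either way; only a power-loss test can.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, count=200, payload=b"x" * 512):
    """Write `payload` and fsync `count` times; return the observed rate."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.lseek(fd, 0, os.SEEK_SET)  # rewrite the same block each time
            os.write(fd, payload)
            os.fsync(fd)                  # ask the OS to flush to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed if elapsed > 0 else float("inf")


def max_rotational_commits(rpm):
    """Upper bound on same-block commits per second for a spinning disk."""
    return rpm / 60.0


if __name__ == "__main__":
    rate = fsyncs_per_second(tempfile.gettempdir())
    bound = max_rotational_commits(7200)
    print(f"observed: {rate:.0f} fsyncs/s; 7200 RPM bound: {bound:.0f}/s")
```

    Run it against a directory on the device under test. The temporary directory used here is only a placeholder and may not live on the drive you care about.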

  • All wrong. (Score:3, Informative)

    by sirwired (27582) on Saturday July 25, 2009 @01:16PM (#28819797)

    1) Most high-end RAID controllers aren't used for file serving. They are used to serve databases. Changes in filesystem technology don't affect them one bit, as most of the storage allocation decisions are made by the database.
    2) Assuming that a SSD controller that can pump 55k IOPS w/ 512B I/O's can do the same w/ 4K I/O's is stupid and probably wrong. That is Cringely math; could this guy possibly be as lame?
    3) The databases high-end RAID arrays get mostly used for do not now, and never have, used much bandwidth. They aren't going to magically do so just because the underlying disks (which the front-end server never even sees) can now handle more IOPS.
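
    The objection in point 2 is plain arithmetic: holding the same IOPS figure at eight times the I/O size requires eight times the bandwidth. A short Python sketch using the 55k figure quoted above (the function name is an assumption for illustration):

```python
def bandwidth_mb_per_s(iops, io_size_bytes):
    """Bandwidth implied by an IOPS figure at a given I/O size (decimal MB/s)."""
    return iops * io_size_bytes / 1_000_000


small = bandwidth_mb_per_s(55_000, 512)    # 512-byte I/Os
large = bandwidth_mb_per_s(55_000, 4096)   # 4 KiB I/Os
print(f"512 B: {small:.2f} MB/s, 4 KiB: {large:.2f} MB/s")
```

    At 512 bytes, 55k IOPS is about 28 MB/s; at 4 KiB the same rate would need about 225 MB/s, so an IOPS number measured at one I/O size says little about another.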

    All SSDs do is flip the Capacity/IOPS equation on the back end. Before, you ran out of drive IOPS before you ran out of capacity. Now, you get to run out of capacity before you run out of IOPS on the drive side.

    Even if you have sufficient capacity (due to the rapid increase in SSD capacity), you are still going to run out of IOPS capacity on the RAID controller before you run out of IOPS or bandwidth on the drives. The RAID controller still has a lot of work to do with each I/O, and that isn't going to change just because the back-end drives are now more capable.
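
    The flipped Capacity/IOPS equation can be made concrete by sizing one workload two ways. In this Python sketch the workload and the per-drive figures are hypothetical round numbers chosen only to show the effect, not measurements of any real product.

```python
import math


def drives_needed(capacity_gb, iops, drive_capacity_gb, drive_iops):
    """Return (drives for capacity, drives for IOPS, limiting factor)."""
    for_capacity = math.ceil(capacity_gb / drive_capacity_gb)
    for_iops = math.ceil(iops / drive_iops)
    limit = "capacity" if for_capacity >= for_iops else "IOPS"
    return for_capacity, for_iops, limit


# Hypothetical workload: 4 TB of data at 20,000 IOPS.
workload = dict(capacity_gb=4000, iops=20_000)

# Hypothetical drives: a 1 TB hard disk and a 64 GB SSD.
hdd = drives_needed(**workload, drive_capacity_gb=1000, drive_iops=200)
ssd = drives_needed(**workload, drive_capacity_gb=64, drive_iops=10_000)
print("HDD:", hdd)
print("SSD:", ssd)
```

    With these assumed numbers the hard disk array is sized by IOPS (100 drives against 4 for capacity), while the SSD array is sized by capacity (63 drives against 2 for IOPS). Neither count says anything about what the controller in front of them can sustain.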

    SirWired

  • Re:distribution (Score:3, Informative)

    by lgw (121541) on Saturday July 25, 2009 @01:38PM (#28819979) Journal

    SAS technology is faster than SCSI tech in throughput

    "SCSI" does not mean "parallel cable"!

    Sorry, pet peeve, but obviously Serial Attached SCSI [wikipedia.org] (SAS) is SCSI. All Fibre Channel storage speaks SCSI (the command set), and all USB storage does too. And iSCSI? Take a wild guess. Solid state hard drives that plug directly into PCIe slots with no other data bus? Still SCSI command set. Fast SATA drives? The high end ones often have a SATA-to-SCSI bridge chip in front of SCSI internals (and SAS can use SATA cabling anyhow these days).

    Pardon me, I'll just be over here grumbling about this.

  • Re:iscsi, 10gig (Score:2, Informative)

    by Anonymous Coward on Saturday July 25, 2009 @01:45PM (#28820025)

    Of course. NFS provides an easy to use concurrent shared filesystem that doesn't require any cluster overhead or complication like GFS or GPFS.

  • by mysidia (191772) on Saturday July 25, 2009 @01:51PM (#28820075)

    Well, ZFS is great, but don't get that mixed up with software RAID. It's not. The storage redundancy algorithms used by ZFS are not the RAID algorithms, such that using ZFS is much better than using EITHER hardware or software RAID.

    ZFS provides performance and data integrity assurance that standard RAID does not, primarily because filesystem-level data is checksummed. It should be almost impossible for silent data corruption to occur at the storage device level, except in cases where the data written actually matches the checksums (a later 'zpool scrub' should detect it, if ZFS is implemented properly).
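
    The point about checksums can be sketched in a few lines of Python. A plain mirror that finds two copies disagreeing has no way to tell which one is right; a checksum kept apart from the data does. The names below are hypothetical and this is not ZFS's actual on-disk format or code, only the general technique.

```python
import hashlib


def checksum(block):
    """Checksum of a block; SHA-256 is used here purely for illustration."""
    return hashlib.sha256(block).digest()


def read_verified(copies, expected):
    """Return the first copy matching `expected`, plus indexes of bad copies."""
    bad = [i for i, c in enumerate(copies) if checksum(c) != expected]
    good = [c for c in copies if checksum(c) == expected]
    if not good:
        raise IOError("all copies failed checksum verification")
    return good[0], bad


original = b"database page 42"
stored_checksum = checksum(original)  # kept apart from the data itself

# Two mirrored copies; the first has silently flipped one bit.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]
data, bad_copies = read_verified([corrupted, original], stored_checksum)
```

    Here the read returns the intact copy and reports copy 0 as bad, which is the information a scrub needs in order to rewrite the damaged copy.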

    But aside from ZFS, software RAID (and even fakeraid/hostraid hardware adapters that perform RAID in the driver) really, really suck in terms of reliability, data integrity, and performance when you need to push things to the maximum, compared to a good hardware RAID controller; software RAID is measurably slower on the same CPU and memory.

    SMART provides so little of what you need to be doing to keep a reliable array, it isn't even funny.

    Good hardware controllers keep metadata and do frequent consistency checks / "scrubs" / surface scans, to ensure every bit of data is periodically read from every drive, so HDD firmware has an opportunity to fix errors before they become "unrecoverable read errors".

    Hardware controllers will also detect when a hard drive is having a problem that cannot be easily identified by software. Hard drives are directly plugged into the controller; it can detect things such as abnormal command response latencies.

    Software RAID can't be sure that abnormal latency isn't due to other workload on the bus rather than a failing drive, so the HW controller is more responsive to failure.

    HW controllers also provide writethrough caching, and sometimes have a BBU with a full writeback cache, which drastically helps performance and reduces the RAID performance penalty, which software RAID doesn't mitigate but in fact makes worse.

    Oh yes, and good controllers also have monitoring and administration tools for various OSes, including Linux, Windows, and Solaris, produced by the manufacturer.

    Many of the good controllers come equipped with audible alarms and terminals for you to plug drive failure LEDs into, so that anyone near the server can know a drive has failed, and which one.

  • Re:Not quite (Score:1, Informative)

    by Anonymous Coward on Saturday July 25, 2009 @06:00PM (#28821979)

    First a quick clarification: Intel X25 series SSDs do not use their RAM as a data writeback cache. Intel ships racks full of both M and E series drives, with those drives living in a RAID configuration. They couldn't pull that off if the array was corrupted on power loss.

    While it would be nice if this were true, since Intel's FAQ [intel.com] references a write cache and database-oriented tests like the one I referenced show data corruption, the paranoid (which includes everyone who works on database and similar enterprise apps) have to presume there's still a problem until some trustworthy studies to the contrary appear. Please let me know if you're aware of any. Your argument of "they couldn't pull that off" is not a data point, because millions of hard drives with a lying write cache are shipped every year to people who think they're just fine, and who don't experience corruption on power loss. Those same drives show corruption just fine if you do a database-oriented corruption test on them.

    Until I see SSD vendors giving very clear statements about their write caching and they start passing tests specifically aimed at discovering this type of corruption, you have to assume that the situation with them is just as bad as it's always been with regular IDE or SATA disks--drives lie. The only such test I've seen so far using the Intel drives is from Vadim, and the X25-E failed. It would be great if the coverage you were doing at PC Perspective expanded to cover this issue fully; write-cache enabled? [jasonbrome.com], diskchecker.pl [livejournal.com], and faking the sync [livejournal.com] have good introductions to this issue and how to run such tests yourself.

  • Re:iscsi, 10gig (Score:3, Informative)

    by drsmithy (35869) <drsmithy@@@gmail...com> on Saturday July 25, 2009 @06:15PM (#28822121)

    Does anyone actually still use NFS?

    Of course. It's nearly always fast enough, trivially simple to set up, and doesn't need complicated and fragile clustering software so that multiple systems can access the same disk space.

