Are RAID Controllers the Next Data Center Bottleneck?
				storagedude writes "This article suggests that most RAID controllers are completely unprepared for solid state drives and parallel file systems, all but guaranteeing another I/O bottleneck in data centers and another round of fixes and upgrades. What's more, some unnamed RAID vendors don't seem to even want to hear about the problem. Quoting: 'Common wisdom has held until now that I/O is random. This may have been true for many applications and file system allocation methodologies in the recent past, but with new file system allocation methods, pNFS and most importantly SSDs, the world as we know it is changing fast. RAID storage vendors who say that IOPS are all that matters for their controllers will be wrong within the next 18 months, if they aren't already.'"
Wait. You mean my SAN is Dead? (Score:5, Insightful)
Hardware RAIDs are not exactly hopping off the shelf, and I think many shops are happy with Fibre Channel.
Let's do another reality check: this is enterprise-class hardware. Are you telling me you can get SSD RAID/SAN in a COTS package that is comparable in cost to whatever is available now? Didn't think so....
Let's face it, in this class of hardware things move much more slowly.
Re:distribution (Score:3, Insightful)
That's fine for some things but I really don't want my confidential client work-product mirrored around the world. Despite all the cloud hype there is still a subset of data that I really do NOT want to let outside my corporate walls.
enterprise storage (Score:4, Insightful)
Storage has been the performance bottleneck for so long that it's a happy problem if you actually must increase bus speeds, add CPU power, or get faster memory on RAID cards to keep up. Seems to me the article (or at least the summary) was written by someone who hasn't been following enterprise storage for very long...
Hardware RAID becoming less relevant every day. (Score:2, Insightful)
The first question is really, why RAID an SSD? It's already more reliable than a mechanical disk, so that argument goes out the window. You might get some increased performance, but that's often not a big factor.
The second question is, with processors coming with 8 cores, why have some separate specialized controller that handles RAID and not just do it in software?
Re:enterprise storage (Score:1, Insightful)
Damn straight! IO has been the bottleneck for at least 40 years. SSD is slowly opening doors to a brighter future, but we're a long way from the realistic capacity needs for business. I've yet to see real benchmarking that is designed for hundreds of simultaneous tasks, though; all the figures I see are largely rubbish that assume the user does one or two things. How about testing them on web services like digg, or on company mail servers, instead of fake throughput and "feel" tests?
Re:BAD MATH (Score:5, Insightful)
The last part of that sentence is particularly interesting in the context of this article. "Motherboard RAID" is, outside of the very highest end motherboards, usually just bog-standard software RAID with just enough BIOS goo to make it bootable. Hardware RAID, by contrast, actually has its own little processor and does the work itself. Of late, general purpose microprocessors have been getting faster, and cores in common systems have been getting more numerous, at a substantially greater rate than hardware RAID cards have been getting spec bumps (outside of the super high end stuff; I'm not talking about whatever EMC is connecting 256 Fibre Channel drives to, I'm talking about anything you could get for less than $1,500 and shove in a PCIe slot). Perhaps more importantly, the sophistication of OS support for nontrivial multi-disk configurations (software RAID, ZFS, storage pools, etc.) has been getting steadily greater and more mature, with a good deal of competition between OSes and vendors. RAID cards, by contrast, leave you stuck with whatever firmware updates the vendor deigns to give you.
I'd be inclined to suspect that, for a great many applications, dedicated hardware RAID will die (the performance and uptime of a $1,000 server with a $500 RAID card will be worse than a $1,500 server with software RAID, for instance) or be replaced by software RAID with coprocessor support (in the same way that encryption is generally handled by the OS, in software, but can be supplemented with crypto accelerator cards if desired).
Dedicated RAID of various flavors probably will hang on in high end applications (just as high end switches and routers typically still have loads of custom ASICs and secret sauce, while low end ones are typically just embedded *nix boxes on commodity architectures), but the low end seems increasingly hostile.
Re:I/O is random? What have you been smoking? (Score:3, Insightful)
I think we need a mod option to mod down the article summary: -1, stupid editor.
You had your chance [slashdot.org].
Re:enterprise storage (Score:4, Insightful)
Ah... pointing the finger at the storage... My favorite activity. Listening to DBAs, application writers, etc. point the finger at the EMC DMX with 256GB of mirrored cache and 4Gb/s FC interfaces. You point your finger and say, "I need 8Gb Fibre Channel!" Yet when I look at your HBA utilization over a three-month period (including quarter end, month end, etc.) I see you averaging a paltry 100MB/s. Wow. Guess I could have saved thousands of dollars by going with 2Gb/s HBAs. Oh yeah, and you have a minimum of two HBAs per server. Running a Nagios application to poll our switch ports for utilization, the average host is running maybe 20% utilization of the link speed, and as you beg, "Gimme 8Gb/s FC", I look forward to your 10% utilization.
You do sound like you know what you're doing, but there is quite a difference between average utilization and peak utilization. I have some servers that average less than 5% usage on a daily basis, but will briefly max out the connection about 5-6 times per day. For some applications, more peak speed does matter.
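To put rough numbers on the average-versus-peak point, here is a minimal sketch. The nominal throughput table uses rounded per-direction figures for 2/4/8Gb Fibre Channel, and the hourly samples are made up for illustration; neither comes from the poster's actual monitoring data.

```python
# Nominal per-direction Fibre Channel throughput in MB/s (rounded figures).
NOMINAL_MBPS = {"2Gb": 200, "4Gb": 400, "8Gb": 800}


def utilization_pct(observed_mbps, link):
    """Observed throughput as a percentage of the link's nominal rate."""
    return 100.0 * observed_mbps / NOMINAL_MBPS[link]


def summarize(samples_mbps, link):
    """Return (average %, peak %) utilization for a list of MB/s samples."""
    avg = sum(samples_mbps) / len(samples_mbps)
    peak = max(samples_mbps)
    return utilization_pct(avg, link), utilization_pct(peak, link)


# Hypothetical day of hourly samples: mostly idle, five saturating bursts.
samples = [10] * 19 + [400] * 5
avg_pct, peak_pct = summarize(samples, "4Gb")
print(f"average {avg_pct:.1f}%, peak {peak_pct:.1f}%")
```

With these invented samples the link averages under a quarter of its capacity yet is fully saturated during the bursts, which is the case where the average alone would suggest a slower HBA is enough.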
Re:I/O is random? What have you been smoking? (Score:5, Insightful)
All the important operations tend to be random. For a file server, you may have twenty people accessing files simultaneously. Or a hundred, or a thousand. For a webserver, it'll be hitting dozens or hundreds of static pages and, if you have database backend, that's almost entirely random as well.
For people consolidating physical servers to virtual servers, you now have two, three, ten or twenty VMs running on one machine. If every one of those VMs tries to do a "sequential" IO, it gets interlaced by the hypervisor into all the other sequential IOs. No hypervisor would dare tell all the other VMs to sit back and wait so that every IO is sequential. That delay could be seconds or minutes or hours.
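A toy illustration of that interleaving effect, assuming a hypervisor that simply services one request from each VM in round-robin order (real schedulers are more sophisticated, and the block addresses here are invented):

```python
def interleave(streams):
    """Round-robin merge of per-VM block address lists (a simplifying assumption)."""
    merged = []
    for group in zip(*streams):
        merged.extend(group)
    return merged


def sequential_fraction(addresses):
    """Fraction of requests whose address directly follows the previous one."""
    hits = sum(1 for prev, cur in zip(addresses, addresses[1:]) if cur == prev + 1)
    return hits / (len(addresses) - 1)


# Hypothetical: 4 VMs, each reading 100 consecutive blocks from its own region.
streams = [list(range(base, base + 100)) for base in (0, 10_000, 20_000, 30_000)]

print(sequential_fraction(streams[0]))           # one VM on its own
print(sequential_fraction(interleave(streams)))  # what the disk sees
```

Each VM's stream is perfectly sequential on its own, but under this round-robin assumption the merged stream reaching the storage has no two adjacent requests that are contiguous.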
Now imagine all that, and take into account that the latest Intel SSD gets around 6600 IOPS read and write. A good, fast hard drive gets 200. So you could put thirty-three hard drives in RAID 0 and have the same number of IOPS, and your latency would still be worse. All the RAID 0 really does for you is give you a nice big queue pipeline, like in a CPU. Your IO doesn't really get done faster, but you can have many more running simultaneously.
Given that SSDs are easily three to four times faster on sequential IO and an order of magnitude faster on random IO, I don't think it's that implausible to believe that the industry isn't ready.
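The arithmetic behind the thirty-three drive figure, as a quick sketch. The IOPS numbers are the ones quoted in the comment; the latency function assumes one outstanding request at a time, which is a simplification.

```python
import math

SSD_IOPS = 6600  # figure quoted in the comment
HDD_IOPS = 200   # figure quoted in the comment


def drives_to_match(target_iops, per_drive_iops):
    """Number of striped drives needed to reach a target aggregate IOPS."""
    return math.ceil(target_iops / per_drive_iops)


def per_request_latency_ms(iops):
    """Average service time per request, assuming queue depth 1."""
    return 1000.0 / iops


print(drives_to_match(SSD_IOPS, HDD_IOPS))   # drives in the RAID 0 set
print(per_request_latency_ms(HDD_IOPS))      # each request still waits on one disk
print(per_request_latency_ms(SSD_IOPS))
```

Striping adds up throughput across drives, but any single request is still serviced by one hard drive, so under these assumptions the per-request latency stays at the hard drive's 5 ms rather than dropping to the SSD's roughly 0.15 ms.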
Re:enterprise storage (Score:4, Insightful)
Sort of true, but not entirely accurate.
Is the on-demand response slow? Stats lie. Stats mislead. Stats are only stats. The systems I'm monitoring would use more I/O if they could. Those basic read/write graphs are just the start. How's the latency? Any errors? Pathing setup good? Are the systems queuing i/o requests while waiting for i/o service response?
And traffic is almost always bursty unless the link is maxed - you're checking out a nice graph of the maximums too, I hope? That average looks mighty deceiving when long periods are compressed. At an extreme over months or years, data points can be days. Overnight + workday could = 50%. No big deal on the average.
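A small sketch of how long averaging windows flatten bursts. The hourly utilization values are hypothetical, chosen so that a quiet night plus a saturated workday averages out to the 50% mentioned above.

```python
def bucket_means(samples, bucket_size):
    """Average consecutive samples into buckets, as a long-range graph does."""
    means = []
    for i in range(0, len(samples), bucket_size):
        chunk = samples[i:i + bucket_size]
        means.append(sum(chunk) / len(chunk))
    return means


# Hypothetical link utilization (%) per hour: quiet, saturated workday, quiet.
hourly = [25] * 8 + [100] * 8 + [25] * 8

print(max(hourly))               # what an hourly graph of maximums shows
print(bucket_means(hourly, 8))   # 8-hour data points
print(bucket_means(hourly, 24))  # one data point per day
```

At one data point per day the saturated workday disappears into a single 50% value, which is why the maximums need to be graphed alongside the averages.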
I have a similar usage situation on many systems, but the limits are generally still storage-dependent issues like I/O latency (apps make a limited number of requests before requests start queuing), poorly grown storage (a few LUNs there, a few here, everything is suddenly slowing down due to striping in one over-subscribed drawer), and sometimes unexpected network latency on the SAN (switch bottlenecks on the path to the storage).
Those graphs of i/o may look pitiful, but perhaps that's only because the poor servers can't get the data any faster.
Older enterprise SAN units (even just 4 or 5 years old) kinda suck performance-wise. The specs are lies in the real world. Get a newer unit, newer drives, and newer connects and, just like with a server, you'll be shocked. What'cha know, those 4Gb cards are good for 4Gb after all!
Every year, there's a few changes and growth, just like in every other tech sector.
Re:Not quite (Score:3, Insightful)
Most enterprise level SSDs have BBWC [google.com] already for exactly that reason. On those systems fsync is a no-op. I for one am looking forward to SSDs in enterprise level applications; we could easily consolidate current database servers that are IOPS bottlenecked, with very low levels of CPU and non-caching memory utilization. BBWC solves the "oh, but we need to honour fsync" kind of problems. We're looking at a performance increase of 10-20x (IOPS) easily if >500G enterprise level SSDs become available for database servers. Even if prices/GB stay way above SAN prices, it's still more than worth it to switch.
Re:Not quite (Score:3, Insightful)
You can't turn fsync into a complete no-op just by putting a cache in the middle. An fsync call on the OS side that forces that write out to cache will block if the BBWC is full, for example, and if the underlying device can't write fast enough without its own cache being turned on you'll still be in trouble.
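A toy model of that "cache is full" case, not a description of any real controller. It assumes a fixed number of cache slots, a constant arrival rate and a constant drain rate per tick, and it does not carry blocked writes over to the next tick; all of the numbers are invented.

```python
def simulate(cache_slots, arrivals_per_tick, drain_per_tick, ticks):
    """Return, per tick, how many incoming writes found the cache full."""
    used = 0
    blocked = []
    for _ in range(ticks):
        # The underlying device empties some cache slots first.
        used = max(0, used - drain_per_tick)
        free = cache_slots - used
        # Only as many writes as there are free slots complete immediately.
        accepted = min(arrivals_per_tick, free)
        used += accepted
        blocked.append(arrivals_per_tick - accepted)
    return blocked


# Writes arrive faster than the device drains them: the cache absorbs the
# burst for a few ticks, then callers start waiting on the device.
print(simulate(cache_slots=10, arrivals_per_tick=4, drain_per_tick=1, ticks=6))

# Device keeps up: nothing ever blocks.
print(simulate(cache_slots=10, arrivals_per_tick=2, drain_per_tick=3, ticks=5))
```

In this model the cache hides the device's write speed only until it fills; after that, sustained throughput is set by the drain rate, which is the point about fsync not being free.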
While the cache in the middle will improve the situation by coalescing writes into the form the SSD can handle efficiently, the published SSD write IOPS numbers are still quite inflated relative to what you'll actually see. What I was trying to suggest is that the performance gap isn't nearly as large as TFA suggests once you start building real-world systems around them. After all, regular discs benefit from the write combining you get out of a BBWC too, since it lowers seeks, even more than the SSDs do.
The other funny thing you discover if you benchmark enough of these things is that a regular hard drive confined to only use as much space as an SSD provides is quite a bit faster too. When you limit a 500GB SATA drive to only use 64GB (a standard bit of short stroking [tomshardware.com]), there's a big improvement in sequential and seek speeds there. If you want to be fair, you should only compare your hard drive's IOPS when it's configured to only provide as much space as the SSD you're comparing against.