SSDs: The New King of the Data Center?
Nerval's Lobster writes "Flash storage is more common on mobile devices than data-center hardware, but that could soon change. The industry has seen increasing sales of solid-state drives (SSDs) as a replacement for traditional hard drives, according to IHS iSuppli Research. Nearly all of these have been sold for ultrabooks, laptops and other mobile devices that benefit from the combination of low power draw and high performance. Despite that, businesses have lagged the consumer market in adopting SSDs, largely due to the format's comparatively small capacities, high cost per gigabyte, and datacenter managers' concerns about long-term stability and comparatively high failure rates. But that's changing quickly, according to market researchers IDC and Gartner: datacenter- and enterprise-storage managers are buying SSDs in greater numbers for both server-attached storage and mainstream storage infrastructure, according to studies both research firms published in April. That doesn't mean SSDs will oust hard drives and replace them directly in existing systems, but it does raise a question: are SSDs mature enough (and cheap enough) to support business-sized workloads? Or are they still best suited for laptops and mobile devices?"
Silver Bullet (Score:5, Informative)
We have hundreds of SSDs in production servers. We couldn't survive without them. For heavy database workloads, they are the silver bullet for I/O problems, so much so that running a database on regular disks has become almost unimaginable. Why would you even try?
Single component failure not a big deal any more. (Score:5, Informative)
I think that the wide range adoption of server SSDs also shows how far server installations have progressed toward eliminating all single points of failure.
In the past, HA and 'five nines' were found only in a few niches, like telephone-company switches or banking big iron. Today they are common in cloud installations and most sizeable server setups. A single component failing will not stop your service.
If your business can absorb the extra cost of SSDs, a failing drive will not stop you, and the service will see great performance improvements anyway. The power savings may even make the SSDs not so costly after all.
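As a back-of-the-envelope check on that power argument, here is a sketch in Python. All the figures are illustrative assumptions, not measured values: the drive wattages, the electricity rate, and the PUE (the multiplier that accounts for cooling and facility overhead) are hypothetical.

```python
def annual_power_savings_usd(hdd_watts: float, ssd_watts: float,
                             usd_per_kwh: float, pue: float = 2.0) -> float:
    """Yearly electricity cost saved per drive replaced.

    PUE scales the per-drive savings to include cooling and other
    facility overhead that grows with IT power draw.
    """
    hours_per_year = 24 * 365
    kwh_saved = (hdd_watts - ssd_watts) / 1000 * hours_per_year
    return kwh_saved * usd_per_kwh * pue

# Hypothetical: a 10 W 15k rpm drive replaced by a 2 W SSD at $0.10/kWh,
# PUE 2.0 -> roughly $14 per drive per year.
savings = annual_power_savings_usd(10, 2, 0.10)
```

On electricity alone the payback is modest; the savings only look decisive once you also count the spindles an SSD lets you retire, as the array-consolidation comments below suggest.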
Re:Long-term, not short-term (Score:5, Informative)
We've been using SSDs in our servers since late 2008, starting with Fusion-io ioDrives and Intel drives since then - X25-E and X25-M, then 320, 520 and 710, and now planning to deploy a stack of S3700 and S3500 drives. Our main cluster of 10 servers has 24 SSDs each, we have another 40 drives on a dedicated search server, and smaller numbers elsewhere.
What we've found:
* Read performance is consistently brilliant. There's simply no going back.
* Random write performance on the 710 series is not great (compared to the SLC-based X25-E or ioDrives), and sustained random write performance on the mainstream drives isn't great either, but a single drive can still outperform a RAID-10 array of 15k rpm disks. The S3700 looks much better, but we haven't deployed them yet.
* SSDs can and do die without warning. One moment 100% good, next moment completely non-functional. Always use RAID if you love your data. (1, 10, 5, or 6, depending on your application.)
* Unlike disks, RAID-5 or 50 works pretty well for database workloads.
* We have noted the leading edge of the bathtub curve (infant mortality), but so far, no trailing edge as older drives start to wear out. Once in place, they just keep humming along.
* That said, we do match drives to workloads - SLC or enterprise MLC for random write loads (InnoDB, MongoDB) and MLC for sequential write/random read loads (TokuDB, CouchDB, Cassandra).
Re:Long-term, not short-term (Score:3, Informative)
If you use RAID-5 or RAID-6, you should match the RAID chunk (stripe-unit) size exactly to the SSD's internal write block size. If you do not, the array will generally issue two writes to each SSD for every logical write, which reduces both the drive's lifetime and its efficiency. Most RAID controllers have no way of doing this automatically, and the write block size is hard to discover: it is not generally part of the information the drive reports.
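The alignment penalty described above can be sketched as a toy model in Python. The block sizes here are illustrative, since, as noted, real SSDs rarely publish their internal write block size; the model simply charges a read-modify-write (two operations) for any misaligned tail.

```python
def nand_writes_per_chunk(raid_chunk_kib: int, ssd_write_block_kib: int) -> int:
    """Programs the SSD must issue for one RAID chunk-sized write.

    When the chunk is a whole multiple of the SSD's internal write
    block, each block is programmed once. A misaligned tail forces a
    read-modify-write cycle, roughly doubling the work for that block.
    """
    full_blocks, tail = divmod(raid_chunk_kib, ssd_write_block_kib)
    return full_blocks + (2 if tail else 0)

# Aligned: a 64 KiB chunk on an 8 KiB write block -> 8 programs.
# Misaligned: a 64 KiB chunk on a 24 KiB write block -> 2 full blocks
# plus a read-modify-write tail = 4 operations for 64 KiB of data.
```

The extra programs are pure write amplification: they consume erase cycles without storing any additional user data.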
Re:Great for some apps (see netflix blog) (Score:5, Informative)
How does that make sense?
As the link to Netflix pointed out -- they benchmarked the entire system with the same REST API in front.
They compared one cluster of SSD-based servers with another cluster of spinning-disk-plus-large-RAM servers. It took a cluster of 15 SSD-backed servers to match the throughput of 84 RAM-plus-spinning-disk servers. With throughput matched, the SSD-based cluster provided better latency at lower cost.
TL;DR: "Same Throughput, Lower Latency, Half Cost."
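Using the server counts quoted above, the consolidation math works out as follows. Only the 15 and 84 come from the Netflix post; the break-even reasoning is a sketch of my own.

```python
disk_servers = 84   # RAM + spinning-disk cluster (from the Netflix post)
ssd_servers = 15    # SSD-backed cluster matching its throughput

# 5.6x fewer machines; each SSD server could cost up to 5.6x a disk
# server before the total cost of the two clusters broke even.
consolidation = disk_servers / ssd_servers

# Netflix reported roughly half the total cost, which implies each
# SSD server cost on the order of 2.8x a disk server.
implied_per_server_multiple = consolidation * 0.5
```

So "half cost" is consistent with individual SSD machines being several times more expensive than the disk machines they replace; the consolidation factor does the work.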