Sun Adding Flash Storage to Most of Its Servers 113
BobB-nw writes "Sun will release a 32GB flash storage drive this year and make flash storage an option for nearly every server the vendor produces, Sun officials are announcing Wednesday. Like EMC, Sun is predicting big things for flash. While flash storage is far more expensive than disk on a per-gigabyte basis, Sun argues that flash is cheaper for high-performance applications that depend on high I/O operations per second (IOPS)."
We are going to have two layers of storage (Score:4, Interesting)
But most of what makes up the volume on current computers (log files, backups, video/audio) can be committed to a regular hard drive.
Re: (Score:2, Interesting)
Re:We are going to have two layers of storage (Score:5, Interesting)
I was thinking about this at Fry's the other day when trying to decide whether I could trust the replacement Seagate laptop drive similar to the one that crashed on me Sunday, and I concluded that the place I most want to see flash deployed is in laptops. Eventually, HDDs should be replaced with SSDs for obvious reliability reasons, particularly in laptops. However, in the short term, even just a few gigs of flash could dramatically improve hard drive reliability and battery life for a fairly negligible increase in the per-unit cost of the machines.
Basically, my idea is a lot like the Robson cache idea, but with a less absurd caching policy. Instead of uselessly making tasks like booting faster (I basically only boot after an OS update, and a stale boot cache won't help that any), the cache policy should be to try to make the hard drive spin less frequently and to provide protection of the most important data from drive failures. This means three things:
That last part is the best part. As data gets written to the hard drive, if the disk is not already spinning, the data would be written to the flash instead. On shutdown, the drive would spin up and the cache would be flushed to disk, to ensure that if you yank the drive out and put it into another machine, you don't get stale data. It would also be flushed whenever the disk has to spin up for some other activity (e.g. reading a block that isn't in the cache). The cache should also probably be flushed periodically (say once an hour) to minimize data loss in the event of a motherboard failure. If the computer crashes, the data would be flushed on the next boot. (Of course, this means that unless the computer had boot-firmware-level support for reading data through such a cache, the OS would presumably need to flush the cache and disable write caching while updating or reinstalling the OS, to avoid the risk of an unbootable system and/or data loss.)
As a result of such a design, the hard drive would rarely spin up except for reads, and any data frequently read would presumably come out of the in-kernel disk cache, so basically the hard drive should stay spun down until the user explicitly opened a file or launched a new application. This would eliminate the nearly constant spin-ups of the system drive resulting from relatively unimportant activity like registry/preference file writes, log data writes, etc. By being non-volatile, it would do so in a safe way.
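Roughly, the policy I'm describing looks like this (a toy sketch only; the class and method names are made up, and a real implementation would live down in the block layer):

    import time

    class FlashWriteCache:
        # Toy model of the flush policy described above.
        FLUSH_INTERVAL = 3600          # flush at least once an hour

        def __init__(self, disk):
            self.disk = disk           # assumed to offer .spinning, .spin_up(), .write()
            self.pending = {}          # block number -> data staged in flash
            self.last_flush = time.time()

        def write(self, block, data):
            if self.disk.spinning:
                self.flush()                   # disk already up: write through
                self.disk.write(block, data)
            else:
                self.pending[block] = data     # stage in flash, leave the disk spun down
            if time.time() - self.last_flush > self.FLUSH_INTERVAL:
                self.flush()

        def on_spin_up(self):
            self.flush()               # disk spun up for a read miss anyway: piggyback

        def on_shutdown(self):
            self.flush()               # so the drive is consistent if moved to another machine

        def flush(self):
            if self.pending and not self.disk.spinning:
                self.disk.spin_up()
            for block, data in self.pending.items():
                self.disk.write(block, data)
            self.pending.clear()
            self.last_flush = time.time()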
This is similar to what some vendors already do, I know, but integrating it with the OS's buffer cache to make the caching more intelligent and giving the user the ability to request backups of certain data seem like useful enhancements.
Thoughts? Besides wondering what kind of person thinks through this while staring at a wall of hard drives at Fry's? :-)
Re: (Score:2)
Re:We are going to have two layers of storage (Score:5, Informative)
Five years ago, I would have agreed. These days, some of the better flash parts are rated as high as a million write cycles. If we're talking about 4 GB of flash, a million write cycles on every block would take a decade of continuous writes at 10 megabytes per second. Real-world workflows obviously won't hit the cache nearly that hard unless your OS has a completely worthless RAM-based write caching algorithm.... Odds are, the computer will wear out and be replaced long before the flash fails. That said, in the event of a flash write failure, you can always spin up the drive and do things the old-fashioned way. And, of course, assuming you put this on a card inside the machine, if it does fail, you wouldn't have to replace the whole motherboard to fix the problem.
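To put numbers on that (assuming perfectly even wear across all 4 GB, which is idealized):

    capacity = 4e9        # 4 GB of flash, in bytes
    cycles   = 1e6        # rated write cycles per block
    rate     = 10e6       # 10 MB/s of continuous writes
    seconds  = capacity * cycles / rate
    print(seconds / (3600 * 24 * 365))   # ~12.7 years of nonstop writing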
That said, to reduce thrashing of the write cache, it might be a good idea to add a cap of a meg or two and spin up the hard drive asynchronously once the write cache size exceeds that limit. Continue writing to the flash to avoid causing user delays while the HD spins up (huge perceived user performance win there, too) and flush once the drive is up to speed.
You could also do smart caching of ephemeral data (e.g. anything in /tmp, /var/tmp, etc.). Instead of flushing changes to those files to disk on close, wait to flush them until there's no room for them in the RAM buffer cache, and then flush them to the flash. After all, those directories get wiped on reboot anyway, so if the computer crashes, there's no advantage to having flushed anything in those directories to disk....
BTW, in the last week, I've lost two hard drives, both less than a year old. I'm not too impressed with the write lifetimes of Winchester disk mechanisms. :-)
Re: (Score:1)
Adding another layer of caching via flash just means we'll have more things to go wrong.
Re:We are going to have two layers of storage (Score:5, Insightful)
Because write caches in RAM go away when your computer crashes, the power fails, etc. Battery-backed RAM is an option, but is a lot harder to get right than a USB flash part connected to an internal USB connector on a motherboard.... In-memory write caching (without battery backup) for more than a handful of seconds (to avoid writing files that are created and immediately deleted) is a very, very bad idea. There's a reason that no OS keeps data in a write cache for more than about 30 seconds (and even that is about five times too long, IMHO).
Write caching is the only way you can avoid constantly spinning up the disk. We already have lots of read caching, so no amount of improvement to read caching is likely to improve things that dramatically over what we have already.
Even for read caching, however, there are advantages to having hot block caches that are persistent across reboots, power failures, crashes, etc. (provided that your filesystem format provides a last modified date at the volume level so you can dispose of any read caches if someone pulls the drive, modifies it with a different computer, and puts the drive back). Think of it as basically prewarming the in-memory cache, but without the performance impact....
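A minimal sketch of that validity check, assuming the filesystem really does expose a volume-level last-modified stamp (all names here are hypothetical):

    from dataclasses import dataclass, field

    @dataclass
    class HotBlockCache:
        volume_mtime: float = 0.0
        blocks: dict = field(default_factory=dict)   # block number -> cached data

    def prewarm(cache: HotBlockCache, volume_last_modified: float) -> dict:
        # Reuse the persistent cache only if nobody modified the volume elsewhere
        # since the cache was last written; otherwise start cold.
        if cache.volume_mtime == volume_last_modified:
            return cache.blocks
        cache.blocks.clear()
        cache.volume_mtime = volume_last_modified
        return cache.blocks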
Re: (Score:2)
Re: (Score:2)
I've never had one die on me yet.
They've worked fabulously until now and there is no reason why they will die overnight.
Price, raw speed (as opposed to seek) and size are the main factors.
Your caching ideas sound awfully like what Linux does out of the box with RAM.
And you want to *reduce* writes to SSDs, not increase them.
Re: (Score:2)
So I wouldn't exactly call them reliable. I don't store anything important on just a single HD any more. But I don't think I'd do that with a SSD either.
Hard drives just ain't reliable (Score:2)
Hard drives are hardly unreliable.
I've never had one die on me yet.
You're either incredibly lucky, haven't owned many hard drives, or lying. I'll presume you're honest (despite your handle) and the other two are equally likely in my opinion. Personally I've had at least 15 hard drives crap out on me over the years, not counting those of friends, family and coworkers, which sends the number well into triple digits. And those are just the ones I've seen first hand. Add in the ones I know about at companies I've worked for and the number is easily in the thousands. Hard drive
Re: (Score:2)
I also admit that I use RAID 5 on my home server (my main data store) but not on my other computers.
How the hell do you get power surges in a computer?
You must be using some pretty dodgy power supplies.
They should die, absorbing the surge rather than letting it through.
Writes to a hard drive don't matter in the slightest.
They are the same as reads in terms of wear and tear.
If you write to SSDs a lot then you're looking at it having a ve
Re: (Score:2)
I also admit that I use RAID 5 on my home server (my main data store) but not on my other computers.
Actually I use RAID 0 on one of my servers. Still lost two hard drives on it.
How the hell do you get power surges in a computer?
Power supplies do not filter out all surges, and by design they can't help with power dips. Flip your box on/off 20 times in under 5 seconds and you'll likely have some dead equipment. Have a lightning strike in your vicinity and you'll likely have some fried equipment regardless of your power supply make. Recently I had a loose neutral wire on the main in my house which made voltages swing by + or - 40 volts. Not good for t
Re: (Score:2)
Re: (Score:2)
You have something seriously wrong in that case.
Even bare-bones UPSes won't allow the voltage to fluctuate.
If mine detects anything astray it will 'rebuild' the power completely and give my computer *exactly* 240v.
Re: (Score:2)
Erm. You have a UPS on your computer and it still has dodgy power?
You have something seriously wrong in that case.
No, that's why I don't have hard drives dying from surges anymore. I learned my lesson long ago after I fried one too many bits of electronics. I've lived in too many areas with dodgy power to trust what comes out of the wall anymore. Now my main problems are static and shock from the odd bit of dropped equipment.
:-)
Though I did once have a UPS and the attached computers fry due to a nearby lightning strike. Get 1.21 gigawatts across a UPS and it doesn't matter what you've got protecting it.
Even bare-bones UPSes won't allow the voltage to fluctuate.
That's not
Re: (Score:2)
Re: (Score:2)
the cache policy should be to try to make the hard drive spin less frequently
Re: (Score:2)
Yes and no. You're right that spinning up and down causes more mechanical wear on the spindle motor. However, leaving drives in laptops running continuously is also bad. Hard drives don't like heat, and laptop enclosures are not designed to dissipate heat from the drive. They basically have zero airflow across the drive, so the top of the drive enclosure and the case develop this layer of heated air that further insulates the drive from dissipating heat.
Further, AFAIK, spindle motors haven't been the
Re: (Score:2)
Yes, IDE flash drives are pretty expensive, but you can get a 32GB CF card and a CF-IDE adapter for around $150 last time I checked. Supposedly the tech that allows for 32GB CF also makes 64GB possible, which is the sweet spot for me on a laptop, but I don't seem to be
Re: (Score:2)
Last I checked, flash write performance (at least for CF and USB stuff) still left something to be desired. A laptop hard drive still exceeds the speed of flash when writing by a factor of about 2-3 or thereabouts. The read performance is almost even for 5400 RPM laptop drives, and within a factor of 1.5 of even 7200 RPM drives, so that's more livable. You probably wouldn't want to capture audio or video to a flash drive or do other tasks that involve continuous high speed writes, at least for now.
Comp
Sun is Afraid of THIS! (Score:3, Interesting)
The perfect thing for serving media, where you don't need a few GB per customer, you need the same few GB served o
Re: (Score:1)
x500.
Re:We are going to have two layers of storage (Score:5, Informative)
Also, throw in wear leveling and spare sectors. A million writes to a filesystem sector doesn't mean a million writes to a particular physical sector (it could be 1,000 writes each to 1,000 different sectors), and when a sector does wear out, it simply gets taken out of service and replaced with a spare one. The same principle is used in mechanical hard drives: if a sector is problematic to read from or write to, it gets marked as bad and the filesystem sector is remapped somewhere else.
SSDs could quite likely last longer than mechanical hard drives in this regard.
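To illustrate the remapping idea (a toy model; real flash translation layers also have to track free vs. live blocks and garbage-collect):

    class WearLeveler:
        # Toy flash translation layer: spread writes across physical blocks and
        # retire any block that reaches its rated erase/write cycle count.
        def __init__(self, n_physical_blocks, max_cycles=100_000):
            self.max_cycles = max_cycles
            self.erase_counts = [0] * n_physical_blocks
            self.retired = set()
            self.mapping = {}                 # logical block -> physical block

        def write(self, logical_block):
            # always relocate to the least-worn in-service block (grossly simplified)
            candidates = [p for p in range(len(self.erase_counts)) if p not in self.retired]
            target = min(candidates, key=lambda p: self.erase_counts[p])
            self.erase_counts[target] += 1
            if self.erase_counts[target] >= self.max_cycles:
                self.retired.add(target)      # worn out: drop it, like remapping a bad sector
            self.mapping[logical_block] = target
            return target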
Re: (Score:1, Interesting)
Re: (Score:2)
In order to do wear leveling you have to have additional metadata, which will take up additional space on disk. With SSD you pay a premium per byte over a magnetic disk. Do you really want a file system that's not going to make the best use of the space you just bought for an arm and a leg?
Re: (Score:2, Insightful)
So, as density of flash goes up, write cycle lifetime potentially goes down.
HDDs have the same issue of bits being less "durable" as capacity goes up. However, the media never wears out for HDD. Furthermore, it is already accepted that there will be many bit errors and these are simply corrected with error correction codes and mapping out bad sectors.
As far as reliability goes, everybody t
Re: (Score:1)
I was sure that figure was upwards of a million cycles per sector in modern flash chips.
I keep seeing this figure of a million cycles per sector but have yet to see it on a datasheet.
To paraphrase the above response:
"No marketing or sales executive will ever countenance the downward estimation of erase/write cycles per sector. If the chip can survive one million cycles per sector, then by God it's going to say so on the sales info."
Last time I checked no one was listing the lifetime of their chip at more than 100k cycles per sector.
What manufacturer is quoting anything near one million erase/write cycles per sector?
Re:We are going to have two layers of storage (Score:5, Funny)
Re:We are going to have two layers of storage (Score:4, Interesting)
In addition, assuming that 90% of RAM-drive accesses go to 10% of the storage, you can see that you're effectively burning a lot of energy for zero gain. Multiply by uptime.
Flash has the potential of greatly improving performance/watt for most servers.
Re: (Score:3, Insightful)
Re: (Score:2)
Random access, relational, transactional, adjective-al databases.
Random access time is (evidently) much better on flash. That's how Vista ReadyBoost works - there's a performance boost (a tiny one) if you let it put the non-sequential parts of the swap file onto a flash key.
I imagine that you could increase performance for some types of databases by running on a solid-state drive.
Re:We are going to have two layers of storage (Score:5, Informative)
SSDs really shine for OLTP databases. Lots of random IO occurs on these databases (as opposed to data warehouses that use lots of sequential IO).
Normal hard drives are horrible for random IO because of mechanical limitations. Think about trying to switch tracks on a record player thousands of times per second; that's what's happening inside a hard drive under a random IO load. It's amazing mechanical HDDs work as well as they do.
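Rough numbers behind that, using typical figures rather than any particular drive's spec sheet:

    seek_ms, rotation_ms = 3.5, 2.0       # ~average seek + rotational latency, 15,000 RPM class
    print(round(1000 / (seek_ms + rotation_ms)))   # ~180 random IOPS per spindle
    # even modest SSDs manage thousands of random reads per second, with no seek at all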
Re: (Score:3, Interesting)
Also, DRAM burns ~8W/GB (more for FB-DIMMs), while flash burns only ~0.01W/GB. Thus swapping inactive pages to flash allows you to use your DRAM more effectively, improving your performance/W.
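Plugging those per-GB figures into, say, 64 GB of mostly idle pages (the capacity is just an example):

    dram_w_per_gb, flash_w_per_gb, gigabytes = 8.0, 0.01, 64
    print(gigabytes * dram_w_per_gb)    # ~512 W to keep it resident in DRAM
    print(gigabytes * flash_w_per_gb)   # ~0.64 W to keep it in flash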
From a different perspective: you have a datacenter and you are energy c
Re: (Score:2)
Re: (Score:1)
Also, for the OS and programs, the limited number of writes to the device is a non-issue.
Two layers, but not those ones (Score:5, Insightful)
High frequency, low volume operations - metadata journalling, certain database transactions - will go to flash, and low frequency, high volume operations - file transfers, bulk data moves - will go to regular hard drives. SSDs aren't yet all that much faster for bulk data moving, so it makes the most economic sense to put them where they're most needed: Where the IOPs are.
Back in the day, a single high-performance SCSI drive would sometimes play the same role alongside a big, cheap, slow array. Then, as now, you'd pay the premium price for the smallest amount of high-IOPS storage that you could get away with.
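In pseudocode, the placement rule amounts to something like this (the cutoffs are invented for illustration):

    def place_on_tier(request_size_bytes: int, expected_iops: int) -> str:
        # Small, hot, random IO goes to the small, pricey flash tier;
        # big sequential streams go to cheap spinning disks.
        SMALL_IO = 64 * 1024      # hypothetical size cutoff
        HOT = 500                 # hypothetical IOPS threshold
        if request_size_bytes <= SMALL_IO and expected_iops >= HOT:
            return "ssd"          # metadata journal, OLTP-style transactions
        return "hdd"              # file transfers, bulk data moves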
Re: (Score:2)
Registers, on chip cache, off chip cache, Ram, Flash, HD, LongTerm Low Volatility (tape, CD/DVD etc)
Re: (Score:2)
Write cycles. again. (Score:4, Insightful)
Re:Write cycles. again. (Score:4, Insightful)
Then, in a fit of wisdom, a few posters, all of whom will be modded down as flamebait, will say "There's room for both and price/performance does matter, at least for now."
Re: (Score:2, Insightful)
Re: (Score:2)
This is no longer an issue n00b.
MOD DOWN (Score:1, Troll)
*Search for "mutual funding money" AND "Samuel L. Jackson" - quite possibly the best dub to "clean up" some "dirty language"
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
Good (Score:3, Insightful)
I, for one, welcome our new flash disk overlords
Re:Samsung 256GB Flash Drive (Score:5, Interesting)
Really? What other top-5 computer manufacturer has been putting flash drives in SERVERS? I've seen a few laptops, but I haven't seen any used in servers or storage systems. (EMC and a few others have announced plans to do it, but haven't released anything AFAIK.)
Also, their "thumper" server has 48 drives in it. Would you want to pay around $1000 per drive to fill that up?
Re: (Score:2, Informative)
Re: (Score:1, Informative)
Re:Samsung 256GB Flash Drive (Score:4, Insightful)
Just because you don't want to, doesn't mean everyone else doesn't want to also.
Re: (Score:2)
Expensive machines (Score:2)
Re: (Score:1, Interesting)
This problem hasn't been solved by the drive manufacturers, although their marketing departments have convinced many!
what drives are for. (Score:3, Insightful)
Re: (Score:1)
Re: (Score:3, Insightful)
You can go either way with it, but I think faster (and smaller) drives are more attractive than bigger and slower.
You need to compete against the sequential speed of a 15,000 rpm SCSI drive too (SSD will beat them dead on access speed, but not all workloads are small random reads)
big deal (Score:4, Funny)
Re:big deal (Score:5, Funny)
I can usually read into the comment if someone is joking or not... but this one... I dunno... Could go either way....
Re: (Score:1, Offtopic)
Re: (Score:2)
+1 point for dissing Microsoft.
+1 point for being honest.
Re: (Score:2)
Re: (Score:2)
Good news for us (Score:1)
Re: (Score:1)
You'll know they're becoming mainstream when, two weeks from now, you start getting mails from contract house recruiters perusing your resume on Monster looking for "SSD Storage Engineers" with "10 or more years of experience on Sun equi
This will even further ZFS (Score:5, Interesting)
Re: (Score:2, Interesting)
Current versions of ZFS have a feature where the ZIL (ZFS Intent Log) can be separated out of a pool's data devices and onto its own disk. Generally, you'd want that disk to be as fast as possible, and these SSDs will be the winner in that respect. Can't wait!
As far as I know, contiguous writing of large chunks of data is slower for flash drives than for plain HDDs. I'm guessing the ZIL is some kind of transactional journal log, where all disk writes go before they hit the main storage section of the filesystem? I don't think you'd get much of a speed bonus. SSDs are only really good for random-access reads like OLTP databases.
Re:This will even further ZFS (Score:4, Interesting)
I have been actively researching a vendor who will supply this type of device. Currently we're testing with Gigabyte i-Ram cards, connected in through a separate SATA interface. (Note: the Gigabyte cards are battery-backed SDRAM
Fusion-IO is a vendor who is making a board for Linux - but as near as I can tell the cards aren't available yet, and when they are - they won't work with Solaris anyway!
The product which Neil Perrin did his testing with (umem/micromemory) with their 5425CN card doesn't work with current builds of Solaris. Umem is also a pain to work with
I hope Sun lets me buy these cards separately for my HP ProLiant servers. Of course if they didn't, this is one thing that might make me consider switching to Sun hardware! (Hey HP/Dell - are you reading this??)
I'm surprised that it is big enough to talk about. (Score:5, Interesting)
If Sun expects to sell a decent number of flash disks, or is looking at making changes to their systems based on the expectation that flash disks will be used, then it is interesting news; but otherwise it just isn't all that dramatic. While flash and HDDs are very different in technical terms, the present incarnations of both technologies are virtually identical from a system integration perspective. This sort of announcement just doesn't mean much at all without some idea of expected volume.
Re:I'm surprised that it is big enough to talk abo (Score:5, Insightful)
Re: (Score:3, Funny)
Re: (Score:1, Funny)
Wha??!?!? (Score:1, Insightful)
(notice GB, gigabytes, vs. Gb, gigabits)
Yeah.. I know I could buy a 4Gb/s FC RAMSAN unit -- anybody got $50k lying around
Re:I'm surprised that it is big enough to talk abo (Score:5, Informative)
You have no idea what you are talking about. Sun customers demand that the product Sun sells them have known reliability properties and that Sun guarantees their products properly interact with each other. It takes a significant amount of resources to do this validation. At the same time SSDs and HDDs react very differently to load and can have all sorts of side effects if the OS/application is not prepared to deal with them.
Re:I'm surprised that it is big enough to talk abo (Score:1)
Re: (Score:3, Informative)
IOPS (Score:5, Informative)
From what I've been able to discern this is actually true only in read-mostly applications and applications where writes are already in neat multiples of the flash erase block size.
If you're doing random small writes your performance is likely to be miserable, because you'll need to erase blocks of flash much larger than the data actually being changed, then rewrite the block with the changed data.
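A quick worked example of that penalty; the erase-block size here is an assumption, and real parts vary:

    write_size  = 4 * 1024        # a 4 KB random write from the application
    erase_block = 128 * 1024      # assumed flash erase-block size
    # naive controller: read the whole erase block, erase it, rewrite it with the change
    print(erase_block / write_size)   # 32x more flash programmed/erased than was asked for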
Some apps, like databases, might not care about this if you're able to get their page size to match or exceed that of the underlying storage medium. Whether or not this is possible depends on the database.
For some other uses a log-oriented file system might help, but those have their own issues.
In general, though, flash storage currently only seems to be exciting for random, read-mostly applications, which get a revolting performance boost so long as the blocks being read are small enough and scattered enough. For larger contiguous reads, hard disks still leave flash in the dust because of their vastly superior raw throughput.
Vendors, however, make a much larger margin on flash disk sales.
This article (PDF) may be of interest:
Understanding Flash SSD performance [managedflash.com]
(google text version) [209.85.173.104].
Re: (Score:1)
Power consumption and heat dissipation (Score:4, Interesting)
http://www.stec-inc.com/green/storage_casestudy.php [stec-inc.com]
http://www.stec-inc.com/green/green_ssdsavings.php [stec-inc.com] (You have to request the whitepaper to see this one.)
RAID 4, anyone? (Score:4, Insightful)
This has benefits for workloads that issue many small randomly located reads and writes: if the requested data size is smaller than the block size, a single disk can service the request. The other disks can independently service other requests, leading to much higher random access bandwidth (though it doesn't help latency).
One of the side effects of this is that the parity disk must be much faster than the data disks, since it must service every write to provide the parity info. Here SSD shines, with its quick random access times, despite its poor sequential performance. Interesting, no?
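The reason the parity disk sees every small write is the parity read-modify-write; a sketch, assuming ordinary XOR parity:

    def raid4_small_write(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
        # Update parity for a one-block write without touching the other data disks:
        # new_parity = old_parity XOR old_data XOR new_data
        return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))

    # every data-disk write lands on the parity device, so it needs the best random IO
    print(raid4_small_write(b"\x00" * 4, b"\x0f" * 4, b"\xff" * 4).hex())   # f0f0f0f0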
Cheaper than RAM? (Score:4, Insightful)
Okay, you would still need a HDD for backing store, but in many server applications involving databases (high performance dynamic web servers for example) a normal RAID can cope with the writes - it's the random reads accessing the DB that cause the bottleneck. Having 200GB of database in RAM with HDDs for backing store would surely be higher performance than SSD.
For things where writes matter like financial transactions, would you want to rely on SSD anyway? Presumably banks have lots of redundancy and multiple storage/backup devices anyway, meaning each transaction is limited by the speed of the slowest storage device.
Re: (Score:2)
I don't know that you'd really need a lot of flash memory for this, maybe only a few meg allowing for spacing out the writes to avoid wear, but flash could allow you to do write caching when you normally wouldn't trust it, because it won't go away if you lose power.
IBM (Score:1)
Re: (Score:1, Funny)
Re: (Score:1, Redundant)
Re: (Score:2, Insightful)
Re:Lifespan? (Score:5, Informative)
Re: (Score:2, Insightful)
The fact that flash is only really well suited for infrequent writes and frequent non-contiguous reads doesn't bode well for its utility in OLTP applications.
Re: (Score:2)
Re: (Score:1, Funny)