Sun Adding Flash Storage to Most of Its Servers

Posted by timothy on Wed Jun 04, 2008 12:22 PM
from the add-it-to-everything-please dept.
BobB-nw writes "Sun will release a 32GB flash storage drive this year and make flash storage an option for nearly every server the vendor produces, Sun officials are announcing Wednesday. Like EMC, Sun is predicting big things for flash. While flash storage is far more expensive than disk on a per-gigabyte basis, Sun argues that flash is cheaper for high-performance applications that rely on fast I/O Operations Per Second speeds."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by javilon (99157) on Wednesday June 04 2008, @12:27PM (#23655425) Homepage
    I would put the operating systems, binaries and configuration files on the SSD.

    But most of what makes up the volume on current computers (log files, backups, video/audio) can be committed to a regular hard drive.
    • Re: (Score:2, Interesting)

      That definitely makes sense. It is a very expensive idea to make EVERYTHING SSD, and it doesn't really make sense, because except at the local level it wouldn't make a huge difference anyway, due to network bandwidth limits.
      • by dgatwood (11270) on Wednesday June 04 2008, @01:32PM (#23656481) Journal

        I was thinking about this at Fry's the other day when trying to decide whether I could trust the replacement Seagate laptop drive similar to the one that crashed on me Sunday, and I concluded that the place I most want to see flash deployed is in laptops. Eventually, HDDs should be replaced with SSDs for obvious reliability reasons, particularly in laptops. However, in the short term, even just a few gigs of flash could dramatically improve hard drive reliability and battery life for a fairly negligible increase in the per-unit cost of the machines.

        Basically, my idea is a lot like the Robson cache idea, but with a less absurd caching policy. Instead of uselessly making tasks like booting faster (I basically only boot after an OS update, and a stale boot cache won't help that any), the cache policy should be to try to make the hard drive spin less frequently and to provide protection of the most important data from drive failures. This means three things:

        1. A handful of frequently used applications should be cached. The user should be able to choose apps to be cached, and any changes to the app should automatically write through the cache to the disk so that the apps are always identical in cache and on disk.
        2. The most important user data should be stored there. The user should have control over which files get automatically backed up whenever they are modified. Basically a Time Machine Lite so you can have access to several previous versions of selected critical files even while on the go. The OS could also provide an emergency boot tool on the install CD to copy files out of the cache to another disk in case of a hard drive crash.
        3. The remainder of the disk space should be used for a sparse disk image as a write cache for the hard drive, with automatic hot files caching and (to the maximum extent practical) caching of any catalog tree data that gets kicked out of the kernel's in-memory cache.

        That last part is the best part. As data gets written to the hard drive, if the disk is not already spinning, the data would be written to the flash. The drive would spin up and get flushed to disk on shutdown to ensure that if you yank the drive out and put it into another machine, you don't get stale data. It would also be flushed whenever the disk has to spin up for some other activity (e.g. reading a block that isn't in the cache). The cache should also probably be flushed periodically (say once an hour) to minimize data loss in the event of a motherboard failure. If the computer crashes, the data would be flushed on the next boot. (Of course this means that unless the computer had boot-firmware-level support for reading data through such a cache, the OS would presumably need to flush the cache and disable write caching while updating or reinstalling the OS to avoid the risk of an unbootable system and/or data loss.)

        As a result of such a design, the hard drive would rarely spin up except for reads, and any data frequently read would presumably come out of the in-kernel disk cache, so basically the hard drive should stay spun down until the user explicitly opened a file or launched a new application. This would eliminate the nearly constant spin-ups of the system drive resulting from relatively unimportant activity like registry/preference file writes, log data writes, etc. By being non-volatile, it would do so in a safe way.
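
The policy above can be sketched as a toy simulation. This is only a minimal sketch under stated assumptions, not a real driver: the `FlashWriteCache` class, its method names, and the dictionaries standing in for the disk and the flash are all hypothetical, and the hourly and shutdown flushes are represented simply by calling `flush()`.

```python
class FlashWriteCache:
    """Toy model of the proposed policy: absorb writes in flash while the
    hard drive is spun down, and flush whenever the drive spins up anyway.
    All names are hypothetical; nothing here models a real device."""

    def __init__(self):
        self.disk = {}        # block number -> data on the hard drive
        self.flash = {}       # block number -> dirty data parked in flash
        self.spinning = False
        self.spin_ups = 0     # how many times the drive had to be woken

    def _spin_up(self):
        if not self.spinning:
            self.spinning = True
            self.spin_ups += 1

    def spin_down(self):
        self.spinning = False

    def flush(self):
        """Spin up if needed and copy every dirty block to the disk."""
        if self.flash:
            self._spin_up()
            self.disk.update(self.flash)
            self.flash.clear()

    def write(self, block, data):
        if self.spinning:
            self.disk[block] = data   # drive is already up: write through
        else:
            self.flash[block] = data  # drive is asleep: park it in flash

    def read(self, block):
        if block in self.flash:       # the newest copy lives in flash
            return self.flash[block]
        self._spin_up()               # a miss forces a spin-up...
        self.flush()                  # ...so flush while the disk is awake
        return self.disk.get(block)


cache = FlashWriteCache()
for block in range(100):
    cache.write(block, b"log line")   # e.g. log or preference file writes
print("spin-ups after 100 writes:", cache.spin_ups)
cache.read(500)                       # a block that is not cached
print("spin-ups after one read miss:", cache.spin_ups)
```

In this model the hundred small writes never wake the drive; only the read miss does, and that single spin-up also drains the flash.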

        This is similar to what some vendors already do, I know, but integrating it with the OS's buffer cache to make the caching more intelligent and giving the user the ability to request backups of certain data seem like useful enhancements.

        Thoughts? Besides wondering what kind of person thinks through this while staring at a wall of hard drives at Fry's? :-)

        • I disagree that these disks should be used as a write cache. Frequent, incremental modifications to files are exactly what you DON'T want to use flash/SSD for, since they will wear out larger disk "blocks" faster than regular hard-disk writing. If you're not going to take advantage of HDD technology's superior write lifetime, you might as well not have one at all.
          • by dgatwood (11270) on Wednesday June 04 2008, @04:26PM (#23659345) Journal

            Five years ago, I would have agreed. These days, some of the better flash parts are rated as high as a million write cycles. If we're talking about 4 GB of flash, a million write cycles on every block would take a decade of continuous writes at 10 megabytes per second. Real-world workflows obviously won't hit the cache nearly that hard unless your OS has a completely worthless RAM-based write caching algorithm.... Odds are, the computer will wear out and be replaced long before the flash fails. That said, in the event of a flash write failure, you can always spin up the drive and do things the old-fashioned way. And, of course, assuming you put this on a card inside the machine, if it does fail, you wouldn't have to replace the whole motherboard to fix the problem.
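
The "decade of continuous writes" estimate can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 GB = 10^9 bytes) and perfectly even wear across all blocks; the function name is made up for illustration.

```python
def seconds_to_wear_out(capacity_bytes, write_cycles, write_rate_bytes_per_s):
    """Time needed to use up every block's rated cycles, assuming the
    writes are spread perfectly evenly over the whole device."""
    total_writable_bytes = capacity_bytes * write_cycles
    return total_writable_bytes / write_rate_bytes_per_s


SECONDS_PER_YEAR = 365 * 24 * 3600

years = seconds_to_wear_out(
    capacity_bytes=4 * 10**9,            # 4 GB of flash
    write_cycles=10**6,                  # one million cycles per block
    write_rate_bytes_per_s=10 * 10**6,   # 10 MB/s, sustained
) / SECONDS_PER_YEAR

print(f"{years:.1f} years of continuous writing")
```

Under those assumptions the result is a little under 13 years, which is consistent with the "decade" figure.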

            That said, to reduce thrashing of the write cache, it might be a good idea to add a cap of a meg or two and spin up the hard drive asynchronously once the write cache size exceeds that limit. Continue writing to the flash to avoid causing user delays while the HD spins up (huge perceived user performance win there, too) and flush once the drive is up to speed.

            You could also do smart caching of ephemeral data (e.g. anything in /tmp, /var/tmp, etc.). Instead of flushing changes to those files to disk on close, wait to flush them until there's no room for them in the RAM buffer cache, and then flush them to the flash. After all, those directories get wiped on reboot anyway, so if the computer crashes, there's no advantage to having flushed anything in those directories to disk....

            BTW, in the last week, I've lost two hard drives, both less than a year old. I'm not too impressed with the write lifetimes of Winchester disk mechanisms. :-)

          • by dgatwood (11270) on Wednesday June 04 2008, @04:39PM (#23659553) Journal

            Because write caches in RAM go away when your computer crashes, the power fails, etc. Battery-backed RAM is an option, but is a lot harder to get right than a USB flash part connected to an internal USB connector on a motherboard.... In-memory write caching (without battery backup) for more than a handful of seconds (to avoid writing files that are created and immediately deleted) is a very, very bad idea. There's a reason that no OS keeps data in a write cache for more than about 30 seconds (and even that is about five times too long, IMHO).

            Write caching is the only way you can avoid constantly spinning up the disk. We already have lots of read caching, so no amount of improvement to read caching is likely to improve things that dramatically over what we have already.

            Even for read caching, however, there are advantages to having hot block caches that are persistent across reboots, power failures, crashes, etc. (provided that your filesystem format provides a last modified date at the volume level so you can dispose of any read caches if someone pulls the drive, modifies it with a different computer, and puts the drive back). Think of it as basically prewarming the in-memory cache, but without the performance impact....

      • A single ioFusion [tgdaily.com] card has the concurrent data serving ability of a 1U server cabinet full of media servers. They do this by having 160 channels on a drive controller that also incorporates flash memory. Since each channel is a few orders of magnitude faster than a mechanical hard drive, one card can handle a flurry of concurrent random access requests as fast as 1000 conventional hard drives.

        The perfect thing for serving media, where you don't need a few GB per customer, you need the same few GB served o
    • Re: (Score:3, Insightful)

      Why would you bother putting the programs and operating system on SSD for a server? Once the files are loaded into memory, you'll never need to access them again. SSD only helps with OS and Programs when you are booting up, or opening new programs. This almost never happens on most servers.
      • Random access, relational, transactional, adjective-al databases.

        Random access time is (evidently) much better on flash. That's how Vista ReadyBoost works - there's a performance boost (a tiny one) if you let it put the non-sequential parts of the swap file onto a flash key.

        I imagine that you could increase performance for some types of databases by running on a solid-state drive.

        • by BlendieOfIndie (1185569) on Wednesday June 04 2008, @01:30PM (#23656441)
          It sounds like the SSDs are internal drives for the server. A database would never be stored on an internal hard drive. Almost any commercial database is connected to a disk farm through SAN fabric.

          SSDs really shine for OLTP databases. Lots of random IO occurs on these databases (as opposed to data warehouses that use lots of sequential IO).

          Normal hard drives are horrible for random IO because of mechanical limitations. Think about trying to switch tracks on a record player thousands of times per second; this is what's happening inside a hard drive (under a random IO load). It's amazing mechanical HDDs work as well as they do.
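
A back-of-the-envelope estimate shows how hard the mechanical limit bites. The sketch below assumes each random request pays one average seek plus half a platter rotation; the seek times are assumed, illustrative values and do not come from the thread or from any particular drive's datasheet.

```python
def hdd_random_iops(rpm, avg_seek_ms):
    """Rough random-I/O ceiling for a single spindle: every request pays
    an average seek plus, on average, half a platter rotation."""
    ms_per_rotation = 60_000 / rpm
    avg_rotational_latency_ms = ms_per_rotation / 2
    return 1000 / (avg_seek_ms + avg_rotational_latency_ms)


# Seek times here are assumptions chosen only for illustration.
for rpm, seek_ms in [(7_200, 8.5), (15_000, 3.5)]:
    print(f"{rpm} rpm: ~{hdd_random_iops(rpm, seek_ms):.0f} random IOPS")
```

With these inputs a single spindle manages on the order of a hundred or two random operations per second, which is why random-heavy workloads end up spread across many drives.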

      • Re: (Score:3, Interesting)

        Ummm, most programs are not completely loaded into memory and inactive pages do get swapped out in favor of active pages. While the most active regions of a program are in memory most of the time, having the whole program in memory is not the general case.

        Also, DRAM burns ~8W/GB (more if FB-DIMMS), Flash burns only 0.01W/GB. Thus swapping inactive pages to Flash allows you to use your DRAM more effectively, improving your performance/W.

        From a different perspective: you have a datacenter and you are energy c
    • by clawsoon (748629) on Wednesday June 04 2008, @12:47PM (#23655771)
      We are going to have two layers, but they'll be deeper in the filesystem than that.

      High frequency, low volume operations - metadata journalling, certain database transactions - will go to flash, and low frequency, high volume operations - file transfers, bulk data moves - will go to regular hard drives. SSDs aren't yet all that much faster for bulk data moving, so it makes the most economic sense to put them where they're most needed: Where the IOPs are.

      Back in the day, a single high-performance SCSI drive would sometimes play the same role for a big, cheap, slow array. Then, as now, you'd pay the premium price for the smallest amount of high-IOPs storage that you could get away with.

    • We already have more than two layers.

      Registers, on-chip cache, off-chip cache, RAM, flash, HD, long-term low-volatility storage (tape, CD/DVD, etc.)

      • by Anonymous Coward on Wednesday June 04 2008, @12:54PM (#23655869)

        when you have servers that stays up for months
        I'm a Windows admin, you insensitive clod!
      • by boner (27505) on Wednesday June 04 2008, @01:15PM (#23656151)
        A RAM drive uses DRAM. Enterprise-class DRAM costs ~$100/GB and uses ca. 8W/GB; enterprise flash costs ~$30-80/GB and uses 0.01W/GB.

        In addition, if you assume that 90% of RAM-drive accesses go to 10% of the storage, you can see that you are effectively burning a lot of energy for zero gain. Multiply by uptime.

        Flash has the potential of greatly improving performance/watt for most servers.
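
To put numbers on that, here is a small sketch using the per-gigabyte power figures quoted in this comment. The 100 GB working set is a hypothetical size picked for illustration, and the model ignores controller and idle overheads.

```python
DRAM_W_PER_GB = 8.0     # figure quoted in the comment above
FLASH_W_PER_GB = 0.01   # figure quoted in the comment above
HOURS_PER_YEAR = 24 * 365


def power_watts(total_gb, fraction_in_dram):
    """Power draw when a fraction of the data lives in DRAM and the
    remainder lives in flash."""
    dram_gb = total_gb * fraction_in_dram
    flash_gb = total_gb - dram_gb
    return dram_gb * DRAM_W_PER_GB + flash_gb * FLASH_W_PER_GB


total_gb = 100  # hypothetical RAM-drive size

all_dram = power_watts(total_gb, 1.0)    # everything kept in DRAM
tiered = power_watts(total_gb, 0.10)     # hot 10% in DRAM, the rest in flash

for label, watts in [("all DRAM", all_dram), ("10% DRAM + flash", tiered)]:
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    print(f"{label}: {watts:.1f} W, {kwh_per_year:.0f} kWh/year")
```

Under these assumptions the all-DRAM layout draws about 800 W against roughly 81 W for the tiered one, and the gap compounds over every hour of uptime.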
      • by compro01 (777531) on Wednesday June 04 2008, @01:14PM (#23656141)
        I was sure that figure was upwards of a million cycles per sector in modern flash chips.

        Also, throw in wear-leveling and spare sectors. A million writes to a file system sector doesn't mean a million writes to a particular physical sector (it could be 1000 writes each to 1000 different sectors), and when a sector does wear out, it simply gets put out of service and is replaced with a spare one. This same principle is used in mechanical hard drives: if a sector is problematic to read from or write to, it gets marked as bad and the file system sector is remapped to somewhere else.

        SSDs could quite likely last longer than mechanical hard drives in this regard.
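
The remapping idea can be illustrated with a toy model. This is a minimal sketch, not how any real flash controller works: the `WearLeveledFlash` class is hypothetical, it tracks wear per sector rather than per erase block, and it simply sends each write to the least-worn healthy sector that is free.

```python
class WearLeveledFlash:
    """Toy wear leveling: logical sectors are remapped on every write to
    the least-worn free physical sector; worn-out sectors are skipped."""

    def __init__(self, physical_sectors, rated_cycles):
        self.wear = [0] * physical_sectors  # writes seen by each sector
        self.rated_cycles = rated_cycles
        self.location = {}                  # logical -> physical sector
        self.contents = {}                  # physical sector -> data

    def write(self, logical, data):
        old = self.location.get(logical)
        occupied = set(self.location.values()) - {old}
        usable = [p for p in range(len(self.wear))
                  if p not in occupied and self.wear[p] < self.rated_cycles]
        if not usable:
            raise RuntimeError("no healthy free sector left")
        target = min(usable, key=lambda p: self.wear[p])  # least worn wins
        if old is not None and old != target:
            self.contents.pop(old, None)    # discard the stale copy
        self.wear[target] += 1
        self.contents[target] = data
        self.location[logical] = target

    def read(self, logical):
        return self.contents[self.location[logical]]


flash = WearLeveledFlash(physical_sectors=10, rated_cycles=1000)
for i in range(1000):
    flash.write(0, i)          # hammer a single logical sector
print("per-sector wear:", flash.wear)
```

In this model a thousand writes to one logical sector leave each of the ten physical sectors with a hundred writes, which is the effect the comment describes.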
        • No marketing or sales executive will ever countenance the adding of "spare" sectors to a disk. If there are 100 billion physical sectors, then by God it's going to say so on the sales info.

          In order to do wear leveling you have to have additional metadata, which will take up additional space on disk. With SSD you pay a premium per byte over a magnetic disk. Do you really want a file system that's not going to make the best use of the space you just bought for an arm and a leg?
        • Re: (Score:2, Insightful)

          by Anonymous Coward
          As capacity goes up, the feature size on flash gets smaller. This means less energy per bit and a thinner dielectric.

          So, as density of flash goes up, write cycle lifetime potentially goes down.

          HDDs have the same issue of bits being less "durable" as capacity goes up. However, the media never wears out for HDD. Furthermore, it is already accepted that there will be many bit errors and these are simply corrected with error correction codes and mapping out bad sectors.

          As far as reliability goes, everybody t
  • by Amiga Lover (708890) on Wednesday June 04 2008, @12:27PM (#23655427)
    Cue up 20 comments going "But what about the limited write cycles, these things will fail in a month" and 500 comments replying "this is no longer an issue n00b"
    • You forgot the 1000 comments prognosticating about SSDs replacing HDDs permanently "any day now" with the added bravado of saying "I knew this would happen! See, I told you!" with 3000 comments replying "Yeah, but price/performance!", all of which will be replied to with "but price/performance doesn't matter, n00b. Price makes no difference to anyone."

      Then, in a fit of wisdom, a few posters, all of whom will be modded down as flamebait, will say "There's room for both and price/performance does matter, at least for now."
      • Re: (Score:2, Insightful)

        I'm just glad there is enough interest in paying for the performance to keep the development moving at a decent clip. Flash really does look like it will have a big advantage for laptop users who are not obsessed with storing weeks' worth of video.
      • "Guys like us avoid monopolies. We like to compete." -- Bill Gates
        (parent's sig)

        This is no longer an issue n00b.
  • Good (Score:3, Insightful)

    by brock bitumen (703998) on Wednesday June 04 2008, @12:29PM (#23655457)
    They are trying to push new technology on their high-paying customers because they can charge a premium while it's a scarce resource. This will drive production up and costs down, and soon we'll all be toting massive flash disks all day.

    I, for one, welcome our new flash disk overlords
  • big deal (Score:4, Funny)

    by larry bagina (561269) on Wednesday June 04 2008, @12:37PM (#23655605) Journal
    Most computers come with flash preloaded. I don't know why you'd be browsing the web or watching videos/web comics/ads/etc on a server computer. Maybe they're trying to dumb down to compete with Windows Server 2008.
  • by E-Lad (1262) on Wednesday June 04 2008, @12:58PM (#23655913) Homepage
    Current versions of ZFS have the feature where the ZIL (ZFS Intent Log) can be separated out of a pool's data devices and onto its own disk. Generally, you'd want that disk to be as fast as possible, and these SSDs will be the winner in that respect. Can't wait!
    • Re: (Score:2, Interesting)

      Current versions of ZFS have the feature where the ZIL (ZFS Intent Log) can be separated out of a pool's data devices and onto its own disk. Generally, you'd want that disk to be as fast as possible, and these SSDs will be the winner in that respect. Can't wait!

      As far as I know, contiguous writing of large chunks of data is slower for flash drives than for plain HDDs. I'm guessing the ZIL is some kind of transactional journal log, where all disk writes go before they hit the main storage section of the filesystem? I don't think you'd get much of a speed bonus. SSDs are only really good for random access reads like OLTP databases.

    • by Anonymous Coward on Wednesday June 04 2008, @01:24PM (#23656313)
      The benchmarks say something like a 200x performance improvement from putting the ZIL onto an alternate high-performance logging device.

      I have been actively researching a vendor who will supply this type of device. Currently we're testing with Gigabyte i-Ram cards, connected through a separate SATA interface. (Note: the Gigabyte cards are battery-backed SDRAM, but I won't lose power for 12 hours, so it's a non-issue for me.)

      Fusion-IO is a vendor who is making a board for Linux - but as near as I can tell the cards aren't available yet, and when they are - they won't work with Solaris anyway!

      The product which Neil Perrin did his testing with (umem/micromemory) with their 5425CN card doesn't work with current builds of Solaris. Umem is also a pain to work with .. they don't even want to sell the cards (I managed to get some off eBay)

      I hope Sun lets me buy these cards separately for my HP proliant servers. Of course if they didn't, this is one thing that might make me consider switching to Sun Hardware! (Hey HP/Dell - are you reading this??)

  • Given that you can get flash disks that hang off pretty much any common bus used for mass storage (IDE, SATA, SAS, USB, SPI, etc.), "adding a flash storage option" is pretty much an engineering nonevent, and a very minor logistical task.

    If Sun expects to sell a decent number of flash disks, or is looking at making changes to their systems based on the expectation that flash disks will be used, then it is interesting news; but otherwise it just isn't all that dramatic. While flash and HDDs are very different in technical terms, the present incarnations of both technologies are virtually identical from a system integration perspective. This sort of announcement just doesn't mean much at all without some idea of expected volume.
    • by gbjbaanb (229885) on Wednesday June 04 2008, @01:07PM (#23656051)

      "Adding a flash storage option" is pretty much an engineering nonevent
      but as a marketing event it's a magnificent and almost unbelievable paradigm-shift approach to a massive problem that's been crying out for a reliable storage-based performance solution for years.
      • Good point. And I strongly suspect that it enables today's Dynamic CIOs to realize unprecedented First-Mover Synergies in the modern Data-Centric Enterprise Solution Space.
    • by boner (27505) on Wednesday June 04 2008, @01:36PM (#23656531)
      Re: "Adding a flash storage option" is pretty much an engineering nonevent, and a very minor logistical task.

      You have no idea what you are talking about. Sun customers demand that the product Sun sells them have known reliability properties and that Sun guarantees their products properly interact with each other. It takes a significant amount of resources to do this validation. At the same time SSDs and HDDs react very differently to load and can have all sorts of side effects if the OS/application is not prepared to deal with them.

  • IOPS (Score:5, Informative)

    by Craig Ringer (302899) on Wednesday June 04 2008, @01:10PM (#23656095) Homepage Journal
    People (read: vendors) now frequently refer to flash storage as superior when IOPs are the main issue.

    From what I've been able to discern this is actually true only in read-mostly applications and applications where writes are already in neat multiples of the flash erase block size.

    If you're doing random small writes your performance is likely to be miserable, because you'll need to erase blocks of flash much larger than the data actually being changed, then rewrite the block with the changed data.
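
The penalty can be quantified as write amplification. The sketch below assumes a naive device that reads, erases, and rewrites every erase block a write touches, with writes aligned to block boundaries; the 128 KiB erase block is an assumed size, since real parts vary, and real controllers use smarter remapping than this.

```python
def write_amplification(write_bytes, erase_block_bytes):
    """Bytes physically rewritten per byte the application changed, for a
    naive read-erase-rewrite of every erase block the write touches.
    Assumes the write starts on an erase block boundary."""
    blocks_touched = -(-write_bytes // erase_block_bytes)  # ceiling division
    return blocks_touched * erase_block_bytes / write_bytes


ERASE_BLOCK = 128 * 1024  # assumed erase block size; varies by part

for size in (512, 4096, 128 * 1024):
    factor = write_amplification(size, ERASE_BLOCK)
    print(f"{size:>7}-byte write -> {factor:.0f}x amplification")
```

With that assumed block size, a 512-byte write rewrites 256 times as much data as it changes, while a write that matches the erase block pays no penalty at all.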

    Some apps, like databases, might not care about this if you're able to get their page size to match or exceed that of the underlying storage medium. Whether or not this is possible depends on the database.

    For some other uses a log-oriented file system might help, but those have their own issues.

    In general, though, flash storage currently only seems to be exciting for random read-mostly applications, which get a revolting performance boost so long as the blocks being written are small enough and scattered enough. For larger contiguous reads hard disks still leave flash in the dust because of their vastly superior raw throughput.

    Vendors, however, make a much larger margin on flash disk sales.

    This article (PDF) may be of interest:
    Understanding Flash SSD performance [managedflash.com]
    (google text version) [209.85.173.104].
  • by UpooPoo (772706) on Wednesday June 04 2008, @02:00PM (#23656909) Journal
    I work in a company that has a few thousand servers running in a few regional data centers. We are looking into SSDs not because of their superior IOPs (this is a mitigating factor vs HDD performance) but because of their low power consumption and low heat dissipation. When your operations reach a scale where you are using an entire data center, heating and power become more and more of a cost issue. Right now we are trying to build some hard data on actual savings, but there's lots of spin out there that gives you an idea of what potential savings could be. Here are a few interesting links, google around for more information, there's plenty to be had:

    http://www.stec-inc.com/green/storage_casestudy.php [stec-inc.com]
    http://www.stec-inc.com/green/green_ssdsavings.php [stec-inc.com] (You have to request the whitepaper to see this one.)
  • RAID 4, anyone? (Score:4, Insightful)

    by mentaldrano (674767) on Wednesday June 04 2008, @02:31PM (#23657407)
    In the time between now and when SSD becomes cheaper than magnetic storage, might we see a resurgence of RAID 4? RAID 4 stripes data across several disks, but stores parity information all on one disk, rather than distributing the parity bits like RAID 5.

    This has benefits for workloads that issue many small randomly located reads and writes: if the requested data size is smaller than the block size, a single disk can service the request. The other disks can independently service other requests, leading to much higher random access bandwidth (though it doesn't help latency).

    One of the side effects of this is that the parity disk must be much faster than the data disks, since it must service all requests, to provide the parity info. Here SSD shines, with its quick random access times, but poor sequential performance. Interesting, no?
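
The parity mechanics behind that bottleneck can be shown with plain XOR. This is a minimal sketch using in-memory byte strings as stand-ins for disk blocks; the helper name and the sample data are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data disks plus one dedicated parity disk, one block each.
data_disks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(*data_disks)

# A small write touches one data disk AND the parity disk:
# new parity = old parity XOR old data XOR new data.
new_block = b"bbbb"
parity = xor_blocks(parity, data_disks[1], new_block)
data_disks[1] = new_block

# If disk 0 dies, its block is rebuilt from the survivors plus parity.
recovered = xor_blocks(parity, data_disks[1], data_disks[2])
print(recovered == data_disks[0])
```

Every small write updates exactly one data disk, but all of those writes land on the same parity disk, which is why a fast random-access device is attractive in that slot.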
  • Cheaper than RAM? (Score:4, Insightful)

    by AmiMoJo (196126) <mojo@wDEBIANorld3.net minus distro> on Wednesday June 04 2008, @02:39PM (#23657555) Homepage
    At the moment high performance SSDs are still more expensive than RAM. Since a 64 bit processor can address vast amounts of RAM, wouldn't it be even better and cheaper just to have 200GB of RAM rather than 200GB of SSD?

    Okay, you would still need a HDD for backing store, but in many server applications involving databases (high performance dynamic web servers for example) a normal RAID can cope with the writes - it's the random reads accessing the DB that cause the bottleneck. Having 200GB of database in RAM with HDDs for backing store would surely be higher performance than SSD.

    For things where writes matter like financial transactions, would you want to rely on SSD anyway? Presumably banks have lots of redundancy and multiple storage/backup devices anyway, meaning each transaction is limited by the speed of the slowest storage device.
    • by QuantumRiff (120817) on Wednesday June 04 2008, @12:42PM (#23655675)
      This is just a story about SUN doing something that others have already done for some time now

      Really? What other top 5 computer manufacturer has been putting flash drives in SERVERS? I've seen a few laptops, but I haven't seen any used in servers or storage systems. (EMC and a few others have announced plans to do it, but haven't released anything AFAIK.)

      Also, their "thumper" server has 48 drives in it. Would you want to pay around $1000 per drive to fill that up?

      • Re: (Score:2, Informative)

        IBM has them as an option for blade and rack servers...

      • by Archangel Michael (180766) on Wednesday June 04 2008, @12:57PM (#23655907) Journal

        Also, their "thumper" server has 48 drives in it. Would you want to pay around $1000 per drive to fill that up?
        Yes. If performance dictated it was necessary.

        Just because you don't want to, doesn't mean everyone else doesn't want to also.
      • $48k? Chump change. I remember back when the company I worked for at the time paid over six figures for a pimped out server back in the late 90's...
      • You're confusing two very different sorts of storage. There is bulk data storage. This is a fileserver for home directories, video archives, piles of email, that sort of stuff. This is the market where the 1TB sas drive thrives. Then there's the database backing store. Almost every customer I've sold to wants a huge number of very fast, very small drives for database backing store. The extra capacity is meaningless, as they have to use so many spindles to get a decent IOPS performance. In this area, selling
    • Re: (Score:3, Insightful)

      Samsung will have Multi Level Cells, which are slower (and cheaper). The Single Level cells are faster (up to twice as fast I think), but more expensive.
      You can go either way with it, but I think faster (and smaller) drives are more attractive than bigger and slower.
      You need to compete against the sequential speed of a 15,000 rpm SCSI drive too (SSD will beat them dead on access speed, but not all workloads are small random reads)