Long-Term Performance Analysis of Intel SSDs 95
Vigile writes "When the Intel X25-M series of solid state drives hit the market last year, there was little debate that they were easily the best performing MLC (multi-level cell) offerings to date. The one area in which they blew away the competition was with write speeds — initial reviews showed consistent 80MB/s results. However, a new article over at PC Perspective that looks at Intel X25-M performance over a period of time shows that write speeds are dramatically reduced from everyday usage patterns. Average write speeds are shown to drop to half (40MB/s) or less in the worst cases, though the author does describe ways that users can recover some of the original drive speed using standard HDD testing tools."
Reader MojoKid contributes related SSD news that researchers from the University of Tokyo have developed a new power supply system which will significantly reduce power consumption for NAND Flash memory.
Damn (Score:1, Funny)
Where is everyone? Oh right. Friday night.
Re:Damn (Score:5, Insightful)
Re: (Score:2)
What new show? :D
I'm in the Netherlands, so I'd like to know (so I can sample it online)
Re: (Score:2)
>>>Dollhouse.
It should be called "The Eliza Dushku Hour" because that's the only reason to watch it. At least it's better than that Terminator show which suffers from Gilligan's Island syndrome (no escape; no story progression). I suspect both shows are headed for the FOX Friday night graveyard along with Firefly, Brisco County Junior, Sliders, Millenium, Brimstone, VR5, Strange Luck, and Sliders. The X-Files is the only show to survive Friday night.
http://www.wallpapergate.com/data/media/77/El [wallpapergate.com]
Re: (Score:1)
Re: (Score:1)
Intel Sexually Transmitted Diseases?
Why? (Score:4, Interesting)
Re: (Score:1)
Re:Why? (Score:5, Informative)
Um... no.
When cells age, they take longer to erase. This happens over 5,000 to 10,000 cycles or more. It's not dramatic, and eventually the cells fail in a way too severe to be corrected by the ECC.
Because there is a (software) process to bring full speed back to the drive, we can safely conclude that none of the slowdown is related to cell aging or other cell-level issues. It's more of an organization and fragmentation issue.
Re: (Score:2)
It brings it back to full speed, but I'd bet that it returns to the slow state much faster.
Re: (Score:2, Informative)
There was no difference in how long it took to fragment. If we wrote a nasty enough mix of smaller file sizes to the drive, performance would drop right at the point where all flash was written to at least once (i.e. just over the 80GB mark).
After running HDDErase on the drive, it wrote at the same *exact* 80 MB/sec each and every time. Additionally, running successive software secure erases (writing 0's across all 80GB) showed zero drop in speed even after 10 passes.
In testing several different SS
Re: (Score:2)
I'm curious -
How much speed difference is there between a new hard disk drive and a fragmented one? Do HDDs also slow down with age?
Re: (Score:2)
No, they don't. The filesystem can get fragmented, but that's a different matter, which also applies to flash.
While they are still working, they will always pump out the same speed.
Re:Why? (Score:5, Informative)
A simple change to 1 byte means a read of the entire 64KiB block that byte is in, a change of the data, and then a write of 64KiB.
If the filesystem isn't flash-aware, you can suffer a theoretical performance hit of being 65536 times slower because of this.
So what you really need is a filesystem that stores files in 64KiB blocks and groups reads and writes to the same blocks together as one operation.
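That worst case is easy to put numbers on. A minimal sketch (the 64KiB block size is the figure from the comment above; `write_amplification` is our own illustrative helper, not a real API):

```python
# Worst-case write amplification for a small update on a filesystem
# that is not flash-aware: the whole erase block must be read,
# modified, and rewritten.
ERASE_BLOCK = 64 * 1024  # 64 KiB, the block size used in the comment above

def write_amplification(bytes_changed: int, block_size: int = ERASE_BLOCK) -> float:
    """Bytes physically written divided by bytes logically changed."""
    # Round the changed range up to whole erase blocks.
    blocks_touched = -(-bytes_changed // block_size)  # ceiling division
    return (blocks_touched * block_size) / bytes_changed

print(write_amplification(1))     # 65536.0 -- the 65536x figure above
print(write_amplification(4096))  # 16.0 -- even a 4 KiB write costs a full block
```

A flash-aware filesystem attacks exactly this ratio by batching sub-block changes into whole-block writes.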
Re: (Score:3, Informative)
I thought it could flip a bit to 1 at will without having to rewrite the whole block, but if you want to write a 0 it needs to read the whole block, wipe it, and rewrite it with the same data but with the bit flipped to 0. But I wouldn't really know; I'll use SSDs when they cost about the same as hard drives.
Re: (Score:2)
Re: (Score:2)
Re: (Score:1)
Unchanged bytes might be slightly more common than changed bytes, but deleting a single byte might require the file to be rewritten. The 'only 0's get flipped' case is not worth giving special consideration.
Re: (Score:2)
The idea is to avoid erases or at least batch them.
The idea is to allocate metadata blocks somewhere to manage the storage. The metadata for a block includes a replacement pointer set initially to all ones. If the block changes, write the replacement block number into that pointer. If a block is released, write 0 to the pointer.
This can be done in the drive logic or can be handled by the OS. The catch is that the device gets slower as time goes on. Eventually, you are forced to defrag the flash management
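The replacement-pointer scheme above can be sketched as follows (a toy model under stated assumptions: `FlashMap` is hypothetical, not any drive's firmware, and block 0 is never used as a replacement target because all-zeros is reserved to mean "released"):

```python
# Sketch of the replacement-pointer idea: each block's metadata pointer
# starts as all ones. Because NAND programming can only flip bits from
# 1 to 0, a new pointer value (or a release marker of 0) can be written
# in place without an erase.
FREE = 0xFFFFFFFF   # all-ones: "no replacement yet"
RELEASED = 0        # all-zeros: "block released"

class FlashMap:
    def __init__(self, nblocks: int):
        self.pointer = [FREE] * nblocks  # one replacement pointer per block

    def replace(self, old: int, new: int) -> None:
        # Overwriting all-ones with `new` only clears bits, so this is a
        # plain NAND program operation -- no erase needed.
        assert self.pointer[old] == FREE, "pointer already consumed"
        self.pointer[old] = new

    def release(self, block: int) -> None:
        # 0 is reachable from any value by clearing bits.
        self.pointer[block] = RELEASED

    def resolve(self, block: int) -> int:
        # Follow the replacement chain to the data's current location.
        while self.pointer[block] not in (FREE, RELEASED):
            block = self.pointer[block]
        return block

m = FlashMap(8)
m.replace(0, 3)      # block 0's data now lives in block 3
m.replace(3, 5)      # ...and later moves to block 5
print(m.resolve(0))  # 5
```

The trick is that both updates — writing a pointer over all-ones and clearing it to zero — only flip bits from 1 to 0, so neither needs an erase; the erases can be batched later, which is also why the chains (and the slowdown) accumulate over time.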
Re:Why? (Score:5, Informative)
Actually, NAND flash comes in two block sizes - small block (16 KiB/block, 512 bytes/page, 32 pages/block) and large block (128 KiB/block, 2048 bytes/page, 64 pages/block).
Also, in NAND flash, a "write" operation can turn a "1" bit into a "0" bit; an "erase" operation turns a "0" bit into a "1" bit. Writes can work at the bit level, erases at the block level. (Though large-block NAND can NOT be partial-page programmed, so you must write 2048 bytes at once - but you can read all 2048 bytes, flip one bit, then write it all back.) This characteristic is used by the flash management routines to manage the flash blocks. Marking pages as "discard" or "ready for erase" is done by flipping a 1 bit to 0, since that's easy. You can write a block partially, so you don't always have to incur a huge 128 KiB write.
Given this, it's a block device, so you can't write 1 byte anyhow - you must write the sector size, which is emulated as 512 bytes. What normally happens is that the SSD will mark a page as "dirty" to indicate it's not to be used, and remap that page's contents onto a new page elsewhere, thus only performing a 2048 byte write (plus 64 out of band bytes).
Now, what happens when all the blocks are used? The flash routines have to erase a block, but before erasing a block, it has to make sure all the pages within it are "dirty". If there are non-dirty pages, they're copied to another block, and when all non-dirty pages are copied, that block is erased. If your access pattern is such that all the blocks have non-dirty pages, it takes a little while to actually move all the data around to get blocks that can be erased. Do enough random I/O, and this can happen quite easily.
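The garbage-collection cost described in that last paragraph can be modeled in a few lines (a toy model of the principle, not Intel's firmware; the numbers assume 64-page large-block NAND):

```python
PAGES_PER_BLOCK = 64  # large-block NAND: 64 pages of 2048 bytes each

def gc_copy_cost(live_pages_per_block):
    """Pages that must be copied out to free one erase block.

    Greedy GC: erase the block with the fewest live (non-dirty) pages,
    since it needs the least copying before it can be erased.
    """
    victim = min(range(len(live_pages_per_block)),
                 key=live_pages_per_block.__getitem__)
    return live_pages_per_block[victim]

# Fresh drive: at least one block is entirely dirty -> a free erase.
print(gc_copy_cost([0, 64, 64, 64]))   # 0 pages copied

# Aged drive after lots of random I/O: every block is half live, so
# every erase first copies 32 pages -- extra internal traffic that
# eats into the write bandwidth the host sees.
print(gc_copy_cost([32, 32, 32, 32]))  # 32 pages copied
```

Random small writes leave live pages scattered across every block, so every erase pays a copy cost first — which is exactly the slowdown the article measured.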
Re: (Score:3, Informative)
Older flash devices allowed multiple writes to one page, but new ones do not.
The higher-density MLC devices do not allow you to read a page, flip a bit to 0, and overwrite it. They require that pages be written just once, and in order.
This is causing no end of frustration for the Microsoft mobile filesystems, which frequently overwrote pages to flag them.
Closer, but.. no. (Score:5, Informative)
NAND blocks are *erased* in large blocks, probably 128KB or larger in this case.
However, the read and write operations occur at a *page* level, not block. NAND pages today are typically 2K or 4KB in size.
So you can read and write in smaller units than 128KB.
However, to erase any byte of the NAND, you have to relocate the preserved data and erase a whole block.
Because these drives operate on huge aggregate arrays of NAND, their block structure may be much larger, or they may have very complicated and smart algorithms to remap new data while deferring erases until much later.
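A rough cost model of that page/erase-block asymmetry (illustrative sizes, not from any datasheet):

```python
# Reads and programs happen per page; erases take whole blocks. So an
# in-place modify must relocate every preserved page in the block.
PAGE = 4 * 1024           # 4 KB page: unit of read/program (assumed)
ERASE_BLOCK = 128 * 1024  # 128 KB: unit of erase (assumed)

def pages_moved_to_modify_in_place(dirty_bytes: int) -> int:
    """Pages relocated if the block holding `dirty_bytes` must be erased."""
    pages_in_block = ERASE_BLOCK // PAGE   # 32 pages per erase block
    dirty_pages = -(-dirty_bytes // PAGE)  # ceiling division
    return pages_in_block - dirty_pages    # preserved pages to copy out

print(pages_moved_to_modify_in_place(1))     # 31 pages moved to change 1 byte
print(pages_moved_to_modify_in_place(8192))  # 30 pages moved for an 8 KB change
```

This is why the remapping layer prefers to write the new page elsewhere and mark the old one dirty, rather than erase eagerly.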
Re: (Score:2)
This isn't Wikipedia, we call them "KB" here. Thanks.
Re:Why? (Score:5, Informative)
Essentially, what the Intel write-combining technology is doing is combining multiple small (4KB) writes into a single block, and letting the old block become fragmented (having a bunch of 4KB holes in it).
The scenario in a nutshell:
You have a 1MB file and a program which modifies a single 4KB chunk of it. Intel's technology marks the original 4KB chunk within its original "block" as erased, then allocates a new block (using the wear leveling algorithm) to hold the new version of the 4KB chunk, and additionally combines it with any other small write operations that have recently occurred or are about to occur. Up to 128 such 4KB writes can be combined into a single block write.
After this is done many hundreds of thousands of times, however, the drive reaches a state where nearly every "block" is only partially used. The write combiner is stuck with whatever the wear leveling algorithm handed it, which is now a partially used block instead of a fully virgin block. It can no longer combine 128 small 4KB writes together; maybe it only has space to combine 10 of them, or in the worst case scenario... 1 of them.
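A back-of-the-envelope model of that degradation (our reading of the comment, not Intel's documented behavior; the 128-slot figure is the one quoted above):

```python
# A combining block holds up to 128 four-KB write slots when virgin.
# As the drive fragments, the wear leveler hands back partially used
# blocks, so fewer pending writes fit per block-write operation.
SLOTS_PER_BLOCK = 128  # 4 KB slots per combining block (figure from the comment)

def combined_writes(free_slots: int, pending_writes: int) -> int:
    """How many queued 4 KB writes fit into the block we were handed."""
    return min(free_slots, pending_writes)

print(combined_writes(128, 128))  # fresh block: all 128 writes in one pass
print(combined_writes(10, 128))   # worn block: only 10 fit; the rest wait
```

With only 10 free slots per block, clearing the same 128-write queue takes roughly 13 block operations instead of 1 — the same work, an order of magnitude more physical writes.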
Re: (Score:2)
Again, it's only the ERASE unit that is huge -- 64KB, 128KB, or 256KB on the device itself.
You can't erase 4KB alone.
It gets more complicated when you consider huge parallel arrays of NAND, and the complex logical remapping that goes on to give the appearance of a typical 512-byte sector device.
Re: (Score:1)
Nothing is lost, it just goes slower. Due to the fragmentation caused by write combining, the drive has to shuffle more data around when you write to those same areas later on. The flash is still being written at full speed as it copies the data internally, but the drive can only accept new data at the reduced speeds seen.
We're working with Intel to help them reproduce the more significant issue we saw.
Allyn Malventano
PCPer Editor
No fix until ZFS (Score:1)
With the fix for this problem being essentially "nuke the drive and reinstall periodically", there's really no fix until you get a flash-aware filesystem. Too many virtualization layers between your app doing the write, and the bits being flipped.
This could be useful for ETL jobs or other heavy 'batch' type work, as the nature of the access will essentially 'reset' the drive for the next pass.
Re: (Score:2)
TL:DR (Score:4, Insightful)
That article is a multi-page annoyance, the grammar is bad, and we already have flash-aware filesystems like jffs2.
Re:TL:DR (Score:4, Informative)
As far as I can tell from some quick googling and checking on Wikipedia, jffs2 isn't much of a competitor at this point, e.g., it's apparently not really usable on flash chips bigger than 512 Mb. Maybe UBIFS or LogFS? None of them seem to be really mature.
Re:TL:DR (Score:5, Interesting)
Re: (Score:2, Interesting)
Will the traditional flash filesystems (jffs2, etc.) still work when we have SSDs interfacing over SATA? USB sticks don't work with them because they 'pretend' to be a hard drive over USB, and the same goes for SSDs over SATA. jffs wants the flash device (MTD) interface.
Intel employee Matthew Wilcox spoke at linux.conf.au about some kernel performance improvements related to the Intel SSD drives - redundant ATA calls that have been removed, and allowing larger sector sizes under ATA 8 [lwn.net], so maybe the authors of th
Re: (Score:3, Insightful)
Flash-aware filesystems currently only work on embedded setups where there is direct access to the flash.
Given the need for compatibility, SSDs will always have a controller presenting the SSD as a disk, but I agree that it'd be nice if they added additional lower-level access for the case where the computer is able to use a flash-aware filesystem.
Re: (Score:2)
The article was very useful and, as far as I can tell, the author was systematic and did good research. These conclusions are very useful for everyone using SSDs. Have you ever used jffs2? Do you trust your files to it? Come on... At present there is no production-reliable, SSD-oriented filesystem available.
Bullshit (Score:1)
Fortunately for us, flash based storage has access times nearly as fast as RAM
My 300MHz DDR bus had 300M x 2 x 8 == 600M x 8 == 4.8GB/s sequential read access speed.
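The arithmetic checks out (assuming, as the poster implies, a 64-bit bus transferring on both clock edges):

```python
# DDR bandwidth: clock rate x transfers per clock x bus width in bytes.
clock_hz = 300e6         # 300 MHz bus clock
transfers_per_clock = 2  # DDR: data moves on both clock edges
bus_bytes = 8            # 64-bit data path

bandwidth = clock_hz * transfers_per_clock * bus_bytes
print(bandwidth / 1e9)   # 4.8 (GB/s), matching the figure quoted
```

As the replies below note, though, this compares sequential bandwidth, not access time — they are different quantities.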
Re: (Score:1)
Re: (Score:2)
Re: (Score:1, Informative)
Access time != bandwidth. But you're right, RAM still has much quicker access times. Still, both seem instantaneous to humans; is a 0.2 ms access time really so bad for most applications?
Re: (Score:2)
Re: (Score:2)
Access time != sequential bulk read throughput.
Think hard drive vs flash drive.
Flash does have "access time" close to RAM, since it doesn't have to seek or do complex addressing.
When you have these huge banks of flash acting as one drive, then "access time" becomes a computational problem of how fast you can look up the physical location of the user's data, based on a logical sector address.
Still faster than mechanically moving a drive head, of course.
File system? (Score:1)
Re: (Score:1)
No I didn't RTA..
Re: (Score:3, Insightful)
One of the biggest challenges of the coming years will be finding and developing filesystems (logical data stores) that take advantage of the strengths of flash memory while diminishing its weaknesses.
Our approach today is mapping large banks of flash to look like a hard drive, and then using a filesystem that is optimized to reduce seek activity (cylinders/heads/sectors-per-track...).
EXT3 on SSD, FAT on huge SD cards, it's just shoe-horning our old filesystems onto new media. It makes about as much sense as
Re: (Score:2)
I am waiting for these SSDs (Score:4, Informative)
I am patiently waiting for these SSDs and plan to test them on a MythTV distro box. I will get a fully compatible Linux SSD notebook onto which a MythTV distro will be installed.
Then with 3 TV cards, I will see how these SSDs measure up on reading/writing/transcoding etc. My intention is to work the SSD for about a week. Watch this space for results.
I do not think that Intel will deliver the "golden" SSD. I think Samsung's SSD [samsung.com] effort will bear results faster. Those videos say a lot.
SLC vs MLC (Score:4, Interesting)
It's early days for SSDs. I'll be sticking with my power-guzzling magnetic frisbee stacks for a while yet.
Re: (Score:2)
Many cheap SSDs use MLC for obvious reasons,
You hit the nail on the head. This high-price Intel SSD is just a cheap MLC unit with a big brand name and inflated expectations.
I'm usually quick to adopt new tech, but I'm still not satisfied with SSDs. I can't think of a good reason to use one in a desktop PC right now. They're slow, they die young, and they have perverse design quirks like oversized pages that result in internal fragmentation, and oh yeah, they cost an arm and a leg. Once someone releases an SSD that solves ALL of those sticky point
Re: (Score:2, Informative)
Once someone releases an SSD that solves ALL of those sticky points, and ideally delivers enough random-access throughput to saturate the 300MB/s SATA line (or whatever bus is mainstream by then), that's when I'll jump on board.
Well, like myself, you will be waiting for a non-flash based SSD then.
Inevitably, something like PRAM [wikipedia.org] will displace Flash, and it can't happen soon enough. Until then, I would much rather see some of that fab capacity reclaimed for DRAM production.
Re: (Score:3, Interesting)
I've been using an Intel SSD as a boot drive and I think it's worth every penny so far. I have a few programs and games on the boot drive and they all load up considerably faster than the alternatives. I don't care about write speeds. Their size alone means they're not really meant for storage yet, so using them as such makes little sense. If you're doing a lot of write operations to your SSD, you should probably think about moving those files to a different storage device.
Re: (Score:2)
Small vs Large (Score:2)
MLC brings more density to the table; that's the only reason they do it. Smaller die size and higher storage density mean more MB per dollar.
SLC would be a much smaller capacity drive for the same money. It would be faster at writing, but probably too expensive or too small to have many adopters.
Same reason SLC is all but unheard-of in thumbdrives. (IronKey being one exception.)
Re: (Score:2)
I'm trying to snatch up some last-gen Mtron SSDs. The Mobi 3000 looks especially good for the price. They're old and only 16GB, but they're also SLC server SSDs, with about 100MB/sec read and 80MB/sec write.
Can't wait for them to drop under $100!
Re: (Score:2)
I think a major hidden problem with MLC is not so much read or write speed, but data integrity. If an MLC chip is storing 4-bits per cell, that's 16 discrete VT levels that need to be detected to resolve the stored info. Couple that with increasingly smaller cell sizes and it would seem to me that even very low levels of gate leakage could lead to bit errors.
I cringe at the thought of using an MLC based SSD to store important data and then having it basically bit rot due to gate leakage (an effect which i
Re: (Score:1)
Re:SLC vs MLC (Score:5, Insightful)
To me, MLC has a conceptual problem of going against the fine tradition of binary computing, which is all about data integrity. Why don't we go back to analog computers for even higher densities, while we're at it?
Says someone who has obviously never seen the raw output of an HDD read head, or of the optical laser in a DVD reader. The real world was always very ugly and analog; there's a helluva lot going on to give you a 0 or 1 answer.
Re:SLC vs MLC (Score:5, Informative)
Re: (Score:2)
Not ready for prime-time (Score:1)
This is unfortunately another case to show that SSDs are not ready for prime-time. With that said, I'm anxiously awaiting the ability to buy a super-fast 120GB+ SLC drive once prices drop below $400.
I just hope that Microsoft and Apple come up with some great software enhancements for handling SSDs ASAP.
It is hard for me to believe that the two OS giants can't release their upcoming software in a way that is totally SSD optimized. They are kidding themselves if they don't think that conventional mechanica
Certainly ready for prime-time (Score:2)
For some time now, all my storage needs have been satisfied entirely by SSDs, and I have no HDDs at all. Certainly much better than my previous 10,000 RPM hard disks, so I think they are ready for prime time.
Re: (Score:2)
I believe there is so much misconception out there about flash memory performance, it's astonishing. There just isn't a good understanding of how all the layers of cache in the OS work.
SSDs are not slow and do not "die young". I just built a new system with 3 SSDs in RAID0 and I'm getting 350MB/sec sequential read performance, and nearly 250MB/sec sequential write. In fact, I'm less worried about adding additional drives in RAID0, because they fail by total wear, not at a single point of failure, and if the wear is
Re: (Score:2)
In a 3-drive RAID 0 array, 350MB/s isn't that impressive. That's 116 2/3 MB/s per drive, which is only slightly higher than the 1TB SATA spindles that were on sale months ago for less than an SSD. (Those speeds were maximums, though. Min was in the upper 60's, I believe.)
That performance is cool but its price premium is huge for a drive that provides only a fraction of spindles' capacity. Until they drop in price or are handled more efficiently to improve their performance, they may be ready for prime time but
Re: (Score:2)
The benchmarks I've seen on 1TB drives show about 80MB/sec average, ranging from 60MB to 100MB/sec depending on the location of the read on the drive; SSDs don't have this non-linearity. Also, the 350MB/sec is reaching the limitations of the RAID controller; a single drive is about 150MB/sec, so it starts scaling down a little when you add more. I will probably try a PCI Express adapter that has more bandwidth in the future.
Still, the main speed boost is in the latency. 0.2ms vs 8ms is a huge leap. Als
Re:There's got to be some writable space here... (Score:5, Insightful)
That's so oversimplified as to be completely wrong.
The number of write/erase cycles on NAND is significantly less than a hard drive. Typical devices are rated for 10,000 cycles. Bleeding-edge MLC parts can be as low as 5,000 or 7,000 erase cycles.
But.. a well-designed device will perform accurate wear-levelling across all the available blocks, so it doesn't matter what kind of access the user performs -- the whole device will wear evenly.
There are indeed reserve blocks to mitigate premature death of some parts.
But, the most important part is the ECC mechanism. The parts don't just wear out and die, they get an increasing bit error rate. By overdesigning the ECC logic, you can squeeze longer life out of the parts.
It does not play guess and check.. well-recognized error correction algorithms like Reed-Solomon or BCH are used with really high detect/correct rates.
Once you have accurate wear levelling, excellent ECC, and some manner of failure prediction, then it doesn't make so much sense to keep all your flash "in reserve" ready to swap out other parts wholesale. You might as well involve all the parts in the mix, so you get longer wear throughout.
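The "involve all the parts in the mix" idea reduces to a one-line allocation policy (a minimal sketch of the principle, not any vendor's algorithm):

```python
# Wear leveling in its simplest form: always allocate the least-erased
# available block, so erase counts stay nearly uniform device-wide.
def pick_block(erase_counts, free_blocks):
    """Choose the free block with the lowest erase count."""
    return min(free_blocks, key=lambda b: erase_counts[b])

erase_counts = [120, 95, 300, 98]  # per-block erase tallies (illustrative)
free_blocks = [0, 2, 3]            # blocks currently available for reuse
b = pick_block(erase_counts, free_blocks)
erase_counts[b] += 1               # this block is erased and reused
print(b)                           # 3 -- the least-worn free block
```

With every block in rotation like this, no region dies early from hot spots, and the ECC margin degrades gradually across the whole device instead of catastrophically in one place.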
Still better than the alternative (Score:2, Informative)
Looking at the big picture, I'd rather have a slow SSD than keep dealing with the data losses of (criminally unreliable) HDs.
Pffft! My Eee PC is faster than that.... (Score:2)
I got one of these in it: http://www.youtube.com/watch?v=zisx4mLF6Qo [youtube.com]
As to the second link, the power saving one... (Score:2)
A boost converter is FAR from new.
http://en.wikipedia.org/wiki/Boost_converter [wikipedia.org]
They have not invented a new power supply system. They are just suggesting it be applied to NAND and the high voltage needed be fed into the chip from a central supply, instead of having a charge pump (switching capacitors) in each NAND chip.
I'm not certain this will save power, but it will reduce peak currents because when the charge pumps switch on in the NAND chips, it creates a huge (but short) current spike. And if you write
Big opportunity for Linux (Score:2)
As people push for smaller laptops with longer battery life and as flash memory continues to drop in price and power requirements and to gain in raw performance, it makes less and less sense for people to use mechanical hard drives in laptops. But as this article shows, the drive's logic can only do so much to try to maintain performance while appearing to the OS to be just a regular hard drive. Using a direct flash interface and a flash file system like UBIFS, YAFFS2, or LogFS should provide Linux netbooks
Hw Wear leveling is BAD (Score:1)
Hardware CAN'T know which areas are used and which aren't. So those hardware workarounds for not using a real flash filesystem can't work well.
I can't understand why these people are still spending money producing such complex, brain-damaged products. Give us complete access to the flash chips and let the OS do the right thing. Legacy operating systems like Windows? Just bundle a driver and the flash filesystem! I'm sure the performance and cost outweigh the inconvenience of installing it. Maybe use a replaceable
...workarounds for not using a real filesystems (Score:2)
This is what happens when your OS is closed/proprietary.
Re: (Score:1)
You do realize that there is software in the hardware that is lying to the software at higher layers, right?
Even spinning disks present a virtual interface to the hardware.
Re: (Score:1)
You do realize that there is software in the hardware that is lying to the software at higher layers, right?
Even spinning disks present a virtual interface to the hardware.
I understand that there IS a layer even on hard disks.
The problem is that the layer for SSDs is currently the wrong one (the hard disk one).
In fact, the linux layer for MTD devices is completely different and more complex, and flash filesystems are very specific for that layer so you can't use them for normal block devices.
I'm not saying you don't need a layer. I'm saying that using the wrong layer and then reversing it in hardware is excessively WRONG.
SSDs and forensics (Score:2)
http://www.youtube.com/watch?v=WcO7xn0wJ2I [youtube.com]
my SSD performance results (Score:2)
Okay, I just ran this benchmark on my 3 RAID0 SSD array....
From 0.5KB to 128KB write performance (in KB/sec), one result per doubling of block size:
0.5KB: 3928
1KB: 7368
2KB: 12579
4KB: 19931
8KB: 48306
16KB: 83492
32KB: 143772
64KB: 233510
128KB: 252352
The reason for the low performance on small block sizes is the option called "Direct I/O" in ATTO Disk Benchmark. What this probably does is turn off your system's caching, so of course you are going to get ridiculously slow rates. It's good for comparison, but to say your system is going to be slow because of it is ridiculous because in the r
SSD (Score:1)