Intel's First SSD Blows Doors Off Competition
theraindog writes "Intel is entering the storage market with an ambitious X25-M solid-state drive capable of 250MB/s sustained reads and 70MB/s writes. The drive is so fast that it employs Native Command Queuing (originally designed to hide mechanical hard drive latency) to compensate for latency the SSD encounters in host systems. But how fast is the drive in the real world? The Tech Report has an in-depth review comparing the X25-M's performance and power consumption with that of the fastest desktop, mobile, and solid-state drives on the market."
More Details and Benchmarks Here (Score:5, Informative)
The PCMark Vantage tests are especially impressive: http://www.hothardware.com/Articles/Intel-X25M-80GB-SATA-Solid-State-Drive-Intel-Ups-The-Ante/?page=7 [hothardware.com]
More details and Benchmarks here (Score:4, Informative)
Benchmarks start here: http://www.hothardware.com/Articles/Intel-X25M-80GB-SATA-Solid-State-Drive-Intel-Ups-The-Ante/?page=4 [hothardware.com]
Re:but is it fast enough (Score:5, Informative)
That depends entirely on what kind of RAID [wikipedia.org] we're talking about...
Re:One test they never run - FRAGMENTATION (Score:5, Informative)
Re:Well, a step in the right direction (Score:2, Informative)
Replying to you, since you seem serious, as opposed to sibling.
That's $600 per 80GB drive, with a minimum order of 1000.
You can't buy a single drive for $600. Or at least, not from Intel.
Re:but is it fast enough (Score:5, Informative)
Re:One test they never run - FRAGMENTATION (Score:5, Informative)
Yes, it would wear the disk out faster, but your original premise is flawed.
Clustering locations would allow for accessing large chunks of data with one fetch, instead of lots of little fetches. If you're old enough, think back to the Blitter on the Amiga and moving contiguous chunks of memory as opposed to fragmented blocks.
Remember, RAM can get fragmented just as badly as a hard drive.
Re:System boot time goes from 43 secs to 37 second (Score:2, Informative)
Re:One test they never run - FRAGMENTATION (Score:4, Informative)
You can't grow a file in the middle; no filesystem call exists that can do that.
Fragmentation only happens if you append to a file, but that kind of fragmentation should not be a problem for an SSD, because all blocks (except the last) will be full, and an SSD doesn't read the 'next' block any faster than any other block.
Re:One test they never run - FRAGMENTATION (Score:5, Informative)
A good SSD has wear-leveling and write-combining techniques that keep the SSD "defragmented" automatically.
And it doesn't matter if the FS clusters are far apart as long as they are close to the SSD's hardware cluster sizes or the SSD intelligently combines them (which is what I believe Intel is doing since they claim a write amplification of only 1.1).
It's possible that the Samsung SLC chip stores data for the wear-leveling and write-combining operations which would remap the MLC in a non-fragmented way.
BTW, let me give you a naive wear-leveling / write-combining algorithm. I'm sure Intel has a better one, because they've invested millions of dollars in research and the one I'm about to present could be done by a CS101 student:
1) You have a bit more than 80GB free for an 80GB drive (extra memory to take care of bad sectors, just like a normal hard drive, plus a small amount reserved for the wear-leveling / write-combining).
2) You treat most of the storage as a ring buffer that consists of blocks on two levels: the native block size and a subblock size. The remaining storage (or alternate storage which may be the Samsung SLC chip on the MLC drives) is used to journal your writes and wear-leveling.
3) You combine all writes aligned to the subblock size into a native block, write them out to the next free native block in the ring buffer, and keep a counter for the write to the block. If you run into a used block, you increment a counter (for wear leveling); if the counter is below a certain value, you skip to the next free block, otherwise you move the used block (which has been stagnant) to a more frequently written-to free block (which will now take less of a burden, since it's had a stagnant block moved into it).
4) Anytime you make a write, the new sectors are updated in the memory area used for journaling / wear-leveling / sector remapping.
Assuming your reads can be done fairly quickly at the subblock level, it never matters if you have to "seek" for the reads and the drive won't fragment on writes because they are combined into native block sizes.
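The scheme above can be sketched in a few dozen lines. This is a toy model, not Intel's actual firmware: the class name, the sizes, and the decision to omit garbage collection and the stagnant-block move from step 3 are all simplifications for illustration.

```python
# Toy version of the naive wear-leveling / write-combining scheme above.
# Assumed (illustrative) parameters -- not real drive geometry:
SUBBLOCKS_PER_BLOCK = 4      # subblock writes combined into one native block
NUM_BLOCKS = 8               # tiny "drive" for illustration
WEAR_SKIP_THRESHOLD = 2      # erase count above which a free block is skipped

class ToySSD:
    def __init__(self):
        self.blocks = [None] * NUM_BLOCKS       # physical native blocks
        self.erase_count = [0] * NUM_BLOCKS     # per-block wear counter
        self.remap = {}                         # logical subblock -> (block, slot)
        self.head = 0                           # ring-buffer write pointer
        self.pending = []                       # subblock writes awaiting combining

    def write(self, logical_subblock, data):
        """Queue a subblock write; flush once a full native block is gathered."""
        self.pending.append((logical_subblock, data))
        if len(self.pending) == SUBBLOCKS_PER_BLOCK:
            self._flush()

    def _next_free_block(self):
        # Walk the ring, preferring low-wear free blocks (step 3 above;
        # the stagnant-block relocation is omitted in this toy).
        for _ in range(NUM_BLOCKS):
            i = self.head
            self.head = (self.head + 1) % NUM_BLOCKS
            if self.blocks[i] is None and self.erase_count[i] <= WEAR_SKIP_THRESHOLD:
                return i
        raise RuntimeError("no acceptable free block")

    def _flush(self):
        i = self._next_free_block()
        self.erase_count[i] += 1                  # writing implies an erase cycle
        self.blocks[i] = [d for _, d in self.pending]
        for slot, (lsb, _) in enumerate(self.pending):
            self.remap[lsb] = (i, slot)           # journal the new location (step 4)
        self.pending = []

    def read(self, logical_subblock):
        block, slot = self.remap[logical_subblock]
        return self.blocks[block][slot]
```

Because writes always land on the next acceptable block in the ring, logical "fragmentation" never costs anything: reads go through the remap table, and seeks are free.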
Re:Well, a step in the right direction (Score:5, Informative)
Before rushing to buy these for database use, I would want a good look at MTBF values. Especially MTBF values for really heavy use, which may be completely different from estimated desktop use.
Re:Gonna Take a Little While Yet (Score:3, Informative)
Write rates aren't THAT impressive, good but meh.
Less heat depends on the device; I've seen plenty of HOT SSDs, presumably due to the density of silicon in them and their being first-generation devices.
Better power consumption ... where? Every SSD I've seen lacks a power-saving mode, and in power-saving mode, as a general rule, mechanical drives are less hungry than SSDs.
They are really only compelling if you need fast seek times or for use in a laptop where shock (head strikes) is a potential issue at this point in time.
Re:Well, a step in the right direction (Score:4, Informative)
It's running on 4 SCSI-320 Cheetah 32GB, 15K RPM drives in RAID 0.
I hope you know how volatile RAID 0 can be. A problem with any single one of those drives will screw up the whole works until you can restore from a backup. I can understand wanting to avoid RAID 5/6 if there are a lot of writes to your DB, since those arrays' write performance is notoriously bad, and RAID 1 would double the hardware cost, but the ability to stay up and hot-swap in drives after a failure is priceless.
Re:Oh Yeah? (Score:3, Informative)
--sabre86
Re:Gonna Take a Little While Yet (Score:4, Informative)
Here's my concern in a nutshell:
Assuming a degenerate workload, with a naive algorithm that never remaps existing data except when it is written, death is swift. Assume a 256 KB flash block. Assume a 4 GB flash device with 2% spare. Assume a 70 MB/sec. transfer rate. Assume TCQ/NCQ, so that you can queue up requests without waiting for the previous request to complete. At 2%, you have about 81.92 MB of spares, or about 328 spares. You have to erase a block containing 256KB at once (one entire flash block). Write random data on a single data block over and over without caching. At 70 MB/sec. divided by a 256 KB block, you can write 280 blocks per second. That comes to about 1.17 seconds to go through all of the spares once. With a 10,000 erasure limit, that means you destroy all the spares in roughly 3.25 hours. At that point, no further writes can occur, because erasing and rewriting a block in place is inherently unsafe. Obviously for a 60 GB disk, multiply the numbers by 15. Even with 100,000-cycle flash, one could kill a drive with a naive algorithm in about three weeks. Okay, so it wouldn't be quite that fast because you'd have to issue write cache flush instructions between each write, but you're in the ballpark.
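The arithmetic is easy to sanity-check. A quick back-of-the-envelope script, recomputing the figures under the stated assumptions (4 GB device, 2% spare, 256 KB blocks, 70 MB/s, 10,000-cycle flash):

```python
# Worst-case lifetime of a naive SSD under a degenerate single-block
# rewrite workload, using the assumptions stated above.
KB, MB, GB = 1024, 1024**2, 1024**3

block_size = 256 * KB
device = 4 * GB
spare_bytes = device * 0.02                 # ~81.92 MB of spare
spares = spare_bytes / block_size           # ~328 spare blocks
blocks_per_sec = (70 * MB) / block_size     # 280 native blocks written/sec
pass_time = spares / blocks_per_sec         # ~1.17 s to cycle through all spares

cycles = 10_000                             # erase limit per flash block
lifetime_hours = pass_time * cycles / 3600
print(round(lifetime_hours, 2))             # a few hours to exhaust the spares
```

Scaling to a 60 GB device (15x the spares) and 100,000-cycle flash (10x the erasures) multiplies that lifetime by 150, which is still only on the order of weeks of continuous worst-case writing.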
On the flip side, with a typical workload, a drive would likely last several years even with such a naive algorithm. This is why I'm concerned. It is quite possible for a company to implement a remarkably naive wear leveling algorithm and mostly get away with it except for a few unlucky people who end up with data loss. We saw this in the HD industry not too long ago with IBM claiming after the fact that their drives were not designed for continuous use. With such a history of reliability corner-cutting from storage vendors, I think there's good reason to expect better transparency from the flash drive vendors about how they are doing wear leveling, particularly if these products are expected to be used in enterprise installations as this drive supposedly is. Fool me once and all that....
I won't even get into the question of how one can possibly achieve anything approaching a 1.1 write amplification rate short of building custom flash chips that allow per-page erasure.... Maybe for certain synthetic workloads, but not for a degenerate workload (e.g. write blocks sequentially with a stride length of the same size as (or larger than) the physical flash block size until you exceed the capacity of the write cache, rinse, repeat).... Otherwise, that seems at least an order of magnitude lower than is plausible. I'd have to see white papers explaining exactly how they're doing this miraculously good wear leveling before I'd trust any low-cycle-count SSDs in anything resembling a production server....
Re:Well, a step in the right direction (Score:1, Informative)
You should probably check out Texas Memory Systems. They sell a number of solutions to your problem.
Re:Commercial uses don't fragment (Score:1, Informative)
There is no concept of "modify a physical block" in flash devices. Most flash devices will write a new block whenever you modify even a byte in it. This is done partly for wear-leveling, and partly because flash can only erase whole groups of blocks at once or write a block; there is no erasing a single block and writing it again in place.
I know I'm not explaining it well. Just google JFFS2 and YAFFS2 and see how they work. I believe there is one good article from IBM.
Ah! here it is - http://www.ibm.com/developerworks/library/l-flash-filesystems/
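The copy-on-write remapping described above can be sketched in a few lines. This is a toy model (the class name, block size, and free-list "erase" are illustrative; real devices erase at much coarser granularity and need garbage collection):

```python
# Toy sketch of flash copy-on-write: a block is never rewritten in
# place. Modifying one byte means copying the old block's data,
# writing the modified copy to a fresh block, and remapping.

BLOCK_SIZE = 16  # toy block size in bytes

class ToyFlash:
    def __init__(self, num_blocks):
        self.phys = [bytearray(BLOCK_SIZE) for _ in range(num_blocks)]
        self.free = list(range(num_blocks))   # erased, unmapped physical blocks
        self.map = {}                         # logical block -> physical block

    def write_byte(self, logical, offset, value):
        old = self.map.get(logical)
        data = bytearray(self.phys[old]) if old is not None else bytearray(BLOCK_SIZE)
        data[offset] = value                  # modify the copy, not the flash
        new = self.free.pop(0)                # allocate a fresh erased block
        self.phys[new][:] = data              # write the modified copy there
        self.map[logical] = new               # remap; the old block is now stale
        if old is not None:
            self.free.append(old)             # reclaimed (after a real erase)

    def read_byte(self, logical, offset):
        return self.phys[self.map[logical]][offset]
```

Note how two successive writes to the same logical block land on different physical blocks; this is exactly what spreads wear and is why JFFS2/YAFFS2-style log-structured filesystems map so naturally onto raw flash.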
-1tsm3