New Middleware Promises Dramatically Higher Speeds, Lower Power Draw For SSDs
mrspoonsi (2955715) writes "A breakthrough has been made in SSD technology that could mean drastic performance increases by overcoming one of the major issues in the memory type. Currently, data cannot be directly overwritten on the NAND chips used in the devices. Files must be written to a clean area of the drive while the old area is erased. This eventually causes fragmented data and lowers the drive's life and performance over time. However, a Japanese team at Chuo University has finally overcome this issue, which is as old as the technology itself. Officially unveiling their work at the 2014 IEEE International Memory Workshop in Taipei, the researchers have written brand new middleware for the drives that controls how data is written to and stored on the device. Their new version utilizes what they call a 'logical block address scrambler,' which effectively prevents data from being written to a new 'page' on the device unless it is absolutely required. Instead, it is placed in a block due to be erased and is consolidated in the next sweep. This means significantly less behind-the-scenes file copying, which results in increased performance and lower power draw."
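The summary is vague about the mechanism, but the scheduling idea it describes - avoid programming a fresh page for every host write and instead park updates until the next consolidation sweep - can be sketched in a few lines. The following Python is a hypothetical illustration of that policy only, not the researchers' middleware; the class and method names are invented.

```python
# Hypothetical sketch (not the researchers' code): a write scheduler that only
# consumes a fresh NAND page when it has to, and otherwise parks the update
# until the next consolidation sweep.

class DeferredWriteScheduler:
    def __init__(self):
        self.pending = {}            # lba -> latest data, waiting for the next sweep
        self.flash = {}              # lba -> data already programmed to a page
        self.pages_programmed = 0    # counts real page programs (the expensive part)

    def write(self, lba, data):
        # Overwrites never touch NAND immediately; the newest copy simply replaces
        # any older pending copy, so repeated rewrites of a hot LBA cost one page
        # program per sweep instead of one per host write.
        self.pending[lba] = data

    def read(self, lba):
        # The freshest copy may still be in the pending pool.
        return self.pending.get(lba, self.flash.get(lba))

    def sweep(self):
        # Consolidation pass: program each surviving LBA exactly once.
        for lba, data in self.pending.items():
            self.flash[lba] = data
            self.pages_programmed += 1
        self.pending.clear()


if __name__ == "__main__":
    sched = DeferredWriteScheduler()
    for i in range(1000):                 # a hot LBA rewritten 1000 times
        sched.write(42, f"version {i}")
    sched.sweep()
    print(sched.read(42), sched.pages_programmed)   # -> "version 999", 1 program
```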
Re:Wear leveling (Score:5, Informative)
SanDisk's CompactFlash memory cards (intended for professional video cameras) seemed to make great SSDs for older DOS systems when fitted with a CF-to-IDE adapter. I can format smaller CF cards to FAT16 (using the DOS FDISK and FORMAT commands, much like installing a raw magnetic drive). With the adapter, the CF card looks and acts like a rotating magnetic hard drive. I had a volley of emails back and forth with SanDisk, and the gist of it was that they did not advertise using their product in this manner and did not want to get involved in support issues, but it should work. They told me they had wear-leveling algorithms in place, which was the driving force behind my emails: I was very concerned the File Allocation Table area would be very short-lived because of the extreme frequency with which it is overwritten. I would not like to give my client something that only works for a couple of months - that goes against everything I stand for.
So, I have a couple of SanDisk memories out there in the field on old DOS systems still running legacy industrial robotics... and no problems yet.
Apparently the SanDisk wear-leveling algorithms are working.
I can tell you this works on some systems, but not on others, and I have yet to figure out why. I can even format and have a perfectly operational CF in the adapter plate so it looks (both physically and supposedly electronically) like a magnetic IDE drive in one system
Re:Wear leveling (Score:5, Informative)
Re: (Score:1)
Re: (Score:2)
I believe Advantech will still happily sell you ISA backplanes. At the same time I put these things together, I had to reverse-engineer and fabricate some old I/O cards which had "unique" (incompatible with readily available cards) interrupt register mappings, also with EAGLE - great software!
I should mention: the MS-DOS system has outlasted three replacement attempts (two were Windows-based applications from the original vendor who sold the MS-DOS system). There's just something completely unbreakable abou
Re: (Score:1)
IBM ThinkPads want both ATA SECURITY and UNLOAD IMMEDIATE. If they don't detect them, they will bitch about it.
Re: (Score:1)
I was wondering why my ThinkPads would not see these.
Not wear leveling. (Score:5, Interesting)
Wear leveling is typically a system by which you write new data to the least-written empty block available, usually with some sort of data-shuffling involved to keep "stagnant" data from preventing wear on otherwise long-occupied sections. It sounds like this is a matter of not erasing the block first: For example if the end of a file has used 60% of a block and is then deleted, the SSD can still use the remaining 40% of the block for something else without first deleting it. Typically, as I understand it, once a block is written that's it until its page is erased - any unused space in a block remains unused for that erase cycle. This technique would allow all the unused bits at the end of the blocks to be reused without an expensive erase cycle, and then when the page is finally ready to be erased all the reused bits on the various blocks can be consolidated to fill a few fresh blocks.
It seems to me this could be a huge advantage for use cases where you have a lot of small writes so that you end up with lots of partially filled blocks. Essentially they've introduced variable-size blocks to the SSD so that one physical block can be reused multiple times before erasure, until all available space has been used. Since erasing is pretty much the slowest and most power-hungry operation on the SSD that translates directly to speed and power-efficiency gains.
Re: (Score:2)
You're incorrect. Writes can only happen at the page size, but there are multiple pages per block. If a block has unwritten pages, you can still write to the remaining pages.
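To make the page/block distinction concrete, here is a minimal model of the usual NAND constraints: pages are programmed one at a time (and in order), while erasure only happens a whole block at a time. The sizes and names below are illustrative, not taken from any particular part.

```python
# Toy NAND block: pages are programmed individually, but erasure is all-or-nothing.
PAGES_PER_BLOCK = 128

class NandBlock:
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK   # None == still erased
        self.next_free = 0                      # pages must be programmed in order on real parts

    def program_page(self, data):
        if self.next_free >= PAGES_PER_BLOCK:
            raise RuntimeError("block full: erase required before further writes")
        self.pages[self.next_free] = data
        self.next_free += 1
        return self.next_free - 1               # page index just written

    def erase(self):
        # The only way to make programmed pages writable again.
        self.pages = [None] * PAGES_PER_BLOCK
        self.next_free = 0
```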
Re: (Score:2)
You are correct, I got the terms switched in line with the confusion in the summary. Reread with that in mind and I think you'll find the rest is in order (i.e. they are rewriting a partially used page)
Re: (Score:2)
Are you certain? That sounds like what they're describing, and certainly the individual bits are capable of it (you're still just setting some of the bits that were reset in the last erase cycle); the rest is just the control hardware/software. It's the reset that needs to be handled specially; so long as you are only setting bits that haven't been altered since the last erase, there shouldn't be a problem. It seems to me that, at the crudest, you could simply read a partially filled block, add extra data to the
Re: (Score:2)
Actually, filesystems are typically *allocated* in 4k increments, but not necessarily *written* in such increments; it's easy enough on a magnetic drive to write only three bytes in the middle of a file, or only the bytes actually used in the last allocation block of each file, though caching systems may obscure that fact.
As for the writing mechanism, you're right, it would likely be a bit more complicated. On further reflection I would suspect that they wouldn't bother reading a block at all, just write the new data
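The "three bytes in the middle of a file" case mentioned above is easy to demonstrate from user space; whether the device ultimately sees a 3-byte write or a read-modify-write of a whole sector is up to the OS cache and the drive. A quick Python illustration:

```python
import os

# Write a small file, then overwrite just three bytes in the middle of it.
with open("demo.bin", "wb") as f:
    f.write(b"A" * 4096)

with open("demo.bin", "r+b") as f:
    f.seek(2000)          # jump into the middle of the file
    f.write(b"XYZ")       # a 3-byte overwrite; the rest of the file is untouched

with open("demo.bin", "rb") as f:
    f.seek(2000)
    print(f.read(3))      # b'XYZ'

os.remove("demo.bin")
```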
Excuse my naiveté (Score:3)
Could the incoming data be written first to either a RAM or SLC cache while the formatting is going on?
Re: (Score:2)
It could, but if you're writing large amounts of data (considerably larger than your write cache) that won't actually help much. It also doesn't change the number of erasures required to get the data written, which is the primary speed and power bottleneck.
This technique is sort of like using that blank corner on a piece of scratch paper before you throw it away - the blank spot was there anyway, and by making a habit of reuse you can significantly reduce the number of fresh sheets of paper (erasures) that
Re: (Score:2)
Re: (Score:2)
Certainly - and if you're typically writing one huge file all at once this will have minimal benefit. But if you're filling the cache with lots and lots of small writes then this technique has potential.
Re: (Score:2)
Only the last block of a file will have a "random" chance of usage.
Sure, BUT... blocks on SSDs can be as large as 16k and even larger. That's a lot of wasted space, especially if you have lots of small files.
The real underlying issue here, though, is the number of lifetime write cycles. Newer SSD technology (MLC in particular) actually made that number smaller, not larger, when it really, really needs to get larger before SSDs will be mature. That's the central reason why all these workarounds are necessary in the first place. And that's what they are: work-arounds.
Maybe
Re: (Score:2)
Not even close to practical. The magnetic disk manufacturers implemented wear leveling back when drives were in the 200MB range. Before that, disks wore out even quicker than flash disks, and I didn't even use swap files then.
There is a huge difference between an unlimited number of writes and an undefined number of writes.
In critical applications, a bad number is better than an undefined one. At least you can calculate a lifetime and design around it.
No, sir. HDDs (at least up until I stopped writing FW for them in 1999) did not have any wear leveling algorithms. In other words, the translation of LBA to physical location on the media (sometimes called Physical Block Address or PBA) is fixed, other than for defective sectors which have been remapped. So if an O.S. wrote to a specific LBA or range of LBAs repeatedly (think paging/swap file or hibernate file), those PBAs would be written to more frequently (or at least at a different rate) than other P
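The fixed LBA-to-physical translation described here can be sketched as a pure formula plus a small defect remap table. The geometry numbers below are invented for illustration; real firmware also deals with zoned recording, slipped sectors, and so on.

```python
# Toy HDD-style translation: LBA -> (cylinder, head, sector) is a fixed formula,
# so a hot LBA always lands on the same physical sector unless it was remapped
# as defective. No wear leveling is involved.
HEADS = 4
SECTORS_PER_TRACK = 63

defect_remap = {123456: 9_999_000}   # defective LBA -> spare-area PBA (made-up values)

def lba_to_physical(lba):
    pba = defect_remap.get(lba, lba)          # only defects change the mapping
    cylinder = pba // (HEADS * SECTORS_PER_TRACK)
    head = (pba // SECTORS_PER_TRACK) % HEADS
    sector = pba % SECTORS_PER_TRACK + 1      # sectors are traditionally 1-based
    return cylinder, head, sector

print(lba_to_physical(0))        # always the same physical location
print(lba_to_physical(123456))   # remapped defective sector
```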
Already being done (Score:3)
Most flash drives have some RAM cache, and most erasing is done as a background task by the on-board firmware of the drive. Part of flash drive reliability has to do with having big enough capacitors on board so that a power failure will allow the drive to write enough data to flash to have a consistent state, at least for its own bookkeeping data on blocks and for exposed data. The enterprise ones usually have enough capacitors to write all data to flash that has been reported to the OS as "we wrote this to the drive
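A rough sketch of the bookkeeping described above, assuming a volatile write cache that is kept small enough to flush on the capacitors' hold-up energy. The budget figure and the API are invented for illustration; real firmware is considerably more involved.

```python
# Hypothetical drive-side write cache: data acknowledged to the host sits in RAM
# until background flushing (or a power-loss event) pushes it to flash.

class CachedDrive:
    def __init__(self, flush_budget_pages=32):
        self.ram_cache = {}              # lba -> data acknowledged but not yet persistent
        self.flash = {}                  # lba -> data safely in NAND
        # How many pages the hold-up capacitors can guarantee to program after
        # power is lost; enterprise parts budget for "everything already acked".
        self.flush_budget_pages = flush_budget_pages

    def host_write(self, lba, data):
        if len(self.ram_cache) >= self.flush_budget_pages:
            self._flush_one()            # keep the dirty set within the power-loss budget
        self.ram_cache[lba] = data
        return "acknowledged"            # the host sees the write as complete here

    def _flush_one(self):
        lba, data = self.ram_cache.popitem()
        self.flash[lba] = data

    def power_loss(self):
        # Runs on capacitor energy: drain whatever is still dirty.
        while self.ram_cache:
            self._flush_one()
```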
crappy journalism as always (Score:3, Informative)
http://techon.nikkeibp.co.jp/english/NEWS_EN/20140522/353388/?SS=imgview_e&FD=48575398&ad_q
They came up with a better scheme for mapping logical to physical addresses. However, the results aren't as good as all the news sources say.
Compared To What? (Score:5, Insightful)
I don't doubt that the researchers have hit on something interesting, but it's hard to make heads or tails of this article without knowing what algorithms they're comparing it to. The major SSD manufacturers - Intel, Sandforce/LSI, and Samsung - all already use some incredibly complex scheduling algorithms to collate writes and handle garbage collection. At first glance this does not sound significantly different than what is already being done. So it would be useful to know just how the researchers' algorithm compares to modern SSD algorithms in both design and performance. TFA as it stands is incredibly vague.
Re: (Score:1)
It was tails
Re: (Score:3)
Fragmentation of data doesn't even affect the speed.
Is this completely true? Because benchmarks show that even SSDs can read larger chunks much faster than small ones. So if a big file exists mostly on adjacent flash cells, it would be faster to read? Of course, operating-system-level defragmentation might not be very useful, because the physical data might be mapped into completely different areas due to wear leveling. Thus the drive would have to perform defragmentation internally.
Re: (Score:2)
Because benchmarks show that even SSDs can read larger chunks much faster than small ones
Well, why shouldn't prefetching and large block reading work on the SSD controller level? I assume that Flash chips are still slower than DRAMs, and the controller has to do some ECC work, not to mention figuring out where to read from (which may also be something that is kept in Flash chips in non-volatile form, unless you want your logical-to-physical mapping completely scrambled when the drive is turned off). So prefetching the data into the controller's memory should help hide latencies even if the Flas
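A toy version of that controller-level read-ahead, assuming sequential access is likely: while the host digests page N, a background fetch pulls page N+1 into controller RAM. The latency figure is arbitrary and the structure is purely illustrative.

```python
import threading
import time

PAGE_READ_LATENCY = 0.0005           # pretend NAND array access time (arbitrary)

def nand_read(page):
    time.sleep(PAGE_READ_LATENCY)    # stand-in for the slow flash array access
    return f"data-{page}"

class PrefetchingController:
    """Serves host reads and speculatively fetches page N+1 in the background."""
    def __init__(self):
        self.cache = {}              # controller DRAM: page -> data
        self.pending = None          # in-flight prefetch thread, if any

    def read(self, page):
        if self.pending is not None:
            self.pending.join()      # make sure any in-flight prefetch has landed
            self.pending = None
        data = self.cache.pop(page, None)
        if data is None:
            data = nand_read(page)   # cache miss: pay the full array latency
        # Kick off the next fetch so a sequential reader finds it already in RAM.
        self.pending = threading.Thread(target=self._prefetch, args=(page + 1,))
        self.pending.start()
        return data

    def _prefetch(self, page):
        self.cache[page] = nand_read(page)

if __name__ == "__main__":
    ctrl = PrefetchingController()
    print([ctrl.read(p) for p in range(4)])   # sequential reads mostly hit the prefetch
```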
Re: (Score:2)
If you're reading a lot of small files, that's a lot of open/read/close commands. If you're reading a big file, that's one open command, multiple sequential read commands, one close command.
And if it's anything like SPI, there's not even multiple read commands, you just keep clocking to read the data sequentially.
Re: (Score:2)
I'm not sure about NAND flash, which is a block device, but in NOR flash sequential reads are faster due to prefetching, where the next memory word is read before the CPU has finished processing the first one. For NAND, I'd imagine you could start caching the next page. Not sure if that's actually done, though.
Re: (Score:2)
If data is fragmented over multiple blocks, it requires multiple reads. But this kind of fragmentation is not as bad as on an HDD, where you had a seek time of 7-8 ms. Matching the block size of the SSD to the block size of the FS is an effective performance enhancement.
Modern SSDs have read limits. Every 10,000 reads or so the data has to be refreshed. The firmware does this silently.
Re: (Score:1)
> Modern SSDs have read limits. Every 10,000 reads or so the data has to be refreshed. The firmware does this silently.
Please provide reference(s). I have never seen any indication of this, or at least there is no read limit for the flash memory itself. You can read from it indefinitely just like static RAM, without "refresh" as required for DRAM.
Re: (Score:2)
Google: flash read disturb [google.com]
The Micron presentation [micron.com] is rather old, but gives a good overview of how Flash works.
Re: (Score:2)
GP is correct about read disturb. NAND vendors will specify a specific policy for a given part, but it is typically N reads to a particular area (i.e., one block, which is 256 or 512 or 1024 pages) before that area has to be erased and rewritten. So even if page 7 in a block is never read, but page 100 is read a lot, the drive will have to rewrite that whole block eventually.
(I work for a NAND controller vendor.)
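A simplified sketch of the policy described above, assuming a per-block read counter and a vendor-specified threshold (the value below is made up): once reads anywhere in a block pass the limit, the whole block is relocated and erased, including pages that were never read.

```python
READ_DISTURB_LIMIT = 10_000          # vendor-specified per part; value here is made up

class Block:
    def __init__(self):
        self.pages = {}              # page index -> data
        self.reads = 0               # reads anywhere in the block count against it

class ReadDisturbManager:
    def __init__(self):
        self.blocks = [Block() for _ in range(4)]
        self.free = [Block() for _ in range(2)]   # spare blocks for relocation

    def read(self, block_idx, page):
        blk = self.blocks[block_idx]
        data = blk.pages.get(page)
        blk.reads += 1
        if blk.reads >= READ_DISTURB_LIMIT:
            # Relocate the whole block, even pages that were never read,
            # then erase the old one and recycle it as a spare.
            fresh = self.free.pop()
            fresh.pages = dict(blk.pages)
            blk.pages.clear()                     # stands in for the erase
            blk.reads = 0
            self.free.append(blk)
            self.blocks[block_idx] = fresh
        return data
```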
Re: (Score:2)
Original (Score:2)
Wear leveling (Score:2)
Re: (Score:3)
Per how big data areas is wear leveling performed in an SSD? Maybe not for each 4kB block...
IIRC the erase/write block size is typically 128KB.
Re: (Score:2)
It was 128 KB for smaller, older drives. For instance, the Samsung 840 EVO series use an erase block size of 2 MB. Some devices even have an 8 MB erase block size. 8 KB page sizes are common now, too, much like how spinning rust moved to 4 KB pages. Using larger pages and blocks allows for denser, cheaper manufacturing.
Re: (Score:2)
Not a word of that is true (Score:1)
Am I the only one that knows that's not remotely true? I don't even know where to start. So the SSD wants to write to location 0x00000032 but it's occupied by old data. First of all, no it isn't. TRIM already took care of that. But let's say you're using the SSD in Windows XP so TRIM doesn't work. So they claim the SSD writes data
Re: (Score:2)
So they claim the SSD writes data to a blank location on the drive temporarily, then erases the original intended location and later moves it back to that location to be contiguous? What's so damn special about that location? Just leave it in the blank location. They claim that causes fragmentation, which has no impact on the performance of an SSD in any way.
This is a useless invention from people who don't know how SSDs work.
You are correct. SSDs don't have a fixed LBA-to-physical arrangement, so host rewrites of an LBA will normally go to a new (erased) NAND location, with the drive updating its internal LBA map automatically (i.e., no need for TRIM of that LBA).
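That remap-on-write behaviour can be sketched with a bare-bones page-level mapping table; real FTLs also batch and journal map updates and garbage-collect the stale pages, none of which is shown here.

```python
# Toy FTL: every host write of an LBA lands in the next erased page, and the
# map is updated; the previously mapped page simply becomes stale garbage.

class SimpleFTL:
    def __init__(self, total_pages=1024):
        self.pages = [None] * total_pages   # physical pages
        self.map = {}                       # lba -> physical page index
        self.stale = set()                  # pages holding superseded data
        self.next_free = 0

    def write(self, lba, data):
        if lba in self.map:
            self.stale.add(self.map[lba])   # old copy is not overwritten, just stale
        self.pages[self.next_free] = data
        self.map[lba] = self.next_free
        self.next_free += 1                 # real drives pull from erased blocks

    def read(self, lba):
        return self.pages[self.map[lba]] if lba in self.map else None
```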
Re: (Score:2)
However, they have gotten "good enough" for most use cases. Though I agree that it is in the "just barely" category. Limited rewrites are their one major problem at this time. If that can be improved, it would be a great advance for us all.
Re: (Score:2)
http://techreport.com/review/2... [techreport.com]
tl;dr - the Samsung 840 series is the only drive to really suffer problems, but that's strictly relatively speaking; it's allocating from reserve capacity, and to reach the point it's at now you'd have to have 150 GB of writes per day for 10 years, which is probably at least an order of magnitude higher than even a heavy standard user. And that's the consumer version -- the Intel SSD, aimed more at production / business environmen
The problem with this article... (Score:2)
...is that in a properly designed SSD, there is no such thing as data fragmentation. You lay out the NAND as a circular log, write to every bit of it once before you overwrite, and maintain a set of pointers that translates LBAs to memory addresses.
Pretty much every SSD vendor out there figured this out a few years ago.
Re: (Score:2)
Re: (Score:2)
That's where overprovisioning and write-amplification come into play. The head NEVER meets the tail - the circular log is larger than the advertised size. E.g., a 120GB (120,000,000,000 byte) SSD would have 128GiB of flash. That differ
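A toy version of the circular-log-plus-overprovisioning layout, for readers who want to see the moving parts: the physical ring is bigger than the advertised capacity, so the head can always clean the oldest block (relocating its still-valid pages) before it catches the tail. Sizes are tiny and illustrative; real drives pick garbage-collection victims far more cleverly.

```python
PAGES_PER_BLOCK = 4
NUM_BLOCKS = 8                                  # physical flash: 32 pages
ADVERTISED_PAGES = 24                           # logical space: leaves 8 pages spare

class CircularLogSSD:
    def __init__(self):
        self.blocks = [[] for _ in range(NUM_BLOCKS)]   # each block: list of (lba, data)
        self.map = {}                           # lba -> (block, slot) of the live copy
        self.head = 0                           # block currently being filled
        self.tail = 0                           # oldest block, next garbage-collection victim
        self.flash_writes = 0                   # host writes plus GC copies (write amplification)

    def _append(self, lba, data):
        while len(self.blocks[self.head]) == PAGES_PER_BLOCK:
            self.head = (self.head + 1) % NUM_BLOCKS
            if self.head == self.tail:
                self._collect_tail()            # reclaim space before the head meets the tail
        blk = self.blocks[self.head]
        blk.append((lba, data))
        self.map[lba] = (self.head, len(blk) - 1)
        self.flash_writes += 1

    def _collect_tail(self):
        victim, self.tail = self.tail, (self.tail + 1) % NUM_BLOCKS
        live = [(lba, data) for slot, (lba, data) in enumerate(self.blocks[victim])
                if self.map.get(lba) == (victim, slot)]
        self.blocks[victim] = []                # the erase
        for lba, data in live:                  # relocate still-valid pages
            self._append(lba, data)

    def write(self, lba, data):
        assert 0 <= lba < ADVERTISED_PAGES      # host only sees the advertised capacity
        self._append(lba, data)

    def read(self, lba):
        if lba not in self.map:
            return None
        blk, slot = self.map[lba]
        return self.blocks[blk][slot][1]

if __name__ == "__main__":
    ssd = CircularLogSSD()
    for i in range(200):                        # keep rewriting a small working set
        ssd.write(i % ADVERTISED_PAGES, f"v{i}")
    print(ssd.read(0), ssd.flash_writes)        # flash_writes = host writes + any GC copies
```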
LWN? (Score:2)
Have I stumbled into a new green themed version of LWN? The comments here are far too insightful and interesting for the usual /. fare. Can't even find the frist post.
bad idea - thrashing directory blocks (Score:2)
I've written drivers for solid state media. There is a cost to finding the "next available block" for incoming data. Often, too, it is necessary to copy the original instance of a media block to merge new data with the old. Then you can toss the old block into a background erase queue, but the copy isn't time-free, either.
Since so-called SmartMedia didn't have any blocks dedicated to the logical-to-physical mapping (it was hidden in a per-physical-block logical ID), a startup scan was also required.
If
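The copy-merge-and-queue pattern described in this comment, sketched under the assumption of block-granularity updates; the background erase is modelled with a worker thread, and all names are invented.

```python
import queue
import threading

# Sketch of the driver pattern described above: merging a partial update means
# copying the old block's contents, and the old block then goes onto a queue
# that a background worker erases off the write path.

erase_queue = queue.Queue()

def erase_worker(flash):
    while True:
        block_no = erase_queue.get()
        if block_no is None:            # shutdown sentinel
            break
        flash[block_no] = None          # the actual (slow) erase happens in the background

def update_block(flash, free_blocks, block_no, offset, new_bytes):
    """Merge new_bytes into a copy of block_no, write the copy to a free block,
    and queue the original for background erasure."""
    merged = bytearray(flash[block_no])
    merged[offset:offset + len(new_bytes)] = new_bytes
    target = free_blocks.pop()          # the copy isn't time-free, but the erase is deferred
    flash[target] = bytes(merged)
    erase_queue.put(block_no)
    return target                       # caller updates its logical-to-physical map

if __name__ == "__main__":
    flash = {0: b"\x00" * 512, 1: None, 2: None}
    free_blocks = [2, 1]
    worker = threading.Thread(target=erase_worker, args=(flash,), daemon=True)
    worker.start()
    new_home = update_block(flash, free_blocks, 0, 100, b"hello")
    erase_queue.put(None)               # stop the worker for this demo
    worker.join()
    print(new_home, flash[new_home][100:105])   # -> 1 b'hello'
```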