New Middleware Promises Dramatically Higher Speeds, Lower Power Draw For SSDs
mrspoonsi (2955715) writes "A breakthrough has been made in SSD technology that could mean drastic performance increases by overcoming one of the major issues in the memory type. Currently, data cannot be directly overwritten on the NAND chips used in the devices. Files must be written to a clean area of the drive while the old area is erased. This eventually causes fragmented data and lowers the drive's life and performance over time. However, a Japanese team at Chuo University has finally overcome this issue, which is as old as the technology itself. Officially unveiling their work at the 2014 IEEE International Memory Workshop in Taipei, the researchers have written brand new middleware for the drives that controls how data is written to and stored on the device. Their new version utilizes what they call a 'logical block address scrambler,' which effectively prevents data from being written to a new 'page' on the device unless it is absolutely required. Instead, it is placed in a block due to be erased and is consolidated in the next sweep. This means significantly less behind-the-scenes file copying, which results in increased performance and lower power draw."
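The summary is vague about the mechanism, but the scheduling idea it describes - avoid programming a fresh page for every host write and instead park updates until the next consolidation sweep - can be sketched in a few lines. The following Python is a hypothetical illustration of that policy only, not the researchers' middleware; the class and method names are invented.

```python
# Hypothetical sketch (not the researchers' code): a write scheduler that only
# consumes a fresh NAND page when it has to, and otherwise parks the update
# until the next consolidation sweep.

class DeferredWriteScheduler:
    def __init__(self):
        self.pending = {}            # lba -> latest data, waiting for the next sweep
        self.flash = {}              # lba -> data already programmed to a page
        self.pages_programmed = 0    # counts real page programs (the expensive part)

    def write(self, lba, data):
        # Overwrites never touch NAND immediately; the newest copy simply replaces
        # any older pending copy, so repeated rewrites of a hot LBA cost one page
        # program per sweep instead of one per host write.
        self.pending[lba] = data

    def read(self, lba):
        # The freshest copy may still be in the pending pool.
        return self.pending.get(lba, self.flash.get(lba))

    def sweep(self):
        # Consolidation pass: program each surviving LBA exactly once.
        for lba, data in self.pending.items():
            self.flash[lba] = data
            self.pages_programmed += 1
        self.pending.clear()


if __name__ == "__main__":
    sched = DeferredWriteScheduler()
    for i in range(1000):                 # a hot LBA rewritten 1000 times
        sched.write(42, f"version {i}")
    sched.sweep()
    print(sched.read(42), sched.pages_programmed)   # -> "version 999", 1 program
```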
Re:Wear leveling (Score:5, Informative)
SanDisk's CompactFlash memory cards (intended for professional video cameras) seemed to make great SSDs for older DOS systems when fitted with a CF-to-IDE adapter. I can format smaller CF cards to FAT16 (using the DOS FDISK and FORMAT commands, much like installing a raw magnetic drive). With the adapter, the CF card looks and acts like a rotating magnetic hard drive. I had a volley of emails back and forth with SanDisk, and the gist of it was that they did not advertise using their product in this manner and did not want to get involved in support issues, but it should work. They told me they had wear-leveling algorithms in place, which was the driving force behind my emails: I was very concerned the File Allocation Table area would be very short-lived because of the extreme frequency with which it is overwritten. I would not like to give my client something that only works for a couple of months - that goes against everything I stand for.
So, I have a couple of SanDisk memories out there in the field on old DOS systems still running legacy industrial robotics... and no problems yet.
Apparently the SanDisk wear-leveling algorithms are working.
I can tell you this works on some systems, but not on others, and I have yet to figure out why. I can even format and have a perfectly operational CF in the adapter plate so it looks (both physically and supposedly electronically) like a magnetic IDE drive in one system
Re:Wear leveling (Score:5, Informative)
Re: (Score:1)
Re: (Score:2)
I believe Advantech will still happily sell you ISA backplanes. At the same time I put these things together, I had to reverse-engineer and fabricate some old I/O cards which had "unique" (incompatible with readily available cards) interrupt register mappings, also with EAGLE - great software!
I should mention: the MS-DOS system has outlasted three replacement attempts (two were Windows-based applications from the original vendor who sold the MS-DOS system). There's just something completely unbreakable abou
Re: (Score:1)
IBM ThinkPads want both ATA SECURITY and UNLOAD IMMEDIATE. If they don't detect them, they will bitch about it.
Re: (Score:1)
I was wondering why my ThinkPads would not see these.
Not wear leveling. (Score:5, Interesting)
Wear leveling is typically a system by which you write new data to the least-written empty block available, usually with some sort of data-shuffling involved to keep "stagnant" data from preventing wear on otherwise long-occupied sections. It sounds like this is a matter of not erasing the block first: For example if the end of a file has used 60% of a block and is then deleted, the SSD can still use the remaining 40% of the block for something else without first deleting it. Typically, as I understand it, once a block is written that's it until its page is erased - any unused space in a block remains unused for that erase cycle. This technique would allow all the unused bits at the end of the blocks to be reused without an expensive erase cycle, and then when the page is finally ready to be erased all the reused bits on the various blocks can be consolidated to fill a few fresh blocks.
It seems to me this could be a huge advantage for use cases where you have a lot of small writes so that you end up with lots of partially filled blocks. Essentially they've introduced variable-size blocks to the SSD so that one physical block can be reused multiple times before erasure, until all available space has been used. Since erasing is pretty much the slowest and most power-hungry operation on the SSD that translates directly to speed and power-efficiency gains.
Re: (Score:2)
You're incorrect. Writes can only happen at the page size, but there are multiple pages per block. If a block has unwritten pages, you can still write to the remaining pages.
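To make the page/block distinction concrete, here is a minimal model of the usual NAND constraints: pages are programmed one at a time (and in order), while erasure only happens a whole block at a time. The sizes and names below are illustrative, not taken from any particular part.

```python
# Toy NAND block: pages are programmed individually, but erasure is all-or-nothing.
PAGES_PER_BLOCK = 128

class NandBlock:
    def __init__(self):
        self.pages = [None] * PAGES_PER_BLOCK   # None == still erased
        self.next_free = 0                      # pages must be programmed in order on real parts

    def program_page(self, data):
        if self.next_free >= PAGES_PER_BLOCK:
            raise RuntimeError("block full: erase required before further writes")
        self.pages[self.next_free] = data
        self.next_free += 1
        return self.next_free - 1               # page index just written

    def erase(self):
        # The only way to make programmed pages writable again.
        self.pages = [None] * PAGES_PER_BLOCK
        self.next_free = 0
```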
Re: (Score:2)
You are correct, I got the terms switched in line with the confusion in the summary. Reread with that in mind and I think you'll find the rest is in order (i.e. they are rewriting a partially used page)
Re: (Score:2)
Are you certain? That sounds like what they're describing, and certainly the individual bits are capable of it (you're still just setting some of the bits that were reset in the last erase cycle); the rest is just the control hardware/software. It's the reset that needs to be handled specially; so long as you are only setting bits that haven't been altered since the last erase, there shouldn't be a problem. It seems to me that, at the crudest, you could simply read a partially filled block, add extra data to the
Re: (Score:2)
Actually, filesystems are typically *allocated* in 4k increments, but not necessarily *written* in such increments; it's easy enough on a magnetic drive to write only three bytes in the middle of a file, or only the bytes actually used in the last allocation block of each file, though caching systems may obscure that fact.
As for the writing mechanism, you're right, it would likely be a bit more complicated. On further reflection I would suspect that they wouldn't bother reading a block at all, just write the new data
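The "three bytes in the middle of a file" case mentioned above is easy to demonstrate from user space; whether the device ultimately sees a 3-byte write or a read-modify-write of a whole sector is up to the OS cache and the drive. A quick Python illustration:

```python
import os

# Write a small file, then overwrite just three bytes in the middle of it.
with open("demo.bin", "wb") as f:
    f.write(b"A" * 4096)

with open("demo.bin", "r+b") as f:
    f.seek(2000)          # jump into the middle of the file
    f.write(b"XYZ")       # a 3-byte overwrite; the rest of the file is untouched

with open("demo.bin", "rb") as f:
    f.seek(2000)
    print(f.read(3))      # b'XYZ'

os.remove("demo.bin")
```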
Excuse my naiveté (Score:3)
Could the incoming data be written first to either a RAM or SLC cache while the formatting is going on?
Re: (Score:2)
It could, but if you're writing large amounts of data (considerably larger than your write cache) that won't actually help much. It also doesn't change the number of erasures required to get the data written, which is the primary speed and power bottleneck.
This technique is sort of like using that blank corner on a piece of scratch paper before you throw it away - the blank spot was there anyway, and by making a habit of reuse you can significantly reduce the number of fresh sheets of paper (erasures) that
Re: (Score:2)
Re: (Score:2)
Certainly - and if you're typically writing one huge file all at once this will have minimal benefit. But if you're filling the cache with lots and lots of small writes then this technique has potential.
Re: (Score:2)
Only the last block of a file will have a "random" chance of usage.
Sure, BUT... blocks on SSDs can be as large as 16k and even larger. That's a lot of wasted space, especially if you have lots of small files.
The real underlying issue here, though, is the number of lifetime write cycles. Newer SSD technology (MLC in particular) actually made that number smaller, not larger, when it really, really needs to get larger before SSDs will be mature. That's the central reason why all these workarounds are necessary in the first place. And that's what they are: work-arounds.
Maybe
Re: (Score:2)
Not even close to practical. The magnetic disk manufacturers implemented wear leveling back when drives were in the 200MB range. Before that, disks wore out even quicker than flash disks, and I didn't even use swap files then.
There is a huge difference between an unlimited number of writes and an undefined number of writes.
In critical applications, a bad number is better than an undefined one. At least you can calculate a lifetime and design around it.
No, sir. HDDs (at least up until I stopped writing FW for them in 1999) did not have any wear leveling algorithms. In other words, the translation of LBA to physical location on the media (sometimes called Physical Block Address or PBA) is fixed, other than for defective sectors which have been remapped. So if an O.S. wrote to a specific LBA or range of LBAs repeatedly (think paging/swap file or hibernate file), those PBAs would be written to more frequently (or at least at a different rate) than other P
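The fixed LBA-to-physical translation described here can be sketched as a pure formula plus a small defect remap table. The geometry numbers below are invented for illustration; real firmware also deals with zoned recording, slipped sectors, and so on.

```python
# Toy HDD-style translation: LBA -> (cylinder, head, sector) is a fixed formula,
# so a hot LBA always lands on the same physical sector unless it was remapped
# as defective. No wear leveling is involved.
HEADS = 4
SECTORS_PER_TRACK = 63

defect_remap = {123456: 9_999_000}   # defective LBA -> spare-area PBA (made-up values)

def lba_to_physical(lba):
    pba = defect_remap.get(lba, lba)          # only defects change the mapping
    cylinder = pba // (HEADS * SECTORS_PER_TRACK)
    head = (pba // SECTORS_PER_TRACK) % HEADS
    sector = pba % SECTORS_PER_TRACK + 1      # sectors are traditionally 1-based
    return cylinder, head, sector

print(lba_to_physical(0))        # always the same physical location
print(lba_to_physical(123456))   # remapped defective sector
```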
Already being done (Score:3)
Most flash drives have some RAM cache, and most erasing is done as a background task by the on-board firmware of the drive. Part of flash drive reliability has to do with having big enough capacitors on board so that a power failure will allow the drive to write enough data to flash to have a consistent state, at least for its own bookkeeping data on blocks and for exposed data. The enterprise ones usually have enough capacitors to write all data to flash that has been reported to the OS as "we wrote this to the drive
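A rough sketch of the bookkeeping described above, assuming a volatile write cache that is kept small enough to flush on the capacitors' hold-up energy. The budget figure and the API are invented for illustration; real firmware is considerably more involved.

```python
# Hypothetical drive-side write cache: data acknowledged to the host sits in RAM
# until background flushing (or a power-loss event) pushes it to flash.

class CachedDrive:
    def __init__(self, flush_budget_pages=32):
        self.ram_cache = {}              # lba -> data acknowledged but not yet persistent
        self.flash = {}                  # lba -> data safely in NAND
        # How many pages the hold-up capacitors can guarantee to program after
        # power is lost; enterprise parts budget for "everything already acked".
        self.flush_budget_pages = flush_budget_pages

    def host_write(self, lba, data):
        if len(self.ram_cache) >= self.flush_budget_pages:
            self._flush_one()            # keep the dirty set within the power-loss budget
        self.ram_cache[lba] = data
        return "acknowledged"            # the host sees the write as complete here

    def _flush_one(self):
        lba, data = self.ram_cache.popitem()
        self.flash[lba] = data

    def power_loss(self):
        # Runs on capacitor energy: drain whatever is still dirty.
        while self.ram_cache:
            self._flush_one()
```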
crappy journalism as always (Score:3, Informative)
http://techon.nikkeibp.co.jp/english/NEWS_EN/20140522/353388/?SS=imgview_e&FD=48575398&ad_q
They came up with a better scheme for mapping logical to physical addresses. However, the results aren't as good as all the news sources say.
Compared To What? (Score:5, Insightful)
I don't doubt that the researchers have hit on something interesting, but it's hard to make heads or tails of this article without knowing what algorithms they're comparing it to. The major SSD manufacturers - Intel, Sandforce/LSI, and Samsung - all already use some incredibly complex scheduling algorithms to collate writes and handle garbage collection. At first glance this does not sound significantly different than what is already being done. So it would be useful to know just how the researchers' algorithm compares to modern SSD algorithms in both design and performance. TFA as it stands is incredibly vague.
Re: (Score:1)
It was tails
Re: (Score:3)
Fragmentation of data doesn't even affect the speed.
Is this completely true? Because benchmarks show that even SSDs can read larger chunks much faster than small ones. So if a big file exists mostly on adjacent flash cells, it would be faster to read? Of course, operating-system-level defragmentation might not be very useful, because the physical data might be mapped into completely different areas due to wear leveling. Thus the drive would have to perform defragmentation internally.
Re: (Score:2)
Because benchmarks show that even SSDs can read larger chunks much faster than small ones
Well, why shouldn't prefetching and large block reading work on the SSD controller level? I assume that Flash chips are still slower than DRAMs, and the controller has to do some ECC work, not to mention figuring out where to read from (which may also be something that is kept in Flash chips in non-volatile form, unless you want your logical-to-physical mapping completely scrambled when the drive is turned off). So prefetching the data into the controller's memory should help hide latencies even if the Flas
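A toy version of that controller-level read-ahead, assuming sequential access is likely: while the host digests page N, a background fetch pulls page N+1 into controller RAM. The latency figure is arbitrary and the structure is purely illustrative.

```python
import threading
import time

PAGE_READ_LATENCY = 0.0005           # pretend NAND array access time (arbitrary)

def nand_read(page):
    time.sleep(PAGE_READ_LATENCY)    # stand-in for the slow flash array access
    return f"data-{page}"

class PrefetchingController:
    """Serves host reads and speculatively fetches page N+1 in the background."""
    def __init__(self):
        self.cache = {}              # controller DRAM: page -> data
        self.pending = None          # in-flight prefetch thread, if any

    def read(self, page):
        if self.pending is not None:
            self.pending.join()      # make sure any in-flight prefetch has landed
            self.pending = None
        data = self.cache.pop(page, None)
        if data is None:
            data = nand_read(page)   # cache miss: pay the full array latency
        # Kick off the next fetch so a sequential reader finds it already in RAM.
        self.pending = threading.Thread(target=self._prefetch, args=(page + 1,))
        self.pending.start()
        return data

    def _prefetch(self, page):
        self.cache[page] = nand_read(page)

if __name__ == "__main__":
    ctrl = PrefetchingController()
    print([ctrl.read(p) for p in range(4)])   # sequential reads mostly hit the prefetch
```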
Re: (Score:2)
If you're reading a lot of small files, that's a lot of open/read/close commands. If you're reading a big file, that's one open command, multiple sequential read commands, one close command.
And if it's anything like SPI, there's not even multiple read commands, you just keep clocking to read the data sequentially.
Re: (Score:2)
I'm not sure about NAND flash, which is a block device, but in NOR flash sequential reads are faster due to prefetching, where the next memory word is read before the CPU has finished processing the first one. For NAND, I'd imagine you could start caching the next page. Not sure if that's actually done, though.
Re: (Score:2)
If data is fragmented over multiple blocks, it requires multiple reads. But this kind of fragmentation is not as bad as on an HDD, where you had a seek time of 7-8 ms. Matching the block size of the SSD to the block size of the FS is an effective performance enhancement.
Modern SSDs have read limits. Every 10,000 reads or so the data has to be refreshed. The firmware does this silently.
Re: (Score:1)
> Modern SSDs have read limits. Every 10,000 reads or so the data has to be refreshed. The firmware does this silently.
Please provide reference(s). I have never seen any indication of this, or at least there is no read limit for the flash memory itself. You can read from it indefinitely just like static RAM, without "refresh" as required for DRAM.
Re: (Score:2)
Google: flash read disturb [google.com]
The Micron presentation [micron.com] is rather old, but gives a good overview of how Flash works.
Re: (Score:2)
GP is correct about read disturb. NAND vendors will specify a specific policy for a given part, but it is typically N reads to a particular area (i.e., one block, which is 256 or 512 or 1024 pages) before that area has to be erased and rewritten. So even if page 7 in a block is never read, but page 100 is read a lot, the drive will have to rewrite that whole block eventually.
(I work for a NAND controller vendor.)
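A simplified sketch of the policy described above, assuming a per-block read counter and a vendor-specified threshold (the value below is made up): once reads anywhere in a block pass the limit, the whole block is relocated and erased, including pages that were never read.

```python
READ_DISTURB_LIMIT = 10_000          # vendor-specified per part; value here is made up

class Block:
    def __init__(self):
        self.pages = {}              # page index -> data
        self.reads = 0               # reads anywhere in the block count against it

class ReadDisturbManager:
    def __init__(self):
        self.blocks = [Block() for _ in range(4)]
        self.free = [Block() for _ in range(2)]   # spare blocks for relocation

    def read(self, block_idx, page):
        blk = self.blocks[block_idx]
        data = blk.pages.get(page)
        blk.reads += 1
        if blk.reads >= READ_DISTURB_LIMIT:
            # Relocate the whole block, even pages that were never read,
            # then erase the old one and recycle it as a spare.
            fresh = self.free.pop()
            fresh.pages = dict(blk.pages)
            blk.pages.clear()                     # stands in for the erase
            blk.reads = 0
            self.free.append(blk)
            self.blocks[block_idx] = fresh
        return data
```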
Re: (Score:2)
Original (Score:2)
Wear leveling (Score:2)
Re: (Score:3)
Per how big data areas is wear leveling performed in an SSD? Maybe not for each 4kB block...
IIRC the erase/write block size is typically 128KB.
Re: (Score:2)
It was 128 KB for smaller, older drives. For instance, the Samsung 840 EVO series use an erase block size of 2 MB. Some devices even have an 8 MB erase block size. 8 KB page sizes are common now, too, much like how spinning rust moved to 4 KB pages. Using larger pages and blocks allows for denser, cheaper manufacturing.
Re: (Score:2)
Not a word of that is true (Score:1)
Am I the only one that knows that's not remotely true? I don't even know where to start. So the SSD wants to write to location 0x00000032 but it's occupied by old data. First of all, no it isn't. TRIM already took care of that. But let's say you're using the SSD in Windows XP so TRIM doesn't work. So they claim the SSD writes data
Re: (Score:2)
So they claim the SSD writes data to a blank location on the drive temporarily, then erases the original intended location and later moves it back to that location to be contiguous? What's so damn special about that location? Just leave it in the blank location. They claim that causes fragmentation, which has no impact on the performance of an SSD in any way.
This is a useless invention from people who don't know how SSDs work.
You are correct. SSDs don't have a fixed LBA-to-physical arrangement, so host rewrites of an LBA will normally go to a new (erased) NAND location, with the drive updating its internal LBA map automatically (i.e., no need for TRIM of that LBA).
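That remap-on-write behaviour can be sketched with a bare-bones page-level mapping table; real FTLs also batch and journal map updates and garbage-collect the stale pages, none of which is shown here.

```python
# Toy FTL: every host write of an LBA lands in the next erased page, and the
# map is updated; the previously mapped page simply becomes stale garbage.

class SimpleFTL:
    def __init__(self, total_pages=1024):
        self.pages = [None] * total_pages   # physical pages
        self.map = {}                       # lba -> physical page index
        self.stale = set()                  # pages holding superseded data
        self.next_free = 0

    def write(self, lba, data):
        if lba in self.map:
            self.stale.add(self.map[lba])   # old copy is not overwritten, just stale
        self.pages[self.next_free] = data
        self.map[lba] = self.next_free
        self.next_free += 1                 # real drives pull from erased blocks

    def read(self, lba):
        return self.pages[self.map[lba]] if lba in self.map else None
```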
Re: (Score:2)
However, they have gotten "good enough" for most use cases. Though I agree that it is in the "just barely" category. Limited rewrites are their one major problem at this time. If that can be improved, it would be a great advance for us all.
Re: (Score:2)
http://techreport.com/review/2... [techreport.com]
tl;dr - the Samsung 840 series is the only drive to really suffer problems, but that's strictly relatively speaking; it's allocating from reserve capacity, and to reach the point it's at now you'd have to have 150 GB of writes per day for 10 years, which is probably at least an order of magnitude higher than even a heavy standard user. And that's the consumer version -- the Intel SSD, aimed more at production / business environmen
The problem with this article... (Score:2)
...is that in a properly designed SSD, there is no such thing as data fragmentation. You lay out the NAND as a circular log, write to every bit of it once before you overwrite, and maintain a set of pointers that translates LBAs to memory addresses.
Pretty much every SSD vendor out there figured this out a few years ago.
Re: (Score:2)
Re: (Score:2)
That's where overprovisioning and write-amplification come into play. The head NEVER meets the tail - the circular log is larger than the advertised size. E.g., a 120GB (120,000,000,000 byte) SSD would have 128GiB of flash. That differ
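A toy version of the circular-log-plus-overprovisioning layout, for readers who want to see the moving parts: the physical ring is bigger than the advertised capacity, so the head can always clean the oldest block (relocating its still-valid pages) before it catches the tail. Sizes are tiny and illustrative; real drives pick garbage-collection victims far more cleverly.

```python
PAGES_PER_BLOCK = 4
NUM_BLOCKS = 8                                  # physical flash: 32 pages
ADVERTISED_PAGES = 24                           # logical space: leaves 8 pages spare

class CircularLogSSD:
    def __init__(self):
        self.blocks = [[] for _ in range(NUM_BLOCKS)]   # each block: list of (lba, data)
        self.map = {}                           # lba -> (block, slot) of the live copy
        self.head = 0                           # block currently being filled
        self.tail = 0                           # oldest block, next garbage-collection victim
        self.flash_writes = 0                   # host writes plus GC copies (write amplification)

    def _append(self, lba, data):
        while len(self.blocks[self.head]) == PAGES_PER_BLOCK:
            self.head = (self.head + 1) % NUM_BLOCKS
            if self.head == self.tail:
                self._collect_tail()            # reclaim space before the head meets the tail
        blk = self.blocks[self.head]
        blk.append((lba, data))
        self.map[lba] = (self.head, len(blk) - 1)
        self.flash_writes += 1

    def _collect_tail(self):
        victim, self.tail = self.tail, (self.tail + 1) % NUM_BLOCKS
        live = [(lba, data) for slot, (lba, data) in enumerate(self.blocks[victim])
                if self.map.get(lba) == (victim, slot)]
        self.blocks[victim] = []                # the erase
        for lba, data in live:                  # relocate still-valid pages
            self._append(lba, data)

    def write(self, lba, data):
        assert 0 <= lba < ADVERTISED_PAGES      # host only sees the advertised capacity
        self._append(lba, data)

    def read(self, lba):
        if lba not in self.map:
            return None
        blk, slot = self.map[lba]
        return self.blocks[blk][slot][1]

if __name__ == "__main__":
    ssd = CircularLogSSD()
    for i in range(200):                        # keep rewriting a small working set
        ssd.write(i % ADVERTISED_PAGES, f"v{i}")
    print(ssd.read(0), ssd.flash_writes)        # flash_writes = host writes + any GC copies
```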
LWN? (Score:2)
Have I stumbled into a new green themed version of LWN? The comments here are far too insightful and interesting for the usual /. fare. Can't even find the frist post.
bad idea - thrashing directory blocks (Score:2)
I've written drivers for solid state media. There is a cost to finding the "next available block" for incoming data. Often, too, it is necessary to copy the original instance of a media block to merge new data with the old. Then you can toss the old block into a background erase queue, but the copy isn't time-free, either.
Since so-called SmartMedia didn't have any blocks dedicated to the logical-to-physical mapping (it was hidden in a per-physical-block logical ID), a startup scan was also required.
If
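The copy-merge-and-queue pattern described in this comment, sketched under the assumption of block-granularity updates; the background erase is modelled with a worker thread, and all names are invented.

```python
import queue
import threading

# Sketch of the driver pattern described above: merging a partial update means
# copying the old block's contents, and the old block then goes onto a queue
# that a background worker erases off the write path.

erase_queue = queue.Queue()

def erase_worker(flash):
    while True:
        block_no = erase_queue.get()
        if block_no is None:            # shutdown sentinel
            break
        flash[block_no] = None          # the actual (slow) erase happens in the background

def update_block(flash, free_blocks, block_no, offset, new_bytes):
    """Merge new_bytes into a copy of block_no, write the copy to a free block,
    and queue the original for background erasure."""
    merged = bytearray(flash[block_no])
    merged[offset:offset + len(new_bytes)] = new_bytes
    target = free_blocks.pop()          # the copy isn't time-free, but the erase is deferred
    flash[target] = bytes(merged)
    erase_queue.put(block_no)
    return target                       # caller updates its logical-to-physical map

if __name__ == "__main__":
    flash = {0: b"\x00" * 512, 1: None, 2: None}
    free_blocks = [2, 1]
    worker = threading.Thread(target=erase_worker, args=(flash,), daemon=True)
    worker.start()
    new_home = update_block(flash, free_blocks, 0, 100, b"hello")
    erase_queue.put(None)               # stop the worker for this demo
    worker.join()
    print(new_home, flash[new_home][100:105])   # -> 1 b'hello'
```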