Vigile writes "Despite the rising excitement over SSDs, some of it has been tempered by performance degradation issues. The promised land is supposed to be the mighty TRIM command — a way for the OS to indicate to the SSD a range of blocks that are no longer needed because of deleted files. Apparently Windows 7 will implement TRIM of some kind but for now you can use a proprietary TRIM tool on a few select SSDs using Indilinx controllers. A new article at PC Perspective evaluates performance on a pair of Indilinx drives as well as the TRIM utility and its efficacy."
I finally got the opportunity to test out SSDs this year. There may be the odd teething problem to get over, but in my mind there is no market in the future for mechanical drives except maybe as cheap low-speed devices for storing non-critical information... in much the same way as tape drives were used a few years ago.
The mechanicals may be able to stay ahead in capacity for a long, long time, even though they obviously have no hope of competing in the performance arena ever again.
Actually, magnetic disks have exponentially increased in capacity since the 50s. In fact, the rate of increase has been higher than the growth of transistor count.
Things have changed a lot in four years. Since 2005, hard drives have only increased from 500 GB to 2 TB---a factor of 4. In that same time, Compact Flash cards increased from 8GB to 128 GB---a factor of 16. Flash density increases are severely outpacing hard drive density increases, and unlike hard drives, flash storage isn't rapidly becoming less reliable as the density increases....
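To put the two growth factors on a per-year footing, here is a quick compound-annual-growth calculation in Python. It assumes the quoted 2005 and 2009 endpoints (four years) and decimal units (2 TB = 2000 GB); under those assumptions hard drives grew about 41% per year while CompactFlash doubled every year.

```python
def annual_growth(factor: float, years: float) -> float:
    """Compound annual growth rate implied by a total growth factor over `years`."""
    return factor ** (1.0 / years) - 1.0

# Endpoints quoted above; 2005-2009 is treated as four years (an assumption),
# and 2 TB is taken as 2000 GB.
hdd = annual_growth(2000 / 500, 4)   # hard drives: 500 GB -> 2 TB
cf = annual_growth(128 / 8, 4)       # CompactFlash: 8 GB -> 128 GB

print(f"Hard drives:  {hdd:.0%} per year")
print(f"CompactFlash: {cf:.0%} per year")
```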
...and unlike hard drives, flash storage isn't rapidly becoming less reliable as the density increases....
I can see the logic behind the argument that hard drives should become more failure prone as the platter density increases, but I've yet to see any data substantiating this point. Your claim that hard drives are rapidly becoming more unreliable makes your statement come off as even more dubious to me.
I don't mean to attack you or come off as a complete dickhole, but do you know of any data to back this up? I'm legitimately curious, as in my (completely anecdotal) experience, magnetic hard drives seem to
Flash drives have a longer MTBF than spinning media, so they last longer. However, a less well-known fact is that flash drives typically have a URE (unrecoverable read error) rate 10-100x worse than today's spinning media. It's getting fixed, but the fellow you're replying to is basically wrong.
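To see why a worse URE rate matters even when MTBF is better, here is a rough Python sketch of the chance of hitting at least one unrecoverable read error while reading 1 TB. It assumes the 1-error-per-10^14-bits figure commonly printed on consumer hard drive datasheets as the spinning-disk baseline, treats errors as independent (a Poisson approximation), and simply applies the 10x and 100x factors quoted above; none of these are measured flash numbers.

```python
import math

def p_at_least_one_ure(bytes_read: float, bits_per_error: float) -> float:
    """Probability of one or more unrecoverable read errors, treating errors
    as independent events (Poisson approximation)."""
    expected_errors = bytes_read * 8 / bits_per_error
    return 1.0 - math.exp(-expected_errors)

TB = 1e12                 # decimal terabyte, in bytes
HDD_BITS_PER_URE = 1e14   # assumed baseline: 1 error per 10^14 bits read

for label, factor in [("spinning disk", 1), ("flash, 10x worse", 10), ("flash, 100x worse", 100)]:
    p = p_at_least_one_ure(1 * TB, HDD_BITS_PER_URE / factor)
    print(f"{label:>18}: {p:.1%} chance of a URE per 1 TB read")
```

Under those assumptions the three cases come out near 8%, 55%, and almost certain, which is why a full-drive read (a RAID rebuild, say) is where URE rates bite.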
I can buy a terabyte hard drive for around $100. For the same hundred dollars, the best SSD I can find is 32GB. On my computer, Steam's cache folder is bigger than 32GB. My music player has a 120GB drive, my DVR has a 350GB drive, and my backup server has a 1.5TB raid. Just because expensive mobile gadgets use expensive solid-state drives does not mean hard drives are dead, dying, or even decaying.
I finally got the opportunity to test out SSDs this year. There may be the odd teething problem to get over, but in my mind there is no market in the future for mechanical drives except maybe as cheap low-speed devices for storing non-critical information... in much the same way as tape drives were used a few years ago.
Well damn, I'll just have to tell our customer that has something like a 30 petabyte TAPE archive that's growing by about a terabyte or more each and every day that they're spending money on something you say is, umm, outdated and these newfangled devices that the next power surge will totally fry are the wave of the future.
Guess what? There's a whole lot more money spent on proven rock-solid technology by large organizations than you apparently know.
Tape and hard drives are going NOWHERE. For a long, lon
If by "proven rock-solid" you mean horrid fidelity and media degradation rates, I'd say you are correct about tapes. If your client has a 30 petabyte tape archive, there is probably some horrible inefficiency going on (I'm sure you probably have little control over the situation; I have similar clients). If they have 30 PB of data on tape that they access regularly, they're wasting a LOT of time just retrieving data, and you should really consider a SAN, NAS, or similar. HDD storage is very cheap these days and LTO4 tapes are pretty pricey, and we all know they have shoddy storage quality to boot. If they don't access it regularly, then it's probably a real waste of money to own, record, and store 30 PB of data. Either way, just the physical storage of that many tapes probably takes about the same square footage as a rack or two (or three) of blade servers with the same storage capacity.
How many multi-petabyte enterprise data centers have you seen running SSDs as their primary storage? None. Yeah, that's what I thought.
Agreed that SSDs have a long way to go on price to compete, but it's simply not true that they're not yet ready for the enterprise datacenter. All the larger enterprise storage array vendors (EMC, HDS, IBM, NetApp) say they're ready, and most are shipping them with decent sales. Despite their price and the "fact" you've so eloquently stated, you'll find them in many Fo
All the larger enterprise storage vendors are full of shit. They say the SSD is "ready" because it's the hottest buzzword in the industry, which always commands huge profit margins.
On one hand, I can use cheap fast 2.0TB SATA drives for 11 cents a gig, or I can go the SSD route with 256 GB drives at $4.00 a gig. That's OEM cost, which means EMC and friends will triple that number to convince your boss these drives are "special".
Yeahhh... give me the one that costs 36 times more, takes up 4 times more spac
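The per-gigabyte figures above, worked through in Python. The prices and the 3x vendor markup are the poster's numbers, not independently checked; the 2 TB comparison is just an illustrative capacity.

```python
hdd_per_gb = 0.11   # 2.0 TB SATA drive, $/GB as quoted
ssd_per_gb = 4.00   # 256 GB SSD, $/GB as quoted (OEM cost)
vendor_markup = 3   # the poster's claim about storage-vendor pricing

ratio = ssd_per_gb / hdd_per_gb
print(f"SSD/HDD cost per GB: {ratio:.1f}x")
print(f"SSD after vendor markup: ${ssd_per_gb * vendor_markup:.2f}/GB")

capacity_gb = 2000  # one 2 TB drive's worth of capacity
print(f"2 TB as HDD: ${capacity_gb * hdd_per_gb:,.0f}; as SSD: ${capacity_gb * ssd_per_gb:,.0f}")
```

That works out to roughly 36x per gigabyte, consistent with the "36 times more" figure, or about $220 versus $8,000 for 2 TB before any markup.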
That's my biggest complaint about them, actually -- these "teething problems" people mention are pretty much directly a result of OSes treating SSDs as though they were spinning magnetic disks.
No, the OS should be able to do its own wear leveling. If you need to pretend it's a hard drive, do it in the BIOS and/or the drivers, not in the silicon -- at least that way, you can upgrade it later when things like this come out.
No way, let's have the firmware do this. The problem with your approach is that the OS won't understand the drive as well as the manufacturer does, so it will always be a sub-optimal solution. Don't tie the manufacturer's hands when it comes to putting intelligence in the drives. For instance, the best way to wipe a disk is via an ATA command [zdnet.com], not through multiple passes of wipes. The manufacturer knows where the heads are and how the drive writes. The SSD situation is somewhat similar.
I'm pretty sure even Windows is smart enough to just use the BIOS-provided access, if it doesn't have a driver. If it does, provide it in a driver.
It would likely also require a different filesystem.
Nope. You can do exactly the same pretend-it's-a-hard-drive approach, until additional filesystems are developed. And there's nothing preventing a third party from developing a filesystem for Windows.
Again, see the BIOS approach. In fact, look at nVidia's fakeraid -- software RAID done with BIOS support and a Windo
You beat me to it, but in the spirit of adding value, there's a good article here [linux-mag.com]. Another benefit of nilfs2 is that you can easily snapshot and undelete files, giving it a sort of built-in "time machine" technology (to use Apple's terminology).
I'm just surprised that none of the Linux distros are talking about it yet. You would think, with Apple and IBM laptops using SSDs today, that there would be some option somewhere. I think everyone is distracted by btrfs.
I've got ext4 on my SSD. It performs very well, but nilfs is a better fit for an SSD. I'll reformat to nilfs sometime within the next few kernel release cycles. Nevertheless, ext4 is just fine--I even have journaling and all the other bells and whistles. I'm not afraid of the additional wear as I suspect the drive will fail by some other technical malfunction long before the flash cells wear out.
By the way, it's true what they say: An SSD is the one component that will provide you with the most notic
It's extremely unfair to link to the print version of that article. Anand put an immense amount of time into that (and everything before it!) and burned quite a few bridges to bring it to light for his readers; there are very, very few reviewers out there who would do that for their readership. The least you could do is offer him and his site _some_ respect.
Disclaimer: I am not a SSD firmware author, although I've spoken to a few.*
As best I can understand it, the problem is that writes are scattered across the physical media by wear-leveling firmware on the disk. In order to do this, the firmware must have a "free list" of sorts that allows it to find an un-worn area for the next write. Of course, this unworn area also needs to not currently be storing any relevant data.
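A minimal Python sketch of the kind of free list described here. It assumes (hypothetically) that the firmware tracks an erase count per physical block and always hands out the least-worn block that holds no live data; the `FreeList` class and the min-heap policy are illustrative, not taken from any real controller.

```python
import heapq

class FreeList:
    """Hypothetical wear-leveling free list: always hand out the free
    physical block with the fewest erase cycles."""

    def __init__(self, num_blocks: int):
        # (erase_count, block_id) pairs; heapq keeps the least-worn on top.
        self._heap = [(0, b) for b in range(num_blocks)]
        heapq.heapify(self._heap)
        self.erase_counts = [0] * num_blocks

    def allocate(self) -> int:
        if not self._heap:
            raise RuntimeError("no free blocks: firmware must reclaim first")
        _, block = heapq.heappop(self._heap)
        return block

    def release(self, block: int) -> None:
        # A block can only be reused after an erase, which adds wear.
        self.erase_counts[block] += 1
        heapq.heappush(self._heap, (self.erase_counts[block], block))

fl = FreeList(4)
for _ in range(10):
    blk = fl.allocate()   # the write goes to the least-worn free block
    fl.release(blk)       # data later obsoleted; block erased and returned
print(fl.erase_counts)
```

Wear stays within one cycle across blocks, but only because `release` is called. The catch described in the next paragraph is that, without help from the OS, the firmware often has no idea a block's data is obsolete, so it never gets released.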
Now, consider an SSD in use. Initially, the whole disk is free, and writes can go anywhere at all. They do, too: you end up with meaningful (at some point) data covering the entirety of the physical memory cells pretty quickly (consider things like logfiles, pagefiles, hibernation data, temporary data, and so forth). Obviously, most of that data doesn't mean anything anymore: to the filesystem, only perhaps 20% of the SSD is actually used after 6 months. However, the SSD's firmware thinks that every single part has now been used.
Obviously, the firmware needs to be able to detect when data on disk gets obsoleted, and can safely be deleted. The problems with this are that this leads to *very* complicated translation tables - logical disk blocks end up having no relation at all to physical ones, and the SSD needs to track those mappings. The other problem is that these tables get *huge* - a typical home system might have between 100K and 1M files on it after a few months of usage, but probably generates and deletes many thousands per day (consider web site cookies, for example - each time they get updated, the wear leveling will write that data to a new portion of the physical storage).
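To give a sense of scale for those translation tables, here is a back-of-the-envelope Python calculation. It assumes a hypothetical page-mapped design with one 4-byte entry per 4 KiB logical page and a 64 GiB drive; under those assumptions the table size depends on capacity rather than on how many files exist.

```python
def mapping_table_bytes(capacity_bytes: int, page_bytes: int = 4096, entry_bytes: int = 4) -> int:
    """Size of a flat logical->physical table with one entry per mapped page."""
    entries = capacity_bytes // page_bytes
    return entries * entry_bytes

GiB = 2**30
size = mapping_table_bytes(64 * GiB)
print(size // 2**20, "MiB of mapping table")  # 64 MiB
```

Sixty-four MiB is a lot of RAM for a drive controller, which is one reason real designs may map at coarser granularity or cache only part of the table.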
Maintaining the tables themselves is possible, and when a logical block gets overwritten to a new physical location, the old location can be freed. The problem is that this freeing comes at the same time that the SSD needs to find a new location to write to, and the only knowledge it has about physical blocks which can safely be overwritten is ones where the logical block has been overwritten already (to a different physical location). Obviously, the lookup into the table of active blocks has to be indexed by logical block, which may make it difficult to locate the oldest "free" physical blocks. This could lead to searches that, even with near-instant IO, result in noticeable slowdowns.
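A toy Python model of the situation just described: an overwrite moves a logical page to a new physical page and leaves the old copy stale, and keeping a reverse (physical to logical) map plus a stale set is one way to avoid searching a logical-indexed table for reclaimable space. `PageMappedFTL` and its fields are hypothetical names for illustration only.

```python
class PageMappedFTL:
    """Toy page-mapped translation layer (hypothetical, for illustration only).

    forward: logical page -> physical page
    owner:   physical page -> logical page, or None once superseded
    stale:   physical pages that hold obsolete data and could be reclaimed
    """

    def __init__(self, num_physical: int):
        self.forward = {}
        self.owner = [None] * num_physical
        self.blank = list(range(num_physical))  # never written since last erase
        self.stale = set()

    def write(self, lpage: int) -> int:
        if not self.blank:
            raise RuntimeError("no blank pages left: garbage collection needed")
        ppage = self.blank.pop(0)
        old = self.forward.get(lpage)
        if old is not None:
            # Without TRIM, an overwrite is the only way the drive learns
            # that the old physical copy is garbage.
            self.owner[old] = None
            self.stale.add(old)
        self.forward[lpage] = ppage
        self.owner[ppage] = lpage
        return ppage

ftl = PageMappedFTL(8)
ftl.write(0)
ftl.write(1)
ftl.write(0)                              # logical page 0 rewritten
print(ftl.forward, sorted(ftl.stale))     # {0: 2, 1: 1} [0]
```

Note that a file the filesystem deletes but never overwrites would leave its physical pages out of `stale` forever in this model.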
Enter the TRIM command, whereby an OS can tell the SSD that a given range of logical blocks (which haven't been overwritten yet) can now be recycled. This command allows the SSD to identify physical blocks which can safely be overwritten, and place them in its physical write queue before the next write command comes down from the disk controller. It's unlikely to be a magic bullet, but should improve things substantially.
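A hedged Python sketch of what TRIM adds: the OS hands the drive a logical range whose data is no longer needed, and the drive can mark the corresponding physical blocks as reclaimable without waiting for them to be overwritten. `TrimAwareMap` and its `trim(start, count)` method are hypothetical names; the real ATA command also describes ranges of logical blocks, but this is not its interface.

```python
class TrimAwareMap:
    """Hypothetical logical->physical map that also accepts TRIM hints."""

    def __init__(self):
        self.forward = {}        # logical block -> physical block
        self.reclaimable = set() # physical blocks safe to erase and reuse

    def write(self, lba: int, pba: int) -> None:
        old = self.forward.get(lba)
        if old is not None:
            self.reclaimable.add(old)   # an overwrite obsoletes the old copy
        self.forward[lba] = pba

    def trim(self, start: int, count: int) -> None:
        # The OS says logical blocks [start, start + count) hold nothing useful,
        # so their physical copies can be recycled right away.
        for lba in range(start, start + count):
            pba = self.forward.pop(lba, None)
            if pba is not None:
                self.reclaimable.add(pba)

m = TrimAwareMap()
for lba in range(10):
    m.write(lba, pba=100 + lba)       # a file occupies logical blocks 0-9
# The file is deleted. Without TRIM the drive still sees 10 live blocks:
print(len(m.forward), len(m.reclaimable))   # 10 0
m.trim(0, 10)                               # filesystem issues TRIM for the range
print(len(m.forward), len(m.reclaimable))   # 0 10
```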
* As stated above, I don't personally write this stuff, so I may be mis-remembering or mis-interpreting. If anybody can explain it better, please do.
Obviously, the firmware needs to be able to detect when data on disk gets obsoleted, and can safely be deleted. The problems with this are that this leads to *very* complicated translation tables - logical disk blocks end up having no relation at all to physical ones, and the SSD needs to track those mappings.
Would it solve the problem (or, I guess I should say, remove the symptoms... for a while, at least) to do a full backup, format the SSD, and restore? I know it's not an ideal solution but rsync or Time
The problem isn't scanning metadata - the problem is relocating data prior to an erase. Flash memory is built into erase blocks that are quite large - 64k to 128k is typical. You can write to smaller regions, but to reset them for another write you have to pave over the neighborhood. However the OS is sending writes at the 512-byte sector granularity. So the drive has to essentially mark the old location for the data as obsolete, and place it somewhere else. When the drive has been used enough, however, it m
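The mismatch between 512-byte OS sectors and 64-128 KiB erase blocks can be quantified with a little arithmetic. This Python snippet (the helper name is made up) computes sectors per erase block and the worst-case write amplification if a single sector update forced a whole erase block to be rewritten; real drives relocate data precisely to avoid paying this worst case on every write.

```python
def sectors_per_erase_block(erase_block_kib: int, sector_bytes: int = 512) -> int:
    """How many OS-sized sectors fit in one flash erase block."""
    return erase_block_kib * 1024 // sector_bytes

for kib in (64, 128):
    n = sectors_per_erase_block(kib)
    # If one changed sector ever forces the whole erase block to be
    # rewritten, the flash programs n sectors for 1 sector of new data.
    print(f"{kib} KiB erase block: {n} sectors, worst-case write amplification {n}x")
```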
In very simple terms (because I'm no expert), it's because of the way SSDs deal with wear leveling and the fact that a single write is non-sequential. When it writes data, it is writing to multiple segments across multiple chips. It is very fast to do it this way, in fact the linear alternative creates heavy wear and is significantly slower (think single chip usb flash drives) than even spinning disk tech, and so this non-sequential write is essential.
Now, to achieve this, each chip is broken down into se
When you delete data, you are deleting little bits spread all over the physical drive.
The biggest problem is that a delete in most filesystems simply marks the space in the index on the device as free. However most filesystems leave the deleted data in place without writing anything over the top until that space is re-allocated. Hard disks don't typically need to know which sectors of the physical storage are actually in use. If you tell an SSD that this block is no longer required it can start erasing the physical chips and add them to the internal free list ready for the next data to be wr
Once upon a time, a technical subject on /. got insightful and informative responses that were modded up. Times change, I guess.
The "fragmentation" that SSDs suffer from doesn't really come from wear leveling, or from having to find some place to write things, but from the following properties:
* Filesystems read and write 4KiB pages.
* An SSD can read 4KiB pages FAST, any number of times, and can write each 4KiB page FAST but only ONCE; to make a page writable again it must erase a whole 512KiB block, which is SLOW.
When the drive is mostly empty, the SSD has no trouble finding blank areas to store the 4KiB writes from the OS (it can even cheat with wear leveling, relocating 4K pages to blank space when the OS rewrites the same block). After some usage, THE WHOLE DRIVE HAS BEEN WRITTEN TO ONCE. From the point of view of the SSD, the whole disk is full. From the point of view of the filesystem, there is unallocated space (for instance, space occupied by files that have been deleted).
At this point, when the OS sends a write command for a specific page, the SSD is forced to do the following:
* read the 512KiB block that contains the page
* erase the block (SLOW)
* modify the page
* write back the 512KiB block
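The four steps above can be simulated for one 512 KiB erase block to show the cost: changing a single 4 KiB page reprograms all 128 pages in the block. `FullDriveBlock` is a hypothetical illustration that ignores the caches and other kludges real drives use.

```python
PAGE = 4 * 1024                  # filesystem page, bytes
BLOCK = 512 * 1024               # erase block, bytes
PAGES_PER_BLOCK = BLOCK // PAGE  # 128

class FullDriveBlock:
    """One erase block on a drive where every page has already been written once."""

    def __init__(self):
        self.pages = [b"old"] * PAGES_PER_BLOCK
        self.erases = 0
        self.bytes_programmed = 0

    def write_page(self, index: int, data: bytes) -> None:
        buffer = list(self.pages)                # 1. read the whole block
        self.pages = [None] * PAGES_PER_BLOCK    # 2. erase it (the slow step)
        self.erases += 1
        buffer[index] = data                     # 3. modify one page in RAM
        self.pages = buffer                      # 4. write the whole block back
        self.bytes_programmed += BLOCK

blk = FullDriveBlock()
blk.write_page(7, b"new")
print(blk.bytes_programmed // PAGE, "pages programmed for 1 page of new data")  # 128
```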
Of course, various kludges/caches are used to limit the issue, but the end result is here: writes are getting slow, and small writes are getting very slow.
The TRIM command tells the SSD that some 4KiB pages can safely be erased (because they contain data from a deleted file, for instance), and the SSD stores a map of the TRIM status of each page.
Then the SSD can do one of the following two things:
* If all the pages of a block are TRIMmed, it can erase the block asynchronously. The next 4KiB write can then be relocated to that block with free space, and so can the 127 writes after it.
* If a write request comes in and there is no space to write data to, the drive can READ/ERASE/MODIFY/WRITE the block with the most TRIMmed space, which will speed up the next few writes. (Of course, you can have more complex algorithms that pre-erase at the cost of additional wear.)
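The two strategies above amount to a simple greedy reclaim policy. A Python sketch, where the function name, the per-block TRIM counts, and the 128-pages-per-block default (512 KiB blocks of 4 KiB pages) are assumptions for illustration:

```python
def pick_block_to_reclaim(trimmed_per_block: list, pages_per_block: int = 128):
    """Return (block index, action) following the two strategies above.

    trimmed_per_block[i] is the number of TRIMmed pages in erase block i.
    """
    # Strategy 1: a fully TRIMmed block can be erased outright, no copying needed.
    for i, trimmed in enumerate(trimmed_per_block):
        if trimmed == pages_per_block:
            return i, "erase"
    # Strategy 2: otherwise read/erase/modify/write the block with the most
    # TRIMmed pages, which frees the most room per slow erase.
    best = max(range(len(trimmed_per_block)), key=lambda i: trimmed_per_block[i])
    return best, "read-erase-modify-write"

print(pick_block_to_reclaim([3, 128, 40]))   # (1, 'erase')
print(pick_block_to_reclaim([3, 90, 40]))    # (1, 'read-erase-modify-write')
```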
Something as simple as deleting the wrong partition becomes an irreversible operation if you do it using a tool that supports TRIM on TRIM-enabled hardware.
Even if you restore the partition table from a backup, you will likely suffer silent file system corruption, which may not even be apparent until it's too late.
If TRIM support is actually implemented on the device, the device is free to 'lose' data on TRIMmed blocks until they are written at least once.
Something as simple as deleting the wrong partition becomes an irreversible operation if you do it using a tool that supports TRIM on TRIM-enabled hardware.
This seems needlessly verbose. Let me shorten it for you:
Deleting a partition should always be considered an irreversible operation.
Hmmm, even shorter:
Don't delete a partition unless you want it to go away forever.
Even if you restore the partition table from a backup, you will likely suffer silent file system corruption, which may not even be apparent until it's too late. If TRIM support is actually implemented on the device, the device is free to 'lose' data on TRIMmed blocks until they are written at least once.
If I understand you correctly, you are suggesting that a disk partitioning tool will use TRIM not only to wipe the partition table itself, but also to nuke the partition data from orbit. And you then point out that it would not be adequate to rewrite just the sectors of the partition table.
If so, then the answer is: you don't just restore the partition table, you restore the whole partition (including data) from backup.
I for one consider much-faster write speeds to be a bigger advantage than possibly being able to reverse a partition deletion.
This will only work if the drive doesn't do background 'scrubbing' to improve future write performance.
Or, even if the drive didn't erase physical Flash cells yet, it could already mangle the mapping between the logical and physical blocks.
In fact, I have a cheap CompactFlash card that does exactly that when you yank power from it while writing - the drive appears completely scrambled (with blocks reordered) when you restore power to it.
What in the world are you talking about? The nice thing about SSDs is that yes, they do fail, but they fail (or are supposed to) in a predictable, non-catastrophic way that leaves the data readable, just not writable. I have had two SSDs and neither has failed despite heavy usage. And I don't think SSDs could be patented: the technology is flash memory, which is everywhere, and even if it were patented, more than one company makes them.
20% under what conditions, and in what timeframe? Over a long enough time period everything has a 100% failure rate.
Normal hard disks also will eventually fail, due to physical wear.
Also if it lasts long enough, at some point, reliability will stop being important. Even if it still works, very few people will want to use a 100MB hard disk from 15 years ago.
Just a small tangential nitpick: we were already more than a factor of ten past that HDD capacity fifteen years ago. The 1GB barrier was broken very early in the Nineties. I still have an HP 1GB SCSI drive from about '91 or '92, IIRC.
As far as failure rates go, I still have ALL of my disk drives from the past 15-20 years, and apart from one or two outright failures, every single one of them still functions at least nominally. I'm still more trusting of magnetic media than I am of either rewritable optical or Flash-based media.
I've never heard of a 20% fail rate for SSDs. I've heard of wear concerns, as each little bit on the drive can only be written a set number of times (it's at 10,000 or so, if I remember correctly). However, thanks to the magic of wear leveling and the large number of separate chips in an SSD, you can fill up your drive completely and you will have only written to each bit exactly once. That means you could theoretically fill your SSD up 10,000 times before you would expect failure. Reality is a bi
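The "fill it up 10,000 times" argument can be turned into a lifetime estimate. This Python sketch uses the 10,000-cycle figure from the comment; the 64 GB capacity, 20 GB of writes per day, and the 10x write-amplification case are hypothetical inputs chosen for illustration, and perfect wear leveling is assumed.

```python
def years_until_worn_out(capacity_gb: float, pe_cycles: int, host_gb_per_day: float,
                         write_amplification: float = 1.0) -> float:
    """Idealized lifetime: total flash write budget / flash writes per day, in years.

    Assumes perfect wear leveling, i.e. every cell is used up evenly.
    """
    budget_gb = capacity_gb * pe_cycles
    flash_gb_per_day = host_gb_per_day * write_amplification
    return budget_gb / flash_gb_per_day / 365

# Hypothetical drive and workload: 64 GB drive, 10,000 cycles, 20 GB written per day.
ideal = years_until_worn_out(64, 10_000, 20)
with_wa = years_until_worn_out(64, 10_000, 20, write_amplification=10)
print(f"ideal: {ideal:.0f} years; with 10x write amplification: {with_wa:.0f} years")
```

That gives roughly 88 years in the ideal case and about 9 years with 10x write amplification, which shows how much the read/erase/modify/write overhead discussed elsewhere in this thread can matter.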
Gamers, gamers, gamers and gamers. Seriously, the early adopters of any technology that is supposed to be faster on the consumer level will be gamers. Considering that most games are Windows-only it makes sense.
...Because the game either has to do a lot of initial loading or has to use the disk. Even copying from the HD to RAM takes time. Sure, today you can pre-load a bunch of stuff, but things still need to be written to and read from the disk every now and then.
Even the best consumer-level SSDs like the Intel X25-M/E use a volatile RAM cache to speed up writes. In fact, with the cache disabled, random write IOPS drops to about 1200, which is only about three or four times as good as a 15k 2.5" drive. The more expensive, truly enterprise SSDs, which don't need a volatile write cache, cost at LEAST $20/GB, so the $/(safe random write IOP) ratio is actually still pretty close, and cheap SATA drives may actually be even with the fast enterprise SSDs on that metric. Granted, this shouldn't be the case in a year, but that's where it is right now. (Also, the performance-per-slot is a lot higher for SSDs, which can translate into different $ and power and space savings.)
Re:But its the future (Score:5, Informative)
See: http://www.scientificamerican.com/article.cfm?id=kryders-law [scientificamerican.com]
Re: (Score:3, Informative)
If you can afford an SSD, why would you waste it on swap? Why not just buy more RAM? If you ever actually need swap, you are doing something wrong.
Re: (Score:3, Interesting)
How about hibernating to disk? If you have plenty of good SSD space, that should be very fast, shouldn't it?
Re: (Score:3, Funny)
Think of it as a luxury expense from the cash we save building our own systems.
What I really want to know (Score:3, Insightful)
Which Linux filesystem works best with SSDs? I don't intend to touch Win7.
Re:What I really want to know (Score:4, Informative)
NILFS - http://www.linux-mag.com/id/7345/ [linux-mag.com]
Re: (Score:3, Informative)
That's because JFFS and such are intended to be used on top of a raw flash device.
SSDs do wear levelling internally already, so a filesystem that tries to do it as well is redundant.
Re: (Score:3, Informative)
Not true at all -- that's why I mentioned a BIOS.
fragmentation? (Score:2)
Can someone explain why fragmentation in the mapping between logical blocks and physical addresses causes performance degradation? Is it an issue with logically sequential reads being spread across multiple pages? A multi-level lookup to perform the mapping?
Re:fragmentation? (Score:5, Informative)
This older Slashdot post linked in the story links to a story that covers that topic very well: http://www.pcper.com/article.php?aid=669 [pcper.com]
Re:fragmentation? (Score:4, Interesting)
Because, basically, flash drives are laid in levels.
When you delete, you simply map logical space as free.
If you go to use that free space later, you find that area, and drop shit into it. It's, I dunno, a 32 KB block of memory called a page. If the page is full of "deleted" files (to the point where you can't fit your new shit), you first need to write over those deleted files, then write your actual data.
If the logical space is full of good files, fragmented with deleted files interspersed, you need to read the page out to memory, reorder the living data, remove the deleted data, and write the full page back.
Think of it as having a notebook.
You can write to 1 page at a time, only.
Page 1 write
Page 2 write
Page 3 write
Page 2 delete
Page 2 write (still space)
Page 2 write (not enough space, write to page 4 instead)
Page 2 delete
Page 2 write (not enough space, no more blank pages, read page 2 and copy non-deleted shit to scratch paper, add new shit to scratch paper, cover page 2 in white out, copy scratch paper to whited-out page 2)
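The notebook analogy can be turned into a loose Python simulation. The `Notebook` class and its policy (use any page with blank space first, otherwise white out the page with the most crossed-out entries after copying its live entries to scratch paper) are illustrative assumptions, not how any particular SSD controller works.

```python
class Notebook:
    """The notebook analogy: each page holds a few entries, you can add to a
    page while it has room, and white-out wipes a whole page at once."""

    def __init__(self, num_pages: int, entries_per_page: int):
        self.pages = [[] for _ in range(num_pages)]   # each entry: [text, deleted?]
        self.capacity = entries_per_page
        self.whiteouts = 0

    def write(self, text: str) -> int:
        # Prefer any page that still has blank space.
        for i, page in enumerate(self.pages):
            if len(page) < self.capacity:
                page.append([text, False])
                return i
        # No blank space anywhere: pick the page with the most deleted entries,
        # copy its live entries to scratch paper, white it out, copy back.
        i = max(range(len(self.pages)),
                key=lambda p: sum(deleted for _, deleted in self.pages[p]))
        scratch = [entry for entry in self.pages[i] if not entry[1]]
        if len(scratch) >= self.capacity:
            raise RuntimeError("notebook genuinely full")
        self.whiteouts += 1
        self.pages[i] = scratch + [[text, False]]
        return i

    def delete(self, text: str) -> None:
        # Deleting only crosses the entry out; the ink stays on the page.
        for page in self.pages:
            for entry in page:
                if entry[0] == text and not entry[1]:
                    entry[1] = True
                    return

nb = Notebook(num_pages=2, entries_per_page=2)
for word in ["a", "b", "c", "d"]:
    nb.write(word)          # fills both pages
nb.delete("b")
nb.write("e")               # no blank space left: white out and rewrite page 0
print(nb.pages, nb.whiteouts)
```

The delete is cheap, but the write after it pays for copying "a" to scratch paper and whiting out the page, which is the slowdown the analogy is about.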
Re:fragmentation? (Score:5, Funny)
If you go to use that free space later, you find that area, and drop shit into it.
Knock it off with all the fancy jargon!
Parent
Re:fragmentation? (Score:5, Informative)
Disclaimer: I am not an SSD firmware author, although I've spoken to a few.*
As best I can understand it, the problem is that writes are scattered across the physical media by the wear-leveling firmware on the disk. To do this, the firmware must have a "free list" of sorts that lets it find an unworn area for the next write. Of course, this unworn area must also not currently be storing any relevant data.
Now, consider an SSD in use. Initially, the whole disk is free, and writes can go anywhere at all. They do, too - you end up with meaningful (at some point) data covering the entirety of the physical memory cells pretty quickly (consider things like logfiles, pagefiles, hibernation data, temporary data, and so forth). Obviously, most of that data doesn't mean anything anymore - to the filesystem, perhaps only 20% of the SSD is actually in use after 6 months. However, the SSD's firmware thinks that every single part has now been used.
Obviously, the firmware needs to be able to detect when data on disk gets obsoleted, and can safely be deleted. The problems with this are that this leads to *very* complicated translation tables - logical disk blocks end up having no relation at all to physical ones, and the SSD needs to track those mappings. The other problem is that these tables get *huge* - a typical home system might have between 100K and 1M files on it after a few months of usage, but probably generates and deletes many thousands per day (consider web site cookies, for example - each time they get updated, the wear leveling will write that data to a new portion of the physical storage).
Maintaining the tables themselves is possible, and when a logical block gets overwritten to a new physical location, the old location can be freed. The problem is that this freeing happens at the same time the SSD needs to find a new location to write to. The only physical blocks it knows can safely be overwritten are the ones whose logical block has already been overwritten (to a different physical location). Obviously, the lookup into the table of active blocks has to be indexed by logical block, which may make it difficult to locate the oldest "free" physical blocks. This could lead to searches that, even with near-instant IO, result in noticeable slowdowns.
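Here is a minimal Python sketch that makes the table-lookup problem concrete. It models a logical-to-physical map indexed only by logical block. The `ToyFTL` name, the "least-worn free block" policy, and counting every write as one unit of wear are assumptions for illustration. In this model, the table learns that a physical block is free only when that block's logical block gets remapped. Finding the free blocks also means scanning the whole table.

```python
class ToyFTL:
    """Hypothetical flash translation layer: a logical -> physical block map."""

    def __init__(self, num_physical: int):
        self.num_physical = num_physical
        self.l2p: dict[int, int] = {}    # indexed by logical block only
        self.wear = [0] * num_physical   # writes seen by each physical block

    def free_physical_blocks(self) -> list[int]:
        # No reverse index: every lookup scans the whole mapping table.
        in_use = set(self.l2p.values())
        return [p for p in range(self.num_physical) if p not in in_use]

    def write(self, logical: int) -> int:
        free = self.free_physical_blocks()
        if not free:
            raise RuntimeError("no physical block known to be free")
        target = min(free, key=lambda p: self.wear[p])  # wear leveling
        self.wear[target] += 1
        # Remapping the logical block is the only event that frees its old
        # physical block; a file deleted by the filesystem never shows up here.
        self.l2p[logical] = target
        return target


if __name__ == "__main__":
    ftl = ToyFTL(num_physical=4)
    print(ftl.write(0), ftl.write(1))  # 0 1
    print(ftl.write(0))                # 2: the overwrite lands somewhere new
    print(ftl.free_physical_blocks())  # [0, 3]: the old copy of block 0 is free
    print(ftl.write(5))                # 3: the least-worn free block
```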
Enter the TRIM command, whereby an OS can tell the SSD that a given range of logical blocks (which haven't been overwritten yet) can now be recycled. This command lets the SSD identify physical blocks that can safely be overwritten and place them in its physical write queue before the next write command comes down from the disk controller. It's unlikely to be a magic bullet, but it should improve things substantially.
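Here is a minimal Python sketch of what TRIM adds. The OS hands back a range of logical blocks, and their physical blocks go straight onto a free queue, without waiting for an overwrite. `TrimAwareMap` and its `trim(start, count)` method are hypothetical names. A real drive's internal bookkeeping will differ.

```python
from collections import deque


class TrimAwareMap:
    """Hypothetical logical -> physical map with a queue of recyclable blocks."""

    def __init__(self, num_physical: int):
        self.l2p: dict[int, int] = {}
        self.free_queue: deque[int] = deque(range(num_physical))

    def write(self, logical: int) -> int:
        if not self.free_queue:
            raise RuntimeError("no physical block known to be free")
        target = self.free_queue.popleft()
        old = self.l2p.get(logical)
        self.l2p[logical] = target
        if old is not None:
            self.free_queue.append(old)  # an overwrite frees the old copy
        return target

    def trim(self, start: int, count: int) -> None:
        """Logical blocks [start, start + count) no longer hold useful data."""
        for logical in range(start, start + count):
            physical = self.l2p.pop(logical, None)
            if physical is not None:
                self.free_queue.append(physical)  # recyclable before any overwrite


if __name__ == "__main__":
    m = TrimAwareMap(num_physical=3)
    print([m.write(n) for n in range(3)])  # [0, 1, 2]: the drive now looks full
    m.trim(0, 2)                           # the OS deleted the file in blocks 0-1
    print(list(m.free_queue))              # [0, 1]
    print(m.write(3))                      # 0
```

Without the `trim` call, the fourth write would fail here. The drive would have no way of knowing that blocks 0 and 1 hold dead data.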
* As stated above, I don't personally write this stuff, so I may be mis-remembering or mis-interpreting. If anybody can explain it better, please do.
Re:fragmentation? (Score:5, Informative)
For a thorough (read: long) primer on SSDs and their long-term performance woes, Anand's overview [anandtech.com] is a must-read.
Re: (Score:3, Interesting)
Obviously, the firmware needs to be able to detect when data on disk gets obsoleted, and can safely be deleted. The problems with this are that this leads to *very* complicated translation tables - logical disk blocks end up having no relation at all to physical ones, and the SSD needs to track those mappings.
Would it solve the problem (or, I guess I should say, remove the symptoms... for a while, at least) to do a full backup, format the SSD, and restore? I know it's not an ideal solution but rsync or Time
Re: (Score:3, Informative)
When the drive has been used enough, however, it m
Re: (Score:3, Insightful)
In very simple terms (because I'm no expert), it's because of the way SSDs handle wear leveling and the fact that a single write is non-sequential. When the drive writes data, it writes to multiple segments across multiple chips. This is very fast. In fact, the linear alternative causes heavy wear and is significantly slower than even spinning-disk tech (think single-chip USB flash drives), so this non-sequential write is essential.
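The striping idea can be sketched in Python as a round-robin split of one write into fixed-size segments across several chips, in the style of RAID 0. The `stripe`/`unstripe` helpers, the segment size, and the strict round-robin order are simplifying assumptions. The commenter does not say how any particular controller lays data out.

```python
def stripe(data: bytes, num_chips: int, segment_size: int) -> list[list[bytes]]:
    """Split one write into segments and deal them out round-robin across chips."""
    chips: list[list[bytes]] = [[] for _ in range(num_chips)]
    for offset in range(0, len(data), segment_size):
        segment_no = offset // segment_size
        # Segments on different chips could be programmed in parallel.
        chips[segment_no % num_chips].append(data[offset:offset + segment_size])
    return chips


def unstripe(chips: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading one segment from each chip in turn."""
    rows = max((len(c) for c in chips), default=0)
    out = []
    for row in range(rows):
        for chip in chips:
            if row < len(chip):
                out.append(chip[row])
    return b"".join(out)


if __name__ == "__main__":
    chips = stripe(b"abcdefghij", num_chips=4, segment_size=2)
    print(chips)            # [[b'ab', b'ij'], [b'cd'], [b'ef'], [b'gh']]
    print(unstripe(chips))  # b'abcdefghij'
```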
Now, to achieve this, each chip is broken down into se
Re: (Score:2)
When you delete data, you are deleting little bits spread all over the physical drive.
The biggest problem is that a delete in most filesystems simply marks the space in the index on the device as free. Most filesystems leave the deleted data in place, without writing anything over it, until that space is re-allocated. Hard disks don't typically need to know which sectors of the physical storage are actually in use, but if you tell an SSD that a block is no longer required, it can start erasing the physical chips and add them to the internal free list ready for the next data to be wr
Re:fragmentation? (Score:4, Interesting)
Very interesting, I assumed the problem was similar to fragmentation and wondered why nobody compared it as such.
Your explanation makes things much clearer: the overall problem is amplified by the additional problem you described.
Would implementing the logic to control the SSD entirely at the OS/FS level be much slower than implementing it in silicon in the SSD itself?
As you said, the OS/FS would then have to be aware of the underlying media ;-)
Re:fragmentation? (Score:5, Informative)
Once upon a time, a technical subject on /. drew insightful and informative responses that were modded up. Times change, I guess.
The "fragmentation" that SSDs suffer from doesn't really come from wear leveling, or from having to find some place to write things, but from the following properties:
* Filesystems read and write 4KiB pages.
* An SSD can read 4KiB pages many times, FAST; can write a 4KiB page FAST, but only ONCE; and can only erase whole 512KiB blocks, SLOWLY.
When the drive is mostly empty, the SSD has no trouble finding blank areas to store the 4KiB writes from the OS (it can even cheat with wear leveling, relocating 4K pages to blank space when the OS rewrites the same block). After some usage, THE WHOLE DRIVE HAS BEEN WRITTEN TO ONCE. From the point of view of the SSD, the whole disk is full. From the point of view of the filesystem, there is unallocated space (for instance, space occupied by files that have been deleted).
At this point, when the OS sends a write command to a specific page, the SSD is forced to do the following:
* read the 512KiB block that contains the page
* erase the block (SLOW)
* modify the page
* write back the 512KiB block
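Here is a rough Python sketch of that read/erase/modify/write cycle, using the 4KiB page and 512KiB block sizes from the comment. The function name is hypothetical. It assumes, as is typical for NAND flash, that erased cells read back as 0xFF. The return value shows why small writes suffer. Rewriting one 4KiB page programs the full 512KiB block, which is 128 times the data the OS asked to write.

```python
PAGE_SIZE = 4 * 1024     # filesystem page, per the comment
BLOCK_SIZE = 512 * 1024  # erase block, per the comment
ERASED = 0xFF            # assumption: erased NAND typically reads back as all 1 bits


def read_erase_modify_write(block: bytearray, page_index: int, new_page: bytes) -> int:
    """Rewrite one page inside a full block; return the number of bytes programmed."""
    if len(block) != BLOCK_SIZE or len(new_page) != PAGE_SIZE:
        raise ValueError("unexpected block or page size")
    if not 0 <= page_index < BLOCK_SIZE // PAGE_SIZE:
        raise IndexError("page index outside the block")
    buffer = bytearray(block)                   # 1. read the whole block into RAM
    block[:] = bytes([ERASED]) * BLOCK_SIZE     # 2. erase the whole block (the slow step)
    start = page_index * PAGE_SIZE
    buffer[start:start + PAGE_SIZE] = new_page  # 3. modify the one page in RAM
    block[:] = buffer                           # 4. write the whole block back
    return len(buffer)


if __name__ == "__main__":
    block = bytearray(BLOCK_SIZE)  # a block full of existing (zero) data
    programmed = read_erase_modify_write(block, 3, b"\x01" * PAGE_SIZE)
    print(programmed // PAGE_SIZE)  # 128: one 4KiB write cost a full 512KiB block
```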
Of course, various kludges/caches are used to limit the issue, but the end result is the same: writes get slow, and small writes get very slow.
The TRIM command tells the SSD that some 4KiB pages can safely be erased (because they contain data from a deleted file, for instance), and the SSD stores a map of the TRIM status of each page.
Then the SSD can do one of the following two things:
* If all the pages of a block are TRIMmed, it can erase the block asynchronously. The next 4KiB write can then be relocated to that block, and so can the next 127 4KiB writes.
* If a write request comes in and there is no space to write the data to, the drive can READ/ERASE/MODIFY/WRITE the block with the most TRIMmed space, which will speed up the next few writes.
(of course, you can have more complex algorithms to pre-erase at the cost of additional wear)
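The two TRIM strategies can be sketched in Python as a per-block record of TRIMmed pages. `TrimMap` and its methods are hypothetical. So is the simple greedy rule of picking the block with the most TRIMmed pages, since real firmware may also weigh wear and other factors. With 128 pages per 512KiB block, a fully TRIMmed block can be erased in the background and then absorbs 128 consecutive 4KiB writes.

```python
PAGES_PER_BLOCK = 512 // 4  # 512KiB erase block / 4KiB page = 128


class TrimMap:
    """Hypothetical per-block record of which pages the OS has TRIMmed."""

    def __init__(self, num_blocks: int):
        self.trimmed: list[set[int]] = [set() for _ in range(num_blocks)]

    def trim(self, block: int, page: int) -> None:
        self.trimmed[block].add(page)

    def erasable_blocks(self) -> list[int]:
        """Blocks whose pages are all TRIMmed: safe to erase in the background."""
        return [b for b, pages in enumerate(self.trimmed)
                if len(pages) == PAGES_PER_BLOCK]

    def best_rmw_candidate(self) -> int:
        """Block with the most TRIMmed pages: rewriting it frees the most room."""
        return max(range(len(self.trimmed)), key=lambda b: len(self.trimmed[b]))

    def erase(self, block: int) -> int:
        """Erase/compact a block; return how many pages are now free for writes."""
        freed = len(self.trimmed[block])
        self.trimmed[block].clear()
        return freed


if __name__ == "__main__":
    tm = TrimMap(num_blocks=3)
    for page in range(PAGES_PER_BLOCK):
        tm.trim(1, page)            # a deleted file covered all of block 1
    tm.trim(2, 0)
    tm.trim(2, 5)
    print(tm.erasable_blocks())     # [1]
    print(tm.erase(1))              # 128
    print(tm.best_rmw_candidate())  # 2
```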
Potential data recovery problems (Score:3, Interesting)
Even if you restore the partition table from a backup, you will likely suffer silent file system corruption, which may even not be apparent until it's too late.
If TRIM support is actually implemented on the device, the device is free to 'lose' data on TRIMmed blocks until they are written at least once.
Re:Potential data recovery problems (Score:4, Insightful)
Something as simple as deleting the wrong partition becomes an irreversible operation if you do it using a tool that supports TRIM on TRIM-enabled hardware.
This seems needlessly verbose. Let me shorten it for you:
Deleting a partition should always be considered an irreversible operation.
Hmmm, even shorter:
Don't delete a partition unless you want it to go away forever.
Even if you restore the partition table from a backup, you will likely suffer silent file system corruption, which may even not be apparent until it's too late.
If TRIM support is actually implemented on the device, the device is free to 'lose' data on TRIMmed blocks until they are written at least once.
If I understand you correctly, you are suggesting that a disk partitioning tool will use TRIM not only to wipe the partition table itself, but also to nuke the partition data from orbit. And you then point out that it would not be adequate to rewrite just the sectors of the partition table.
If so, then the answer is: you don't just restore the partition table, you restore the whole partition (including data) from backup.
I for one consider much-faster write speeds to be a bigger advantage than possibly being able to reverse a partition deletion.
steveha
Re: (Score:2)
Or, even if the drive hasn't erased the physical Flash cells yet, it could already have mangled the mapping between the logical and physical blocks.
In fact, I have a cheap CompactFlash card that does exactly that when you yank power from it while writing - the drive appears completely scrambled (with blocks reordered) when you restore power to it.
SSDs?! (Score:2)
Despite the rising excitement over SSDs, some of it has been tempered by performance degradation issues.
Who cares how they perform. All they have to do is sit there and scare away enemy fleets.
Re:High failure rate (Score:4, Informative)
Re:High failure rate (Score:5, Insightful)
That's a statistic that doesn't make any sense.
20% under what conditions, and in what timeframe? Over a long enough time period everything has a 100% failure rate.
Normal hard disks also will eventually fail, due to physical wear.
Also, if it lasts long enough, reliability will at some point stop being important. Even if it still works, very few people will want to use a 100MB hard disk from 15 years ago.
Re:High failure rate (Score:5, Insightful)
Just a small tangential nitpick: we were already more than a factor of ten past that HDD capacity fifteen years ago. The 1GB barrier was broken very early in the Nineties. I still have an HP 1GB SCSI drive from about '91 or '92, IIRC.
As far as failure rates go, I still have ALL of my disk drives from the past 15-20 years (one or two failed outright), and all the rest still function at least nominally. I still trust magnetic media more than either rewritable optical or Flash-based media.
Re: (Score:3, Insightful)
I've never heard of a 20% failure rate for SSDs. I've heard of wear concerns, as each little bit on the drive can only be written a set number of times (it's 10,000 or so, if I remember correctly). However, thanks to the magic of wear leveling and the large number of separate chips in an SSD, you can fill up your drive completely and you will have written to each bit only once. That means you could theoretically fill your SSD up 10,000 times before you would expect failure. Reality is a bi
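This back-of-the-envelope reasoning can be turned into a small calculation. Assuming perfect wear leveling, a drive can absorb roughly its capacity times its P/E cycles in writes. The 64 GB capacity and 20 GB/day write rate below are made-up example figures. The optional write-amplification factor is also an added assumption: read/erase/modify/write cycles make the flash write more data than the host sends.

```python
def years_until_worn(capacity_gb: float, pe_cycles: int,
                     host_gb_per_day: float,
                     write_amplification: float = 1.0) -> float:
    """Idealized lifetime estimate, assuming perfect wear leveling."""
    total_gb_writable = capacity_gb * pe_cycles
    flash_gb_per_day = host_gb_per_day * write_amplification
    return total_gb_writable / flash_gb_per_day / 365


if __name__ == "__main__":
    # Hypothetical example figures: 64 GB drive, 10,000 cycles, 20 GB/day.
    print(round(years_until_worn(64, 10_000, 20), 1))       # 87.7
    print(round(years_until_worn(64, 10_000, 20, 4.0), 1))  # 21.9
```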
Re: (Score:3, Insightful)
Re: (Score:2)
Meh. Just stick 50GB worth of RAM in there. No one's filling a Blu-ray disc with 3D environment data yet, are they?
Why should a game even hit the disk except when saving, these days?
Re: (Score:2)
Re: (Score:2)
> Gamers, gamers, gamers and gamers.
Steve, is that you ?
Re: (Score:3, Insightful)
Because someone got paid to do it. You don't think /. editors work for free do you?
Re:Why Windows 7 in the summary? (Score:4, Interesting)
Even the best consumer-level SSDs, like the Intel X25-M/E, use a volatile RAM cache to speed up writes. In fact, with the cache disabled, random write IOPS drop to about 1200, which is only about three or four times as good as a 15k 2.5" drive. The more expensive, truly enterprise SSDs, which don't need a volatile write cache, cost at LEAST $20/GB. So the $/(safe random write IOP) ratio is actually still pretty close, and cheap SATA drives may even be on par with the fast enterprise SSDs on that metric. Granted, this shouldn't be the case in a year, but that's where things stand right now. (Also, performance per slot is a lot higher for SSDs, which can translate into cost, power, and space savings.)