Samsung's SSD 840 Read Performance Degradation Explained
An anonymous reader writes with a link to TechSpot's explanation of the reason behind the performance degradation noticed by many purchasers of certain models of Samsung SSD (the 840 and 840 EVO), and an evaluation of the firmware updates that the firm has released to address it. From the piece, a mixed but positive opinion of the second and latest of these firmware releases: "It’s not an elegant fix, and it’s also a fix that will degrade the lifetime of the NAND since the total number of writes it’s meant to withstand is limited. But as we have witnessed in Tech Report’s extensive durability test, there is a ton of headroom in how NAND is rated, so in my opinion this is not a problem. Heck, the Samsung 840 even outlasted two MLC drives.
As of writing, the new firmware has only been released for the 2.5” model of the SSD 840 EVO, so users of the 840 EVO mSATA model still have to be patient. It should also be noted that the new firmware does not seem to work well with the TRIM implementation in Linux, as this user shared how file system corruption occurs if discard is enabled."
Samsung should fix it for 840 owners also (Score:2)
Re: (Score:2)
Thank you for that on-topic demonstration of the file system corruption problem.
No Degradation Here (Score:1)
Re: (Score:2)
Return it and get a different drive.
My older drive is worse. (Score:2)
I tried HD Tune on my older SATA 2 Samsung 470, and its HD Tune graph looks worse, with a minimum of 40MB/s. The drive's still in my system, but I don't use it anymore.
Re: (Score:2)
To keep the performance up to the advertised values (Score:1)
So, what they're going to do to keep the performance up to the advertised values is rewrite all the data at least once per 2 months. That's actually a good chunk of the rated TB written for SSDs, whose low values in that regard are only acceptable if you take into account that most data isn't continuously rewritten. If I had one of those SSDs, I'd consider returning them for a refund. They are obviously defective, as in significantly deviating from their advertised performance, either in speed or longevity.
Re:To keep the performance up to the advertised values (Score:5, Informative)
Say you bought the 1Tb version (which is big for an SSD).
In this case, you rewrite it six times a year. That's 6Tb of write. That's...well... pathetic compared to the write expectancy of an SSD anyway.
So, actually, it's not that big a deal at all.
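For anyone who wants to sanity-check that, here's the back-of-envelope version; the endurance rating below is an assumed ballpark figure for illustration, not an official Samsung spec:

    #include <stdio.h>

    /* Rough headroom check for the forced-refresh scheme discussed above.
     * The 300 TBW endurance rating is an assumed ballpark, not Samsung's
     * official number. */
    int main(void)
    {
        double drive_tb    = 1.0;    /* 1TB drive, as in the example above */
        double rewrites_yr = 6.0;    /* one full rewrite every 2 months */
        double rated_tbw   = 300.0;  /* assumed endurance in TB written */

        double overhead_tb_yr = drive_tb * rewrites_yr;
        printf("refresh overhead: %.0f TB/year -> %.0f years to burn the "
               "rating on refreshes alone\n",
               overhead_tb_yr, rated_tbw / overhead_tb_yr);
        return 0;
    }

At 6TB of refresh writes per year against a rating in the hundreds of TB, the refresh scheme alone would take decades to wear the drive out.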
Re: (Score:2)
I still have one, but it requires 1.21 gigawatts.
Re: (Score:3)
Say you bought the 1Tb version (which is big for an SSD).
1Tb was big maybe five years ago for an SSD, but these days pretty much every SSD-based laptop has that much or more. If you want to go big now, 1TB is around where you'd look. Or 1TiB. But not 1Tb.
Re: (Score:2)
Wow. I'm surprised you take a crappy drive like that so easily. I bet you didn't pay the serious money they actually charged for it? And it's no cheapo brand either; rather the self-declared Mercedes.
And the prescribed maintenance is rewriting all data twice a month because it tends to be forgetful. What a fantastic piece of hardware! That is what we ought to discuss here; rather than if a brute-force workaround ... just works. Sure it does!
Re: (Score:2)
And the prescribed maintenance is rewriting all data twice a month because it tends to be forgetful. What a fantastic piece of hardware! That is what we ought to discuss here; rather than if a brute-force workaround ... just works. Sure it does!
No. The prescribed maintenance is to rewrite old data that hasn't been written to for 2 months, because it tends to be slow to read otherwise. No one has reported data loss as a result of this.
Notice also that all the reporting so far uses artificial benchmarks to demonstrate the problem. In normal use, you'd be unlikely to ever notice, unless you're copying big old data files from one location to another.
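For illustration, here's a toy sketch of what such an age-based background refresh could look like; every name and the threshold handling are invented for this sketch, this is not Samsung's actual firmware code:

    #include <stdint.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define REFRESH_AGE_SECS (60u * 60 * 24 * 60)  /* roughly 2 months */
    #define NUM_BLOCKS 8u                          /* tiny for the demo */

    /* Invented per-block metadata for this sketch. */
    struct block_meta {
        uint32_t last_write_time;  /* seconds since power-on epoch */
        bool     in_use;
    };

    static struct block_meta blocks[NUM_BLOCKS];

    /* Stand-ins for real firmware services. */
    static uint32_t now(void) { return 60u * 60 * 24 * 90; /* day 90 */ }
    static void rewrite_block(uint32_t i)
    {
        printf("refreshing block %u\n", i);
        blocks[i].last_write_time = now();  /* data is fresh again */
    }

    /* Background task: rewrite any block whose data is old enough that
     * its read speed would have started to degrade. */
    static void refresh_scan(void)
    {
        for (uint32_t i = 0; i < NUM_BLOCKS; i++)
            if (blocks[i].in_use &&
                now() - blocks[i].last_write_time > REFRESH_AGE_SECS)
                rewrite_block(i);
    }

    int main(void)
    {
        blocks[3].in_use = true;            /* written on day 0: stale */
        blocks[5].in_use = true;
        blocks[5].last_write_time = now();  /* just written: skipped */
        refresh_scan();
        return 0;
    }

Only data that has actually gone stale gets rewritten, which is why the extra write load stays modest.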
The new firmware misreports its supported features (Score:5, Informative)
Apparently the new firmware now advertises that it supports queued TRIM, when in fact it doesn't: https://bugs.launchpad.net/ubu... [launchpad.net]
The old firmware did not advertise queued TRIM support, so it wasn't an issue. The solution is a kernel patch to blacklist queued TRIM on all Samsung 8xx drives.
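For the curious, this is roughly what those entries look like in the libata blacklist table (drivers/ata/libata-core.c); treat the exact patterns and flags below as a paraphrase from memory, not a verbatim copy of the patch:

    /* Excerpt-style sketch of ata_device_blacklist[] entries; each field
     * is model string, firmware revision, and horkage flags. */
    static const struct ata_blacklist_entry ata_device_blacklist[] = {
        /* ... */
        /* devices that advertise queued TRIM but mangle data when it's used */
        { "Micron_M5[15]0*",  "MU01", ATA_HORKAGE_NO_NCQ_TRIM, },
        { "Crucial_CT*M550*", "MU01", ATA_HORKAGE_NO_NCQ_TRIM, },
        { "Samsung SSD 8*",   NULL,   ATA_HORKAGE_NO_NCQ_TRIM, },
        /* ... */
    };

The wildcard on the Samsung entry is what sweeps in the whole 8xx family rather than chasing individual firmware revisions.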
Re:The new firmware misreports its supported features (Score:4, Informative)
Judging from the kernel's blacklist, queued TRIM does cause issues on quite a few SSDs; the 840 EVO just did not announce that capability before the firmware update and now does (but cannot actually do it), hence the problem. The kernel folks are now adding all Samsung 8xx drives to the blacklist, which will likely fix the issue. As Windows is traditionally behind in these areas, it may just not use queued TRIM at all. I do hope that Samsung adds (more) Linux test systems to their qualification process now, though. Side note: the 850 PRO is apparently affected as well, but the kernel already blacklists it.
The conclusion here is that apparently getting SSD firmware right is a pretty big challenge and that SSD technology is still evolving. Also, not enough testing on Linux and likely not enough really smart people on the SSD firmware team. It is a learning process, and the prevalent "clueless MBA bean-counter plague" will likely affect Samsung as well, just as it does any other large company.
Re: (Score:2)
As Windows is traditionally behind in these areas, it may just not use queued TRIM at all.
That is my suspicion as well. This is sooo often the issue with all sorts of firmware. Linux tries to implement cutting-edge features by spec, but in practice the hardware makers just write everything against whatever Windows does. The hardware might announce ACPI 5.0 support or queued TRIM support, but the actual codepaths are stubs that don't work properly. When such hardware is used under Linux, unexpected error states can be encountered. Sad trombone.
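A caricature of that pattern, entirely invented for illustration:

    #include <stdint.h>

    #define CAP_QUEUED_TRIM (1u << 5)  /* invented capability bit */

    /* The identify data proudly advertises the feature... */
    static uint32_t get_capabilities(void)
    {
        return CAP_QUEUED_TRIM /* | other caps */;
    }

    /* ...but the codepath behind it was only ever exercised by a Windows
     * driver that never issues the command, so it is effectively a stub. */
    static int handle_queued_trim(const void *cmd, uint32_t len)
    {
        (void)cmd;
        (void)len;
        return -1;  /* or worse: silently corrupts the mapping tables */
    }

    int main(void)
    {
        if (get_capabilities() & CAP_QUEUED_TRIM)
            return handle_queued_trim(0, 0) ? 1 : 0;  /* fails under Linux */
        return 0;
    }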
Re: (Score:2)
That, and apparently many hardware makers do not test against Linux, or do not do it well. As reverse-engineering what Windows does is not really feasible, we will see things like this from time to time. I hope this will result in more and more public shaming, as that is the only way to make the bean-counters realize they are not investing enough effort.
The only good thing is that the Linux kernel folks are pretty fast to respond.
Re: (Score:2)
Given that the kernel 4.0.2 blacklist lists Micron, Crucial and Samsung as broken for this feature, you may be on to something. I also completely agree on the stupid. It is likely more complex though: they may have tried to produce the firmware cheaper than is actually possible, by having only semi-competent (cheaper) people on it and replacing technological insight with "processes". Would not surprise me one bit. The MBA-bean-counter plague is strong in the industry these days. Save a penny, lose a billion is the name of the game.
toy anyway (Score:2, Interesting)
Re:toy anyway (Score:5, Informative)
Most drives sold in the world today don't have power loss protection either.
If it matters to you, you put that stuff in the controller, not the drive.
Re:toy anyway (Score:5, Insightful)
I think the concern is that this would somehow dramatically increase the probability of data loss caused by a power cut while the drive appears to be inactive. After all, it randomly rewrites flash blocks. However, in practice, this should not be an issue.
Presumably, their firmware never erases and rewrites a flash page in place. And presumably it does not write the log entry that causes the drive to look for those blocks in the new location until after the page has been fully written. Assuming they do, in fact, follow those rules, then a power interruption during a block clone should never result in loss of any data, because the data still exists in the old page, which will not be invalidated in favor of the replacement copy until that replacement copy is fully written. If they aren't doing that, then they are incompetent, and their drives should never be trusted with cat pictures, much less valuable data.
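To make that ordering rule concrete, here's a minimal sketch; all the names are invented stand-ins for firmware primitives, not anything Samsung actually ships:

    #include <stdint.h>
    #include <stdio.h>

    /* Stubbed NAND primitives, invented for this sketch. */
    static void nand_copy_page(uint32_t src, uint32_t dst)
    { printf("copy page %u -> %u\n", src, dst); }
    static void log_append_remap(uint32_t lblock, uint32_t page)
    { printf("remap logical block %u -> page %u\n", lblock, page); }
    static void nand_invalidate(uint32_t page)
    { printf("invalidate page %u\n", page); }

    /* Power-safe relocation: the remap log entry is written only after
     * the copy completes, so a power cut at any point leaves either the
     * old copy or the new copy fully valid - never neither. */
    static void relocate_block(uint32_t lblock, uint32_t oldp, uint32_t newp)
    {
        nand_copy_page(oldp, newp);      /* old page still authoritative */
        log_append_remap(lblock, newp);  /* the atomic commit point */
        nand_invalidate(oldp);           /* stale copy may now be GC'd */
    }

    int main(void)
    {
        relocate_block(42, 100, 200);
        return 0;
    }

The log append is the single commit point; everything before it is invisible to the mapping tables, and everything after it only reclaims space.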
Re: (Score:1)
Re: (Score:2)
I would certainly hope so. It's what the rest of the industry does for every write. Loss of data that isn't being actively modified should be almost impossible if the people writing the firmware are even halfway competent (ignoring unlucky filesystem metadata changes aborted halfway through).
Re:toy anyway (Score:5, Interesting)
You don't need power protection if you take precautions and design your system around the fact that power can be removed at any time.
Some SSDs cheaped out and didn't have power protection AND used features that require it (usually to get better performance - obviously if you're not worried about power dropping abruptly, you can avoid writing code to protect against it). It's no surprise those SSDs corrupted data liberally because their translation tables got corrupted.
But there are plenty of SSDs that aren't concerned with performance. In fact, if you're on SATA, performance is no longer important as they're all maxing out the SATA bus. If you're wondering why they all seem to sit at 540MB/sec reads and writes, that's because SATA is now the bottleneck. So now you can spend lots of time working on power-fail-safe firmware - because if you're stuck at 540MB/sec, it doesn't matter what performance tweaks you do; you're stuck there. If you can do 1GB/sec internally, and power-safe code loses 40% of that, you do it. 1GB/sec is wasted on SATA, but you can save a few bucks by not needing power backup parts. A 40% loss brings you down to only 600MB/sec, which is still faster than SATA.
It's why next-gen SSDs are going PCIe - 540MB/sec is nothing compared to the 1.5GB/sec you can find on Apple's machines.
Power-fail protection is nice to have, but given everything's limited by SATA more than anything else, it's currently optional. For PCIe SSDs, you'd expect power-fail components because you need the performance.
Ironically, the faster the drive is, the less hold-up you need, since you just have to dump your tables to storage ASAP: if you can do 1.5GB/sec writes and your tables are 500MB in size, you only need power for about a third of a second, while if your media speed was only 500MB/sec, you'd need power for a whole second.
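Putting rough numbers on that (all figures below are assumptions for illustration, not any particular drive's specs):

    #include <stdio.h>

    int main(void)
    {
        /* Assumed figures, per the scenario above. */
        double table_bytes = 500e6;   /* mapping tables: 500MB */
        double write_bps   = 1.5e9;   /* internal write speed: 1.5GB/sec */
        double holdup_s    = table_bytes / write_bps;

        /* Energy the cap bank must supply while flushing: E = P * t.
         * A capacitor discharging from V0 down to Vmin yields
         * E = C * (V0^2 - Vmin^2) / 2, so solve for C. */
        double power_w = 3.0;         /* assumed power draw while flushing */
        double v0 = 5.0, vmin = 3.0;  /* assumed usable voltage window */
        double cap_f = 2.0 * power_w * holdup_s / (v0 * v0 - vmin * vmin);

        printf("hold-up needed: %.2fs, capacitance: %.0fmF\n",
               holdup_s, cap_f * 1e3);
        return 0;
    }

Under these assumptions you land around 125mF, which is why a cluster of ordinary capacitors on the board is enough for the faster drives.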
Re: (Score:2)
No modern drive except some very expensive data-center models has it. In addition, basically all modern drives lie about having flushed data to permanent storage. This includes spinning disks. The Linux file-system folks found out about it a while ago. It is not a big deal if the OS filesystem driver knows about it. Also, if you have a good PSU, it will keep power up for 20ms or more after the respective signal line signals loss of power, and then drop power slowly enough that the drive has time to find out and react.
Re: (Score:2)
Actually, more and more SSDs today *DO* have power loss protection. Take it apart... if you see a bunch of capacitors on the mainboard all bunched together with no obvious purpose, it's probably to keep power good long enough to finish writing out metadata. Cheaper to use a lot of normal caps than to use thin-film high-capacity caps.
-Matt
Just like Floppies then? (Score:4, Informative)
Goodness me.
We had this problem back in the 1970s/1980s with floppy disks!
When the disc drive writes to a part of the surface of the disc, it energises the magnetic particles to saturation. The ability of the material to keep so much of its original pulse of energy was called the clipping level of the floppy.
As soon as the area is energised, it starts to decay (hopefully) very slowly over time. Once it decays below 40% of the energy originally given, that bit is lost and data is lost.
Some cheap floppies had a nastily low clipping level as they'd use cheap materials; over, say, a year, areas that hadn't been rewritten would decay and those bits became unreadable. You lost that data. We had various programs that would take the 8", 5 1/4" and 3.5" floppies and read then rewrite the entire disk to ensure that the disc was refreshed. As I worked at Ferranti on UK space and military work, I could ask the likes of TDK, Maxell, etc. what the clipping levels of their discs were. Something the public didn't have access to.
If the sellers wouldn't say, we simply didn't buy from them. Let me tell you, most low-to-medium priced suppliers hid this value and we didn't do business. Glad to say the top disc suppliers were always open, and we'd buy discs with an over-80% clipping level!
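A modern equivalent of those refresh programs is only a few lines; this sketch just reads every block of a file or device and writes it back in place (illustrative only - a real tool would want raw device access, proper error handling, and some thought about wear):

    #include <stdio.h>
    #include <stdlib.h>

    #define BLOCK_SIZE (1024 * 1024)

    /* Read each block and write the same bytes back, forcing the medium
     * to re-record the data at full strength. */
    static int refresh(const char *path)
    {
        FILE *f = fopen(path, "r+b");
        if (!f) { perror("fopen"); return -1; }

        char *buf = malloc(BLOCK_SIZE);
        if (!buf) { fclose(f); return -1; }

        long off = 0;
        size_t n;
        while ((n = fread(buf, 1, BLOCK_SIZE, f)) > 0) {
            fseek(f, off, SEEK_SET);  /* back to the block just read */
            fwrite(buf, 1, n, f);     /* write it back unchanged */
            fflush(f);
            off += (long)n;
            fseek(f, off, SEEK_SET);  /* reposition for the next read */
        }

        free(buf);
        fclose(f);
        return 0;
    }

    int main(int argc, char **argv)
    {
        return (argc > 1 && refresh(argv[1]) == 0) ? 0 : 1;
    }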
With these MLC SSDs the voltage level is very important. It'll decay over time; nothing can stop it.
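To put that 40% threshold into a formula, assume (purely for illustration) a simple exponential decay of the stored signal with some time constant tau; real magnetic and charge decay is messier:

    % Assumed model: the stored signal decays exponentially,
    \[
      S(t) = S_0 \, e^{-t/\tau}
    \]
    % and the bit is lost once it falls below the 40% clipping threshold:
    \[
      S(t_{\mathrm{loss}}) = 0.4\,S_0
      \;\Rightarrow\;
      t_{\mathrm{loss}} = \tau \ln\frac{1}{0.4} = \tau \ln 2.5 \approx 0.92\,\tau
    \]

Under that model the usable life scales directly with the material's decay constant, which is exactly why the clipping level of the media mattered so much.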
Re: (Score:2)
Well indeed. The demodulation problem for decaying signals is not going away. It is basic physics.
Dear Samsung (Score:2)
Strange Linux behavior (Score:4, Insightful)
We have a bunch of shared build PCs with 840 EVO SSDs in them, and we noticed strange problems when we build off the SSD (rather than, say, the HDD).
Basically what would happen is that after a little while (a month), all of a sudden during the build the entire system would practically lock up - all the cores are pegged at 99% system time, and system responsiveness collapses - it can literally take minutes for the system to respond. It makes a little headway, but compilation speed drops (since 99% of every core is spent in the kernel). It's completely fine off the hard drive, and if it wasn't for this loss in speed, the SSD would be faster (right now, because it pauses for a few minutes every 15 or so, the HDD is faster).
It's completely unusual - I did try to analyze the kernel, which appeared to have all the cores tied up in ext4 spinlocks. Not sure if it's a result of the tables being slow and blocking or what.
It happens under high load - I normally set the build to 12 threads (on 8 cores!). Thought at first it was Linux collapsing under the weight of the build, but it's actually the SSD. Building off the hard drive on the same system is no problem at all.
Re: (Score:1)
Re: (Score:2)
This is not related to the SSD. If your CPUs are pegged, then it's something outside the disk driver. If it's system time, it could be two things: (1) either the compilers are getting into a system-call loop of some sort, or (2) the filesystem is doing something that is causing lock contention or other problems.
Well, it could be more than two things, but it is highly unlikely to be the SSD.
One thing I've noticed with fast storage devices is that sometimes housekeeping operations by filesystems can stall out
Re: (Score:2)
Let's put everything we know in a row and think (Score:1)
0. About half a year ago, read performance of data that hasn't been rewritten for a long time is observed to degrade.
1. Samsung acknowledges this fact.
The workaround is obvious to anyone with common sense: rewrite data with old write dates. This would be no fix, but it is the workaround of choice.
2. Samsung offers a fix some months later. Immediate observation: this fix doesn't fix the problem.
Samsung asks for more time and promises a fix.
3. Some more months later, Samsung provides the 'fix', which isn't one, but rather the almost obvious workaround.
Head Up Ass (Score:2)