Intel Confirms Data Corruption Bug, Halts New SSDs (137 comments)

Posted by ScuttleMonkey on Monday August 03, @05:00PM
from the solid-state-death dept.
storage
hardware
CWmike writes "Intel has confirmed that its new consumer-class X25-M and X18-M solid-state disk drives (SSDs) suffer from data corruption issues and said it has pulled back shipments to resellers. The X25-M (2.5-inch) and X18-M (1.8-inch) SSDs are based on a joint venture with Micron and use that company's 34-nanometer lithography technology. That process allows for a denser, higher capacity product that brings with it a lower price tag than Intel's previous offerings, which were based on 50-nanometer lithography technology. Intel says the data corruption problem occurs only if a user sets up a BIOS password on the 34-nanometer SSD, then disables or changes the password and reboots the computer. When that happens, the SSD becomes inoperable and the data on it is irretrievable. This is not the first time Intel's X25-M and X18-M SSDs have suffered from firmware bugs. The company's first generation of drives suffered from fragmentation issues resulting in performance degradation over time. Intel issued a firmware upgrade as a fix."

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Test before you ship (Score:5, Interesting)

    by alain94040 (785132) * on Monday August 03, @05:00PM (#28933773) Homepage

    Maybe they should have used HW/SW co-verification (like Seagate in that study [eve-team.com] - an example of how a storage company tests their firmware).

    For you software developers out there who enjoy free debuggers, you should know that we, hardware designers, also have our own debuggers. Except they are a little bit more expensive (think $500,000+) and can be quite bulky. But they are the only way to really test firmware before taping-out a chip.

    • by Anonymous Coward on Monday August 03, @05:33PM (#28934069)

      As a professional FW tester, I can say 1) firmware can be tested easier than the hardware verification the parent is talking about, and 2) Parent is confusing HW verification with firmware verification. Don't confuse HW verification with Firmware, and don't confuse Software testing with hardware verification. They are vastly different than each other, and have their own set of tools and methods (try sitting through a STAR East or STAR West seminar as a FW tester - it is a total waste of time).

      I can (and do) test firmware on buggy hardware all day long - it's not an issue.

      • What the hell is that supposed to mean? Data structures and algorithms don't suddenly work differently when they're synthesized from Verilog instead of compiled from C.

        • Re: (Score:3, Insightful)

          by Anonymous Coward

          Yes, they do.

          C doesn't have voltage or current leaks.

          • So how do voltage and current leaks invalidate the universal mathematical principles of computer science? I'm beginning to get a whiff of anti-intellectualism here.

            • by Movi (1005625) on Monday August 03, @06:01PM (#28934317)
              Because suddenly your code becomes time-based, e.g. it matters WHEN x=0 becomes x=1, and what's in between.

              Believe me, this kicks you in the balls really hard. I still remember the frustration on my Altera course, where in simulation everything worked fine, but once flashed onto an FPGA everything went to shit.
                • by Obfuscant (592200) on Monday August 03, @09:11PM (#28935671)
                  ... just like if you're getting segfaults when writing C you're doing something wrong. It doesn't mean that the process is any less deterministic.

                  If you are getting segfaults in C you usually ASSUME that the processor you are running on is acting in a deterministic manner and ASSUME the problem is your code.

                  The DIFFERENCE is that SOMETIMES the underlying hardware is not acting deterministically because it is a PHYSICAL system that has physical flaws or imperfections. Like leakage currents that are JUST a tiny bit too much, or depend on the state of the neighboring circuit or the temperature.

                  In other words, I've written C code that had "segfaults" and it wasn't the fault of the C code, it was memory issues that resulted in problems. And I've written C code that suffered from a buggy compiler, too. I've also written code that "misread" about 1% of the characters typed in at the terminal, and it wasn't the code that was at fault, it was the UART.

                  I don't know anything about the source of Intel's problem, but I will say that they can send me ALL of the "defective" SSDs and I'll give them a home where I promise never to set a password on the disk or change it after I do.

            • Re:Typical redditor (Score:4, Informative)

              by NP-Incomplete (1566707) on Monday August 03, @09:09PM (#28935659)
              On a chip, adding 2^256-1 and 1 may not equal 2^256 when:
              1. Your destination register is 256 bits.
              2. Your destination register is in a different clock domain.
              3. Your timing constraints are wrong.
              4. Your power grid cannot support switching 256 registers.

              Functional simulations will only catch #1.
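Case #1 in the list above is the only one a purely functional model can reproduce, and it can be sketched in a few lines. This is a minimal illustration, not anyone's actual test bench: the helper name `add_in_register` is hypothetical, and the register is modeled simply as truncation to a fixed bit width. Cases #2 through #4 (clock domains, timing, power) depend on physical behavior that this kind of model does not capture.

```python
WIDTH = 256  # register width in bits, taken from the example in the comment


def add_in_register(a: int, b: int, width: int = WIDTH) -> int:
    """Add two values and keep only the low `width` bits, as a fixed-width register would."""
    return (a + b) & ((1 << width) - 1)


exact = (2**256 - 1) + 1                 # unbounded integer arithmetic
stored = add_in_register(2**256 - 1, 1)  # what a 256-bit destination register keeps

print(exact == 2**256)  # True
print(stored)           # 0, because the carry out of bit 255 is lost
```

Widening the destination by one bit (`width=257`) keeps the carry, which is why a functional simulation is enough to catch this particular mistake.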

          • C doesn't have voltage or current leaks.

            But C has a lot more loops and pointers, which makes verification a lot harder (I work on a static analysis tool for C/C++, and it's also very expensive ;) )

  • Ugh... summary.... (Score:3, Informative)

    by blahplusplus (757119) on Monday August 03, @05:07PM (#28933833)

    "The company's first generation of drives suffered from fragmentation issues resulting in performance degradation over time."

    The performance degradation in the Intel X-25 is not because of a "firmware bug". All SSDs will suffer performance degradation whether or not their writing/wear leveling algorithms have been updated via firmware.

    • by ShadowRangerRIT (1301549) on Monday August 03, @05:26PM (#28933997)
      The X25-M's initial firmware was unusually bad; the degradation was more rapid and more severe than necessary. Thus, they issued a firmware update [slashdot.org]. The results were quite impressive [pcper.com]. It not only reduced the perf degradation, but it seems to have made writes faster across the board.
      • Re: (Score:3, Informative)

        "Although Intel acknowledged that all of its SSDs will suffer from reduced performance because of significant fragmentation, the type of write levels needed to reproduce PC Perspective's results aren't likely for everyday users, whether they're running Windows and Apple's Mac OS X. Even so, it still released the firmware upgrade to slow fragmentation."

      • Re: (Score:3, Informative)

        The X25-M's initial firmware was unusually bad; the degradation was more rapid and more severe than necessary.

        Unusually bad? More severe than necessary? Not really. Even with this supposed degradation, it was ages ahead of any and all competition. What was unusually bad was the complete lack of understanding from all reviewers who did not understand basic principles and the fundamental limitations of flash and yet rushed ahead with their articles. Those poor fools expected that the driver should behave lik

          • Re: (Score:3, Informative)

            Don't answer with generalities unless you have really thought about it. Wear-leveling is based on heuristics; since it cannot predict the future it is always possible to construct scenarios which will hit the worst case. And if it is theoretically possible, it will happen.

            Imagine a simple case and go from there. Imagine a flash with 5 blocks total, 4 sectors per block. The logical capacity is 16 sectors; the extra block is over-provisioned for wear leveling, etc. Now, imagine that you have the 4 blocks neat

        • by magarity (164372) on Monday August 03, @09:31PM (#28935823)

          What makes Intel a hard disk vendor anyway? Yes, it is still a disk
           
          It's solid state mass storage, where "solid state" = "chips". A disk is a spinning thingy which is completely different. Since Intel designs and makes chips (see: "solid state" = "chips"), it is a perfect choice for them to make solid state mass storage devices out of chips.
           
          Have I mentioned the relationship between "solid state" and "chips" and how "solid state" != "spinning thingy"?

            • Re: (Score:3, Insightful)

              I'm not even going to put a foot in the flamefest over whether solid state mass storage is cost effective or even reliable - I only ask you don't call some chips that just sit there a spinning disk.

              More than 1/4 of Intel's revenue comes from miscellaneous chips [intc.com] and motherboards that are not microprocessors. That's a big enough chunk it shouldn't be dismissed as not a core business.

              That this bug made it through means someone should be looking for employment and indicates a problem with managem

    • by Krizdo4 (938901) on Monday August 03, @05:30PM (#28934035) Homepage

      The performance degradation in the Intel X-25 is not because of a "firmware bug".

      Bugs can cause slowdowns, too

      Though it's highly regarded, Intel's X25-M SSD had a firmware bug that adjusted the priorities of random and sequential writes, leading to a major fragmentation problem that dropped throughput dramatically. The issue was originally uncovered by PC Perspective after two months of testing. Those tests showed that write speeds dropped from 80MB/sec. to 30MB/sec. over time, and read speeds dropped from 250MB/sec. to 60MB/sec. for some large block writes.

      https://www.techworld.com.au/article/302571/ssd_performance_--_slowdown_inevitable?pp=3 [techworld.com.au]

      Before firmware update

      the result suggested a write speed of 30 MB/sec.

      http://pcper.com/article.php?aid=691&type=expert&pid=3 [pcper.com]

      After firmware update

      After composing myself, I did the same file copy I had tried earlier. 76 MB/sec.

      http://pcper.com/article.php?aid=691&type=expert&pid=4 [pcper.com]

      Not a firmware bug?
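To put the quoted figures side by side, the arithmetic can be sketched as below. The numbers are the ones quoted in this comment; the helper name `percent_drop` is hypothetical and only there for illustration.

```python
def percent_drop(before: float, after: float) -> float:
    """Return the percentage decrease going from `before` to `after`."""
    return 100 * (before - after) / before


# Figures quoted above, in MB/sec.
write_drop = percent_drop(80, 30)   # write speed, 80 -> 30
read_drop = percent_drop(250, 60)   # read speed, 250 -> 60
recovery = 76 / 30                  # file copy after vs. before the firmware update

print(f"write drop: {write_drop:.1f}%")   # write drop: 62.5%
print(f"read drop: {read_drop:.1f}%")     # read drop: 76.0%
print(f"post-update speedup: {recovery:.2f}x")  # post-update speedup: 2.53x
```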

      • "Although Intel acknowledged that all of its SSDs will suffer from reduced performance because of significant fragmentation, the type of write levels needed to reproduce PC Perspective's results aren't likely for everyday users, whether they're running Windows and Apple's Mac OS X. Even so, it still released the firmware upgrade to slow fragmentation."

    • The performance degradation in the Intel X-25 is not because of a "firmware bug". All SSDs will suffer performance degradation whether or not their writing/wear leveling algorithms have been updated via firmware.

      You're missing several months of history here.

      Back in February, several reviewers found that the X-25s performance fell to unacceptably low levels after a certain threshold was reached. Intel tried to deny it, saying that you'd never see the problem in real-world usage and only benchmarking the dis

  • I find it difficult to really blame them for this. What an obscure bug. How do you QA yourself out of something like that without spending more than you did on your R&D?

    • Take a down payment from your users as a massive discount in exchange for them signing on as "beta testers." If they actually find something wrong with the product and send in problem reports, then they get to keep the product for just that initial down payment so long as they keep sending in problem reports. If no problem reports come in within a given amount of time, bill them the remainder of the MSRP on the product, since it obviously works well enough for their uses.

      I guarantee you something like this

    • Re:Well.. (Score:5, Interesting)

      by rickb928 (945187) on Monday August 03, @05:37PM (#28934113) Homepage

      Is this a cost issue, or a thoroughness issue?

      No, we don't catch every possible scenario here, either, but we do try very, very hard. Knowing one of the coders in Intel's RAID drivers groups, he goes crazy with stuff. And he just writes Linux drivers. I do not envy him - this past year, every bug he's had to fix has been caused by someone else's code. Someone not writing Intel drivers. And he gets slammed every time for bad testing, as if he can test all the rest of the kernel team's stuff, not to mention every fly-by-night Chinese hardware outfit. They're killing him.

      I can't even say 'ext4', he just goes insane. Though he chuckles when I whisper 'ReiserFS', and opens another beer.

      I'm glad I'm not in that line of work.

      • Not really. Making an educated guess from the article, it appears that this is implemented as a simple controller lockout, not actual encryption. So swapping the flash memory into another controller (common computer forensics technique) would bypass it. Most people paranoid enough to want a disk password want real encryption, so using Intel's half-measure of a password is likely a very uncommon scenario. The tests are probably very simple; glossing over this case would be an understandable, if not desir
  • Intel says the data corruption problem occurs only if a user sets up a BIOS password on the 34-nanometer SSD, then disables or changes the password and reboots the computer.

    What does this mean? The flash drive has a password lockout? If so:

    (1) a password lockout on a drive is daft, you want to encrypt the drive or not worry about it.

    (2) flash drives trashing themselves irretrievably when you reboot after enabling passwords? I've seen that before, on "secure" thumb drives. I won't have anything to do wit tha

    • a password lockout on a drive is daft, you want to encrypt the drive or not worry about it.

      That's hardly daft. I have motion-detecting laser bullets in my foyer, but I still lock my front door.
      • by Grishnakh (216268) on Monday August 03, @05:56PM (#28934273)

        Why bother though? If someone breaks in, you'll have to fix or replace your front door, even though the motion-detecting laser robots zapped him. If you just leave your front door unlocked instead, an intruder can just walk in, and the laser-wielding robots can zap him, and then automatically dispose of the body for you too. This way, the intruder won't cause any damage.

      • If only because your homeowners insurance requires it for them to maintain full liability?

      • You have things backwards.

        Encrypting the drive ... in software, mind, not in the drive's firmware ... is like locking the front door. It's simple, safe, works for all doors, and is unlikely to break down and kill someone accidentally.

        Putting a password on the drive is like leaving the door unlocked and booby-trapped.

  • Feature Not A Bug (Score:5, Insightful)

    by mrbene (1380531) on Monday August 03, @05:32PM (#28934061)

    Seriously, I'd say this is in the By Design bucket. For the security conscious - set a BIOS password. If the (feds/aliens/wife/others) remove the password, all access to the data is gone.

    Brilliant! Secure!

    Mind you, not being able to change my password once every other day might hinder my current security model.

  • by owlstead (636356) on Monday August 03, @05:37PM (#28934107)

    Although this bug should have been caught faster it seems that it is possible to update the firmware without any data loss (fortunately I have put it in a laptop, power outages are no problem). I've looked at the Intel site and the flash utility seems to be simply bootable from CD - if this is the last bug I'll be a very happy punter indeed.

    My 80 GB G2 SSD replaced a not too fast laptop drive. I'm now trying Linux, but I'll try Vista as well just for fun - I'll just write my 80 GB to an external drive using Gparted. These drives come highly recommended even if they would slow down to 50% of performance (which, it seems, they don't). I unzipped Eclipse to it and JavaDoc and I could see that the archiver that unzipped the .zip has some performance issues reading the index. It took longer than the unzipping and gunzipping and untarring (the Eclipse gunzipping/untarring took less than 2 seconds - yikes). The only thing faster is the tmpfs in RAM which I used to compile the OpenJDK in on my "workstation". Starting Eclipse takes now less time on my laptop than on my workstation even though it got twice as few cycles.

    • My 80 GB G2 SSD replaced a not too fast laptop drive. I'm now trying Linux, but I'll try Vista as well just for fun - I'll just write my 80 GB to an external drive using Gparted. These drives come highly recommended even if they would slow down to 50% of performance (which, it seems, they don't). I unzipped Eclipse to it and JavaDoc and I could see that the archiver that unzipped the .zip has some performance issues reading the index. It took longer than the unzipping and gunzipping and untarring (the Eclipse gunzipping/untarring took less than 2 seconds - yikes). The only thing faster is the tmpfs in RAM which I used to compile the OpenJDK in on my "workstation". Starting Eclipse takes now less time on my laptop than on my workstation even though it got twice as few cycles.

      This just goes to show how much of a bottleneck traditional hard drives really are. A friend of mine recently replaced his hard drive with an SSD and I was extremely impressed by the speed improvement - so much so that I'm considering installing an SSD on my computer as the primary drive and using the second as backup space.

      • If your OS is small enough, skip the Flash SSD altogether, get 4GB of cheap DDR memory and a Gigabyte i-RAM SSD and put your OS on that.

  • by neokushan (932374) on Monday August 03, @05:44PM (#28934171)

    "How to recover lost/corrupted files from an SSD?"

  • by JakFrost (139885) on Monday August 03, @07:33PM (#28935055)

    This really seems like a very unlikely event to happen to trigger the problem on these drives for most users since from my experience personally and professionally I have yet to see anyone actually know about BIOS passwords, much less about setting a password on the drive using the ATA secure drive password feature. I am surprised that this was even caught by anyone unless it was a complete fluke or there actually are people or companies using this type of a feature for security. (I don't doubt it but haven't seen it.)

    I personally own the first generation Intel X25-M 80GB MLC SSD [intel.com] and I have written about it extensively here on this forum. I heard rumors that the new TRIM feature support will only be made available to this second generation release of these drives but I'm unsure if that is really true. I'm on the fence right now whether I should sell my G1 drive and upgrade to the G2 because of this feature and also for a little more performance, because I am so happy with the performance of this drive and also the current 8820 firmware that solved the fragmentation and slowdown issues.

    If you are one of those folks who is still sitting around not knowing what to do when all of this Solid State Disk news is coming out all over then you are missing the biggest paradigm shift to computing performance since the transfer from floppy disks to hard drives.

    With the upcoming re-release of this newly affordable drive around 2009-08-28 from Intel X25-M G2 80GB MLC SSD at ~$230 USD from Newegg [newegg.com] or ZipZoomFly [zipzoomfly.com] you should definitely dig down deep and save a little money to buy one of these drives and experience the biggest performance and responsiveness improvement to your computer that you could imagine.

    If you need a primer on the SSD revolution check out my previous post regarding the articles to read.

    Required Reading for Solid State Drives (Score 1) [slashdot.org]

  • by AllynM (600515) * on Monday August 03, @08:29PM (#28935419) Journal

    Welcome to 2 weeks ago:

    http://www.pcper.com/comments.php?nid=7544 [pcper.com]

    Allyn Malventano
    Storage Editor, PC Perspective

  • by Allnighterking (74212) on Tuesday August 04, @12:42AM (#28936949) Homepage
    I've seen this before, though I can't remember where. In that case what was happening was that when you changed or removed the password it would corrupt the password file and lock you out. The first time (no password exists yet, so you set the original one) it does the following:
    • read the password
    • hash the password
    • write the hash to the data file

    Now the problem came in that case when you wanted to change/delete the password. It would use a second subroutine to do the following:

    • read the old password
    • get the old password hash and use it to check if the user knows the correct password
    • get new password (twice and compare)
    • hash the result of the diff of the first entry and the second entry for the new password

    That last step was the killer. It seems that someone had declared a global variable and a local variable with the same name. End result: one overwrote the other's data, and one never knew exactly what the box hashed, nor could you figure out what to key in to the screen to unlock the door (so to speak).
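The two subroutines described above can be sketched roughly as follows. This is only an illustration of the intended flow, under loud assumptions: the function names, the `store` dictionary standing in for the data file, and the use of unsalted SHA-256 are all hypothetical and not taken from any real drive firmware. The point is that the final step hashes the confirmed new password itself, and that no two variables share a name.

```python
import hashlib
import hmac


def hash_password(password: str) -> str:
    """Hex digest of the password (unsalted SHA-256, for illustration only)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def set_password(store: dict, password: str) -> None:
    """First-time setup: read the password, hash it, write the hash to the store."""
    store["hash"] = hash_password(password)


def change_password(store: dict, old: str, new_first: str, new_second: str) -> bool:
    """Change the password; return False and leave the store untouched on any failure."""
    # Check that the user knows the current password.
    if not hmac.compare_digest(store["hash"], hash_password(old)):
        return False
    # The new password is entered twice and the two entries must match.
    if new_first != new_second:
        return False
    # Hash the confirmed password itself, not some by-product of the comparison.
    store["hash"] = hash_password(new_first)
    return True


def unlock(store: dict, password: str) -> bool:
    """Return True if the password matches the stored hash."""
    return hmac.compare_digest(store["hash"], hash_password(password))
```

A regression test for the failure described would simply change the password and then check that `unlock` succeeds with the new one and fails with the old one.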

  • by Waccoon (1186667) on Tuesday August 04, @06:11AM (#28938537) Homepage

    Ask anyone who bought a JMicron-based SSD about insufficient testing. How any company thought that controller was worthy for their SSDs is beyond me.

    Before I replaced mine with a Samsung SSD, my [censored] was regularly giving me stutters and pauses that lasted for 20-40 seconds at a time. It just flat-out halted everything on the computer for half a minute for no apparent reason, even while reading, not just writing. Apparently, this was predominant behavior for the controller that dominated the SSD arena until the X-25 started blowing people away.

    I think I understand now why Seagate, WD, and the other HD manufacturers are taking so long to get SSDs on the market. Since their market depends almost exclusively on storage, they can't afford to screw up their first SSDs. At least, I hope that's the reason. Even they have to understand that the hard drive market isn't going to last forever.

    • by jtownatpunk.net (245670) on Monday August 03, @05:25PM (#28933987)

      Future? You must be new to computers. I updated the firmware in my very first 80's printer to give it more features. Had to pop out the old chips and put in the new ones. I upgraded the firmware in modems from several different manufacturers (some more than once) to add features and fix bugs. I've updated the firmware (BIOS) on most of my motherboards. I've updated the firmware on optical drives. I've updated the firmware on a scanner. I've updated the firmware on SCSI controllers. I've updated the firmware on hard drives. I've updated the firmware on switches and routers. Hell, I've updated the firmware on keyboards.

      This is hardly a new phenomenon.

    • Are we looking at a future where we not only have to download updates to fix bugs in our applications and operating systems, but our hardware as well?

      No, we're looking at a past like that. Lest you forget, both the 486 and the Pentium had firmware updates too (the Pentium FDIV bug being the better remembered of the two.) My first firmware update was a bugfix in a 300 baud acoustic coupler, way back in 1983 or thereabouts.

      Can't imagine why you think this is anything new; even video game consoles have been

      • Re: (Score:3, Informative)

        The FDIV bug wasn't fixed in firmware. There was a microcode update that worked around the problem, but it made division painfully slow. Intel's 'fix' was to recall all of the affected chips and provide replacements. It cost the company a lot of money and the story became the introduction to Andy Grove's biography.
    • From my perspective it's actually beginning to be quite common among HW manufacturers to release broken hardware. Actually had 2 run-ins with a required firmware upgrade to gfx boards (both nvidia)

      #1 8800GTX 512MB which in its video BIOS claimed to only have 256MB. I guess the Windows drivers had their own VRAM enumeration procedure, but this majorly put other drivers off to a hang (OSX - yeah, I know hackintosh is bad - and nouveau). I had to get the vbios from the board, hexedit it (4 offsets), then flash it
      • I got one to add that I'm still working on:

        GTX 285 - hangs with blue/black screen of death both in idle and in games although far more frequently at idle, for some people it happens so early and often that a RMA is their only option. For me it happens within 3-5 days of bootup. What I think the problem is: the card is designed to throttle down when it's not being fully utilized, but I suspect the voltage regulators weren't designed to handle this, so even during full utilization when the BIOS runs at its de

    • by ShadowRangerRIT (1301549) on Monday August 03, @05:30PM (#28934027)
      They probably meant a hard disk password. Depending on implementation, this means either disk-supported full disk encryption, or a simple firmware interlock that prevents reading through the controller without the password (but could be bypassed with forensic tools that read the disk surface directly).
    • Yes, you must be new to computers since hard disks have had passwords for years. It was a popular feature in the "enterprise" market before full-disk encryption became practical.
