
The Lies Disks and Their Drivers Tell

davecb writes "Pity the poor filesystem designer: they just want to know when their data is safe, but the disks and drivers try so hard to make I/O 'easy' that it ends up being stupidly hard. Marshall Kirk McKusick writes about the difficulties in making the systems work nicely together: 'In the real world, many of the drives targeted to the desktop market do not implement the NCQ specification. To ensure reliability, the system must either disable the write cache on the disk or issue a cache-flush request after every metadata update, log update (for journaling file systems), or fsync system call. Both of these techniques lead to noticeable performance degradation, so they are often disabled, putting file systems at risk if the power fails. Systems for which both speed and reliability are important should not use ATA disks. Rather, they should use drives that implement Fibre Channel, SCSI, or SATA with support for NCQ.'"
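For readers who have not met the fsync pattern the summary refers to, here is a minimal sketch, assuming a POSIX system. The function name `durable_write` is hypothetical and not from the article. It shows only what an application can ask for: whether the drive really puts the data on stable media when its write cache is enabled is exactly the problem the article describes.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and ask the OS to push it to stable storage.

    Hypothetical helper for illustration. It can only request a flush;
    a drive that acknowledges the flush without performing it defeats it.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # user-space buffer -> kernel
            os.fsync(f.fileno())  # kernel page cache -> device (requested)
        os.replace(tmp_path, path)  # atomically swap in the new contents
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # The rename is a metadata update, so sync the directory as well.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `durable_write("journal.dat", b"record")` pays for two flush requests per update, which is the performance cost the summary says tempts people into disabling the safeguards.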

  • by adturner ( 6453 ) on Friday September 07, 2012 @03:37PM (#41265333) Homepage

    But you lost me the moment you mentioned ATA drives.

  • 2 out of 3 (Score:4, Insightful)

    by ardmhacha ( 192482 ) on Friday September 07, 2012 @03:37PM (#41265345)

    Cheap, fast and reliable.

    Pick any two.

    • by h4rr4r ( 612664 )

      Only because of market segmentation. They sell the same drives as Enterprise Grade SATA with NCQ turned on in firmware as they do to consumers with it turned off.

      Even worse are the RAID controllers (looking at you, DELL) that do not disable the cache on the drives when you tell them to disable the write cache. You think your data is safe, then you lose power, and what should be an oops has you going to your backups and doing a rebuild or swapping over to a replicated box.

      • They sell the same drives as Enterprise Grade SATA with NCQ turned on in firmware as they do to consumers with it turned off.

        What you get with an "enterprise" SATA drive is a higher MTBF and firmware tweaked to work well with RAID (desktop drives try to be more forgiving of I/O errors, while enterprise drives are quicker to decide "I've failed, let the RAID controller do its work").

        I'm not aware of any SATA drive that doesn't support NCQ. It's certainly on every desktop drive I've used, excepting MAYBE the very first SATA drive I bought in 2003. Certainly all SSDs I am aware of (except niche super-low-end ones) and all mass-ma

      • Even worse are the RAID controllers (looking at you, DELL)

        You buy RAID controllers from DELL? You deserve what you get. Buying DELL server gear is like bringing a Schwinn Varsity to the Tour de France.

    • I hear that in the project management realm...great quote and forces people to think about the interdependence of the three variables.
    • by geekoid ( 135745 )

      you need to define terms before I can even pick one.

      BTW, new manufacturing technique can accomplish all three.

  • One can't have one's cake and eat it: it's speed or reliability. There should be more differentiation and more clarity in the specs. I want my backup disk to be very reliable and my boot disk to be fast. I want the best performance for both, but under different circumstances.

    • by h4rr4r ( 612664 )

      In that case boot should just be on an SSD, where these issues pretty much disappear anyway.

      • That's not true at all. The X25 drives from Intel were terrible in terms of safe writes. The newer Intel drives are better, but only because they added a capacitor to allow in-process writes to complete -- simply being solid-state does not resolve these issues, and in some cases can make them much worse.

  • Sorry, what? (Score:4, Insightful)

    by Compaqt ( 1758360 ) on Friday September 07, 2012 @03:41PM (#41265449) Homepage

    We're talking about ATA drives?

    As in non-SATA drives?

    Who has those anymore?

    While the article is good for publication in an academic journal like ACM, it's useless for the real world.

    For that, the author should tell us whether most drives on the market have NCQ already or not. Popular drives like WD Green and Seagate's various lines.

    Otherwise, saying "$A is useless without $Y" is pointless.

    • I'm guessing that since he's talking about 4K sectors, he means SATA, since none of the PATA drives were large enough to warrant the switch from 512.

    • SATA drives are a subset of ATA drives. You're thinking of PATA or IDE drives.

      In other words, when someone says "ATA drives" they aren't exclusively talking about non-SATA drives.

      • Wrong. ATA is the original name of what was renamed to PATA once SATA was introduced. So if he is saying what you claim, he is using the term incorrectly.

        • by AK Marc ( 707885 )
          So, if someone is going to New York, and someone corrects them to "New Amsterdam", is the corrector correct in that the area was once called something different, or is "New York" the correct term, as that's the current name and eliminates confusion?

          ATA doesn't exist anymore. It's like saying you are going to New York by saying "I'm going to the USA". It might be technically correct, but entirely useless, especially if one is in California telling all his friends he's going to the USA. It's not only tech
          • Regardless, the author's choice of terms plus lack of additional clarification totally muddled what he might have been trying to say.

            Also, there's no context for what he's saying ("SATA without NCQ is bad"). It's like saying MySQL without foreign keys is bad, without mentioning the context that MySQL does have foreign keys these days.

            • by AK Marc ( 707885 )
              Perhaps more applicable if all builds of MySQL claimed to have foreign keys, but only some actually had them.
        • by geekoid ( 135745 )

          No, YOU are wrong.
          ATA refers to the ATA and ATAPI command set, which SATA uses, with improvements and new features.

          It was changed to PATA to more accurately describe how it moves data through its channels, i.e. in parallel.

    • ATA came before SATA. One use I've found for ATA is to increase the number of drives supported on a motherboard. I use one as a boot disk for a FreeNAS box. The drive is basically read-only, so I don't expect write cache issues. ATA drives are very slow and noisy, which is why that technology is obsolete.
      • dude, the ata vs sata is ONLY on the controller card!

        the drive spindle is the same. it's funny to hear someone say that older ide drives are 'noisier'.

        you CAN say that older drives are noisier than new ones. and I'd respond with "DUH!"

        but scsi, sata, ide, sas, fc: the drives are still the same. controllers are what varies.

        • Dude, thanks for the information. I did not know they were all the same. My noisy drive is a Seagate Bigfoot 20 gig drive that's around 15 years old, 5 1/4 format half-height that weighs five pounds. I can't believe I blamed the noise on the interface.
  • by poet ( 8021 ) on Friday September 07, 2012 @03:41PM (#41265463) Homepage

    We shouldn't even be writing for ATA drives anymore. And any name-brand manufacturer that you would trust (on a mediocre level), such as WD or Seagate, supports NCQ.
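Whether a particular drive is actually running with NCQ can be checked rather than assumed. The sketch below assumes a Linux system, where SCSI and libata disks expose a `queue_depth` attribute under sysfs; the function names and the `sysfs_root` parameter are hypothetical and exist only to make the example testable. A depth greater than 1 suggests command queueing is in use, but it says nothing about whether the drive's firmware handles cache flushes honestly.

```python
from pathlib import Path
from typing import Optional


def ncq_queue_depth(device: str, sysfs_root: str = "/sys") -> Optional[int]:
    """Return the queue depth the kernel reports for a disk such as 'sda'.

    Returns None when the attribute is missing or unreadable, for example
    on a non-Linux system or for a device that does not expose it.
    """
    path = Path(sysfs_root) / "block" / device / "device" / "queue_depth"
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def ncq_looks_active(device: str, sysfs_root: str = "/sys") -> bool:
    """Rough check: a queue depth above 1 means commands are being queued."""
    depth = ncq_queue_depth(device, sysfs_root)
    return depth is not None and depth > 1
```

For example, `ncq_looks_active("sda")` reads `/sys/block/sda/device/queue_depth` on the local machine.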

    • Are you saying we should cast the ATA driver out of the kernel and dispose of all our ATA hardware?

      Even though it's not in new hardware any more, we still need to support it in existing hardware. The driver still needs work when the kernel APIs change.

      • by poet ( 8021 )

        Good point. That said, ATA hardware is really quite old. I don't know that it would hurt to say: you know what, if you want to run Linux 3.6, you aren't going to have an ATA drive. If they want to run ancient hardware, let them run older hardware (note I didn't say ancient hardware).

  • I put my important files (pr0n, etc.) on my zfs mirror file server and scrub each week. The really important stuff (tax returns, etc.) I put in a safe deposit box at the bank.

  • by Anonymous Coward on Friday September 07, 2012 @03:43PM (#41265483)

    Don't assume that "enterprise" disks do this correctly either.

    Many have options to make them behave properly but out of the box have write back caches and ignore FUA or similar, leading to the same problems.

  • by rickb928 ( 945187 ) on Friday September 07, 2012 @03:47PM (#41265557) Homepage Journal

    I never recommended ATA drives for servers. Really old stuff that used MFM and RLL drives was back in the era where the just anything else. I used ATA drives for my home stuff and lab where it wasn't expected to be very reliable, and SCSI was all I used for a very long time. Even today I recommend against SATA though it seems tolerable, but SCSI drives are still my standard.

    Mostly I thought SCSI drives were also made better, but Seagate and WD convinced me otherwise.

    And yes, MFM drives in a Novell DCB setup were among my first servers. Making NW 2.15c mount a 4 GB volume just so you can say you did it would not be fun today, but back then it was work, and clients paid for it. I'm glad it wasn't a VINES server.

    • Funny, google has tens of thousands of servers and they put cheapo SATA drives in them.

      • And Google relies on multiply redundant servers and data, both for performance and reliability. Not many small businesses are gonna want to put in 5-way clustering.

        • Not to mention Google doesn't have to provide a "right" answer. They can provide any answer that seems approximately correct. However for their accounting, payroll and tax systems I'd bet $20 that they use name brand servers running name brand OSes and name brand software.

          • If you take a few moments and look into it, google buys custom servers that are more like commodity boxes than premium servers.

      • And google is not your average company.

        Google has a LOT of servers running much the same workloads. As such it makes sense for them to put in the software engineering effort to achieve higher-level redundancy. They engineer things so they don't have to care if a server dies.

        Most companies have a relatively small number of servers each with a particular task. If one of those servers fails it's a much bigger deal that can mean significant downtime and/or data loss. IIRC restoring a big database from backup an

  • by MSTCrow5429 ( 642744 ) on Friday September 07, 2012 @03:58PM (#41265759)
    1) This article isn't about ATA, ignore it.

    2) The article's point on NCQ is that many consumer drives do not implement it, so the system must either disable the write cache on the disk or issue cache-flush requests to stay reliable. Both hurt performance, so they are often turned off, which can lead to file-system failures if there is a power outage.

    I think this article is saying that for the enterprise, buy enterprise drives, not consumer drives. Most consumers use laptops now, so power failure doesn't fit in, and consumers prefer speed over reliability, which is why I've always been stuck using laptops lacking ECC RAM.

    • When the power goes out, all cards are in the air anyway. We had a UPS boo-boo and our enterprise drives (both SCSI & SAS) managed to corrupt data, even with a battery on the controller itself (battery was in good health.)

      Shit happens. It's pretty damn difficult to account for power failures... even with battery backups on the local controllers you can only do so much.

    • In Windows 7's Device Manager, there is a Policies tab, allowing you to "Enable write caching on the device" and additionally to "Turn off Windows write-cache buffer flushing on the device." The former warns "a power outage or equipment failure might result in data loss or corruption." The latter states "do not select this check box unless the device has a separate power supply that allows the device to flush its buffer in case of power failure." In Windows 7, by default, write-caching is on, and write-cache
      • by ChumpusRex2003 ( 726306 ) on Friday September 07, 2012 @05:28PM (#41267207)

        The "Turn off Windows write-cache buffer flushing on the device" option activates an ancient windows bug, and should never be used.

        When Windows 3.11 was released, MS accidentally introduced a bug, whereby a call to "sync" (or whatever the Windows equivalent was called) would usually be silently dropped. At the time, a few programmers noticed that their file I/O appeared to have improved, and attributed this to MS's much-marketed new 32-bit I/O layer. What a lot of naive developers didn't notice was that the reason their I/O appeared to be faster was that the OS was handling file streams in an aggressive write-back mode, and then calls to "sync" were being ignored by the OS.

        Because of this, there was a profusion of office software, in particular accounting software, which would "sync" frequently. Some packages would call "sync" on every keypress, or every time enter was pressed, or the cursor moved to the next data entry field. As this call was effectively a NOP on 3.11, a lot of packages made it onto client machines, and because it was fast, no one noticed.

        With Win95, MS fixed the bug. Suddenly, corporate offices around the world had their accounting software reduced to glacial speed, and tech support departments at software vendors rapidly went into panic mode. Customers were blaming MS, Win95 was getting slated, lawyers were starting to drool, etc. Developers were calling senators and planning anti-trust actions. The whole thing was getting totally out of hand.

        In the end, MS decided the only way to deal with this bad PR was to put an option into Windows where the bug could be reproduced for software which depended upon it. The option to activate the bug was hidden away reasonably well, in order to stop most people from turning it on and running their file system in a grossly unstable mode. However, from Win95 through Vista, it had the rather cryptic name "Advanced performance", which meant that a lot of hardware enthusiasts would switch it on in order to improve performance, without any clear idea of what it did. At least in Win7 it now has a clear name, even though it still doesn't make clear that it should only be used with defective software.

    • Or you can buy a real RAID controller with battery backup for the cache, in which case you are just fine with the cheap SATA drives.

  • by wonkey_monkey ( 2592601 ) on Friday September 07, 2012 @04:15PM (#41266055) Homepage

    Native Command Queueing

    Because not everybody knows everything™

  • Get Hardware RAID (Score:5, Insightful)

    by FranTaylor ( 164577 ) on Friday September 07, 2012 @04:22PM (#41266179)

    The people who make hardware RAID know all about the lying drives; they get good information from the manufacturer on how to make the drives play nice with the RAID controller.

    Just read the compatibility charts for your RAID controller, many drives have footnotes with minimum drive firmware requirements and other odd behavior.

    • Re:Get Hardware RAID (Score:4, Interesting)

      by randallman ( 605329 ) on Friday September 07, 2012 @04:58PM (#41266725)

      The only real advantage to "Hardware RAID" is the battery backed cache. Hardware RAID comes with the disadvantage of a whole other operating system "firmware" with its own bugs and often proprietary disk layout. Parity calculations are nothing for current CPUs, so the onboard processor is not so useful. Advanced filesystems such as ZFS or BTRFS need direct access to the disks. I'd like to see drives and/or controllers with battery backed cache. Until then, I rely on my UPS.
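To illustrate why parity is cheap for a general-purpose CPU, here is a minimal sketch of RAID-5-style XOR parity. The function names are hypothetical, and real software RAID works on whole stripes with optimized routines rather than Python byte loops, so treat this as a picture of the arithmetic only.

```python
from functools import reduce
from typing import List


def xor_parity(blocks: List[bytes]) -> bytes:
    """Return the byte-wise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving: List[bytes], parity: bytes) -> bytes:
    """Recover one lost block: XOR the surviving blocks with the parity."""
    return xor_parity(surviving + [parity])
```

Because XOR is its own inverse, the same routine both computes the parity block and reconstructs a block from a failed disk.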

      • Only? What about the advantage of a lot more SATA/SAS connections than you get on your motherboard? Also ZFS is limited in the number of platforms it is available for and BTRFS is not ready so it's a bit of a red herring throwing those in and saying that hardware RAID is not required because those exist.
  • That would test and identify a drive for NCQ and cache disable/enable operation correctness that would report the model/serial and result to a central website
    • Whether this sort of thing works correctly can change based on drive firmware. So even a given model/serial number combination can change which type of results it gives over time. There is no substitute for testing yourself.
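One rough way to test for yourself, sketched here under several assumptions: time how many write-plus-fsync cycles per second a disk sustains on the same file offset. The function name is hypothetical. The heuristic is that a 7200 RPM spinning disk completes about 120 rotations per second, so a rate far above that on such a disk hints that flushes are being absorbed by a volatile cache somewhere. It proves nothing for SSDs or for controllers with battery-backed cache, and only a pull-the-plug test shows what actually survives.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory: str, count: int = 50) -> float:
    """Measure sustained write+fsync cycles per second in a directory.

    Rewrites the same 8 bytes at offset 0 and requests a flush each time.
    Uses os.pwrite, so it assumes a Unix-like system.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for i in range(count):
            os.pwrite(fd, i.to_bytes(8, "little"), 0)
            os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return count / elapsed if elapsed > 0 else float("inf")
```

Run it against a directory on the disk under test, for example `fsyncs_per_second("/mnt/data")`, and compare the figure with the drive's rotational speed.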

  • by randallman ( 605329 ) on Friday September 07, 2012 @05:06PM (#41266877)

    I think this is quite interesting.

    While I've often gotten the impression that the write cache opens up a large "write hole", Linus says that data is cached only for milliseconds, not held in the cache for several seconds. Still, I'd like to see battery backed caches in regular drives and/or controllers.

    Would be nice to hear from some drive firmware writers.

  • Put some flash ram on the HD with its own on-board battery backup ...
  • by erc ( 38443 )
    ATA? Does anyone use that anymore? Hasn't the world gone to SATA, FC, or SCSI-? This seems a lot of ado about nothing...
  • The article is total crap, every disk supports NCQ as half the world's population has pointed out in the comments.

    The problems are elsewhere: When a disk suddenly loses power while it is writing, there is a risk of various interesting errors. The disk may a) write nulls instead of the correct data, b) write garbage instead of the correct data, c) fail in the middle of a Read-Modify-Write operation and therefore destroy data in files which weren't written to at all, d) write good data to the wrong place on t

  • Probably every half decent controller card on the market for the last decade gets around this problem with a bit of memory and a battery to keep it alive. If you have a lot of disks on one system you'd probably have a controller like that anyway just to get enough SATA/SAS connections.
    I can see how it's a big deal with workstations/desktops/laptops but that's really only a small chunk of storage in general.
  • This is a lot of noise for nothing. For kids and amateurs, here's a quick summary...

    fsync used to be the go-to, but that was decades ago, when IDE was in full swing. Back then, there was a big hubbub about drives lying. Since then, it's been common knowledge and the status quo that fsync is not trustworthy, end of story.

    Today, we have WRITE BARRIERS, and they work great. Ever since, say, the advent of 60GB IDE drives, I've never found a drive that doesn't support write barriers, and in my conversations wit

    • The XFS guys are right. I've lost data multiple times on XFS due to their disk caches being enabled in Suse 11.4. My disks are ST9750420AS using bios raid. I finally had to disable disk write caches on bootup. These losses were not even due to power failure: these losses were incurred during graceful system shutdown.
  • Shitty consumer oriented hardware not suitable for enterprise class data integrity and retention.

    If you need data integrity and cache, you need a battery backed up IO controller and UPS for a start. If you're relying on the fact that turning cache off on the drive is going to ensure that your writes complete before the power goes out to the drive, you've already set sail for fail.
