
The Lies Disks and Their Drivers Tell

Posted by Soulskill
from the designed-at-odds dept.
davecb writes "Pity the poor filesystem designer: they just want to know when their data is safe, but the disks and drivers try so hard to make I/O 'easy' that it ends up being stupidly hard. Marshall Kirk McKusick writes about the difficulties in making the systems work nicely together: 'In the real world, many of the drives targeted to the desktop market do not implement the NCQ specification. To ensure reliability, the system must either disable the write cache on the disk or issue a cache-flush request after every metadata update, log update (for journaling file systems), or fsync system call. Both of these techniques lead to noticeable performance degradation, so they are often disabled, putting file systems at risk if the power fails. Systems for which both speed and reliability are important should not use ATA disks. Rather, they should use drives that implement Fibre Channel, SCSI, or SATA with support for NCQ.'"

  • by Anonymous Coward on Friday September 07, 2012 @03:43PM (#41265483)

    Don't assume that "enterprise" disks do this correctly either.

    Many have options to make them behave properly, but out of the box they have write-back caches enabled and ignore FUA (Force Unit Access) or similar, leading to the same problems.
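On Linux, one way to see whether a disk is running with a write-back cache is the sysfs write_cache attribute. The sketch below assumes a kernel that exposes /sys/block/<dev>/queue/write_cache (reporting "write back" or "write through"); kernels current when this thread was written do not have that file. The function name report_write_cache is hypothetical. It only reports the cache mode and cannot tell you whether the drive actually honors FUA or flush requests.

```shell
#!/bin/sh
# Report each block device's cache mode as exposed by Linux sysfs.
# Assumption: the kernel provides /sys/block/<dev>/queue/write_cache,
# containing "write back" or "write through"; older kernels lack it.
# The sysfs root is a parameter so the function can be tried on a fake tree.
report_write_cache() {
    root=${1:-/sys/block}
    for f in "$root"/*/queue/write_cache; do
        [ -r "$f" ] || continue      # glob matched nothing, or unreadable
        dev=${f#"$root"/}            # strip the root prefix...
        dev=${dev%%/*}               # ...leaving just the device name
        printf '%s: %s\n' "$dev" "$(cat "$f")"
    done
}

report_write_cache "$@"
```

A disk listed as "write back" is acknowledging writes from its cache, which is exactly the case where the file system has to rely on flushes (or FUA) being honored.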

  • by MSTCrow5429 (642744) on Friday September 07, 2012 @03:58PM (#41265759)
    1) This article isn't about ATA, ignore it.

    2) The article's point on NCQ is that many consumer drives do not implement it correctly, and that the workarounds (disabling the write cache on the disk, or issuing cache-flush requests) are often turned off to increase performance, leading to possible file-system corruption if there is a power outage.

    I think the article is saying that enterprises should buy enterprise drives, not consumer drives. Most consumers use laptops now, so power failure isn't much of a factor, and consumers prefer speed over reliability, which is why I've always been stuck with laptops that lack ECC RAM.

  • by wonkey_monkey (2592601) on Friday September 07, 2012 @04:15PM (#41266055) Homepage

    Native Command Queueing [wikipedia.org]

    Because not everybody knows everything™

  • by TheGratefulNet (143330) on Friday September 07, 2012 @04:28PM (#41266285)

    Yeah, well, I have quite a bit of experience with Samsung drives (not the Seagate-branded ones, but the older Samsungs).

    They REPORTED having NCQ, but you always had to disable it.

    It got so that I do this at bootup:

    # Limit sda's queue depth to 1, which turns NCQ off (must run as root)
    if [ -e /sys/block/sda/device/queue_depth ] ; then
          echo 1 > /sys/block/sda/device/queue_depth
          echo " sda NCQ now off"
    fi

    and so on.
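The bootup snippet above handles one disk and leaves the rest as "and so on". A loop over several named disks might look like the sketch below. It assumes, as the poster does, that writing 1 to a device's queue_depth turns NCQ off; disable_ncq and SYSFS_BLOCK are hypothetical names, and SYSFS_BLOCK exists only so the loop can be tried against a fake directory tree instead of the real /sys/block.

```shell
#!/bin/sh
# Set queue_depth to 1 (NCQ off) for every disk named on the command line.
# Must run as root against the real /sys/block.
# SYSFS_BLOCK is a hypothetical override so the loop can be tested safely.
SYSFS_BLOCK=${SYSFS_BLOCK:-/sys/block}

disable_ncq() {
    for dev in "$@"; do
        qd="$SYSFS_BLOCK/$dev/device/queue_depth"
        if [ -e "$qd" ]; then
            echo 1 > "$qd" && echo " $dev NCQ now off"
        else
            echo " $dev: no queue_depth attribute, skipped" >&2
        fi
    done
}

# Example: disable_ncq sda sdb sdc
disable_ncq "$@"
```

Keeping the device list explicit, rather than globbing every sd? device, avoids turning NCQ off on drives that handle it correctly.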

    Performance does not suffer (not enough that I would care), BUT the gain in data reliability more than makes up for it. No more timeouts, no more 'scary' syslog messages.

    Vendors really do fuck up the protocol implementations. Seagate is 'strange' in some ways, and so are WD, Hitachi, and IBM (I know they are not even in the business anymore, at least for consumer drives).

    Windows has a 'blacklist' of things not to use when talking to drives, and so does Linux. It's a fact of life.

    Drive vendors are borderline idiots. Sad but true ;(

  • by TheGratefulNet (143330) on Friday September 07, 2012 @04:31PM (#41266355)

    You'll see it in syslog!

    Timeouts, retries, even dropping off the bus and doing full bus resets (which are slow; you will NOT miss them).

    As I posted before, older (five-year-old) Samsungs were notorious for SAYING they support NCQ, but you would be foolish to let the drive just negotiate it and use it.

    This was how things were in the very early days of 10/100 Ethernet and full/half duplex. Yes, the early models 'negotiated' duplex, but many of them got it wrong, and you'd have to set it manually on hubs/switches since you knew better than the equipment. There were even early NIC chips that worked better at 10 Mbit Ethernet than at 100BASE-T! We would do FTP transfer tests, and quite often a GOOD 10BASE-T link was more reliable (over time) than 100BASE-T. The same happened with Gigabit Ethernet in its early years, too.
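To check a log for the kind of trouble described above (timeouts, retries, link and bus resets), a rough count of matching lines is often enough. This is a sketch under loose assumptions: the match patterns are guesses at typical libata wording, which varies between kernel versions and drivers, and the log location differs between distributions. The function name count_ata_trouble is hypothetical.

```shell
#!/bin/sh
# Count log lines that look like disk-bus trouble: timeouts, retries,
# link resets, bus resets. Patterns are assumptions, not an exhaustive
# or authoritative list of kernel messages.
count_ata_trouble() {
    log=${1:-/var/log/syslog}   # log path varies between distributions
    # grep -c still prints "0" when nothing matches but exits 1;
    # "|| true" keeps that from looking like a failure.
    grep -ciE 'timeout|retr(y|ies)|hard resetting link|bus reset' "$log" || true
}

# Example: count_ata_trouble /var/log/kern.log
```

A count that keeps climbing after NCQ has been left enabled, and stops after the queue depth is forced to 1, matches the pattern the poster describes.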

  • by Eponymous Hero (2090636) on Friday September 07, 2012 @05:02PM (#41266803)
    You didn't bother to RTFA, good for you. It says quite plainly that (only part of) the problem is not drives that don't support NCQ, but drives that have it and disable it. And that was a relatively small portion of TFA. Here's how the disks lie:

    File systems need to be aware of the change to the underlying media and ensure that they adapt by always writing in multiples of the larger sector size. Historically, file systems were organized to store files smaller than 512 bytes in a single sector. With the change in disk technology, most file systems have avoided the slowdown of 512-byte writes by making 4,096 bytes the smallest allocation size. Thus, a file smaller than 512 bytes is now placed in a 4,096-byte block. The result of this change is that it takes up to eight times as much space to store a file system with predominantly small files. Since the average file size has been growing over the years, for a typical file system the switch to making 4,096 bytes the minimum allocation size has resulted in a 10- to 15-percent increase in required storage.
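The worst case in that quote is simple arithmetic: a file's allocation is its size rounded up to a whole number of blocks. A minimal sketch, ignoring metadata, tail packing, and the inline-data tricks some file systems use; the helper name alloc is hypothetical.

```shell
#!/bin/sh
# Bytes allocated for a file of a given size when space is handed out
# in whole blocks: round the size up to the next multiple of the block.
alloc() {   # usage: alloc FILE_SIZE BLOCK_SIZE
    echo $(( ($1 + $2 - 1) / $2 * $2 ))
}

# Compare 512-byte and 4,096-byte minimum allocation for a few sizes.
for size in 100 500 3000 10000; do
    a512=$(alloc "$size" 512)
    a4k=$(alloc "$size" 4096)
    printf '%6d-byte file: %5d bytes at 512 B, %5d bytes at 4 KiB\n' \
        "$size" "$a512" "$a4k"
done
```

A 500-byte file takes 512 bytes with 512-byte allocation but 4,096 bytes with 4 KiB allocation, the eightfold worst case; a 10,000-byte file takes 10,240 versus 12,288 bytes, which is why larger average file sizes shrink the overall penalty.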

    Just to clarify what the author's point was:

    The conclusion is that file systems need to be aware of the disk technology on which they are running to ensure that they can reliably deliver the semantics that they have promised. Users need to be aware of the constraints that different disk technology places on file systems and select a technology that will not result in poor performance for the type of file-system workload they will be using. Perhaps going forward they should just eschew those lying disks and switch to using flash-memory technology—unless, of course, the flash storage starts using the same cost-cutting tricks.

    If you want to argue that, great, go nuts. Nobody who actually RTFA thinks the argument is really about NCQ. The AC you responded to said:

    the way I interpret TFA, the problem also applies to SATA drives which do not implement the NCQ specification.

    Well, here's what TFA actually said:

    Luckily, SATA (serial ATA) has a new definition called NCQ (Native Command Queueing) that has a bit in the write command that tells the drive if it should report completion when media has been written or when cache has been hit. If the driver correctly sets this bit, then the disk will display the correct behavior.

    In the real world, many of the drives targeted to the desktop market do not implement the NCQ specification. To ensure reliability, the system must either disable the write cache on the disk or issue a cache-flush request after every metadata update, log update (for journaling file systems), or fsync system call. Both of these techniques lead to noticeable performance degradation, so they are often disabled, putting file systems at risk if the power fails. Systems for which both speed and reliability are important should not use ATA disks. Rather, they should use drives that implement Fibre Channel, SCSI, or SATA with support for NCQ
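From user space, the fsync path in that quote can be exercised with dd, which calls fsync on its output file before exiting when given conv=fsync. A minimal sketch assuming GNU coreutils dd (status=none is a GNU option); the function name durable_write is hypothetical. Whether the data then survives a power cut still depends on the drive and driver honoring the resulting cache flush, which is the article's whole point.

```shell
#!/bin/sh
# Write TEXT to PATH and fsync it before dd exits. The fsync is the
# request the file system should turn into a cache flush (or a
# write-through/FUA write) on the drive.
durable_write() {   # usage: durable_write PATH TEXT
    printf '%s\n' "$2" | dd of="$1" conv=fsync status=none
}

# Example: durable_write /tmp/ledger.txt "entry 42"
```

Each call pays for a full flush, so software that fsyncs very frequently is exactly the workload where disabling the write cache or flushing after every fsync becomes noticeably slow.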

    I hope it's painfully obvious by now that the point about NCQ is not that some drives don't have it; it's that some don't use it, mostly so you don't go giving their drives bad reviews for being slow but unnoticeably reliable. If it's disabled, you can enable it. What SATA drives don't have NCQ? I asked Wikipedia:

    SATA revision 1.0 (SATA 1.5 Gbit/s) .... During the initial period after SATA 1.5 Gbit/s finalization, adapter and drive manufacturers used a "bridge chip" to convert existing PATA designs for use with the SATA interface. Bridged drives have a SATA connector, may include either or both kinds of power connectors, and, in general, perform identically to their PATA equivalents. Most lack support for some SATA-specific features such as NCQ. Native SATA products quickly eclipsed bridged products with the introduction of the second generation of SATA drives.

    So yeah, probably not a whole lot of these drives are being sold new, but there are lots of shops that buy used gear because it's cheap. These older SATA drives didn't all just disappear when revision 2.0 came out.

  • by ChumpusRex2003 (726306) on Friday September 07, 2012 @05:28PM (#41267207)

    The "Turn off Windows write-cache buffer flushing on the device" option activates an ancient Windows bug and should never be used.

    When Windows 3.11 was released, MS accidentally introduced a bug whereby a call to "sync" (or whatever the Windows equivalent was called) would usually be silently dropped. At the time, a few programmers noticed that their file I/O appeared to have improved, and attributed this to MS's heavily marketed new 32-bit I/O layer. What a lot of naive developers didn't notice was that their I/O appeared faster because the OS was handling file streams in an aggressive write-back mode and ignoring calls to "sync".

    Because of this, there was a profusion of office software, in particular accounting software, which would "sync" frequently: some packages would call "sync" on every keypress, every time Enter was pressed, or whenever the cursor moved to the next data-entry field. Because this call was effectively a NOP on 3.11, a lot of these packages made it onto client machines, and because they were fast, no one noticed.

    With Win95, MS fixed the bug. Suddenly, corporate offices around the world had their accounting software reduced to glacial speed, and tech support departments at software vendors rapidly went into panic mode. Customers were blaming MS, Win95 was getting slated, lawyers were starting to drool, etc. Developers were calling senators and planning anti-trust actions. The whole thing was getting totally out of hand.

    In the end, MS decided the only way to deal with this bad PR was to put an option into Windows that reproduced the bug for software that depended on it. The option was hidden away reasonably well, to stop most people from turning it on and running their file system in a grossly unstable mode. However, in Win95 through Vista it had a rather cryptic name, "Advanced performance", which meant that a lot of hardware enthusiasts would switch it on to improve performance without any clear idea of what it did. At least in Win7 it now has a clear name, even though it still doesn't make clear that it should only be used with defective software.

  • by anomaly256 (1243020) on Friday September 07, 2012 @05:40PM (#41267385)
    Green drives from Seagate do not appear to have NCQ. As shown below, I have one normal drive and four Greens in this box:

    ~$ cat /sys/block/sd?/device/queue_depth
    31
    1
    1
    1
    1

    ~$ cat /sys/block/sd?/device/queue_type
    simple
    none
    none
    none
    none
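Plain cat over sd? prints the values without device names, so you have to count lines to tell which disk is which. A small sketch that prints each device next to its queue depth and queue type follows. It assumes the same sysfs layout as the listing above; queue_type is not present on every kernel, so the sketch falls back to "unknown". The function name show_ncq is hypothetical, and the root directory is a parameter only so it can be tried on a fake tree.

```shell
#!/bin/sh
# Print queue_depth and queue_type for each sd? disk, labeled by name.
# A queue_depth of 1 generally means no command queueing is in use.
show_ncq() {
    root=${1:-/sys/block}
    for dir in "$root"/sd?; do
        [ -r "$dir/device/queue_depth" ] || continue
        printf '%s depth=%s type=%s\n' "${dir##*/}" \
            "$(cat "$dir/device/queue_depth")" \
            "$(cat "$dir/device/queue_type" 2>/dev/null || echo unknown)"
    done
}

show_ncq "$@"
```

On the box above this would label the one drive reporting depth 31 and type simple, and the four reporting depth 1 and type none.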
