Data Storage Bug

Your Hard Drive Lies to You

fenderdb writes "Brad Fitzpatrick of LiveJournal fame has written a utility and a quick article showing that hard drives, from consumer-level models up to the highest-grade 'enterprise' SCSI and SATA drives, do not honor the fsync() call. Manufacturers are blatantly sacrificing integrity in favor of scoring higher on 'pure speed' performance benchmarks."
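The basic idea behind such a check is easy to sketch. Below is a rough C illustration of the timing approach (a sketch of the general technique, not necessarily exactly what the utility does): a drive that really flushes its cache on every fsync() is bounded by rotational latency, roughly 120 flushes per second on a 7200rpm disk, so thousands of "completed" fsyncs per second mean the flush is being absorbed by the write cache.

    /* Rough sketch: time N write+fsync pairs to the same file offset.
     * An honestly-flushing disk is bounded by rotational latency
     * (~120/sec at 7200rpm); a lying one reports thousands per second. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *path = argc > 1 ? argv[1] : "fsync-test.dat";
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        if (fd < 0) { perror("open"); return 1; }

        char buf[512];
        memset(buf, 'x', sizeof buf);

        struct timeval t0, t1;
        gettimeofday(&t0, NULL);

        const int n = 1000;
        for (int i = 0; i < n; i++) {
            if (pwrite(fd, buf, sizeof buf, 0) != (ssize_t)sizeof buf) { perror("pwrite"); return 1; }
            if (fsync(fd) != 0) { perror("fsync"); return 1; }
        }

        gettimeofday(&t1, NULL);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        printf("%d fsyncs in %.2f s = %.0f/sec\n", n, secs, n / secs);
        return 0;
    }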
  • Why do we need it? (Score:4, Interesting)

    by Godman ( 767682 ) on Friday May 13, 2005 @03:25AM (#12517247) Homepage Journal
    If we're just now figuring out that fsyncs don't work, the question is: why do we care? Have we been using them all along while they just haven't been working?

    If we've made it this far without it, why do we need it now?

    I'm just curious...
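    One answer: any application that promises durability, a database or mail server, say, relies on fsync() returning only after the data is on stable storage. A minimal C sketch of the pattern (hypothetical log file name, error handling kept short):

        /* Write-ahead logging in miniature: the commit must not be
         * acknowledged until the log record is on stable storage.
         * If fsync() lies, a power cut can eat an acknowledged commit. */
        #include <fcntl.h>
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        int commit(int log_fd, const char *record)
        {
            size_t len = strlen(record);
            if (write(log_fd, record, len) != (ssize_t)len)
                return -1;
            if (fsync(log_fd) != 0)   /* must block until the data is durable */
                return -1;
            return 0;                 /* only now may we report "committed" */
        }

        int main(void)
        {
            int fd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
            if (fd < 0) { perror("open"); return 1; }
            if (commit(fd, "txn 1: debit A, credit B\n") == 0)
                puts("committed");
            close(fd);
            return 0;
        }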
  • Of course it does! (Score:5, Interesting)

    by grahamsz ( 150076 ) on Friday May 13, 2005 @03:26AM (#12517253) Homepage Journal
    Having written some diagnostic tools for a smaller hard disk maker (whom I'll refrain from naming), it's amazing to me that disks work at all.

    Most systems can identify and map out bad sectors so that they aren't used. What surprised me is that the manufacturers keep their own bad sector table, so when you get the disk it's fairly likely that some bad areas have already been mapped out.

    Secondly, the raw error rate was astoundingly high. It's been quite a few years, but it was somewhere around one error in every 10^5 to 10^6 bits, so you'd expect several mistakes in every megabyte read. Of course the CRC picks these errors up and hides them from you too.

    Granted, this was a few years ago, but I wouldn't be surprised if it's as bad (or even worse) now.
  • More information (Score:5, Interesting)

    by Halo1 ( 136547 ) on Friday May 13, 2005 @03:39AM (#12517305)
    There was an interesting discussion [apple.com] on this topic on Apple's Darwin development list a while ago.
  • Sadly unpredictable (Score:5, Interesting)

    by grahamsz ( 150076 ) on Friday May 13, 2005 @03:46AM (#12517328) Homepage Journal
    I know all disks ultimately fail, but it's frustrating that some can be really abused and run for years, while others die abruptly.

    While working at said hard disk company, I had one of their smaller disks sitting on the end of a steel ruler on my desk. I spun around on my chair, as I do when I'm thinking, and hit the other end of the ruler with my elbow. This of course launched the disk across the room, slamming it against the wall.

    Given that I was in the process of writing software to diagnose failures, I was quite excited about this accident. Of course, I returned the disk to the test setup and found nothing wrong with it.

    In my experience, the only surefire way to make a disk fail is to put an important, un-backed-up piece of work on it.
  • Re:Which ones ? (Score:5, Interesting)

    by ewhac ( 5844 ) on Friday May 13, 2005 @03:46AM (#12517330) Homepage Journal
    Can someone explain how OSes could lie?

    Easy. The driver gets a 'sync' command from the OS. However, the driver writer believes that most other programmers call fsync() when they don't really need to, and decides to "optimize" this case. So he passes the command on to the drive, but returns immediately (allowing the drive command to complete asynchronously). This makes his driver appear faster.

    Fortunately, most driver writers have their priorities straight about data integrity, so this kind of thinking isn't very common.

    Schwab
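    To make the distinction concrete, here is a stub sketch of the two behaviours (every name in it is hypothetical, not any real driver API):

        #include <stdio.h>

        struct disk { int id; };   /* stand-in for a real device structure */

        static void issue_flush(struct disk *d)         { printf("disk %d: FLUSH CACHE issued\n", d->id); }
        static void wait_for_completion(struct disk *d) { printf("disk %d: drive confirmed flush\n", d->id); }

        /* Honest: block until the drive reports the flush is complete. */
        static int honest_sync(struct disk *d)
        {
            issue_flush(d);
            wait_for_completion(d);   /* the fsync() caller resumes only now */
            return 0;
        }

        /* "Optimized": queue the flush and return at once. Benchmarks love
         * it; a power cut before the real flush finishes loses the data. */
        static int lying_sync(struct disk *d)
        {
            issue_flush(d);           /* completes asynchronously, fingers crossed */
            return 0;
        }

        int main(void)
        {
            struct disk d = { 0 };
            honest_sync(&d);
            lying_sync(&d);
            return 0;
        }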

  • by grahamsz ( 150076 ) on Friday May 13, 2005 @03:52AM (#12517354) Homepage Journal
    Obviously everything will ultimately fail. I know the semiconductor industry makes the same part, tests it to see how fast it is, then sells it as different models based on the test results.

    I was surprised that a fair proportion of hard drives already have errors on them at the point of sale.

    Part of me wonders if this explains the anecdotal stories that SCSI disks are more reliable than their cheaper ATA counterparts - even when they use the same physical hardware. Perhaps (and this is blind speculation) the drives with fewer errors get sold to the customers willing to pay more.
  • by Sinner ( 3398 ) on Friday May 13, 2005 @04:06AM (#12517399)
    Parent either doesn't know what he's talking about, or is a troll. Pity there isn't an "incoherent rant" moderation option, or we could avoid the ambiguity.
  • Linux 2.6 and IDE (Score:1, Interesting)

    by Anonymous Coward on Friday May 13, 2005 @05:16AM (#12517605)
    Isn't this something Alan Cox has been complaining about in the Linux 2.6 IDE layer, that fsync doesn't always wait for the completion of the cache flush? He tells everyone on LKML to turn the disk write cache off on IDE disks to make fsync work properly. Or am I clueless?
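    The advice in question is usually given as "hdparm -W0 /dev/hda". For the curious, here is a sketch of roughly what that does under the hood, assuming the classic <linux/hdreg.h> ioctl interface of that era; illustrative only, not a drop-in replacement for hdparm:

        /* Roughly what "hdparm -W0" does: issue ATA SET FEATURES with
         * subcommand 0x82 ("disable write cache") via HDIO_DRIVE_CMD.
         * Sketch only; needs root, and assumes an IDE disk at /dev/hda. */
        #include <fcntl.h>
        #include <linux/hdreg.h>
        #include <stdio.h>
        #include <sys/ioctl.h>

        int main(void)
        {
            int fd = open("/dev/hda", O_RDONLY);
            if (fd < 0) { perror("open"); return 1; }

            /* args[0] = ATA command, args[2] = feature register value */
            unsigned char args[4] = { WIN_SETFEATURES, 0, 0x82, 0 };
            if (ioctl(fd, HDIO_DRIVE_CMD, args) != 0) {
                perror("HDIO_DRIVE_CMD");
                return 1;
            }
            puts("write cache disabled");
            return 0;
        }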
  • by pyropunk51 ( 819247 ) on Friday May 13, 2005 @05:24AM (#12517625) Homepage
    As anybody who's ever used (or had to use :-( ) SpinRite [grc.com] will tell you, your HDD not only lies to you, it cheats and steals as well. To wit: it makes it seem like there are no bad sectors, when in fact the surface is riddled with them; the manufacturer hides this from you with a bad sector table, and errors are corrected on the fly by CRC checking. You can ask SMART for the stats, but you can do very little about the results it gives you, other than maybe buying a new disk (which most likely has a different set of problems - you just don't know what they are). And where have you ever seen a 40 GB drive that is exactly 40 billion bytes big? The bottom line is: reliability is NOT profitable. Where would hardware manufacturers be if we didn't have to buy a new disk every 2 years?
  • by pe1chl ( 90186 ) on Friday May 13, 2005 @05:43AM (#12517685)
    But since then, the angular momentum of drives has decreased and cache sizes have increased.
    Of course write speed has increased as well, but a typical cache size of 8MB at a write speed of 50MB/s means 160ms of continuous writing even when the head is already positioned correctly.
    And since the cache can contain blocks scattered over the entire disk, it does not seem realistic to write everything back on power failure.
  • by stereoroid ( 234317 ) on Friday May 13, 2005 @05:56AM (#12517769) Homepage Journal
    Microsoft have had a few problems in this area - see KB281672 [microsoft.com] for example.

    Then they released Windows 2000 Service Pack 3, which fixed some previous caching bugs, as documented in KB332023 [microsoft.com]. That article tells you how to set up the "Power Protected" write cache option, which is your way of saying "yes, my storage has a UPS or battery-backed cache, give me the performance and let me worry about the data integrity".

    I work for a major storage hardware vendor. To cut a long story short, we knew fsync() (a.k.a. "write-through" or "synchronize cache") was working on our hardware, because performance started sucking after customers installed W2K SP3, and we had to refer them to the latter article.

    The same storage systems have battery-backed cache, and every write from cache to disks is made write-through (because drive cache is not battery-backed). In other words, in these and other enterprise-class systems, the burden of honouring fsync() / write-through commands from the OS has shifted to the storage controller(s); the drives might as well have no cache for all we care. But it still matters that the drives honour the flushes we send them from cache and don't signal "clear" when they're not - if they lie, the cache drops that data, and no battery will get it back!

  • by Kiryat Malachi ( 177258 ) on Friday May 13, 2005 @07:40AM (#12518170) Journal
    The correct definition is the one that follows standard usage, and usage in EVERY OTHER BRANCH OF THE COMPUTER WORLD.

    How fast is a kilobit per second data transmission? Is it 1024 bits/s or 1000 bits/s?

    As much as it pains me, because I know they did it to screw customers, moving to the standard was correct. It *ought* to match everything else for reasons of consistency; it is more important to have consistency across all current measurements in computing than historical consistency with the measurements used previously.
  • by pv2b ( 231846 ) on Friday May 13, 2005 @08:21AM (#12518391)
    Right. And the author is implementing a program that sends raw commands to ATA drives... in Perl. Right. He does no such thing, at least not that I can see from glancing at the source of the Perl script. Granted, I'm not fluent in Perl, but it doesn't seem to do anything other than the equivalent of an fsync(). Please do correct me if I'm wrong.

    The truth is that he doesn't know wtf he's talking about. I've decided to cut him some slack, though, because the FreeBSD 4 man pages at least are very misleading, and I don't know which man pages he read.

    By the way, I sent him an e-mail. It's available on my web space [altunderctrl.se]. I'm not posting it in full here because it's a little long and would be redundant; a lot of the surrounding posts cover pretty much the same ground.
  • by c_oflynn ( 649487 ) on Friday May 13, 2005 @09:02AM (#12518683)
    >I found it nice to see how M$ worked around it,
    >just waiting 2 seconds, how ingenious!

    What would you have done? Verifying all data would probably take longer than 2 seconds, and you can't trust the disk to tell you when it's written the data.

    So you'd either have to figure out all the data that was in the cache and verify it against the disk surface, reporting success only when all that is done, or just wait a bit. Making some assumptions about buffer size and transfer speed, then adding a safety factor, is probably where the 2 seconds came from: an 8MB cache draining at a few tens of MB/s empties in a few hundred milliseconds, so 2 seconds leaves a comfortable margin.

    Did it work? Well, it would appear so. What's so bad about MS's fix?
  • by kublikhan ( 838265 ) on Friday May 13, 2005 @01:03PM (#12521277)
    Couldn't they just stick a large capacitor or small battery on the hard drive, used only for flushing the write cache to the platters in the event of a power failure? It seems simple enough; we only need a few seconds here, and it would solve this whole mess.
