Your Hard Drive Lies to You
fenderdb writes "Brad Fitzpatrick of LiveJournal fame has written a utility and a quick article showing that hard drives, from consumer models up to the highest-level 'enterprise'-grade SCSI and SATA drives, do not obey the fsync() function. Manufacturers are blatantly sacrificing integrity in favor of scoring higher on 'pure speed' performance benchmarks."
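For readers unfamiliar with the call at issue, here is a hedged sketch of the POSIX durability dance in Python (the file and its contents are invented for illustration). The story's claim is that even the final fsync() step can return before the bits are on the platter, because the drive acknowledges the write from its volatile on-board cache.

```python
import os
import tempfile

# Minimal sketch of a "durable" write on a POSIX system: flush the
# user-space buffer, then ask the kernel to push the data to the
# device with fsync(). Per the article, a lying drive may still be
# holding the data in its volatile write cache after fsync() returns.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("important data\n")
    f.flush()              # user-space buffer -> kernel page cache
    os.fsync(f.fileno())   # kernel page cache -> drive (or its cache...)

with open(path) as f:
    contents = f.read()
os.unlink(path)
```

Opening the file with O_SYNC requests the same guarantee per-write instead of per-flush, but either way, a drive that acknowledges a cache flush it has not performed defeats both mechanisms.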
Why do we need it? (Score:4, Interesting)
If we've made it this far without it, why do we need it now?
I'm just curious...
Of course it does! (Score:5, Interesting)
Most systems can identify and patch out bad sectors so that they aren't used. What surprised me is that the manufacturers have their own bad sector table, so when you get the disk it's fairly likely that there are already bad areas which have been mapped out.
Secondly, the raw error rate was astoundingly high. It's been quite a few years, but it was somewhere around one error in every 10^5 to 10^6 bits. So it's not unusual to find a mistake in every megabyte read. Of course the drive's error correction picks up these errors and hides them from you too.
Granted this was a few years ago, but I wouldn't be surprised if it's as bad (or even worse) now.
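A quick sanity check of the parent's figures (the rate is the parent's recollection, not a datasheet number):

```python
# With a raw error rate of one bit in 10^6 (the parent's better figure,
# before error correction), a megabyte read is expected to contain
# several flipped bits -- consistent with "a mistake in every megabyte".
bits_per_megabyte = 8 * 1024 * 1024
raw_error_rate = 1 / 1e6                 # one error per 10^6 bits
expected_errors = bits_per_megabyte * raw_error_rate   # ~8.4 per MB
```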
More information (Score:5, Interesting)
Sadly unpredictable (Score:5, Interesting)
While working at said hard disk company I had one of their smaller disks sitting on the end of a steel ruler on my desk. I spun round on my chair, as I do when I'm thinking, and hit the other end of the ruler with my elbow. This of course launched the disk across the room, slamming it against the wall.
Given that I was in the process of writing software to diagnose failures, I was quite excited about this accident. Of course I returned the disk to the test setup and found nothing wrong.
In my experience, the only sure-fire way to make a disk fail is to place a piece of important, but un-backed-up, work on it.
Re:Which ones ? (Score:5, Interesting)
Easy. The driver gets a 'sync' command from the OS. However, the driver writer believes that most other programmers call fsync() when they don't really need to, and decides to "optimize" this case. So he passes the command on to the drive, but returns immediately (allowing the drive command to complete asynchronously). This makes his driver appear faster.
Fortunately, most driver writers have their priorities straight about data integrity, so this kind of thinking isn't very common.
Schwab
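The "optimization" described above can be modeled in a few lines. This is a toy sketch, not real driver code; the class and all names are invented for illustration:

```python
import queue
import threading
import time

# Toy model of the "lying" sync described above: the flush request is
# queued to complete asynchronously, but the call reports success
# immediately, so the caller believes the data is already safe.
class LyingDriver:
    def __init__(self):
        self.on_disk = []
        self._q = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            data = self._q.get()
            time.sleep(0.01)          # simulated media write latency
            self.on_disk.append(data)
            self._q.task_done()

    def sync_write(self, data):
        self._q.put(data)             # hand the block to the "drive"...
        return True                   # ...and claim completion anyway

drv = LyingDriver()
ok = drv.sync_write("block-42")
durable_at_return = "block-42" in drv.on_disk   # not on "media" yet
drv._q.join()                                   # wait for the real write
```

Pull the power (kill the process) between the return and the join, and "block-42" is gone even though the caller was told the sync succeeded.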
Re:Of course it does!-Perfect world. (Score:3, Interesting)
I was surprised that, at that point in time, a fair proportion of the hard drives sold already had errors on them.
Part of me wonders if this explains the anecdotal stories that SCSI disks are more reliable than their cheaper ATA counterparts - even when they use the same physical hardware. Perhaps (and this is blind speculation) the drives with fewer errors get sold to the customers willing to pay more.
Re:Author lied when implied that DRIVES are the is (Score:3, Interesting)
Linux 2.6 and IDE (Score:1, Interesting)
Re:Of course it does! (Score:3, Interesting)
Re:Why do we need it? (Score:3, Interesting)
Of course write speed has increased as well, but a typical cache size of 8MB and a write speed of 50MB/s would mean 160ms of continuous writing, and that's when the head is already positioned correctly.
Assuming the cache can contain blocks scattered over the entire disk, it does not seem realistic to write everything back on power failure.
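The parent's arithmetic, spelled out (the figures are the parent's typical values, not measurements):

```python
# An 8 MB drive cache draining at 50 MB/s sequential write speed needs
# 160 ms -- and that's the best case, with no head movement at all.
cache_mb = 8
write_speed_mb_s = 50
drain_ms = cache_mb / write_speed_mb_s * 1000   # 160.0 ms, best case
# With cached blocks scattered across the platter, each one also costs
# a seek (on the order of 10 ms), far longer than a collapsing power
# rail stays alive.
```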
Examples from the World of Windows. (Score:5, Interesting)
Then they released Windows 2000 Service Pack 3, which fixed some previous caching bugs, as documented in KB332023 [microsoft.com]. The article tells you how to set up the "Power Protected" Write Cache option, which is your way of saying "yes, my storage has a UPS or battery-backed cache, give me the performance and let me worry about the data integrity".
I work for a major storage hardware vendor: to cut a long story short, we knew fsync() (a.k.a. "write-through" or "synchronize cache") was working on our hardware because the performance started sucking after customers installed W2K SP3, and we had to refer customers to the above article.
The same storage systems have battery-backed cache, and every write from cache to disks is made write-through (because drive cache is not battery-backed). In other words, in these and other Enterprise-class systems, the burden of honouring fsync() / write-through commands from the OS has shifted to the storage controller(s); the drives might as well have no cache for all we care. But it still matters that the drives honour the write-through commands we send them from the cache, and don't signal completion when the data isn't on the platters. If they lie, the cache drops that data, and no battery will get it back!
Re:An acceptable alternative. (Score:3, Interesting)
How fast is a kilobit per second data transmission? Is it 1024 bits/s or 1000 bits/s?
As much as it pains me, because I know they did it to screw customers, moving to the standard was correct. It *ought* to match everything else for reasons of consistency; it is more important to have consistency across all current measurements inside the computer than historical consistency with the measurements used previously.
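For reference, the discrepancy the question above is about, under the two interpretations of "kilo":

```python
# SI (decimal) vs. binary interpretation of "kilobit" -- the gap is
# small at this scale but compounds at mega/giga prefixes.
si_kilobit = 1000        # 10^3, the SI standard
binary_kilobit = 1024    # 2^10, the traditional computing usage
difference_pct = (binary_kilobit - si_kilobit) / si_kilobit * 100  # 2.4%
```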
Re:Err... "lying" is the default setting. RTFM. (Score:4, Interesting)
The truth is that he doesn't know wtf he's talking about. I decide to cut him some slack though, because the FreeBSD 4 man pages at least are very misleading, and I don't know what man pages he did read.
By the way, I sent him an e-mail. It's available on my web space [altunderctrl.se]. I'm not posting it in full here, because it's a little long and it would be redundant, since a lot of the surrounding posts discuss pretty much the same thing as I said.
Re:Err... "lying" is the default setting. RTFM. (Score:3, Interesting)
>just waiting 2 seconds, how ingenious !
What would you have done? Verifying all data would probably take longer than 2 seconds, and you can't trust the disk to tell you when it's written the data.
So you'd either have to figure out all the data that was in the cache, verify it against the disk surface, and only report completion when all that is done, or just wait a bit. Making some assumptions about buffer size and transfer speed, then adding a safety factor, is probably where the 2 seconds came from.
Did it work? Well, it'd appear so. What's so bad about MS's fix?
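One way a 2-second figure could fall out of such a back-of-the-envelope estimate; every number here is an assumption for illustration, not anything from the KB article:

```python
# Hypothetical worst-case cache drain time plus a safety factor.
cache_mb = 8
worst_case_mb_s = 10               # scattered writes, lots of seeking
drain_s = cache_mb / worst_case_mb_s        # 0.8 s to empty the cache
safety_factor = 2.5
wait_s = drain_s * safety_factor            # ~2.0 s
```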
Put a capacitor on the harddrive (Score:3, Interesting)