Slashdot Log In
Error-Proofing Data With Reed-Solomon Codes
Posted by
kdawson
on Monday August 04, @12:39AM
from the trust-but-verify dept.
from the trust-but-verify dept.
ttsiod recommends a blog entry in which he details steps to apply Reed-Solomon codes to harden data against errors in storage media. Quoting: "The way storage quality has been nose-diving in the last years, you'll inevitably end up losing data because of bad sectors. Backing up, using RAID and version control repositories are some of the methods used to cope; here's another that can help prevent data loss in the face of bad sectors: Hardening your files with Reed-Solomon codes. It is a software-only method, and it has saved me from a lot of grief..."
Related Stories
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Full
Abbreviated
Hidden
Loading... please wait.

It can make files a bit hard to read, though (Score:4, Funny)
Reply to This
Re:It can make files a bit hard to read, though (Score:5, Insightful)
It really depends where you store the FEC, some techniques store the fec separately others concatenate and others interleave the FEC. Each method has its own advantages and disadvantages.
Reply to This
Parent
Re:It can make files a bit hard to read, though (Score:5, Funny)
I for one welcome our Perl Overlords
Reply to This
Parent
Drives already do this (Score:4, Informative)
... at least CDROMs employ RS codes.
Reply to This
Re:Drives already do this (Score:4, Interesting)
My understanding is that it is possible to drill a few holes no larger than 2mm in diameter equally spread over the surface of an "audio cd" and with the help of h/w RS erasure decoding, channel interleaving and channel prediction (eg:probabilistically reconstruct missing right channel from known left channel) one can produce a near perfect reconstruction - that's what usually happens to overcome scratches and other kinds of simple surface defects.
Reply to This
Parent
Re:Drives already do this (Score:5, Informative)
Reply to This
Parent
Re:Drives already do this (Score:5, Informative)
Radial scratches go from center to edge, azimuthal scratches go around the center.
Reply to This
Parent
Re:Drives already do this (Score:5, Informative)
A CD-ROM sector contains 2352 bytes, divided into 98 24-byte frames. The CD-ROM is, in essence, a data disk, which cannot rely on error concealment, and therefore requires a higher reliability of the retrieved data. In order to achieve improved error correction and detection, a CD-ROM has a third layer of Reed-Solomon error correction.[1] A Mode-1 CD-ROM, which has the full three layers of error correction data, contains a net 2048 bytes of the available 2352 per sector. In a Mode-2 CD-ROM, which is mostly used for video files, there are 2336 user-available bytes per sector. The net byte rate of a Mode-1 CD-ROM, based on comparison to CDDA audio standards, is 44.1k/s×4B×2048/2352 = 153.6 kB/s. The playing time is 74 minutes, or 4440 seconds, so that the net capacity of a Mode-1 CD-ROM is 682 MB.
I'd say that's a yes.
Reply to This
Parent
Re:Drives already do this (Score:5, Interesting)
My biggest failed prediction in the world of computers was the CD-ROM.
I was an audio CD early adopter and I knew from articles I read that audio CD's often had a certain defect rate. The defect rate was usually such that you would never hear it. One artist even published all the defects in the liner notes.
Based upon this, I presumed that you would never get the defect rate to zero and that no one would trust a data medium with anything less than perfection - and thus predicted the CD-ROM would never catch on.
They don't have to get the rate to zero. Just close enough to zero for the RS to function.
Reply to This
Parent
Re:Drives already do this (Score:5, Informative)
Reply to This
Parent
Re:Drives already do this (Score:5, Interesting)
Reply to This
Parent
Re:Drives already do this (Score:5, Funny)
Dude, put the bong down.
Reply to This
Parent
Re:Drives already do this (Score:5, Interesting)
I loved my DAT (for audio) portable recorder, it employed Double-Reed-Solomon error correction, you would have to do some serious hammering to the side of the recorder to get the tape to "skip" in a way the error correction could not correct it and you'd hear it drop out, running and recording was NOT out of the question though.
Now what do the consumers have for recorders - cr*ppy, cheap, nasty, low bitrate, overcompressed MP3 recorders. The recording industry killed off an excellent (but expensive) format to palm off rubbish compressed audio to the masses. (Proper PCM recorders are no different in price to the DAT decks).
Reply to This
Parent
Yes, RS should be a file system service. (Score:4, Insightful)
I agree. I'm willing to have a small loss in speed and a small increase in price to have better data integrity.
There is already data integrity technology embedded in hard drives, and I support making it more robust.
Reply to This
Parent
Re:Yes, RS should be a file system service. (Score:5, Funny)
Those who sacrifice speed for data integrity deserve neither - BF
Reply to This
Parent
Speed? (Score:4, Interesting)
My question is of speed; this seems a promising addition to anyone's back up routine. However, most folks I know have 100s of gigs of data to back up. While differentials could be involved, right now tar'ing to tape works fast enough taht the backup is done before the first staff shows up for work.
I assume we're beating the hell out of the processor here; so I'm wondering how painful is this in terms of speed?
Reply to This
Re:Speed? (Score:5, Interesting)
The speed of encoding and decoding directly relates to the type of RS and the amount of FEC required. Generally speaking erasure style RS can go as low as O(nlogn) (essentially inverting and solving for a vandermonde or Cauchy style matrix) A more general code that can correct errors (the difference between an error and an erasure is that in the latter you know the location of the error but not its magnitude) may require a more complex process, something like Syndrome-Berlekamp Massey-Forney which is about O(n^2).
It is possible to buy specialised h/w (or even GPUs) to perform the encoding steps (getting roughly 100+MB/s) and most software encoders can do about 50-60+Mb/ for RS(255,223) - YMMV
Reply to This
Parent
what about quickpar and dvdisaster? (Score:5, Informative)
quickpar [quickpar.org.uk] especially has been in use on usenet/newsgroups for years....o yea...forgot....they are trying to kill it.
anyways...there's also dvdisaster [dvdisaster.net] which now has several ways of "hardening".
one of them seems to catch my attention: adds error correction data to a CD/DVD (via a disc image/iso)
Reply to This
Datarecovery "data". (Score:5, Insightful)
Working for a datarecovery company, I know that about half the cases where data is lost the whole drive "disappears". So, bad sectors? You can solve that problem with reed solomon! Fine! But that doesn't replace the need for backups to help you recover from: accidental removal, fire, theft and total disk failure (and probably a few other things I can't come up with right now)... .
Reply to This
Re:ZFS? (Score:5, Informative)
checksums really only help in detecting errors. Once you've found errors, if you have an exact redundancy somewhere else you can repair the errors. What reed-solomon codes do is provide the error detecting ability but also the error correcting ability whilst at the same time reducing the amount of redundancy required to a near theoretical minimum.
btw checksums have limits on how many errors they can detect within lets say a file or other kind of block of data. A simple rule of thumb (though not exact) is that 16 and 32 bit checksums can detect upto 16,32 bit errors respectively anymore and the chance of not detecting every bit error goes up, it could even result in not finding any errors at all.
Reply to This
Parent
Re:ZFS? (Score:4, Informative)
I have been a ZFS user for a while and know a lot of its internals. Let me comment on what you said.
Not in ZFS. When the checksum reveals silent data corruption, ZFS attempts to self-heal itself by rewriting the sector with a known good copy. Self-healing is possible if you are using mirroring, raidz (single parity), raidz2 (dual parity), or even a single disk (provided the copies=2 filesystem attribute is set). The self-healing algorithm in the raidz and raidz2 cases is actually interesting as it is based on combinatorial reconstruction: ZFS makes a series of guesses as to which drive(s) returned bad data, it reconstructs the data block from the other drives, and then validates whether this guess was correct or not by verifying the checksum.
All the ZFS checksumming algorithms (fletcher2, fletcher4, SHA-256) generate 256-bit checksums. The default is fletcher2 and offers very good error detection (even errors affecting more than 256 bits of data) assuming unintentional data corruption (the fletcher family are not a cryptographic hash algorithms, it is actually possible to intentionally find collisions). SHA-256 is collision-resistant therefore it will in practice detect all data corruptions. It would be computationally infeasible to come up with a corrupted data block that still matches the SHA-256 checksum.
A good intro to the ZFS capabilities are these slides [opensolaris.org]
Reply to This
Parent
Re:As I understand it... (Score:4, Insightful)
You're asking the wrong question.
The right question is: Given a 1Gb file, how much "mutation" do you have to do to it to produce a file with the same hash? And the answer to that is: Enough to make the data unrecoverable no matter what you do.
Reply to This
Parent
Re:Been doing this from the past 3 decades (Score:5, Funny)
"Awk! Parroty Error! Parroty Error! Pieces of Seven, Pieces of Seven"
(*BOOM*) never did like that bird.
Reply to This
Parent
Re:Version control != backups (Score:4, Insightful)
The best solution is for developers to use their own private branches. Then they can commit as much as they want, and integrate into the main branch when they're ready. Unfortunately subversion has crappy support for integration (even with version 1.5 AFAICT) compared to something like perforce.
Reply to This
Parent
This is not the same thing as PAR ... (Score:5, Informative)
The difference is that TFA interleaves the data so it is robust against sector errors. A bad sector contains bytes from many different data blocks so each data block only loses one byte which is easy to recover from. If you use PAR and encounter a bad sector, you're SOL.
PAR was designed to solve a different problem and it solves that different problem very well but it wasn't designed to solve the problem that is addressed by TFA. Use PAR to protect against "the occasional bit error" as you suggest, but use the scheme given in TFA to protect against bad sectors.
Reply to This
Parent