Data Storage IT

Ask Slashdot: Practical Bitrot Detection For Backups? 321

Posted by timothy
from the error-detected-goodbye dept.
An anonymous reader writes "There is a lot of advice about backing up data, but it seems to boil down to distributing it to several places (other local or network drives, off-site drives, in the cloud, etc.). We have hundreds of thousands of family pictures and videos we're trying to save using this advice. But in some sparse searching of our archives, we're seeing bitrot destroying our memories. With the quantity of data (~2 TB at present), it's not really practical for us to examine every one of these periodically so we can manually restore them from a different copy. We'd love it if the filesystem could detect this and try correcting first, and if it couldn't correct the problem, it could trigger the restoration. But that only seems to be an option for RAID type systems, where the drives are colocated. Is there a combination of tools that can automatically detect these failures and restore the data from other remote copies without us having to manually examine each image/video and restore them by hand? (It might also be reasonable to ask for the ability to detect a backup drive with enough errors that it needs replacing altogether.)"

  • PAR2 (Score:5, Informative)

    by Anonymous Coward on Tuesday December 10, 2013 @01:18PM (#45651923)
  • ZFS filesystem (Score:5, Informative)

    by Anonymous Coward on Tuesday December 10, 2013 @01:19PM (#45651939)

    A single command will do that:

    zpool scrub <pool>

    • Yep, ZFS
    • Re:ZFS filesystem (Score:5, Informative)

      by vecctor (935163) on Tuesday December 10, 2013 @01:41PM (#45652283)

      Agreed, ZFS does exactly this, though without the remote file retrieval portion.

      To elaborate:

      http://en.wikipedia.org/wiki/ZFS#ZFS_data_integrity [wikipedia.org]

      End-to-end file system checksumming is built in, but by itself this will only tell you the files are corrupt. To get the automatic correction, you also need to use one of the RAID-Z modes (multiple drives in a software raid). OP said they wanted to avoid that, but for this kind of data I think it should be done. Having both RAID and an offsite copy is the best course.

      You could combine it with some scripts inside a storage appliance (or old PC) using something like Nas4Free (http://www.nas4free.org/), but I'm not sure what it has "out of the box" for doing something like the remote file retrieval. What it would give is the drive health checks that OP was talking about; this can be done with both S.M.A.R.T. info and emailing error reports every time the system does a scrub of the data (which can be scheduled).

      Building something like this may cost a bit more than for just an external drive, but for this kind of irreplaceable data it is worth it. A small atom server board with 3-4 drives attached would be plenty, would take minimal power, and would allow access to the data from anywhere (for automated offsite backup pushes, viewing files from other devices in the house, etc).

      I run a nas4free box at home with RAID-Z3 and have been very happy with the capabilities. In this configuration you can lose 3 drives completely and not lose any data.

      • Re:ZFS filesystem (Score:5, Informative)

        by Guspaz (556486) on Tuesday December 10, 2013 @01:52PM (#45652421) Homepage

        You don't need raidz or multiple drives to get protection against corrupt blocks with ZFS. It supports ditto blocks, which basically just means mirrored copies of blocks. It tries to keep ditto blocks as far apart from each other on the disk as possible.

        By default, ZFS only uses ditto blocks for important filesystem metadata (the more important the data, the more copies). But you can tell it that you want to use ditto blocks on user data too. All you do is set the "copies" property:

        # zfs set copies=2 tank

        • by cas2000 (148703)

          true, but you do need multiple disks (mirrored or raidz) to protect against drive failure.

          two or more copies of your data on the one disk won't help at all if that disk dies.

          fortunately, zfs can give you both raid-like multiple disk storage (mirroring and/or raidz) as well as error detection and correction.

          That ZFS_data_integrity [wikipedia.org] link in the post you were replying to gives a pretty good summary of how it works.

          The paragraphs immediately above that (titled 'Data integrity', 'Error rates in hard disks', and

    • Re: (Score:2, Informative)

      by Mike Kirk (58190)

      I'm another fan of backups to disks stitched together with ZFS. In the last year I've had two cases where "zpool scrub" started to report and correct errors in files one to two months in advance of a physical hard drive failure (I have it scheduled to run weekly). Eventually the drives faulted and were replaced, but I had plenty of warning, and RAIDZ2 kept everything humming along perfectly while I sourced replacements.

      For offsite backups I currently rotate offline HDD's, but I should move to Cloud storage. G

    • ZFS immediately came to mind when I read the summary.
  • I don't know if there's a better solution, but you could store checksums of each archived file, and then periodically check the file against its checksum. It'd be a bit resource intensive to do, but it should work. I think some advanced filesystems can do automatic checksums (e.g. ZFS, BTRFS), but those may not be an option, and I'm not entirely sure how it works in practice.
    • I use checksums to check for bitrot.

      Once a week, I use openssl to calculate a checksum for each file; and I write that checksum, along with the path/filename, to a file. The next week, I do the same thing, and I compare (diff) the prior checksum file with the current checksum file.

      With about a terabyte of data, I've not seen any bitrot yet.

      Long term, I plan to move to ZFS, as the server's disk capacity will be rising significantly.
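The weekly checksum-and-diff routine described above can be sketched in a few lines. This is only an illustration, not the commenter's actual script: it assumes SHA-256 (the comment does not say which digest openssl was asked for), and the names `file_digest`, `build_manifest`, and `diff_manifests` are hypothetical.

```python
import hashlib
import os


def file_digest(path, algo="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large videos do not need to fit in memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def diff_manifests(old, new):
    """Return (changed, missing, added) relative paths between two runs."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    missing = sorted(p for p in old if p not in new)
    added = sorted(p for p in new if p not in old)
    return changed, missing, added
```

A path in `changed` is only a candidate for bitrot. A file that was edited on purpose shows up the same way, so this works best on archives that are not supposed to change.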

    • Re: (Score:2, Interesting)

      by Anonymous Coward

      Periodically checking them is the important part that no one seems to want to do.

      A few years back we had a massive system failure, and once we had fixed the underlying problems and begun recovery we found that most of the server image backup tapes for 6 months+ could not be loaded. The ops guys took a severe beating for it.

      You think this stuff will never happen but it always does. We had triple redundancy with our own power backups but even that wasn't on a regular test cycle. Some maintenance guy left the s

    • Re:Checksums? (Score:5, Informative)

      by Waffle Iron (339739) on Tuesday December 10, 2013 @01:50PM (#45652393)

      I never archive any significant amount of data without first running this script at the top of the directory tree:

      find . -type f -not -name md5sum.txt -print0 | xargs -0 md5sum >> md5sum.txt

      It's always good to run md5sum --check md5sum.txt right after copying or burning the data. In the past, at least a couple of percent of all the DVDs that I've burned had some kind of immediate data error.

      (A while back, I rescanned a couple of hundred old DVDs that I burned ranging up to 10 years old, and I didn't find a single additional data error. I think that a lot of cases where people report that DVDs deteriorate over time, they never had good data on them in the first place and only discover it later.)
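For machines without the md5sum tool, the check step can be approximated in Python. This is a rough sketch, assuming the manifest uses md5sum's plain output format (32 hex digits, two separator characters, then the file name) and that names are relative to the manifest's directory. It does not handle the backslash-escaped lines md5sum emits for unusual file names, and `md5_of` and `check_manifest` are hypothetical names.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def check_manifest(manifest_path):
    """Return a list of (name, status) for entries that fail verification."""
    base = os.path.dirname(os.path.abspath(manifest_path))
    failures = []
    with open(manifest_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Plain md5sum lines look like "<32 hex><space><space or *><name>".
            expected, name = line[:32], line[34:]
            target = os.path.join(base, name)
            if not os.path.isfile(target):
                failures.append((name, "MISSING"))
            elif md5_of(target) != expected.lower():
                failures.append((name, "FAILED"))
    return failures
```

An empty list means every listed file is present and still matches its recorded hash.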

      • I don't have a large amount of critical data to backup (mostly documents for research). I've been using PAR (or rather relying on it) to verify and correct errors when recovering data.

        That said, I realize I should probably also have a checksum. Should one consider a different algorithm than MD5, for example to prevent collisions of the hashes?

        • While MD5 isn't really secure against intentional attacks any more, the probability of a random collision is still negligible.

          I originally started using MD5 for this purpose because in a test I did many years ago on some machine, md5sum actually ran faster than cksum. The shorter cksum data also does have a chance to generate hash collisions on reasonably sized data sets, although that probably doesn't matter too much for just disk error checking. I don't use the newer algorithms because they're overkill
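To put rough numbers on the collision claim, the usual birthday approximation can be evaluated directly. This is a back-of-the-envelope sketch that assumes hash outputs behave like uniform random values. The 500,000 file count is an assumed stand-in for "hundreds of thousands", 32 bits is the width of the cksum CRC, and `collision_probability` is a hypothetical name.

```python
import math


def collision_probability(num_files, hash_bits):
    """Approximate chance that any two of num_files distinct files share a hash.

    Uses the birthday approximation p ~= 1 - exp(-n(n-1) / 2^(bits+1)),
    which assumes the hash output behaves like a uniform random value.
    """
    pairs = num_files * (num_files - 1) / 2
    return -math.expm1(-pairs / 2.0 ** hash_bits)


if __name__ == "__main__":
    for bits in (32, 128):
        print(bits, collision_probability(500_000, bits))
```

Under these assumptions a 32-bit checksum is almost certain to collide somewhere among half a million files, while a 128-bit hash such as MD5 essentially never does. For plain error checking the more relevant figure is the chance that one damaged file still matches its own recorded value, which is about 2^-bits for random damage.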

      • Or use sha1deep from the md5deep package. [sourceforge.net] It's made specifically for hashing and comparing file trees and has heaps of behavior-modifying options.
  • ZFS (Score:5, Interesting)

    by Electricity Likes Me (1098643) on Tuesday December 10, 2013 @01:27PM (#45652075)

    ZFS without RAID will still detect corrupt files, and more importantly tell you exactly which files are corrupt. So a distributed group of ZFS drives could be used to rebuild a complete backup by only copying uncorrupt files from each.

    You still need redundancy, but you can get away without the RAID in each case.

  • Bitrot does happen.
    When a disk has a bad block and detects that, it will try to read the data from it and put it on a block from the reserve-pool. However, the data might be bad and corrupt, so you lose data.
    Disks do have a Reed-Solomon (aka par-files) index, so it can repair some damage, but it doesn't always succeed.

    Anyway, what I do for important things, is have par2 blocks that go along with the data. All my photo-archives have par2 files attached to them.

    I reckon you could even automate it. To have a s

  • A paranoid setup (Score:5, Interesting)

    by brokenin2 (103006) * on Tuesday December 10, 2013 @01:42PM (#45652297) Homepage

    If you really want hassle free and safe, it would be expensive, but this is what I would do:

    ZFS for the main storage - Either using double parity via ZFS or on a raid 6 via hardware raid.

    Second location - Same setup, but maybe with a little more space

    Use rsync between them using the --backup switch so that any changes get put into a different folder.

    What you get:

    Pretty disaster tolerant
    Easy to maintain/manage
    A clear list of any files that may have been changed for *any* reason (Cryptolocker anyone?)
    Upgradable - just change drives
    Expense - You can build it for about $1800 per machine or $3600 total if you go full-on hardware raid. That would give you about 4TB storage after parity (4 2TB drives - $800, Raid Card - $500, basic server with room in the case - $500)

    What you don't get: Lost baby pictures/videos. I've been there, and I'd pay a lot more than this to get them back at this point, and my wife would pay a lot more than I would..

    Your current setup is going to be time consuming, and you're going to lose things here and there anyway.. If you just try to do the same thing but make it a little better, you're still going to have the same situation, just not as bad. In this setup you have to have like 5 catastrophic failures to lose anything, sometimes even more..

    • by Minwee (522556)

      Expense - You can build it for about $1800 per machine or $3600 total if you go full-on hardware raid. That would give you about 4TB storage after parity (4 2TB drives - $800, Raid Card - $500, basic server with room in the case - $500)

      Either use a RAID controller or use ZFS. It's not a good idea to use both at the same time.

      • by brokenin2 (103006) *

        I've used them together. Seems to work just fine.. Just don't let ZFS know that there's more than 1 drive. You can't have them both trying to manage the redundant storage.

        ZFS has some great features besides its redundant storage. You can get them from other filesystems too though I suppose, but I like snapshots built into the filesystem. It *is* overkill to have the filesystem doing checksums and the raid card detecting errors as well, but that's why this is the paranoia setup... Not really looking for t

        • by cas2000 (148703)

          > Just don't let ZFS know that there's more than 1 drive.

          That is *precisely* the wrong thing to do. As in, the exact opposite of how you should do it.

          Instead, configure the RAID card to be JBOD and let ZFS handle the multiple-drive redundancy (raidz and/or mirroring), as well as the error detection and correction.

          Otherwise, there is little or no benefit in using ZFS. ZFS can't correct many problems if it doesn't have direct control over the individual disks, and RAID simply can't do the things that ZF

      • by fnj (64210)

        Never use a RAID controller, period. ZFS builtin RAIDZ is far superior in every way.

    • Use rsync between them using the --backup switch so that any changes get put into a different folder. ...
      A clear list of any files that may have been changed for *any* reason (Cryptolocker anyone?)

      +1 Clever.

    • Re:A paranoid setup (Score:4, Informative)

      by cas2000 (148703) on Tuesday December 10, 2013 @10:06PM (#45657065)

      good post, except for three details:

      1. if you're using ZFS on both systems, you're *much* better off using 'zfs send' and 'zfs recv' than rsync.

      do the initial full copy, and from then on you can just send the incremental snapshot differences.

      one advantage of zfs send over rsync is that rsync has to check each file for changes (either file timestamp or block checksum or both) every time you rsync a filesystem or directory tree. With an incremental 'zfs send', it only sends the incremental difference between the last snapshot sent and the current snapshot.

      you've also got the full zfs snapshot history on the remote copy as well as on the local copy.

      (and, like rsync, you can still run the copy over ssh so that the transfer is encrypted over the network)

      2. your price estimates seem very expensive. with just a little smart shopping, it wouldn't be hard to do what you're suggesting for less than half your estimate.

      3. if you've got a choice between hardware raid and ZFS then choose ZFS. Even if you've already spent the money on an expensive hardware raid controller, just use it as JBOD and let ZFS handle the raid function.

  • WinRAR isn't perfect, but it works on a number of platforms, be it OS X, Windows, Linux, or BSD. This provides not just CRC checking, but one can add recovery records for being able to repair damage. If storing data on a number of volumes (like optical media), one can make recovery volumes as well, so only four CDs out of a five CD set are needed to get everything back.

    It isn't as easy as ZFS, but it does work fairly well for long term archiving, and one can tell if the archive has been damaged years to d

  • BTRFS and ZFS both do checksumming and can detect bit-rot. If you create a RAID array with them (using their native RAID capabilities) they can automatically correct it too. Using rsync and unison I once found a file with a nice track of modified bytes in it -- spinning rust makes a great cosmic ray or nuclear recoil detector. Or maybe the cosmic ray hit the RAM and it got written to disk. So, use ECC RAM.

    But "bit-rot" occurs far less frequently than this: I find that on a semi-regular basis my ent

  • There's really no way around it. Storage media is not permanent. You can store your important stuff on RAID, but back up the array often. RAID is there to keep a disk*N failure from borking your production storage and that's it. If you can afford cloud storage, encrypt your array contents (encfs is good) and mirror the contents with rsnapshot [rsnapshot.org] or rsync [samba.org] to amazon, dropbox, a friend's raid array, whatever. SATA drives are cheap enough to keep a couple sitting around to just plug in and mirror to every w

  • I have been going through this issue myself. In a single weekend of photo and video taking, I can easily fill up a 16 gig memory card, sometimes a 32 gig. About 10 years ago I lost about two years' worth of pictures due to bitrot (i.e., my primary failed, and the backup DVD-Rs were unreadable after only a year; I was able to recover only a handful of photos using disc-recovery software). Since then, I have kept at least three backups and reburned discs every couple of years. But if I can fill up two BD-Rs in a w

  • by dshk (838175)
    It might be overkill, but the open source backup software Bacula has a verify task, which you can schedule to run regularly. It can compare the contents of files to their saved state in backup volumes, or it can compare the MD5 or SHA1 hashes which were saved in the previous run. I assume other backup software has similar features.
  • Have mercy! (Score:5, Funny)

    by c0d3g33k (102699) on Tuesday December 10, 2013 @02:32PM (#45652945)

    We have hundreds of thousands of family pictures and videos we're trying to save using this advice. But in some sparse searching of our archives, we're seeing bitrot destroying our memories. With the quantity of data (~2 TB at present),

    As the proud owner of dozens of family photo albums, a stack of PhotoCDs etc which rarely see the light of day, the bigger challenge is whether anyone will ever voluntarily look at those terabytes of photos. Having been the victim of excruciating vacation slide shows that only consisted of 40-50 images on a number of occasions (not to mention the more modern version involving a phone/tablet waving in my face), I can only imagine the pain you could inflict on someone with the arsenal you are amassing.

  • by TheloniousToady (3343045) on Tuesday December 10, 2013 @02:44PM (#45653063)

    Don't forget the old-fashioned method: make archival prints of your photos and spread copies among your relatives. Although that isn't practical for "hundreds of thousands", it is practical for the hundreds of photos you or your descendants might really care about. The advantage of this method is that it is a simple technology that will make your photos accessible into the far future. And it has a proven track record.

    Every other solution I've seen described here better addresses your specific question, but doesn't really address your basic problem. In fact, the more specific and exotic the technology (file systems, services, RAID, etc.) the less likely your data is to be accessible in the far future. At best, those sorts of solutions provide you a migration path to the next storage technology. One can imagine that such a large amount of data would need to be transported across systems and technologies multiple times to last even a few decades. But will someone care enough to do that when you're gone? Compare that to the humble black-and-white paper print, which if created and stored properly can last for well over a hundred years with no maintenance whatsoever.

    Culling down to a few hundred photos may seem like a sacrifice, but those who receive your pictures in the future will thank you for it. In my experience, just a few photos of an ancestor, each taken at a different age or at a different stage of life, is all I really want anyway. It's also important to carefully label them on the back, where the information can't get lost, because a photo without context information is nearly meaningless. Names are especially important: a photo of an unknown person is of virtually no interest.

    Sorry I don't have a low-tech answer for video, but video (or "home movies", as we used to call it) will be far less important to your descendants anyway.

    • by Grizzley9 (1407005) on Tuesday December 10, 2013 @03:50PM (#45653783)
      Agreed. Looking through a family picture album from the late 1800's I realized my hundreds of GB's of current family pics will likely die with me. There are a ton of family images and a select few family pics may be copied by progeny but unlike their printed counterparts, there are no names or locations on many (and sometimes dates if the exif gets corrupted or overwritten).

      So what good is a bunch of pics or videos of long past events except to the person involved? Digital images today, unless meticulously managed and edited, do little good for historical purposes compared with the photo album of yesterday. That is especially true if they are locked away in some online archive, which stays accessible only if the owner keeps up with format and company changes over the decades and the descendants know where the files are.
  • by Rob the Bold (788862) on Tuesday December 10, 2013 @02:46PM (#45653073)

    A family archive maintained by the "tech guy/gal" in the family is also subject to failure from death or disability of the aforementioned maintainer. Any storage/backup solution should therefore be sufficiently documented (probably on paper, too) that the grieving loved ones can get things back after a year or two of zero maintenance and care of the system. That would also imply eschewing home-brew type systems in favor of using standard tools so a knowledgeable tech person not familiar with the creator's original design can salvage things in this tragic but possible scenario. Document the system so even if the family can't do it themselves, and an IT guy has to be contracted to resurrect the data, he'll have the information needed to do so.

    Any system sufficiently dependent on regular maintenance by just one particular person is indistinguishable from a dead-man time-bomb.

  • by neo-mkrey (948389) on Tuesday December 10, 2013 @02:48PM (#45653101)
    100,000s -- like 300,000? More? How many of them will you actually ever look at again? Less than 1%, I'm guessing. Here's my advice (and it's what I do). Step 1) When transferring pics to your computer, delete the ones that are out of focus, badly lit, poorly framed, etc. This is about 15%. Step 2) Once a month, go through the photos you took the previous month and delete those that just don't mean as much anymore (if they have decreased in emotional value in 30 days, just think how utterly worthless they will be in 5 years). This takes care of another 30%. Step 3) Once every 3 months, my wife and I pick the cream of the crop for physical prints. This is about 10%. These are put into photo albums, labeled, and kept in a fireproof safe in our basement. So 200 photos a month get reduced to ~100, and then 10 per month are printed. YMMV
  • by carlcmc (322350) on Tuesday December 10, 2013 @02:51PM (#45653127)
    Convert photos to DNG in Adobe Lightroom and use the ability for it to check for file changes. Store on a Drobo with dual disk redundancy.
  • by rainer_d (115765) on Tuesday December 10, 2013 @02:56PM (#45653163) Homepage
    but there is a catch: to reliably detect bit-rot and other problems, you also need server-grade hardware with ECC.
    ZFS (especially when your dataset-size increases and you add more RAM) is picky about that, too.
    Bit-rot does not only occur in hard-disks or flash.
    You should really, really take a hard look at every set of photos and select one or two from each "set", then have these printed (black and white, for extra longevity).
    If this results in still too many images, only print a selection of the selection and let the rest die.
  • The solution to Bitrot and reading of old media is very simple and honestly I don't know why it comes up so much. Storage is DIRT CHEAP. 2TB of Data is NOTHING, you can get a 3TB+ external drive for $100 or even less on sale. Buy 3 drives, keep 1 in SAFELOCATION*, Back up to 1 drive every even week, and the second one every odd week, and once a month swap the one in the SAFELOCATION out for a local one and repeat the cycle. Increase or decrease frequency of SAFELOCATION swapping depending on level of paran

  • We wrote our own parallel filesystem to handle just that. It stores a checksum of the file in the metadata. We can (optionally) verify the checksum when a file is read, or run a weekly "scrubber" to detect errors.

    We also have Reed-Solomon 6+3 redundancy, so fixing bitrot is usually pretty easy.

  • As other people have mentioned, a lot of these errors can occur while you are actually copying the files. I have copied files and immediately executed md5sums on the source and dest files only to find differences. Unfortunately, I didn't start this practice until after I had to restore from backup only to find that some of the backup files were corrupted.

    And given that this seems to be a common problem, why in the holiest of hells does the cp command not have a verify option? Yeah, it's easy enough to
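A copy-then-verify step like the one described above is easy to wrap in a small helper. This is a minimal sketch rather than a replacement for cp: it assumes SHA-256 instead of md5sum, and `sha256_of` and `copy_verified` are hypothetical names. Re-reading the destination immediately may also be served from the operating system's cache rather than the disk, so a later second check is still worthwhile.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_verified(src, dst):
    """Copy src to dst, then re-read both and raise if the digests differ."""
    shutil.copy2(src, dst)  # copy2 also preserves timestamps where possible
    src_digest = sha256_of(src)
    dst_digest = sha256_of(dst)
    if src_digest != dst_digest:
        raise OSError(f"verification failed copying {src} to {dst}")
    return dst_digest
```

Returning the digest lets the caller record it in a manifest at copy time, so later checks have a trusted reference value.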
  • I've used ZFS under Linux for 5 years now for exactly this sort of thing. I picked ZFS because I was putting photos and other things on it for storage that I wasn't likely to be looking at actively, so I wouldn't be able to detect bit-rot until it was far too late. ZFS has detected and corrected numerous device corruption and unreadable-block issues over the years via monthly "zpool scrub" operations.

    I have been backing these files up to another ZFS system off-site. But now I'm starting to loo

  • by MooseTick (895855) on Tuesday December 10, 2013 @04:15PM (#45654049) Homepage

    Here's a cheap easy solution (assuming you can write some basic scripts)

    1. Start by taking an MD5 of all your pics. Save the results.
    2. Back up everything to a 2nd drive. Take MD5s and be sure they match using basic scripts.
    3. Periodically scan drives 1 and 2 and compare against their expected MD5 values. If one has changed, copy it from the other (assuming it is still correct)

    You could expand this with more drives if you are extra paranoid. You could do this cheap, check regularly, and know when bitrot is happening.
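The three steps above might look roughly like this as a script. It is a sketch under stated assumptions, not a tested backup tool: `expected` is assumed to be the saved MD5 list from step 1 loaded into a dict of relative path to hex digest, both drives are assumed to be mounted as ordinary directories, and `md5_of` and `reconcile` are hypothetical names.

```python
import hashlib
import os
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def reconcile(expected, drive1, drive2):
    """Check both copies of every file and repair a bad copy from a good one.

    Returns (repaired, unrecoverable) lists of relative paths.
    """
    repaired, unrecoverable = [], []
    for rel, digest in sorted(expected.items()):
        copies = [os.path.join(drive1, rel), os.path.join(drive2, rel)]
        ok = [os.path.isfile(p) and md5_of(p) == digest for p in copies]
        if all(ok):
            continue  # both copies still match the saved hash
        if not any(ok):
            unrecoverable.append(rel)  # needs a third copy or manual recovery
            continue
        good, bad = copies[ok.index(True)], copies[ok.index(False)]
        os.makedirs(os.path.dirname(bad), exist_ok=True)
        shutil.copy2(good, bad)
        # Re-check the restored copy before counting it as repaired.
        (repaired if md5_of(bad) == digest else unrecoverable).append(rel)
    return repaired, unrecoverable
```

A drive that keeps turning up in the repaired list is a reasonable signal that it should be replaced, which covers the last part of the original question.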

  • I think that when writable CDs first came out, we thought that they would last forever. And in some sense they do last long enough. The other day I found a CD binder full of games and a few backups from 1996. The most surprising of all was a collection of photos that I thought had been long lost, and with a little rsync running over and over and over, I got all the files off intact and saved them to my Flickr account.

    The most important thing to understand, I think, is that we have to look at digital storage
