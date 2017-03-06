Ask Slashdot: Best File System For the Ages? 98
New submitter Kormoran writes: After many, many years of internet, I have accumulated multi-terabyte HDDs full of software, photos, videos, eBooks, articles, PDFs, music, etc. that I'd like to save forever. The problem is, my HDDs are fine, but some files are getting corrupted. Some videos show missing keyframes and some photos are ill-colored. RAID systems can protect online data (to a degree), but what about offline storage? Is there a software solution, like a file system or a file format, specifically tailored to avoid this kind of bit rot?
Dude, if your hard drives were fine, your files wouldn't be corrupted. Keep RAID backups if you want a solution. The file system doesn't make a Fing difference.
I prefer to chisel the 0s and 1s into a stone tablet. Very secure, no bit rot.
Reasonably akin to that, and a helluva lot more convenient, we have:
https://en.wikipedia.org/wiki/... [wikipedia.org]
Now, finding a READER in a thousand years...
bit rot (Score:4, Informative)
zfs
It's pretty sad that in this day and age only one person has highlighted the relevance of ZFS here, and they're an AC. Someone mod parent up. RAID is borderline necessary if you don't have multiple backups (to recover from random corruption caused by gamma rays from outer space, a butterfly flapping its wings on another continent, or whatever), but as far as I know, only ZFS has built-in checksumming to detect the data corruption in the first place.
No, RAID is not sufficient to p
You really need something like ZFS, which puts a checksum on every block and verifies it on every read, so if it does find an error it can repair it from a redundant copy.
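To make the "checksum and verify" idea concrete, here is a toy Python sketch of a self-healing mirrored read: a digest is recorded per block at write time, every read re-checks it, and a block that fails is replaced from a copy that still matches. It only illustrates the technique and is not ZFS's on-disk format: ZFS keeps each block's checksum in its parent block pointer and defaults to fletcher4, whereas this sketch uses SHA-256 and a hypothetical 4 KiB block size.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size chosen for this sketch


def split_blocks(data: bytes) -> list:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def checksums(blocks: list) -> list:
    """Record one SHA-256 digest per block at write time."""
    return [hashlib.sha256(b).digest() for b in blocks]


def read_with_repair(copies: list, sums: list):
    """Read every block, verifying it against its stored checksum.

    `copies` is a list of mirrored block lists. A block that fails
    verification is taken from a copy that still matches, and the bad copy
    is overwritten in place. Returns (data, number_of_blocks_repaired).
    """
    result, repaired = [], 0
    for i, expected in enumerate(sums):
        good = next((c[i] for c in copies
                     if hashlib.sha256(c[i]).digest() == expected), None)
        if good is None:
            raise OSError(f"block {i}: no copy matches its checksum")
        for c in copies:
            if c[i] != good:
                c[i] = good  # heal the damaged mirror
                repaired += 1
        result.append(good)
    return b"".join(result), repaired
```

Note that with a single copy the checksum can only detect the damage; the repair step needs the redundancy the thread keeps coming back to.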
ZFS also has its own flavors of RAID 1/5/6.
Tell me about a usable Linux distribution that has a fully working ZFS implementation.
I should have an answer for you shortly. Say, in half a decade or so, give or take.
Who's to say ZFS will be around in a few decades?
The real solution here is relatively frequent backups and multiple copies in different filesystem and physical formats (i.e., flash, hard drive, optical). Over time you just keep moving your file store to the new media. I have files that are over twenty-five years old now, some of them coming from DOS and Windows 3.1, others from my old original Slackware 3 installs. Along the way some of those files have been on CD-Rs, DVDs, early USB thumb drives, various hard
zfs
ZFS is a pretty good solution. Multiple NAS ZFS systems [freenas.org] with snapshots and replication are even better.
I personally like XFS in production (including LVM), but ZFS is hard to beat if bitrot is your #1 concern.
Like all hardware, disk drives have two states: failed and going to fail. Bit rot will also occur in long-term storage, whether you notice it or not.
A self-healing file system with substantial redundancy capabilities like ZFS is the obvious answer.
However, there are many ways to configure ZFS, and some configurations have better redundancy than others. A misconfigured system would be worse than useless because of the false sense of security. Exactly how many terabytes of data you have also matters for cre
Terabytes over decades on NTFS (Score:3)
Not a single example in 30TB over 20 years? I think you should check again.
No kidding.
I have two raids, clones of each other. On the weekends, during off-hours, I run md5sums on them. Scripts automatically compare to prior versions.
So far, the newer the RAID card, the less I've seen. A lot of older (e.g., 10- or 15-year-old) RAID cards didn't run patrol reads or consistency checks automatically. My current cards do, and I've scheduled them weekly.
With *that*, I see files 'caught' via bitrot once in a while... and corrected. Maybe 2 or 3 files per year, out of bill
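A weekly hash-and-compare job like the one described can be sketched in a few lines of Python. The function names, and the assumption that you keep the previous manifest around (say, saved as JSON), are illustrative rather than the poster's actual scripts. MD5 matches the comment and is adequate for catching accidental corruption, though not deliberate tampering.

```python
import hashlib
import os


def build_manifest(root: str) -> dict:
    """Map every file under root (as a relative path) to its MD5 hex digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            h = hashlib.md5()
            with open(path, "rb") as f:
                # hash in 1 MiB chunks so large videos don't need to fit in RAM
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            manifest[os.path.relpath(path, root)] = h.hexdigest()
    return manifest


def compare(old: dict, new: dict):
    """Return (changed, missing, added) path lists between two manifests."""
    changed = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    missing = sorted(old.keys() - new.keys())
    added = sorted(new.keys() - old.keys())
    return changed, missing, added
```

For an archive of files that should never change, anything in `changed` is a bit-rot suspect to restore from the other clone.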
Schrodinger's bit rot. If you never look in the box again after putting the cat in it, you can pretend it lived forever.
I tried downloading an old attachment (6-7 years ago now) from my gmail account but the attachment is corrupted. No matter how many times I download it or to what computer, it's corrupted. I wonder what Google is using?
What type of file is it? It might be a media format the player software no longer recognises (find an older player). Or if it's an exe, it might be a 16-bit exe that won't run in a 64-bit environment (find an older operating system). If it's not confidential, could you post a link so we can try it?
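One quick way to tell "my software no longer understands this" apart from "the bytes are damaged" is to look at the file's leading magic bytes. The short Python sketch below checks a handful of well-known signatures and applies a crude truncation heuristic for JPEG: a complete JPEG normally ends with the FF D9 end-of-image marker, though some files carry trailing data after it. The signature table is deliberately partial.

```python
# A few well-known leading signatures ("magic bytes"); deliberately incomplete.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG",
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"GIF87a": "GIF",
    b"GIF89a": "GIF",
    b"%PDF-": "PDF",
    b"PK\x03\x04": "ZIP container (also DOCX, XLSX, EPUB)",
}


def sniff(data: bytes) -> str:
    """Guess the format from the first bytes, ignoring the file extension."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return "unknown"


def jpeg_looks_truncated(data: bytes) -> bool:
    """Heuristic: a complete JPEG normally ends with the FF D9 end-of-image marker."""
    return sniff(data) == "JPEG" and not data.endswith(b"\xff\xd9")
```

If `sniff` says "unknown" for a file that should be a photo, the problem is more likely damage (or a mislabeled file) than an old codec.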
Is this possible (Score:1)
Is this even possible long term? What would have happened if you had stored all of your information on PATA drives 10 years ago? It's rare to find a motherboard with PATA on it now. Yes, there are converters and third-party PCI cards, but those are eventually going to dry up too.
Now, say you choose SATA: what happens when M.2 becomes the de facto standard? So why don't you choose M.2? What happens when M.2 is phased out?
It's not just the file system and the data you need to think about; it's the physical hardware too.
That's easy: use an external USB controller. You can still buy cheap PATA-USB interfaces, and of course SATA and M.2.
USB has been around 20 years, and it could be another 20 before we lose USB 2.0 / 3.0 compatibility.
Before that we had FireWire 400/800 and SCSI I/II/III. Won't be long before Apple obsoletes USB 1/2/3 for something with a much smaller connector.
With the rate at which hardware, and the ways of connecting it to other hardware, keep changing, it's unrealistic to expect to be able to use your current storage media in 10 years, let alone 20, 30 or 40 years.
This is the problem with maintaining your own hardware, and a really useful use case for cloud storage, so long as you can trust the provider to keep the hardware up to date while your files stay clean, private and available.
If you want to keep your data private, get it off the Internet. No cloud provider can guarantee your data will stay private, much less clean and available.
I've had a Ship-of-Theseus ZFS pool that I started years ago on a set of PATA drives: RAID-Z2 on OpenSolaris. It's since moved to SATA drives, been expanded a few times, and moved from Debian to FreeBSD to, now, FreeNAS.
Set up a pool with the level of redundancy you need, and as technology changes, use a system compatible with both the old and new tech and just replace drives as needed.
Tape suffers from bit rot.
And tape standards themselves also suffer from obsolescence. QIC-80 format, anyone?
Clay pots in the desert (Score:2)
It doesn't have to be crockery. Flat clay tablets work fine too, if you don't bomb them.
LOL, this... A while back someone wanted data stability for a millennium, including the system to read the data. The conclusion we came to was to carve it in marble or fired ceramic, including plain-text instructions for building the data reader.
HDDs are NOT fine (Score:3)
If the bits on your drive are changing while the drive is offline, that isn't a filesystem issue. A filesystem issue would be if your OS wrote the wrong information to the drive, but that can't happen with an offline drive.
It looks like there are (at least) two with built-in checksums: ZFS and btrfs. Here's info on btrfs's CRCs: https://en.wikipedia.org/wiki/... [wikipedia.org]
You'd still need a backup or RAID solution to replace a bad block.
Tape will store your stuff for upwards of 10 years, up to 30 if you store the cartridges really well. It's also available in large sizes and is pretty cheap (about a cent per GB).
And if you believe any of that, I have a very interesting investment offer for you...
RAID (Score:3)
Still, RAID is a good choice for redundancy.
Or paper: http://ollydbg.de/Paperbak/#1 [ollydbg.de]
Not entirely true. Most modern hardware RAID cards automatically perform consistency checks, and often additional block checks. Mine do so weekly; they defaulted to monthly.
It's true that if the RAID card and drives never power up, it's a no-go. But if the computer is just on, with the RAID card and drives powered, modern hardware RAID will automatically scan the entire drive and fix issues.
Not all RAIDs are equal (Score:2)
How about DNA? (Score:2)
"Our colleagues from ETH Zurich did a test and found that the half life of DNA after a chemical treatment can be 4000 years in room temperature, much better than my CDs!"
yeah, but now you can't update wikipedia til you have a kid. And if mom and dad have conflicting edits, watch out...
Error correction codes. PAR2, btrfs, partitions,VM (Score:2)
The magic phrase to Google is "error correction codes" (ECC).
PAR2 uses Reed-Solomon error correction. Parchive is the ECC file format specification; for Linux you will want PyPar or par2tbb, and on Windows you can use a GUI called QuickPar.
Btrfs checksums your data and can be set to keep duplicate copies (the DUP profile) on a single disk, so a block that fails its checksum can be repaired from the other copy.
You can slice a single disk into partitions and then use RAID1 or LVM mirroring, or RAID5 or RAID6. LVM can also be useful to divide (and combine) any number of drives into any number of volumes, then you can RAID across the volum
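To show what Reed-Solomon erasure recovery actually does, here is a small Python sketch of its evaluation form: the k data bytes are treated as values of a polynomial of degree below k, parity symbols are extra evaluations of that polynomial, and any k surviving symbols pin the polynomial down again by Lagrange interpolation. For readability it works over the prime field GF(257) rather than the GF(2^16) arithmetic PAR2 uses, so parity symbols may equal 256 and are kept as plain integers; it is not compatible with PAR2 files. It also only handles erasures (symbols known to be missing); in practice a per-block checksum is what tells you which blocks are damaged.

```python
P = 257  # prime modulus: every byte value 0..255 is a field element


def _interpolate(points, x):
    """Evaluate, at x, the unique polynomial passing through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is den's modular inverse (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data: bytes, n_parity: int) -> list:
    """Systematic encoding: data byte i is the polynomial's value at x = i;
    parity symbols are its values at x = k .. k + n_parity - 1."""
    k = len(data)
    if k + n_parity > P:
        raise ValueError("codeword longer than the field allows")
    points = list(enumerate(data))
    return [_interpolate(points, x) for x in range(k, k + n_parity)]


def recover(shares: dict, k: int) -> bytes:
    """Rebuild the k data bytes from any k surviving symbols {x: value}."""
    if len(shares) < k:
        raise ValueError("too many symbols lost to recover")
    points = list(shares.items())[:k]
    return bytes(_interpolate(points, x) for x in range(k))
```

Each extra parity symbol lets you lose one more symbol, which is why PAR2 lets you choose how much recovery data to create.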
ext4 (Score:2)
ZFS and lots of redundancy (Score:3)
ZFS will guard against bit rot. That's not enough. RAID isn't enough. You need redundancy outside your home or office. Cloud may be expensive for the amount of data you have, but Amazon S3 may be the most affordable in that range. You could get S3 for maybe $15-20 a month if you have a terabyte of data. If that's cost prohibitive, rotate external drives regularly and keep one at work. You'll lose very little data since you're archiving things.
AWS S3 pricing is $0.023/GB or $23/TB/month.
But for infrequently accessed data, AWS Glacier offers the same durability as S3 for only $0.004/GB or $4/TB/month. There's an infrequent-access tier in between those two for $12.50/TB/month.
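The arithmetic above is easy to turn into a quick estimator. This Python sketch hard-codes the early-2017 per-GB prices quoted in the thread (check current pricing before relying on them), treats 1 TB as 1000 GB, and ignores request, retrieval and transfer fees as well as the volume discounts above 50 TB.

```python
# Per-GB-month storage prices quoted above (early 2017; check current pricing).
PRICE_PER_GB_MONTH = {
    "S3 Standard": 0.023,
    "S3 Infrequent Access": 0.0125,
    "Glacier": 0.004,
}


def monthly_cost(terabytes: float, tier: str) -> float:
    """Storage-only monthly cost in USD, assuming 1 TB = 1000 GB.

    Ignores request, retrieval and transfer fees and the volume discounts
    that kick in above 50 TB.
    """
    return round(terabytes * 1000 * PRICE_PER_GB_MONTH[tier], 2)
```

With these numbers, 30 TB comes to $120 a month on Glacier versus $690 on S3 Standard.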
Volume discounts kick in above 50TB.
Any Linux FS (Score:3)
I'd go for any Linux file system because Linux is the platform that evolves the least. It's still in the 90s so in 2037 it will still be current.
(Watch out for the hater storm! Here they come!)
But it's kinda true if you omit the snideness of the first statement. Because Linux is maintained by its user base, it's less likely to "devolve" into something incompatible due to market pressure. I, myself, would go for an Apple file system, but Apple isn't so keen on keeping the Mac current, and that doesn't bode well for the future. There might be a big change on the horizon.
"some photos are ill-colored" (Score:2)
That's a well-known problem to photographers: photo colors shift over time. Keep the photo negatives in a safe place!
That struck me as odd too. If the colours in digital photos or movies don't look right, I would try to display them with different software. It's more likely that the display software is reading and interpreting the file format differently than that bit rot would affect only the colour palette without making the whole file unreadable.
Backblaze: SMART metrics of imminent failure (Score:2)
Backblaze published a report on which SMART stats they see indicating imminent drive failure: https://www.backblaze.com/blog... [backblaze.com]
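If you already pull SMART data (for example with smartmontools), flagging the attributes that report says matter takes very little code. In this Python sketch the input is a hypothetical `{attribute_id: raw_value}` dict you would fill from your own tooling, and the list of watched attributes (5, 187, 188, 197 and 198) is assumed from Backblaze's published analysis, so check it against the linked post.

```python
# SMART attributes assumed (from Backblaze's published analysis) to be
# useful failure predictors; verify against the linked post.
WATCHED = {
    5: "Reallocated Sectors Count",
    187: "Reported Uncorrectable Errors",
    188: "Command Timeout",
    197: "Current Pending Sector Count",
    198: "Uncorrectable Sector Count",
}


def warning_signs(raw_values: dict) -> dict:
    """Given a hypothetical {attribute_id: raw_value} dict, return the watched
    attributes whose raw value is above zero, keyed by name."""
    return {
        WATCHED[attr]: value
        for attr, value in sorted(raw_values.items())
        if attr in WATCHED and value > 0
    }
```

A non-empty result is a cue to copy the data off that drive sooner rather than later, not proof that it is about to fail.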
Lots of parity (Score:2)
No media is perfect. There are just varying error rates over time, depending on the quality of the media. Without knowing ahead of time whether a specific piece of media is going to fail, the question needs to change from "How do I keep it from getting corrupted?" to "How do I mitigate eventual corruption?"
And the question basically boils down to one answer: redundancy.
Off the top of my head, I can think of three things you can do, and these are not mutually exclusive.
1. Multiple copies of dat
ZFS (Score:2)
"Is there a software solution, like a file system or a file format, specifically tailored to avoid this kind of bit rot?"
Yes, ZFS is specifically tailored for this. Configure a zpool running RAID-Z2 with a hot spare or RAID-Z3. Half a dozen 6TB or 8TB disks should suffice.
Set it to auto-scrub regularly. Send logs and warnings to your email, and pay attention to them. (This is the hard part.) Especially pay attention if they stop arriving. (This is even harder.)
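The "pay attention if they stop arriving" advice amounts to a dead-man's switch: alert on bad news, and also alert when there has been no news for too long. Here is a minimal Python sketch, assuming you record when the last report arrived and what it said. The 8-day threshold is a hypothetical choice for weekly scrubs, and the healthy string is what `zpool status -x` prints when no pool has problems.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: with weekly scrubs, more than 8 days of silence is suspicious.
MAX_SILENCE = timedelta(days=8)
# What `zpool status -x` prints when no pool has a problem.
HEALTHY = "all pools are healthy"


def check_reports(last_report_at: datetime, last_status: str, now: datetime):
    """Return an alert message, or None if everything looks fine."""
    silence = now - last_report_at
    if silence > MAX_SILENCE:
        # No news is bad news: the monitoring itself may have stopped working.
        return f"no report for {silence.days} days; check the monitoring itself"
    if last_status.strip() != HEALTHY:
        return "pool problem reported:\n" + last_status
    return None
```

The important design choice is that this check should run somewhere other than the NAS it watches, or a dead box will never complain about itself.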
I have used Nexenta for some time, but the free
Online (Score:2)
'Forever' is a long time.
'Offline' is difficult to deal with long-term (I am thinking decades to centuries), such is the nature of technology and our lack of any real history of digital data management.
Personally I would say the best bet is keeping your data 'live' online to some extent, it is the only real way to monitor and control the inevitable decay.
Basically your data's lifespan is related to how long you can convince someone to care for it for you.
Different objectives mean different solutions (Score:2)
Pick your poison:
- Tape: inexpensive and slow, requires frequent testing (backups we do; it's restoration that's the problem!), usually unreadable after 6 to 12 months or less (that's in production, people).
- WORM: more expensive than tape and just as slow, works well in the medium term (meaning 10 years tops).
- XFS NAS: faster than the above, requires good hardware and a bit more work than either tape or WORM. Don't forget to set up replication to multiple systems. May suffer from bitrot in the long term (checksumming/h
this doesn't make any sense (Score:2)
Use permanent storage. (Score:2)
HDDs will die. If you want something that will last for many decades or even centuries without getting corrupted, then you need to stop using a volatile filesystem and go with write-once media. The best option I know of is M-DISC.
M-DISC's design is intended to provide greater archival media longevity.[3][4] Millenniata claims that properly stored M-DISC DVD recordings will last 1000 years.[5] While the exact properties of M-DISC are a trade secret,[6] the patents protecting the M-DISC technology assert that the data layer is a "glassy carbon" and that the material is substantially inert to oxidation and has a melting point between 200 and 1000 C.[7][8] -- Wikipedia
Did you even bother reading the wiki you linked to, or did you just copy and paste the first paragraph?
"However, according to the French National Laboratory of Metrology and Testing at 90 C and 85% humidity the DVD+R with inorganic recording layer such as M-DISC show no longer lifetimes than conventional DVD±R.[11]"
Two distinctly different problems (Score:1)
It may have nothing to do with bits. It's possible the problem is a media player and/or driver compatibility issue or bug. I've seen where one media player/displayer can display an image or video fine, but another gags on it or distorts it. Probably a bug in the encoder and/or decoder.
As far as backups go, make at least 2 copies. Bit-error-recovery schemes will usually require enough extra storage space that it's probably less hassle and more "insurance" to keep 2 regular copies rather than one copy with some fan
Snapraid (Score:2)
ZFS is nice, and I use it, but it makes assumptions about sane gear that are not safe on desktop-grade hardware. BTRFS, which I also use, works great. But for your specific use case, SnapRAID is the thing to use. By that use case I mean things that never change: a big pile of files you keep adding to. Mind you, you're going to have to replace drives over time.
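SnapRAID-style parity suits exactly that "big pile of files you keep adding to": parity is computed across the data disks on demand (`snapraid sync`) rather than on every write. The Python sketch below shows only the single-parity case as a byte-wise XOR, the same idea as RAID 5. It is not SnapRAID's code, its additional parity levels use different math, and equal-length "disks" are assumed for simplicity.

```python
def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equal-length byte strings."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, byte in enumerate(blk):
            out[i] ^= byte
    return bytes(out)


def compute_parity(disks: list) -> bytes:
    """Single parity over the data 'disks' (equal length assumed; real tools pad)."""
    return xor_blocks(disks)


def rebuild(disks: list, parity: bytes, lost: int) -> bytes:
    """Recover disk number `lost`: XOR the parity with every surviving disk."""
    survivors = [d for i, d in enumerate(disks) if i != lost]
    return xor_blocks(survivors + [parity])
```

One parity disk survives exactly one failed data disk; losing two at once needs a second, independent parity.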
What you might need (Score:2)
An archival optical format. M-DISC DVDs and Blu-ray discs are theoretically able to retain data for 1000 years. And DVD already uses some error-correcting codes, Reed-Solomon I believe.
An SSD is a bad choice for archival, in some cases MLC Flash can decay and accumulate errors in 3 months while unpowered [extremetech.com].
For a file system that is likely to be understood in the distant future, ISO 9660 with no file larger than 2 GiB should do the trick.
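Checking a collection against that 2 GiB ceiling before burning is straightforward. Here is a small Python sketch; the function name is hypothetical, and any files it reports would need splitting (for example with `split`) before going onto the disc.

```python
import os

LIMIT = 2 * 1024 ** 3  # the 2 GiB ceiling suggested above


def oversized(root: str, limit: int = LIMIT) -> list:
    """List files under root (relative paths) larger than `limit` bytes."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.getsize(path) > limit:
                found.append(os.path.relpath(path, root))
    return sorted(found)
```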
Packing your data into a custom archive file format that has more sophisticated
Here's how I'd do it. (Score:2)
1. Add lots of redundancy in the form of PAR2 files.
2. Store the whole lot as a tar format, dumped to the drive as a block device. This format is so simple that a future programmer will have no trouble reverse-engineering it, even if all documentation has somehow been lost, and there are no key structures which will render the whole thing impossible to read if lost. Just to be sure, the first thing going on there is a copy of the tar format specification.
3. Include also a copy of the par2 software for sever
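As a rough demonstration of why tar is so easy to reverse-engineer, the Python sketch below writes an uncompressed ustar archive with the standard `tarfile` module, then reads it back with a hand-rolled parser that only knows the header layout: 512-byte blocks, the name in bytes 0 to 99, the size as octal text at bytes 124 to 135, and a header checksum at bytes 148 to 155. It handles only plain files with short names, which is enough to make the point.

```python
import io
import tarfile

BLOCK = 512  # tar stores everything in 512-byte blocks


def make_archive(files: dict) -> bytes:
    """Write {name: bytes} into an uncompressed ustar archive, in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def _octal(field: bytes) -> int:
    """Header numbers are ASCII octal, terminated by NUL and/or space."""
    return int(field.split(b"\0", 1)[0].strip() or b"0", 8)


def read_archive(raw: bytes) -> dict:
    """Hand-rolled reader that walks the headers without using tarfile."""
    out, pos = {}, 0
    while pos + BLOCK <= len(raw):
        header = raw[pos:pos + BLOCK]
        if header == bytes(BLOCK):  # an all-zero block marks the end
            break
        # Checksum: sum of all header bytes, counting the checksum field as spaces.
        actual = sum(header[:148]) + 8 * ord(" ") + sum(header[156:])
        if _octal(header[148:156]) != actual:
            raise ValueError(f"corrupt header at offset {pos}")
        name = header[0:100].split(b"\0", 1)[0].decode()
        size = _octal(header[124:136])
        pos += BLOCK
        out[name] = raw[pos:pos + size]
        pos += -(-size // BLOCK) * BLOCK  # file data is padded to whole blocks
    return out
```

Because every file starts on a 512-byte boundary with a self-checking header, damage to one header or file body does not stop a reader from finding the next one.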
RAID if you must, but cloud is better (Score:2)
Just RAID it (preferably mirroring) and store multiple redundant copies, physically separated. Either use a checksumming filesystem (e.g. ZFS) or make your own checksums so you can recognize bitrot.
But you'll never know when things have degraded beyond recovery.
Unless you're prepared to regularly validate that the data is still readable, you'd be better off storing the data with any major cloud vendor and letting *them* verify integrity over time. Or better, mirror the data across multiple cloud providers.
My most im
Filesystems and hardware that mitigate bit rot (Score:1)
If I understand you correctly, you are asking what filesystems can error-correct in the face of physical bit rot.
I don't know of any commonly-used "disk-type" (local, not specifically designed for archival/offline media) file systems that have checksumming or RAID-style redundant data within the filesystem itself. Some distributed/clustered file systems have features like this, but they aren't well suited for offline storage in the way that you are thinking about (or, when used for offline storage, the red
Are you sure it's bit-rot? (Score:1)
It may be that the codec you are using now isn't bug-for-bug compatible with the codec that was used to store the file.
It's also possible that the file was saved in a "not quite industry standard" format that would look fine on vintage hardware running a vintage OS with vintage device drivers and vintage software, while today's hardware and software interpret such files in a way that exposes their flaws.
Got a Pentium II computer and a copy of Windows 98 in the bas
Reed-Solomon Erasure Coding (Score:2)
How about getting rid of it? (Score:2)
You've got terabytes of information you will never access again. How about just getting rid of most of it? Pick some subset you want to keep, then buy 3 HDDs and make three copies of it. Repeat this every year and you'll probably not lose any of the information.
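With three copies, a damaged byte can often be outvoted by the other two. This Python sketch does a byte-wise 2-of-3 majority vote; it assumes the copies have the same length and that no position is damaged in two copies at once. Per-file checksums, as suggested elsewhere in the thread, remain the better way to tell which copy is good; voting is a fallback.

```python
def majority_repair(a: bytes, b: bytes, c: bytes) -> bytes:
    """Byte-wise 2-of-3 vote across three equal-length copies.

    Fixes any position where at most one copy is damaged; raises if all
    three disagree somewhere.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("copies differ in length")
    out = bytearray()
    for i, (x, y, z) in enumerate(zip(a, b, c)):
        if x == y or x == z:
            out.append(x)
        elif y == z:
            out.append(y)
        else:
            raise ValueError(f"byte {i}: all three copies disagree")
    return bytes(out)
```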