Developer Shares A Recoverable Container Format That's File System Agnostic (github.com) 133
Long-time Slashdot reader MarcoPon writes: I created a thing: SeqBox. It's an archive/container format (and a corresponding suite of tools) with some interesting and unique features. Basically, an SBX file is composed of a series of sector-sized blocks, each carrying a small header with a recognizable signature, an integrity check, info about the file it belongs to, and a sequence number. The result of this encoding is the ability to recover an SBX container even if the file system is corrupted, completely lost, or simply unknown, no matter how fragmented the file is.
Nifty (Score:2)
Thanks, looks interesting. I can see some applications for use in long term storage... it's better to get some data back rather than lose it all.
why? (Score:1)
That's an interesting property, but what's the use case?
How often does your filesystem get corrupted and, instead of restoring from backups, you curse the fragmented tar file that can't be reassembled?
How practical is it to keep files in an sbx container rather than extracting them? Can apps read files inside an sbx container?
Re:why? (Score:5, Interesting)
That's an interesting property, but what's the use case?
I can't say I know them all, or even the best/killer ones, but I listed some on the readme. Probably the most immediate/interesting application would be on a digital camera, for photos/video.
Can apps read files inside an sbx container?
Yes. The blocks are of a fixed size, so the format is seekable and reading from it is far simpler than, say, reading from a ZIP file.
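For instance, reading an arbitrary byte range boils down to simple offset arithmetic. A rough sketch in Python, assuming 512-byte blocks, a 16-byte per-block header, and a metadata block 0 (the exact layout may differ):

    BLOCK_SIZE = 512        # assumed; the thread also mentions a 4K-block variant
    HEADER_SIZE = 16        # per-block header size mentioned later in the thread
    PAYLOAD = BLOCK_SIZE - HEADER_SIZE   # 496 usable bytes per block

    def read_range(sbx, offset, length):
        # Read `length` bytes of the stored file starting at `offset`, from an
        # open SBX container `sbx`. Assumes block 0 holds metadata and data
        # blocks start at block 1.
        out = bytearray()
        while length > 0:
            blk = 1 + offset // PAYLOAD            # data block holding this byte
            inblk = offset % PAYLOAD               # position inside that block's payload
            take = min(length, PAYLOAD - inblk)
            sbx.seek(blk * BLOCK_SIZE + HEADER_SIZE + inblk)
            out += sbx.read(take)
            offset += take
            length -= take
        return bytes(out)

Because blocks never vary in size, there's no central index to consult: the position of any piece of the file is a pure function of its offset.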
Re:why? (Score:4, Informative)
So the only failure mode this protects from is corruption of metadata while every data block remains intact. On any sane filesystem, that sounds useless: the only cases this might happen are filesystems that can't handle unclean shutdown (FAT, ext2) or the disk lies about barriers. And those cameras that still use FAT have software you can't update, so you can't install that SBX thingy -- if you could, you'd be better off switching to a better filesystem.
In its present state, I'd suggest you scrap the whole project, it's a waste of time.
On the other hand, it would be an entirely different story if you added some form of erasure code that operates on amounts of data bigger than a single sector (most storage devices already have per-sector erasure codes).
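To illustrate the kind of thing meant here, the simplest possible erasure code is a single XOR parity block per stripe of data blocks, which can rebuild any one lost block in that stripe (a real design would use something stronger, such as Reed-Solomon). A sketch:

    def make_parity(stripe):
        # stripe: list of equal-sized data blocks (bytes). Returns one parity block.
        parity = bytearray(len(stripe[0]))
        for block in stripe:
            for i, b in enumerate(block):
                parity[i] ^= b
        return bytes(parity)

    def rebuild(stripe_with_gap, parity, missing_index):
        # Recover the single missing block (given as None at missing_index).
        recovered = bytearray(parity)
        for idx, block in enumerate(stripe_with_gap):
            if idx == missing_index:
                continue
            for i, b in enumerate(block):
                recovered[i] ^= b
        return bytes(recovered)

One parity block per N data blocks costs 1/N extra space and survives the loss of any single block in the stripe; more redundancy buys tolerance for more lost blocks.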
Re: (Score:3)
Indeed. Give it built in redundancy so that the data could be recovered reliably after almost any not-completely-terminal disk failure, and *then* you'd have something I'd be extremely interested in. Can't tell you how much archived data I've lost over the years due to "bit rot"
Yeah, I should have had it archived in three different locations, but who actually does that for personal data?
Re: (Score:3)
Yeah, I should have had it archived in three different locations, but who actually does that for personal data?
From what I've seen, a typical intelligent person learns about the importance of backups after around 30 data loss events.
Re: (Score:2)
Is much, much, much more likely to nuke some data than it is to nuke the filesystem's metadata.
Re: (Score:2)
https://www.usenix.org/legacy/... [usenix.org]
Re: (Score:2)
Re:why? (Score:4, Informative)
PAR (or a RAR archive + recovery records, etc.) try to address the problem of losing some small parts of a file (due for example to physical errors), using some amount of redundancy. SeqBox try to address the issue of identifying and reassembling all part of a file, when they are all still on the physical media, but without the file systems indexes / structures to locate them (es. after a quick format, zero writes on the first sectors, etc.).
If you combine the two, creating an SBX container of a RAR + recovery records for example, you get both qualities.
Re: (Score:3)
A more interesting feature might be a firmware update to a spinning disk that cuts the drive capacity exactly in half. A hard drive is probably going to have twice as many read heads as platters. Just store a copy of the data on a different head/platter surface (ideally, if possible, on a different platter), and let the firmware deal with the details. Half the capacity, half the data rate, but some degree of added redundancy.
You're looking for btrfs -dDUP then. It does what you describe, and unlike any other filesystem except ZFS, the data is checksummed so it can be recovered even in case of a silent corruption (which happens way more often than people notice).
Obviously you want the same for metadata (-mDUP), which happens to be the default for rotational media. This pretty much renders the format described in the article pointless -- metadata corruption due to hardware failures that don't kill the entire disk is pretty much a non-issue.
Not to seem like a philistine... (Score:2)
...but this is better than a backup, how, exactly?
Re: (Score:3)
Re: (Score:2, Interesting)
Flash storage uses wear leveling, which can fail. If that happens, the flash chip can mostly still be read, but all the erase blocks on the chip will be in a "random" order relative to their lost logical (wear-leveled) positions. Then you want a way to recover the logical order of the blocks, which this format allows you to do.
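A rough sketch of what that recovery could look like in Python: scan a raw dump block by block, keep anything carrying the signature, group by file UID, and sort by sequence number. The header offsets below are illustrative, not the actual on-disk format:

    import struct
    from collections import defaultdict

    BLOCK_SIZE = 512
    MAGIC = b"SBx"    # hypothetical per-block signature

    def recover_files(raw_dump):
        # Scan a raw flash dump block by block, group candidate blocks by file
        # UID, and concatenate each group in sequence-number order.
        files = defaultdict(dict)
        for off in range(0, len(raw_dump) - BLOCK_SIZE + 1, BLOCK_SIZE):
            blk = raw_dump[off:off + BLOCK_SIZE]
            if not blk.startswith(MAGIC):
                continue
            uid = blk[6:12]                             # assumed: 6-byte UID at offset 6
            (seq,) = struct.unpack_from(">I", blk, 12)  # assumed: 4-byte sequence number
            files[uid][seq] = blk[16:]                  # payload after the 16-byte header
        return {uid: b"".join(blocks[s] for s in sorted(blocks))
                for uid, blocks in files.items()}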
Re: (Score:3)
"A camera that writes to an SD card using a journaling filesystem?"
(To get the right effect, read aloud in the tone of voice one might use for saying, "A planet where apes evolved from men?")
Re: (Score:3)
Re:Not to seem like a philistine... (Score:4, Informative)
Re: (Score:2)
Re: (Score:2)
How are you going to recover an SBX file containing a JPG file if the batteries give up when writing?
If you're saying the batteries give up after writing the data but before updating the filesystem's metadata, then any recovery program that supports that filesystem will be able to recover the data. And really, this is a problem with older shit like FAT and NTFS and ext2 (for which there are plenty of tools available).
Re: (Score:2)
That's a silly strawman, it really is. I don't hear anyone suggesting you can magically recover a file that hasn't even been written yet.
Fail? (Score:2)
What if your file system and/or hardware uses a different sector size? Didn't sector sizes change over the last few decades?
Re:Fail? (Score:5, Informative)
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
So recovery would require reading all disks to reconstruct one object.
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Amiga OFS has 488 _user_ bytes per block. The rest is the block header, which can be used to, I don't know, recover blocks even when part of the disk is lost, for example. The actual block size was still 512 bytes like everybody else uses, because that's something the hardware generally supports.
https://en.wikipedia.org/wiki/... [wikipedia.org]
Re: (Score:2)
Oh sorry, I thought he meant the other way around... That his format is a container around existing formats.
What about files stored in MFT? (Score:2)
Re: (Score:3)
Re: (Score:3)
Next question - for the encoding of the file, you're putting a 16-byte header in front of every blocksize-piece of data, correct? If that's the case, and if you're storing the entire block of original data after that pre-pended header, then how are you assuring that the spill-over piece of data will be on a contiguous block on the disk? For example, say you're encoding a single 4096 byte file using a 4K blocksize. The SBX-eq
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:3)
About the separate file with hashes: the main issue is that if the file system is in an inconsistent/damaged state, that file too would be inaccessible. So it would need to be kept somewhere else, and that would complicate things a lot.
Re: (Score:1)
Re: (Score:2)
As for the SBX of hashes not being locatable due to metadata corruption, you can avoid that by applying a header to the SBX blocks themselves.
OK, I see what you mean.
Re: (Score:2)
Re: (Score:2)
From what I understand his idea isn't to be hardware failure-proof, it's to be file system failure-proof.
Re: (Score:2)
but why? (Score:2)
I mean, the chances of the filesystem being corrupted without the file itself also being corrupted seem slim to none to me.
This would be great for SSDs (Score:5, Informative)
Unlike HDD controllers, SSD controllers do wear-leveling, so there is no guarantee that your data will be written as a contiguous block of memory (regardless of what the filesystem says), only that it will be in 4096-byte blocks. Recovering deleted data from an SSD is no simple task, because it means you need to know or guess the controller's wear-leveling behavior in order to go back and find the order of previously written data. With this you would be able to just read the raw memory, even after the controller has been reset, and still recover the data. I think it would be a nice option to have a filesystem able to encode user files in something like this highly recoverable format. The only real problem is that the file has to be completely rewritten even if you only modify part of it, in order to differentiate the new version from the old version.
Re: (Score:2)
Re: (Score:2)
The memory used in SSDs is all 4096-byte blocks of NAND flash. 512 bytes is the sector size for HDDs... though they may have changed that in recent years.
Re: (Score:2)
Re: (Score:2)
Incorrect. Modern SSDs use large-page NAND, which has anywhere from 128 KiB to 1 MiB block sizes.
In NAND, you can only erase on block boundaries. However, when you write, you can write on page boundaries, of which there can be anywhere from 16 to 128 pages per block. Small-page NAND (old NAND) had 512-byte pages (and typically 32 pages per block, giving 16 KiB blocks).
Re: (Score:2)
I only meant that "sectors" were 4096 bytes but thanks for the additional info. I suppose it only makes sense that they also use larger blocks in order to achieve higher read/write throughput rates.
Re: (Score:2)
Usually when an SSD fails, you get some stupid small device. Intel SSDs give you an 8MB hard drive named "BAD CONTEXT" which can't be read from or written to, and JMicron drives give you a 4GB drive named "JM-Loader 001".
When this happens, you don't get to see your actual disk sectors at all. Without access to the actual contents of your drive, having a container format that lets you recover data won't help.
Re: (Score:2)
I've not had that experience; however, it seems like a good reason to have open source SSD controller firmware, so that you could force the drive to let you access it.
forward error correction? (Score:2)
It seems to me this would be a lot more useful if it directly incorporated forward error correction.
Re: (Score:2)
Re: (Score:2)
Since loss and recovery takes place at the block level, it's best if you arrange for error recovery (and compression) to take into account block boundaries.
What it does and why it's (partially) useful (Score:5, Insightful)
There is some confusion as to what this is actually doing.
Most filesystems use special structures to store the name and location of your files on the drive: directories, cluster bitmaps, etc. The reason it's difficult at best to recover files from a hard drive when parts of the filesystem have been damaged is that it's hard to identify where on the drive the files are. Outside of those special filesystem structures, nothing else records what is stored where. If you lose the directory, it's hard to tell one file's data from another's on your hard drive.
That is where SBX comes in. What it does is make sure that every physical sector that stores data for a particular file is labelled with a number that identifies that file, plus a sequence number so you can reconstruct where that piece belongs in the original file. Really, for the amount of overhead, something like that should be embedded into every filesystem. It's basically a distributed backup of all the filesystem metadata.
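Concretely, each labelled block could look something like this. The sketch below just pieces together the figures mentioned in this discussion (512-byte blocks, a 16-byte header, a signature, an integrity check, a 48-bit file UID, and a 32-bit sequence number); the real on-disk layout may differ:

    import binascii, struct

    BLOCK_SIZE = 512

    def make_block(uid, seq, payload):
        # One illustrative SBX-style block: signature, version byte, CRC-16,
        # 48-bit file UID, 32-bit sequence number, then the payload padded
        # out to the full sector size.
        assert len(uid) == 6 and len(payload) <= BLOCK_SIZE - 16
        body = uid + struct.pack(">I", seq) + payload.ljust(BLOCK_SIZE - 16, b"\x00")
        crc = binascii.crc_hqx(body, 0)   # CRC-16/CCITT as a stand-in integrity check
        return b"SBx" + bytes([1]) + struct.pack(">H", crc) + body

Since every sector carries its own ownership and ordering information, recovery never depends on the filesystem's directories or allocation maps being intact.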
Some people are criticizing this as solving non-problems. I disagree. While it isn't the solution to global warming, it is both simple and clever (and will thus suffer from a lot of people who disparage it out of a "well, anyone could have thought of that" attitude). It won't save you from a full hardware crash. It won't save you from physically bad sectors in that file. What it will save you from is accidental deletion and loss of the filesystem's metadata structures. How often does this happen? Twice to me, from failures of a whole-disk-encryption driver.
I wouldn't use this for every file, but for critical ones, sure. Why not. The problem is that the place it would be most useful, for very volatile files that change a lot between backups (databases, etc.), is exactly where it can't really be used until/unless applications start supporting it. So it unfortunately has limited use in the places where it would really help the most. Like I said above, this sort of thing really needs to get rolled into a filesystem. The amount of overhead it costs is meaningless in today's storage environment.
Re: (Score:1)
ReiserFS already did this. You could break the filesystem and it could (in most cases) reconstruct the whole thing by scanning the disk for data.
An amusing bug: if you ran the filesystem reconstruction on a filesystem that contained other ReiserFS filesystem images, it would turn into a mess. They fixed this by adding a mechanism to escape data that looked like the magic metadata identifiers.
Deleted data was also resurrected. Sometimes this was useful, sometimes not so much.
Re: (Score:2)
Re: (Score:3)
Re: (Score:3)
It would seem like it, but: a) this doesn't need to be applied at the filesystem level, and b) it isn't encumbered by licensing issues, a dead project, or an experimental filesystem, respectively.
Okay, so it is actually experimental, but by not being filesystem-wide it is also much simpler and better able to contain failures.
Re: (Score:1)
it isn't encumbered by licensing issues, a dead project, or an experimental filesystem, in respective order.
The licensing issue can be disputed. Licensing issues are always a matter of use cases. In this case the license specifically says that it is provided without liability or warranty. While the author might be willing to provide those under a different licensing deal, the same can be said for any other filesystem.
Primarily, I would dispute your claim that this isn't an experimental filesystem. It claims not to be a filesystem, but it fits every checkbox for one, except perhaps having to delegate block allocation
Re: (Score:2)
Re: (Score:2)
Use-case of distributed pieces? (Score:1)
Could this also be used when the file contents are deliberately separated? E.g., distribute the file pieces (sectors?) to different audiences/storage locations, such that one has to get cooperation from all the piece-holders to retrieve the end result. E.g.: nuclear launch codes, and other less dramatic scenarios.
Re: (Score:2)
Apple Lisa/early-Mac "tags" on steroids (Score:1)
The Lisa and early Macintosh drives supported 532-byte sectors. The extra 20 bytes per sector were used for "tags" - basically a less sophisticated version of this scheme, and without the "block 0."
For details on why "tags" were eliminated, see Macintosh Technote #94, "Tags," by Bryan Stearns, November 15, 1986.
Re: (Score:2)
Re: (Score:1)
Yes, it was a fairly common thing on older systems, from the days when mass storage hardware was far from precise and reliable, to do things like checking that a drive seek really landed the head on the requested track.
"Modern" (probably mid-1980s and newer) hardware had firmware that would do that for you, and for hard disks at least, the firmware started keeping its own meta-data of a sort so that as far as the computer was concerned, the error rate was acceptably low unless there was an actual bad spot on the drive or some other "hard" failure.
Forget Agnostic! (Score:1)
I want storage that is File System Atheist!
(And would that be like Write Only Memory?)
Re: (Score:2)
Beware padding oracle with compression & encryption (Score:3)
Compression before encryption often results in a padding oracle or other problems. If you're designing a system that is supposed to be secure, avoid compression until you fully understand the issues. Avoid compressing and encrypting chosen plaintext at all - you'll never be sure you understand all of the issues with that.
Re: (Score:2)
Been struggling to understand this for the last 3 decades (not full time) ... :/
Re: (Score:2)
Re: (Score:2)
My comment doesn't directly relate to having the *two* compression steps. I probably should have replied to the same person you replied to.
Re: (Score:3)
There's no way it can. LUKS is great but it wastes tons of disk space on VMs.
It can! Just turn on discard (and have the system inside issue trim commands). This does have an impact on encryption, though, which might or might not be acceptable for you: it is possible to tell used from unused disk space, which leaks information about usage patterns inside the VM.
Re: If you can compact encrypted images... (Score:5, Insightful)
If you can meaningfully compact *anything* that's encrypted, the encryption was improperly implemented. You *always* want to compact files prior to encryption, and a well-encrypted compressed file should be statistically indistinguishable from random noise.
Re: (Score:3)
Hash the block number with the password (Score:2)
Hash the password AND block number through a key-stretching routine to get the encryption key. It is important to avoid using the same key for all blocks. If different blocks are XORed with the same key, I can still see your penguin:
https://blog.filippo.io/the-ec... [filippo.io]
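A minimal sketch of that idea, using PBKDF2 as the key-stretching step and folding the block number into the salt (names and parameters are illustrative, not a vetted design):

    import hashlib, struct

    def block_key(password, salt, block_number):
        # Derive a distinct 256-bit key for each block by folding the block
        # number into the PBKDF2 salt, so no two blocks share a keystream.
        per_block_salt = salt + struct.pack(">Q", block_number)
        return hashlib.pbkdf2_hmac("sha256", password, per_block_salt, 200_000)

In practice you'd run the expensive stretching once to get a master key and then derive cheap per-block subkeys from it (an HMAC over the block number, say), but the point stands: no two blocks should ever see the same keystream.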
Re: (Score:2)
Also, that name, Seqbox (Score:2)
I wrote three pieces of software:
Strongbox
Throttlebox
Clonebox
Then you chose Seqbox. :)
Re: (Score:2)
Re: (Score:3)
You obviously tried to keep the per-block header small to minimize overhead. But that has led to some questionable decisions that may make this format less useful than it could be.
It's surely a compromise, but I think it's a pretty sensible one for the present version (though some variations can surely be implemented as different versions, to better suit different scenarios).
Firstly, at 48 bits, the UID is a bit short. If UIDs are chosen randomly and with even distribution, there's a 1 in 1000 chance of a duplicate UID with just 750,000 files.
That seems a bit off: 48 bits, assuming even distribution, would give 281,474,976,710,656 possible UIDs. But again, 750,000 files seems an enormous number for the practical uses I was thinking about at the moment.
Secondly, the block sequence number is a 32bit value, so 4 billion blocks in a file max. With this format, files are limited to 2TB.
Yes, 2TB with 512-byte blocks, or 16TB with 4K blocks. It's not good for everything, but it's probably good for a lot of cases.
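As an aside on the UID math: both numbers are right, they just answer different questions. 2^48 is the size of the UID space, while the 1-in-1000 figure is the birthday-collision probability over that space. A quick sketch of the latter, assuming uniformly random, independent UIDs:

    import math

    def uid_collision_probability(n, bits=48):
        # Birthday approximation: chance that at least two of n random
        # `bits`-bit UIDs collide.
        return 1.0 - math.exp(-n * (n - 1) / (2.0 * 2 ** bits))

    print(uid_collision_probability(750_000))   # ~0.001, i.e. about 1 in 1000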
Re: A couple of problems (Score:2)
Thousands of images are often used in medical imaging for a single scan. I have a production filesystem with 2B images, over 200TB. 16TB file sizes aren't all that hard to come by either. Obviously there is also the birthday problem; ZFS alleviates it by using bit comparisons in combination with 128-bit checksums.
I'm baffled that 48-bit checksums are still considered good enough nowadays.
Re: (Score:2)