
A Short History of Btrfs

diegocgteleline.es writes "Valerie Aurora, a Linux file system developer and ex-ZFS designer, has posted an article with great insight on how Btrfs, the file system that will replace Ext4, was created and how it works. Quoting: 'When it comes to file systems, it's hard to tell truth from rumor from vile slander: the code is so complex, the personalities are so exaggerated, and the users are so angry when they lose their data. You can't even settle things with a battle of the benchmarks: file system workloads vary so wildly that you can make a plausible argument for why any benchmark is either totally irrelevant or crucially important. ... we'll take a behind-the-scenes look at the design and development of Btrfs on many levels — technical, political, personal — and trace it from its origins at a workshop to its current position as Linus's root file system.'"
  • Looks promising (Score:5, Informative)

    by PhunkySchtuff ( 208108 ) <kai@automatica.c[ ]au ['om.' in gap]> on Saturday August 01, 2009 @04:30AM (#28907305) Homepage

    This looks like a promising filesystem - as ZFS on linux is, at present, doomed to die an ugly death, btrfs looks to address a lot of the shortcomings of other filesystems and bring a clean, modern fs to linux. It goes beyond ZFS in some areas too, such as being able to efficiently shrink a filesystem, and keeps a lot of the cool things that ZFS made popular, such as Copy-On-Write.

    It looks like Btrfs also revisits some decisions about the direction ZFS took, or about how it handles certain problems, that the developers, with hindsight, would possibly have made differently.

    Apple are really struggling with ZFS, with it being announced as a feature in early betas of both Leopard (10.5) and Snow Leopard (10.6), as well as being there in a very limited form in Tiger (10.4) - maybe development on Btrfs will leapfrog ZFS for consumer-grade hardware and Apple can finally look at deprecating HFS.

    • Re: (Score:3, Informative)

      by dirtyhippie ( 259852 )

      ... but btrfs is GPL. Therefore Apple can't use it, unless perhaps they are able to work out licensing from Oracle.

      • by am 2k ( 217885 )

        Or implement it themselves. It's only a spec after all.

      • Re:Looks promising (Score:5, Informative)

        by PhunkySchtuff ( 208108 ) <kai@automatica.c[ ]au ['om.' in gap]> on Saturday August 01, 2009 @06:31AM (#28907703) Homepage

        Apple has, and does, use GPL'd code and complies with the terms of the license.

        Take, for example, WebKit, which is a fork of KHTML. It's now released as LGPL:
        http://webkit.org/coding/lgpl-license.html [webkit.org]

        This code powers the browser that Apple ship with Mac OS X, Safari - which is arguably one of the most important pieces of code in the whole OS.

        As a result of its quality, speed and standards adherence, it's now used by companies like Nokia and Adobe...

        • Re:Looks promising (Score:4, Informative)

          by TheRaven64 ( 641858 ) on Saturday August 01, 2009 @08:03AM (#28908077) Journal

          The GPL and LGPL are very different. The LGPL does not affect any code beyond that originally covered by the license. You can link LGPL'd WebKit against proprietary-licensed Safari with no problems.

          Apple also ship GPL'd software like bash, but they don't link it against any of their own code.

          Linking GPL'd code into the kernel would require the rest of the kernel to be released under a license that places no restrictions that are not found in the GPL. That's not a problem for Apple's code; they own the copyright and they can release it under any license they choose. It would be a massive problem for third-party components. DTrace, for example, is heavily integrated into OS X's Instruments developer app and is CDDL (GPL-incompatible). Various drivers are under whatever license the manufacturers want, and are mostly GPL-incompatible. A GPL'd component would need to be very compelling to make Apple rewrite DTrace, most of their drivers, and a lot of other components. Btrfs is not this compelling. Even if Btrfs were sufficiently good, it would take less effort for them to just completely rewrite it than to rewrite all of the GPL-incompatible components.

          • Damn! That sucks then...

      • Someone has little understanding of the GPL. Apple need not open source its kernel in order to use a kernel module. Things might be simpler if MacOS were GPL'd, but the use of GPL'd code within the OS is quite possible. As someone else already points out, Microsoft makes use of GPL'd code.

        • by dgatwood ( 11270 )

          Unless something changed very recently, Mac OS X does not support using filesystem kernel extensions as boot devices, and the booter has no mechanism for extending it to read new filesystems, either. Thus, if it isn't linked into the kernel and understood by EFI, you can't boot from it. That pretty much puts the kibosh on ext4 except as a shared data partition, and if you do that, you might as well use MacFUSE and never have to touch the Mac OS X kernel at all....

      • by cowbutt ( 21077 )

        Apple are also welcome to study the GPLed source of btrfs and develop their own independent, but compatible implementation.

    • Apple are really struggling with ZFS, with it being announced as a feature in early betas of both Leopard (10.5) and Snow Leopard (10.6), as well as being there in a very limited form in Tiger (10.4)

      It's also available on 10.5/6 [macosforge.org] with some limitations. It's not marketed because it's not quite feature-complete, but it works in OS X and on FreeBSD. There's almost no chance of Apple adopting btrfs in the OS X kernel though, because the GPL is incompatible with the license of a number of other components.

  • So, (Score:2, Insightful)

    by Josh04 ( 1596071 )
    Is this ever going to replace ext4? The ext series of file systems are 'good enough' for most people, so unless it has some epic benchmarks I can't imagine a huge rush to reformat. Maybe that's what drives file system programmers insane. The knowledge that for the most part, it's going nowhere. FAT12 is still in use, for Christ's sake.
    • Re:So, (Score:5, Interesting)

      by PhunkySchtuff ( 208108 ) <kai@automatica.c[ ]au ['om.' in gap]> on Saturday August 01, 2009 @04:54AM (#28907391) Homepage

      Aside from Copy on Write, one other feature that this filesystem has that I would consider essential in a modern filesystem is full checksumming. As drives get larger and larger, the chance of a random undetected error on write increases and having full checksums on every block of data that gets written to the drive means that when something is written, I know it's written. It also means that when I read something back from the disk, I know that it was the data that was put there and didn't get silently corrupted by the [sata controller | dodgy cable | cosmic rays] on the way to the disk and back.

      • by am 2k ( 217885 )

        Doesn't help against RAM issues though, because those will just get into the checksum as well.

        • Re: (Score:3, Insightful)

          by aj50 ( 789101 )

          I had this exact problem very recently.

          If my data was important, I should have been using ECC RAM.

        • Re: (Score:3, Interesting)

          by borizz ( 1023175 )
          Odds are the checksum then won't match anymore and you'll be notified. It's better than silent corruption.
          • by am 2k ( 217885 )

            No, I was talking about alterations that happen before the checksum is calculated.
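A rough sketch of the distinction being argued here, assuming a CRC32 stands in for whatever checksum the filesystem actually uses (the helper names `store` and `verify` are made up for illustration). Corruption that happens after the checksum is computed gets caught on read; corruption that happens in RAM before the checksum is computed is faithfully checksummed and passes verification.

```python
import zlib

def store(data: bytes):
    """Pretend write path: compute the checksum over whatever is in memory."""
    return data, zlib.crc32(data)

def verify(data: bytes, checksum: int) -> bool:
    """Pretend read path: recompute and compare."""
    return zlib.crc32(data) == checksum

def flip_first_bit(data: bytes) -> bytes:
    """Simulate a single-bit error in the first byte."""
    return bytes([data[0] ^ 0x01]) + data[1:]

good = b"important data"

# Case 1: the bit flips on disk or in transit, after the checksum was taken.
_, csum = store(good)
detected_after = not verify(flip_first_bit(good), csum)

# Case 2: the bit flips in RAM before the checksum is taken.
bad_in_ram = flip_first_bit(good)
block, csum2 = store(bad_in_ram)
detected_before = not verify(block, csum2)

print(detected_after, detected_before)
```

In the second case the checksum matches the already-corrupted data, which is the scenario ECC RAM is meant to cover.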

      • by BrentH ( 1154987 )
        What I'd like to know is whether btrfs does continuous checking of these checksums, preferably when there's not a lot of activity. Checksums are an excellent idea, but unless you check your files every now and again (automagically), you still don't know anything.
        • Re:So, (Score:5, Informative)

          by PhunkySchtuff ( 208108 ) <kai@automatica.c[ ]au ['om.' in gap]> on Saturday August 01, 2009 @06:57AM (#28907795) Homepage

          What you do know is that when you read a block of data back from the disk, that block is what was supposed to be written to the disk.

          If a file that is never read is corrupted somehow, then you will only discover that corruption when you read the file.

          Having checksums is very good if you have a RAID-1 mirror. With full block checksums, you can read each half of the mirror and if there is an error, you know which one is correct, and which one isn't. At present, if a RAID-1 mirror has a soft error like this, due to corruption, you don't know which half of the mirror is actually correct.

          With ZFS, for instance, you can create a 2-disk RAID-1 mirror and then use dd to write zeroes to one half of the mirror at the raw device level (i.e., bypassing the filesystem layer), and when you go to read that data back from the mirror, ZFS knows that it's invalid and instead uses the other side of the mirror. It then has an option to resilver the mirror and write the valid data back to the broken half, if you want.
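The mirror behaviour described above can be sketched as a toy model. This is only an illustration under stated assumptions, not how ZFS is implemented: the `Mirror` class, its in-memory "disks", and the use of CRC32 with the checksums held in a separate dict are all hypothetical stand-ins.

```python
import zlib

class Mirror:
    """Toy two-way mirror: every block lives on both 'disks', plus a checksum."""

    def __init__(self):
        self.disks = [{}, {}]  # block number -> bytes, one dict per disk
        self.sums = {}         # block number -> CRC32 of the intended data

    def write(self, n: int, data: bytes) -> None:
        self.sums[n] = zlib.crc32(data)
        for disk in self.disks:
            disk[n] = data

    def read(self, n: int) -> bytes:
        for disk in self.disks:
            data = disk[n]
            if zlib.crc32(data) == self.sums[n]:
                # Found a copy that matches its checksum: repair the others.
                for other in self.disks:
                    if other[n] != data:
                        other[n] = data
                return data
        raise IOError("every copy of block %d failed its checksum" % n)

m = Mirror()
m.write(0, b"hello")
m.disks[0][0] = b"\x00" * 5   # like dd'ing zeroes over one half of the mirror
recovered = m.read(0)          # falls through to the good copy
repaired = m.disks[0][0]       # bad half was rewritten from the good half
```

Without the checksum, a plain mirror seeing two different copies has no way to tell which one is right, which is the point made above.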

        • by swilver ( 617741 )

          Surface scans with SMART already catch these problems (blocks going bad).

          There's no need to check for single bit errors in files / blocks themselves as that's never gonna happen when the block was either fully error corrected by the drive or simply not returned at all due to a checksum failure (in which case it's a bad block).

          If a checksum DOES fail in such a setup (with the filesystem doing the check), it is actually more likely that it was damaged in memory (which could be fixed by simply reading it again

          • by toby ( 759 ) *

            Who was talking about single bit errors?

            SMART is irrelevant here. You need to study ZFS a lot more.

      • by swilver ( 617741 )

        If the checksumming is done by the CPU, on non-ECC memory, you might as well not use it as the data is most likely going to get corrupted at the source (your memory) not in transfer.

        The biggest source of bit errors at the moment is non-ECC memory as far as I can tell. Most busses are already protected in some form due to their high-speed nature. Hard drives themselves use many forms of error correction routinely to even read any sane data at all.

        On my own system I noticed a problem when copying large amou

        • by borizz ( 1023175 )
          At least with a checksumming FS you'll be notified of the error, even if it happens in RAM. That way you at least know something is not right.
    • Re:So, (Score:5, Insightful)

      by borizz ( 1023175 ) on Saturday August 01, 2009 @05:00AM (#28907425)
      Snapshots are nice too. Makes stuff like Time Machine and derivatives much more elegant. ZFS has built in RAID support (which, I assume, works on the block level, instead of on the disk level), maybe Btrfs will get this too.
      • Re:So, (Score:5, Informative)

        by joib ( 70841 ) on Saturday August 01, 2009 @05:41AM (#28907547)


        ZFS has built in RAID support (which, I assume, works on the block level, instead of on the disk level), maybe Btrfs will get this too.

        Yes, btrfs currently has built-in support for raid 0/1/10; 5 and 6 are under development.

        • by anilg ( 961244 )

          ZFS raid works on disk level. When you create a pool.. you have to specify the disks (either as mirrors or parity). You can offline a disk if required (due to disk error or if you need to back it up somewhere).. these make them very interesting in certain applications. Nexenta, for example, has Devzones, where the root filesystem is on its own ZFS dataset.

    • by zsau ( 266209 )

      People buy new computers often enough. For Btrfs to replace ext4 (I'm still using ext3 and didn't even realise an ext4 had been released!), I think all it will take is for major distributions to change the default file system for new installs. Obviously the number of people who replace existing file systems with different ones will be comparatively low.

    • by Macka ( 9388 )

      The author of the article thinks so. She's predicting btrfs will be the default Linux filesystem 2 years from now.

  • by MMC Monster ( 602931 ) on Saturday August 01, 2009 @05:29AM (#28907513)

    Is it Beta? The fact that Linus runs it as his root fs doesn't tell me much. Now, if you told me that's what he uses for ~/, I would be more impressed.

    The important question to me is, how long 'til it gets in the major distributions?

    • by joib ( 70841 ) on Saturday August 01, 2009 @05:39AM (#28907539)
      The important question to me is, how long 'til it gets in the major distributions?

      The article predicts a couple of years until it's safe enough as default in new distros.

      • by TheRaven64 ( 641858 ) on Saturday August 01, 2009 @06:51AM (#28907775) Journal
        Meanwhile, FreeBSD and OpenSolaris are shipping with a version of ZFS that is usable now...
      • The article predicts a couple of years until it's safe enough as default in new distros.

        It's phrased as an upper bound, "Btrfs will be the default file system on Linux within two years."

        Note that it's the prediction of a single person, Valerie Aurora-formerly-Henson, who doesn't try to explain any rationale behind the prediction.

        She even says "Check back in two years and see if I got any of these predictions right!" ... So take it with an appropriate amount of salt*, whatever that means to you.

        (* please con

    • I asked when it would be usable for "people who backed up their data" about a year ago -- which is about how long I've been using it -- and the answer was, "No firm date." If you load up a 2.6.31 kernel, the commits have reached the point where not only shouldn't you see significant on-disk format changes, but that the bulk of non-RAID tweaking to occur is probably performance related. (RAID is coming, but it's only just started.) Grub still doesn't know about btrfs, and that's semi-back-burnered functio

    • The fact that Linus runs it as his root fs doesn't tell me much. Now, if you told me that's what he uses for ~/, I would be more impressed.

      It gets even worse. FTFA:

      Linus Torvalds is using it as his root file system on one of his laptops.

      Maybe one of his spares?

      I'm speculating, but note that the article doesn't say "his main laptop", which it could, and which would be a better "seal of approval", so it probably would if it was true...

    • by Enleth ( 947766 )

      "Real men don't use backups, they post their stuff on a public ftp server and let the rest of the world make copies."

        - Linus Torvalds

  • That this is a "service" provided by LWN so that non-subscribers can read premium content; this story would be free for all come Thursday, but apparently "diegocgteleline.es" didn't feel compelled to mention that, that LWN's weekly page is premium content, and that premium content subscribers help LWN stay afloat -- when it's almost gone under a couple times.

  • With the COW-enabled b-tree storing everything including metadata and packing it in the same block as the data it describes, the structure looks quite similar to reiserfs (v3) in terms of error tolerance and recovery. Should this get a tool like the reiserfsck --rebuild-tree, I'm switching - this single feature (well, and some quite sensible performance) is keeping reiserfs on my systems. Saved me a lot of grief several times, when an ext filesystem would be a totally lost cause (or lots of $$$ for a data r

    • Similar to reiserfs in terms of tolerance. Oh great, that's safe, just don't marry it. :-) Look up reiserfs on Slashdot if you don't get that. Seriously though, thanks for all the hard work from the Linux developers.

      ---

      Linux [feeddistiller.com] Feed @ Feed Distiller [feeddistiller.com]

  • Oh great (Score:5, Funny)

    by teslatug ( 543527 ) on Saturday August 01, 2009 @06:50AM (#28907773)
    As if fsck wasn't bad enough to use in business talks, now I have to get prepared for btrfsck
  • You can't even settle things with a battle of the benchmarks: file system workloads vary so wildly that you can make a plausible argument for why any benchmark is either totally irrelevant or crucially important.

    As pointed out, filesystem workloads vary massively, which is why it's good to have a choice of different filesystems which can be chosen based on individual requirements. Only offering a single filesystem like many other OS's do is extremely inefficient. One size does not fit all.

  • Meh (Score:5, Funny)

    by Dachannien ( 617929 ) on Saturday August 01, 2009 @09:00AM (#28908401)

    Who cares? In a few years' time, this will be obsoleted by its successor, icantbelieveitsnotbtrfs.

    • When Fedora added btrfs support they didn't allow it as a root filesystem at installation time ... unless you passed the special command "icantbelieveitsnotbtr" on the kernel command line ;-)

  • I don't really like the name. The first time I looked at the name, I thought it was short for "Bit Rot Filesystem".

  • First paragraph here describes how ZFS works in contrast - FTFA:

    In my opinion, the basic architecture of btrfs is more suitable to storage than that of ZFS. One of the major problems with the ZFS approach - "slabs" of blocks of a particular size - is fragmentation. Each object can contain blocks of only one size, and each slab can only contain blocks of one size. You can easily end up with, for example, a file of 64K blocks that needs to grow one more block, but no 64K blocks are available, even if the file

  • Hmmm, just keep meat products out of it so that it can be used in the Middle East without causing a religious problem or needing to be shut down once a week... ;)
