
A Short History of Btrfs (241 comments)

Posted by Soulskill on Saturday August 01, @04:13AM
from the new-and-shiny dept.
storage
software
linux
diegocgteleline.es writes "Valerie Aurora, a Linux file system developer and ex-ZFS designer, has posted an article with great insight on how Btrfs, the file system that will replace Ext4, was created and how it works. Quoting: 'When it comes to file systems, it's hard to tell truth from rumor from vile slander: the code is so complex, the personalities are so exaggerated, and the users are so angry when they lose their data. You can't even settle things with a battle of the benchmarks: file system workloads vary so wildly that you can make a plausible argument for why any benchmark is either totally irrelevant or crucially important. ... we'll take a behind-the-scenes look at the design and development of Btrfs on many levels — technical, political, personal — and trace it from its origins at a workshop to its current position as Linus's root file system.'"

This discussion has been archived. No new comments can be posted.
  • Looks promising (Score:5, Informative)

    by PhunkySchtuff (208108) <kaiNO@SPAMautomatica.com.au> on Saturday August 01, @04:30AM (#28907305) Homepage

    This looks like a promising filesystem - as ZFS on linux is, at present, doomed to die an ugly death, btrfs looks to address a lot of the shortcomings of other filesystems and bring a clean, modern fs to linux. It goes beyond ZFS in some areas too, such as being able to efficiently shrink a filesystem, and keeps a lot of the cool things that ZFS made popular, such as Copy-On-Write.

    It looks like Btrfs also revisits some of the decisions made about the direction ZFS would take, or about how it would handle certain problems: decisions that, with hindsight, the developers might have made differently.

    Apple are really struggling with ZFS, with it being announced as a feature in early betas of both Leopard (10.5) and Snow Leopard (10.6), as well as being there in a very limited form in Tiger (10.4) - maybe development on Btrfs will leapfrog ZFS for consumer-grade hardware and Apple can finally look at deprecating HFS.

    • Re: (Score:3, Informative)

      ... but btrfs is GPL. Therefore Apple can't use it, unless perhaps they are able to work out licensing from Oracle.

      • Or implement it themselves. It's only a spec after all.

      • Re:Looks promising (Score:5, Informative)

        by PhunkySchtuff (208108) <kaiNO@SPAMautomatica.com.au> on Saturday August 01, @06:31AM (#28907703) Homepage

        Apple has, and does, use GPL'd code and complies with the terms of the license.

        Take, for example, WebKit, which is a fork of KHTML. It's now released as LGPL:
        http://webkit.org/coding/lgpl-license.html [webkit.org]

        This code powers the browser that Apple ship with Mac OS X, Safari - which is arguably one of the most important pieces of code in the whole OS.

        As a result of its quality, speed and standards adherence, it's now used by companies like Nokia and Adobe...

        • Re:Looks promising (Score:4, Informative)

          by TheRaven64 (641858) on Saturday August 01, @08:03AM (#28908077) Homepage Journal

          The GPL and LGPL are very different. The LGPL does not affect any code beyond that originally covered by the license. You can link LGPL'd WebKit against proprietary-licensed Safari with no problems.

          Apple also ship GPL'd software like bash, but they don't link it against any of their own code.

          Linking GPL'd code into the kernel would require the rest of the kernel to be released under a license that places no restrictions that are not found in the GPL. That's not a problem for Apple's code; they own the copyright and they can release it under any license they choose. It would be a massive problem for third-party components. DTrace, for example, is heavily integrated into OS X's Instruments developer app and is CDDL (GPL-incompatible). Various drivers are under whatever license the manufacturers want, and are mostly GPL-incompatible. A GPL'd component would need to be very compelling to make Apple rewrite DTrace, most of their drivers, and a lot of other components. Btrfs is not this compelling. Even if Btrfs were sufficiently good, it would take less effort for them to just completely rewrite it than to rewrite all of the GPL-incompatible components.

        • Re: (Score:3, Informative)

          I think that since it's a part of the kernel, it would count as a derivative work which would mean the whole kernel would have to be GPL'd as well.

          This is similar to the reason that ZFS can't just be ported to Linux: the code is under the CDDL, which is incompatible with the GPL.

          • Re:Duh... (Score:5, Informative)

            by caseih (160668) on Saturday August 01, @12:09PM (#28909897)

            Wow. FUD flies fast and hard on slashdot. Zealots? Are you serious? Rather than mod your post as +1 Funny, I think I'll blow some karma and respond, just to set the record straight.

            Laying aside misconceptions about the GPL, the main reason BtrFS is GPL is because it's part of the Linux kernel which is also GPL! How hard is it to grasp that? If Apple or anyone else wants to license Oracle's BtrFS code, they are welcome to negotiate and get the code under a different license than the GPL. It's that simple. BtrFS is an implementation of an idea, a specification. If Apple wants to write their own BtrFS driver, they are welcome to do that. Or Microsoft.

            Why are developers who don't want their code to be ripped off (used without payment in a closed product) by companies and incorporated into a product labeled zealots? How is this different from software companies requiring code to be licensed by third parties? Is a company that creates some really cool technology and licenses it for a fee to others for use in products a zealot? There really is no difference.

            While I haven't written any software of note, I also use the GPLv2 (evaluating v3) since I want my software to be able to be freely used by those that want to use it, but if my code is that valuable to a company, I want to get paid for my trouble. If no one is willing to pay me, then that's fine. They are welcome to use my software without restriction, but if they redistribute it, to do so under the terms of the GPL. Guess that makes me a zealot.

            • Re: (Score:3, Interesting)

              Why are developers who don't want their code to be ripped off (used without payment in a closed product) by companies and incorporated into a product labeled zealots?

              Perhaps because they are writing software which is by FAR most useful when it is used as far and wide as possible, while using a license which makes that goal extremely difficult to achieve, unnecessarily.

              Honestly, the only reason anyone cares about Btrfs is because the license on ZFS is too restrictive for inclusion in Linux, and NOBODY ha

    • Apple are really struggling with ZFS, with it being announced as a feature in early betas of both Leopard (10.5) and Snow Leopard (10.6), as well as being there in a very limited form in Tiger (10.4)

      It's also available on 10.5/6 [macosforge.org] with some limitations. It's not marketed because it's not quite feature-complete, but it works in OS X and on FreeBSD. There's almost no chance of Apple adopting btrfs in the OS X kernel though, because the GPL is incompatible with the license of a number of other components.

  • So, (Score:2, Insightful)

    Is this ever going to replace ext4? The ext series of file systems are 'good enough' for most people, so unless it has some epic benchmarks I can't imagine a huge rush to reformat. Maybe that's what drives file system programmers insane. The knowledge that for the most part, it's going nowhere. FAT12 is still in use, for Christ's sake.
    • Re:So, (Score:5, Interesting)

      by PhunkySchtuff (208108) <kaiNO@SPAMautomatica.com.au> on Saturday August 01, @04:54AM (#28907391) Homepage

      Aside from Copy on Write, one other feature that this filesystem has that I would consider essential in a modern filesystem is full checksumming. As drives get larger and larger, the chance of a random undetected error on write increases and having full checksums on every block of data that gets written to the drive means that when something is written, I know it's written. It also means that when I read something back from the disk, I know that it was the data that was put there and didn't get silently corrupted by the [sata controller | dodgy cable | cosmic rays] on the way to the disk and back.
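
The per-block checksumming described in the comment above can be sketched roughly as follows. This is a toy model, not Btrfs code: the `ChecksummedStore` class and its method names are hypothetical, the "disk" is a Python dict, and plain CRC32 from `zlib` stands in for whatever checksum a real filesystem uses.

```python
import zlib


class ChecksummedStore:
    """Toy block store that records a CRC32 for every block it writes."""

    def __init__(self):
        self.blocks = {}     # block number -> bytes as they sit "on disk"
        self.checksums = {}  # block number -> CRC32 recorded at write time

    def write(self, block_no, data):
        data = bytes(data)
        self.blocks[block_no] = data
        self.checksums[block_no] = zlib.crc32(data)

    def read(self, block_no):
        data = self.blocks[block_no]
        # Verify on every read: a mismatch means the stored bytes changed
        # somewhere between the write and this read.
        if zlib.crc32(data) != self.checksums[block_no]:
            raise IOError(f"checksum mismatch on block {block_no}")
        return data


store = ChecksummedStore()
store.write(0, b"important data")
assert store.read(0) == b"important data"

# Simulate silent corruption below the filesystem (cable, controller, ...).
store.blocks[0] = b"important dat\x00"
try:
    store.read(0)
except IOError as err:
    print(err)
```

Note that the checksum is computed from whatever bytes are handed to `write()`, so data that is already wrong in memory gets a matching checksum and is not caught by this scheme.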

      • Doesn't help against RAM issues though, because those will just get into the checksum as well.

        • Re: (Score:3, Insightful)

          I had this exact problem very recently.

          If my data was important, I should have been using ECC RAM.

        • Re: (Score:3, Interesting)

          Odds are the checksum then won't match anymore and you'll be notified. It's better than silent corruption.
      • What I'd like to know is whether btrfs does continuous checking of these checksums, preferably when there's not a lot of activity. Checksums are an excellent idea, but unless you check your files every now and again (automagically), you still don't know anything.
        • Re:So, (Score:5, Informative)

          by PhunkySchtuff (208108) <kaiNO@SPAMautomatica.com.au> on Saturday August 01, @06:57AM (#28907795) Homepage

          What you do know is that when you read a block of data back from the disk, that block is what was supposed to be written to the disk.

          If a file that is never read is corrupted somehow, then you will only discover that corruption when you read the file.

          Having checksums is very good if you have a RAID-1 mirror. With full block checksums, you can read each half of the mirror and if there is an error, you know which one is correct, and which one isn't. At present, if a RAID-1 mirror has a soft error like this, due to corruption, you don't know which half of the mirror is actually correct.

          With ZFS, for instance, you can create a 2-disk RAID-1 mirror and then use dd to write zeroes to one half of the mirror, at the raw device level (ie, bypassing the filesystem layer) and when you go to read that data back from the mirror, ZFS knows that it's invalid and instead uses the other side of the mirror. It then has an option to resilver the mirror and write the valid data back to the broken half, if you so want.
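
As a rough illustration of the mirror behaviour described above (detect the bad half by checksum, serve the good half, optionally write it back), here is a minimal sketch. It is not how ZFS or Btrfs is implemented: `MirroredStore`, its methods, and the `repair` flag are hypothetical names, the two "disks" are dicts, and CRC32 is only a stand-in checksum. The `scrub()` method is an assumed addition showing how blocks that are never read could still be verified.

```python
import zlib


class MirroredStore:
    """Toy two-way mirror with one checksum per block."""

    def __init__(self):
        self.disks = [{}, {}]  # two mirror halves: block number -> bytes
        self.checksums = {}    # block number -> CRC32 of the intended data

    def write(self, block_no, data):
        data = bytes(data)
        for disk in self.disks:
            disk[block_no] = data
        self.checksums[block_no] = zlib.crc32(data)

    def read(self, block_no, repair=True):
        expected = self.checksums[block_no]
        good = None
        bad_disks = []
        for disk in self.disks:
            data = disk.get(block_no)
            if data is not None and zlib.crc32(data) == expected:
                good = data
            else:
                bad_disks.append(disk)
        if good is None:
            raise IOError(f"block {block_no}: no copy matches its checksum")
        if repair:
            # Write the verified copy back over the broken half.
            for disk in bad_disks:
                disk[block_no] = good
        return good

    def scrub(self):
        """Read every block so that rarely-read data is verified too."""
        for block_no in list(self.checksums):
            self.read(block_no)


mirror = MirroredStore()
mirror.write(7, b"data")
mirror.disks[0][7] = bytes(4)  # zero one half, like dd on the raw device
assert mirror.read(7) == b"data"   # served from the intact half
assert mirror.disks[0][7] == b"data"  # broken half was rewritten
```

Without the checksum, the reader would see two differing copies and have no way to tell which one to trust, which is the point made in the comment.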

    • Re:So, (Score:5, Insightful)

      by borizz (1023175) on Saturday August 01, @05:00AM (#28907425)
      Snapshots are nice too. Makes stuff like Time Machine and derivatives much more elegant. ZFS has built in RAID support (which, I assume, works on the block level, instead of on the disk level), maybe Btrfs will get this too.
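
To show why copy-on-write makes snapshots cheap, here is a small sketch under stated assumptions. `CowVolume` and its methods are hypothetical, and the block map is a flat dict copied at snapshot time; a real copy-on-write filesystem would share its tree structure rather than copy a map. The only point is that old data blocks are never overwritten, so a snapshot keeps seeing them.

```python
class CowVolume:
    """Toy copy-on-write volume: snapshots share blocks until they change."""

    def __init__(self):
        self.storage = {}    # physical block id -> bytes, never rewritten
        self.next_id = 0
        self.live = {}       # logical block number -> physical block id
        self.snapshots = {}  # snapshot name -> frozen block map

    def write(self, block_no, data):
        # New data always lands in a fresh physical block; the old block
        # stays untouched for any snapshot that still points at it.
        self.storage[self.next_id] = bytes(data)
        self.live[block_no] = self.next_id
        self.next_id += 1

    def snapshot(self, name):
        # Only the map is copied here; the data blocks are shared.
        self.snapshots[name] = dict(self.live)

    def read(self, block_no, snapshot=None):
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self.storage[block_map[block_no]]


vol = CowVolume()
vol.write(0, b"report v1")
vol.snapshot("monday")
vol.write(0, b"report v2")
assert vol.read(0) == b"report v2"
assert vol.read(0, snapshot="monday") == b"report v1"
```

This sketch never frees old blocks; a real implementation would need reference counting or similar to reclaim blocks that no snapshot uses any more.
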
      • Re:So, (Score:5, Informative)

        by joib (70841) on Saturday August 01, @05:41AM (#28907547)


        ZFS has built in RAID support (which, I assume, works on the block level, instead of on the disk level), maybe Btrfs will get this too.

        Yes, btrfs currently has built-in support for raid 0/1/10, 5 and 6 are under development.

    • People buy new computers often enough. For Btrfs to replace ext4 (I'm still using ext3 and didn't even realise an ext4 had been released!), I think all it will take is for major distributions to change the default file system for new installs. Obviously the number of people who replace existing file systems with different ones will be comparatively low.

  • by MMC Monster (602931) on Saturday August 01, @05:29AM (#28907513)

    Is it Beta? The fact that Linus runs it as his root fs doesn't tell me much. Now, if you told me that's what he uses for ~/, I would be more impressed.

    The important question to me is, how long 'til it gets in the major distributions?

    • by joib (70841) on Saturday August 01, @05:39AM (#28907539)
      The important question to me is, how long 'til it gets in the major distributions?

      The article predicts a couple of years until it's safe enough as default in new distros.

      • by TheRaven64 (641858) on Saturday August 01, @06:51AM (#28907775) Homepage Journal
        Meanwhile, FreeBSD and OpenSolaris are shipping with a version of ZFS that is usable now...
            • by joib (70841) on Saturday August 01, @07:40AM (#28907979)

              Just because I replied to your snarky message with another equally snarky one doesn't mean I'm not able to put it into words. For instance, a few reasons why I prefer Linux over *BSD or Solaris:

              - better package management

              - better hw support

              - better ISV support

              - the uncertain future of Solaris (after all, Sun got bought because they were bleeding red ink left and right, will the Solaris devs escape the inevitable layoffs and Oracle continue pumping money into Solaris development just to try to keep up with Linux?)

              - Lack of tier-1 commercial support for *BSD.

              - Much larger community

              - Better availability of qualified Linux sysadmins

              • by asaul (98023) on Saturday August 01, @08:17AM (#28908147)

                For hardware support it really depends what segment of the market you are arguing about. If you are talking white box, low end mostly self supported stuff then no doubt, Linux wins hands down. But as a sysadmin I find Linux to be one of the most painful platforms to work on compared to Solaris or AIX - predominantly because of the lack of standardised, stable and properly supported management interfaces.

                Fibre channel support is a joke. Sure, for the most part you can dynamically bring stuff in and out, and udev goes a short way to bringing some consistency. The problem is when something goes wrong you are left with pretty much just rebooting - messages tell you nothing - is the device there or not? Usable details are buried away in /proc and /sys and typically are only useful for developers. Solaris and AIX had cfgadm/cfgmgr and lsdev and friends to tell you what state things are in or what has happened. There are useful and informative error messages (typically). So far on RHEL 3/4/5 all I ever see is odd octal dumps from drivers when errors occur, and weird hangs and IO errors when devices get broken. It gets worse as you change fibre drivers and versions. Options which exist in one disappear in others. Vendor drivers add customisations which cause other issues.

                The lack of stability in terms of being able to do things between versions gets me as well. On AIX/Solaris you write a script for Solaris 8, and it just works going forwards to other versions. Solaris 10 changes things a bit, but for the most part you can still poke around the same places or the same way to get info back. In short they tend not to break things that work.

                Linux goes the other way - a change is made, and that's that, it seems to be up to you to either track or figure it out. You find yourself having to customise things for many many variations of platform - not just major versions, but minor versions as well. Changes to config file locations, the ways those files are defined etc.

                Don't get me wrong, I got into UNIX on Linux and I won't dispute its strength in drivers or community, but that community is not "Enterprise" focused. It's why I use it for my PVR and not my file server. The rapid changes in Linux are why the DVB-T cards I got became supported so quickly after the hardware changed. I get the differences, but it's not one size fits all.

                • Re: (Score:3, Interesting)

                  Yeah, right.

                  I want to see you back out a series of patches on Linux and revert to the previous configuration because the updates broke something.

                  # echo =package-cat/package-offending-version >> /etc/portage/package.mask
                  # emerge -C =package-cat/package-offending-version && emerge package-cat/package

                  Rinse and repeat for any other packages which may be borked.

                    • by ion.simon.c (1183967) on Saturday August 01, @10:10AM (#28908837)

                      Fail.

                      1. Not available on the majority of Linux installations

                      Something similar seems to be available in APT:
                      http://www.debian.org/doc/manuals/apt-howto/ch-apt-get.en.html [debian.org]
                      Check section 3.10.
                      And here's the rough equivalent for RPM:
                      http://www.linuxjournal.com/article/7034 [linuxjournal.com]

                      So, what distro is no longer covered?

                      2. Removing a package is not the same as reverting to an earlier version of the same package.

                      I guess that you missed the latter half of the last command that I posted:

                      # emerge -C =package-cat/package-offending-version && emerge package-cat/package

                      An English translation of that command is

                      Remove the offending package and install the latest available that's not masked if the removal was successful.

                      I could have written that command as:

                      # emerge package-cat/package

                      and -as I had previously masked the offending package version- Portage would have done the right thing.

                      So, in summary:

                      No, you're a towel.

                      :D

    • I asked when it would be usable for "people who backed up their data" about a year ago -- which is about how long I've been using it -- and the answer was, "No firm date." If you load up a 2.6.31 kernel, the commits have reached the point where not only shouldn't you see significant on-disk format changes, but that the bulk of non-RAID tweaking to occur is probably performance related. (RAID is coming, but it's only just started.) Grub still doesn't know about btrfs, and that's semi-back-burnered functio

  • That this is a "service" provided by LWN so that non-subscribers can read premium content; this story would be free for all come Thursday, but apparently "diegocgteleline.es" didn't feel compelled to mention that, that LWN's weekly page is premium content, and that premium content subscribers help LWN stay afloat -- when it's almost gone under a couple times.

  • With the COW-enabled b-tree storing everything including metadata and packing it in the same block as the data it describes, the structure looks quite similar to reiserfs (v3) in terms of error tolerance and recovery. Should this get a tool like the reiserfsck --rebuild-tree, I'm switching - this single feature (well, and some quite sensible performance) is keeping reiserfs on my systems. Saved me a lot of grief several times, when an ext filesystem would be a totally lost cause (or lots of $$$ for a data r

  • Oh great (Score:5, Funny)

    by teslatug (543527) on Saturday August 01, @06:50AM (#28907773)
    As if fsck wasn't bad enough to use in business talks, now I have to get prepared for btrfsck
        • Re:Oh great (Score:5, Interesting)

          by toby (759) * on Saturday August 01, @03:35PM (#28911587) Homepage Journal

          I'd rephrase that. It eliminates the common cases where you'd need fsck on a conventional filesystem.

          ZFS' design makes consistency failure extremely unlikely. I understand why they claim it doesn't need fsck ("always consistent on disk"). [sun.com] There is controversy over whether there should be a scavenging tool. [opensolaris.org] Some people want one for peace of mind.

          But again, most cases of ZFS pool loss where some believe a scavenger may have saved them, may actually have been solved by more aggressive rollback (I believe work is being done on this).

          Anyone interested in this issue should follow the ZFS mailing list. [opensolaris.org]

  • Meh (Score:5, Funny)

    by Dachannien (617929) on Saturday August 01, @09:00AM (#28908401)

    Who cares? In a few years' time, this will be obsoleted by its successor, icantbelieveitsnotbtrfs.

    • Re: (Score:2, Offtopic)

      It's not that hard!

      No, it's not - but you're wrong. "Linus's" is correct. The 's' after the apostrophe only gets dropped in plurals.

      • It's fine to drop the 2nd s, at least in British English. Though, there doesn't seem to be any hard and fast rule.

        • Re: (Score:3, Interesting)

          There don't seem to be any hard and fast rules about anything in British English! ;-)

          In Fowler's Modern English Usage, which is generally considered to be the bible of English usage by UK journalists and writers, there's an article called "Possessive Puzzles". In that, he says it was "formerly customary" to drop the last 's', but not any more.

          If it was formerly customary in Fowler's day, I reckon it must be well and truly archaic now.

            • Re: (Score:3, Insightful)

              In spoken English, you generally pronounce the second 's' (unless you are a pedant of some sort), so it would stand to reason that the second 's' should remain. There is another motivation: the "'s" is actually a clitic that attaches to phrases (usually noun phrases) and is thus a separate word, not a part of the word it is attached to. As such, it should always be spelled out (as it is always pronounced).

      • Both are correct, depending on who you ask. It's a British English vs. American English thing. Up here in Canada, just the apostrophe seems to be the preferred form.

        • I was taught in my American schools (grade school, even) that you drop the second S on names that end in S. "Linus'" is how I was taught.
        • That's interesting. I think in Britain the preferred form of anything is to put an apostrophe anywhere it can possibly be put and drop any syllables that can possibly be dropped.

          It seems the version with the final 's' is more modern than the version without it. Considering their efforts to update the language, it's perhaps not entirely surprising that the Americans prefer the more modern form.

    • It's not that hard!

      Not necessarily true: http://www.bartleby.com/141/strunk.html#1 [bartleby.com]

    • I've lost files on lots of different systems, including NTFS and ext3 and the kitchen sink. It's rarely the fault of the filesystem itself.
    • Re: (Score:3, Insightful)

      I wish the parent hadn't been modded down. He makes a point that should be addressed.

      I've lost data on every file system that I've ever used, including NTFS, and the highly touted ReiserFS. Nothing guarantees the security of your data. The nearest you can come to data security, is to backup, backup, and backup again. Those people and organizations that keep regular backups seldom lose data. However, even those people can lose data in the event of a physical disaster (fire, flood, theft, being hit by a

    • Re: (Score:3, Insightful)

      We all know that the data is not zeroed on deletion, so why can't we have a File System that (preferably after fs umount) can scan the blocks and retrieve any file whose data blocks have not been overwritten yet, even if it takes a lengthy whole disk surface scan.

      Why would you use such shenanigans? Simply make the filesystem mark deleted files as "hide from directory listing, and really delete only if you need the space". Then add a couple of syscalls to examine these "recyclable" files and restore them to n

    • Re: (Score:3, Interesting)

      I undelete stuff all the time on Linux. You just open the trash and pull the stuff out. Once you empty the trash it is gone though. If you're using a command-line and 'rm' stuff though, that's entirely your fault for using such a low-level power-user interface for file management.

      There are serious performance consequences and fragmentation consequences of supporting undelete at the filesystem level. But supporting snapshots is something high performance filesystems do, and snapshots are way more useful than
