On the State of Linux File Systems

Posted by kdawson on Sat Nov 29, 2008 03:45 PM
from the here-hold-this-for-me dept.
kev009 writes to recommend his editorial overview of the past, present and future of Linux file systems: ext2, ext3, ReiserFS, XFS, JFS, Reiser4, ext4, Btrfs, and Tux3. "In hindsight it seems somewhat tragic that JFS or even XFS didn't gain the traction that ext3 did to pull us through the 'classic' era, but ext3 has proven very reliable and has received consistent care and feeding to keep it performing decently. ... With ext4 coming out in kernel 2.6.28, we should have a nice holdover until Btrfs or Tux3 begin to stabilize. The Btrfs developers have been working on a development sprint and it is likely that the code will be merged into Linus's kernel within the next cycle or two."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by tytso (63275) * on Saturday November 29 2008, @03:49PM (#25927531) Homepage

    The article states that ext4 was a Bull project; and that is not correct.

    The Bull developers are one of the companies involved with the ext4 development, but certainly by no means were they the primary contributors. A number of the key ext4 advancements, especially the extents work, were pioneered by the Clusterfs folks, who used it in production for their Lustre filesystem (Lustre is a cluster filesystem that used ext3 with enhancements which they supported commercially as an open source product); a number of their enhancements went on to become adopted as part of ext4. I was the e2fsprogs maintainer, and especially in the last year, as the most experienced upstream kernel developer have been responsible for patch quality assurance and pushing the patches upstream. Eric Sandeen from Red Hat did a lot of work making sure everything was put together well for a distribution to use (there are lots of miscellaneous pieces for full filesystem support by a distribution, such as grub support, etc.). Mingming Cao from IBM did a lot of coordination work, and was responsible for putting together some of the OLS ext4 papers. Kawai-san from Hitachi supplied a number of critical patches to make sure we handled disk errors robustly; some folks from Fujitsu have been working on the online defragmentation support. Aneesh Kumar from IBM wrote the 128->256 inode migration code, as well as doing a lot of the fixups on the delayed allocation code in the kernel. Val Henson from Red Hat has been working on the 64-bit support for e2fsprogs in the kernel. So there were a lot of people, from a lot of different companies, all helping out. And that is one of the huge strengths of ext4: that we have a large developer base, from many different companies. I believe that this wide base of developer support is one of the reasons why ext3 was more successful than, say, JFS or XFS, which had a much smaller base of developers, primarily from a single employer.

    • by tytso (63275) * on Saturday November 29 2008, @04:04PM (#25927637) Homepage

      Oh, by the way... forgot to mention. If you are looking for benchmarks, there are some very good ones done by Steven Pratt, who does this sort of thing for a living at IBM. They were intended to be in support of the btrfs filesystem, which is why the URL is http://btrfs.boxacle.net/ [boxacle.net]. The benchmarks were done in a scrupulously fair way; the exact hardware and software configurations used are given, and multiple workloads are described, and the filesystems are measured multiple times against multiple workloads. One interesting thing from these benchmarks is that sometimes one filesystem will do better at one workload and at one setting, but then be disastrously worse at another workload and/or configuration. This is why if you want to do a fair comparison of filesystems, it is very difficult in the extreme to really do things right. You have to do multiple benchmarks, multiple workloads, multiple hardware configurations, because if you only pick one filesystem benchmark result, you can almost always make your filesystem come out the winner. As a result, many benchmarking attempts are very misleading, because they are often done by a filesystem developer who consciously or unconsciously, wants their filesystem to come out on top, and there are many ways of manipulating the choice of benchmark or benchmark configuration in order to make sure this happens.

      As it happens, Steven's day job as a performance and tuning expert is to do this sort of benchmarking, but he is not a filesystem developer himself. And it should also be noted that although some of the BTRFS numbers shown in his benchmarks are not very good, btrfs is a filesystem under development, which hasn't been tuned yet. There's a reason why I try to stress the fact that it takes a long time and a lot of hard work to make a reliable, high performance filesystem. Support from a good performance/benchmarking team really helps.
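
The point about cherry-picked benchmarks can be made concrete with a small sketch. The numbers below are invented purely for illustration (they are not measurements of any real filesystem, and the names `fs_a`, `fs_b`, `fs_c` are placeholders); the sketch only shows that reporting a single workload can crown a different "winner" each time, and that a worst-case view across workloads tells a different story.

```python
# Hypothetical throughput numbers (MB/s), invented for illustration only.
# They are NOT measurements of any real filesystem.
RESULTS = {
    "fs_a": {"sequential_write": 120, "random_read": 40, "mail_server": 15},
    "fs_b": {"sequential_write": 90, "random_read": 55, "mail_server": 30},
    "fs_c": {"sequential_write": 110, "random_read": 35, "mail_server": 45},
}


def winner_per_workload(results):
    """Return {workload: filesystem with the highest score on that workload}."""
    workloads = next(iter(results.values())).keys()
    return {w: max(results, key=lambda fs: results[fs][w]) for w in workloads}


def worst_case_ratio(results):
    """For each filesystem, its worst score relative to the best score
    achieved by any filesystem on the same workload (1.0 means never beaten)."""
    workloads = list(next(iter(results.values())).keys())
    best = {w: max(r[w] for r in results.values()) for w in workloads}
    return {fs: min(r[w] / best[w] for w in workloads) for fs, r in results.items()}


if __name__ == "__main__":
    print(winner_per_workload(RESULTS))
    print(worst_case_ratio(RESULTS))
```

With these made-up numbers every filesystem wins exactly one workload, so a developer publishing only one row could honestly claim first place for any of them.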

      • Re: (Score:3, Interesting)

        As a result, many benchmarking attempts are very misleading, because they are often done by a filesystem developer who consciously or unconsciously, wants their filesystem to come out on top, and there are many ways of manipulating the choice of benchmark or benchmark configuration in order to make sure this happens.

        Wouldn't it be logical to assume a filesystem developer has an idea on what the workload and hardware will be like _before_ writing his filesystem, then picking a benchmark that suits his idea
        • I don't think it would be logical at all.

          First, it would mean that each workload would require a different filesystem design.

          Then it would also mean that you don't need synthetic benchmarks at all; just run the expected workload as your "benchmark".

        • Wouldn't it be logical to assume a filesystem developer has an idea on what the workload and hardware will be like _before_ writing his filesystem, then picking a benchmark that suits his ideas on what a filesystem is supposed to do?

          No, that would be illogical, unless again they were trying to craft bullshit benchmarks. The developer does not know how I will use the filesystem, and so any such benchmark is not useful to me. I also want to know how well the filesystem will perform if I have to perform some new task on it.

      • Repeato ad absurdium...

        What is that gibberish supposed to mean? Christ, I hate mock-Latin. If you want a fancy-sounding term referring to repeating something again and again, use ad nauseam.

          • Re: (Score:3, Interesting)

            on Windows i can see the file extension of every file on my hard drive. i determine the file type based on the same attribute that my shell does. if i get a file attachment or am browsing a directory, i can immediately distinguish executables from non-executables. if i'm looking for a PNG image, i just look for the appropriate icon and the .png extension, and i can double click on the icon and open the image without the possibility of accidentally running a malicious executable.

            however, on a lot of people's

            • using meta data or magic number to determine file format would have the same drawback. how would you determine the format of a file at a glance using meta data? you wouldn't have a safe/accurate and intuitive means of determining file type.

              I don't think that there is a 100% "safe and accurate" way to display the file type, assuming you are depending on a possibly-hostile file to supply the information in the first place. There are, however, a few things that an operating system can do to make life safer for users:

              1) Clearly mark executable files. Have some visual indication whether a file is set to be executable (this, of course, assumes that your operating system has an execute bit; if it doesn't, that's a bigger problem). This indication should be consistent, universal, and impossible to override with metadata or custom icons. It should apply both to CLI shells and GUIs. (Although not necessarily in the exact same way; however my personal preference for such an indicator, which is putting the file name in bold, would work both in a GUI and CLI environment.)

              2) Don't use the same action to execute as to open. Using the same action (the double-click) both to "run" and to "open" -- which are two very different actions -- is probably responsible for the vast majority of user-propagated malware today. I would love to see an operating system rigorously enforce a separate 'run' action, so that a user clicking on what appears or claims to be a data file (intending to open an application and read that file) could not accidentally execute it.

              3) Break the filesystem into 'data' and 'executable' sections, and bar files on the 'data' sections from being marked as executable under any circumstances. I don't think this would be as effective as #2, but it would probably involve less user retraining. In order for content to be executed, it would have to be copied or installed onto the executable partition (which in normal operation could even be mounted read-only).
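
Suggestion 1) above can be sketched for a CLI on a Unix-like system. This is a minimal sketch, not how any existing shell or file manager works: the helper names `is_executable` and `render_name` are hypothetical, and ANSI bold stands in for "putting the file name in bold". The indicator is derived only from the execute bits reported by the filesystem, never from the name, extension, or icon, so a hostile file cannot override it.

```python
import os
import stat

BOLD = "\033[1m"   # ANSI escape: start bold
RESET = "\033[0m"  # ANSI escape: reset attributes


def is_executable(path):
    """True if the path is a regular file with any execute bit set
    (user, group, or other). The file name is deliberately ignored."""
    mode = os.stat(path).st_mode
    exec_bits = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    return stat.S_ISREG(mode) and bool(mode & exec_bits)


def render_name(path):
    """Return the file name, wrapped in ANSI bold if the file is executable."""
    name = os.path.basename(path)
    return f"{BOLD}{name}{RESET}" if is_executable(path) else name


if __name__ == "__main__":
    for entry in sorted(os.listdir(".")):
        print(render_name(entry))
```

A file called `holiday.png` with its execute bit set would still be printed in bold here, which is the property the comment asks for.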

              You could do all of this with the data-type indicator as part of the file name, or as a separate piece of metadata; it doesn't really matter. There's no 'safety' advantage to doing it either way, it's just that keeping it in the file name is considered very ugly by a lot of people (myself included). I'm personally a fan of the way that the Mac used to do it, with a two part code (one for the file's actual type, the other for the application that either created it or should be used to open it), except that unlike the Mac, it should be easily editable by the user, and a lot of standardization and interoperability challenges would have to be solved. I'll be surprised if I see the filename.ext thing die in my lifetime, honestly. It's just too entrenched.

  • Lightweight (Score:5, Insightful)

    by postbigbang (761081) on Saturday November 29 2008, @03:51PM (#25927551)

    A cute FA in some ways, but bereft of content. Wish there was something to see here, like comparisons regarding integrity, access costs, evolution from JFS and Andrews journaled FS, etc. No real meat (with apologies to the vegetarians out there). Just a lightweight historical analysis with some glib suggestions of current adaptations.

  • ZFS!! (Score:3, Interesting)

    by Anonymous Coward on Saturday November 29 2008, @04:04PM (#25927639)

    What Sun needs to do is release ZFS under a proper license so we can finally have 1 unified filesystem. Yes, we can use it under FUSE, but this brings unnecessary overhead and problems. It will be nice when we can transport disks around, similar to fat(32), and not have to worry about whether another OS will be able to read it or not. On top of that, CRC block checksumming, high performance, smb/nfs/iscsi support integrated, Volume AND partition manager.

    Come on Sun! Are you listening??

    • Re:ZFS!! (Score:5, Funny)

      by TheRaven64 (641858) on Saturday November 29 2008, @04:15PM (#25927709) Homepage Journal
      Sun has released it under a proper license and we can finally have 1 unified filesystem. The 'we' in this case being Solaris, Mac OS X, and FreeBSD users, of course.
    • Re:ZFS!! (Score:5, Insightful)

      by diegocgteleline.es (653730) on Saturday November 29 2008, @04:53PM (#25927947)

      ZFS has redefined the way future filesystems are going to be designed. But there is no way that it's going to be the "last" filesystem.

      As shocking as it may seem to those who have drunk the marketing kool aid, we'll see more filesystems. Filesystem research is as alive as it always was. They'll try to copy the good ideas of ZFS and they will try to avoid the disadvantages (which every software has). So you are never going to have "1 unified filesystem". It's never going to happen. And it's a good thing.

        • Re:ZFS!! (Score:5, Interesting)

          by Kent Recal (714863) on Saturday November 29 2008, @06:59PM (#25928641)

          I hear you and I'm sure the filesystem developers have the same ideas in their heads.
          The problem is that there are some really hard problems involved with these things.

          In the end everybody wants basically the same thing: A volume that we can write files to.
          This volume should live on a pool of physical disks to which we can add and remove disks at will and during runtime.

          The unused space should always be used for redundancy, so when our volume is 50% full then we'd expect that 50% of the disks (no matter which) can safely fail at any time without data loss.

          Furthermore we don't really want to care about any of these things. We just want to push physical disks into our server, or pull them, and the pool should grow/shrink automagically.
          And of course we want to always be able to split a pool into more volumes, as long as there's free space in the pool we're splitting from. Ideally without losing redundancy in the process.

          We want all these things and on top we want maximum IOPS and maximum linear read/write performance in any situation. Oh, and we won't really be happy until a pool can span multiple physical machines (that will auto re-sync after a network split and work-as-expected over really slow and unreliable networks), too.

          ZFS is a huge step forward in many of these regards and there's a whole industry built solely around these problems.
          Only time will tell which of these goals (and the ones that I omitted here) can really be achieved and how many of them can be addressed in a single filesystem.

        • Re: (Score:3, Insightful)

          Get a grip.

          The people of Jonestown CHOSE their fate. They weren't systematically hunted down for their race nor were they killed for being in the wrong city in the wrong building at the wrong time. Everyone in that cult CHOSE to give up their worldly belongings, uproot their lives to Guyana, AND drink the cyanide laced juice (not actually kool-aid) for "revolutionary" causes.

          As long as propaganda and rhetoric have their effects, we should ABSOLUTELY continue to use that metaphor as a reminder against blind

      • Re:ZFS!! (Score:5, Insightful)

        by harry666t (1062422) <harry666tNO@SPAMgmail.com> on Saturday November 29 2008, @04:47PM (#25927907)
        You can have an alternative implementation of ext2 that wouldn't have to use GPL'd code from Linux. I saw ext2/3 drivers for Windows and I'm pretty sure that at least some of the non-GPL OSs out there (Mac? BSDs? Solaris?) can read/write ext2.

        However, you can't reimplement ZFS under any other license (CDDL is licensing some of the patents that cover the ZFS only to the users of the original implementation or its derivatives). I'd say it's *BOTH* GPL's and CDDL's fault (what's more, Sun chose CDDL exactly because it's GPL-incompatible).
          • Re:ZFS!! (Score:4, Insightful)

            by beelsebob (529313) on Saturday November 29 2008, @07:32PM (#25928837)

            Because Sun are licensing the software for you under a free and open source software license. The only thing stopping you from using it is that it's a license that doesn't agree with the ideology of the "all FOSS must be viral" GPL guys, and thus can't be used with GPLed software. There's plenty of non-GPLed projects that are happily getting on and using ZFS, but GPL guys can't. I'd say that makes it pretty obvious what the problem is.

      • Re:ZFS!! (Score:5, Interesting)

        by dokebi (624663) on Saturday November 29 2008, @05:00PM (#25927991)

        UFS (of BSDs) is under the most liberal license possible, yet it's definitely not the most widely used. FAT32 is patented by MS, and it is the most widely used. So, do you still think the problem is GPL?

      • Re:ZFS!! (Score:5, Interesting)

        by atrus (73476) <atrus@at[ ]trivalie.org ['rus' in gap]> on Saturday November 29 2008, @05:26PM (#25928141) Homepage
        Because having a block-based filesystem that has no notion of what the underlying storage is, is "dumb". ZFS fixes those problems.

        Want to create a new filesystem in ZFS? Sure, no problem. You don't even need to specify a size, it will use whatever space the storage pool has available, no pre-allocation needed. How about removing one? Ok, it's removed. Yes, it only took a second to do that. A traditional LVM + FS system can't do that - you need to resize, move, and tweak filesystems when doing any of the above operations - time consuming and limited.

        And if you're asking why you'd want to create and remove filesystems on the fly, there is one word for that: snapshots. It's quite feasible to generate snapshots many times per day for a ZFS backed fileserver (or even database server). Someone created a file at 9am and then accidentally nuked it before lunch? Don't worry, it's still present in the 10am and 11am snapshots. All online, instantly available.

      • Re:ZFS!! (Score:5, Interesting)

        by ArbitraryConstant (763964) on Saturday November 29 2008, @05:43PM (#25928233) Homepage

        > But hey maybe I'm missing something, why not improve or create a replacement for LVM instead of including volume management in the filesystem?

        Maybe. But it would be a lot harder.

        Think about LVM snapshots for example. LVM allocates a chunk of the disk for your filesystem, and then a chunk of disk for your snapshot. When something changes in the parent filesystem, the original contents of that block are first copied to the snapshot. But if you've got two snapshots, it has to be copied to two places, and each snapshot needs its own space to store the original data. Because ZFS/BTRFS/etc are unified, they can keep the original data for any number of snapshots by the simple expedient of leaving it alone and writing the new data someplace new.
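
The difference described above can be shown with a toy model. This is a deliberately simplified sketch, not the actual LVM, ZFS, or Btrfs implementation: both classes and the `data_copies` counter are hypothetical, blocks are plain Python values, and the unified-style snapshot copies a whole block map where a real filesystem would share tree nodes instead.

```python
class LvmStyleVolume:
    """Copy-on-first-write: each snapshot keeps its own copy of overwritten blocks."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []   # one dict per snapshot: {block_index: original_data}
        self.data_copies = 0  # how many times old data had to be copied

    def snapshot(self):
        self.snapshots.append({})
        return len(self.snapshots) - 1

    def write(self, index, data):
        for saved in self.snapshots:
            if index not in saved:  # first overwrite since this snapshot was taken
                saved[index] = self.blocks[index]
                self.data_copies += 1
        self.blocks[index] = data

    def read_snapshot(self, snap_id, index):
        return self.snapshots[snap_id].get(index, self.blocks[index])


class UnifiedStyleVolume:
    """Redirect-on-write: old data is left alone, new data is written someplace new."""

    def __init__(self, blocks):
        self.store = list(blocks)                 # append-only block store
        self.live_map = list(range(len(blocks)))  # logical index -> slot in store
        self.snapshots = []                       # each snapshot: frozen copy of the map
        self.data_copies = 0                      # stays 0: old blocks are never copied

    def snapshot(self):
        self.snapshots.append(list(self.live_map))
        return len(self.snapshots) - 1

    def write(self, index, data):
        self.store.append(data)
        self.live_map[index] = len(self.store) - 1

    def read(self, index):
        return self.store[self.live_map[index]]

    def read_snapshot(self, snap_id, index):
        return self.store[self.snapshots[snap_id][index]]
```

Taking two snapshots and then overwriting one block costs two data copies in the first model and none in the second, while both still return the original contents when the snapshot is read.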

        LVMs can grow/shrink filesystems, but filesystems deal with this somewhat grudgingly. LVM lacks a way to handle dynamic allocation of blocks to filesystems in such a way that they can give them back dynamically when they're not using them. ZFS/BTRFS/etc can do this completely on the fly. LVMs rely on an underlying RAID layer to handle data integrity, but most RAID doesn't do this very well. BTRFS is getting a feature that allows it to handle seeky metadata differently than data (eg, use an SSD as a fast index into slow but large disks).

        It is conceivable that an advanced volume manager could be created that does all these things and all the rest (eg checksumming) just as well... but I think the key point is that this isn't something you can do without a *much* richer API for filesystems talking to block devices. They'd need to be able to free up blocks they don't need anymore, and have a way to handle fragmentation when both the filesystem and the volume manager would try to allocate blocks efficiently. They'd need substantially improved RAID implementations, or they'd need to bring the checksumming into the volume manager. I'm not saying it can't be done, but doing it as well as ZFS/BTRFS/etc when you're trying to preserve layering would be very tough. At a minimum you'd need new or substantially updated filesystems and a new volume manager of comparable complexity to ZFS/BTRFS/etc. I understand the preference for a layered approach, but I just don't think it's competitive here.

        • You just have to draw the layers differently.

          I've repeatedly proposed something, only to find that ZFS already implements it: Define one layer which is solely responsible for storing your bare primitives, like a sequence of data. It is the FS-level equivalent of malloc/free.

          Then, implement everything else on top of that layer. Databases could sit directly on the layer -- no reason they need to pretend to create files. Filesystems would sit on that layer, implementing structures like directories and POSIX fi

      • Actually, if you want a filesystem usable by everyone it will definitely have to come from Microsoft.

        It doesn't much matter whether they cooperate. Even if they insist that the Windows boot device continue to be NTFS, there's a standard way to write filesystem drivers for Windows (ext2 is already supported), and it's easy to put just about everything except Windows itself on another partition, if we have to. (Which we probably won't -- we could even slipstream it in.)

        So, we can force the issue.

        And suppose I have a portable hard drive, which I want to make sure is readable everywhere -- all I have to do is

  • by bboxman (1342573) on Saturday November 29 2008, @04:18PM (#25927725)

    Just my 2 bits. As a user of Linux in a software/algorithm context, my personal beefs with ext3 / the current kernel line are:

    1) IO priority isn't linked to process priority, or at least, not in a decent manner. It is all too easy to lock up the system with one process that is IO heavy (or several of them) -- hurting even high priority processes. As the IO call is handled at the system level (handling buffering, etc.) -- it garners a relatively high priority (possibly falling under the RT scheduler) and as a result IO heavy processes can choke other processes.

    2) ext3+nfs simply sucks with very large numbers of files. I used to routinely have directories with 500,000 files (very easy to reach such amounts with a cartesian multiplication of options). The result is simply downright appalling performance.

    • by tytso (63275) * on Saturday November 29 2008, @04:28PM (#25927791) Homepage

      NFS semantics require that the data be stably written on disk before the client's RPC request can be acknowledged. This can cause some very nasty performance problems. One of the things that can help is to use a second hard drive to store an external journal. Since the journal is only written during normal operation (you only need to read it when you recover after a system crash), and the writes are contiguous on disk, this eliminates nearly all of the seek delays associated with the journal. If you use data journalling, so that data blocks are written to the journal, the fact that no seeks are required means that the data can be written onto stable storage very quickly, and thus will accelerate your NFS clients. If you want things to go _really_ fast, use a battery-backed NVRAM for your external journal device.

      • Re: (Score:3, Interesting)

        NFS semantics require that the data be stably written on disk before the client's RPC request can be acknowledged.

        This hasn't been true since NFSv2. We're at NFSv4 now...

    • The CFQ IO scheduler has been able to link IO priority with process priority for ages. But there's a performance issue in the ext3 journaling code that has been affecting many people for some time....

  • It is rarely an issue to me, but once in a while it is convenient to be able to plug a USB disk into a machine with Windows or Mac OS X. What portable file systems are there that will cover those cases? The last time I looked around, a few years back, I ended up concluding that the best option for a file system supported on both Linux and Windows was ext2 (with third party drivers for Windows). The only other file system supported on both was FAT, which has several drawbacks.

    Moving forward, what file system
    • Re: (Score:3, Informative)

      Unless you're dealing with backward firmware/BIOS code that only understands FAT, you should be using UDF. Vista supports it, OS X supports it, Linux supports it, and everything back to win98 has readonly support - but you can get third-party drivers just like for ext2.

  • by r00t (33219) on Saturday November 29 2008, @04:27PM (#25927779) Journal

    We're checksumming free disk space. That's dumb.
    It makes RAID rebuilds needlessly slow.

    We're unable to adjust redundancy according to
    the value that we place on our data. Everything
    from the root directory to the access time stamps
    gets the same level of redundancy.

    The on-disk structure of RAID (the lack of it!)
    prevents reasonable recovery. We can handle a
    disk that disappears, but not one that gets
    some blocks corrupted. We can't even detect it
    in normal use; that requires reading all disks.
    We have extremely limited transactional ability.
    All we get for transactions is a write barrier.
    There is no way to map from RAID troubles (not
    that we'd detect them) to higher-level structures.

    With an integrated system, we could do so much
    better. Sadly, it's blocked by an odd sort of
    kernel politics. Radical change is hard. Giving
    up the simplicity of a layered approach is hard,
    even when obviously inferior. There is this idea
    that every new kernel component has to fit into
    the existing mold, even if the mold is defective.

    • Re: (Score:3, Insightful)

      Linux developers are aware of this issue; this is one of the things which is addressed by btrfs.

    • by Blackknight (25168) on Saturday November 29 2008, @05:20PM (#25928109) Homepage

      What is this we? ZFS is the fix for all of the issues you mentioned, it does checksums on every block it writes and the RAID 5 write hole is history. You can also set how many copies per file you want to keep.

    • by Piranhaa (672441) on Saturday November 29 2008, @05:29PM (#25928153)

      That's the goal of ZFS. Each block is checked against a 256-bit checksum on every access. It incorporates a volume and partition manager in '1 tool', and knows where data is written to. On rebuilds it only repairs data that is actually there, which saves significant time. You should also set up weekly or bi-weekly scrubs (once a month for enterprise grade drives), which read EVERY block written to and verify it. This ensures that each block is still good, none is suffering from flipped bits, and that your disk isn't slowly failing on you.
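
The idea of checking each block on access and periodically scrubbing only the blocks that were actually written can be sketched as follows. This is a toy in-memory model, not ZFS code: the `ChecksummedStore` class is hypothetical, and SHA-256 is used simply because it is a readily available 256-bit checksum in the Python standard library.

```python
import hashlib


class ChecksummedStore:
    """Toy block store that keeps a 256-bit checksum for every written block."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes
        self.checksums = {}  # block id -> 32-byte digest, kept apart from the data

    def write(self, block_id, data):
        self.blocks[block_id] = bytes(data)
        self.checksums[block_id] = hashlib.sha256(data).digest()

    def read(self, block_id):
        """Verify the checksum on every access before returning the data."""
        data = self.blocks[block_id]
        if hashlib.sha256(data).digest() != self.checksums[block_id]:
            raise IOError(f"checksum mismatch in block {block_id}")
        return data

    def scrub(self):
        """Check every block that was actually written; return the bad block ids.
        Unwritten (free) space is never touched."""
        return sorted(
            block_id
            for block_id, data in self.blocks.items()
            if hashlib.sha256(data).digest() != self.checksums[block_id]
        )
```

Simulating a flipped bit by altering `store.blocks[n]` directly makes `scrub()` report that block, and a later `read(n)` raises instead of silently returning bad data.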

  • by Minigun_Fiend (909620) on Saturday November 29 2008, @04:36PM (#25927849) Homepage
    ..called TLDRFS. It simply ignores any files larger than 64KB.
  • by Britz (170620) on Saturday November 29 2008, @05:02PM (#25928007)

    Maybe not for a desktop machine, but for servers I like to use XFS. That started way back when XFS was the first (and then only AFAIR) fs that supported running on softraid. It was not that long ago and CPU cycles were already so cheap on x86 that softraid was already a pretty nice solution for small servers.

    For small servers I have not changed that setup (XFS on softraid level one on two cheap drives) ever since.

    I guess for the big machines it might be very different. I am pretty happy with XFS as it is.

  • Reiser4 (Score:5, Interesting)

    by Enderandrew (866215) <enderandrewNO@SPAMgmail.com> on Saturday November 29 2008, @05:28PM (#25928149) Homepage Journal

    Hans was a jerk who was difficult to work with, and now he is a convicted murderer. That doesn't change the fact that Reiser4 as is may be the best desktop file system for Linux users, even with plenty of room for improvement.

    There are filesystems in development like Btrfs and Tux3 that look promising, but why should Reiser4 be abandoned? It is GPL. Anyone can pick it up and maintain it, or fork it.

    Does anyone know anything about the future of Reiser4?

    • Re:Reiser4 (Score:5, Interesting)

      by Ant P. (974313) on Saturday November 29 2008, @05:55PM (#25928281) Homepage

      Reiser4 is still being maintained, by one ex-Namesys person IIRC.
      The main problem is the Linux kernel devs - they were too busy trying to find reasons to keep it out of the kernel (I can agree with their complaints about code formatting, but after that they descend deep into BS-land) to actually improve it. From the outside it sounds a lot like the story about the RSDL scheduler - completely snubbed because it stepped on the toes of one kernel dev and his pet project.

      • by acb (2797) on Saturday November 29 2008, @07:56PM (#25928955) Homepage

        Over and above this, it'll need a new name. I know it doesn't make one iota technical difference, but people are fussy about such things; change the name, and people don't care if it was developed by fiends. Keep it and people will find excuses to edge away and it'll wither on the vine.

        The Volkswagen was a runaway success despite its Nazi origins, but had it been named the "Hitlerwagen", things would have probably turned out a lot differently.

      • Re: (Score:3, Informative)

        "From the outside it sounds a lot like the story about the RSDL scheduler - completely snubbed because it stepped on the toes of one kernel dev and his pet project."

        ReiserFS v4 wasn't included in the mainline kernel because Hans was being an even greater prick than usual to the kernel maintainers who asked him to fix his bugs and adhere to kernel coding conventions.

        RSDL wasn't included in the mainline kernel because Linus considered Con to be unreliable, and wanted to have a scheduler with a developer he co

      • Re: (Score:3, Interesting)

        Honestly, I have never lost data with Reiser4 and I have a toddler who loves pushing power buttons. However every single time I have tried ext3 or ext4 I have lost data. And I've lost data within weeks with ext3 and ext4.

        Reiser4 recovers "dirty" shutdowns better than ext3.

        I had one time when I had a dirty shutdown, and e2fsck decided to wipe my /etc directory and put all the files in lost+found.

  • JFS (Score:5, Insightful)

    by adrianbaugh (696007) on Saturday November 29 2008, @07:49PM (#25928915) Homepage Journal

    Sad to see JFS being overlooked so. While it may not have the postmodern features to compete in the wake of ZFS, it's still in many cases the best current filesystem for linux. It's remarkably crashproof, has the lowest CPU loading of any of {ext3 jfs xfs reiser3}, good all-round performance (generally either first or second in benchmarks) and is fast at deleting big files. I haven't used anything else in a couple of years - I used to put reiser3 on /var, but got fed up with its crash intolerance. It's sad to see jfs so overlooked, because at least until btrfs or tux3 come out it's arguably the best option available.

    • by tytso (63275) * on Saturday November 29 2008, @03:55PM (#25927573) Homepage

      Ext4 supports up to 128 megabytes per extent, assuming you are using a 4k blocksize. On architectures where you can use a 16k page size, ext4 would be able to support 2^15 * 16k == 512 megs per extent. Given that you can store 341 extent descriptors in a 4k block, and 1,365 extent descriptors in a 16k block, this is plenty...
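
The arithmetic behind those figures can be checked in a few lines. The 2^15 blocks-per-extent limit comes from the comment itself; the 12-byte descriptor size is an assumption chosen because it reproduces the 341 and 1,365 figures quoted above, and the count ignores any per-block header.

```python
EXTENT_LENGTH_BLOCKS = 2 ** 15  # per the comment: up to 2^15 blocks per extent
EXTENT_DESCRIPTOR_BYTES = 12    # assumed on-disk size of one extent descriptor


def max_extent_bytes(block_size):
    """Largest contiguous run a single extent can describe, in bytes."""
    return EXTENT_LENGTH_BLOCKS * block_size


def descriptors_per_block(block_size):
    """Whole extent descriptors that fit in one block (header space ignored)."""
    return block_size // EXTENT_DESCRIPTOR_BYTES


if __name__ == "__main__":
    for block_size in (4096, 16384):
        mib = max_extent_bytes(block_size) // 2 ** 20
        count = descriptors_per_block(block_size)
        print(f"{block_size}-byte blocks: {mib} MiB per extent, {count} descriptors per block")
```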

      • Re: (Score:3, Interesting)

        You seem very knowledgeable regarding filesystems in general. I'm interested in learning more about filesystems and how they work. To give you an idea of where I am, I believe I know what blocksize is, but I don't know what an extent is, and how it relates to performance (or why the grandparent would like extents several megabytes large).

        What resources would you suggest to people who are looking to learn more?

    • Re: (Score:3, Informative)

      The only real problem I have is there doesn't exist a modern journaling FS which would work just as well on all 3 platforms.

      I agree with you that's really important. I'd also like zfs to be that filesystem. However, as long as you don't need that drive to be the root drive of your respective file system, you might be interested in some of these links:

      I can use ext3, but cannot plug it into a Mac.

      Give this [sourceforge.net] a try. The latest news is that you get write support in Tiger, but I use it in Leopard without problems.

      Also don't worry about the ext2 part. Ext3 is designed to be backwards compatible with ext2. It can be mounted as ext2 (it just won't get journaling)

      You didn't ask

    • XFS is also nice, but the lack of a proper userspace fsck has turned me away there.

      Eh? Man xfs_repair(8)

      Just because it's not called "fsck" (and not run at boot time) does not mean that the functionality is not there when you need it.

      A crash does not mean you need to run fsck; that is why you pay the price for the journaling overhead, right? When xfs detects errors at runtime, run xfs_repair, and bask in the glory of "a proper userspace fsck."