The Hairy State of Linux Filesystems

Posted by timothy on Wednesday February 11 2009, @04:09PM
from the when-shrinkage-is-what-you-want dept.
storage
software
linux
RazvanM writes "Do the OSes really shrink? Perhaps the user space (MySQL, CUPS) is getting slimmer, but how about the internals? Using the number of external calls between the filesystem modules and the rest of the Linux kernel as a metric, I argue that this is not the case. The evidence is a graph that shows the evolution of 15 filesystems from 2.6.11 to 2.6.28, along with the current state (2.6.28) of 24 filesystems. Some filesystems that stand out are: nfs, for leading in both number of calls and speed of growth; ext4 and fuse, for their above-average speed of growth; and 9p, for its roller-coaster path."

  • by dbIII (701233) on Wednesday February 11 2009, @04:13PM (#26818955)
    In the case of NFS for instance, hasn't there been a performance improvement? Isn't that the thing that matters?
    • by epiphani (254981) <epiphaniNO@SPAMdal.net> on Wednesday February 11 2009, @04:19PM (#26819071)

      Two things of note with NFS...

      1. NFSv4 support was added. v4 is complex and has a lot of authentication stuff in it that wasn't in v3.

      2. SunRPC is "part" of the NFS tree, but is effectively just a transport layer. It is completely abstracted, hence the number of symbols. It could be used for other things, so it pushes up that number too.

        • by Anonymous Coward on Wednesday February 11 2009, @06:43PM (#26820909)

          Hi. I'm the infamous Anonymous Coward, and it's time we had a talk.

          For years now, I've been enhancing the discussion on Slashdot through interesting interjections and humorous anecdotes (often about homosexual African Americans), but I feel things just aren't working out.

          It takes me an awful lot of time, researching composing and spell chekcing the many hundreds of valuable posts I make a day, and although I don't request anything in return all I ever see is abuse. You moderate my comments down for absolutely no good reason.

          I've had enough.

          From this point on I'm just not going to bother. It's over.

          I've been feeling this way for a while, slowly I've put less and less effort in my posts, repeating the same ideas over and over and, now, even started repeating whole posts verbatim.

          It's been fun, Slashdot, but I'm disillusioned. You broke my heart, and I am never going to give you the benefit of my insight again.

          Be happy.
          Love and regrets,
          Anon.

    • Yes/no (Score:4, Insightful)

      by EmbeddedJanitor (597831) on Wednesday February 11 2009, @04:22PM (#26819129)
      The number of calls in the interface does matter, because the calls increase complexity. This makes fs maintainability and development a bit harder from version to version, as it becomes less clear what each call should do. Many of the calls are optional, or can be performed by defaults, which does help to simplify things.

      There is little calling overhead from using multiple calls. Of course these interface changes are all done for a good reason: performance, stability, security.

      • Re: (Score:3, Insightful)

        The number of calls in the interface does matter because they increase complexity

        Replacing 100 lines of in-driver code with one function call from a shared library?

        • Re:Yes/no (Score:5, Insightful)

          by Suzuran (163234) on Wednesday February 11 2009, @04:59PM (#26819653)

          Function calls are not free. Especially in kernel space. Everything costs time. You need to do the most you can with the least instructions. 100 lines of inline code will probably run faster than a function call.

          • Re:Yes/no (Score:5, Informative)

            by ckaminski (82854) <ckaminski.pobox@com> on Wednesday February 11 2009, @05:49PM (#26820291) Homepage
            Okay, you MIGHT, just MIGHT have a point with a microkernel architecture, or if the filesystems are implemented in user space (FUSE), but it is irrelevant for kernel modules in Linux: you're not crossing interrupt boundaries, so calling a kernel function is just as cost-effective as rolling your own.

              • Re:Yes/no (Score:4, Insightful)

                by hackerjoe (159094) on Wednesday February 11 2009, @07:21PM (#26821315)

                What's your point? Processors can pipeline across branches just fine, and the main effect of cache is to give a performance boost to smaller code -- code that separates and reuses functions rather than inlining them willy-nilly.

                Inlining can still be a win sometimes, but compilers will do that for you automatically anyway...

          • Re:Yes/no (Score:5, Informative)

            by Daniel Phillips (238627) on Wednesday February 11 2009, @06:28PM (#26820779)

            Function calls are not free. Especially in kernel space. Everything costs time. You need to do the most you can with the least instructions. 100 lines of inline code will probably run faster than a function call.

            Never having been one to accept unsupported claims at face value, I just tested that assertion on a Pentium M here, with a small C program that either calls a function to increment a counter, or directly increments the counter a number of times. I compiled with -O0 to be sure gcc does not change around my code at all. Just the instructions, thanks. Funny thing? A hundred increments runs within 1% of the speed of 100 calls to a function to do the increment. And yes, I unrolled those calls to isolate the cost of what I was measuring. So... rather surprisingly, the cost of these function calls is as close as doesn't matter, to exactly zero.

            Loops on the other hand... cost a huge amount. I won't get into details. But Intel clearly does something to optimize function calls in microcode, or probably even hardware. Function calls just don't cost what you think they do. In many cases, the function call will cost less by not trashing as much of that incredibly valuable L1 instruction cache.

            • Re:Yes/no (Score:4, Interesting)

              by Zan Lynx (87672) on Wednesday February 11 2009, @09:21PM (#26822403) Homepage

              I believe I read somewhere or other that branch predictors need a certain number of instructions between the branch instruction and the branch target in order to do a good job. If the only instruction in the loop is a single increment, that might explain the problem. Unrolling the loop so it has more instructions might fix it.

            • Re:Yes/no (Score:4, Interesting)

              by Z34107 (925136) <zealoussniper&netscape,net> on Wednesday February 11 2009, @09:31PM (#26822485)

              So... rather surprisingly, the cost of these function calls is as close as doesn't matter, to exactly zero.

              If the compiler knows the relative address of the function ahead of time, the calls are really fast.

              Try replacing your direct function call with a function pointer instead. Assign the function pointer the address of your function during runtime. It will be many orders of magnitude slower.

              Not sure why this is; just something I discovered the hard way.

              • Try replacing your direct function call with a function pointer instead. Assign the function pointer the address of your function during runtime. It will be many orders of magnitude slower.

                It goes faster as an indirect function call, if anything. Go figure.

                Anyway... orders of magnitude difference? Under some other rules of physics maybe. It would probably be a good idea to compile and time your program, as I did.

            • Re:Yes/no (Score:4, Insightful)

              by ultranova (717540) on Thursday February 12 2009, @08:00AM (#26826063)

              Funny thing? A hundred increments runs within 1% of the speed of 100 calls to a function to do the increment. And yes I unrolled those calls to isolate the cost of what I was measuring. So... rather surprisingly, the cost of these function calls is as close as doesn't matter, to exactly zero.

              Not at all surprisingly, since 100 function calls and 100 integer additions will take so little time on a modern processor - and, I suspect, would even on an 8088 - that they amount to a rounding error. The machine's clock doesn't have sufficient resolution to measure them. You'd need a hundred million for a meaningful comparison.

      • Re:Yes/no (Score:5, Interesting)

        by Yokaze (70883) on Wednesday February 11 2009, @04:55PM (#26819599)

        > The number of calls in the interface does matter because they increase complexity.

        That is only true if similar functionality is provided and the function calls are of similar complexity (e.g. number of parameters, complexity of arguments).

        To my limited knowledge, work has been done to extract more common functionality from file-systems. Should that be the case, it would increase the number of function calls, but reduce the overall complexity.

    • by Smidge207 (1278042) on Wednesday February 11 2009, @04:23PM (#26819137) Journal

      Yes, that sounds like "slimming down" to me. At least, I can understand what the article is trying to get at. It seems like we went through a period of early operating system development over the past few decades where the stress was on throwing everything in, including the kitchen sink. It's at least interesting that Linux distros are putting some effort into pulling excess functionality out of the default installation while computers continue to become bigger, faster, stronger.

      And I think it is pointing at something similar to what is going on with OS X, and it is a trend. We've hit some kind of a milestone, I think, where most of our computer functionality is "good enough" for most of what we actually use them for. Something about the development of computer systems right now reminds me of... whenever it was... 10 years ago?... when people were using their computers mostly for word processing, and their computers were good enough for that, so there wasn't a huge drive to accomplish a particular thing. Then people discovered that they could rip CDs into MP3s and share them, and there grew this whole new focus on multimedia and the Internet.

      Now we have those things handled, and it seems like the answer to "what's next?" is making both hardware and software smaller and less bloated. We're getting smart phones that are becoming something more like a real portable computer, and we're getting things like netbooks. I predict you're also going to start seeing better use of embedded systems, like maybe DVRs are just going to be built into TVs soon. Not sure on that one, but I think you're going to see things shrinking, devices being consolidated, and a renewed focus on making things more efficient and refined.

      Meh. It's rambling time...

      =Smidge=

    • by Kjella (173770) on Wednesday February 11 2009, @04:29PM (#26819265) Homepage

      Ever since... well, the first abstraction, there's been a holy flamewar of abstractions versus spaghetti code. One side claims that by building enough layers, each layer is simple and well understood, with well-defined interactions, and thus fairly bug-free. The other side claims that abstractions wrap things in so many layers that the whole code is like an onion without substance, separating cause from effect so that it's difficult to grasp, and that these layers seriously hurt performance. The usual answer is: simple if possible, complex if necessary. If calls went up and performance went up, it's probably necessary, but in isolation an increase in cross calls would be a bad thing.

    • Re: (Score:3, Insightful)

      Each unique external call represents a piece of code that has to be present to make the module work. Assuming the average size of the code referenced by an external function call doesn't change, more unique calls would mean the module needs more code to support it. At least, I believe that's the author's thinking.

      Of course that's a pretty big assumption. If you have more external calls because the code being called is leaner and only half the size on average then you could have a 50% increase in

  • The briefness of the article and lack of actual functional analysis make me think this should have been a comment on the original /. article rather than a whole new article of its own.

    Slow news day?

    • Re:Is this a story? (Score:5, Informative)

      by hedwards (940851) on Wednesday February 11 2009, @04:35PM (#26819335)

      I don't like that it was restricted to just Linux FSes. Comparing it against those available for other OSes would have given it at least some context. Based upon the article, it sounds like Linux is being trounced, but one doesn't really know, because there is no comparison with other OSes.

    • Goofy metric, too. (Score:5, Insightful)

      by Ungrounded Lightning (62228) on Wednesday February 11 2009, @06:24PM (#26820741) Journal

      Unless I've misread it, TFA's definition of "size" for a filesystem is "how many distinct external/kernel subroutines does it call?"

      That seems to be a very strange metric. Seems to me that a well-designed filesystem will have little code and make extensive (re)use of well-defined services elsewhere (rather than reinventing those wheels). This "slims" the filesystem's own unique code down to its own core functionality.

      Now maybe, if the functions are hooks in the OS for the filesystem, a larger number of them implies a poorer abstraction boundary. But it's not clear to me, absent more explanation, that this is what TFA is claiming.

  • by Spamhead (462189) on Wednesday February 11 2009, @04:15PM (#26818989) Homepage

    got to make one call...

  • What? (Score:5, Interesting)

    by svnt (697929) on Wednesday February 11 2009, @04:17PM (#26819047)

    While OSes may be "slimming down" as the article says, what does the removal of standard db packages from Ubuntu have to do with filesystem-related kernel calls?

    The article doesn't seem to mention the possibility that more functionality may be pushed into the kernel from userspace, which might make sense in other situations, but I don't think that argument would hold up here.

    I am struggling to make the connection between the summary and the so-called article. The fact that they are not stripping/locking fs functionality means that OSes aren't shrinking? That's the hypothesis?

  • by Anonymous Coward on Wednesday February 11 2009, @04:19PM (#26819087)

    You are kidding, aren't you?

    Are you saying that this linux can run on a computer without windows underneath it, at all? As in, without a boot disk, without any drivers, and without any services?

    That sounds preposterous to me.

    If it were true (and I doubt it), then companies would be selling computers without a windows. This clearly is not happening, so there must be some error in your calculations. I hope you realise that windows is more than just Office? It's a whole system that runs the computer from start to finish, and that is a very difficult thing to achieve. A lot of people don't realise this.

    Microsoft just spent $9 billion and many years to create Vista, so it does not sound reasonable that some new alternative could just snap into existence overnight like that. It would take billions of dollars and a massive effort to achieve. IBM tried, and spent a huge amount of money developing OS/2 but could never keep up with Windows. Apple tried to create their own system for years, but finally gave up recently and moved to Intel and Microsoft.

    It's just not possible that a freeware like the Linux could be extended to the point where it runs the entire computer from start to finish, without using some of the more critical parts of windows. Not possible.

    I think you need to re-examine your assumptions.

    • Dude, Microsoft is giving up on NTFS for WinFS with Windows 7.0. Get your facts straight before you start to character-assassinate an operating system. WinFS was to be a part of Vista, but Microsoft removed it before the retail version in order to meet deadlines.

      Did you know that Linux has limited NTFS support? I usually have to create a FAT32 partition to copy files between Windows XP and Linux. NTFS is usually read only or not available. Pfffssssttt!

      Just like wine, Microsoft will not release a finished product before its time.

      • Re:Where's NTFS ? (Score:5, Insightful)

        by quickOnTheUptake (1450889) on Wednesday February 11 2009, @04:49PM (#26819515)

        Did you know that Linux has limited NTFS support? I usually have to create a FAT32 partition to copy files between Windows XP and Linux. NTFS is usually read only or not available.

        Have you heard of NTFS-3G [ntfs-3g.org]?

        The NTFS-3G driver is a freely and commercially available and supported read/write NTFS driver for Linux, FreeBSD, Mac OS X, NetBSD, Solaris, Haiku, and other operating systems. It provides safe and fast handling of the Windows XP, Windows Server 2003, Windows 2000, Windows Vista and Windows Server 2008 file systems.

          • I use NTFS-3G constantly for recovering data, and also use it on external hard drives to transfer files between Windows and *nix machines. In its early iterations it was slow, but nowadays you can't even tell the FS isn't native.
      • Re:Where's NTFS ? (Score:5, Informative)

        by ADRA (37398) on Wednesday February 11 2009, @04:57PM (#26819617)

        1. The AC was a satire. In fact, I remember reading those exact lines at least once before. It's actually quite funny, so props to the original troll for writing something really nice to read.

        2. ntfs-3g should be all you need to handle reads and writes in Linux these days. I think it's built on top of FUSE, so you'll probably need that as well. (Side note: glad Linus finally caved on allowing FUSE into his kernel releases.)

        3. WinFS is a meta-layer on top of NTFS, so not in itself a disk file-system.

    • by jaavaaguru (261551) on Wednesday February 11 2009, @04:46PM (#26819481) Homepage

      In Linux, the open office might be the default for editing your wordfiles, and you might prefer ubuntu brown over the grassy knoll of the windows desktop, but mark my words young man - without the windows drivers sitting below the visible surface, allowing the linus to talk to the hardware, it is without worth.

      And so, by choosing your linux as an alternative to windows on the desktop, you still need a windows licence to run this operating system through the windows drivers to talk to the hardware. Linux is only a code, it cannot perform the low level function.

      My point being, young man, that unless you intend to pirate and steal the Windows drivers and services, how is using the linux going to save money ? Well ? It seems that no linux fan can ever provide a straight answer to that question !

      May as well just stay legal, run the Windows drivers, and run Office on the desktop instead of the linus.

    • Re:Where's NTFS ? (Score:5, Interesting)

      by Kjella (173770) on Wednesday February 11 2009, @04:52PM (#26819551) Homepage

      Sometimes I wish there was a way to make my own meta-mod, like "don't include mods from the people that modded this up ever again". The same copy-paste has been in tons of stories now, and it's not funny anymore because it's the EXACT same thing. I'd even rather hear one more variation on our insensitive clod overlords from Soviet Russia.

    • Sorry to disrupt the trolling copy-pasta, but:
      Is this post on ZDNet's forum [zdnet.com] the original form of this troll ?
      Or is this troll older, and jerryleecooper was already copying it from somewhere else ?

      I'm just curious to know where this fine piece of humorous trolling was originally born.

  • Thoughts (Score:5, Informative)

    by stevied (169) * on Wednesday February 11 2009, @04:19PM (#26819089)

    Thoughts:

    - This is measuring, I believe, calls to different functions; a call to one function from multiple places is only counted once. So it's really a measure of the diversity of external calls.

    - Size and complexity aren't necessarily the same thing. It's actually possible that as common functionality is abstracted out of filesystems, they get smaller but make more external calls. There was a point a few years ago when this was happening at quite a rapid pace in the fs code, I don't know if it is still true.

    - Journalled filesystems and networked filesystems are pretty complex creatures by their nature, the quoted numbers don't seem unreasonable. NFS in particular implements (IIRC) protocol versions 2, 3 and 4, and 4 had a lot of new stuff.

    • Re:Thoughts (Score:4, Interesting)

      In fact, if you think about it, the greater the number of different functions a filesystem driver uses, the less functionality it needs to have within itself. I also don't think the number of external calls is a significant measure of anything related to the size or performance, really. It all depends on what calls are being made and for what purpose.

      If anything, as you imply, it's a measure of complexity. But even that might not really be the case if you stop and think about it. The more stuff is abstracted out, the less code goes into the filesystem itself, and the simpler, not more complex, that filesystem driver becomes.

      I think this was a really poor choice of metric and that almost renders this entire article moot.

      • Re: (Score:3, Informative)

        Yeah. There was a stage (starting in the 2.3 days, I think) where the kernel gradually grew a very complete "framework" which the filesystems just plugged into, basically filling in the gaps. Straightforward, unixy filesystems became ridiculously simple to implement, and even the more complex ones got a lot of non-essential complexity combed out. Of course, there were a fair few specialized callbacks and utility functions made available to the fs code as part of this, and that may have pushed the unique c

  • not following (Score:5, Insightful)

    by Eil (82413) on Wednesday February 11 2009, @04:23PM (#26819143) Homepage Journal

    What's your argument here? That filesystem code in the kernel shouldn't be growing more sophisticated over time?

    This smacks of the armchair-pundit argument that the kernel is getting more and more "bloated", made by the same people who a breath later cry out that there still aren't Linux hardware drivers for every computing device ever made.

      • Re:not following (Score:5, Insightful)

        by doshell (757915) on Wednesday February 11 2009, @05:24PM (#26819971)

        I have a good idea to get the drivers while still eliminating the bloat.

        Have an option to compile the kernel during installation, based on detected devices.

        And then whenever you buy a new webcam, replace your graphics card, or whatever, the kernel must be recompiled. People will love that.

        Also: the Linux kernel is modular. This means you don't actually hold in memory, at every time, the drivers to all the hardware devices known to man. Only those your machine actually needs to run. The remaining are just files sitting on your hard disk, ready to be loaded should the need arise. This is an adequate way to keep down the bloat while not inconveniencing the user every time a new piece of hardware pops up.

  • The state (Score:5, Funny)

    by hkb (777908) on Wednesday February 11 2009, @04:23PM (#26819155)

    The state of Linux filesystems may be in disarray, but it's nothing to kill your wife over...

    *rimshot*

  • by Anonymous Coward
    Not being a filesystem/db geek, I honestly can't tell if "speed of growth" refers to "how frequently it's updated" or to "how rapidly it allocates space to store things". And I don't understand what the number of external calls means at ALL. Is that a bad thing? A good thing? Why? Can someone please provide some context? This doesn't have any at all!
    • Re: (Score:3, Informative)

      Unfortunately, all "speed of growth" refers to is the rate of increase in the number of kernel calls made by a particular filesystem from version 2.6.11 to 2.6.28 of the Linux kernel.

      Nothing to do with any sort of performance metric.

  • Check out Tux3 (Score:5, Informative)

    by Daniel Phillips (238627) on Wednesday February 11 2009, @04:47PM (#26819495)

    While Tux3 [tux3.org] is not yet ready to run on your desktop, and won't be for a good many months, it is relatively trim at around 6K lines, and is expected to be somewhere around 10K complete with versioning, recovery and proper code comments. Of course, that will still be significant growth in a few months, and nothing says it won't just keep growing. But Tux3 is starting much smaller than its peers, and already has a pretty good range of "big filesystem" features. One of our guiding principles is to keep it tight, therefore leaving fewer places for bugs to hide.

    • But Tux3 is starting much smaller than its peers, and already has a pretty good range of "big filesystem" features.

      Let us count the number of fast, slim projects that have been sucked down this way...

      Programmer: "This shit is bloated. I'm starting a new project that will be slim and fast"
      <type type type>
      <build build make>
      User 1: "This is really nice and fast, but I need feature X"
      <add add add>
      Users 2 & 3: "I'd use it, but I really love $OTHER_PROGRAM's Y"
      Programmer: "Grrr..."
      <add add add>
      User 4: "I've heard that Z is doing $SPIFFY, why doesn't this do that?"
      <type type add add add add build>
      User 5: "This is all big and slow and bloated... I'm going with N instead..."
      Programmer: "Fuck you! Fuck you all motherfuckers!"

      Not to say that Tux3 will go or is going this way. Indeed, as long as people stick around who remember the guiding principle of keeping it small it shouldn't. Best of luck!

      /ext2fs fanatic
      //I can shredses the files... yes I can, and I know it will workses...

  • by Vellmont (569020) on Wednesday February 11 2009, @05:04PM (#26819723)

    The author of the article takes the position that filesystem external calls == "operating system size", and then proceeds to measure by his new definition.

    What he never mentions, let alone tries to justify, is his metric. Why do more external calls (or, as someone more accurately pointed out, a greater diversity of external calls) equate to "operating system size"? Why does this equate to an even more abstract concept of "simple"?

    I don't see any reason to equate these measurements to such conclusions. "size" and "simple" are abstract concepts that involve a lot more than a simple count of external references.

  • by renoX (11677) on Wednesday February 11 2009, @05:34PM (#26820087)

    I think you can compile only the filesystems you want into the kernel.
    So the only complexity that matters to a user is that of the filesystems they select to compile into the kernel!

    • Re: (Score:3, Interesting)

      Never mind COMPILING stuff. You can just plain choose not to USE stuff.

      Don't want the "bloat" of NFS or ext4? Then don't bloody use them.

      Yeah, the spiffy new things or inherently complex things might
      show that complexity in the code. Imagine that. The source for
      Halo looks bigger than the source for Pacman.

      There is no news here.

      As an nvidia user, ATI can make their Linux drivers as bad and
      as bloated as they want. I don't care. It really doesn't affect
      me.
