
The Hairy State of Linux Filesystems (187 comments)

Posted by timothy
from the when-shrinkage-is-what-you-want dept.
RazvanM writes "Do the OSes really shrink? Perhaps the user space (MySQL, CUPS) is getting slimmer, but how about the internals? Using as a metric the number of external calls between the filesystem modules and the rest of the Linux kernel I argue that this is not the case. The evidence is a graph that shows the evolution of 15 filesystems from 2.6.11 to 2.6.28 along with the current state (2.6.28) for 24 filesystems. Some filesystems that stand out are: nfs for leading in both number of calls and speed of growth; ext4 and fuse for their above-average speed of growth and 9p for its roller coaster path."
This discussion has been archived. No new comments can be posted.

  • Is this a story? (Score:2, Insightful)

    by mlheur (212082) on Wednesday February 11, 2009 @05:14PM (#26818985)

    The briefness of the article and lack of actual functional analysis make me think this should have been a comment on the original /. article rather than a whole new article of its own.

    Slow news day?

  • Yes/no (Score:4, Insightful)

    by EmbeddedJanitor (597831) on Wednesday February 11, 2009 @05:22PM (#26819129)
    The number of calls in the interface does matter, because more calls mean more complexity. That makes filesystem maintainability and development a bit harder from version to version, as it becomes less clear what each call should do. Many of the calls are optional, or can be handled by defaults, which does help to simplify things.

    There is little calling overhead from using multiple calls. Of course, these interface changes are all made for good reasons: performance, stability, and security.

  • not following (Score:5, Insightful)

    by Eil (82413) on Wednesday February 11, 2009 @05:23PM (#26819143) Homepage Journal

    What's your argument here? That filesystem code in the kernel shouldn't be growing more sophisticated over time?

    This rings of the armchair-pundit argument that the kernel is getting more and more "bloated", made in one breath while crying out in the next that there still aren't Linux hardware drivers for every computing device ever made.

  • by Anonymous Coward on Wednesday February 11, 2009 @05:24PM (#26819167)
    Not being a filesystem/db geek, I honestly can't tell if "speed of growth" refers to "how frequently it's updated" or to "how rapidly it allocates space to store things". And I don't understand what the number of external calls means at ALL. Is that a bad thing? A good thing? Why? Can someone please provide some context? This doesn't have any at all!
  • Re:Yes/no (Score:3, Insightful)

    by shish (588640) on Wednesday February 11, 2009 @05:37PM (#26819357) Homepage

    The number of calls in the interface does matter because they increase complexity

    Replacing 100 lines of in-driver code with one call to a shared library function? That sounds like less complexity, not more.

  • Re:Where's NTFS ? (Score:5, Insightful)

    by quickOnTheUptake (1450889) on Wednesday February 11, 2009 @05:49PM (#26819515)

    Did you know that Linux has limited NTFS support? I usually have to create a FAT32 partition to copy files between Windows XP and Linux. NTFS is usually read only or not available.

    Have you heard of NTFS-3G [ntfs-3g.org]?

    The NTFS-3G driver is a freely and commercially available and supported read/write NTFS driver for Linux, FreeBSD, Mac OS X, NetBSD, Solaris, Haiku, and other operating systems. It provides safe and fast handling of the Windows XP, Windows Server 2003, Windows 2000, Windows Vista and Windows Server 2008 file systems.

  • Re:Yes/no (Score:5, Insightful)

    by Suzuran (163234) on Wednesday February 11, 2009 @05:59PM (#26819653)

    Function calls are not free, especially in kernel space. Everything costs time; you need to do the most you can in the fewest instructions. 100 lines of inline code will probably run faster than a function call.

  • by Vellmont (569020) on Wednesday February 11, 2009 @06:04PM (#26819723)

    The author of the article takes the position that filesystem external calls == "operating system size", and then proceeds to measure by his new definition.

    What he never mentions, or even tries to justify, is the metric itself. Why do more external calls (or, as someone more accurately pointed out, a greater diversity of external calls) equate to "operating system size"? Why does that equate to the even more abstract concept of "simple"?

    I don't see any reason to draw such conclusions from these measurements. "Size" and "simple" are abstract concepts that involve a lot more than a simple count of external references.

  • by esampson (223745) on Wednesday February 11, 2009 @06:10PM (#26819795) Homepage

    Each unique external call represents a piece of code that has to be present for the module to work. Assuming the average size of the code behind an external call doesn't change, more unique calls would mean the module needs more supporting code. At least, I believe that's the author's reasoning.

    Of course, that's a pretty big assumption. If you have more external calls because the code being called is leaner, averaging only half the size, then you could see a 50% increase in the number of calls and still reduce the footprint. Likewise, if all of your calls go to modules that are highly utilized (i.e., most of the code in them is called), you could have a much smaller footprint than with fewer external calls spread across a large number of lightly utilized modules.

    And all of this, of course, ignores the fact that if you are building an operating system for a device such as a cell phone, you probably wouldn't choose a filesystem like NFS, but would go for one better suited to the small amount of memory available.
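    The footprint arithmetic above can be sketched in a few lines. This is purely illustrative; the call counts and byte sizes are made-up numbers, not measurements from any kernel:

```python
# Hypothetical footprint comparison: more external calls need not mean
# more supporting code, if the routines being called shrink on average.
def footprint(num_unique_calls, avg_callee_size):
    """Total supporting code pulled in by a module's unique external calls."""
    return num_unique_calls * avg_callee_size

before = footprint(100, 200)  # 100 unique calls, 200 bytes each -> 20000
after = footprint(150, 100)   # 50% more calls, callees half the size -> 15000

assert after < before  # footprint shrinks despite the higher call count
```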

  • by morgauo (1303341) on Wednesday February 11, 2009 @06:15PM (#26819863)

    I didn't see anything in the article where the author made a value statement, that it is bad (or good) that system calls are increasing. He was just pointing out that the trend is not towards simplicity in this area.

    I would also point out that ext4 is very new, and NTFS support, while not new, has never quite been completed, so active feature development could explain the upward curves in their call counts, though not their absolute values.

  • Re:not following (Score:5, Insightful)

    by doshell (757915) on Wednesday February 11, 2009 @06:24PM (#26819971)

    I have a good idea to get the drivers while still eliminating the bloat.

    Have an option to compile the kernel during installation, based on detected devices.

    And then whenever you buy a new webcam, replace your graphics card, or whatever, the kernel must be recompiled. People will love that.

    Also: the Linux kernel is modular. This means you don't actually hold in memory, at all times, the drivers for all the hardware devices known to man; only those your machine actually needs to run. The rest are just files sitting on your hard disk, ready to be loaded should the need arise. This is an adequate way to keep down the bloat without inconveniencing the user every time a new piece of hardware pops up.

  • Goofy metric, too. (Score:5, Insightful)

    by Ungrounded Lightning (62228) on Wednesday February 11, 2009 @07:24PM (#26820741) Journal

    Unless I've misread it, TFA's definition of "size" for a filesystem is "how many distinct external/kernel subroutines does it call?"

    That seems to be a very strange metric. Seems to me that a well-designed filesystem will have little code and make extensive (re)use of well-defined services elsewhere (rather than reinventing those wheels). This "slims" the filesystem's own unique code down to its own core functionality.

    Now maybe, if the functions are hooks in the OS for the filesystem, a larger number of them implies a poorer abstraction boundary. But it's not clear to me, absent more explanation, that this is what TFA is claiming.
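    If TFA's metric really is "how many distinct external symbols does the module reference", it could be approximated from linker symbol tables. A minimal sketch, assuming input in the style of `nm -u somefs.ko` output (the sample below is invented for illustration, not real ext4 data):

```python
# Count distinct undefined (external) symbols in nm-style output.
# In nm output, a bare "U <name>" line marks a symbol the module
# references but does not define, i.e. an external call or data ref.
SAMPLE_NM_OUTPUT = """\
                 U kmalloc
                 U kfree
                 U mutex_lock
                 U mutex_unlock
                 U kmalloc
"""

def count_external_refs(nm_output):
    """Return the number of distinct undefined symbols in nm output."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "U":
            symbols.add(parts[1])
    return len(symbols)

print(count_external_refs(SAMPLE_NM_OUTPUT))  # 4 (kmalloc counted once)
```

    Note this counts distinct referenced symbols, not call sites, which matches the "diversity of external calls" reading suggested elsewhere in the thread.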

  • Re:Yes/no (Score:4, Insightful)

    by hackerjoe (159094) on Wednesday February 11, 2009 @08:21PM (#26821315)

    What's your point? Processors can pipeline across branches just fine, and the main effect of cache is to give a performance boost to smaller code -- code that separates and reuses functions rather than inlining them willy-nilly.

    Inlining can still be a win sometimes, but compilers will do that for you automatically anyway...

  • Re:At least Reiser (Score:3, Insightful)

    by this great guy (922511) on Thursday February 12, 2009 @02:01AM (#26823757)
    At least he is doing better than his wife.
  • Re:Yes/no (Score:4, Insightful)

    by ultranova (717540) on Thursday February 12, 2009 @09:00AM (#26826063)

    Funny thing? A hundred increments run within 1% of the speed of 100 calls to a function that does the increment. And yes, I unrolled those calls to isolate the cost of what I was measuring. So, rather surprisingly, the cost of these function calls is as close to zero as makes no difference.

    Not at all surprising, since 100 function calls and 100 integer additions take so little time on a modern processor (and, I suspect, would even on an 8088) that they amount to a rounding error: the machine's clock doesn't have sufficient resolution to measure them. You'd need a hundred million iterations for a meaningful comparison.
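    The measurement the parent posts are arguing about can be sketched with the standard `timeit` module: time a large batch of inline increments against the same increments routed through a function call. Absolute numbers are machine-dependent and the iteration count here is illustrative; the point is only that a large count is needed before the difference rises above clock resolution:

```python
import timeit

def inc(x):
    return x + 1

def run_inline(n=1_000_000):
    # n increments written inline in the loop body
    x = 0
    for _ in range(n):
        x = x + 1
    return x

def run_calls(n=1_000_000):
    # the same n increments, each routed through a function call
    x = 0
    for _ in range(n):
        x = inc(x)
    return x

t_inline = timeit.timeit(run_inline, number=1)
t_calls = timeit.timeit(run_calls, number=1)
print(f"inline: {t_inline:.4f}s  calls: {t_calls:.4f}s")
```

    (This measures interpreted Python, not compiled kernel C, so it says nothing about the kernel's actual call overhead; it only demonstrates the methodology point about iteration counts.)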

  • Re:At least Reiser (Score:1, Insightful)

    by Anonymous Coward on Thursday February 12, 2009 @10:38AM (#26827085)

    Prison is punishment for misdeeds as well as a place for rehabilitation. The whole point of prison is to make it an undesirable place, so people won't want to go there. Otherwise, for the more antisocial who understand only cause-and-effect ethics rather than normal ethics, what deterrent keeps them from breaking the law?

    If Hans Reiser wanted to spend the rest of his life hacking on Linux, he should have spent less time hacking his wife.
