The Hairy State of Linux Filesystems
RazvanM writes "Do OSes really shrink? Perhaps the user space (MySQL, CUPS) is getting slimmer, but how about the internals? Using as a metric the number of external calls between the filesystem modules and the rest of the Linux kernel, I argue that this is not the case. The evidence is a graph that shows the evolution of 15 filesystems from 2.6.11 to 2.6.28, along with the current state (2.6.28) for 24 filesystems. Some filesystems that stand out are: nfs, for leading in both number of calls and speed of growth; ext4 and fuse, for their above-average speed of growth; and 9p, for its roller coaster path."
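As a rough illustration of the kind of metric the submission describes, one way to approximate "external calls" is to count the distinct undefined symbols a compiled filesystem module references. On a real system you would feed this the output of something like `nm -u fs/ext4/ext4.ko`; the sample data and symbol names below are purely illustrative assumptions, not real ext4 numbers.

```python
# Sketch of the submission's metric: count unique external (undefined)
# symbols a kernel module references, using nm-style output.
import re

# Illustrative stand-in for `nm -u some_fs.ko` output; note the
# duplicate __kmalloc line, which must only be counted once.
sample_nm_output = """\
                 U __kmalloc
                 U kfree
                 U mutex_lock
                 U mutex_unlock
                 U __kmalloc
"""

def count_external_calls(nm_output: str) -> int:
    """Count unique undefined ('U') symbols in nm output."""
    symbols = set()
    for line in nm_output.splitlines():
        m = re.match(r"\s*U\s+(\S+)", line)
        if m:
            symbols.add(m.group(1))
    return len(symbols)

print(count_external_calls(sample_nm_output))  # -> 4 unique external symbols
```

Tracking this count across kernel releases, as the article's graph does, then shows whether a filesystem's coupling to the rest of the kernel is growing or shrinking.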
Is this a story? (Score:2, Insightful)
The briefness of the article and lack of actual functional analysis make me think this should have been a comment on the original /. article rather than a whole new article of its own.
Slow news day?
Yes/no (Score:4, Insightful)
There is little overhead in making multiple calls. And of course these interface changes are all done for good reasons: performance, stability, security.
not following (Score:5, Insightful)
What's your argument here? That filesystem code in the kernel shouldn't be growing more sophisticated over time?
This rings of the armchair-pundit argument that the kernel is getting more and more "bloated", followed a breath later by crying out that there still aren't Linux hardware drivers for every computing device ever made.
What does this even mean? (Score:2, Insightful)
Re:Yes/no (Score:3, Insightful)
The number of calls in the interface does matter, because it increases complexity.
Replacing 100 lines of in-driver code with one function call into a shared library?
Re:Where's NTFS ? (Score:5, Insightful)
Did you know that Linux has limited NTFS support? I usually have to create a FAT32 partition to copy files between Windows XP and Linux. NTFS is usually read only or not available.
Have you heard of NTFS-3G [ntfs-3g.org]?
The NTFS-3G driver is a freely and commercially available and supported read/write NTFS driver for Linux, FreeBSD, Mac OS X, NetBSD, Solaris, Haiku, and other operating systems. It provides safe and fast handling of the Windows XP, Windows Server 2003, Windows 2000, Windows Vista and Windows Server 2008 file systems.
Re:Yes/no (Score:5, Insightful)
Function calls are not free, especially in kernel space. Everything costs time. You need to do the most you can with the fewest instructions. 100 lines of inline code will probably run faster than a function call.
"Size" and "simple" is not so easily measured. (Score:3, Insightful)
The author of the article takes the position that filesystem external calls == "operating system size", and then proceeds to measure against his new definition.
What he never mentions, or even tries to justify, is the metric itself. Why do more external calls (or, as someone more accurately pointed out, greater diversity of external calls) equate to "operating system size"? Why does this equate to an even more abstract concept like "simple"?
I don't see any reason to equate these measurements to such conclusions. "size" and "simple" are abstract concepts that involve a lot more than a simple count of external references.
Re:Do the number of calls really matter? (Score:3, Insightful)
Each unique external call represents a piece of code that has to be present to make the module work. Assuming the average size of the code referenced by an external function call doesn't change, more unique calls would mean the module needs more code to support it. At least, I believe that's the author's thinking.
Of course that's a pretty big assumption. If you have more external calls because the code being called is leaner and only half the size on average then you could have a 50% increase in the number of function calls and still reduce footprint. Also if all of your calls go to modules that are highly utilized (i.e. most of the code in them is called) you could have a seriously reduced footprint over fewer external calls that are spread out among a large number of lightly utilized modules.
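The point above can be made concrete with back-of-the-envelope arithmetic: treat the supporting footprint as (number of unique external calls) × (average size of the code behind each call). All the numbers below are made-up assumptions for illustration, not measurements of any real filesystem.

```python
# Footprint model from the comment above: calls x average helper size.
# A 50% increase in call count can still shrink the total footprint
# if the helpers being called are leaner.

def footprint(n_calls: int, avg_bytes_per_call: int) -> int:
    """Total bytes of external code a module depends on (toy model)."""
    return n_calls * avg_bytes_per_call

before = footprint(100, 2000)  # 100 calls into chunky helpers
after = footprint(150, 1000)   # 50% more calls, helpers half the size

print(before, after)  # -> 200000 150000: more calls, smaller footprint
```

Which is exactly why a raw call count, without knowing what sits behind each call, says little about "size" on its own.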
And all of this, of course, ignores the fact that if you are going to run the operating system on a device such as a cell phone, you probably wouldn't choose a filesystem like NFS but would go for one better suited to the small amount of memory available.
Bloat? I didn't see anything about that! (Score:2, Insightful)
I didn't see anything in the article where the author made a value judgment that the growth in calls is bad (or good). He was just pointing out that the trend in this area is not towards simplicity.
I would also point out that ext4 is very new, and NTFS, while not new, has never been quite completed, so active feature development could explain the upward curves in their call counts, though not the absolute values.
Re:not following (Score:5, Insightful)
And then whenever you buy a new webcam, replace your graphics card, or whatever, the kernel must be recompiled. People will love that.
Also: the Linux kernel is modular. This means you don't actually hold in memory, at all times, the drivers for every hardware device known to man, only those your machine actually needs to run. The rest are just files sitting on your hard disk, ready to be loaded should the need arise. This is an adequate way to keep down the bloat while not inconveniencing the user every time a new piece of hardware pops up.
Goofy metric, too. (Score:5, Insightful)
Unless I've misread it, TFA's definition of "size" for a filesystem is "how many distinct external/kernel subroutines does it call?"
That seems to be a very strange metric. Seems to me that a well-designed filesystem will have little code and make extensive (re)use of well-defined services elsewhere (rather than reinventing those wheels). This "slims" the filesystem's own unique code down to its own core functionality.
Now maybe, if the functions are hooks in the OS for the filesystem, a larger number of them implies a poorer abstraction boundary. But it's not clear to me, absent more explanation, that this is what TFA is claiming.
Re:Yes/no (Score:4, Insightful)
What's your point? Processors can pipeline across branches just fine, and the main effect of cache is to give a performance boost to smaller code -- code that separates and reuses functions rather than inlining them willy-nilly.
Inlining can still be a win sometimes, but compilers will do that for you automatically anyway...
Re:At least Reiser (Score:3, Insightful)
Re:Yes/no (Score:4, Insightful)
Not at all surprisingly, since 100 function calls and 100 integer additions will take so little time on a modern processor - and, I suspect, would even on an 8088 - that they amount to a rounding error. The machine's clock doesn't have sufficient resolution to measure them. You'd need a hundred million for a meaningful comparison.
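The arithmetic behind that claim is easy to sketch. Assuming, purely for illustration, a ~3 GHz clock and ~25 cycles per call/return pair (both numbers are guesses, not measurements), 100 calls finish in under a microsecond, below the resolution of many timers, while 100 million calls take long enough to measure reliably:

```python
# Back-of-the-envelope timing for the comment above.
# Both constants are illustrative assumptions, not benchmarks.
CYCLES_PER_CALL = 25   # assumed cost of one call/return pair
CLOCK_HZ = 3e9         # assumed 3 GHz clock

def elapsed_seconds(n_calls: int) -> float:
    """Estimated wall time to execute n_calls function calls."""
    return n_calls * CYCLES_PER_CALL / CLOCK_HZ

print(elapsed_seconds(100))          # ~8.3e-07 s: lost in timer noise
print(elapsed_seconds(100_000_000))  # ~0.83 s: easily measurable
```

So any real comparison of inline code versus function calls has to run the loop enough times for the difference to rise above measurement noise.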
Re:At least Reiser (Score:1, Insightful)
Prison is punishment for misdeeds as well as a place for rehabilitation. The whole point of prison is to make it an undesirable place so people won't want to go there. Otherwise, for the more antisocial, who only understand cause-and-effect ethics as opposed to normal ethics, what is the deterrent to keep them from breaking the law?
If Hans Reiser wanted to spend the rest of his life hacking on Linux, he should have spent less time hacking his wife.