The Hairy State of Linux Filesystems
Posted by timothy on Wed Feb 11, 2009 04:09 PM
from the when-shrinkage-is-what-you-want dept.
RazvanM writes "Do the OSes really shrink? Perhaps the user space (MySQL, CUPS) is getting slimmer, but how about the internals? Using as a metric the number of external calls between the filesystem modules and the rest of the Linux kernel I argue that this is not the case. The evidence is a graph that shows the evolution of 15 filesystems from 2.6.11 to 2.6.28 along with the current state (2.6.28) for 24 filesystems. Some filesystems that stand out are: nfs for leading in both number of calls and speed of growth; ext4 and fuse for their above-average speed of growth and 9p for its roller coaster path."
Related Stories
Technology: The Incredible Shrinking Operating System 345 comments
snydeq writes "The center of gravity is shifting away from the traditional, massive operating systems of the past, as even the major OSes are slimming their footprint to make code bases easier to manage and secure, and to increase the variety of devices on which they can run, InfoWorld reports. Microsoft, for one, is cutting down the number of services that run at boot to ensure Windows 7 will run across a spectrum of hardware. Linux distros such as Ubuntu are stripping out functionality, including MySQL, CUPS, and LDAP, to cut footprints in half. And Apple appears headed for a slimmed-down OS X that will enable future iPhones or tablet devices to run the same OS as the Mac. Though these developments don't necessarily mean that the browser will supplant the OS, they do show that OS vendors realize they must adapt as virtualization, cloud computing, netbooks, and power concerns drive business users toward smaller, less costly, more efficient operating environments."
Linux: Kernel Hackers On Ext3/4 After 2.6.29 Release 316 comments
microbee writes "Following the Linux kernel 2.6.29 release, several famous kernel hackers have raised complaints about what seems to be a long-standing performance problem related to ext3. Alan Cox, Ingo Molnar, Andrew Morton, Andi Kleen, Theodore Ts'o, and of course Linus Torvalds have all participated. It may shed some light on the status of Linux filesystems. For example, Linus Torvalds commented on the corruption caused by writeback mode, calling it 'idiotic.'"
Linux: A Visual Expedition Inside the Linux File Systems 85 comments
RazvanM writes "This is an attempt to visualize the relationships among the Linux File Systems through the lens of the external symbols their kernel modules use. We took an initial look a few months back but this time the scope is much broader. This analysis was done on 1377 kernel modules from 2.6.0 to 2.6.29, but there is also a small dip into the BSD world. The most thorough analysis was done on Daniel Phillips's tree, which contains the latest two disk-based file systems for Linux: tux3 and btrfs. The main techniques used to establish relationships among file systems are hierarchical clustering and phylogenetic trees. Also presented are a set of rankings based on various properties related to the evolution of the external symbols from one release to another, and complete timelines of the kernel releases for Linux, FreeBSD, NetBSD, and OpenBSD. In all there are 78 figures and 10 animations."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Do the number of calls really matter? (Score:5, Interesting)
Re:Do the number of calls really matter? (Score:5, Informative)
Two things of note with NFS...
1. NFSv4 support was added. v4 is complex and has a lot of authentication stuff in it that wasn't in v3.
2. SunRPC is "part" of the NFS tree, but is effectively just a transport layer. It is completely abstracted, hence the numbers of symbols. It could be used for other stuff, so it pushes up that number too.
Parent
Re:Do the number of calls really matter? (Score:5, Funny)
Hi. I'm the infamous Anonymous Coward, and it's time we had a talk.
For years now, I've been enhancing the discussion on Slashdot through interesting interjections and humorous anecdotes (often about homosexual African Americans), but I feel things just aren't working out.
It takes me an awful lot of time, researching composing and spell chekcing the many hundreds of valuable posts I make a day, and although I don't request anything in return all I ever see is abuse. You moderate my comments down for absolutely no good reason.
I've had enough.
From this point on I'm just not going to bother. It's over.
I've been feeling this way for a while, slowly I've put less and less effort in my posts, repeating the same ideas over and over and, now, even started repeating whole posts verbatim.
It's been fun, Slashdot, but I'm disillusioned. You broke my heart, and I am never going to give you the benefit of my insight again.
Be happy.
Love and regrets,
Anon.
Parent
Re:Do the number of calls really matter? (Score:5, Funny)
Hahaha disregard that! I suck cocks!
Parent
Yes/no (Score:4, Insightful)
There is little calling overhead from using multiple calls. Of course these interface changes are all done for a good reason: performance, stability, security.
Parent
Re: (Score:3, Insightful)
The number of calls in the interface do matter because they increase complexity
Replacing 100 lines of in-driver code with one function call from a shared library?
Re:Yes/no (Score:5, Insightful)
Function calls are not free. Especially in kernel space. Everything costs time. You need to do the most you can with the least instructions. 100 lines of inline code will probably run faster than a function call.
Parent
Re:Yes/no (Score:5, Informative)
Parent
Re:Yes/no (Score:4, Insightful)
What's your point? Processors can pipeline across branches just fine, and the main effect of cache is to give a performance boost to smaller code -- code that separates and reuses functions rather than inlining them willy-nilly.
Inlining can still be a win sometimes, but compilers will do that for you automatically anyway...
Parent
Re:Yes/no (Score:4, Interesting)
Simplicity of code is nearly always better than premature and not necessarily useful optimizations.
Parent
Re:Yes/no (Score:5, Informative)
Function calls are not free. Especially in kernel space. Everything costs time. You need to do the most you can with the least instructions. 100 lines of inline code will probably run faster than a function call.
Never having been one to accept unsupported claims at face value, I just tested that assertion on a Pentium M here, with a small C program that either calls a function to increment a counter, or directly increments the counter a number of times. I compiled with -O0 to be sure gcc does not change around my code at all. Just the instructions, thanks. Funny thing? A hundred increments run within 1% of the speed of 100 calls to a function to do the increment. And yes, I unrolled those calls to isolate the cost of what I was measuring. So... rather surprisingly, the cost of these function calls is as close as doesn't matter, to exactly zero.
Loops on the other hand... cost a huge amount. I won't get into details. But Intel clearly does something to optimize function calls in microcode, or probably even hardware. Function calls just don't cost what you think they do. In many cases, the function call will cost less by not trashing as much of that incredibly valuable L1 instruction cache.
Parent
Re:Yes/no (Score:4, Interesting)
I believe I read somewhere or other that branch predictors need a certain number of instructions between the branch instruction and the branch target in order to do a good job. If the only instruction in the loop is a single increment, that might explain the problem. Unrolling the loop so it has more instructions might fix it.
Parent
Re:Yes/no (Score:4, Interesting)
So... rather surprisingly, the cost of these function calls is as close as doesn't matter, to exactly zero.
If the compiler knows the relative address of the function ahead of time, they are really fast.
Try replacing your direct function call with a function pointer instead. Assign the function pointer the address of your function during runtime. It will be many orders of magnitude slower.
Not sure why this is; just something I discovered the hard way.
Parent
Re: (Score:3, Interesting)
Try replacing your direct function call with a function pointer instead. Assign the function pointer the address of your function during runtime. It will be many orders of magnitude slower.
It goes faster as an indirect function call, if anything. Go figure.
Anyway... orders of magnitude difference? Under some other rules of physics maybe. It would probably be a good idea to compile and time your program, as I did.
Re:Yes/no (Score:4, Insightful)
Not at all surprisingly, since 100 function calls and 100 integer additions will take so little time on a modern processor - and, I suspect, would even on an 8088 - that they amount to a rounding error. The machine's clock doesn't have sufficient resolution to measure them. You'd need a hundred million for a meaningful comparison.
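To make the measurement point concrete, here is a minimal sketch. It is in Python rather than the C used upthread, so the absolute numbers say nothing about gcc or a Pentium M; it only illustrates the method. It prints the resolution the platform reports for the timer, then averages over many iterations so the total run dwarfs that resolution. The names `increment` and `average_ns` are made up for illustration, and the average includes loop overhead.

```python
import time


def increment(counter):
    """Trivial function whose per-call cost we want to estimate."""
    return counter + 1


def average_ns(fn, iterations):
    """Average wall-clock nanoseconds per call of fn, loop overhead included."""
    counter = 0
    start = time.perf_counter_ns()
    for _ in range(iterations):
        counter = fn(counter)
    elapsed = time.perf_counter_ns() - start
    return elapsed / iterations


if __name__ == "__main__":
    # Resolution the platform reports for this clock, in seconds.
    print("clock resolution:", time.get_clock_info("perf_counter").resolution)
    # A run of 100 calls is short enough to be dominated by timer noise;
    # a million calls gives a far more stable average.
    for n in (100, 1_000_000):
        print(n, "iterations:", average_ns(increment, n), "ns per call")
```

Running it a few times should show the 100-iteration figure jumping around much more than the million-iteration one, which is the whole argument for using a large count.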
Parent
Re:Yes/no (Score:5, Interesting)
> The number of calls in the interface do matter because they increase complexity.
That is only true if similar functionality is provided and the function calls are of similar complexity (e.g. number of parameters, complexity of arguments).
To my limited knowledge, work has been done to extract more common functionality from the filesystems. Should that be the case, it would increase the number of function calls, but reduce the overall complexity.
Parent
Re:Do the number of calls really matter? (Score:4, Interesting)
Yes, that sounds like "slimming down" to me. At least, I can understand what the article is trying to get at. It seems like we went through a period of early operating system development over the past few decades where the stress was on throwing everything in, including the kitchen sink. It's at least interesting that Linux distros are putting in some amount of effort into pulling excess functionality out of the default installation while computers continue to become bigger, faster, stronger.
And I think it is pointing at something similar to what is going on with OSX, and it is a trend. We've hit some kind of a milestone, I think, where most of our computer functionality is "good enough" for most of what we actually use them for. Something about the development of computer systems right now reminds me of... whenever it was... 10 years ago?... when people were using their computers mostly for word-processing, and their computers were good enough for that, so there wasn't a huge drive to accomplish a particular thing. Then people discovered that they could rip CDs into MP3s and share them, and there grew this whole new focus on multimedia and the Internet.
Now we have those things handled, and it seems like the answer to "what's next?" is making both hardware and software smaller and less bloated. We're getting smart phones that are becoming something more like a real portable computer, and we're getting things like netbooks. I predict you're also going to start seeing better use of embedded systems, like maybe DVRs are just going to be built into TVs soon. Not sure on that one, but I think you're going to see things shrinking, devices being consolidated, and a renewed focus on making things more efficient and refined.
Meh. It's rambling time...
=Smidge=
Parent
Re:Do the number of calls really matter? (Score:5, Interesting)
Ever since.... well, the first abstraction there's been a holy flamewar of abstractions versus spaghetti code. The one side of the war claims that by building enough layers each layer is simple, well-understood with well-defined interactions and thus fairly bugfree. The other side claims that abstractions wrap things in so many layers that the whole code is like an onion without substance, separating cause from effect so it's difficult to grasp, and that these layers seriously hurt performance. The answer is usually to keep it simple if possible, complex if necessary. If calls went up and performance went up it's probably necessary, but in isolation an increase in cross calls would be a bad thing.
Parent
Re: (Score:3, Insightful)
Each unique external call represents a piece of code that has to be present to make the module work. Assuming the average size of the code referenced by an external function call doesn't change more unique calls would mean the module would need more code to support it. At least I believe that's what the author's thinking is.
Of course that's a pretty big assumption. If you have more external calls because the code being called is leaner and only half the size on average then you could have a 50% increase in
Is this a story? (Score:2, Insightful)
The briefness of the article and lack of actual functional analysis make me think this should have been a comment on the original /. article rather than a whole new article of its own.
Slow news day?
Re:Is this a story? (Score:5, Informative)
I don't like that it was restricted to just Linux FSes; comparing them against ones available for other OSes would have given it at least some context. Based upon the article, it sounds like Linux is being trounced. But one doesn't really know, because there isn't a comparison to other OSes to give any clue whatsoever.
Parent
Goofy metric, too. (Score:5, Insightful)
Unless I've misread it, TFA's definition of "size" for a filesystem is "how many distinct external/kernel subroutines does it call?"
That seems to be a very strange metric. Seems to me that a well-designed filesystem will have little code and make extensive (re)use of well-defined services elsewhere (rather than reinventing those wheels). This "slims" the filesystem's own unique code down to its own core functionality.
Now maybe, if the functions are hooks in the OS for the filesystem, a larger number of them implies a poorer abstraction boundary. But it's not clear to me, absent more explanation, that this is what TFA is claiming.
Parent
At least Reiser (Score:5, Funny)
got to make one call...
Re:At least Reiser (Score:5, Informative)
Off topic, but just in case anyone is curious as to how Hans Reiser is doing in prison...
Not particularly well so far: http://www.kcbs.com/pages/3634907.php [kcbs.com]?
Parent
Re: (Score:3, Insightful)
Re:At least Reiser (Score:4, Interesting)
Being dead doesn't sound too bad to me. The process of dying almost always sucks and I don't want to be dead, but once I am dead I can guarantee you I won't give a shit about it.
Parent
Re: (Score:3, Funny)
Don't be so sure about that...
People often shit their pants [internatio...roject.org] right after they're dead, eg if they're hanged or electrocuted.
Re:At least Reiser (Score:4, Funny)
Parent
What? (Score:5, Interesting)
While OSes may be "slimming down" as the article says, what does the removal of standard db packages from Ubuntu have to do with filesystem-related kernel calls?
The article doesn't seem to mention the possibility that more functionality may be pushed into the kernel from userspace, which might make sense in other situations, but I don't think that argument would hold up here.
I am struggling to make the connection between the summary and the so-called article. The fact that they are not stripping/locking fs functionality means that OSes aren't shrinking? That's the hypothesis?
Where's NTFS ? (Score:5, Funny)
You are kidding arent you ?
Are you saying that this linux can run on a computer without windows underneath it, at all ? As in, without a boot disk, without any drivers, and without any services ?
That sounds preposterous to me.
If it were true (and I doubt it), then companies would be selling computers without a windows. This clearly is not happening, so there must be some error in your calculations. I hope you realise that windows is more than just Office ? Its a whole system that runs the computer from start to finish, and that is a very difficult thing to acheive. A lot of people dont realise this.
Microsoft just spent $9 billion and many years to create Vista, so it does not sound reasonable that some new alternative could just snap into existence overnight like that. It would take billions of dollars and a massive effort to achieve. IBM tried, and spent a huge amount of money developing OS/2 but could never keep up with Windows. Apple tried to create their own system for years, but finally gave up recently and moved to Intel and Microsoft.
Its just not possible that a freeware like the Linux could be extended to the point where it runs the entire computer fron start to finish, without using some of the more critical parts of windows. Not possible.
I think you need to re-examine your assumptions.
Re:Where's NTFS ? (Score:5, Funny)
Dude Microsoft is giving up on NTFS for WinFS with Windows 7.0. Get your facts straight before you start to character assassinate an operating system. WinFS was to be a part of Vista, but Microsoft removed it before the retail version in order to meet deadlines.
Did you know that Linux has limited NTFS support? I usually have to create a FAT32 partition to copy files between Windows XP and Linux. NTFS is usually read only or not available. Pfffssssttt!
Just like wine, Microsoft will not release a finished product before its time.
Parent
Re:Where's NTFS ? (Score:5, Insightful)
Did you know that Linux has limited NTFS support? I usually have to create a FAT32 partition to copy files between Windows XP and Linux. NTFS is usually read only or not available.
Have you heard of NTFS-3G [ntfs-3g.org]?
The NTFS-3G driver is a freely and commercially available and supported read/write NTFS driver for Linux, FreeBSD, Mac OS X, NetBSD, Solaris, Haiku, and other operating systems. It provides safe and fast handling of the Windows XP, Windows Server 2003, Windows 2000, Windows Vista and Windows Server 2008 file systems.
Parent
Re: (Score:3)
Re:Where's NTFS ? (Score:5, Informative)
1. The AC was a satire. In fact, I remember reading those exact lines at least once before. It's actually quite funny, so props to the original troll for making something really nice to read.
2. ntfs-3g should be all you need to handle read/writes in Linux these days. I think it's layered on top of fuse, so you'll probably need that as well. (Side note, glad Linus finally caved on allowing fuse into his kernel releases)
3. WinFS is a meta-layer on top of NTFS, so not in itself a disk file-system.
Parent
Re:Where's NTFS ? (Score:5, Funny)
In Linux, the open office might be the default for editing your wordfiles, and you might prefer ubuntu brown over the grassy knoll of the windows desktop, but mark my words young man - without the windows drivers sitting below the visible surface, allowing the linus to talk to the hardware, it is without worth.
And so, by choosing your linux as an alternative to windows on the desktop, you still need a windows licence to run this operating system through the windows drivers to talk to the hardware. Linux is only a code, it cannot perform the low level function.
My point being, young man, that unless you intend to pirate and steal the Windows drivers and services, how is using the linux going to save money ? Well ? It seems that no linux fan can ever provide a straight answer to that question !
May as well just stay legal, run the Windows drivers, and run Office on the desktop instead of the linus.
Parent
Re:Where's NTFS ? (Score:5, Interesting)
Sometimes I wish there was a way to make my own meta-mod, like "don't include mods from the people that modded this up ever again". The same copy-paste has been in tons of stories now, and it's not funny anymore because it's the EXACT same thing. I'd even rather hear one more variation on our insensitive clod overlords from Soviet Russia.
Parent
Original reference for this post ? (Score:3, Informative)
Sorry to disrupt the trolling copy-pasta, but :
Is this post on ZDNet's forum [zdnet.com] the original form of this troll ?
Or is this troll older, and jerryleecooper was already copying it from somewhere else ?
I'm just curious to know where this fine piece of humorous trolling was originally born.
Thoughts (Score:5, Informative)
Thoughts:
- This is measuring, I believe, calls to different functions; a call to one function from multiple places is only counted once. So it's really a measure of the diversity of external calls.
- Size and complexity aren't necessarily the same thing. It's actually possible that as common functionality is abstracted out of filesystems, they get smaller but make more external calls. There was a point a few years ago when this was happening at quite a rapid pace in the fs code, I don't know if it is still true.
- Journalled filesystems and networked filesystems are pretty complex creatures by their nature, the quoted numbers don't seem unreasonable. NFS in particular implements (IIRC) protocol versions 2, 3 and 4, and 4 had a lot of new stuff.
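To illustrate the first point, a rough sketch of what "count each distinct external call once" could look like. This assumes the input is `nm`-style symbol listing text, where a `U` type marks an undefined symbol that something outside the module has to provide. Whether TFA's author actually used `nm` or some other tool is not stated here, and the sample listing is invented, not taken from any real kernel module.

```python
def external_symbols(nm_output):
    """Return the set of undefined (external) symbol names in nm-style text.

    Relevant lines look like '                 U symbol_name'. Lines for
    symbols the module defines itself carry an address and another type.
    """
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "U":
            symbols.add(fields[1])
    return symbols


# Made-up sample. The repeated kmalloc line stands in for a symbol that
# shows up in more than one object file of the same module.
SAMPLE = """\
                 U kmalloc
                 U kfree
0000000000000000 T init_module
                 U kmalloc
                 U printk
"""

if __name__ == "__main__":
    found = external_symbols(SAMPLE)
    print(len(found), sorted(found))
```

Using a set is the design choice that matters: it measures the diversity of external calls, not how often each one is made, so a hundred call sites for `kmalloc` still count as one.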
Re:Thoughts (Score:4, Interesting)
In fact, if you think about it, the greater the number of different functions a filesystem driver uses, the less functionality it needs to have within itself. I also don't think the number of external calls is a significant measure of anything related to the size or performance, really. It all depends on what calls are being made and for what purpose.
If anything, as you imply, it's a measure of complexity. But even that might not really be the case if you stop and think about it. As more stuff is abstracted out, the less code goes into the filesystem code, the simpler, really, not more complex that filesystem driver becomes.
I think this was a really poor choice of metric and that almost renders this entire article moot.
Parent
Re: (Score:3, Informative)
Yeah. There was a stage (starting in the 2.3 days, I think) where the kernel gradually grew a very complete "framework" which the filesystems just plugged into, basically filling in the gaps. Straightforward, unixy filesystems became ridiculously simple to implement, and even the more complex ones got a lot of non-essential complexity combed out. Of course, there were a fair few specialized callbacks and utility functions made available to the fs code as part of this, and that may have pushed the unique c
not following (Score:5, Insightful)
What's your argument here? That filesystem code in the kernel shouldn't be growing more sophisticated over time?
This rings of the armchair-pundit argument that the kernel is getting more and more "bloated" and a breath later crying out that there still aren't Linux hardware drivers for every computing device ever made.
Re:not following (Score:5, Insightful)
And then whenever you buy a new webcam, replace your graphics card, or whatever, the kernel must be recompiled. People will love that.
Also: the Linux kernel is modular. This means you don't actually hold in memory, at every time, the drivers to all the hardware devices known to man. Only those your machine actually needs to run. The remaining are just files sitting on your hard disk, ready to be loaded should the need arise. This is an adequate way to keep down the bloat while not inconveniencing the user every time a new piece of hardware pops up.
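If you want to see this for yourself, here is a small sketch that reads the list of currently loaded modules. It assumes the usual `/proc/modules` layout (name, size in bytes, use count, dependencies, state, address) and falls back to an invented sample when that file is not available. The function name `loaded_modules` is just for illustration.

```python
def loaded_modules(proc_modules_text):
    """Map module name -> size in bytes from /proc/modules-style text."""
    modules = {}
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            modules[fields[0]] = int(fields[1])
    return modules


# Illustrative sample only; the sizes and addresses are invented.
SAMPLE = """\
ext4 737280 2 - Live 0x0000000000000000
jbd2 131072 1 ext4, Live 0x0000000000000000
nfs 327680 0 - Live 0x0000000000000000
"""

if __name__ == "__main__":
    try:
        with open("/proc/modules") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # not on Linux, or /proc is unavailable
    mods = loaded_modules(text)
    print(len(mods), "modules loaded,", sum(mods.values()), "bytes total")
```

On a typical desktop the list is a small fraction of the module files shipped on disk, which is the point: drivers and filesystems you never touch are not resident.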
Parent
The state (Score:5, Funny)
The state of Linux filesystems may be in disarray, but it's nothing to kill your wife over...
*rimshot*
What does this even mean? (Score:2, Insightful)
Re: (Score:3, Informative)
Unfortunately all "speed of growth" is referring to is the rate of increase of the number of filesystem kernel calls of a particular filesystem from version 2.6.11 to 2.6.28 of the Linux kernel.
Nothing to do with any sort of performance metric.
Check out Tux3 (Score:5, Informative)
While Tux3 [tux3.org] is not yet ready to run on your desktop, and won't be for a good many months, it is relatively trim at around 6K lines, and is expected to be somewhere around 10K complete with versioning, recovery and proper code comments. Of course, that will still be significant growth in a few months, and nothing says it won't just keep growing. But Tux3 is starting much smaller than its peers, and already has a pretty good range of "big filesystem" features. One of our guiding principles is to keep it tight, therefore leaving fewer places for bugs to hide.
Re:Check out Tux3 (Score:4, Funny)
Let us count the number of fast, slim projects that have been sucked down this way...
/ext2fs fanatic
//I can shredses the files... yes I can, and I know it will workses...
Programmer: "This shit is bloated. I'm starting a new project that will be slim and fast"
<type type type>
<build build make>
User 1: "This is really nice and fast, but I need feature X"
<add add add>
Users 2 & 3: "I'd use it, but I really love $OTHER_PROGRAM's Y"
Programmer: "Grrr..."
<add add add>
User 4: "I've heard that Z is doing $SPIFFY, why doesn't this do that?"
<type type add add add add build>
User 5: "This is all big and slow and bloated... I'm going with N instead..."
Programmer: "Fuck you! Fuck you all motherfuckers!"
Not to say that Tux3 will go or is going this way. Indeed, as long as people stick around who remember the guiding principle of keeping it small it shouldn't. Best of luck!
Parent
"Size" and "simple" is not so easily measured. (Score:3, Insightful)
The author of the article takes the position that filesystem external calls == "operating system size", and then proceeds to start measuring his new definition.
What he never mentions, or even tries to justify, is his metric. Why do more external calls (or as someone more accurately pointed out, a greater diversity of external calls) equate to "operating system size"? Why does this equate to the even more abstract concept of "simple"?
I don't see any reason to equate these measurements to such conclusions. "size" and "simple" are abstract concepts that involve a lot more than a simple count of external references.
Remember: filesystems are optional (Score:3, Interesting)
I think that you can compile only the filesystems you want into the kernel.
So the only complexity that matters to a user is that of the filesystems they choose to compile into the kernel!
Re: (Score:3, Interesting)
Nevermind COMPILING stuff. You can just plain choose not to USE stuff.
Don't want the "bloat" of NFS or ext4, then don't bloody use them.
Yeah, the spiffy new things or inherently complex things might
show that complexity in the code. Imagine that. The source for
Halo looks bigger than the source for Pacman.
There is no news here.
As an nvidia user, ATI can make their Linux drivers as bad and
as bloated as they want. I don't care. It really doesn't affect
me.