Linux May Need a Rewrite Beyond 48 Cores

An anonymous reader writes "There is interesting new research coming out of MIT which suggests current operating systems are struggling with the addition of more cores to the CPU. It appears that the problem, which affects the available memory in a chip when multiple cores are working on the same chunks of data, is getting worse and may be hitting a peak somewhere in the neighborhood of 48 cores, when entirely new operating systems will be needed, the report says. Luckily, we aren't anywhere near 48 cores and there is some time left to come up with a new Linux (Windows?)."

  • by eldavojohn ( 898314 ) * on Thursday September 30, 2010 @11:48AM (#33748820) Journal

    It appears that the problem, which affects the available memory in a chip when multiple cores are working on the same chunks of data, is getting worse and may be hitting a peak somewhere in the neighborhood of 48 cores, when entirely new operating systems will be needed, the report says.

    Seriously? You picked that over my submission?

    I submitted this earlier this morning; I guess my submission was lacking. But if you're interested in the original MIT article and the actual paper (PDF):

    eldavojohn writes "Multicore (think tens or hundreds of cores) will come at a price for current operating systems. A team at MIT found that as they approached 48 cores their operating system slowed down. After activating more and more cores in their simulation, a sort of memory leak occurred whereby data had to remain in memory as long as a core might need it in its calculations. But the good news is that in their paper (PDF), they showed that for at least several years Linux should be able to keep up with chip enhancements in the multicore realm. To handle multiple cores, Linux keeps a counter of which cores are working on the data. As a core starts to work on a piece of data, Linux increments the number. When the core is done, Linux decrements the number. As the core count approached 48, the amount of actual work decreased and Linux spent more time managing counters. But the team found that 'Slightly rewriting the Linux code so that each core kept a local count, which was only occasionally synchronized with those of the other cores, greatly improved the system's overall performance.' The researchers caution that as the number of cores skyrockets, operating systems will have to be completely redesigned to handle managing these cores and SMP. After reviewing the paper, one researcher is confident Linux will remain viable for five to eight years without need for a major redesign."
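
    The "local count, only occasionally synchronized" fix is easy to picture with a toy model. Below is a minimal Python sketch, assuming threads stand in for cores and a list slot stands in for per-core memory; `SloppyCounter`, `threshold` and `worker` are hypothetical names, and this is not the kernel's or the paper's actual code.

```python
import threading


class SloppyCounter:
    """Toy "sloppy" reference counter: per-core deltas plus a shared total.

    Each core only touches its own slot in `local`; the shared total is
    written (under a lock) only when a local delta reaches `threshold`.
    """

    def __init__(self, ncores: int, threshold: int = 8) -> None:
        self.threshold = threshold
        self.local = [0] * ncores            # one private delta per core
        self.global_count = 0                # shared value, guarded by the lock
        self.global_lock = threading.Lock()
        self.global_updates = 0              # number of writes to the shared value

    def add(self, core: int, delta: int = 1) -> None:
        self.local[core] += delta            # cheap, core-local update
        if abs(self.local[core]) >= self.threshold:
            with self.global_lock:           # occasional synchronization
                self.global_count += self.local[core]
                self.global_updates += 1
            self.local[core] = 0

    def value(self) -> int:
        """Exact total; only meaningful once no core is mid-update."""
        return self.global_count + sum(self.local)


def worker(counter: SloppyCounter, core: int, n: int) -> None:
    for _ in range(n):
        counter.add(core, +1)                # take a reference
    for _ in range(n // 2):
        counter.add(core, -1)                # drop half of them again


counter = SloppyCounter(ncores=4, threshold=8)
threads = [threading.Thread(target=worker, args=(counter, c, 1000)) for c in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value())           # 2000
print(counter.global_updates)    # 748 shared writes instead of 6000
```

    The trade-off is that an exact read has to walk every core's slot, which is acceptable for a reference count that is updated constantly but read rarely.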

    I don't know, guess I picked a bad title or something?

    Luckily we aren't anywhere near 48 cores and there is some time left to come up with a new Linux (Windows?).

    Again, seriously? What does "(Windows?)" even mean? As you pass a certain number of cores, modern operating systems will need to be redesigned to handle extreme SMP. It's going to differ from OS to OS but we won't know about Windows until somebody takes the time to test it.

    • by VorpalRodent ( 964940 ) on Thursday September 30, 2010 @11:54AM (#33748932)

      What does "(Windows?)" even mean?

      I read that as saying "Windows is the new Linux!". Clearly the submitter is trying to incite violence in the Slashdot community.

    • by Dragoniz3r ( 992309 ) on Thursday September 30, 2010 @11:54AM (#33748940)
      Oh look, CmdrTaco published yet another story with a poorly-written, hypersensationalist summary! Par for the course.
    • by klingens ( 147173 ) on Thursday September 30, 2010 @11:55AM (#33748962)

      Yes it is lacking: it's too long for a /. "story". Editors want small, easily digested soundbites, not articles with actual information.

    • by eudaemon ( 320983 ) on Thursday September 30, 2010 @11:56AM (#33748964)

      I just laughed at the "we aren't anywhere near 48 cores" comment; there are already commercial products with more than 48 cores now. I mean, even a crappy old T5220 pretends to have 64 CPUs due to its 8-core, 8-threads-per-core design.

      • by WinterSolstice ( 223271 ) on Thursday September 30, 2010 @12:06PM (#33749116)

        Got a pile of AIX servers here like that.

        I was kind of wondering about the "modern operating systems" comment... I think he meant "desktop operating systems".
        Many of the big OS vendors (IBM, DEC (now HP), Cray, etc.) are well beyond this point. Even OS/2 could scale to 1024 processors if I recall correctly.

        • Re: (Score:3, Informative)

          by Anonymous Coward

          OS/2's SMP support is a joke. I'm sure that somewhere in that tangle is a comment like "up to 1024 processors". But it's as relevant as a sticker on a Ford Cortina warning not to exceed the speed of sound.

          Officially the SMP version of OS/2 "Warp Server" supported 64 processors. In practice anything other than an embarrassingly parallel task would see rapidly diminishing returns after just a couple of CPUs. The stuff that this article is moaning about, that Linux doesn't do well enough on 48 CPUs? OS/2 doesn

          • Re: (Score:3, Funny)

            by Old97 ( 1341297 )
            Wow, you've convinced me. I'm canceling all my plans to migrate to OS/2. Thanks.
        • by drsmithy ( 35869 ) <drsmithy&gmail,com> on Thursday September 30, 2010 @01:09PM (#33750212)

          I was kind of wondering about the "modern operating systems" comment... I think he meant "desktop operating systems".

          What's a "desktop operating system" these days? The only mainstream OS that hasn't seen extensive use and development in SMP server environments for a decade plus is OS X. For all the others, "desktop" vs "server" is just a matter of the bundled software and kernel tuning.

          Even OS/2 could scale to 1024 processors if I recall correctly.

          Yeah. Just like those old PPC Macs were "up to twice as fast" as a PC.

      • Re: (Score:3, Informative)

        by Skal Tura ( 595728 )

        Never mind that; take a fairly standard server, a dual six-core Xeon with Hyper-Threading: the total reported CPU count is 24, and that kind of box is widely used and nothing special.
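
        The reported CPU counts in this subthread are just sockets times cores per socket times hardware threads per core. A trivial Python sketch, where `logical_cpus` is a hypothetical helper and `os.cpu_count()` is the standard-library call that reports what the running OS sees:

```python
import os


def logical_cpus(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Number of logical CPUs the OS will typically report for a given topology."""
    return sockets * cores_per_socket * threads_per_core


print(logical_cpus(2, 6, 2))  # 24: dual six-core Xeon with Hyper-Threading
print(logical_cpus(1, 8, 8))  # 64: one 8-core chip running 8 threads per core (T5220-style)
print(os.cpu_count())         # logical CPUs on the machine running this (may be None)
```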

      • by Perl-Pusher ( 555592 ) on Thursday September 30, 2010 @12:08PM (#33749182)
        Core != CPU
        • Re: (Score:3, Insightful)

          by Unequivocal ( 155957 )

          Elaborate please. I'm ignorant and curious.

          • Re: (Score:3, Informative)

            by bberens ( 965711 )
            A CPU can contain multiple cores which share a Level 2 cache. Conversely, a multi-CPU system has multiple complete CPUs which do not share their L2 cache.
      • by TheRaven64 ( 641858 ) on Thursday September 30, 2010 @12:12PM (#33749234) Journal

        And it's worth noting that the most common application for that kind of machine is to partition it and run several different operating systems on it. Solaris has already had some major redesign work for scaling that well. For example, the networking stack is partitioned both horizontally and vertically. Separate connections are independent except at the very bottom of the stack (and sometimes even then, if they go via different NICs), and each layer in the stack communicates with the ones above it via message passing and runs in a separate thread.

        However, it sounds like this paper is focussing on a very specific issue: process accounting. To fairly schedule processes, you need to work out how much time they have spent running already, relative to others. I'm a bit surprised that Linux actually works as they seem to be describing, since their 'change' was to make it work in the same way as pretty much every other SMP-aware scheduler that I've come across; schedule processes on cores independently and periodically migrate processes off overloaded cores and onto spare ones.
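
        The "schedule on each core independently, migrate occasionally" design can be sketched as a toy model in Python. This is not the Linux scheduler's code; `PerCoreScheduler` and its methods are hypothetical, and the balancing rule (move tasks from the longest queue to the shortest until lengths differ by at most one) is a simplifying assumption.

```python
from collections import deque


class PerCoreScheduler:
    """Toy scheduler: one run queue per core plus a periodic balancing pass."""

    def __init__(self, ncores: int) -> None:
        self.queues = [deque() for _ in range(ncores)]

    def enqueue(self, core: int, task) -> None:
        self.queues[core].append(task)       # touches only this core's queue

    def pick_next(self, core: int):
        q = self.queues[core]
        return q.popleft() if q else None    # no cross-core coordination needed

    def rebalance(self) -> None:
        """Migrate tasks from the longest queue to the shortest until their
        lengths differ by at most one."""
        def qlen(i):
            return len(self.queues[i])
        cores = range(len(self.queues))
        while True:
            busiest = max(cores, key=qlen)
            idlest = min(cores, key=qlen)
            if qlen(busiest) - qlen(idlest) <= 1:
                return
            self.queues[idlest].append(self.queues[busiest].pop())


sched = PerCoreScheduler(4)
for task in range(10):
    sched.enqueue(0, task)        # everything lands on core 0
sched.rebalance()                 # the occasional migration pass
print([len(q) for q in sched.queues])   # [3, 3, 2, 2]
```

        The common path (enqueue, pick_next) never looks at another core's queue; only the rare rebalance pass does, which is the property that keeps this kind of design scalable.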

        There are lots of potential bottlenecks. The one I was expecting to hear about was cache contention. In a monolithic kernel, there are some data structures that must be shared among all the cores, and every time you update one on one core, the copies of that cache line held by all the other cores must be invalidated, which can start to hurt performance when you have lots of concurrent updates. A few important data structures in the Linux kernel were rewritten in the last year to ensure that unrelated portions of them ended up in different cache lines, to help reduce this.
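
        The cache-line point can be made concrete by looking at field offsets. This is a rough Python/ctypes sketch assuming 64-byte cache lines (common, but hardware-dependent) and a structure that starts on a line boundary; the structure and helper names are hypothetical, and kernel C code gets the same effect with alignment annotations rather than ctypes.

```python
import ctypes

CACHE_LINE = 64  # assumed cache line size in bytes


class Unpadded(ctypes.Structure):
    # Two counters that different cores update, packed side by side.
    _fields_ = [("counter_a", ctypes.c_uint64),
                ("counter_b", ctypes.c_uint64)]


class Padded(ctypes.Structure):
    # Same counters, with filler so counter_b starts on the next cache line.
    _fields_ = [("counter_a", ctypes.c_uint64),
                ("_pad", ctypes.c_char * (CACHE_LINE - ctypes.sizeof(ctypes.c_uint64))),
                ("counter_b", ctypes.c_uint64)]


def same_cache_line(struct_type, field1: str, field2: str, line: int = CACHE_LINE) -> bool:
    """True if both fields fall in the same line, assuming the struct is line-aligned."""
    off1 = getattr(struct_type, field1).offset
    off2 = getattr(struct_type, field2).offset
    return off1 // line == off2 // line


print(same_cache_line(Unpadded, "counter_a", "counter_b"))  # True: updates contend for one line
print(same_cache_line(Padded, "counter_a", "counter_b"))    # False: each counter has its own line
```

        The cost is memory: the padded layout is 72 bytes instead of 16, which is why this is usually reserved for hot, frequently written fields.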

        Even then, it's not a problem that's easy to solve at the software level. Hardware transactional memory would go a long way towards helping us scale to 128+ processors, but the only chip I know of to implement it (Sun's Rock) was cancelled before it made it into production.

        • by joib ( 70841 ) on Thursday September 30, 2010 @03:35PM (#33752388)

          Unfortunately, the summary as well as the short articles on the web were more or less completely missing the point. The actual paper explains what was done.

          Essentially they benchmarked a number of applications, figured out where the bottlenecks were, and fixed them. Some of the fixes were done by introducing "sloppy counters" in order to avoid updating a global counter. Others involved switching to more fine-grained locking, switching to per-CPU data structures, and so forth. In other words, pretty standard kernel scalability work. As an aside, a lot of the VFS scalability work seems to clash with the VFS scalability patches by Nick Piggin that are in the process of being integrated into the mainline kernel.

          And yes, as the PDF article explains, the Linux cpu scheduler mostly works per-core, with only occasional communication with schedulers on other cores.
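
          "More fine-grained locking" usually means replacing one big lock with many smaller ones. A minimal Python sketch of lock striping on a hash table, assuming a fixed number of stripes; `StripedCounterTable` is a hypothetical name, not anything from the paper or the kernel, and under CPython's GIL it illustrates the locking structure rather than a real speedup.

```python
import threading


class StripedCounterTable:
    """Hash table of counters guarded by one lock per stripe instead of one global lock."""

    def __init__(self, nstripes: int = 16) -> None:
        self.locks = [threading.Lock() for _ in range(nstripes)]
        self.buckets = [{} for _ in range(nstripes)]

    def _stripe(self, key) -> int:
        return hash(key) % len(self.locks)

    def incr(self, key, delta: int = 1) -> None:
        i = self._stripe(key)
        with self.locks[i]:                  # only updates to the same stripe contend
            bucket = self.buckets[i]
            bucket[key] = bucket.get(key, 0) + delta

    def get(self, key, default: int = 0) -> int:
        i = self._stripe(key)
        with self.locks[i]:
            return self.buckets[i].get(key, default)

    def __len__(self) -> int:
        return sum(len(b) for b in self.buckets)


table = StripedCounterTable()


def bump_all() -> None:
    for k in range(100):
        table.incr(k)


threads = [threading.Thread(target=bump_all) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(table), table.get(42))   # 100 8
```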

      • by monkeySauce ( 562927 ) on Thursday September 30, 2010 @12:51PM (#33749892) Journal
        The article is about cores per chip, not cores per system.

        You're trying to compare a 48-cylinder engine with a bunch of 4-cylinder engines working together.
    • Re: (Score:2, Informative)

      by Anonymous Coward

      I don't know, guess I picked a bad title or something?

      No. Your summary was too long.

      Seriously, the purpose of a summary is not to include every last fact and detail mentioned in the article; it's to give the reader enough information to decide whether reading the full article is worth it. Don't try to put everything in there.

      • Re: (Score:3, Informative)

        by Dahamma ( 304068 )

        the purpose of a summary is not to include every last fact and detail mentioned in the article; it's to give the reader enough information to decide whether reading the full article is worth it.

        If you think a summary can actually help get a /. reader to RTFA, you must be new here...

      • by BeardedChimp ( 1416531 ) on Thursday September 30, 2010 @12:16PM (#33749300)
        The purpose of an editor is to edit any submissions to make them ready for print.

        If the summary was too long, the editor should have got off his arse rather than wait for the summary that fits the word count to come along.
      • by Wonko the Sane ( 25252 ) on Thursday September 30, 2010 @12:25PM (#33749414) Journal

        Your summary was too long.

        Yes, but the submission that got accepted has a bullshit headline.

        Of course "Linux May Need to Continue Making Incremental Changes Like It Has Been Doing For The Last Several Years To Scale Beyond 48 Cores" doesn't draw in as many clicks.

      • Re: (Score:3, Informative)

        by X0563511 ( 793323 )

        I've seen longer stories about lamer things get published...

      • Re: (Score:3, Funny)

        it's to give the reader enough information to decide whether reading the full article is worth it.

        We are supposed to read the articles? Why didn't anyone tell me about this before?!!

    • (Windows?)

      I thought he was implying that we will also need to come up with a new Windows.

    • by Skal Tura ( 595728 ) on Thursday September 30, 2010 @12:04PM (#33749100) Homepage

      Scare piece.

      Your submission wasn't scary enough. From your submission, it seems that it's not that big of a deal and has a fairly easy solution. This submission makes it sound like the Linux kernel needs a complete ground-up rewrite, as in starting from scratch.
      Plus yours was a bit long and had lots of details.

      • by h4rr4r ( 612664 ) on Thursday September 30, 2010 @12:36PM (#33749612)

        Linux supposedly scales to 1024 CPUs or something like that. The paper isn't about what it supposedly scales to, but about the performance impact of actually trying to use that many cores.

        • by TheNetAvenger ( 624455 ) on Thursday September 30, 2010 @02:47PM (#33751660)

          The point isn't that NT Scales to 256 cores, the point is how efficient it is when scaling to this many processors. The NT Kernel in Win7 was adjusted so that systems with 64 or 256 CPUs have a very low overhead handling the extra processors.

          Linux in theory (just like NT in theory) can support several thousand processors, but there is a point at which this becomes inefficient, as the overhead of managing the additional processors saturates a single system. (Hence other multi-SMP models are often used instead of a single 'system'.)

          Just simply Google/Bing: windows7 256 Mark Russinovich

          You can find nice articles and even videos of Mark talking about this in everyday terms to make it easy to understand.

          • Re: (Score:3, Informative)

            by walshy007 ( 906710 )

            The point is that the article deals with a simulated, theoretical CPU with 48+ cores on a single die with a shared L2 cache.

            The changes made are incremental, and I imagine they will be dealt with long before this actually becomes an issue, when (or if) we get CPUs with that many cores on a single die.

            Multi-socket systems are already immune to this the way they are set up; you could have an 8-socket system with each CPU having 8 cores, and it would not show the problems shown in the article.

            In other words, business as usual.

    • Re: (Score:3, Insightful)

      I don't know, guess I picked a bad title or something?

      Slashdot: dramatically overstated news for nerds... since that seems to be the evolution of news services for some reason?

      I'm working on a submission: Fox news just had a bit about the internet, I'm assuming that their headline is something like "WILL USING OBAMANET 'IPv6' KILL YOU AND MAKE YOUR CHILDREN TERRORISTS?"

    • Re: (Score:3, Informative)

      by aywwts4 ( 610966 )

      If it is any consolation this straw is the one that broke the RSS feed's back.

      I have unsubscribed from Slashdot today due to the trend typified by your article vs. the one published. (No, this is not a new trend, but I'm fed up and finished with it.) See you on Reddit's Science/Linux/Everything else.

    • Re: (Score:3, Interesting)

      by Lumpy ( 12016 )

      You are not in the club of liked submitters. Honestly, the number of crap submissions that get picked over well-thought-out and very well-cited ones is nuts, to the point that I simply stopped submitting stories here. It's a waste of time.

    • Re: (Score:3, Insightful)

      by GooberToo ( 74388 )

      Completely agree.

      Of course, this all ignores the fact that Linux already scales well beyond 48 cores. Even more so, it appears the group is confusing bus contention with OS scalability. The problem is that modern CPUs (cores) share caches, and that is all too frequently the real problem: the shared cache leads to cache contention.

      Linux, right now, is capable of scaling well beyond 128 cores (err...cpus)...and more... It's just not standard code because the overhead is less optimal for 99.999% of t

  • by Chirs ( 87576 ) on Thursday September 30, 2010 @11:52AM (#33748882)

    SGI has some awfully big single-system-image linux boxes.

    I saw a comment on the kernel mailing list about someone running into problems with 16 terabytes of RAM.

      It's not a question of being unable to do it, but of where the performance regressions are. Of course it's possible to run Linux on multiple hundreds of cores, but it seems that after 48 cores there is a performance regression, and thus all those cores don't benefit as much as they could. That is the issue here.

      • by DrgnDancer ( 137700 ) on Thursday September 30, 2010 @12:15PM (#33749274) Homepage

        I thought this as well, but after more carefully reading the article, I *think* I see what the problem is. It's not really a problem with large numbers of cores in a system, so much as a problem with large numbers of cores on a chip. Since the multicore chips share caches (level 2 cache is shared, level 1 cache isn't IIRC, but I could be wrong) it's actually cache memory where the issue lies. I've worked on single system image SGI systems with 512 cores, but those systems were actually 256 dual core chips. That works fine, and assuming well written SMP code performance scales as you'd expect with number of cores.

        • Re: (Score:3, Interesting)

          by Gaygirlie ( 1657131 )

          Since the multicore chips share caches (level 2 cache is shared, level 1 cache isn't IIRC, but I could be wrong) it's actually cache memory where the issue lies.

          That's what I thought too, but after thinking about it a bit more I'd dare to claim it's both a hardware and a software issue. Too small a cache of course causes issues like the ones the researchers noticed, but it's mostly the way memory accesses and caching are handled in software that makes it such a big issue. Rethinking the approach how kernel hand

    • by TheRaven64 ( 641858 ) on Thursday September 30, 2010 @12:18PM (#33749324) Journal

      SGI has some awfully big single-system-image linux boxes.

      Not really. SGI has big NUMA machines, with a single Linux kernel per node (typically under 8 processors), some support for process / thread migration between nodes, and a very clever memory controller that automatically handles accessing and caching remote RAM. Each kernel instance is only responsible for a few processes. They also have a lot of middleware on top of the kernel that handles process distribution among nodes.

      It's an interesting design, and the SGI guys have given a lot of public talks about their systems so it's easy to find out more, but it is definitely not an example of Linux scaling to large multicore systems.

  • by El_Muerte_TDS ( 592157 ) on Thursday September 30, 2010 @11:52AM (#33748898) Homepage

    They have an off-by-one error in their math; it's actually 9 times a 6 core CPU. So, at 42 cores a rewrite is needed.

  • Dunno... I am typing this on a system with 12 cores and 24 virtual cores. And the GPU has somewhere around 1600 cores... Other systems I've worked with have hundreds to thousands of cores so I think we are pretty close...

    Seriously though, these issues have been known for a while, but the work on caching and shared memory will have to trickle down to desktop OSs.

  • Enough (Score:3, Funny)

    by wooferhound ( 546132 ) on Thursday September 30, 2010 @11:54AM (#33748928) Homepage
    640 cores ought to be enough for anybody . . .
  • by pclminion ( 145572 ) on Thursday September 30, 2010 @11:55AM (#33748948)
    Can somebody please explain what the fuck they are actually talking about? They've dumbed down the terminology to the point I have no idea what they are saying. Is this some kind of cache-related issue? Inefficient bouncing of processes between cores? What?
    • by jd ( 1658 ) on Thursday September 30, 2010 @12:32PM (#33749556) Homepage Journal

      What they are talking about really reduces to a variant of Amdahl's Law, but simply put, scaling is always non-linear. There will be overheads per core for communication (which is why SMP over 16 CPUs is such a headache) and overheads per core within the OS for housekeeping (knowing what core a specific thread is running on, whether it is bound to that core, etc., and trying to schedule all threads to make the best use of the cores available).
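
      For reference, Amdahl's Law gives the speedup on n cores of a job whose parallelizable fraction is p as S(n) = 1 / ((1 - p) + p/n), which can never exceed 1/(1 - p). The sketch below adds a per-core overhead term c*(n - 1) to mimic the communication and housekeeping costs described above; that extra term and the values p = 0.95 and c = 0.001 are illustrative assumptions, not numbers from the paper.

```python
def speedup(n: int, p: float, c: float = 0.0) -> float:
    """Amdahl's Law, optionally with a per-core overhead term.

    n: number of cores, p: parallel fraction of the work,
    c: assumed extra cost per additional core (0 gives plain Amdahl).
    """
    return 1.0 / ((1.0 - p) + p / n + c * (n - 1))


# Plain Amdahl: diminishing returns, capped at 1 / (1 - p) = 20 for p = 0.95.
print(round(speedup(48, 0.95), 2))      # 14.33

# With overhead: speedup peaks and then falls as cores are added.
best = max(range(1, 257), key=lambda n: speedup(n, 0.95, c=0.001))
print(best)                             # 31
print(speedup(64, 0.95, c=0.001) < speedup(best, 0.95, c=0.001))  # True
```

      In this toy model the peak sits near n = sqrt(p / c), so where it falls depends entirely on the size of the per-core overhead; shrinking that overhead pushes the peak further out rather than removing it.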

      The more cores you have, the more state information is needed for a thread and the more possible permutations the scheduler must consider in order to be efficient. Which, in turn, means the scheduler is going to be bulkier.

      (Scheduling is a variant of the bin-packing problem, which is NP-complete, but it has the added catch that you only get a very short time to pack the threads in, and scheduling policies such as realtime and core-binding must also be satisfied in addition to packing all the threads in.)
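
      Because exact packing is intractable, schedulers lean on quick heuristics. As an illustration only (it is not what the Linux scheduler does), here is the classic longest-processing-time-first greedy rule in Python: hand each task, longest first, to the currently least-loaded core. `lpt_schedule` and the task durations are made up for the example.

```python
import heapq


def lpt_schedule(durations, ncores):
    """Greedy LPT: assign tasks, longest first, to the least-loaded core.

    Returns (makespan, assignment) where assignment maps core -> task indices.
    """
    heap = [(0, core) for core in range(ncores)]   # (current load, core id)
    heapq.heapify(heap)
    assignment = {core: [] for core in range(ncores)}
    for task, d in sorted(enumerate(durations), key=lambda td: td[1], reverse=True):
        load, core = heapq.heappop(heap)           # least-loaded core
        assignment[core].append(task)
        heapq.heappush(heap, (load + d, core))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


makespan, assignment = lpt_schedule([7, 5, 4, 3, 3, 2], ncores=2)
print(makespan)     # 12 (total work is 24, so this is optimal here)
print(assignment)   # {0: [0, 3, 5], 1: [1, 2, 4]}
```

      LPT is guaranteed to finish within 4/3 of the optimal makespan, which is the kind of "fast and good enough" trade-off a scheduler has to make when it only gets a few microseconds to decide.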

      The more of this extra data you need, the slower task-switching becomes and the more of the cache you are hogging with stuff unrelated to whatever the threads are actually doing. At some point, the degradation in performance will exactly equal the increase in performance from the extra cores. The claim is that this happens at 48 cores for modern OSes. This is plausible, but it is unclear if it is an actual problem. Those same OSes are used on supercomputers of 64+ cores, by segregating the activities in each node. MOSIX, Kerrighed and other such mechanisms have allowed Linux kernels to migrate tasks from one node to another transparently (i.e., you don't know or care where the code runs; the I/O doesn't change at all). The only reason Linux doesn't have clustering as standard is that Linus is waiting for cluster developers to produce a standard mechanism for process migration that also fits within the architectural standards already in use.

      If you clustered a couple of hundred nodes, each with 48 cores, you're looking at close to 10,000 cores on the system. It wouldn't take a "rewrite" per se, merely a few hooks and a standard protocol. To support a single physical node with more than 48 cores, you might need to split it into virtual nodes with 48 or fewer cores in each, but Linux already has support for virtualization, so that's no big deal either.
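
      The arithmetic and the "split it into virtual nodes" idea are easy to check. A small Python sketch, where `split_into_nodes` is a hypothetical helper that only computes an even partition of core IDs; actually enforcing such a split would be up to virtualization or CPU-set mechanisms, which this code does not touch.

```python
def split_into_nodes(ncores: int, max_per_node: int = 48) -> list[list[int]]:
    """Partition core IDs 0..ncores-1 into as few contiguous groups as possible,
    each no larger than max_per_node, with sizes differing by at most one."""
    if ncores <= 0:
        return []
    nnodes = -(-ncores // max_per_node)          # ceiling division
    base, extra = divmod(ncores, nnodes)
    nodes, start = [], 0
    for i in range(nnodes):
        size = base + (1 if i < extra else 0)
        nodes.append(list(range(start, start + size)))
        start += size
    return nodes


print(200 * 48)                                        # 9600 cores in the cluster example
print([len(node) for node in split_into_nodes(128)])   # [43, 43, 42]
```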

  • Only Linux? (Score:4, Interesting)

    by Ltap ( 1572175 ) on Thursday September 30, 2010 @11:57AM (#33748990) Homepage
    It looks like TFS was written by a Windows fanboy; why mention Linux specifically when it is a general problem? Why try to half-assedly imply that Windows is more advanced than Linux?
    • Re:Only Linux? (Score:4, Insightful)

      by Attila Dimedici ( 1036002 ) on Thursday September 30, 2010 @12:08PM (#33749178)
      Having read eldavojohn's post that summarizes the article, it appears that the reason to pick out Linux specifically is because that is the OS that the writers of the paper actually tested. Since Windows uses a different system for keeping track of what various cores are doing it is likely that Windows will run into this problem at a different number of cores. However, until someone conducts a similar test using Windows we will not know if that number is more or less than 48.
  • UNIX and C were great in their day. But perhaps not in the mega-core era.
  • 64 cores (Score:3, Interesting)

    by hansamurai ( 907719 ) on Thursday September 30, 2010 @11:58AM (#33749000) Homepage Journal

    At my last job we had a bunch of Sun T5120s which housed 64 cores. So yeah, we are "anywhere near 48".

  • Jaguar? (Score:2, Insightful)

    Cray seems to have addressed this problem, yes?
  • 48 cores? (Score:4, Funny)

    by drunkennewfiemidget ( 712572 ) on Thursday September 30, 2010 @12:00PM (#33749036) Homepage

    I'm still waiting for Windows to work well on ONE.

  • by r00t ( 33219 ) on Thursday September 30, 2010 @12:05PM (#33749102) Journal

    No kidding. SGI's Altix is a huge box full of multi-core IA-64 processors. 512 to 2048 cores is more normal, but they were reaching 10240 last I checked. This is SMP (NUMA of course), not a cluster. I won't say things work just lovely at that level, but it does run.

    48 cores is nothing.

  • by Punto ( 100573 ) <puntob@ g m a i l . com> on Thursday September 30, 2010 @12:20PM (#33749354) Homepage

    Nobody's ever going to need more than 640 cores

  • how is this news? (Score:4, Insightful)

    by dirtyhippie ( 259852 ) on Thursday September 30, 2010 @12:22PM (#33749382) Homepage

    We've known about this problem for ... well, as long as we've had more than one core; actually, as long as we've had SMP... You increase the number of cores/CPUs, you decrease available memory throughput per core, which was already the bottleneck anyway. Am I missing something here?

  • Patches available (Score:4, Informative)

    by diegocg ( 1680514 ) on Thursday September 30, 2010 @12:30PM (#33749512)

    So, they found scalability problems in some microbenchmarks. Well, some of the scalability bottlenecks cited in the paper will be fixed when Nick Piggin's VFS scalability patchset gets merged. But it's not like you need to rewrite every operating system to scale beyond 48 cores; it's just the typical scalability stuff, and the kind of scalability issues found these days are mostly corner cases (Piggin's VFS work being an exception).

  • by Todd Knarr ( 15451 ) on Thursday September 30, 2010 @12:31PM (#33749528) Homepage

    What they're saying is basically two things:

    First, there's a bottleneck in the on-chip caches. When a core's working on data it needs to have it in its cache. And if two cores are working on the same block of memory (block size being determined by cache line size), they need to keep their copies of the cache synchronized. When you get a lot of cores working on the same block of memory, the overhead of keeping the caches in sync starts to exceed the performance gains from the additional cores. That's not new; we've known that in multi-threaded programming for decades: when you've got a lot of threads dependent on the same data items, the locking overhead's going to be the killer. And we've known the solution for just as long: code to avoid lock contention. The easiest way is to make it so you don't have multiple threads (cores) working on the same (non-read-only) memory at the same time; that just requires some thinking on the part of the developers.
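
    The "don't have multiple threads working on the same memory" advice looks like this in practice: each thread accumulates into private state and publishes its result exactly once. A minimal Python sketch, assuming a simple sum as the workload; `parallel_sum` is a hypothetical name, and under CPython's GIL it shows the access pattern rather than a speedup. In C you would also keep the per-thread result slots on separate cache lines.

```python
import threading


def parallel_sum(data, nthreads: int = 4):
    """Sum `data` with each thread accumulating privately, then combine once."""
    partials = [0] * nthreads                  # one result slot per thread
    chunk = -(-len(data) // nthreads)          # ceiling division

    def work(i: int) -> None:
        local = 0
        for x in data[i * chunk:(i + 1) * chunk]:
            local += x                         # private to this thread, no sharing
        partials[i] = local                    # single write to this thread's own slot

    threads = [threading.Thread(target=work, args=(i,)) for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partials)                       # combine after all threads finish


print(parallel_sum(list(range(1000))))   # 499500
```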

    Second, you only gain from additional cores if there's workload to spread to them usefully. If you've got 8 threads of execution actually running at any given time, you won't gain from having more than 8 cores. And on modern computers we often don't have more than a few threads actually using CPU time at any given moment. The rest are waiting on something and don't need the CPU, and as long as we aren't thrashing execution contexts too badly, they can be ignored from a performance standpoint. To take advantage of truly large numbers of cores, we need to change the applications themselves to parallelize things more. But often applications aren't inherently multi-threaded. Games, yes. Computation, yes. But your average word processor or spreadsheet? It's 99% waiting on the human at the keyboard. You can do a few things in the background, file auto-save and such, but not enough to take advantage of a large number of cores. The things that really take advantage of lots of cores are things like Web servers, where you can assign each request to its own core. And no, browsers don't benefit the same way. On the client side there are (relatively) so few requests, and network I/O is so slow relative to CPU speed, that you can handle dozens of requests on a single core and still have cycles free, assuming you use an efficient I/O model. But it all boils down to the developers actually thinking about parallel programming, and I've noticed a lot of courses of study these days don't go into the brain-bending skull-sweat details of juggling large numbers of threads in parallel.
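
    The web-server case is the easy one because requests are independent. A minimal Python sketch using the standard `concurrent.futures.ThreadPoolExecutor`; `handle_request` is a hypothetical stand-in that just sleeps to imitate I/O, and `useful_cores` simply encodes the point above that extra cores only help up to the number of runnable threads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(req_id: int) -> str:
    """Stand-in for an independent web request: no shared mutable state."""
    time.sleep(0.01)                 # pretend to wait on network or disk I/O
    return f"response {req_id}"


def serve(requests, workers: int) -> list[str]:
    # Each request can run on any worker; results come back in request order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))


def useful_cores(runnable_threads: int, cores: int) -> int:
    """Cores that can actually be kept busy at a given moment."""
    return min(runnable_threads, cores)


print(serve(range(3), workers=3))    # ['response 0', 'response 1', 'response 2']
print(useful_cores(8, 48))           # 8
```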

  • by compudj ( 127499 ) on Thursday September 30, 2010 @12:36PM (#33749616) Homepage

    The K42 project at IBM Research investigated the benefit of a complete OS rewrite with scalability to very large SMP systems in mind. This is an open source operating system supporting a Linux-compatible API and ABI.

    Their target systems back in 2003, "next generation SMP systems", seem to have become the current generation of SMP/multi-core systems in the meantime.

  • Tilera? (Score:3, Informative)

    by Anonymous Coward on Thursday September 30, 2010 @12:44PM (#33749766)

    Tilera Corp. already has a CPU architecture with 16-100 cores per chip: the TILE-Gx family.

    Support for these is already being included in the mainline kernel.

  • Slashdot (Score:4, Funny)

    by carrier lost ( 222597 ) on Thursday September 30, 2010 @12:53PM (#33749942) Homepage

    ...there is some time left to come up with a new Linux (Windows?).

    Windows, the new Linux.

    You read it here first...

  • by Fallen Kell ( 165468 ) on Thursday September 30, 2010 @01:41PM (#33750702)
    I have 34 systems which have 48 cores already in the server room. These are quad-socket systems with four AMD 12-core CPUs. So I call BS on the guys who think we have plenty of time, because there are plenty of people deploying these things already.
