Follow Slashdot blog updates by subscribing to our blog RSS feed

Not All Cores Are Created Equal 183

Posted by kdawson on Monday December 22, 2008 @09:56PM from the working-out-the-kinks dept.

joabj writes "Virginia Tech researchers have found that the performance of programs running on multicore processors can vary from server to server, and even from core to core. Factors such as which core handles interrupts, or which cache holds the needed data can change from run to run. Such resources tend to be allocated arbitrarily now. As a result, program execution times can vary up to 10 percent. The good news is that the VT researchers are working on a library that will recognize inefficient behavior and rearrange things in a more timely fashion." Here is the paper, Asymmetric Interactions in Symmetric Multicore Systems: Analysis, Enhancements and Evaluation (PDF).

This discussion has been archived. No new comments can be posted.

Not All Cores Are Created Equal

Load All Comments

Search 183 Comments Log In/Create an Account

Comments Filter:

unsurprising. (Score:5, Interesting)

by Anonymous Coward writes: on Monday December 22, 2008 @10:03PM (#26207751)

Anyone who thinks computers are predictably deterministic hasn't used a computer. There are so many bugs in hardware and software that cause it to behave differently than expected, documented, designed. Add to that inevitable manufacturing defects, no matter how microscopic, and it's unimaginable to find otherwise.
It's like discovering "no two toasters toast the same. Researches found some toasters browned toast up to 10% faster than others."

Share
twitter facebook
- Re:unsurprising. (Score:5, Funny)
  
  by Rod Beauvex ( 832040 ) writes: on Monday December 22, 2008 @10:03PM (#26207757)
  
  It's those turny knobs. They lie.
  
  Parent Share
  twitter facebook
  - Re:unsurprising. (Score:5, Funny)
    
    by symbolset ( 646467 ) writes: on Monday December 22, 2008 @10:11PM (#26207795) Journal
    
    You have to buy the one that goes to 11. You know how 10 makes the toast almost totally black? Well, what if you want your toast just a little bit more crispy? What if you want just that little bit more? That's what 11 is for. Those other toasters only go to 10, but this one goes to 11.
    
    Parent Share
    twitter facebook
    - Re:unsurprising. (Score:5, Funny)
      
      by MightyYar ( 622222 ) writes: on Monday December 22, 2008 @10:13PM (#26207811)
      
      I had a Pentium that DEFINITELY went to 11.
      
      Parent Share
      twitter facebook
      - Re:unsurprising. (Score:5, Funny)
        
        by RuBLed ( 995686 ) writes: on Monday December 22, 2008 @10:31PM (#26207937)
        
        mine only went up to 10.998799799
        
        Parent Share
        twitter facebook
        
        Re: (Score:2, Funny)
        
        by Anonymous Coward writes:
        
        Wow, a joke from 1995. It's true, Slashdot is at the forefront of cutting-edge humor.
        
        Re:unsurprising. (Score:5, Funny)
        
        by raynet ( 51803 ) writes: on Tuesday December 23, 2008 @05:21AM (#26209809) Homepage
        
        I am sure you mean to say; Wow, a joke from 1994.995994999.
        
        Parent Share
        twitter facebook
        
        Re: (Score:3, Funny)
        
        by Rakshasa Taisab ( 244699 ) writes:
        
        That joke is so badly done it's not even funny.
        1994.995994999
        If you look carefully at this number, it's clearly one constructed by a human. The first '5' might be random, but the proceeding numbers do not have any specific reason to be weighted towards higher digits!!!
        Thus, a more realistic semi-random number would be:
        1994.995974983
        
        Re: (Score:3, Funny)
        
        by raynet ( 51803 ) writes:
        
        Actually, your number looks more like random number string by a human as human try to avoid using long chains of same numbers when writing random numbers. But you are right, my number was made by randomly punching multiple number keys on my keyboard and those happened to register. I did then edit it so that the first digit after to dot was 9.
        
        Re: (Score:2)
        
        by PitaBred ( 632671 ) writes:
        
        Well, yeah. But his LOOKS more random because we have an implicit assumption that randomness will make things different, rather than select the same thing every time. We're hard-wired as humans to recognize patterns ;)
        
        Re: (Score:2)
        
        by ArsonSmith ( 13997 ) writes:
        
        Yes but the probability of a string of random numbers looking similar to the first is far less then that of a string of numbers looking similar to the second. Just like it may be the same chance to have 9999999 as to have 8675309, one is very consistent and can be seen to have a pattern while the other may not. Unless you're Jenny anyway.
      - Re: (Score:2)
        
        by Fluffeh ( 1273756 ) writes:
        
        Was it one of those PII Celeron 300A's [wikipedia.org] that just ran and ran and ran even if you pushed them up from 300 mhz to 4509 mhz?
        
        Those things were HAWT!
        
        Re: (Score:2)
        
        by Kent Recal ( 714863 ) writes:
        
        Hell yeah, that one was a bargain.
        I had mine clocked at at 400MHz and iirc saved about $200 over an equivalent "real" PII.
      - Re: (Score:2)
        
        by kimvette ( 919543 ) writes:
        
        I had an Abit motherboard (VP6) that went to 11. Unfortunately it ended with a little fireworks show. :( Stupid bad caps, lousy Abit QC.
    - - Re: (Score:2, Funny)
        
        by Anonymous Coward writes:
        
        The review for "Not All Cores Are Created Equal" was merely a two word review which simply read "Shit Sandwich".
- Re: (Score:3, Insightful)
  
  by ElectricTurtle ( 1171201 ) writes:
  
  Mod parent to 5, seriously, it's so true. There are more than a few times after working support for decade when I've had to say, 'that should be impossible' but a symptom nonetheless exists.
  - Re:unsurprising. (Score:5, Interesting)
    
    by aaron alderman ( 1136207 ) writes: on Monday December 22, 2008 @11:53PM (#26208369) Homepage
    
    Impossible like "xor eax, eax" returning a non-zero value and crashing windows? [msdn.com]
    
    Parent Share
    twitter facebook
    - Re:unsurprising. (Score:5, Funny)
      
      by $RANDOMLUSER ( 804576 ) writes: on Tuesday December 23, 2008 @12:04AM (#26208421)
      
      Moral of the story: There's a lot of overclocking out there, and it makes Windows look bad.
      
      Oh. So that's what's been doing it.
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by sorak ( 246725 ) writes:
        
        Moral of the story: There's a lot of overclocking out there, and it makes Windows look bad.
        Oh. So that's what's been doing it.
        Yeah, Vista says my proc should actually be a vacuum tube.
    - Re: (Score:2)
      
      by Anthony_Cargile ( 1336739 ) writes:
      
      Very interesting story, wish I had some mod points right now :). I think I found a new blog to subscribe to, only this one has a purpose!
      
      Oh, and mod the comment above me up as well - that was just funny.
    - Re: (Score:2, Insightful)
      
      by Ant P. ( 974313 ) writes:
      
      If overclocking is the cause of so many of these problems, why hasn't Intel or AMD got a mechanism to tell the OS that the hardware's being run out of spec? The blame for these crashes should be directed where it belongs - with the -funroll-loops ricers.
  - Reporter bias (Score:2, Insightful)
    
    by symbolset ( 646467 ) writes:
    
    Often, an issue presents that isn't reproducible in the presence of a tech support person who knows what he's doing.
    Sometimes it's a user error they don''t want to admit, and so they won't reproduce it in front of somebody who knows they should not have done that.
    Sometimes it's just a glitch. Regardless, the best thing to do is smile and say "The bug must be afraid of me" and close the ticket.
  - Re: (Score:3, Interesting)
    
    by Mr2cents ( 323101 ) writes:
    
    There is a very interesting channel on youtube called googletechtalks. There, you can find a lecture called "We have it easy, but do we have it right" about performance measures that really made me worry. Basically you can't just easilly compare performance by measuring the cpu time, because there are a lot of factors that determine performance. E.g.: by adding a environment variable before running a program, this can cause page allignments to change (even if the environment variable isn't used by the progr
    - Re: (Score:3, Interesting)
      
      by TheRaven64 ( 641858 ) writes:
      
      Yup, I found some interesting effects of this when doing my PhD. I tweaked my supervisor's code to add an abstraction layer in the middle before making changes, and found that this actually made things faster, even though it was doing more work (it was only meant to make things faster when I wrote something else on the other side of the abstraction layer). It was an entirely deterministic improvement though, even with different data sets, so most likely due to better instruction cache layout with the new
- Re: (Score:3, Interesting)
  
  by $RANDOMLUSER ( 804576 ) writes:
  
  I remember HP-UX on PA-RISC from at least ten years ago making efforts to reassign a swapped out process to the processor that it had been running on before it was swapped out, on the notion that some code and data might still be in the cache. SMP makes for some interesting OS problems.
  - Re: (Score:3, Informative)
    
    by Majik Sheff ( 930627 ) writes:
    
    Processor affinity is still a nasty corner of OS design. It was one of the outstanding issues with the BeOS kernel that was not resolved before the company tanked.
    - Re: (Score:2)
      
      by ckaminski ( 82854 ) writes:
      
      FWIW, Windows NT, 3.5 I think, had a huge problem with process migration that killed performance.
    - Re: (Score:3, Interesting)
      
      by TheRaven64 ( 641858 ) writes:
      
      Processor affinity is even harder on modern CPUs. You often have 2 or so contexts sharing execution units and L1 cache in a core, then a few cores sharing L2 cache in a chip. Deciding whether to move a process is tricky. There's a penalty for moving, because you increase the cache misses proportionally to the distance you move it (if you move it to a context that shares the same L1 cache, it's not as bad as if you move it to one that shares only the L2 cache, for example), but there's also a cost for not
- Re: (Score:2)
  
  by ClosedSource ( 238333 ) writes:
  
  Actually, the PC was designed to be non-deterministic. No software bugs, hardware bugs or manufacturing defects needed.
  On the other hand, many early home computers were quite deterministic. In fact the Atari 2600 game machine was deterministic down to a single CPU cycle. Many 2600 games would not have worked if it were otherwise.
- Re: (Score:2)
  
  by TapeCutter ( 624760 ) writes:
  
  "It's like discovering "no two toasters toast the same. Researches found some toasters browned toast up to 10% faster than others."
  
  What we need is a toaster with an IQ of around 4000.
- Re:unsurprising. (Score:5, Interesting)
  
  by zappepcs ( 820751 ) writes: on Tuesday December 23, 2008 @12:06AM (#26208433) Journal
  
  Actually, (sorry no link) there was a researcher that was using FPGAs and AI code to create simple circuits, but the goals was to have the AI design it. What he found is that due to minor manufacturing defects, the code that was built by AI was dependent on the FPGA it was tested on and would not work on just any FPGA of that specification. After 600 iterations, you'd think it would be good. One experiment went for a long time, and in the end when he analyzed the AI generated code, there were 5 paths/circuits inside that did nothing. If he disabled any or all of the 5 the overall design failed. Somehow, the AI found that creating these do nothing loops/circuits caused a favorable behavior in other parts of the FPGA that made for overall success. Naturally that code would not work on any other FPGA of the specified type. It was an interesting read, sorry that I don't have a link.
  
  Parent Share
  twitter facebook
  - Re: (Score:2, Insightful)
    
    by paulgrant ( 592593 ) writes:
    
    Damn it, get one!
    At least a name for christs sake!
    - Re:unsurprising. (Score:4, Informative)
      
      by johnw ( 3725 ) writes: on Tuesday December 23, 2008 @04:15AM (#26209603)
      
      A simple Google search for "fpga genetic algorithm" shows up references quite quickly - e.g.
      http://biology.kenyon.edu/slonc/bio3/AI/GEN_ALGO/gen_algo.html [kenyon.edu]
      The only part of the GP story I haven't seen before (and can't find a reference for) is the bit about the design not working on other FPGAs of the same specification. The closest story is that of Adrian Thompson at the University of Sussex who got a circuit with unconnected elements which nonetheless seem to be needed in order for the whole thing to achieve its goal. Nothing about the design only working on specific instances of the FPGA.
      
      Parent Share
      twitter facebook
      - Re: (Score:2)
        
        by zappepcs ( 820751 ) writes:
        
        This article [susx.ac.uk] mentions how it won't work on only the FPGA it was developed on.
    - Re: (Score:2)
      
      by oPless ( 63249 ) writes:
      
      Sounds like John Koza ( http://en.wikipedia.org/wiki/John_Koza [wikipedia.org] ) or someone following his research
  - FPGA programming (Score:2)
    
    by sshore ( 50665 ) writes:
    
    One experiment went for a long time, and in the end when he analyzed the AI generated code, there were 5 paths/circuits inside that did nothing. If he disabled any or all of the 5 the overall design failed. Somehow, the AI found that creating these do nothing loops/circuits caused a favorable behavior in other parts of the FPGA that made for overall success.
    The author took the unusual step of disconnecting the clock for the FPGA, taking advantaged of undefined behavior that depended on the unique electrical characteristics of the FPGA he used. Had he left the clock connected he'd likely have more portable results, however he may not have arrived at the same results since he'd be depending on discrete logic and not the unspecified, non-linear analog behavior.
    - Re: (Score:2)
      
      by zappepcs ( 820751 ) writes:
      
      That's correct. My mind was fuzzy last night. Rereading it makes it very appropriate to this story though as it points out the minute variations in silicon/computers that is ignored by most software etc. as used today because of clocks etc. If the clock is not quite right, weird things can happen. Skynet was a clock failure?
  - - Re: (Score:2)
      
      by Detritus ( 11846 ) writes:
      
      Sometimes you see stuff like that due to compiler bugs. The ugly code is a way of not triggering the bug. Simplify it at your peril.
    - Re:unsurprising. (Score:4, Insightful)
      
      by sowth ( 748135 ) writes: on Tuesday December 23, 2008 @04:38AM (#26209653) Journal
      
      They probably put in the if(1) lines because they were testing various aspects of the program, or maybe some like to turn off various aspects of the program, but don't want to be arsed to write the proper code to select options. I commonly do that in POVray (3d raytracing) scripts when testing, so I don't have to wait for long renders--fog, radiosity, lots of light and such take orders of magnitude more time.
      As for the AI adding crap, it is probably more trying random code than truly thinking about how the code should work. This leads to the useful code intertwined with lots of crap code. Unfortunately, there are programmers who write like this too... (cue funny mod)
      As for the code not working on other FPGAs, maybe the researcher should not use real chips to check the iterations. A simulated one which conforms to the spec exactly and upon where quirks and such are expected, dies or sends a signal back to the AI program. Testing after the fact on real chips to verify the AI didn't exploit bugs in the simulator would be more proper procedure.
      Maybe I have too much of a background in theory, but I am not completely sure why the FPGAs would be so different. Is it race time conditions? Or is the FPGA being used in some analog way? Or does the circuit depend on the exact timing of some input, so the speed / capacitance of each component make a huge difference? Or was the poster talking about FPGAs with different specs?
      Crazy things happen when you enter the real world. I remember back when I was in electronics assembly. One would first assume all the solder would wick onto the metal, but the boards would always have tonnes of solder bridges, and we had to carefully examine every component and correct them. Friggin' microprocessors had countless tiny legs too!
      
      Parent Share
      twitter facebook
    - Re: (Score:2)
      
      by Frozen Void ( 831218 ) writes:
      
      Its for debugging.The code works on release versions(if (1)),but for debugging people need the ability to turn on/off certain parts of code(if (!1)).
- Re: (Score:2)
  
  by Beardo the Bearded ( 321478 ) writes:
  
  That's entirely incorrect.
  Computers are predictably deterministic -- the problem is that the number of variables used is neither known nor accounted for.
  Most code is crap, because most code isn't important. The stuff that is important is written to specific acceptable levels of error. The problem is when you get alphabet-soup diploma holders getting a little experience at a random startup then going off to write vital code. Then you get problems because you continue bad practices. The venerable K&R C bi
- Re: (Score:2)
  
  by w0mprat ( 1317953 ) writes:
  
  The underlying assumption made at every point in hardware and software development is that computers are deterministic.
- Reminds me of an issue... (Score:3, Interesting)
  
  by GameboyRMH ( 1153867 ) writes:
  
  ...I had with an Asterisk VOIP server. Under certain conditions, calls transferred from one of two receptionist's phones were bouncing back and ending up at the wrong voicemail. Since only two phones had a problem I suspected it was something specific to these phones. After checking the configuration and even hardware on the phones, I checked the server. I narrowed the problem down to one macro (a macro in asterisk is basically a user-defined function) that allows a "fallback line" to ring if the first is b
- - Re: (Score:3, Funny)
    
    by Lost Race ( 681080 ) writes:
    
    ... spherical frictionless inelastic computer at 0 Kelvin ...
who would've guessed... (Score:4, Insightful)

by Eto_Demerzel79 ( 1011949 ) writes: on Monday December 22, 2008 @10:03PM (#26207755)

...programs not designed for multi-core systems don't use them efficiently.

Share
twitter facebook
- Re:who would've guessed... (Score:5, Insightful)
  
  by timeOday ( 582209 ) writes: on Tuesday December 23, 2008 @12:07AM (#26208439)
  
  No, the programs are not the problem. The programmer should not have to worry about manually assigning processes to cores or switching a process from one core to another - in fact, there's no way the programmer could do that, since it would require knowing what the system load is, what other programs are running, and physical details (such as cache behavior) of processors not even invented yet. This is all the job of the OS.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by PhrostyMcByte ( 589271 ) writes:
    
    The OS can only do so much. Most programs have downright horrible scaling on just 4 cores, let alone the 64 cores of 5 years from now. If you want to be scalable, you need to learn how to do it and design your app for it from the start.
  - Re: (Score:2)
    
    by Splab ( 574204 ) writes:
    
    Actually it is the job of the programmer to make sure his program is cache friendly, that should work on all architectures.
    Also you should in a multi-core/-CPU environment make sure data needed is close to where you are, that means fetching it from whatever storage it is in (ram, hdd, other core) as early as possible and non-blocking if possible so you can complete other tasks while waiting.
    While the OS can help you with some tasks, there is no way for the OS to know what data you need next, so if you want
multicore dev is fun... much like prison rape! (Score:4, Interesting)

by Shadowruni ( 929010 ) writes: on Monday December 22, 2008 @10:12PM (#26207809) Journal

The current state of dev reminds me sort of the issues that Nintendo had with the N64.... a beautiful piece of hardware with (at the time) a God-like amount of raw power, but *REALLY* hard to code for. Hence the really interesting titles for it either came from Rare who developed on SGI machines (a R10000 drive that beast) or Nintendo, who built the thing.
/yeah yeah, I know the PS1 and Sega Saturn had optical media and that the media's storage capacity which lead to better and more complex were truly what killed the N64.
//bonus capt was arrestor

Share
twitter facebook
- Re: (Score:2)
  
  by aliquis ( 678370 ) writes:
  
  Could you point me at some direction for more information about the problems of developing for the N64? I knew developers didn't liked the Sega Saturn or whatever it was which had multiple cores but I don't remember reading anything about N64.
  - Re: (Score:2, Informative)
    
    by Ironchew ( 1069966 ) writes:
    
    http://en.wikipedia.org/wiki/N64#Programming_difficulties [wikipedia.org]
    The amount of video memory for textures was way too small.
- Re:multicore dev is fun... much like prison rape! (Score:5, Interesting)
  
  by carlzum ( 832868 ) writes: on Monday December 22, 2008 @10:51PM (#26208049)
  
  I believe the biggest problem with multi-core development is a lack of maturity in the tools and libraries available. Taking advantage of multiple cores requires a lot of thread management code, which is great for highly optimized applications but deters run-of-the-mill business and user app developers. There was a recent opinion piece [ddj.com] in Dr Dobbs discussing the benefits a concurrency platforms I found interesting. The article is clearly promoting the author's company (Clik Arts), but I agree with his argument that the complexities of multi-core development need to be handled in a framework and not applications.
  
  Parent Share
  twitter facebook
  - Yup (Score:2)
    
    by coryking ( 104614 ) writes:
    
    The libraries and the languages currently make threading harder then it needs to be.
    How about a "parallel foreach(Thing in Things)" ?
    I realize there are locking issues and race conditions, but really I think the languages could go a some ways to making things like this more hidden. Oh wait, does that mean I'm advocating for making programming languages more user friendly? I guess so. You know why people use Ruby, C# or Java? Cause those are way more user friendly than C++ or COBOL.
    The usability of a pro
    - Re:Yup (Score:4, Informative)
      
      by cetialphav ( 246516 ) writes: on Tuesday December 23, 2008 @12:48AM (#26208741)
      
      How about a "parallel foreach(Thing in Things)" ?
      That is easy. If your application can be parallelized that easily, then it is considered embarrassingly parallel. OpenMP exists today and does just this. All you have to do (in C) is add a "#pragma" above the for loop and you have a parallel program. OpenMP is commonly available on all major platforms.
      The real problem is that most desktop applications just don't lend themselves to this type of parallelism and so the threads have lots of data sharing. This data sharing causes the problem because the programmer must carefully use synchronization primitives to prevent race conditions. Since the programmer is using parallelism to boost performance, they only want to introduce synchronization when they absolutely have to. When in doubt, they leave it out. Since it is damn near impossible to test the code for race conditions, they have no indication when they have subtle errors. This is what makes concurrent programming so difficult. One researcher says that using threads makes programs "wildly nondeterministic".
      It is hard to blame the programmers for being aggressive in seeking performance gains because Amdahl's Law [wikipedia.org] is a real killer. If you have 90% of the program parallelized, the theoretical maximum performance gain is 10X no matter how many cores you can throw at the problem.
      
      Parent Share
      twitter facebook
      - But that is ugly (Score:2)
        
        by coryking ( 104614 ) * writes:
        
        And OpenMP isn't "standard" as far as I'm concerned. Plus it makes you think about threading and it only works in low-level languages like C.
        I'm talking about this highly useful code (which is written in a bastardized version of C#, Perl and Javascript for your reading pleasure):
        List pimpScores = PimpList.ThreadedMap(function(aPimp){ # score how worthy this guy is at pimpin' if(aPimp.Hoes > 10) { return String.Format("Damn brot
    - Re: (Score:2)
      
      by gfody ( 514448 ) writes:
      
      what you're asking for is pretty much already that easy
      foreach(Thing in Things)
      new Thread(Thing.DoStuff);
      - Close (Score:3, Interesting)
        
        by coryking ( 104614 ) * writes:
        
        But you have to think about it too much.
        How about:
        Things.ParallelEach(function(thing){ Console.Write("{0} is cool, but in parallel", thing); # serious business goes here });
        There are lots of stupid loop structures that are used in desktop apps that are just begging to be run in parallel, but the current crop of languages dont make it braindead easy to do so. Make it so every loop structure has a trivial and non ugly (OpenMP pragmas) way of doing it.
        Also, IMHO, not enough languages do stuff lik
        
        Re: (Score:2)
        
        by Marillion ( 33728 ) writes:
        
        I agree. I've ranted about this before. 99% of languages implement multi-threading through function calls. Class method calls, in this case, are merely glorified function calls. Multi-threading should be handled at the same level as other flow control statements because that's what is most like.
    - - Only less ugly :-) (Score:2)
        
        by coryking ( 104614 ) * writes:
        
        And for those who say "what what about all the weird race conditions and stuff". I'm not a computer science major, so I'm jumping off an edge asking this, but what if we actually use some of this new CPU power in our IDEs and our JIT compilers, couldn't our languages watch out for most of the nasty ways we can shoot ourselves in the food? Like if I do a Array.ThreadedEach(function(element){}) and I'm changing some shared data, couldn't the compiler or IDE let me know at compile time or while I'm writing t
        
        Re: (Score:2)
        
        by xenocide2 ( 231786 ) writes:
        
        There's something called Turing completeness that blows "solve it with smarter compilers" idea out of the water in the general sense (even though it might work 95 percent of the time).
        Threaded stuff isn't super hard. Getting threaded stuff to run FAST is hard. There's a billion tradeoffs handled by what are traditionally different parts of the system. In your silly parallelize loops idea (aka MapReduce) the challenge is clear (How many items do you need before setting up parallelization is worth the extra c
- Re: (Score:2, Offtopic)
  
  by Fallingcow ( 213461 ) writes:
  
  The N64 was killed?
  Best "party game" system of that generation, easily.
  4 controller capability out of the box, 007 Goldeneye, Perfect Dark, Mario Kart, all the good wrestling games (hey, they were fun at the time...) etc.
  The PS1 was only good for racing games and RPGs, IMO. Oh, and Bushido Blade 1 and 2.
  Kind of like the Wii vs. 360/PS3. Any time we plug in a PS3 at a get-together, it's to ooh and ah over the graphics and maybe take turns playing the single player mode of a cool game (Need for Speed or som
Linux and Windows (Score:4, Insightful)

by WarJolt ( 990309 ) writes: on Monday December 22, 2008 @10:29PM (#26207923)

I don't know if Linux or Windows has an automatic mechanism to schedule task priority based on processor caches, but the study didn't even mention Windows. Seeing that the scheduling and managing the caches are OS problems this seems kind of important.
The other thing that seems odd is they were using a 2.6.18 Kernel and in 2.6.23 they added the Completely Fair Scheduler which could potentially change their results. It doesn't seem logical to base a cutting edge study on stuff that was released years ago.

Share
twitter facebook
- Re: (Score:2)
  
  by Anthony_Cargile ( 1336739 ) writes:
  
  I agree, and seeing this in the standard C/C++ libraries down the road would be nice. I would say Java would have framework-esque multicore support first, but then again Sun is in trouble and Java is just now getting video and 64-bit support. I don't use .NET enough to know, but it would be interesting to know if .NET has decent native multicore support and if Mono implements it correctly, although this all depends on MSIL versioning/limitations I'm sure.
  
  In a nutshell, we need more portable multicore sol
- Re: (Score:2)
  
  by nategoose ( 1004564 ) writes:
  
  Last time I read anything about it (which was years ago) the Linux cache aware scheduling consisted of trying to get task scheduled on the same processor as they were scheduled on previously. This works well for a lot of things, but you lose a lot of benefit when multiple simultaneous tasks are working on the same data since those tasks would be spread across the processors to take advantage of concurrency.
  This is just an engineering trade off.
- Re: (Score:2)
  
  by nabsltd ( 1313397 ) writes:
  
  I don't know if Linux or Windows has an automatic mechanism to schedule task priority based on processor caches, but the study didn't even mention Windows. Seeing that the scheduling and managing the caches are OS problems this seems kind of important.
  I'm not sure why this article isn't tagged "duh".
  It's pretty obvious from looking at the CPU graphs of my VMware ESX servers that their code does some optimization to keep processes on the same core, or at the very least on the same CPU.
  This data is from a dual-socket quad-core AMD (8 total cores), which means a NUMA [wikipedia.org] architecture, so running the code on the same CPU means you have faster memory access.
  So, some commercial code that has been around for nearly 4 years takes advantage of the "discoveries" in an
  - Re:Linux and Windows (Score:4, Informative)
    
    by swb ( 14022 ) writes: on Monday December 22, 2008 @11:57PM (#26208391)
    
    They mentioned this in an ESX class I took. I seem to remember it in the context of setting a processor affinity or creating multi-CPU VMs and how either the hypervisor was smarter than you (eg, don't affinity) or that multi-CPU VMs could actually slow other VMs because the hypervisor would try to keep multi-CPU VMs on the same socket, thus deny execution priority to other VMs (eg, don't assign SMP VMs because you can unless you have the CPU workload).
    
    Parent Share
    twitter facebook
Linux schedules better than this (Score:4, Informative)

by bluefoxlucid ( 723572 ) writes: on Monday December 22, 2008 @10:30PM (#26207927) Journal

Last I checked, Linux was smart enough to try to keep programs running on cores where cache contained the needed data.

Share
twitter facebook
- Re: (Score:3, Interesting)
  
  by HRbnjR ( 12398 ) writes:
  
  Helpful reading list :)
  http://www.google.com/search?q=linux+%22scheduler+domain%22+%22multi+core%22 [google.com]
- Re:Linux schedules better than this (Score:5, Interesting)
  
  by nullchar ( 446050 ) writes: on Monday December 22, 2008 @10:41PM (#26207977)
  
  Possibly... but it appears an SMP kernel treats each core as a separate physical processor.
  Take an Intel Core2 Quad machine and start a process that takes 100% of one CPU. Then watch top/htop/gnome-system-monitor/etc where you can watch the process hop around all four cores. It makes sense that the process might hop between two cores -- the two that share L2 cache -- but all four cores doesn't make sense to me. Seems like the L2 cache is wasted when migrating between each core2 package.
  
  Parent Share
  twitter facebook
  - Re:Linux schedules better than this (Score:4, Interesting)
    
    by Krishnoid ( 984597 ) * writes: on Monday December 22, 2008 @11:08PM (#26208169) Journal
    
    Wasn't there an article recently about this describing that if only one core was working at peak capacity that the die would heat unevenly, causing problems?
    
    Parent Share
    twitter facebook
    - Re: (Score:3, Informative)
      
      by PitaBred ( 632671 ) writes:
      
      I thought that Intel specifically did that, that if one core were loaded it would overclock that core and downclock the others to get a speed boost...
      Yup, I thought I remembered correctly [tomshardware.com].
  - Re: (Score:2)
    
    by ILongForDarkness ( 1134931 ) writes:
    
    My understanding is that was one of the features of the Xeon chips and presumably got transferred over to the core 2 world. The idea is that the work load gets moved around to distribute the heat better on the die. More even heat leads to more efficient cooling.
    You have a point when it comes to cache locality. It can be somewhat mitigated by smart timing of the core switching. For example long time on each core (as you probably would notice with your system monitor), or doing something like switching on e
  - Re: (Score:2)
    
    by slash.duncan ( 1103465 ) writes:
    
    How do you have your SMP configured, and do you have NUMA and is it enabled?
    I'm running a now older 2xx series dual Opteron 290, so dual sockets, dual-cores each, physically configured with four gig memory hanging off each one. The AMD 8xxx chipset has the rest of the system (all the PCI-X channels and AGP, it's pre-PCI-E) hanging off socket-0. In the kernel, I have SMP set, SMT (multi-thread, this would be closer affinity than multi-core but of course the AMDs don't use it) unset, SCHED_MC (multi-core, l
  - - Re: (Score:2)
      
      by RAMMS+EIN ( 578166 ) writes:
      
      ``And Motorola is probably still at 500MHz.''
      Actually, they gave up on the desktop CPU market. They spun off their chip division into Freescale Semiconductor [freescale.com], which now makes embedded processors.
- Re: (Score:2)
  
  by Anthony_Cargile ( 1336739 ) writes:
  
  The article uses a kernel version that predates the completely fair scheduler, that would be why. If they aim to test something like this, they need to test the most recent version.
  - Re: (Score:2)
    
    by bluefoxlucid ( 723572 ) writes:
    
    Your PAUSE() function will spin indefinitely instead of continuing.
    - Re: (Score:2)
      
      by Anthony_Cargile ( 1336739 ) writes:
      
      Exactly. If Slashdot gave me more room, I would have put the rest of the joke on there:
      
      void PAUSE(){ printf("\nPress any key to continue. . ."); while(1) getch(); } // Enforce the 'any' key
      
      Whats even worse is that this line of code was used in a fake cmd.exe [anthonycargile.info] I made for a prank on my friend's computer. Tricky to install due to having to point the COMSPEC env. variable to a backed up version of the real cmd.exe and tinkering with the dllcache directory, but it was priceless to see his reaction to the fa
      - Re: (Score:2)
        
        by bluefoxlucid ( 723572 ) writes:
        
        You should have added "Guru Meditation" to the stop error ;)
- Re: (Score:2)
  
  by timeOday ( 582209 ) writes:
  
  Last I checked, Linux was smart enough to try to keep programs running on cores where cache contained the needed data.
  As if simply giving each process affinity for a given core solves the problem. But then you have interrupt handling, job loads with more than one process per core, multi-threaded programs - all sharing memory space yet with different memory access patterns - and different processors with e.g. different cache architectures. The task-switching OS is 50 years old and we still haven't settle
NUMA NUMA (Score:4, Informative)

by Gothmolly ( 148874 ) writes: on Monday December 22, 2008 @10:41PM (#26207979)

Linux can already deal with scheduling tasks to processors where the necessary resources are "close". It may not be obvious to the likes of PC Magazine, but its trivially obvious that even multithreaded programs running on a non-location aware kernel are going to take a hit. This is a kernel problem, not an application library problem.

Share
twitter facebook
This isn't news (Score:5, Informative)

by nettablepc ( 1437219 ) writes: on Monday December 22, 2008 @10:43PM (#26207989)

Anyone who has been doing performance work should have known this. The tools to adjust things like core affinity and where interrupts are handled have been available in Linux and Windows for a long time. These effects were present in 1980s mainframes. DUH.

Share
twitter facebook
- Re:This isn't news (Score:5, Insightful)
  
  by Clover_Kicker ( 20761 ) writes: <clover_kicker@yahoo.com> on Monday December 22, 2008 @10:58PM (#26208095)
  
  80s mainframe tech is NEW and EXCITING to a depressing number of tech people, look at how excited everyone got when someone remembered and re-implemented virtualization.
  
  Parent Share
  twitter facebook
- Re: (Score:2)
  
  by nullchar ( 446050 ) writes:
  
  I don't know if they've been in the default kernel for "a long time", but they are there now.
  read: http://www.alexandersandler.net/smp-affinity-and-proper-interrupt-handling-in-linux [alexandersandler.net]
- - Re: (Score:2)
    
    by ion.simon.c ( 1183967 ) writes:
    
    Meh.
    That doesn't excuse the *rest* of the entire industry forgetting *everything* the mainframe folks learned. :/
it's the affinity (Score:2, Informative)

by non-e-moose ( 994576 ) writes:

It's just an Insel Intide thing. DAAMIT processors are more predictable. Or not. If you don't use numactl (1) to force socket (and memory) affinity, you get exactly what you ask for (randomly selected sockets, and unpredictable performance)
not a surprise (Score:5, Insightful)

by Eil ( 82413 ) writes: on Monday December 22, 2008 @11:00PM (#26208119) Homepage Journal

Here's an exercise: Take 2 brand-new systems with identical configurations and start them at the same time doing some job that takes a few hours and utilizes most of the hardware to some significant degree. Say, compiling some huge piece of code like KDE or OpenOffice. System administrators who do exactly this will tell you that you'll almost never see the two machines complete the job at precisely the same time. Even though the CPU, memory, hard drive, motherboard, and everything else is the same, the system as a whole is so complex that minute differences in timing somewhere compound into larger ones. Sometimes you can even reboot them and repeat the experiment and the results will have reversed. It shouldn't come as a surprise that adding more complexity (in the form of processor cores) would enhance the effect.

Share
twitter facebook
- Re:not a surprise (Score:5, Interesting)
  
  by im_thatoneguy ( 819432 ) writes: on Tuesday December 23, 2008 @12:12AM (#26208459)
  
  We have this problem at work.
  We have a render farm of 16 machines. 12 of them are effectively identical but despite all of our coaxing one of them always runs about 30% slower. It's maddening. But "What can you do?". Hardware is the same. We Ghost the systems so the boot data is exactly the same... and yet... slowness. It's just a handicapped system.
  
  Parent Share
  twitter facebook
  - Re: (Score:2)
    
    by visualight ( 468005 ) writes:
    
    Move processors around so you get a different boot proc, if you haven't tried that already.
    - Re: (Score:2)
      
      by Ethanol-fueled ( 1125189 ) * writes:
      
      Ahh, the trusty ol' cycle 'n' swap. It's funny how complex problems often have simple fixes. Kinda like how the car won't start unless you kick the fender before you turn the crank.
      
      Some people put together servers all day that way: swapping a bunch of intermittent crap in and out until the box runs long enough to install the OS :)
  - Re: (Score:2)
    
    by TheRaven64 ( 641858 ) writes:
    
    I encountered a similar issue on a cluster I used. One or two of the machines would suddenly become very slow. It turned out that the fans had partially failed. When the CPU got hot, it would be throttled back, without leaving anything in the error log. The technicians didn't expect this - with the old cluster CPUs that got too hot just failed and were replaced - but eventually tracked it down. You might want to check that the air flow around the slow machine is adequate. Recent Intel chips (and, I th
  - Re: (Score:2)
    
    by PitaBred ( 632671 ) writes:
    
    Check your power supply... that's almost always been the cause of any "weird" errors I've gotten. Jitter in power causes all kinds of fun, unpredictable stuff to happen.
Well known problem (Score:4, Insightful)

by sjames ( 1099 ) writes: on Tuesday December 23, 2008 @12:01AM (#26208411) Homepage Journal

The problem is a complex one. Every possible scheduling decision has pluses and minuses. For example, keeping a process on the same core for each timeslice maximizes cache hits, but can lose if it means the process has to wait TOO long for it's next slice. Likewise, if a process must wait for something, should it yield to another process or busy wait. SHould interrupts be balanced over CPUs or should one CPU handle them?
A lot of work has gone in to those questions in the Linux scheduler. For all of that, the scheduler only knows so much about a given app and if it takes TOO long to 'think' about it, it negates the benefits of a better decision.
For special cases where you're quite sure you know more than the scheduler about your app, you can use the isolcpus kernel parameter to reserve CPUS to run only the apps you explicitly assign to them.
You can also decide which CPU any given IRQ can be handled by (but not which core within a CPU as far as I know) wilt /proc/irq/*/smp_affinity.
Unless your system is dedicated to a single application and you understand it quite well, the most likely result of screwing with all of that is overall loss of performance.

Share
twitter facebook
- What if... (Score:2)
  
  by raftpeople ( 844215 ) writes:
  
  We added 4 more cores to perform this "thinking" about which core the process should run on, we should be able to get back that 10% we lost, right?
- Re: (Score:2)
  
  by little1973 ( 467075 ) writes:
  
  "You can also decide which CPU any given IRQ can be handled by (but not which core within a CPU as far as I know)"
  With the usage of IOAPIC you can redirect the IRQ to any cores. We have a in-house-developed commercial OS for telephony applications and we use the IOAPIC with a simple round-robin fashion. I do not know why linux does not do this.
This isn't hardware (Score:3, Informative)

by multimediavt ( 965608 ) writes: on Tuesday December 23, 2008 @02:19AM (#26209227)

Why is this article labeled as hardware? Sure they talk about different procs being ... well, different. Duh! The article is about the software Tom and others developed to run processes more efficiently in a multi-core (an possibly heterogenous) environment. Big energy savings as well as performance boost. Green computing. HELLO! Did you read page two?

Share
twitter facebook
- Re: (Score:2)
  
  by Shikaku ( 1129753 ) writes:
  
  Did you read page two?
  This is Slashdot
Did anyone else notice. . . (Score:2)

by MagusSlurpy ( 592575 ) writes:

. . .the tag "bang news" on a story involving researchers from Virginia Tech?
- Re: (Score:2)
  
  by aliquis ( 678370 ) writes:
  
  And this is useful info because?
  Isn't most of the point of using -j parameter that your machine can carry on compiling something else while whatever it did earlier get the resources it needed from disk or similar? Will it really help out with cache usage?
  Should more processes mean better or worse cache performance? Worse because cache is shared between them, better because if something is missing some other instruction can be done while the needed data is fetched from RAM?
- Re: (Score:2)
  
  by Anthony_Cargile ( 1336739 ) writes:
  
  I believe this allows make to make use of several cores, not the actual application being compiled. More specifically, -j means "jobs" and therefore not necessarily "cores" per se, but you could always manually tweak the affinity yourself if you're compiling something absolutely huge.
  - Re: (Score:3, Funny)
    
    by bob.appleyard ( 1030756 ) writes:
    
    It's OK. This is a Gentoo user. Getting make to work on multicore well has a significant impact on the usability of his computer.

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

unsurprising. (Score:5, Interesting)

Re:unsurprising. (Score:5, Funny)

Re:unsurprising. (Score:5, Funny)

Re:unsurprising. (Score:5, Funny)

Re:unsurprising. (Score:5, Funny)

Re: (Score:2, Funny)

Re:unsurprising. (Score:5, Funny)

Re: (Score:3, Funny)

Re: (Score:3, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Funny)

Re: (Score:3, Insightful)

Re:unsurprising. (Score:5, Interesting)

Re:unsurprising. (Score:5, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2, Insightful)

Reporter bias (Score:2, Insightful)

Re: (Score:3, Interesting)

Re: (Score:3, Interesting)

Re: (Score:3, Interesting)

Re: (Score:3, Informative)

Re: (Score:2)

Re: (Score:3, Interesting)

Re: (Score:2)

Re: (Score:2)

Re:unsurprising. (Score:5, Interesting)

Re: (Score:2, Insightful)

Re:unsurprising. (Score:4, Informative)

Re: (Score:2)

Re: (Score:2)

FPGA programming (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:unsurprising. (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Reminds me of an issue... (Score:3, Interesting)

Re: (Score:3, Funny)

who would've guessed... (Score:4, Insightful)

Re:who would've guessed... (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

multicore dev is fun... much like prison rape! (Score:4, Interesting)

Re: (Score:2)

Re: (Score:2, Informative)

Re:multicore dev is fun... much like prison rape! (Score:5, Interesting)

Yup (Score:2)

Re:Yup (Score:4, Informative)

But that is ugly (Score:2)

Re: (Score:2)

Close (Score:3, Interesting)

Re: (Score:2)

Only less ugly :-) (Score:2)

Re: (Score:2)

Re: (Score:2, Offtopic)

Linux and Windows (Score:4, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re:Linux and Windows (Score:4, Informative)

Linux schedules better than this (Score:4, Informative)

Re: (Score:3, Interesting)

Re:Linux schedules better than this (Score:5, Interesting)

Re:Linux schedules better than this (Score:4, Interesting)

Re: (Score:3, Informative)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

NUMA NUMA (Score:4, Informative)