
Not All Cores Are Created Equal

Posted by kdawson
from the working-out-the-kinks dept.
joabj writes "Virginia Tech researchers have found that the performance of programs running on multicore processors can vary from server to server, and even from core to core. Factors such as which core handles interrupts, or which cache holds the needed data, can change from run to run. Such resources tend to be allocated arbitrarily today. As a result, program execution times can vary by up to 10 percent. The good news is that the VT researchers are working on a library that will recognize inefficient behavior and rearrange resources more sensibly." Here is the paper, Asymmetric Interactions in Symmetric Multicore Systems: Analysis, Enhancements and Evaluation (PDF).
  • unsurprising. (Score:5, Interesting)

    by Anonymous Coward on Monday December 22, 2008 @10:03PM (#26207751)

    Anyone who thinks computers are predictably deterministic hasn't used a computer. There are so many bugs in hardware and software that cause it to behave differently than expected, documented, designed. Add to that inevitable manufacturing defects, no matter how microscopic, and it's unimaginable to find otherwise.

    It's like discovering "no two toasters toast the same. Researchers found some toasters browned toast up to 10% faster than others."

  • by Shadowruni (929010) on Monday December 22, 2008 @10:12PM (#26207809) Journal
    The current state of multicore dev reminds me a bit of the issues that Nintendo had with the N64.... a beautiful piece of hardware with (at the time) a God-like amount of raw power, but *REALLY* hard to code for. Hence the really interesting titles for it came either from Rare, who developed on SGI machines (an R10000 drove that beast), or from Nintendo, who built the thing.

    /yeah yeah, I know the PS1 and Sega Saturn had optical media, and that the storage capacity which led to better and more complex games was truly what killed the N64.

    //bonus capt was arrestor

  • Re:unsurprising. (Score:3, Interesting)

    by $RANDOMLUSER (804576) on Monday December 22, 2008 @10:18PM (#26207861)
    I remember HP-UX on PA-RISC from at least ten years ago making efforts to reassign a swapped out process to the processor that it had been running on before it was swapped out, on the notion that some code and data might still be in the cache. SMP makes for some interesting OS problems.
  • by nullchar (446050) on Monday December 22, 2008 @10:41PM (#26207977)

    Possibly... but it appears an SMP kernel treats each core as a separate physical processor.

    Take an Intel Core2 Quad machine and start a process that takes 100% of one CPU. Then watch top/htop/gnome-system-monitor/etc., where you can see the process hop around all four cores. It makes sense that the process might hop between two cores -- the two that share L2 cache -- but hopping across all four doesn't make sense to me. Seems like the L2 cache is wasted when the process migrates between the two dies in the package.
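
    The hopping described above can be suppressed from user space by pinning the process to one core. A minimal sketch, assuming a Linux box and Python (neither of which the comment specifies; `pin_to_core` is a hypothetical helper):

    ```python
    import os

    def pin_to_core(core):
        """Pin the calling process to a single core so the scheduler
        stops migrating it between packages (Linux-only)."""
        os.sched_setaffinity(0, {core})   # 0 means "the current process"
        return os.sched_getaffinity(0)    # the affinity mask now in effect
    ```

    With the process pinned, top should show it staying put on one core instead of wandering across all four.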

  • by carlzum (832868) on Monday December 22, 2008 @10:51PM (#26208049)
    I believe the biggest problem with multi-core development is a lack of maturity in the tools and libraries available. Taking advantage of multiple cores requires a lot of thread management code, which is fine for highly optimized applications but deters run-of-the-mill business and user app developers. There was a recent opinion piece [ddj.com] in Dr. Dobb's discussing the benefits of concurrency platforms that I found interesting. The article is clearly promoting the author's company (Cilk Arts), but I agree with his argument that the complexities of multi-core development need to be handled in a framework, not in each application.
  • by Krishnoid (984597) * on Monday December 22, 2008 @11:08PM (#26208169) Journal
    Wasn't there an article recently describing how, if only one core is working at peak capacity, the die heats unevenly, causing problems?
  • Re:unsurprising. (Score:5, Interesting)

    by aaron alderman (1136207) on Monday December 22, 2008 @11:53PM (#26208369) Homepage
    Impossible, like "xor eax, eax" returning a non-zero value and crashing Windows? [msdn.com]
  • Re:unsurprising. (Score:5, Interesting)

    by zappepcs (820751) on Tuesday December 23, 2008 @12:06AM (#26208433) Journal

    Actually (sorry, no link), there was a researcher who was using FPGAs and AI code to create simple circuits, with the goal of having the AI design them. What he found is that, due to minor manufacturing defects, the design the AI built was dependent on the FPGA it was tested on and would not work on just any FPGA of that specification. After 600 iterations, you'd think it would be good. One experiment ran for a long time, and when he analyzed the AI-generated design at the end, there were 5 paths/circuits inside that did nothing. If he disabled any of the 5, the overall design failed. Somehow, the AI found that creating these do-nothing loops/circuits caused a favorable behavior in other parts of the FPGA that made for overall success. Naturally, that design would not work on any other FPGA of the specified type. It was an interesting read, sorry that I don't have a link.

  • Re:not a surprise (Score:5, Interesting)

    by im_thatoneguy (819432) on Tuesday December 23, 2008 @12:12AM (#26208459)

    We have this problem at work.

    We have a render farm of 16 machines. 12 of them are effectively identical, but despite all of our coaxing, one of them always runs about 30% slower. It's maddening. But what can you do? The hardware is the same. We Ghost the systems so the boot data is exactly the same... and yet... slowness. It's just a handicapped system.

  • Close (Score:3, Interesting)

    by coryking (104614) * on Tuesday December 23, 2008 @01:48AM (#26209077) Homepage Journal

    But you have to think about it too much.

    How about:


    Things.ParallelEach(function(thing) {
        Console.Write("{0} is cool, but in parallel", thing);
        // serious business goes here
    });

    There are lots of stupid loop structures in desktop apps that are just begging to be run in parallel, but the current crop of languages don't make it braindead easy to do so. Make it so every loop structure has a trivial and non-ugly (unlike OpenMP pragmas) way of doing it.

    Also, IMHO, not enough languages do stuff like the Javascript Array.Each(function(element){}). Am I blind, or is this construct missing from C#?
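
    For illustration, the kind of construct the comment wishes for is only a few lines in, say, Python (`parallel_each` is a hypothetical helper name; a thread pool is used for brevity):

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def parallel_each(items, fn, workers=4):
        # Apply fn to every item concurrently, preserving input order --
        # a sketch of the trivial parallel loop the comment asks for.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fn, items))

    parallel_each(["foo", "bar"], lambda t: "%s is cool, but in parallel" % t)
    ```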

  • Re:unsurprising. (Score:3, Interesting)

    by Mr2cents (323101) on Tuesday December 23, 2008 @04:56AM (#26209713)

    There is a very interesting channel on YouTube called googletechtalks. There, you can find a lecture called "We have it easy, but do we have it right?" about performance measurement that really made me worry. Basically, you can't just easily compare performance by measuring CPU time, because there are a lot of factors that determine performance. E.g., adding an environment variable before running a program can cause page alignments to change (even if the environment variable isn't used by the program), changing the performance dramatically in some cases. The same goes for changing the link order: performance can change by 20%.

    So much for determinism.

    http://www.youtube.com/watch?v=DKVRkfXrBpg [youtube.com]
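
    The run-to-run noise the lecture describes is easy to observe for yourself. A minimal sketch (plain wall-clock timing, not the lecture's methodology): time an identical workload several times and the per-run numbers will rarely agree.

    ```python
    import time

    def time_runs(fn, runs=5):
        """Time the same function several times; the per-run results
        routinely differ even though the work is identical."""
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            fn()
            times.append(time.perf_counter() - start)
        return times

    samples = time_runs(lambda: sum(i * i for i in range(100_000)))
    ```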

  • Re:unsurprising. (Score:3, Interesting)

    by TheRaven64 (641858) on Tuesday December 23, 2008 @09:02AM (#26210575) Journal
    Yup, I found some interesting effects of this when doing my PhD. I tweaked my supervisor's code to add an abstraction layer in the middle before making changes, and found that this actually made things faster, even though it was doing more work (it was only meant to make things faster when I wrote something else on the other side of the abstraction layer). It was an entirely deterministic improvement though, even with different data sets, so most likely due to better instruction cache layout with the new code.
  • Re:unsurprising. (Score:3, Interesting)

    by TheRaven64 (641858) on Tuesday December 23, 2008 @09:07AM (#26210605) Journal

    Processor affinity is even harder on modern CPUs. You often have 2 or so contexts sharing execution units and L1 cache in a core, then a few cores sharing L2 cache in a chip. Deciding whether to move a process is tricky. There's a penalty for moving, because you increase the cache misses proportionally to the distance you move it (if you move it to a context that shares the same L1 cache, it's not as bad as if you move it to one that shares only the L2 cache, for example), but there's also a cost for not moving it if a lot of processes on a single context are suddenly doing a lot of work while those on another core are idle.

    Cache isn't the only problem though - with something like the AMD architecture, each chip has its own local memory, so if you allocate memory on one chip's RAM and then migrate the process to a different chip, memory accesses become slower (and accesses on the other chip slow down too, since its memory controller has to interleave remote requests with local ones).
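
    The "distance" argument above can be made concrete with a toy model. The topology below (2 contexts per core, 2 cores per L2, 2 L2s per socket) is a hypothetical layout for illustration, not any particular CPU:

    ```python
    def migration_penalty(src, dst, ctx_per_core=2, cores_per_l2=2, l2_per_socket=2):
        """Classify the cost of moving a process between hardware contexts:
        the farther the move in the cache hierarchy, the more state to refill."""
        if src == dst:
            return "none"            # same context: nothing to refill
        if src // ctx_per_core == dst // ctx_per_core:
            return "shares-l1"       # same core: L1 and L2 both stay warm
        per_l2 = ctx_per_core * cores_per_l2
        if src // per_l2 == dst // per_l2:
            return "shares-l2"       # same L2: only L1 must be refilled
        per_socket = per_l2 * l2_per_socket
        if src // per_socket == dst // per_socket:
            return "same-socket"     # refill L1 and L2, but memory stays local
        return "remote-memory"       # cross-socket: NUMA accesses get slower
    ```

    A scheduler weighing a migration against load imbalance would compare this penalty with the cost of leaving a busy context overloaded while another sits idle.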

  • by GameboyRMH (1153867) <[moc.liamg] [ta] [hmryobemag]> on Tuesday December 23, 2008 @03:41PM (#26215137) Journal

    ...I had with an Asterisk VOIP server. Under certain conditions, calls transferred from one of two receptionists' phones were bouncing back and ending up at the wrong voicemail. Since only two phones had the problem, I suspected it was something specific to those phones. After checking the configuration and even the hardware on the phones, I checked the server. I narrowed the problem down to one macro (a macro in Asterisk is basically a user-defined function) that allows a "fallback line" to ring if the first is busy; it seemed to be getting an argument for this line when there should have been none.

    Soon it became evident that the variable was changing "mid-macro", apparently out of nowhere (there are variables with special names that macros use to receive arguments; nowhere was this variable changed, and the macro's less than 30 lines long). I eventually got so frustrated I put debugging lines between every single line of the macro to make it print the variables to the output log. That narrowed it down to one line - one where a Dial() command is executed (this is the function that actually places the call; it isn't supposed to even be able to change anything in the macro that called it, and there are no other problems like this). Now that had me totally stumped. I could demonstrate exactly what was happening, but I couldn't figure out why. Stranger still, the results changed slightly with the debugging lines in place, as if it's a race condition of some sort.

    The problem still exists to this day :(
