Intel AMD Hardware

Not All Cores Are Created Equal

Posted by kdawson
from the working-out-the-kinks dept.
joabj writes "Virginia Tech researchers have found that the performance of programs running on multicore processors can vary from server to server, and even from core to core. Factors such as which core handles interrupts, or which cache holds the needed data can change from run to run. Such resources tend to be allocated arbitrarily now. As a result, program execution times can vary up to 10 percent. The good news is that the VT researchers are working on a library that will recognize inefficient behavior and rearrange things in a more timely fashion." Here is the paper, Asymmetric Interactions in Symmetric Multicore Systems: Analysis, Enhancements and Evaluation (PDF).

  • by bluefoxlucid (723572) on Monday December 22, 2008 @10:30PM (#26207927) Journal
    Last I checked, Linux was smart enough to try to keep programs running on cores where cache contained the needed data.
  • by Ironchew (1069966) on Monday December 22, 2008 @10:40PM (#26207967)

    http://en.wikipedia.org/wiki/N64#Programming_difficulties [wikipedia.org]
    The amount of video memory for textures was way too small.

  • NUMA NUMA (Score:4, Informative)

    by Gothmolly (148874) on Monday December 22, 2008 @10:41PM (#26207979)

    Linux can already deal with scheduling tasks to processors where the necessary resources are "close". It may not be obvious to the likes of PC Magazine, but it's trivially obvious that even multithreaded programs running on a non-location-aware kernel are going to take a hit. This is a kernel problem, not an application library problem.

  • This isn't news (Score:5, Informative)

    by nettablepc (1437219) on Monday December 22, 2008 @10:43PM (#26207989)

    Anyone who has been doing performance work should have known this. The tools to adjust things like core affinity and where interrupts are handled have been available in Linux and Windows for a long time. These effects were present in 1980s mainframes. DUH.
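
    As a rough illustration of the kind of tool meant here, the following is a minimal Linux-only sketch in C. It assumes glibc, where sched_setaffinity() and sched_getcpu() are exposed under _GNU_SOURCE, and it pins the calling thread to one core so that its cache stays warm from run to run. The helper name pin_to_cpu is hypothetical. From the shell, taskset(1) does the same job, and on Windows, SetThreadAffinityMask() serves the same purpose.

```c
#define _GNU_SOURCE /* needed by glibc for cpu_set_t helpers and sched_getcpu() */
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single CPU.
   Returns 0 on success, -1 on failure (errno is set by sched_setaffinity). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);       /* start with an empty CPU set */
    CPU_SET(cpu, &set);   /* allow only the requested CPU */
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

    Pinning trades the scheduler's flexibility for repeatability. It mostly pays off in benchmarking or for latency-sensitive processes that would otherwise migrate between cores or sockets.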

  • it's the affinity (Score:2, Informative)

    by non-e-moose (994576) on Monday December 22, 2008 @10:52PM (#26208063)
    It's just an Insel Intide thing. DAAMIT processors are more predictable. Or not. If you don't use numactl(1) to force socket (and memory) affinity, you get exactly what you ask for (randomly selected sockets, and unpredictable performance).
  • Re:Linux and Windows (Score:4, Informative)

    by swb (14022) on Monday December 22, 2008 @11:57PM (#26208391)

    They mentioned this in an ESX class I took. I seem to remember it coming up in the context of setting processor affinity or creating multi-CPU VMs. Either the hypervisor was smarter than you (i.e., don't set affinity), or multi-CPU VMs could actually slow down other VMs, because the hypervisor would try to keep multi-CPU VMs on the same socket and thus deny execution priority to other VMs (i.e., don't assign SMP VMs just because you can; do it only if you have the CPU workload).

  • Re:unsurprising. (Score:3, Informative)

    by Majik Sheff (930627) on Tuesday December 23, 2008 @12:23AM (#26208543) Journal

    Processor affinity is still a nasty corner of OS design. It was one of the outstanding issues with the BeOS kernel that was not resolved before the company tanked.

  • Re:Yup (Score:4, Informative)

    by cetialphav (246516) on Tuesday December 23, 2008 @12:48AM (#26208741)

    How about a "parallel foreach(Thing in Things)" ?

    That is easy. If your application can be parallelized that easily, then it is considered embarrassingly parallel. OpenMP exists today and does just this. All you have to do (in C) is add a "#pragma" above the for loop and you have a parallel program. OpenMP is commonly available on all major platforms.
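
    Below is a minimal sketch of that pattern. It assumes a compiler with OpenMP support (e.g. gcc -fopenmp). Without that flag, the pragma is ignored and the loop simply runs serially, which is part of OpenMP's appeal. The function scale_array is a hypothetical example, chosen because every iteration writes only its own element and the iterations are therefore independent.

```c
#include <assert.h>

/* Hypothetical example: out[i] = in[i] * k. Each iteration touches only
   element i, so OpenMP can split the loop across threads with no locking. */
void scale_array(double *out, const double *in, long n, double k)
{
    assert(n >= 0);
    #pragma omp parallel for
    for (long i = 0; i < n; i++)
        out[i] = in[i] * k;
}
```

    Loops that accumulate into a shared variable are different. For those, OpenMP needs something extra, such as a reduction clause; a bare pragma is not enough.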

    The real problem is that most desktop applications just don't lend themselves to this type of parallelism and so the threads have lots of data sharing. This data sharing causes the problem because the programmer must carefully use synchronization primitives to prevent race conditions. Since the programmer is using parallelism to boost performance, they only want to introduce synchronization when they absolutely have to. When in doubt, they leave it out. Since it is damn near impossible to test the code for race conditions, they have no indication when they have subtle errors. This is what makes concurrent programming so difficult. One researcher says that using threads makes programs "wildly nondeterministic".
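
    To make the data-sharing problem concrete, here is a minimal POSIX threads sketch (compile with -pthread). All names in it are hypothetical. Several threads increment one shared counter, and the mutex is the synchronization the comment refers to. Removing the mutex turns value++ into an unsynchronized read-modify-write, and any lost updates that result may or may not show up on a given run.

```c
#include <assert.h>
#include <pthread.h>

#define INCREMENTS_PER_THREAD 100000
#define MAX_THREADS 16

/* State shared by all worker threads. */
struct shared_counter {
    long value;
    pthread_mutex_t lock;
};

static void *worker(void *arg)
{
    struct shared_counter *c = arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++) {
        /* value++ is a read-modify-write; without the lock, two threads can
           read the same old value and one of the increments is lost. */
        pthread_mutex_lock(&c->lock);
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

/* Run nthreads workers on one counter and return its final value:
   nthreads * INCREMENTS_PER_THREAD, or less if a thread failed to start. */
long count_with_threads(int nthreads)
{
    pthread_t tids[MAX_THREADS];
    struct shared_counter c;
    int started = 0;
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    c.value = 0;
    pthread_mutex_init(&c.lock, NULL);
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&tids[t], NULL, worker, &c) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    pthread_mutex_destroy(&c.lock);
    return c.value;
}
```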

    It is hard to blame the programmers for being aggressive in seeking performance gains because Amdahl's Law [wikipedia.org] is a real killer. If you have 90% of the program parallelized, the theoretical maximum performance gain is 10X no matter how many cores you can throw at the problem.
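
    The 10X figure follows from Amdahl's Law: speedup = 1 / ((1 - p) + p/n), where p is the parallelized fraction of the run time and n is the number of cores. As n grows, the p/n term vanishes, leaving 1/(1 - p) = 1/0.1 = 10. A small C helper (hypothetical name) for checking the arithmetic:

```c
#include <assert.h>

/* Amdahl's Law: p is the parallelizable fraction (0..1),
   cores is the number of processors applied to that fraction. */
double amdahl_speedup(double p, unsigned long cores)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(cores >= 1);
    return 1.0 / ((1.0 - p) + p / (double)cores);
}
```

    With p = 0.9, eight cores give about 4.7X, and even a billion cores stay just under 10X.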

  • Re:This isn't news (Score:1, Informative)

    by Anonymous Coward on Tuesday December 23, 2008 @01:13AM (#26208911)

    80s mainframe tech is NEW and EXCITING to a depressing number of tech people, look at how excited everyone got when someone remembered and re-implemented virtualization.

    Ummm, that's re-implemented virtualization on x86 with very little performance overhead and at a very reasonable cost. That was new and exciting.

    And while I did use CICS and MVS back in the day, I don't think IBM had technology (maybe they did, but I never heard of it) like VMware's vMotion, where you can take a running virtual machine and move it from one host to another.

    Processor affinity isn't new. Quite a few applications have settings for it, even Microsoft SQL Server 2000.

  • Re:not a surprise (Score:1, Informative)

    by Anonymous Coward on Tuesday December 23, 2008 @02:12AM (#26209201)

    There are a number of possibilities. Make sure the CPU family/model/stepping is the same on the slow machine and on a normal, otherwise identical one. Check that the DIMMs are exactly the same and installed in the same slots as in the other machines. You might even try simply swapping memory with a known good machine. Another thing to check is the PCI bus. If a card sits in one slot in one machine and in a different slot in another, it might change how the BIOS allocates interrupts for other devices (which may affect how Linux's lame interrupt mapping sets priorities). If this render farm machine talks on the network, the problem could be its own Ethernet adapter or the switch port it is connected to. Check for errors logged on both sides (ifconfig eth0), and make sure the ports are running full duplex.
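
    On the interrupt side, Linux exposes each IRQ's allowed CPUs as a hex bitmask in /proc/irq/<N>/smp_affinity, and the current per-CPU interrupt counts are visible in /proc/interrupts. The following is a minimal sketch of building that mask from a list of CPU numbers, assuming at most 64 CPUs. The function name is hypothetical. Writing the result requires root; for example, writing 5 to /proc/irq/<N>/smp_affinity allows CPUs 0 and 2. A running irqbalance daemon may later override manual settings.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format a CPU list as an smp_affinity-style hex mask.
   Bit k set means CPU k may service the interrupt. Masks wider than 32 bits
   are written as comma-separated 32-bit groups, matching the kernel's output.
   Returns the string length (as snprintf does), or -1 for a CPU outside 0..63. */
int irq_affinity_mask(const int *cpus, size_t ncpus, char *buf, size_t buflen)
{
    uint64_t mask = 0;
    assert(buf != NULL && buflen > 0);
    for (size_t i = 0; i < ncpus; i++) {
        if (cpus[i] < 0 || cpus[i] >= 64)
            return -1; /* this sketch only handles CPUs 0..63 */
        mask |= UINT64_C(1) << cpus[i];
    }
    unsigned high = (unsigned)(mask >> 32);
    unsigned low = (unsigned)(mask & 0xffffffffu);
    if (high != 0)
        return snprintf(buf, buflen, "%x,%08x", high, low);
    return snprintf(buf, buflen, "%x", low);
}
```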

  • This isn't hardware (Score:3, Informative)

    by multimediavt (965608) on Tuesday December 23, 2008 @02:19AM (#26209227)

    Why is this article labeled as hardware? Sure, they talk about different procs being ... well, different. Duh! The article is about the software Tom and others developed to run processes more efficiently in a multi-core (and possibly heterogeneous) environment. Big energy savings as well as a performance boost. Green computing. HELLO! Did you read page two?

  • Re:unsurprising. (Score:4, Informative)

    by johnw (3725) on Tuesday December 23, 2008 @04:15AM (#26209603)

    A simple Google search for "fpga genetic algorithm" shows up references quite quickly - e.g.

    http://biology.kenyon.edu/slonc/bio3/AI/GEN_ALGO/gen_algo.html [kenyon.edu]

    The only part of the GP story I haven't seen before (and can't find a reference for) is the bit about the design not working on other FPGAs of the same specification. The closest story is that of Adrian Thompson at the University of Sussex, who got a circuit with unconnected elements that nonetheless seemed to be needed for the whole thing to achieve its goal. There was nothing about the design only working on specific instances of the FPGA.

  • by PitaBred (632671) <slashdotNO@SPAMpitabred.dyndns.org> on Tuesday December 23, 2008 @01:34PM (#26213345) Homepage

    I thought that Intel specifically did that, that if one core were loaded it would overclock that core and downclock the others to get a speed boost...

    Yup, I thought I remembered correctly [tomshardware.com].
