AMD Hardware

Smarter Thread Scheduling Improves AMD Bulldozer Performance

Posted by Soulskill
from the almost-up-to-par dept.
crookedvulture writes "The initial reviews of the first Bulldozer-based FX processors have revealed the chips to be notably slower than their Intel counterparts. Part of the reason is the module-based nature of AMD's new architecture, which requires more intelligent thread scheduling to extract optimum performance. This article takes a closer look at how tweaking Windows 7's thread scheduling can improve Bulldozer's performance by 10-20%. As with Intel's Hyper-Threading tech, Bulldozer performs better when resource sharing is kept to a minimum and workloads are spread across multiple modules rather than the multiple cores within them."

  • by Antisyzygy (1495469) on Friday October 28, 2011 @01:37PM (#37871088)
    AMD servers are way cheaper, and there are no performance issues most admins can't handle. What do you mean by performance? If you mean slower, then yes, but if you mean reliability, then they are about the same. Why else do universities almost exclusively use AMD processors in their clusters for cutting-edge research? I can see your point if you are only buying 1-3 servers, but you start saving shitloads of money when it's a server farm.
  • by QuantumRiff (120817) on Friday October 28, 2011 @01:55PM (#37871308)

    A Dell R815 with 2 twelve-core AMD processors (although they were not Bulldozer ones), 256GB of RAM, and a pair of hard drives was $8k cheaper than a similarly configured Dell R810 with 2 ten-core Intel processors when we ordered a few weeks ago. That difference in price is enough to buy a nice Fusion-io drive, which will make a much, much bigger performance impact than a small percentage of extra CPU speed.

  • by Animats (122034) on Friday October 28, 2011 @02:00PM (#37871408) Homepage

    This is really more of an OS-level problem. CPU scheduling on multiprocessors needs some awareness of the costs of an interprocessor context switch. In general, it's faster to restart a thread on the same processor it previously ran on, because the caches will have the data that thread needs. If the thread has lost control for a while, though, it doesn't matter. This is a standard topic in operating system courses. An informal discussion of how Windows 7 does it [blogspot.com] is useful.

    Windows 7 generally prefers to run a thread on the same CPU it previously ran on. But if you have a lot of threads that are frequently blocking, you may get excessive inter-CPU switching.

    On top of this, the Bulldozer CPU adjusts the CPU clock rate to control power consumption and heat dissipation. If some cores can be stopped, the others can go slightly faster. This improves performance for sequential programs, but complicates scheduling.

    Manually setting processor affinity is a workaround, not a fix.
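    A minimal sketch of what that affinity workaround looks like in practice. This uses the Linux `os.sched_setaffinity` call rather than the Windows `SetProcessAffinityMask` API the commenters would have used, and the assumption that paired cores are numbered consecutively (0,1 / 2,3 / ...) is hypothetical — actual enumeration varies by BIOS and OS:

```python
# Sketch of the affinity workaround: pin the current process to one
# logical CPU per "module" so paired cores are not both loaded.
# Linux API shown (os.sched_setaffinity); the Windows equivalent is
# SetProcessAffinityMask. The pairing assumption -- logical CPUs
# (0,1), (2,3), ... share a module -- is illustrative only.
import os

def spread_across_modules(cores_per_module=2):
    available = sorted(os.sched_getaffinity(0))   # CPUs we may run on
    # Take the first CPU from each group of cores_per_module,
    # leaving the sibling core in every module idle.
    picks = set(available[::cores_per_module])
    os.sched_setaffinity(0, picks)
    return picks

picked = spread_across_modules()
print(picked)
```

As the parent says, this is a workaround: it throws away half the logical CPUs instead of teaching the scheduler about module topology.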

  • Re:So basically... (Score:4, Informative)

    by washu_k (1628007) on Friday October 28, 2011 @02:24PM (#37871790)
    No, it's because AMD is lying to the OS. The "8 core" BD is not really 8 cores. It only has 4 cores with some duplicated integer resources. Basically a better version of hyper-threading, but not a proper 8-core design.

    The problem is that the BD says to Windows "I have 8 cores" and thus Windows schedules assuming that is true. If BD said "I have 4 cores with 8 threads" then Windows would schedule it just like it does with Intel CPUs and performance would improve just like in the FA.

    There shouldn't need to be any OS-level tweaks, because Windows already knows how to schedule for hyper-threading optimally. If BD reported its true core count properly, then no OS-level changes would be needed.
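    The scheduling difference described above can be sketched with a toy model. All names here are hypothetical; it just contrasts a scheduler that sees 8 interchangeable cores with one that knows about the 4-module pairing:

```python
# Toy model of module-aware vs. naive thread placement on a
# Bulldozer-style chip: 4 modules, 2 cores each. Hypothetical code,
# not the Windows scheduler.
MODULES = 4
CORES_PER_MODULE = 2

def naive_placement(n_threads):
    # Treat all 8 "cores" as equal and fill them in numeric order:
    # threads 0 and 1 immediately share module 0's resources.
    return [core // CORES_PER_MODULE for core in range(n_threads)]

def module_aware_placement(n_threads):
    # Round-robin across modules, so resource sharing starts only
    # after every module already has one thread.
    return [t % MODULES for t in range(n_threads)]

# With 4 threads: naive packing loads 2 modules and idles 2, while
# module-aware placement gives each thread a whole module.
print(naive_placement(4))         # [0, 0, 1, 1]
print(module_aware_placement(4))  # [0, 1, 2, 3]
```

This is exactly the policy Windows already applies to hyper-threaded Intel chips, which is why reporting "4 cores, 8 threads" would get it for free.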
  • by Anonymous Coward on Friday October 28, 2011 @03:29PM (#37872558)

    Here's Agner Fog's page about this issue. [agner.org]

    The Intel compiler (for many years and many versions) has generated multiple code paths for different instruction sets. Using the lame excuse that they don't trust other vendors to implement the instruction set correctly, the generated executables detect the "GenuineIntel" CPU vendor string and deliberately cripple your program's performance by not running the fastest codepaths unless your CPU was made by Intel. So e.g. if you have an SSE4-capable AMD CPU, it will run the SSE2 codepath instead of the SSE4 codepath that comparable Intel chips will run.
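    The dispatch behavior described above reduces to a simple policy. This is an illustration of that logic, not Intel's actual dispatcher; the vendor strings are real CPUID values, but the function is hypothetical:

```python
# Illustration of vendor-gated codepath dispatch (not Intel's actual
# code): the fast path is selected from the CPUID vendor string, not
# from the feature flags alone.
def pick_codepath(vendor, has_sse4):
    if vendor == "GenuineIntel" and has_sse4:
        return "SSE4"
    # Non-Intel CPUs fall back to a baseline path even when they
    # advertise SSE4 support in their feature flags.
    return "SSE2"

print(pick_codepath("GenuineIntel", True))   # fast path
print(pick_codepath("AuthenticAMD", True))   # baseline, despite SSE4
```

A feature-based dispatcher would test only `has_sse4`; gating on the vendor string is what makes this sabotage rather than caution.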

    Over the years, MANY libraries (including several from Intel) have been compiled and shipped with this compiler, with the result that applications built with those libraries, including many benchmarks, also suffer from the same performance sabotage.
