Smarter Thread Scheduling Improves AMD Bulldozer Performance

crookedvulture writes "The initial reviews of the first Bulldozer-based FX processors have revealed the chips to be notably slower than their Intel counterparts. Part of the reason is the module-based nature of AMD's new architecture, which requires more intelligent thread scheduling to extract optimum performance. This article takes a closer look at how tweaking Windows 7's thread scheduling can improve Bulldozer's performance by 10-20%. As with Intel's Hyper-Threading tech, Bulldozer performs better when resource sharing is kept to a minimum and workloads are spread across multiple modules rather than the multiple cores within them."
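For readers who want to see what "spread across multiple modules" looks like in practice, here is a minimal, hypothetical sketch (not from the article) that confines a process to one core per module using the ordinary Win32 affinity call. It assumes the FX-8150's eight cores are exposed to Windows 7 as logical processors 0-7 with each module's core pair adjacent (0/1, 2/3, ...); the mask value is illustrative and would need adjusting for other layouts.

```c
/* Minimal sketch: restrict the current process to one core per Bulldozer module.
 * Assumes logical processors 0/1, 2/3, 4/5, 6/7 are the core pairs of the four
 * modules on an FX-8150-class chip; adjust the mask for other layouts. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    /* Bits 0, 2, 4, 6 -> first core of each module (0x55 = 01010101b). */
    DWORD_PTR one_core_per_module = 0x55;

    if (!SetProcessAffinityMask(GetCurrentProcess(), one_core_per_module)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }
    printf("Threads of this process now run on one core per module.\n");
    /* ... launch the multithreaded workload from here ... */
    return 0;
}
```

The same experiment can be run without writing code by launching the program with `start /affinity 55 yourapp.exe` from a Windows 7 command prompt, 55 being the same mask in hex.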
  • That's the truth. Unless I can buy an AMD server for a lot cheaper, I'm not going to take on the risk of performance issues.

    • by h4rr4r ( 612664 )

      Depends on what you mean by a lot cheaper. If you need lots of cores but don't need them to be fast, as for a VM host, then AMD servers can be quite a bit cheaper once you're talking about 128GB+ of RAM.

      Risk of performance issues makes no sense if you don't know what app you want to run.

    • by Antisyzygy ( 1495469 ) on Friday October 28, 2011 @12:37PM (#37871088)
      AMD servers are way cheaper, and there are no performance issues most admins can't handle. What do you mean by performance? If you mean slower, then yes, but if you mean reliability, then they are about the same. Why else do Universities almost exclusively use AMD processors in their clusters for cutting edge research? I can see your point if you are only buying 1-3 servers, but you start saving shitloads of money when it's a server farm.
      • Agreed. To further expound upon the parent's point: if you really know your performance needs and the extra up-front cost of Intel chips is lower than the revenue gained from that extra couple percent of performance, then go Intel. Otherwise, it's usually a cost-versus-preference piss fest, and last I checked, in a down economy cost is king.
      • by Kjella ( 173770 ) on Friday October 28, 2011 @01:38PM (#37871960) Homepage

        Well, it doesn't seem to apply when you get up to supercomputing levels at least. I checked the TOP500 list [top500.org] and it's 76% Intel, 13% AMD. As for Bulldozer, it has serious performance/watt issues even though the performance/price ratio isn't all that bad for a server. On the desktop, Intel hasn't even bothered to make a response except to quietly add a 2700K to their pricing table, with the 2600K left untouched. On the business side (where, after all, margins fund future R&D), Sandy Bridge's 216mm2 is much smaller than Bulldozer's 315mm2. Intel can produce almost 50% more chips in the same die area, and in practice the yields probably favor Intel even more because the risk of critical defects goes up with size. Honestly, I don't think Intel has felt less challenged since the AMD K5 days...

        • When you can save $8,000 per server and then invest it in something else, it becomes a different issue. I am not trying to say AMD processors are superior; I am just saying that factoring in all costs, including power and the lifespan of the unit, AMD wins a lot of the time. Every computer cluster at every University I have ever had access to used AMD processors (with the exception of some NVidia units), and this was for their CS departments. I suspect part of the issue is it's easier to justify power budgets and not
          • By cost-effective computer I meant cluster! Also, I would like to add that I had high hopes for Bulldozer, so it was disappointing that it was all marketing hype.
          • A fast memory bus, nothing special needed to use ECC RAM, good work/watt, and low prices all help AMD win most clusters.

            If you're aiming for a Top-500 slot and you have server money but not real estate money, then Intel is the logical choice.

      • by blair1q ( 305137 )

        >Why else do Universities almost exclusively use AMD processors in their clusters

        Because when your budget is fixed, N is the number of nodes you can afford, M is the performance per node, and N1*M1 > N2*M2, you buy P1 over P2 even if M2 > M1. In this case that holds because proprietor 1 has a lot of trouble selling its units to individuals and turns to massively discounting its products when sold in bulk to HPC OEMs.

      • by Idbar ( 1034346 )

        Why else do Universities almost exclusively use AMD processors in their clusters for cutting edge research

        [citation needed]

        Not that I question your argument, but I want to see you backing up your claims. Last time I checked, that was not the case.

        • That's a tough one to find a citation for. Essentially, at the three universities I have worked at, and the two I have collaborated with, between 6/10 and 8/10 of the clusters were running AMD processors. Some had NVidia clusters as well. Some had clusters running Intel, but they were older systems or were in use by departments other than Math/CS. This evidence may be anecdotal, but two of the universities are larger ones with large research budgets. "Exclusively" was a bit of an exaggeration in hindsight.
          • by Idbar ( 1034346 )
            As you said, it's probably a market strategy. Every vendor focuses on certain companies/universities to sell their products. The one I worked for (what I guess is a 2nd-tier ranked university) received a lot of discounts depending on the vendor, and used to get lots of Intel-based CPUs. Perhaps AMD is targeting Tier 1 universities more aggressively, while others take a wider range. That's why I asked.

            Of course many are interested in seeing their products advertised at top universities, while ot
      • by yuhong ( 1378501 )

        And the slowness will, I think, be solved with Interlagos, and Intel will have only Westmere-EX (Xeon E7) to compete with, since Sandy Bridge-EP is not even released yet. Now compare the already-released pricing [softpedia.com] of Opteron 6200 CPUs with Intel's current Xeon 7500/E7 pricing, and guess what will happen.

    • by QuantumRiff ( 120817 ) on Friday October 28, 2011 @12:55PM (#37871308)

      A Dell R815 with two twelve-core AMD processors (although they were not Bulldozer ones), 256GB of RAM, and a pair of hard drives was $8k cheaper than a similarly configured Dell R810 with two 10-core Intel processors when we ordered a few weeks ago. That difference in price is enough to buy a nice Fusion-io drive, which will make much, much more of a performance impact than a small percentage higher CPU speed.

      • by 0123456 ( 636235 )

        Clearly AMD should be charging $4k more for their CPUs if they're leaving that big a gap between their price and Intel's.

        • by Surt ( 22457 )

          They're fighting reputation. If it was $4k more, they would probably lose too many sales to make up the price difference.

          • Maybe their reputation would be better if their processors cost the same.
            Some people just think that something must be worse when it is cheaper.
            • Those kinds of people are very vulnerable to an optimistic young techie destroying their rep as a purchaser, or so my last two years of sales would suggest. I displaced someone who would only buy "the best", which in his view meant something 5x more expensive, and where every tech dispatch was accompanied by a sales guy to work the purchaser while the techie was busy installing the goods.

              If AMD can deliver better performance per $ and per watt in the server room, I'll consider them, and so will my clients if

        • by yuhong ( 1378501 )

          And AMD has also been trotting out the death of the 4P tax on their blogs since the Opteron 6100 era, and there is no indication they are going to change that with the Opteron 6200 anyway [softpedia.com].

      • A Dell R815 with two twelve-core AMD processors (although they were not Bulldozer ones), 256GB of RAM, and a pair of hard drives was $8k cheaper than a similarly configured Dell R810 with two 10-core Intel processors when we ordered a few weeks ago.

        The Westmere-EX CPUs on the Dell R810 are recently released, and as such are very pricey. They are also much, much faster than any other Intel or AMD chip on a per-clock basis. Because the E7-88xx Xeons have nearly twice the cache (30MB "smart" vs. 24MB total L2 plus L3), are hyper-threaded, and run faster clock-for-clock, a heavily parallel task will likely finish faster on a single CPU Westmere-EX than on a dual CPU Magny-Cours.

        Because of this, the R810 is a much, much more powerful system than the R815.

        • by yuhong ( 1378501 )

          Yes, but I have wondered for a while what will happen to the quad-socket market if AMD sticks to the same pricing policy with Interlagos. Remember that Intel is one generation behind with Westmere-EX, and Sandy Bridge-EP is not even released yet right now.

        • by yuhong ( 1378501 )

          And remember that Interlagos will be a drop-in replacement for Magny-Cours.

      • by afidel ( 530433 )
        Apples to apples, the cost difference between an R810 and an R815 should be on the order of $200, not $8,000.
      • A 10-core Westmere-EX vs a 12-core Magny-Cours is much more than a "small percentage higher" - probably 30 or 40%, potentially higher depending on workload.
      • by blair1q ( 305137 )

        I highly doubt the price difference was because of the processors. More likely it was because Dell is having trouble moving those boxes because they're slower.

  • Perhaps I'm remembering incorrectly, but I thought part of the Bulldozer hype was that it had two 'real' cores and not hyperthreading, with only a few resources shared? Yet now it turns out that you have to treat it like a hyperthreading CPU or performance sucks.

    I still don't understand why AMD didn't just set the hyperthreading bit in the CPU flags, so Windows would presumably just treat it like a hyperthreading CPU in the first place.
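    For context on the "hyperthreading bit": it is bit 28 of EDX from CPUID leaf 1, alongside the nominal logical-processor count in bits 23:16 of EBX. The snippet below is only an illustrative way to read those fields with GCC's cpuid.h; it is not how Windows itself does it, and the OS additionally uses the CPUID topology leaves to work out which logical CPUs are siblings.

```c
/* Rough sketch of the HTT check being discussed (GCC/Clang on x86).
 * CPUID.1: EDX bit 28 is the HTT flag; EBX bits 23:16 report the nominal
 * number of logical processors per physical package when that flag is set. */
#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        puts("CPUID leaf 1 not supported");
        return 1;
    }

    int htt = (edx >> 28) & 1;          /* the "hyperthreading bit" */
    int logical = (ebx >> 16) & 0xff;   /* logical processors per package */

    printf("HTT flag: %d, logical processors per package: %d\n", htt, logical);
    return 0;
}
```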

    • by Sloppy ( 14984 )

      I thought part of the Bulldozer hype was that it had two 'real' cores and not hyperthreading,

      No, the hype is that it blurs the distinction between cores and hyperthreading. It's both and neither.

    • by laffer1 ( 701823 )

      It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

      AMD's approach is faster for some workloads. The problem is that they didn't design it around how most people currently write software.

      I would have preferred AMD to implement hyper threading as it would have greatly simplifi

      • by 0123456 ( 636235 )

        It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

        Ah, so this benchmark is floating point and that's why it's faster across multiple cores?

        I can't really see AMD convincing Microsoft to invest a lot of effort into dynamically tracking which threads use floating point and which don't and reassigning them appropriately. Maybe a flag on the thread to say whether it's using floating point or not at creation time would be viable, but then app developers won't bother to set it.

      • To me, Bulldozer's shared-FPU design looks rather like they wanted some of the specialized-workload advantage of the UltraSPARC T-series CPUs, but with somewhat less extreme trade-offs (the T1 had a single FPU shared between 8 physical cores, which proved to be a little too extreme and was beefed up in the T2). There are a fair number of server tasks that are FPU-light but have lots of threads, often do well with a lot of RAM, and are fairly cost sensitive.

        Not at all a good recipe for a workstation or sc
        • by DamonHD ( 794830 )

          A T1 is still working well for me: at most about 1 thread on my entire Web server system is doing any FP at all, and in places I switched to some lightweight integer fixed-point calcs instead. That now serves me well with the same code running on a soft-float (i.e. no FP hardware) ARMv5.

          So, for applications where integer performance and threading is far more important than FP, maybe AMD (and Sun) made the right decision...

          Rgds

          Damon

      • It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

        It's a lot closer to hyper-threading than you think. The BD chips do *NOT* have two instruction decoders per module, just one. The only duplicated parts are the integer execution units and the L1 Data caches. The Instruction fetch+decode, L1 Instruction Cache, Branch prediction, FPU, L2 Cache and Bus interface are all shared.

        It's pretty obvious how limited each BD "core" really is given these benchmarks. AMD should have presented the CPU as having hyper-threading to the OS.

      • It's not like hyper threading. For integer operations, the AMD chips are much better. What AMD doesn't have is two floating point units so that's what gets bogged down. There are two instruction decoders and two units to handle integer math, but one floating point unit per component.

        The decoders are a shared resource in the Bulldozer core. That can be a significant bottleneck that affects integer code. Also, those integer sub-cores are still sharing a single interface to the L2 and higher up the memory hierarchy. So it's not all roses for integer apps.

        Speaking of memory hierarchy, the FX parts are, like FX parts of the past, just server chips slapped into a consumer package. So the cores being studied here all have pretty substantial L3s. One of the claimed benefits of putting rel

      • I would have preferred AMD to implement hyper threading as it would have greatly simplified things for OS developers. It's getting to a point where kernels have to know about CPU families in order to get the performance they need. They also have to know the workload.

        This is an architecture designed for a ten-year run, much like the original P6, which underwhelmed everyone with (at most) half a brain.

        Just how long do you think the OS can remain task agnostic as we head down the road to eight and sixteen core pr

        • by laffer1 ( 701823 )

          And how would the OS know the workload ahead of time? It's not like there are hints in the binary that it's going to be doing floating point work or that it's going to be CPU bound.

          Remember that the more complex we make scheduling, the slower it is. Schedulers have to be fast. There's only so much the OS can do to help out. As programmers, we're taught that the hardware is a black box. We're supposed to assume it works correctly most of the time. There's a big difference between seeing a hyper thread

  • The article basically says "if you schedule threads to use fewer modules, dynamic turbo will clock those modules up, giving you a performance boost."

    So... anybody who is already clocking their entire CPU at its top stable clock speed isn't going to get a boost out of thread scheduler modifications.

  • by robot256 ( 1635039 ) on Friday October 28, 2011 @12:34PM (#37871058)

    Sure, the scheduling change improves performance by 10-20% for certain tasks, but that still makes it 30-50% slower than an i7, and with more power consumption.

    I can't fault AMD for not having full third-party support for their custom features, since Intel had a head-start with hyperthreading, but if it will still be an inferior product even after support is added then I'm not going to buy it.

    • by h4rr4r ( 612664 )

      30% slower at what percentage of the cost?
      If it costs 50% as much as an i7 that might then be fine.

      • by AdamJS ( 2466928 )
        They generally cost between 8% less and 20% MORE than their closest performance equivalents (hard to use that word since the gap is still pretty noticeable). That's sort of part of the problem.
      • An i7-2600K is only 15% more expensive, has a 25% lower TDP, and blows away the FX-8150 in most of the benchmarks. Even with this tweak it'll still barely compete, and the 2600K has half as many real cores and a lower clock speed.

  • by Animats ( 122034 ) on Friday October 28, 2011 @01:00PM (#37871408) Homepage

    This is really more of an OS-level problem. CPU scheduling on multiprocessors needs some awareness of the costs of an interprocessor context switch. In general, it's faster to restart a thread on the same processor it previously ran on, because the caches will have the data that thread needs. If the thread has lost control for a while, though, it doesn't matter. This is a standard topic in operating system courses. An informal discussion of how Windows 7 does it [blogspot.com] is useful.

    Windows 7 generally prefers to run a thread on the same CPU it previously ran on. But if you have a lot of threads that are frequently blocking, you may get excessive inter-CPU switching.

    On top of this, the Bulldozer CPU adjusts the CPU clock rate to control power consumption and heat dissipation. If some cores can be stopped, the others can go slightly faster. This improves performance for sequential programs, but complicates scheduling.

    Manually setting processor affinity is a workaround, not a fix.
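    For anyone who wants to try that workaround anyway, the two relevant Win32 calls are SetThreadIdealProcessor (a soft hint the scheduler may ignore) and SetThreadAffinityMask (a hard pin). A minimal sketch, with illustrative processor numbers rather than anything recommended by the article:

```c
/* Sketch: soft hint vs. hard pin for a single thread (Win32).
 * Processor numbers and masks here are illustrative only. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE thread = GetCurrentThread();

    /* Soft hint: the scheduler will prefer, but not require, logical CPU 2. */
    if (SetThreadIdealProcessor(thread, 2) == (DWORD)-1)
        fprintf(stderr, "SetThreadIdealProcessor failed: %lu\n", GetLastError());

    /* Hard pin: the thread may now run only on logical CPUs 2 and 3 (mask 0xC). */
    if (SetThreadAffinityMask(thread, (DWORD_PTR)0xC) == 0)
        fprintf(stderr, "SetThreadAffinityMask failed: %lu\n", GetLastError());

    /* ... work that benefits from staying on one module's caches ... */
    return 0;
}
```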

  • Windows is not exactly known for its multi-processor (multi-core) scalability.

    Repeat the test with a real OS (Linux, Solaris...) and I'll be interested, especially Solaris x86 since it is known to be the best at scaling on parallel hardware.

  • by unity100 ( 970058 ) on Friday October 28, 2011 @01:10PM (#37871612) Homepage Journal
    There are applications, like Photoshop CS5 or TrueCrypt, among others, where Bulldozer does well:

    http://www.overclock.net/amd-cpus/1141562-practical-bulldozer-apps.html [overclock.net]

    Also, if you set your CPUID vendor string to GenuineIntel in some of the benchmark programs, you will get surprising results:

    Changing the CPUID to GenuineIntel nets a 47.4% increase in scores:
    http://www.osnews.com/story/22683/Intel_Forced_to_Remove_quot_Cripple_AMD_quot_Function_from_Compiler_

    PCMark/Futuremark rigged bentmark to favor Intel:
    http://www.amdzone.com/phpbb3/viewtopic.php?f=52&t=135382#p139712
    http://arstechnica.com/hardware/reviews/2008/07/atom-nano-review.ars/6

    Intel cheating at 3DMark Vantage via driver: http://techreport.com/articles.x/17732/2

    Relying on bentmarks to "measure performance" is a fool's errand. Don't go there.
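    For readers unfamiliar with the trick being described: dispatchers like the one in Intel's compiler key off the 12-character vendor string returned by CPUID leaf 0, which is exactly what the cpuid=GenuineIntel experiments above are spoofing. A minimal sketch of reading that string (using GCC's cpuid.h; this is not the compiler's actual dispatch code):

```c
/* Sketch: read the CPUID vendor string a dispatcher might key on.
 * Leaf 0 returns "GenuineIntel"/"AuthenticAMD" split across EBX, EDX, ECX. */
#include <cpuid.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;
    char vendor[13];

    __get_cpuid(0, &eax, &ebx, &ecx, &edx);
    memcpy(vendor + 0, &ebx, 4);
    memcpy(vendor + 4, &edx, 4);
    memcpy(vendor + 8, &ecx, 4);
    vendor[12] = '\0';

    /* A vendor-string comparison against "GenuineIntel" at this point is the
     * kind of check the spoofing experiments above are defeating. */
    printf("Vendor: %s\n", vendor);
    return 0;
}
```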

    • by yuhong ( 1378501 )

      It is time, I think, for some reverse engineering of the benchmark programs to see what exactly is happening.

      • by Anonymous Coward

        Here's Agner Fog's page about this issue. [agner.org]

        The Intel compiler (for many years and many versions) has generated multiple code paths for different instruction sets. Using the lame excuse that they don't trust other vendors to implement the instruction set correctly, the generated executables detect the "GenuineIntel" CPU vendor string and deliberately cripple your program's performance by not running the fastest codepaths unless your CPU was made by Intel. So e.g. if you have an SSE4-capable AMD CPU, it will

        • by yuhong ( 1378501 )

          As a note, I remember seeing one of Intel's libraries used in MSHTML.DLL in MS's own IE9 when I was disassembling it with IDA.

    • by blair1q ( 305137 )

      >Relying on bentmarks to "measure performance" is a fool's errand. Don't go there.

      And yet, that's what you're doing.

      The correct phrase is: Relying on benchmarks that are not relevant to your application is a fool's errand.

    • by makomk ( 752139 )

      One fun side note: notice how that link says "it will fail to recognize future Intel processors with a family number different from 6". Intel have conspicuously kept the family number reported by CPUID at 6 on their new processors in order not to trigger a fallback to the non-Intel pathway that AMD processors get to use, presumably because they know how much that'll harm them in benchmarks and how bad the reviews will look.
