Xeons, Opterons Compared in Power Efficiency

Posted by Zonk on Fri Dec 15, 2006 09:56 AM
from the nothing-like-some-friday-morning-power-analysis dept.
Bender writes "The Tech Report has put Intel's 'Woodcrest' and quad-core 'Clovertown' Xeons up against AMD's Socket F Opterons in a range of applications, including widely multithreaded tests from academic fields like computational fluid dynamics and proteomics. They've also attempted to quantify power efficiency in terms of energy use over time and energy use per task, with some surprising results." From the article: "On the power efficiency front, we found both Xeons and Opterons to be very good in specific ways. The Opteron 2218 is excellent overall in power efficiency, and I can see why AMD issued its challenge. Yes, we were testing the top speed grade of the Xeon 5100 and 5300 series against the Opteron 2218, but the Opteron ended up drawing much less power at idle than the Xeons ... We've learned that multithreaded execution is another recipe for power-efficient performance, and on that front, the Xeons excel. The eight-core Xeon 5355 system managed to render our multithreaded POV-Ray test scene using the least total energy, even though its peak power consumption was rather high, because it finished the job in about half the time that the four-way systems did. Similarly, the Xeon 5160 used the least energy in completing our multithreaded MyriMatch search, in part because it completed the task so quickly."
This discussion has been archived. No new comments can be posted.
  • by Salvance (1014001) * on Friday December 15 2006, @09:59AM (#17255084) Homepage Journal
    AMD needs to deliver some real quad core chips (or 8 core chips) that will beat Intel's performance. If they don't soon, AMD will quickly get kicked back to being the 2nd rate Intel cloner that everyone knew them as before their groundbreaking AMD 64s and dual core chips briefly took the performance lead from Intel. I'm keeping my fingers crossed that AMD will deliver; I've always liked (and bought) their chips as long as the performance is similar to Intel.
    • by msobkow (48369) on Friday December 15 2006, @11:15AM (#17256500) Journal

      I know of and have worked with too many organizations that figure it's just a matter of slapping all the computers in an air-conditioned room. Every watt of waste heat adds to the A/C bill.

      Old fashioned water-cooled mainframes and big iron (for its time) often recirculated the wasted heat into the heating systems of the surrounding buildings. We've known all along how to be more energy efficient, if companies and management would only place the emphasis on the environment in their budgets.

    • Re: (Score:3, Insightful)

      Evidently you didn't read the review. Intel has serious problems for large scale computing. It does not scale up. It's fine as a thread engine for processing small transactions, but for the kind of problems that people like Google and NCAR are doing -- and it is people like that who drive some very large CPU buys -- the external MMU bites their ass every time. Is the current generation of Opterons a gamer buy? No. AMD probably won't dominate the gamer market until a high-end GPU is integrated on die a
  • AMD's path (Score:4, Insightful)

    by homey of my owney (975234) on Friday December 15 2006, @10:01AM (#17255120)
    AMD needs to do what they have been doing - thinking independently and coming up with original solutions.
  • by pla (258480) on Friday December 15 2006, @10:05AM (#17255200) Journal
    the Opteron ended up drawing much less power at idle than the Xeons
    ...
    the Xeon 5160 used the least energy in completing our multithreaded MyriMatch search, in part because it completed the task so quickly.

    So what does this mean for people shopping for servers?

    If your servers constantly tick along at nearly 100% CPU use, you might do better going with the Xeon system. If your machines basically sit idle most of the time with an occasional spike for a few seconds when it actually does something, the AMD would save you more on electricity.

    Of course, this raises a third possibility - Would running a number of virtual servers on one large Xeon machine waste more energy than it saves, or give a net gain?
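
The idle-versus-busy trade-off above can be made concrete with a little arithmetic. What follows is a minimal sketch, assuming each system draws one fixed wattage when idle and another under load. The two systems, their wattages, and their per-job times are hypothetical placeholders chosen for illustration, not measurements from the article.

```python
def energy_wh(idle_w, load_w, busy_hours, total_hours):
    """Energy in watt-hours over a window: load power while busy, idle power otherwise."""
    if not 0 <= busy_hours <= total_hours:
        raise ValueError("busy_hours must lie within the window")
    idle_hours = total_hours - busy_hours
    return load_w * busy_hours + idle_w * idle_hours


# Hypothetical systems (all numbers invented for illustration):
# FAST idles high but finishes each job in half the time; FRUGAL idles low.
FAST = {"idle_w": 200.0, "load_w": 350.0, "hours_per_job": 0.5}
FRUGAL = {"idle_w": 100.0, "load_w": 300.0, "hours_per_job": 1.0}


def daily_energy_wh(system, jobs_per_day):
    """Total energy over 24 hours for a given number of identical jobs."""
    busy = system["hours_per_job"] * jobs_per_day
    return energy_wh(system["idle_w"], system["load_w"], busy, 24.0)


if __name__ == "__main__":
    for jobs in (1, 12, 24):
        print(jobs, daily_energy_wh(FAST, jobs), daily_energy_wh(FRUGAL, jobs))
```

With these made-up numbers the low-idle system uses less energy when there is one job a day, and the faster system uses less when the work fills the whole day. Where the crossover falls depends entirely on the real idle and load figures.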
    • Re: (Score:3, Insightful)

      Although some people will pipe in with their number crunching server stories, are there any normal usage servers that really come in at 100% CPU usage? For the 20 odd servers I run, few ever run at that rate for more than 30 minutes a day or so - and usually doing backups for that matter. Other system components often keep you from reaching that target, and most 24-7 servers I've seen do most of their work during a certain period then spend the rest of their time twiddling their thumbs.
      • Best Practices (Score:5, Insightful)

        by killmenow (184444) on Friday December 15 2006, @10:47AM (#17255956)
        It has always been my understanding that best practices dictate a server running at a constant 100% CPU utilization is underpowered and needs to be upgraded. Normal, everyday, steady CPU utilization should hover no higher than around 50% (closer to 75%, if you like living on the edge), leaving enough CPU to handle peak loads. Very few functions require a system that maintains a constant CPU utilization and never peaks over it.
      • Re: (Score:2, Insightful)

        by Anonymous Coward
        Any server running at that rate for more than a few short peaks a day is under capacity. Ideally, you'd like to keep them at 100% but you don't control scheduling of server demand. It's too ad-hoc. You trend then build enough excess capacity to handle projected peak loads. Of course, this depends on the level of service you want to deliver. Most server "customers" expect the server to be always as responsive as it can be, regardless of load. (expectation of IT is always 100% all the time). So server
      • are there any normal usage servers that really come in at 100% CPU usage?

        The anti-spam filters at my place of employment (two machines, each with a single 2.6GHz Xeon). That's why we are replacing them with two machines, each using two dual-core Xeons, for 4x the CPU power.

      • Re: (Score:3, Interesting)

        Although some people will pipe in with their number crunching server stories, are there any normal usage servers that really come in at 100% CPU usage?

        For capacity planning purposes, most of my clients target 40-50% CPU utilization on servers. If it starts creeping above 60% on a consistent basis (or is forecasted to do so soon), they begin the acquisition process to either upgrade or add servers.

        Queuing theory (M/M/1) shows that while the average response time doesn't increase that much, the standard
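
For the M/M/1 model mentioned above, the textbook mean response time is T = S / (1 - rho), where S is the mean service time and rho is the utilization. The sketch below tabulates only that mean for a few utilization levels. The function name and the chosen levels are illustrative assumptions, and real servers are rarely a clean M/M/1 queue.

```python
def mm1_mean_response_time(service_time, utilization):
    """Mean time in system for an M/M/1 queue: T = S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    # Service time normalised to 1, so results read as multiples of it.
    for rho in (0.40, 0.50, 0.60, 0.75, 0.90):
        t = mm1_mean_response_time(1.0, rho)
        print(f"utilization {rho:.0%}: mean response = {t:.2f}x service time")
```

The mean grows slowly through the 40-60% range and then climbs steeply as utilization approaches 100%, which is one way to read the 40-50% planning target.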

    • 4 core opteron x2 vs. 8 core xeon x1

      From the article, the idle power consumption of the 8 core Xeon is ~230W. The 4 core Opteron is ~120W.

      Which means, at idle, the single 8 way Xeon is better than two 4 way Opterons. Given that the efficiency of the 8 way under load is better than the 4-way, I would think that stacking on the 8-way is better.

      Of course, having two 4 way independent systems is better redundancy. On the other hand, the 8 way can be utilized to solve SMP multithread problems (without the expense of h
      • Re: (Score:3, Interesting)

        If I'm doing General Purpose computing I would trade the 10W difference in power consumption for the redundancy and flexibility of the 4-way Opteron. With two 4 way boxes you can use one as the failover for the other, or load balance between them, keeping low CPU use on each. General purpose computing really doesn't need the power of an 8-way SMP solution even with 1000's of users. You can virtualize either the 4 way or the 8 way with VMWare or Xen or Solaris Containers, so that (IMHO) is a wash.

        It's really back
    • "If your machines basically sit idle most of the time with an occasional spike for a few seconds when it actually does something, the AMD would save you more on electricity."

      More importantly, I think, is that power consumption translates to heat output. If you have mostly idle servers with occasional spikes, you can either cool them for less or put more in the same space depending on what you need. And don't forget that you actually save money twice with the AMD since you have to pay to power and cool the
    • by rbanffy (584143) on Friday December 15 2006, @11:47AM (#17257130) Homepage
      Well... If you have a couple servers that idle most of the time, I suggest that, instead of AMD, you buy VMWare.

      Or go Xen, OpenVz or whatever does the trick.

      But, most important, get rid of the idling boxes.
  • This just in! (Score:5, Insightful)

    by gentimjs (930934) on Friday December 15 2006, @10:10AM (#17255286) Journal
    Apples compared to Oranges: Our findings on the page after the banner ads!
    .. nothing to see here, move along...
  • "The eight-core Xeon 5355 system managed to render our multithreaded POV-Ray test scene using the least total energy, even though its peak power consumption was rather high, because it finished the job in about half the time that the four-way systems did. Similarly, the Xeon 5160 used the least energy in completing our multithreaded MyriMatch search, in part because it completed the task so quickly."

    Presumably, the article tests power consumption because businesses are concerned with how much running each o

    • "Presumably, the article tests power consumption because businesses are concerned with how much running each of these systems will cost them. If the Xeons managed to win in power consumption because they completed the task in half the time, that has other cost-saving benefits even beyond power consumption. "

      The benchmarks chosen have very little to do with the real business world.
      They mostly demonstrate the effect of Intel's larger CPU caches on performance.

      Choose a series of applications(p

      • Sounds like you're talking about server use while they tested workstation use. It looks like they called it "server/workstation" class, whatever that means.
  • oracle datacenter (Score:4, Informative)

    by chap_hyd (717718) on Friday December 15 2006, @10:39AM (#17255834) Homepage
    A friend who works for Oracle, in their datacenter, told me that they are swapping the Dell Intel Xeon servers for Sun AMD Opteron servers. The main reason behind this server swap is the power efficiency of the new Sun servers. So that means big corps already had their eye on AMD CPUs :)
    • it doesn't make any sense to swap out a working and functional server running intel chips with one running AMD purely for power saving, because electricity is a relatively small part of the lifetime cost of a server, until

      • the server no longer has adequate spare capacity and would be upgraded
      • you're beginning to overload your power or cooling grid, and it's cheaper to regrade your servers (which can be deployed elsewhere) than change the power grid or fix your air-con

      it's a similar problem for car users - for

      • it doesn't make any sense to swap out a working and functional server running intel chips with one running AMD purely for power saving, because electricity is a relatively small part of the lifetime cost of a server, until

        • the server no longer has adequate spare capacity and would be upgraded
        • you're beginning to overload your power or cooling grid, and it's cheaper to regrade your servers (which can be deployed elsewhere) than change the power grid or fix your air-con

        it's a similar problem for car users - for an average vehicle doing 25mpg, about half the energy of its lifetime of making, using, and recycling/scrap is consumed when making it. Environmentally, it's better to fix up an old car so it runs properly with minimal emissions than to generate a lot of scrap metal & plastics and incur the environmental costs of mining/refining metals, drilling for oil for plastics, manufacture etc. of a new car.

        Considering that Xeons have been around for years now, for all the parent stated these could be old 1 GHz or slower Xeon based servers. Rather than upgrading to the latest, they decided to switch platforms, which would meet your criteria.

        However, I disagree with your statement that the cost to power a server is a small fraction of its cost. A basic server, costing about $4k (nothing fancy), running 24x7 for a 365.25-day year at about 300 watts, will use about 2,629.8 kWh in one year (0.3 kW x 8,766 hours). At $0.07/kWh, that's about $184.09 per year just
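
The running-cost arithmetic is easy to check. The sketch below assumes a constant draw and a flat electricity price and ignores cooling overhead; the function names are illustrative. A year of continuous operation is 24 x 365.25 = 8,766 hours, and multiplying by 7 as well would overstate the total sevenfold.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours of continuous operation


def annual_energy_kwh(watts):
    """Energy used in one year at a constant draw, in kilowatt-hours."""
    return watts / 1000.0 * HOURS_PER_YEAR


def annual_cost(watts, price_per_kwh):
    """Yearly electricity cost at a flat price per kWh (cooling not included)."""
    return annual_energy_kwh(watts) * price_per_kwh


if __name__ == "__main__":
    kwh = annual_energy_kwh(300)
    cost = annual_cost(300, 0.07)
    print(f"{kwh:.1f} kWh/year, ${cost:.2f}/year")
```

Comparing that yearly figure, times the expected service life, with the purchase price gives a rough sense of how large the power share is.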

  • I'd like to see these efficiency curves plotted against 100%, the maximum theoretical efficiency of the transfer function through the semiconductors. Anyone know how to calculate the minimum W:b (watts per bit) necessary for these real-world tasks? Or is that just way too complex a stat to compute without melting the datacenter at which it's computed?
  • With the Intel chipset there are only 2 x8 PCI-e links coming out of the north bridge, and SAS / SATA-2, PCI-X, networking, as well as the PCI-e slots on the board have to share them.

    So with a lot of network use and disk use you can choke up that bus.
  • Here is one test that needs to be done: take a dual AMD Opteron workstation with 2 Quadro cards in SLI, put in a RAID 5 SAS or SATA setup, and do some networking at the same time. There are dual and quad AMD Opteron boards with nForce Professional chipsets. Some have 4 PCI-e slots (x16 x8 x8 x16), with each half coming from an HTT link.

    Also take a dual Intel workstation and try to do the same thing; the best that you can find is x8 x8.

    Using hacked SLI drivers is OK.

    I think that the AMD system will do better.
  • http://techreport.com/reviews/2006q4/xeon-vs-opteron/index.x?pg=7 [techreport.com]

    Very interesting. The benchmark uses a database and is the only one I've seen that seems to test the limits of the CPU cache with a database... and lo and behold, at 8 threads, performance degrades for the 5355 and it's actually slower than the Opteron 2218.

    Or it could just be that this benchmark isn't coded well - it might use a global lock frequently so as you add more threads there's more contention. In any case someone with more time than
  • by Splork (13498) on Friday December 15 2006, @01:51PM (#17259094) Homepage
    See http://electricrain.com/greg/opteron-powersave.txt [electricrain.com].

    All AMD K8 (Opteron and Athlon 64) CPUs have the ability to run the clock at an extra-slow speed when in HLT (idle) mode, saving a bunch more power. Many (most?) BIOSes are not smart enough to enable this. A simple setpci command will turn it on under Linux.

    To find out if it's on:

      setpci -d 1022:1103 87.b

    If that returns 00, it's off. To turn on clock-divide-in-hlt in div-by-512 mode, use:

      setpci -d 1022:1103 87.b=61

    (see the above URL for links to the AMD documentation on the PMM7 register; other values can work).
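
To check several machines, the query can be wrapped in a small script. This is a rough sketch, assuming setpci prints one hexadecimal byte per matching device and that any non-zero value means the feature has been configured; the comment only states that 00 means off. The helper names are hypothetical, and running the query needs root and the pciutils package.

```python
import subprocess

# Read-only query taken from the comment above.
QUERY_CMD = ["setpci", "-d", "1022:1103", "87.b"]


def clock_divide_states(setpci_output):
    """Map each reported byte to False (00, off) or True (non-zero, assumed on)."""
    return [int(token, 16) != 0 for token in setpci_output.split()]


def query_clock_divide():
    """Run setpci and interpret its output, one entry per matching device."""
    result = subprocess.run(QUERY_CMD, capture_output=True, text=True, check=True)
    return clock_divide_states(result.stdout)


if __name__ == "__main__":
    # Parse a canned two-device response instead of touching real hardware.
    print(clock_divide_states("00\n61\n"))
```

The parsing is kept separate from the subprocess call so it can be exercised without hardware access.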
  • Up here in The Great White North, there is a second important feature (mostly for desktop and deskside systems) -- and that's efficiency as a space heater. When these boxes are running at full bore, how many BTUs do they generate, and how many BTUs/watt do they generate? How many Xeons or K7s would it take to heat the average house?
    More importantly, how does that compare to a dedicated space-heater?
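
For what it's worth, the conversion is fixed: 1 W is about 3.412 BTU per hour, and virtually all the electrical power a computer draws ends up as heat, so watt for watt it matches a resistive space heater. A small sketch follows; the house heat load and per-machine wattage in the example are hypothetical values, not measurements.

```python
import math

BTU_PER_HOUR_PER_WATT = 3.412142  # unit conversion: 1 W = 3.412142 BTU/h


def watts_to_btu_per_hour(watts):
    """Heat output of a device that turns all of its electrical draw into heat."""
    return watts * BTU_PER_HOUR_PER_WATT


def machines_needed(heat_load_btu_per_hour, watts_per_machine):
    """Machines at constant full load needed to cover a heat load, rounded up."""
    per_machine = watts_to_btu_per_hour(watts_per_machine)
    return math.ceil(heat_load_btu_per_hour / per_machine)


if __name__ == "__main__":
    # Hypothetical: a 40,000 BTU/h heating load and 300 W per box.
    print(machines_needed(40_000, 300))
```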
    • /me hugs his ultrasparc system
      Couldnt agree more. Oh wait, something's sending an Int. Req. , cant type have; to see what it wants.....
    • Looks like Cell and Power are our only hope.

      80x86 may be ugly, but it's cheap for the processing power and has an entrenched economy of scale. It sucks. Even Apple switched from PowerPC and is now making glorified Wintel clone boxes (though with a pretty nifty feature set).

      -b.

    • Re: (Score:3, Interesting)

      Aren't newer x86 processors essentially CISC that convert the instructions down to RISC? And RISC processors, like G4/G5, that use instruction sets such as Altivec are actually using some aspects of CISC?

      That was my understanding, after reading articles like this one on Ars Technica [arstechnica.com]. If true, it would make fighting over CISC vs. RISC not make a lot of sense.
      • It's not just RISC vs CISC ... the whole x86 system is based around resource-fumbling bus subsystems. When you get down to it, the whole motto of x86 really could be "get in line, and wait" .. it's 1970s era crap.

        The fact that the CPU now runs at 324236GHz and can chew the math nice and fast doesn't alter the fact that the -rest- of the system (A20 gateway stuck on the KB controller and such.. ahem..) deserves to go the way of Wang...
        I've always been a fan of systems like MIPS and Ultrasparc: Engineered r
        • So uh, this memory-mapped IO that I'm using instead of emulated PIO, and these programmable DMA controllers, and the cascading interrupt multiplexer, and this hypercube bus with cache coherency... that all is just a figment of my imagination.

          Meanwhile my Sun has OH LOOK, a crossbar, and MY GOD! this newfangled PCI bus. WHAT HATH SCIENCE DONE?
      • You are correct.
        It does:

        push BP
        mov BP,SP
        sub SP, 10

        and

        mov SP,BP
        pop BP

        internally very quickly as RISC instructions. It's still 5 bus cycles.
    • bizzaro CISC instruction set
      1994 called, they want their architecture debate back.
    • by Ancil (622971) on Friday December 15 2006, @10:36AM (#17255778)
      bizzaro CISC instruction set piece of shite
      I guess you didn't get the memo. Turns out RISC wasn't the good idea everyone thought it would be in the 1990's.

      RISC worked well when speed of memory and CPU's were at parity. The simplified instructions let the CPU be clocked a lot faster, not to mention their shallow pipelines made it less costly when branch prediction failed. The tradeoff was that it usually took more instructions to accomplish a given task.

      But as CPU's have spent more and more time waiting for memory, CISC has really come into its own. Think of CISC as a compression algorithm: An x86 instruction which fits in 16-32 bits might take 4 or 5 instructions on a RISC processor, weighing in at 96-128 bits. It's no surprise why CISC processors have destroyed RISC in the past decade.
      • Re: (Score:3, Insightful)

        What I'm really referring to here is the extreme non-orthogonality of the ISA and the register set. I'm certainly not a purist when it comes to what individual instructions are allowed to do, but there's a lot to be said for having instructions all be the same width.
        • Re: (Score:2, Insightful)

          by Anonymous Coward
          This is foolish. Variable-width instructions provide higher instruction throughput by having lower memory bandwidth requirements and consuming less cache space. You want to code your instructions so that the most-frequently used instructions are as small as possible. This has been an active area of research for tailoring ISAs to workloads, but even an ad-hoc scheme that improves those two areas in the general case is better than none at all.

          This coding is more complicated than fixed-width instructions, but
          • You're forgetting the basic formula from Hennessy and Patterson:

            WorkPerSec = WorkPerInstruction * InstructionsPerCycle * CyclesPerSecond

            Yes, CISC has better work per instruction, except for one glaring issue I'll get to in a moment, but - for various reasons explained throughout H&P - it loses on the other two and thus overall. That's why nobody's making new processors that are CISC internally any more; they just couldn't hit the issue widths and clock speeds that are achievable with a RISC core (even if that core has a CISC ISA bolted on the front). What's missing here is that not all work is useful work. As anyone who has accidentally coded an infinite loop knows, executing lots of instructions is not necessarily a good thing. The glaring issue I mentioned earlier is that a lot of the instructions executed on a register-poor architecture like x86 are not doing useful work. Register thrashing means i-cache bandwidth is wasted fetching instructions which are then used to waste d-cache bandwidth, which more than outweighs any advantage from variable-length instructions.
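
The three-factor formula above is simple enough to play with. Here is a minimal sketch; the two designs and every number in it are hypothetical, and the `useful_fraction` parameter is an added assumption to model the "not all work is useful work" point rather than part of the quoted formula.

```python
def work_per_second(work_per_instruction, instructions_per_cycle,
                    cycles_per_second, useful_fraction=1.0):
    """WorkPerSec = WorkPerInstruction * InstructionsPerCycle * CyclesPerSecond.

    useful_fraction (an illustrative extension) discounts instructions that
    only shuffle registers or spill to memory.
    """
    return (work_per_instruction * instructions_per_cycle
            * cycles_per_second * useful_fraction)


# Hypothetical design A: denser instructions, narrow issue, lower clock,
# and a fifth of its instructions lost to register shuffling.
design_a = work_per_second(1.3, 1.0, 2.0e9, useful_fraction=0.8)

# Hypothetical design B: simpler instructions, wider issue, higher clock.
design_b = work_per_second(1.0, 2.0, 3.0e9)

if __name__ == "__main__":
    print(f"A: {design_a:.3g}  B: {design_b:.3g}")
```

A lead in one factor is easily swamped by the other two, which is the shape of the argument being made; the actual factors for any real processor have to be measured.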

            So, you say, wouldn't variable-length instructions on a register-rich processor be the best of both worlds? Not so fast. A regular instruction set makes superscalar execution easier because it means that multiple instructions can be fetched literally at the same time without having to examine the first one to figure out where the second one begins and so on. It also makes deeper pipelines easier because it allows many internal activities (e.g. register allocation, hazard detection) to start after a simple pre-decode stage, in parallel with the remainder of decode. Either way, regular instruction sets allow for more parallelism - and parallelism in some form is generally the key to CPU performance. If you're willing to give up performance by eschewing most modern processor-design techniques, which might be the case for a deeply embedded system with extreme size and/or power requirements, then variable-width instructions might still be a reasonable choice. In that case you might as well use an older architecture; there are plenty to choose from. For new processor designs, though, variable-width instructions are almost invariably a way to lose.
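
The fetch argument can be illustrated in software. In the sketch below, fixed-width instruction starts are computed from the index alone, while variable-width starts require decoding each instruction's length before the next one can be located. The toy encoding (low two bits of the first byte give the length minus one) is a hypothetical stand-in, not x86 or any real ISA.

```python
def fixed_width_starts(code_len, width=4):
    """Every start offset is i * width, known without looking at the bytes."""
    return list(range(0, code_len, width))


def variable_width_starts(code, length_of):
    """Each start depends on the decoded length of the previous instruction."""
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        offset += length_of(code, offset)  # serial dependency on the decode
    return starts


def toy_length(code, offset):
    """Toy encoding: low 2 bits of the first byte hold (length - 1)."""
    return (code[offset] & 0b11) + 1


if __name__ == "__main__":
    code = bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xCC, 0x03, 0x11, 0x22, 0x33])
    print(fixed_width_starts(12))
    print(variable_width_starts(code, toy_length))
```

The loop in `variable_width_starts` carries a dependency from one iteration to the next, which is the software analogue of having to examine the first instruction to find the second.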

      • You've accepted a fallacy of false dichotomy. While the 90s posed a dilemma of RISC vs. CISC, modern hardware architectures are more akin to VLIW. The ISA may be a stack machine, much to the dismay of compiler writers everywhere, but that is flattened into a superscalar VLIW microcode stream.
        • How exactly is it VLIW? We're still only issuing one instruction per clock per core. The fact that it breaks down into micro-ops, some of which may be executed in parallel, still doesn't make it VLIW.
      • An x86 instruction which fits in 16-32 bits might take 4 or 5 instructions on a RISC processor,

        Do you have evidence to back that up? From the limited amount that I've seen, the opposite seems to be true - one or two instructions on an ARM or MIPS processor can neatly do what takes several instructions of fumbling on an i386. Partly this is because of more registers accessible at the instruction level, and partly because of a more orthogonal instruction set.

        You could compare the size of object code spat ou

      • Re: (Score:3, Insightful)

        It's no surprise why CISC processors have destroyed RISC in the past decade.

        Sorry, but CISC, specifically x86 and children, has won simply by being the architecture for which most software was written. The dominance of CISC is a similar story to (but not the same as; trying to stave off an off-topic rant) the dominance of Windows -- backward compatibility is King.

        The RISC makers knew this too. Back when RISC was the hot new thing in the early 90s, they were touting that RISC would be so much faster than CISC
    • ***It's almost 2007 and we're still hanging bags on the side of the 8080. No matter how many cores, caches or pipelines, no matter the clock rate, it's still the same-old same-old single-accumulator, bizzaro CISC instruction set piece of shite.***

      Please don't get the idea that I'm defending the Intel x86 instruction set. When I first saw it in the early 1980s, I thought it was the most gawdawful mess I'd seen in 25 years in the business (I wrote my first assembler code in 1960). It hasn't improved any w

      • [My candidate for the best microcomputer instruction set from the programmer's POV -- hands down, the MC6809]

        Amen, brother! While I haven't been coding for quite as long as you (for me, it was 1976 when I started), I've used a hefty number of instruction sets and designed a handful myself. The 6809 was always my favorite. I still have a well-worn copy of the 6800 instruction set manual in my library; so clear, so beautiful. This was back when instruction set design was based purely on merit (what is the
      • Complex instructions reduce the overall code size, reducing the need for code cache and RAM. Especially with 64 bit architectures this makes a big difference. Instead of 8 byte RISC instructions, the average instruction size is probably closer to 3 or 4 bytes (not including immediate values, which of course in 80x86 can be smaller than the machine word size). Obviously RISC chips can be designed with small instruction word sizes, and for instance a pretty good RISC instruction set could live in 32 bit word
    • It looks to me that the Instruction Set War (CISC vs RISC) is pretty much lost. Nobody cares about the instruction set. Microsoft is not the culprit. CISC processors just got fast, much faster than many RISC processors. These days what makes a CPU fast is what there's inside, not the instruction set.
    • Actually, don't rule out "something completely different" from Intel now that Apple is a partner. Intel has been trying for more than a decade to break out of the boring beige box business that Microsoft dragged them into. Sure, it's been VERY profitable up to this point, but there's a curve in the road and something must be done. I strongly believe that Intel and Apple will come up with a hardware solution that will clearly differentiate the Mac from other Intel-based products. Don't know when this might
    • It's not going anywhere. Intel actually wanted to replace it though it's arguable if their replacement was better or worse but AMD won out the 64-bit round with x86-64. That's what Linux uses, that's what Windows uses, it's a done deal.

      Now personally to me you sound like someone who's spent a little too much time in a computer science architecture class soaking up theories about ISAs and too little time actually looking at how chips are made these days and what works. When you get right down to it, x86 work