Slashdot
Xeons, Opterons Compared in Power Efficiency
Posted by Zonk on Fri Dec 15, 2006 09:56 AM
from the nothing-like-some-friday-morning-power-analysis dept.
Bender writes "The Tech Report has put Intel's 'Woodcrest' and quad-core 'Clovertown' Xeons up against AMD's Socket F Opterons in a range of applications, including widely multithreaded tests from academic fields like computational fluid dynamics and proteomics. They've also attempted to quantify power efficiency in terms of energy use over time and energy use per task, with some surprising results." From the article: "On the power efficiency front, we found both Xeons and Opterons to be very good in specific ways. The Opteron 2218 is excellent overall in power efficiency, and I can see why AMD issued its challenge. Yes, we were testing the top speed grade of the Xeon 5100 and 5300 series against the Opteron 2218, but the Opteron ended up drawing much less power at idle than the Xeons ... We've learned that multithreaded execution is another recipe for power-efficient performance, and on that front, the Xeons excel. The eight-core Xeon 5355 system managed to render our multithreaded POV-Ray test scene using the least total energy, even though its peak power consumption was rather high, because it finished the job in about half the time that the four-way systems did. Similarly, the Xeon 5160 used the least energy in completing our multithreaded MyriMatch search, in part because it completed the task so quickly."
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
AMD needs to get back in the game, quick (Score:5, Insightful)
Business needs to pay attention (Score:5, Insightful)
I know of and have worked with too many organizations that figure it's just a matter of slapping all the computers in an air-conditioned room. Every watt of waste heat adds to the A/C bill.
Old-fashioned water-cooled mainframes and big iron (for its time) often recirculated the waste heat into the heating systems of the surrounding buildings. We've known all along how to be more energy efficient, if companies and management would only place the emphasis on the environment in their budgets.
Re: (Score:3, Insightful)
AMD's path (Score:4, Insightful)
Hmm, so which better reflects real-world usage? (Score:5, Interesting)
the Xeon 5160 used the least energy in completing our multithreaded MyriMatch search, in part because it completed the task so quickly.
So what does this mean for people shopping for servers?
If your servers constantly tick along at nearly 100% CPU use, you might do better going with the Xeon system. If your machines basically sit idle most of the time with an occasional spike for a few seconds when it actually does something, the AMD would save you more on electricity.
Of course, this raises a third possibility - Would running a number of virtual servers on one large Xeon machine waste more energy than it saves, or give a net gain?
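The busy-versus-idle trade-off described above can be sketched with a little arithmetic. The Python below is a rough model, not a measurement: the function name, the wattages and the job durations are all hypothetical placeholders, chosen only so that one machine idles low but works slowly and the other idles high but finishes each job in half the time.

```python
def daily_energy_wh(idle_w, load_w, hours_per_job, jobs_per_day):
    """Watt-hours used in 24 hours by a machine that is either busy or idle."""
    busy_hours = hours_per_job * jobs_per_day
    if busy_hours > 24:
        raise ValueError("workload does not fit in one day")
    return load_w * busy_hours + idle_w * (24 - busy_hours)


# Hypothetical machines (illustrative numbers only, not from the article):
# "fast" idles at 230 W, draws 300 W under load, needs 0.5 h per job.
# "slow" idles at 120 W, draws 280 W under load, needs 1.0 h per job.
def compare(jobs_per_day):
    fast = daily_energy_wh(230, 300, 0.5, jobs_per_day)
    slow = daily_energy_wh(120, 280, 1.0, jobs_per_day)
    return fast, slow


if __name__ == "__main__":
    for jobs in (1, 12, 24):
        fast, slow = compare(jobs)
        print(f"{jobs:2d} jobs/day: fast={fast:.0f} Wh, slow={slow:.0f} Wh")
```

With these made-up numbers the low-idle box uses less energy at one job a day, and the faster box only pulls ahead once the day is nearly full. Where the crossover falls depends entirely on the real idle and load wattages.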
Re: (Score:3, Insightful)
Best Practices (Score:5, Insightful)
Re: (Score:2, Insightful)
Re: (Score:2)
The anti-spam filters at my place of employment (two machines, each with a single 2.6GHz Xeon). That's why we are replacing them with two machines, each using two dual-core Xeons, for 4x the CPU power.
Re: (Score:3, Interesting)
For capacity planning purposes, most of my clients target 40-50% CPU utilization on servers. If it starts creeping above 60% on a consistent basis (or is forecasted to do so soon), they begin the acquisition process to either upgrade or add servers.
Queuing theory (M/M/1) shows that while the average response time doesn't increase that much, the standard
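For readers who want to put numbers on the M/M/1 remark, here is a minimal sketch using the textbook results for a single-server queue with Poisson arrivals, exponential service and first-come-first-served order. The function names and the 1-second service time are assumptions made for illustration; real servers are rarely this tidy.

```python
import math


def mm1_mean_response(service_time, utilization):
    """Mean response time (waiting plus service) of an M/M/1 queue."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1 - utilization)


def mm1_response_percentile(service_time, utilization, p):
    """p-th quantile of response time, 0 <= p < 1.

    Under FCFS the M/M/1 response time is exponentially distributed,
    so its standard deviation equals its mean and the quantile is
    -mean * ln(1 - p).
    """
    if not 0 <= p < 1:
        raise ValueError("p must be in [0, 1)")
    return -mm1_mean_response(service_time, utilization) * math.log(1 - p)


if __name__ == "__main__":
    for rho in (0.4, 0.5, 0.6, 0.8, 0.9):
        mean = mm1_mean_response(1.0, rho)
        p95 = mm1_response_percentile(1.0, rho, 0.95)
        print(f"utilization {rho:.0%}: mean {mean:.2f}s, 95th pct {p95:.2f}s")
```

In this model the mean response time is twice the service time at 50% utilization, 2.5 times at 60% and ten times at 90%, with the tail growing in proportion.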
Re: (Score:2)
From the article, the idle power consumption of the 8-core Xeon is ~230W; the 4-core Opteron's is ~120W.
Which means that, at idle, the single 8-way Xeon is better than two 4-way Opterons (~230W versus ~240W). Given that the efficiency of the 8-way under load is better than that of the 4-way, I would think that stacking on the 8-way is better.
Of course, having two independent 4-way systems is better redundancy. On the other hand, the 8-way can be utilized to solve SMP multithread problems (without the expense of h
Re: (Score:3, Interesting)
It's really back
Power = Heat (Score:3, Insightful)
More importantly, I think, is that power consumption translates to heat output. If you have mostly idle servers with occasional spikes, you can either cool them for less or put more in the same space depending on what you need. And don't forget that you actually save money twice with the AMD since you have to pay to power and cool the
Re:Hmm, so which better reflects real-world usage? (Score:5, Insightful)
Or go Xen, OpenVZ or whatever does the trick.
But, most important, get rid of the idling boxes.
This just in! (Score:5, Insightful)
Conclusions converted to $$$ (Score:2, Interesting)
Presumably, the article tests power consumption because businesses are concerned with how much running each o
Re: (Score:2)
"Presumably, the article tests power consumption because businesses are concerned with how much running each of these systems will cost them. If the Xeons managed to win in power consumption because they completed the task in half the time, that has other cost-saving benefits even beyond power consumption. "
The benchmarks chosen have very little to do with the real business world.
They mostly demonstrate the effect of Intel's larger CPU caches on performance.
Choose a series of applications(p
Re: (Score:2)
oracle datacenter (Score:4, Informative)
Re: (Score:2)
it doesn't make any sense to swap out a working and functional server running Intel chips with one running AMD purely for power saving, because electricity is a relatively small part of the lifetime cost of a server, until
it's a similar problem for car users - for
Re: (Score:2)
it doesn't make any sense to swap out a working and functional server running Intel chips with one running AMD purely for power saving, because electricity is a relatively small part of the lifetime cost of a server, until
it's a similar problem for car users - for an average vehicle doing 25mpg, about half of the energy consumed over its lifetime of manufacture, use, and recycling/scrapping goes into making it. Environmentally, it's better to fix up an old car so it runs properly with minimal emissions than to generate a lot of scrap metal & plastics and incur the environmental costs of mining/refining metals, drilling for oil for plastics, manufacture etc. of a new car.
Considering that Xeons have been around for years now, for all the parent stated these could be old 1GHz or slower Xeon-based servers. Rather than upgrading to the latest, they decided to switch platforms, which would meet your criteria.
However, I disagree with your statement that the cost to power a server is a small fraction of its cost. A basic server, costing about $4k (nothing fancy), running 24x7 all year (8,766 hours) at about 300 watts, will use about 2,630 kWh in one year. At $0.07/kWh, that's about $184 per year just
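Annual energy figures like these are easy to get wrong, so here is a small sketch for checking them. Note that "24x7x365.25" is shorthand for round-the-clock operation, not a literal product: a year has 24 × 365.25 = 8,766 hours. The function names are made up for illustration, and the model assumes a constant draw and a flat tariff, ignoring cooling overhead.

```python
HOURS_PER_YEAR = 24 * 365.25  # 8766 hours, averaging in leap years


def annual_energy_kwh(watts):
    """Energy used in one year by a constant load, in kilowatt-hours."""
    return watts / 1000 * HOURS_PER_YEAR


def annual_cost(watts, price_per_kwh):
    """Yearly electricity cost of a constant load at a flat price per kWh."""
    return annual_energy_kwh(watts) * price_per_kwh


if __name__ == "__main__":
    watts, price = 300, 0.07  # the figures used in the comment
    print(f"{annual_energy_kwh(watts):.1f} kWh/year")
    print(f"${annual_cost(watts, price):.2f}/year")
```

For a 300-watt server at $0.07/kWh this works out to roughly 2,630 kWh and about $184 a year, before counting the air conditioning needed to remove the same heat.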
Info Power (Score:2)
Chip sets for AMD are better (Score:2)
So with a lot of network use and disk use you can choke up that bus.
Test idea (Score:2)
Also take a dual Intel workstation and try to do the same thing; the best that you can find is x8 x8.
Using hacked SLI drivers is OK.
I think that the AMD system will do better
The "MyriMatch" benchmark shows intel is slower (Score:2)
Very interesting. The benchmark uses a database and is the only one I've seen that seems to test the limits of the CPU cache with a database... and lo and behold, at 8 threads, performance degrades for the 5355 and it's actually slower than the Opteron 2218.
Or it could just be that this benchmark isn't coded well - it might use a global lock frequently so as you add more threads there's more contention. In any case someone with more time than
HOWTO: save 20W/socket when idle on Opteron or A64 (Score:5, Informative)
All AMD K8 (Opteron and Athlon 64) CPUs have the ability to run the clock at an extra-slow speed when in HLT (idle) mode, saving a bunch more power. Many (most?) BIOSes are not smart enough to enable this. A simple setpci command will turn it on under Linux.
To find out if it's on:
setpci -d 1022:1103 87.b
If that returns 00, it's off. To turn on clock-divide-in-hlt in divide-by-512 mode, use:
setpci -d 1022:1103 87.b=61
(see the above URL for links to the AMD documentation on the PMM7 register; other values can work).
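On a multi-socket box the read command prints one byte per matching device, so a small wrapper can make the check less error-prone. The sketch below is only that: the helper names are invented here, it reuses the exact setpci arguments from the comment, and it assumes that any non-zero byte means the feature is on (the comment only states that 00 means off). Running it needs root and the pciutils package.

```python
import subprocess

# Arguments copied from the comment: vendor 1022, device 1103, byte register 87.
READ_CMD = ["setpci", "-d", "1022:1103", "87.b"]


def parse_states(output):
    """Turn setpci output (one hex byte per device) into a list of booleans.

    Assumption: any value other than 00 counts as "enabled".
    """
    return [int(token, 16) != 0 for token in output.split()]


def read_states():
    """Run setpci and report, per matching device, whether the byte is non-zero."""
    result = subprocess.run(READ_CMD, capture_output=True, text=True, check=True)
    return parse_states(result.stdout)


if __name__ == "__main__":
    for index, enabled in enumerate(read_states()):
        print(f"device {index}: {'on' if enabled else 'off'}")
```

The write side is deliberately left out; poking PCI configuration registers from a script is best done by hand with the command given above, after checking the AMD documentation for the value that suits the CPU.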
What About Efficiency as a Space Heater (Score:3, Funny)
More importantly, how does that compare to a dedicated space-heater?
Re: (Score:3, Funny)
Couldnt agree more. Oh wait, something's sending an Int. Req. , cant type have; to see what it wants.....
Re: (Score:2)
80x86 may be ugly, but it's cheap for the processing power and has an entrenched economy of scale. It sucks. Even Apple switched from PowerPC and is now making glorified Wintel clone boxes (though with a pretty nifty feature set).
-b.
Re: (Score:3, Interesting)
That was my understanding, after reading articles like this one on Ars Technica [arstechnica.com]. If true, it would make fighting over CISC vs. RISC not make a lot of sense.
Re: (Score:2)
The fact that the CPU now runs at 324236GHz and can chew the math nice and fast doesn't alter the fact that the -rest- of the system (A20 gateway stuck on the KB controller and such.. ahem..) deserves to go the way of Wang...
I've always been a fan of systems like MIPS and Ultrasparc: Engineered r
Oh really? (Score:2)
Meanwhile my Sun has OH LOOK, a crossbar, and MY GOD! this newfangled PCI bus. WHAT HATH SCIENCE DONE?
Re: (Score:2)
It does:
push BP
mov BP,SP
sub SP, 10
and
mov SP,BP
pop BP
internally very quickly as RISC instructions. It's still 5 bus cycles.
Re: (Score:2, Funny)
Re:God, I'm sick of this architecture (Score:5, Informative)
RISC worked well when the speeds of memory and CPUs were at parity. The simplified instructions let the CPU be clocked a lot faster, not to mention that their shallow pipelines made it less costly when branch prediction failed. The tradeoff was that it usually took more instructions to accomplish a given task.
But as CPUs have spent more and more time waiting for memory, CISC has really come into its own. Think of CISC as a compression algorithm: An x86 instruction which fits in 16-32 bits might take 4 or 5 instructions on a RISC processor, weighing in at 96-128 bits. It's no surprise that CISC processors have destroyed RISC in the past decade.
Re: (Score:3, Insightful)
Re: (Score:2, Insightful)
This coding is more complicated than fixed-width instructions, but
Re:God, I'm sick of this architecture (Score:4, Interesting)
You're forgetting the basic formula from Hennessy and Patterson:
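The equation itself did not survive in this copy of the comment. Judging from the three factors the next paragraph goes on to discuss (work per instruction, issue width and clock speed), it is presumably the standard processor performance equation, written here as throughput rather than in the textbook's CPU-time form:

```latex
\text{performance}
  = \frac{\text{work}}{\text{instruction}}
    \times \frac{\text{instructions}}{\text{cycle}}
    \times \frac{\text{cycles}}{\text{second}}
```

Read this way, a design can win on the first factor and still lose overall if it gives up more on the other two.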
Yes, CISC has better work per instruction, except for one glaring issue I'll get to in a moment, but - for various reasons explained throughout H&P - it loses on the other two and thus overall. That's why nobody's making new processors that are CISC internally any more; they just couldn't hit the issue widths and clock speeds that are achievable with a RISC core (even if that core has a CISC ISA bolted on the front). What's missing here is that not all work is useful work. As anyone who has accidentally coded an infinite loop knows, executing lots of instructions is not necessarily a good thing. The glaring issue I mentioned earlier is that a lot of the instructions executed on a register-poor architecture like x86 are not doing useful work. Register thrashing means i-cache bandwidth is wasted fetching instructions which are then used to waste d-cache bandwidth, which more than outweighs any advantage from variable-length instructions.
So, you say, wouldn't variable-length instructions on a register-rich processor be the best of both worlds? Not so fast. A regular instruction set makes superscalar execution easier because it means that multiple instructions can be fetched literally at the same time without having to examine the first one to figure out where the second one begins and so on. It also makes deeper pipelines easier because it allows many internal activities (e.g. register allocation, hazard detection) to start after a simple pre-decode stage, in parallel with the remainder of decode. Either way, regular instruction sets allow for more parallelism - and parallelism in some form is generally the key to CPU performance. If you're willing to give up performance by eschewing most modern processor-design techniques, which might be the case for a deeply embedded system with extreme size and/or power requirements, then variable-width instructions might still be a reasonable choice. In that case you might as well use an older architecture; there are plenty to choose from. For new processor designs, though, variable-width instructions are almost invariably a way to lose.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Do you have evidence to back that up? From the limited amount that I've seen, the opposite seems to be true - one or two instructions on an ARM or MIPS processor can neatly do what takes several instructions of fumbling on an i386. Partly this is because of more registers accessible at the instruction level, and partly because of a more orthogonal instruction set.
You could compare the size of object code spat ou
Re: (Score:3, Insightful)
Sorry, but CISC, specifically x86 and children, has won simply by being the architecture for which most software was written. The dominance of CISC is a similar story to (but not the same as; trying to stave off an off-topic rant) the dominance of Windows -- backward compatibility is King.
The RISC makers knew this too. Back when RISC was the hot new thing in the early 90s, they were touting that RISC would be so much faster than CISC
Re: (Score:2)
Please don't get the idea that I'm defending the Intel x86 instruction set. When I first saw it in the early 1980s, I thought it was the most gawdawful mess I'd seen in 25 years in the business (I wrote my first assembler code in 1960). It hasn't improved any w
Re: (Score:2)
Amen, brother! While I haven't been coding for quite as long as you (for me, it was 1976 when I started), I've used a hefty number of instruction sets and designed a handful myself. The 6809 was always my favorite. I still have a well-worn copy of the 6800 instruction set manual in my library; so clear, so beautiful. This was back when instruction set design was based purely on merit (what is the
80x86 has the benefit of code size (Score:2)
Re: (Score:2)
Re: (Score:2)
Well too bad get used to it (Score:3, Interesting)
Now personally to me you sound like someone who's spent a little too much time in a computer science architecture class soaking up theories about ISAs and too little time actually looking at how chips are made these days and what works. When you get right down to it, x86 work
Re:Way to put the conclusion in the article summar (Score:2)
Re:Way to put the conclusion in the article summar (Score:2, Funny)
You must be new here...