SeaMicro Unveils 512 Atom-Based Server
1sockchuck writes "Stealthy startup SeaMicro has unveiled its new low-power server, which incorporates 512 Intel Atom CPUs, a load balancer and interconnection fabric into a 10u server. SeaMicro, which received a $9.3 million government grant from DOE to develop its technology, says its server uses less than 2 kilowatts of energy — suggesting that a single rack with four SeaMicro units and 2,048 CPUs could draw just 8 kilowatts of power. Check out the technical overview, plus additional coverage from Wired, GigaOm and VentureBeat."
What does a normal rack consume? (Score:4, Interesting)
Low power, really? (Score:3, Interesting)
In all of the benchmarks that I've seen, clock for clock a Core 2 gets about twice the score of an Atom, sometimes more. The Core 2 uses a bit more than twice as much power, but if you have two Atoms you also need twice as many north-bridge chips and this pushes the power usage up to over what the Core 2 will consume. The newer Xeons do even better.
The first benchmark results I found that compared the two were PassMark benchmarks, where a 2GHz Atom scored 386 and an Intel Xeon X5680 at 3.33GHz scored 10620. The fastest Atom, the D510 at 1.66GHz, scored 662. Even if your code scales linearly, you need more than 16 of the fastest Atoms that you can buy to replace one Xeon. Or, to put it another way, this 512-Atom machine is about as powerful as a 32-CPU Xeon.
A single Atom D510 draws around 13W, so 16 of them draw 208W. The Xeon will draw 130W. Drawing under 2kW for 512 Atoms means that they probably aren't using the fastest available ones. Actually, it means that they're drawing under 4W per Atom, which means that they're probably using Z-series Atoms, getting about half the performance of the D-series ones, so you'd only need about 16 Xeons for the same performance.
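For anyone who wants to check the arithmetic above, here is a rough sketch in Python. All input figures are the ones quoted in this comment (PassMark scores, 13W and 130W draws, the 2kW chassis limit); the "Z-series is about half as fast" factor is the poster's estimate, not a measurement, and the variable names are made up for illustration.

```python
# Figures quoted in the comment above; nothing here is independently measured.
XEON_X5680_SCORE = 10620   # PassMark score, Xeon X5680 at 3.33GHz
ATOM_D510_SCORE = 662      # PassMark score, Atom D510 at 1.66GHz
ATOM_D_WATTS = 13          # approximate draw of one D-series Atom
XEON_WATTS = 130           # approximate draw of one Xeon
CHASSIS_WATTS = 2000       # SeaMicro's stated upper bound for the box
CHASSIS_ATOMS = 512
Z_SERIES_SPEED_FACTOR = 0.5  # assumption from the comment: Z-series ~half of D-series

# How many D510s match one X5680, assuming perfectly linear scaling.
atoms_per_xeon = XEON_X5680_SCORE / ATOM_D510_SCORE

# Power needed by 16 D-series Atoms versus one Xeon.
sixteen_atoms_watts = 16 * ATOM_D_WATTS

# Power budget per Atom implied by the 2kW claim.
watts_per_atom = CHASSIS_WATTS / CHASSIS_ATOMS

# Xeon-equivalents for the whole chassis, using the rounded 16:1 ratio.
xeon_equiv_d_series = CHASSIS_ATOMS / 16
xeon_equiv_z_series = xeon_equiv_d_series * Z_SERIES_SPEED_FACTOR

print(f"Atoms per Xeon:        {atoms_per_xeon:.2f}")
print(f"16 Atoms draw:         {sixteen_atoms_watts}W vs {XEON_WATTS}W for one Xeon")
print(f"Budget per Atom:       {watts_per_atom:.2f}W")
print(f"Xeon-equiv (D-series): {xeon_equiv_d_series:.0f}")
print(f"Xeon-equiv (Z-series): {xeon_equiv_z_series:.0f}")
```

The per-Atom budget comes out just under 4W, which is where the guess about Z-series parts comes from.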
For most workloads, if the server is not busy, you'll get much better power usage from the Xeon as well. Power usage drops off dramatically when the CPU is not 100% busy. Unless you are turning individual Atoms off completely, you can't scale back power usage nearly as well with the Atoms, as single processes that would not be CPU-bound on the Xeon will require an Atom core to run at full speed.
In other words, it sounds a lot more like greenwashing than anything that's actually sensible.
Re:What's the "bang for the buck"? (Score:1, Interesting)
Re:What does a normal rack consume? (Score:5, Interesting)
Re:Vitual center (Score:4, Interesting)
Actually it is much more interesting to handle each of them as you would handle an individual virtual machine - so you have 512 nice low-powered virtual servers with each of them having a fixed and dedicated processor.
In fact such a load-out would be very useful for hosting companies - you can have a ton of small clients with minimal management or scheduling burden.
Re:What's the "bang for the buck"? (Score:5, Interesting)
1. double precision. Use a double, and the Atom will grind to a halt.
2. division. Use rcp + mul instead.
3. sqrt. Same as division.
All of those produce unacceptable stalls, and annihilate your performance immediately. So don't use them!
Now, you'd imagine those are insurmountable, but you'd be wrong. If you use the Intel compiler, restrict yourself to float or int based SSE instructions only, avoid the list of things that kill performance, and make extreme use of OpenMP, they really can start punching above their weight. Sure they'll never come close to an i7, but they aren't *that* bad if you tune your code carefully. In fact, the biggest problem I've found with my Atom 330 system is not the CPU itself, but good old-fashioned memory bandwidth. The memory bandwidth appears to be about half that of Core2 (which makes sense since it doesn't support dual channel memory), and for most people that will cripple the performance long before the CPU runs out of grunt.
The biggest problem with them right now is that they are so different architecturally from any other x86/x64 CPU that all apps need to be re-compiled with relevant compiler switches for them. Code optimised for a Core2 or i7 performs terribly on the Atom.
Re:What does a normal rack consume? (Score:3, Interesting)
Virtualize the system, not the CPU (Score:1, Interesting)
Yeah, at least someone gets it. The entire server is a virtualization host. It can allocate up to 512 VMs each with very deterministic QoS parameters. The CPUs are so cheap and plentiful, you don't need to share them between guests. The real hardware to virtualize is the I/O hardware, hence the features around sharing a big ass disk array and network interconnect via custom ASICs.
Re:Low power, really? (Score:3, Interesting)
Because you are thinking serial while they are thinking parallel.
How many simultaneous operations can 512 Atoms do versus, say, a total of 128 Xeon cores?
What happens when a single operation is extremely small, but there is an extremely high volume of them?
What happens to a CPU core while it's waiting for RAM or other I/O? Yea, that's right: It waits.
What happens to memory IOPS when you have 512 channels versus 128 dual channels? Yup, it's vastly higher: not just twice, but quadruple (dual channel doubles bandwidth, not IOPS, afaik).
Re:What's the "bang for the buck"? (Score:4, Interesting)
Then they screwed up, and they should have used ARMs, because a great deal of Atom's performance lies in its multimedia instruction set. Or in other words, if you're not pushing flops, you have a lot of hardware lying around unused. Atom delivers a lot of flops (or iops, for that matter) but doesn't shovel data any more efficiently than anyone else.
Re:Vitual center (Score:3, Interesting)
SMP will only bring you so far - I'll bet 8 VCPU VMs on Atoms will be beaten by a 2 VCPU VM on a Core 2 Duo.
Perhaps not, depending on the other load the system is working on. Because of the way VCPUs are scheduled (at least in VMware), that 8-vCPU VM won't get a time-slice until such time as there are 8 real cores available for the duration of that slice.
Not all virtualization systems have that limitation. In a modern VM, each VCPU gets scheduled separately on the physical CPUs, rather than using gang-scheduling, for exactly the reason you described. That way, if you have a pile of N-cpu VMs, each of which just has one or two CPUs waking up periodically rather than all doing intensive computation, they can all share relatively few hardware CPUs and run efficiently.
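To make the difference between the two scheduling styles concrete, here is a toy model in Python. It is only a sketch of the idea described above, not how VMware or any other hypervisor actually schedules: the function names, the VM list, and the "one time slice, first come first served" rule are all invented for illustration.

```python
# Each VM is (name, total_vcpus, vcpus_with_work_this_slice). Hypothetical data.
VMS = [
    ("vm-a", 8, 1),
    ("vm-b", 8, 2),
    ("vm-c", 8, 1),
    ("vm-d", 8, 2),
]
PHYSICAL_CORES = 8


def schedule_gang(vms, physical_cores):
    """Strict gang scheduling: a VM runs only if ALL of its vCPUs get a core."""
    free = physical_cores
    scheduled = []
    for name, total_vcpus, busy_vcpus in vms:
        if busy_vcpus == 0:
            continue  # nothing to run this slice
        if total_vcpus <= free:
            free -= total_vcpus
            scheduled.append(name)
    return scheduled


def schedule_per_vcpu(vms, physical_cores):
    """Independent scheduling: only the vCPUs that have work need a core."""
    free = physical_cores
    scheduled = []
    for name, _total_vcpus, busy_vcpus in vms:
        if 0 < busy_vcpus <= free:
            free -= busy_vcpus
            scheduled.append(name)
    return scheduled


gang_result = schedule_gang(VMS, PHYSICAL_CORES)
per_vcpu_result = schedule_per_vcpu(VMS, PHYSICAL_CORES)

print("gang scheduling runs:    ", gang_result)
print("per-vCPU scheduling runs:", per_vcpu_result)
```

With four mostly idle 8-vCPU VMs on 8 cores, the gang model fits only the first VM into the slice, while the per-vCPU model fits all four because only 6 vCPUs actually have work.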
Re:Low power, really? (Score:2, Interesting)
Also, no matter how much power they claim to save with this beast, it's nothing compared to virtualizing a rack or more into 3 ESX hosts. With this Atom "solution", you'll just have tons of nodes on 0.00 load again like before, instead of having all the physical servers at a load you're comfortable with. To make things worse, if you run a job on the Atoms, it will run in slow motion because of the "performance" of that CPU - you cannot burst to the full 2/4/8 vCPU allocated, @x3 speed +turbo when needed. It's *much* harder to use 512 slow CPUs than fewer, faster ones.
And never mind running any commercial software on it. The per core/socket license will make it impossible.