A Three-Way AMD Opteron Server
Abdul tips a thin little review up at The Inquirer of the Themis Slice. "The Slice is a three socket Opteron machine with two PCIe slots and two Infiniband 4x ports... Why would you want three sockets rather than four? Easy, latency. Any CPU in a 3S system is one hop away from any other CPU. In a 4S system, you can be two hops away. This adds latency, and more importantly, you take a big hit on cache coherency latency. This kills performance."
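The hop-count argument in the quote can be checked with a small sketch. It assumes the three sockets are fully connected (a triangle) and the four sockets are wired as a square with no diagonal links. Those topologies, and the names `THREE_SOCKET` and `FOUR_SOCKET_SQUARE`, are illustrative assumptions, not details taken from the review.

```python
from collections import deque

def hop_counts(links, start):
    """Breadth-first search: link hops from `start` to every reachable socket."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def worst_case_hops(links):
    """Largest hop count between any pair of sockets in the topology."""
    return max(max(hop_counts(links, s).values()) for s in links)

# Assumed topologies (hypothetical wiring, for illustration only).
THREE_SOCKET = {0: [1, 2], 1: [0, 2], 2: [0, 1]}                   # triangle
FOUR_SOCKET_SQUARE = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # square, no diagonals

print(worst_case_hops(THREE_SOCKET))        # 1
print(worst_case_hops(FOUR_SOCKET_SQUARE))  # 2
```

If a 4S board also wired the diagonals, the worst case would drop back to one hop, so the result depends entirely on the assumed wiring.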
nothing new (Score:4, Informative)
IBM System x3755 (Score:5, Informative)
The IBM System x3755 [ibm.com] has offered this feature since it came out as well. Instead of the fourth processor card, you install a pass-through card, which turns the machine into a three-way. We've run a few benchmarks [lionbridge.com] (warning: PDF) comparing 3-CPU and 4-CPU operation with the pass-through card.
Pretty cool ability for a few things.
Re:Weird (Score:5, Informative)
Re:Weird (Score:5, Informative)
Re:Weird (Score:3, Informative)
Re:think three-dimensional (Score:1, Informative)
Re:4 way? (Score:1, Informative)
Re:Same latency with 4 processors (Score:5, Informative)
Check it out here. [realworldtech.com]
Re:Not as good as it sounds (Score:3, Informative)
Hmmm, now that I think about it, a three-way box might be really interesting for some HPC loads as well. Low latency is a really big issue for some codes, and the three-way could be more scalable (with some hand coding and profiling) than a 4-socket box with non-uniform latencies. This would apply to MPI code written and optimized for specific tasks, not the simple parallelization that some compilers can do. There's a significant number of HPC users who are happy running non-parallel code on hundreds of dual-socket systems who might be able to scale fairly easily to 3-way systems. Actually, the code is parallel, to the extent that it runs on both CPUs, but these particular users don't want the network latency for MPI code, even on fast networks. They could scale to three-way with little loss of performance on one of these.
Hmmm, a third thought occurs to me. A 3-socket system might also be really, really useful for codes that are I/O intensive: let the traditional MPI code run on the first two CPUs and let the third handle OS tasks, network operations and high-performance filesystem operations. Latency matters less in this case, but simply keeping the OS from interrupting the 2 CPUs running MPI could be a big win as well. Call it 2N+1 computing.
OK, I admit it: I like options when it comes to designing systems to meet the needs of different users.
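For what it's worth, the "2N+1" split described above could be sketched roughly like this. It assumes logical CPU ids are numbered contiguously socket by socket, which real machines do not guarantee, and the function names are made up for illustration. The pinning helper uses `os.sched_setaffinity`, which exists only on Linux, so it is guarded. It also only restricts where a process runs; steering interrupts away from the compute CPUs is a separate step not shown here.

```python
import os

def plan_2n_plus_1(num_sockets, cores_per_socket):
    """Split logical CPU ids into (compute, service) sets.

    Assumption: ids are contiguous per socket, so the last socket's
    cores are the highest-numbered ids. Check the real topology first.
    """
    if num_sockets < 2:
        raise ValueError("need at least two sockets")
    total = num_sockets * cores_per_socket
    boundary = (num_sockets - 1) * cores_per_socket
    compute = set(range(boundary))         # sockets 0..N-2 run the MPI ranks
    service = set(range(boundary, total))  # last socket handles OS and I/O work
    return compute, service

def pin_current_process(cpus):
    """Pin the calling process to `cpus` where supported; True if applied."""
    if not hasattr(os, "sched_setaffinity"):
        return False  # not Linux: nothing to do in this sketch
    allowed = set(cpus) & os.sched_getaffinity(0)
    if not allowed:
        return False  # none of the requested CPUs are available to us
    os.sched_setaffinity(0, allowed)
    return True

# Hypothetical 3-socket, dual-core box.
compute_cpus, service_cpus = plan_2n_plus_1(num_sockets=3, cores_per_socket=2)
print(sorted(compute_cpus), sorted(service_cpus))
```

An MPI rank would call `pin_current_process(compute_cpus)` at startup, while I/O daemons would be pinned to `service_cpus`.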
Tell it to a BMW or Jaguar driver (Score:3, Informative)
The smoothest piston automotive engines are in-line 6 cylinder engines and V-12 engines. In four-stroke form they fire every 120 and 60 degrees of crankshaft rotation respectively, so their power strokes overlap, and both layouts are inherently balanced.
Most other layouts (3-, 4-, 5-cylinder in-line, V6, V8) have more widely spaced power pulses, poorer inherent balance, or both, and are less smooth. Many of these engines use a rotating counterweight (either an off-balanced flywheel or a separate rotating countershaft) to damp the resulting vibration and increase smoothness. This works imperfectly and comes at the price of increased weight, rotating mass, and/or complexity.
Yet another approach which should be very smooth is the boxer design, which is used by Subaru and Porsche: cylinders are horizontally opposed at 180 degrees. This works quite well for Porsche, somewhat less well for Subaru.
Of course the smoothest automotive engine is the Wankel rotary currently used by Mazda - the "pistons" (rotors) rotate rather than reciprocate, and each power pulse lasts for 270 degrees.