More Details Of IBM's Blue Gene/L
Bob Plankers writes "By now we've all heard about IBM's Blue Gene/L, LLNL's remarkable new supercomputer, which is intended to be the fastest on Earth when done (360 TeraFLOPS). IBM has released some new photos of the prototype, and renderings of the final cluster. Note that the racks are angled to let hot air escape vertically and reduce the need for powered cooling. The machine uses custom CPUs with dual PowerPC 440 processing cores, four FPUs (two per core), five network controllers, 4 MB of DRAM, and a memory controller onboard. The prototype has 512 CPUs running at 700 MHz; when finished, the entire machine will have 65,536 dual-core CPUs running at 1 GHz or more. Stephen Shankland's ZDNet article also mentions that the system runs Linux, but not on everything: 'Linux actually resides on only a comparatively small number of processors; the bulk of the chips run a stripped-down operating system that lets it carry out the instructions of the Linux nodes.'"
Doom3? (Score:5, Funny)
Re:Doom3? (Score:1)
This time guess who/what will hold the joystick
Turing machine will be Turing machine (Score:3, Interesting)
Re:Turing machine will be Turing machine (Score:3, Insightful)
On the other hand, it is nice to have a fast computer to play with now, not in 50 years time!
Re:Turing machine will be Turing machine (Score:1)
Re:Turing machine will be Turing machine (Score:2)
I have a Turing machine but I can't find the end of the tape to thread it up.
Infinite (Score:5, Funny)
Re:Infinite (Score:5, Funny)
That would be the sound of the joke wooshing around, and around, and around, and
See for yourself (Score:5, Funny)
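The original post's test.c didn't survive the copy, so here is a minimal sketch consistent with the compile line and the output below: it just counts an unsigned int until it wraps back to zero and times the loop (the details are my guess, not the original code):

/* test.c - an "infinite" loop that is really a 32-bit counter
 * wrapping to zero, so it terminates after 2^32 iterations */
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
	struct timeval start, end;
	unsigned int i = 0;

	gettimeofday(&start, NULL);
	do {
		i++;			/* wraps to 0 after UINT_MAX */
	} while (i != 0);
	gettimeofday(&end, NULL);

	printf("Infinite loop test\n");
	printf("executed in %f seconds\n",
	       (end.tv_sec - start.tv_sec) +
	       (end.tv_usec - start.tv_usec) / 1000000.0);
	return 0;
}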
compile and link with:
gcc -g -o test test.c
run:
Infinite loop test
executed in 3.888419 seconds
Re:See for yourself (Score:1)
[zora@dilaudid infiniteloop]$ gcc -g -o test test.c
[zora@dilaudid infiniteloop]$
Infinite loop test
executed in 12.825751 seconds
My computer sucks.......
Re:See for yourself (Score:1)
executed in 18.778423 seconds
this sucks... Time to get a watercooler and overclock this baby...
Microft
Re:See for yourself (Score:3, Funny)
Re:Infinite (Score:1)
You only have to finish the first iteration in half a second. The second in 1/4th of a second. The third in 1/8th of a second. In general, the nth iteration in half the time required for the (n-1)th iteration.
After one second the loop will be finished.
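(That's Zeno's supertask, and the series really does sum to one second:

\sum_{n=1}^{\infty} \frac{1}{2^n} = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots = 1

The catch, as the reply below points out, is building hardware that halves its cycle time on every iteration.)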
Re:Infinite (Score:1)
I don't think the loop will be finished after 1 second. An infinite loop is "one that never terminates". There can be no satisfiable loop exit condition in a finite amount of time (interpreting "never" to mean "not at any time; not while forever is still happening").
Your machine would require being able to do an infinite number of loops in an infinitely small time. You could also do it if you had a machine that could do an infinite number of iterations simultaneously in finite time.
Travelling salesmen. (Score:5, Funny)
Re:Travelling salesmen. (Score:5, Funny)
Re:Travelling salesmen. (Score:2)
"4MB of DRAM" (Score:3, Funny)
Only a PPC 440? (Score:1, Interesting)
Re:Only a PPC 440? (Score:5, Informative)
Re:Only a PPC 440? (Score:1, Informative)
A PIII is as much a RISC processor under the hood as the PPC, but neither is pure RISC.
Pure RISC sucks, pure CISC sucks.
Re:Only a PPC 440? (Score:2, Insightful)
Re:Only a PPC 440? (Score:5, Insightful)
RISC vs CISC means very little these days. Most current CPUs have a core even more minimal than RISC chips, but present a CISC (in the case of x86) or RISC (in the case of the G5) interface to the outside. They used the PPC 440 for several reasons:
1) IBM had to do significant custom engineering for it, and they own the PPC 440 core. That allowed them to use it to design an SoC.
2) They needed to add FPU hardware, which is easier to do on a design they own. The PIII only has one FPU, while this chip has two FPUs per core. IBM had to add these to the design, because the regular PPC 440 has no FPU at all.
3) The PPC 440 was designed from the beginning to be an embedded CPU. At 1 GHz, a stock PPC 440 consumes about 2.5 W. Even a low-voltage PIII consumes more than that.
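Back-of-envelope, and assuming (my assumption) that the 2.5 W per-core figure would still hold across the full machine:

65{,}536 \text{ chips} \times 2 \text{ cores} \times 2.5\,\mathrm{W} \approx 328\,\mathrm{kW}

Run the same numbers with even a low-voltage PIII and the choice makes itself.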
Re:Only a PPC 440? (Score:2)
- RISC needs a smaller die. The free space can be used for more pipelining etc. => more speed!
- Compilers perform way better on RISC. If you do not code in assembler => more speed!
Re:Only a PPC 440? (Score:2)
Re:Only a PPC 440? (Score:2)
CISC instruction sets are more expressive than RISC ones, despite their name. Look for example (not high-perf computing, I know, I know, but anyway) at ARM Thumb. That's 16 bits/instruction. Clearly RISC.
4 MB DRAM (Score:3, Funny)
Re:4 MB DRAM (Score:5, Interesting)
Final version to have 65536 CPUs.
Smells like 256GB to me, which is pretty decent in _any_ book, especially if it lives on the same silicon as the CPU...
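The arithmetic, for anyone checking:

65{,}536 \times 4\,\mathrm{MB} = 262{,}144\,\mathrm{MB} = 256\,\mathrm{GB}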
Re:4 MB DRAM (Score:2)
Re:4 MB DRAM (Score:5, Informative)
If your P1 runs at the same speed as your P4 for 90% of operations, then there is something wrong with your computer! The HDD is not the bottleneck for most modern computers, as they have enough memory to minimize page faults for most common home computing tasks. Startup times may be equal, since both machines have to get the data/program from the HDD... but once stuff is in memory, bye-bye P1....
Re:4 MB DRAM (Score:2)
Get two systems together, both with the same amount of memory and the same Windows OS. Then install an IDE controller in the P1 and put fast IDE drives on both systems (a faster SCSI controller will do as well). Put something
Re:4 MB DRAM (Score:1)
On a more serious note, I probably was fast enough to make the obvious joke (4 MB of RAM, very funny).
What I'd actually like to know:
Wouldn't these babies, considering *4* FPUs per dual core, be *screaminggggggggg* on typical tasks like Fluent, Ansys, Abaqus, and in general any FPU-intensive task?
Wouldn't these computers be a revolution (in the true sense of the word) for companies looking for "bang for the buck"?
Just wondering...
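They should scream on anything FPU-bound; the headline figure is basically the FPUs multiplied out. If each FPU can retire a fused multiply-add (two flops) per cycle, which is my assumption, then even at the prototype's 700 MHz clock the full machine gives

65{,}536 \times 2\ \mathrm{cores} \times 2\ \mathrm{FPUs} \times 2\ \mathrm{flops} \times 700\,\mathrm{MHz} \approx 367\ \mathrm{TFLOPS}

right next to the quoted 360 TeraFLOPS target.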
Re:4 MB DRAM (Score:2)
Re:4 MB DRAM (Score:2)
Definitely higher density at a lower price, but I thought the idea of supercomputers was to build the most powerful machine possible at ANY expense?
Not to say that just thinking of having even the prototype in my basement doesn't make me want to cream my jeans. But this is not news: if you want lots of memory real cheap you go with dynamic RAM; if you want the fastest RAM you go with static. Dynamic RAM bears a double penalty
Subjective... (Score:5, Funny)
Woah, this is the first time I think a box with 512 CPUs at 700 MHz each is crap.
Diego Rey
Re:Subjective... (Score:1)
Let me guess, you bought a Pentium 4? :-)
One should read 512 MB DRAM (Score:1)
Re:One should read 512 MB DRAM (Score:2, Informative)
It's gonna be 512 MB for BlueGene/L(ite) and 1 GB for the proper BlueGene.
I mean, per node :-)
AFAIK, 512 MB is just too little for proper protein-folding calculations, while 1 GB provides enough capacity... And, of course, no swap is possible in these types of systems.
It's gorgeous... (Score:4, Funny)
Re:It's gorgeous... (Score:2)
Some day, I too will own a supercomputer, even if it is a 15 yr old Cray...
What's new? (Score:4, Interesting)
Sorry about the sarcasm, I'm only asking to be proven wrong, but isn't Blue Gene just more of the same, only bigger? Big Mac was interesting because of how cheap it was and because it was the first of its kind to use Macs; the Earth Simulator was interesting because it brought back custom chips for supercomputing as opposed to off-the-shelf components; we've been reading about IBM's dishwasher-sized supercomputer and articles about efficient supercomputing. So what's new about Blue Gene, besides being newer and bigger?
Once again I'm not bashing, I haven't read much of anything but the
Re:What's new? (Score:5, Interesting)
What is significant about Blue Gene is that it is a compromise between off-the-shelf parts (PPC-based processing elements vs. the Earth Simulator's custom SX vector PEs) and efficient interconnection (vs. a plain crappy cluster like the Big Mac), with a better interconnect at multiple layers, starting with dual cores per die.
In the end it all leads to the same goal: tackling bigger problems faster. So it may sound trivial but there is a lot of research going into this baby.
Re:What's new? (Score:5, Interesting)
That is where IBM tries to go: BlueGene's design is based on a system-on-a-chip - everything (except memory) is integrated on a single chip. In the long run, this allows them to build systems much larger than you could with a Beowulf. They are basically aiming for a system where you can easily add computing power by simply putting in a few more chips, and the thing will scale. They are doing the same thing for storage with this brick [slashdot.org]
BlueGene is also the first supercomputer marketed to the life sciences. It's interesting to see that it developed from a project at Columbia University called QCDOC [columbia.edu], for "Quantum Chromodynamics on a chip," which did research in computational high-energy physics, and from QCDSP before it, which used DSP processors to build a supercomputer about ten years ago. Both are instructive examples of how academic research becomes industrially relevant in the long run, and of how science changes.
Re:What's new? (Score:2)
No, *Smaller* and Faster (Score:4, Insightful)
BlueGene/L is about driving down the cost of supercomputing, not only in terms of money spent on hardware, but in terms of space, cooling, and maintenance, while at the same time improving scalability.
BlueGene/L is going to put 65,000+ processors in less space, using less power, and costing less, than many of today's >10,000 processor systems.
They do this with a minimalist approach: each processor is an SoC (system-on-a-chip), with everything from the memory controller to the internode networking to the two cores and four FPUs on the die, and the only other thing in a node besides the processor is a bit of RAM. This lets them use much less power per node and leaves less heat per node to dissipate, which lets them pack the nodes much closer, which cuts down on internode latency, which increases scalability.
Re:No, *Smaller* and Faster (Score:1)
Re:No, *Smaller* and Faster (Score:1)
Re:What's new? (Score:2)
We NEED faster, dammit! (Score:1)
Now, Ken Wilson (= cool Nobel Prize guy), who basically started this field, gave a famous estimate in a talk at a conference some time back, stating that to do a serious project in lattice QCD, one would need some (listen up n
Re:We NEED faster, dammit! (Score:1)
I'm surprised! (Score:5, Funny)
Re:I'm surprised! (Score:5, Funny)
The Racks Are Not Angled (Score:3, Insightful)
At the left side of the row of racks, there is an angled cover, which is either decorative or being used to force cold air down the row of racks. Likely, it's just decorative, and the cold air is being forced up from the raised flooring below.
Just like it is in every other enterprise-grade computer room...
Dumpsterdiving seems a waste of time at IBM (Score:1)
I guess corporate espionage is quite real for these guys.
Re:Dumpsterdiving seems a waste of time at IBM (Score:3, Funny)
Re:Dumpsterdiving seems a waste of time at IBM (Score:1)
Linux nucleus for slaved compute nodes? (Score:4, Interesting)
Linux actually resides on only a comparatively small number of processors; the bulk of the chips run a stripped-down operating system that lets it carry out the instructions of the Linux nodes.
The "stripped down operating system" must be the distribution nucleus on the compute-only subnodes, presumably something that allow the Linux nodes to distribute the code and I/O of computations to them and to query or control their state during debugging, and to reaccquire lost processor control.
It's only a matter of time before those of us who already have sizeable LANs at home will have embedded compute-only clusters within them too. Those would differ substantially from the typical Linux clustering for high availability. Instead of a non-Linux nucleus on those subnodes though, I'd prefer to see a pretty ordinary Linux kernel running slaved to remote masters.
Is anyone already playing with something like this in their Linux clusters?
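Not at Blue Gene scale, obviously, but the master/slave shape is easy to toy with. Here is a minimal sketch of a "slaved" compute node in C, with a made-up length-prefixed protocol on an arbitrary port (nothing like Blue Gene's actual compute kernel, just the pattern described above):

/* slave.c - toy compute node: connects to a master, then loops
 * receiving a count n followed by n doubles, replying with their sum.
 * Assumes master and slave share endianness for the doubles and that
 * reads aren't short -- fine for a sketch, not for production. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(int argc, char **argv)
{
	const char *master_ip = (argc > 1) ? argv[1] : "127.0.0.1";
	struct sockaddr_in master;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	memset(&master, 0, sizeof master);
	master.sin_family = AF_INET;
	master.sin_port = htons(9000);	/* arbitrary example port */
	inet_pton(AF_INET, master_ip, &master.sin_addr);

	if (connect(fd, (struct sockaddr *) &master, sizeof master) < 0) {
		perror("connect");
		return 1;
	}

	for (;;) {			/* serve work units until EOF */
		uint32_t n;
		double sum = 0.0, x;

		if (read(fd, &n, sizeof n) != sizeof n)
			break;		/* master hung up; we're done */
		n = ntohl(n);

		while (n-- > 0) {
			if (read(fd, &x, sizeof x) != sizeof x)
				return 1;
			sum += x;	/* the "computation" */
		}
		write(fd, &sum, sizeof sum);	/* report result back */
	}
	close(fd);
	return 0;
}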
Re:Linux nucleus for slaved compute nodes? (Score:2)
Re:Linux nucleus for slaved compute nodes? (Score:1, Interesting)
24 CPUs per unit (Score:2)
Marc Snir (Score:2)
OT: Hire a photographer (me ;)! (Score:2, Insightful)
Other than that, keep up the great work IBM!
Re:OT: Hire a photographer (me ;)! (Score:1)
You're all Missing the point here... (Score:2, Insightful)
Oh wow, another technical marvel
Oh Gee, another super computer...
Morons...
The whole point here is that it makes possible the simulation
of folding a complete gene in about a year's time.
If THAT doesn't bowl you over, don't post.
p.s. I can hear the rest of you "umm... so?" people and I can't help you. Sorry.
Re:You're all Missing the point here... (Score:1)
of folding a complete gene in about a year's time.
Isn't that folding a protein?
But how big a protein?
A chromosome would take a lot longer to simulate, but it's essentially a double helix when you get down to the molecular level. That would be a good test, though: if you take DNA and can't simulate it going into a double helix, how can you trust the computer?
Wouldn't it be great... (Score:2, Interesting)
But otherwise, for all intents and purposes, it's extremely proprietary and will ultimately run just a few specialized applications.
Nevertheless, with virtualized computing and behemoth systems like these, the future of data centers is sure to change.
Cool movie... (Score:1)
du ? I want to crack passwords!!! (Score:1)
Did they... (Score:1)
If they need help with that, they can read Douglas Adams's "Hitchhiker's Guide to the Galaxy".
Re:it's all cool and everything... (Score:3, Funny)
10 LET x = 1
20 LET y = 2
30 PRINT x + y
This seems to be a "does it run Linux?" joke gone horribly wrong.
Or is that kernel code you're posting?
Re:it's all cool and everything... (Score:2)
nope, it runs Linux!
As a reply ;)
Re:it's all cool and everything... (Score:3, Funny)
#include <linux/config.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/types.h>
MODULE_LICENSE("GPL");
asmlinkage inline unsigned int add_x_plus_y(unsigned int x, unsigned int y){
	unsigned int ret;
	spin_lock_irq(&current->arith->lock);
	current->arith->accum = x;
	current->arith->oprand = y;
	__perform_add(&current->arith);
	ret = current->arith->accum;
	spin_unlock_irq(&current->arith->lock);
	return ret;
}
Re:it's all cool and everything... (Score:4, Funny)
Re:Can you imagine... (Score:2)
This is a dense amount of processing power. Beowulf clusters aren't nearly this dense. Actually, if one were to create a "Beowulf cluster" of these, the Blue Genes would probably be attached to the Beowulf nodes, rather than being nodes themselves. (I suspect that the controller for the Blue Gene is a bit specialized for controlling the Blue Gene.)
Attaching a Beowulf Cluster of these together would result in a computer that was significantly slower...but able to d