SGI & NASA Build World's Fastest Supercomputer 417
GarethSwan writes "SGI and NASA have just rolled out the new world number one fastest supercomputer. Its performance test (LINPACK) result of 42.7 teraflops easily outclasses the previous mark set by Japan's Earth Simulator of 35.86 teraflops AND that set by IBM's new BlueGene/L experiment of 36.01 teraflops. What's even more awesome is that each of the 20 512-processor systems runs a single Linux image, AND Columbia was installed in only 15 weeks. Imagine having your own 20-machine cluster?"
hmmmm...... (Score:4, Funny)
Re:hmmmm...... (Score:5, Funny)
Tomorrow looks like a slight rise in Insightful posts, but a drop in overall Informative. "First Post" will remain a constant pattern.
Re:hmmmm...... (Score:5, Informative)
In other words: RTFA, that's exactly what they're using it for.
Re:hmmmm...... (Score:4, Insightful)
You don't live somewhere that gets hurricanes, do you? 'Cause scientists can already "potentially predict hurricane paths a full five days before the storms reach landfall." Hell, I can do that. A freakin' Magic 8 Ball can potentially do that.
Maybe they're trying to say something about doing it with a better degree of accuracy, or being right more of the time, or something like that, but it doesn't sound like it from that quote.
"Hey, guys, look at this life-sized computer-generated stripper I'm rendering in real-ti... oh, what? Um, tell the reporter we think it'd be good for hurricane prediction."
nothing compared to 500TF per floor at the CIA (Score:4, Funny)
2. I bet the CIA can also change the weather; go read up on HAARP, etc. If the Russians could do it in the '80s, then the CIA can do anything.
That's nothing... (Score:5, Funny)
Re:That's nothing... (Score:5, Informative)
They did! According to a C-Net article [com.com], they "quietly submitted another, faster result: 51.9 trillion calculations per second" (equivalent to 51.9 teraflops).
Read on to the next paragraph (Score:5, Interesting)
Ok, so we have Linux doing tens of teraflops in processing, FreeBSD doing tens of petabits in networking,
Re:Read on to the next paragraph (Score:3, Informative)
NASA Secures Approval in 30 Days
To accelerate NASA's primary science missions in a timely manner, high-end computing experts from NASA centers around the country collaborated to build a business case that Brooks and his team could present to NASA headquarters, the U.S. Congress, the Office of Management and Budget, and the White House. "We completed the process end to end in only 30 days," Brooks said.
Wow. That's incredibly fast, IMHO.
As the article mentions, I suppose NASA owes thi
Re:Read on to the next paragraph (Score:4, Insightful)
Re:Read on to the next paragraph (Score:5, Insightful)
This is an SGI system. SGI laid out its plans for terascale computing (stupid marketing speak for huge ccNUMA systems) a while ago. I'm sure NASA and SGI worked together, but this is essentially an 'Extreme' version of an off-the-shelf SGI system.
Re:Read on to the next paragraph (Score:4, Informative)
Slashdot carries grudges.
Re:Read on to the next paragraph (Score:3, Interesting)
Re:Read on to the next paragraph (Score:3, Informative)
Their new machines still aren't clustered. Clusters don't generally run single system images on shared-memory computers. SGI's Altix systems use NUMAlink to efficiently access memory on remote nodes, making them a kind of distributed shared memory machine. And SGI's Origin systems are your traditional SMP machine. The Altix and Origin systems are neither cheap nor off the shelf.
Regarding your comment about them ignoring Linux, what was fundamentally
Re:Read on to the next paragraph (Score:3, Informative)
As for linux, they stepped towards linux about the same
Re:Read on to the next paragraph (Score:3)
Re:Read on to the next paragraph (Score:3, Interesting)
Mmmm, home consumer usage, maybe?? HA! What was I thinking!?
Re:Read on to the next paragraph (Score:3, Insightful)
Wonder why they run an open source operating system instead of a proprietary one on this? Maybe the multitude of answers to that question can show you why it can be considered an open source victory.
Re:Read on to the next paragraph (Score:5, Informative)
This is why SMP computers tend to have 2 or 4 processors, and 8 at a pinch, but no more. It's just not practical, using current methods, to directly wire up more than 8 processors in such a tight package.
Let's say you have N processors, each capable of executing I instructions per second. Your total theoretical throughput would be N x I. However, that would only be the case if the workload were 100% parallel and no processor ever needed to communicate with any other. Rarely the case.
In practice, the curve of performance versus processor count looks a bit like a squished bell curve. As you increase the number of processors, the marginal performance gain shrinks, reaches zero, and eventually goes negative. At that point, adding more CPUs will actually SLOW the computer down.
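To see roughly why, here's a minimal Python sketch of that curve, assuming a fixed serial fraction and a per-processor coordination cost (both constants are invented purely for illustration, not measurements of any real machine):

```python
# Toy model: speedup vs. processor count with a serial fraction (Amdahl-style)
# plus a per-processor coordination cost. All constants are made up just to
# show the "rises, peaks, then falls" shape described above.

def speedup(n, serial=0.05, overhead=0.002):
    # Parallel portion shrinks as 1/n; coordination cost grows with n.
    return 1.0 / (serial + (1.0 - serial) / n + overhead * n)

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32, 64, 128, 256, 512):
        print(f"{n:4d} processors -> speedup {speedup(n):6.2f}")
```

Past the peak the coordination term dominates and the "speedup" drops below 1, which is exactly the point where adding more CPUs makes the machine slower.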
The exact shape and size of the curve is partly a function of the way the components are laid out. A good layout keeps the amount of traffic on any given line to a minimum, minimizes the distances between nodes, and minimizes the management and routing overheads.
However, layout isn't everything. If your software can't take advantage of the hardware and the topology, then all the layout in the world won't gain you a thing. To take advantage of the topology, though, the software has to comprehend some very complex networking issues. It has to send data by efficient pathways.
If connections are not all the same speed or latency, then the most efficient pathway may NOT be the shortest. This means that the software must understand the characteristics of each path and how to best utilize those paths, by appropriate load-balancing and traffic control techniques.
If you look at extreme-end networking hardware, it can be crudely split into two camps: hardware where the bandwidth is phenomenal, at the expense of latency, and hardware where the latency is practically zero but so's the bandwidth.
The "ideal" supercomputer is going to mix these two extremes. Some data you just need to get to point B fast, and sometimes you're less worried about speed, but do need to transfer an awful lot of information. This means you're going to have two physical networks in the computer, to handle the two different cases. And that means you need something capable of telling which case is which fast enough to matter.
Even when only one type of network is used, latency is a real killer. Software, being the slowest component in the machine, is where most of the latency is likely to accumulate. Nobody in their right mind is going to build a multi-billion dollar machine with superbly optimized hardware if the software adds so much latency that they might as well be using a 386SX with Windows 3.1.
And that means Linux has damn good traffic control and very, very impressive latencies. And it looks like these are areas where the kernel is going to keep improving...
Re:Read on to the next paragraph (Score:3, Informative)
Umm, not true. Sun can hold up to 106 processors in its Sun Fire 15K product, or 72 dual-core processors in the E25K.
SGI's Origin systems are equally large I believe. And manufacturers like IBM also have large SMP machines.
Being able to efficiently use that many processors is a completely different matter that depends on the nature of the problem. It is possible to use more than 8 processors efficiently, though. I'v
This time there really is a turbo button! (Score:5, Informative)
There's also a dark horse in the supercomputer race; a cluster of low-end IBM servers using PPC970 chips that is in between the BlueGene/L prototype and the Earth Simulator. That pushes the last Alpha machine off the top 5 list, and gives Itanium and PowerPC each two spots in the top 5. It's amazing to see the Earth Simulator's dominance broken so thoroughly. After so long on top, in one list it goes from first to fourth, and it will drop at least two more spots in 2005.
20 system cluster?!? (Score:5, Funny)
Who cares about a 20-system cluster? I want one 512-processor machine!
or 20, I'm not that picky
Everyone needs one! (Score:5, Funny)
Re:Everyone needs one! (Score:5, Funny)
Not to be pedantic, but the correct term is "Freedom Bomb".
Re:Everyone needs one! (Score:3, Funny)
Re:Everyone needs one! (Score:4, Funny)
Wow---- (Score:5, Funny)
and that's only 4/5 of the performance! (Score:3, Informative)
One is a parity bit... (Score:4, Funny)
RAEM (redundant array of expensive machines) just doesn't ring right - too close to REAM.
And after further cooperation with Redmond... (Score:4, Funny)
Re:And after further cooperation with Redmond... (Score:2)
Re:And after further cooperation with Redmond... (Score:3, Funny)
A: A slide projector.
(Old joke. cat nt-joke-1990.txt | sed -e 's/Windows NT/Longhorn/g')
its not the hardware thats important (Score:5, Funny)
Re:its not the hardware thats important (Score:3, Interesting)
But if you can decrease the grid size by throwing more teraflops at the problem, maybe we'll find that our models are accurate after all?
Re:its not the hardware thats important (Score:3, Interesting)
Re:its not the hardware thats important (Score:2)
Re:its not the hardware thats important (Score:5, Funny)
it's the wetware (Score:5, Insightful)
Of course, we're just getting started with chaos dynamics. We might find mathematical shortcuts through the chaos, just as we once found algebra to master counting. And weather simulation is a great way to look for them: Lorenz first formally described chaos while modeling weather. While we're improving our modeling techniques to better cope with the weather we depend on, we'll be sharpening our math tools. Weather applications are therefore some of the most productive apps for these new machines, now that they're fast enough to model real systems, giving results that predict not only the weather, but also the future of mathematics.
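As an aside, the Lorenz system mentioned above is only three equations, and a few lines of Python (simple Euler integration; the step size and run length here are chosen purely for illustration) are enough to watch two nearly identical starting points diverge:

```python
# Lorenz system with the classic parameters (sigma=10, rho=28, beta=8/3).
# Two trajectories start 1e-8 apart; after a while they bear no resemblance
# to each other -- the sensitivity to initial conditions Lorenz described.

def lorenz_step(state, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 1.0)
b = (1.0, 1.0, 1.0 + 1e-8)   # a microscopically different start

for step in range(1, 40001):          # 40 time units at dt=0.001
    a, b = lorenz_step(a), lorenz_step(b)
    if step % 10000 == 0:
        dist = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        print(f"t={step * 0.001:5.1f}  separation={dist:.3e}")
```

The separation grows exponentially until the two runs are doing completely different things, which is the basic reason more teraflops buy better short-range forecasts rather than perfect long-range ones.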
In other news... (Score:2, Funny)
Photos of System (Score:5, Informative)
Re:Photos of System (Score:5, Funny)
Re:Photos of System (Score:3, Informative)
Re:Photos of System (Score:3, Funny)
Re:Photos of System (Score:3, Interesting)
Re:Photos of System (Score:5, Interesting)
I don't have a square footage number, but it's the overwhelming majority of the server floor. We had to "clear the floor" earlier this summer to make room.
Interesting Facts (Score:5, Informative)
(Link [sgi.com])
2) This number was achieved using only 16 of the 20 systems, so a full benchmark should be larger still.
(link [sgi.com])
3) The attached storage holds 44 LoCs (Libraries of Congress) (link [sgi.com])
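If the benchmark scaled roughly linearly with node count (a generous assumption for LINPACK, but fine for a ballpark), the full 20-system number would land around:

```python
# Rough extrapolation of the 16-of-20-node LINPACK figure to all 20 nodes.
# Assumes near-linear scaling, which LINPACK only approximates in practice.
measured_tflops = 42.7      # submitted result on 16 of the 20 systems
estimate = measured_tflops * 20 / 16
print(f"estimated full-system LINPACK: ~{estimate:.1f} TFLOPS")
# ~53.4 TFLOPS -- in the same neighborhood as the 51.9 TFLOPS run
# mentioned elsewhere in this thread.
```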
More on the Storage (Score:3, Informative)
They use tape storage from Storage Tek like this one [storagetek.com]
And hard drive storage from Engenio (formerly LSI Logic Storage Systems) like this [engenio.com].
Imagine a... (Score:2, Funny)
oh wait, sorry, Cray deja-vu
Here's the current list... (Score:5, Funny)
http://www.netlib.org/benchmark/performance.pdf [netlib.org] See page 54.
And here's the current top 20 [wisc.edu] as of 10/26/04...
Re:Here's the current list... (Score:3, Informative)
Slashdot may have announced the news at 10:45, but as this particularly silly post of mine [slashdot.org] demonstrates, I had the news six and a half hours early, from Dongarra's paper.
NASA.org? (Score:5, Funny)
Try NASA.GOV.
What is the stumbling block? (Score:5, Insightful)
It's a little like how Canada's and France's nuclear power programs are built around standardized, cookie-cutter power stations. The cost to reproduce a power plant is negligible compared to the initial design and implementation, so the reuse of designs makes the whole system really cheap. The drawback is that it stagnates the technology, and the newest plants may not get the newest and best technology. Contrast this with the American system of designing each power plant with the latest and greatest technology. You get really great plants each time, of course, but the cost is astronomical and uneconomical.
So too, it seems, with supercomputers. We never hear about how these things are thrown into mass production, only about how the latest one gets 10 more teraflops than the last, while all the slashbots wonder how well Doom 3 runs on it or whether Longhorn will run at all on such an underpowered machine.
But each supercomputer design is a massive feat of engineering skill. How much cheaper would it become if, instead of redesigning the machines each time someone wants to feel more manly than the current speed champion, the current design were simply rebuilt for a generation (in computer years)?
Re:What is the stumbling block? (Score:2)
Although I joke, I do see your point. Perhaps it would be wiser if we left our current supercomputer designs alone for a while until we really need an upgrade. Maybe they could spend some of their time fixing Windows instead?
Re:What is the stumbling block? (Score:2)
Bringing pre-manufactured supercomputers into the building is probably the easiest step.
Re:What is the stumbling block? (Score:3, Interesting)
It doesn't. [rocksclusters.org]
Re:What is the stumbling block? (Score:5, Insightful)
Second thought experiment: imagine the systems are built out of modular bricks that are identical to deskside servers, so that they can sell exactly the same hardware in anything from 2 to 512 processors just by plugging the same standard bricks together, and it all shares one memory and runs one OS. Rack after rack after rack. That is SGI's architecture. It is absolutely gorgeous.
So they install twenty of the biggest boxes they have, and network those together.
$/buck? I dunno. Is shared memory really a good idea? Probably not. But it is absolutely gorgeous, and no one can touch them in that shared-memory niche of theirs.
Re:What is the stumbling block? (Score:5, Insightful)
Well, are we talking about actual supercomputers, not just clusters? 'Cause if you're just trying to break these Teraflops records, you can just cram a ton of existing computers together into a cluster, and voila! lots of operations per second.
But it's rare that someone foots the bill for all those machines just to break a record. Los Alamos, IBM, NASA, etc. want the computer to do serious work when it's done, and a real supercomputer will beat the crap out of a commodity cluster at most of that real work. Which is why they spend so much time designing new ones. Because supercomputers aren't just regular computers with more power. With an Intel/AMD/PowerPC CPU, jamming four of them together doesn't do four times as much work, because there's overhead and latency involved in dividing up the work and exchanging the data between the CPUs. That's where the supercomputers shine: in the coordination and communication between the multiple procs.
So the reason so much time and effort goes into designing new supercomputers is that if you need something twice as powerful as today's supercomputer, you can't just take two and put them together. You have to build a new architecture that is even better at handling vast numbers of procs first.
Re:What is the stumbling block? (Score:3, Insightful)
a) obtain space. Usually, raised floors, rack systems, with adequate HVAC for the huge thermal load you're about to throw into a few racks. For collocation, it'll take some time for your provider to wire together a cage for your installation, e
Which NASA is this again? (Score:2)
National Aeronautics and Space Administration [nasa.gov]
New Advanced Search Agent [nasa.org]
Re:Which NASA is this again? (Score:2)
Re:Which NASA is this again? (Score:2)
will soon be surpassed... (Score:5, Informative)
The amazing thing about it is that it's built at a fraction of the cost/space/size of the Earth Simulator. If I remember correctly, I think they already have some of the systems in place for 36 teraflops. It's the same Blue Gene/L technology from IBM, just at a larger scale.
Cost (Score:5, Interesting)
For example, I know the Virginia Tech cluster (1,100 Apple Xserve G5 dual 2.3GHz boxes) cost just under $6 million and runs at a bit over 12 teraflops, so it gets a bit over 2 teraflops per million dollars.
Other high-ranking clusters would be interesting to evaluate in terms of teraflops per million dollars, if anyone knows any.
Re:Cost (Score:4, Informative)
http://news.com.com/Space+agency+taps+SGI,+Intel+
The cost is quoted in the article at $45 million over a three-year period, which works out to roughly 1 teraflop per million dollars for the "Columbia" super cluster. That seems impressive to me, considering the overall performance.
It would be interesting to see how well the Xserve-based architecture held its performance per dollar when scaled up to higher teraflop levels...
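A quick back-of-the-envelope comparison in those terms, using only figures quoted in this thread (note that Columbia's $45 million covers a three-year contract, not just hardware, so its ratio here is understated):

```python
# TFLOPS per million dollars, using numbers quoted elsewhere in this thread.
# Columbia's $45M figure is the whole three-year contract, so the real
# hardware-only ratio is presumably better than shown here.
systems = {
    "Virginia Tech System X": (12.0, 6.0),   # "a bit over 12 TF", "just under $6M"
    "Columbia (42.7 TF run)": (42.7, 45.0),
    "Columbia (51.9 TF run)": (51.9, 45.0),
}
for name, (tflops, millions) in systems.items():
    print(f"{name:25s} {tflops / millions:5.2f} TFLOPS per $1M")
```

So the parent's "roughly 1 teraflop per million dollars" holds up on either LINPACK figure, versus about 2 for the much smaller Virginia Tech machine.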
Re:Cost (Score:3, Informative)
What happened to NEC's new Vector Supercomputer (Score:2)
It was here on slashdot last week [slashdot.org], IIRC. :)
Not fully true (Score:2, Informative)
Ya know... (Score:5, Funny)
Seriously, am I on candid camera?
My proposed use of this super computer.... (Score:4, Funny)
Yes, but... (Score:2)
70.93 TeraFLOPs (Score:5, Interesting)
What?! (Score:4, Funny)
And with all that power (Score:2)
Columbia [sgi.com]
I can see a certain person now... (Score:5, Funny)
"512 processors, 20 machines, $699 per processor. All that intellectual property, yes! No free lunch no, Linux mine, MIIIINE, BWAAAAHAHAHAHA!!!"
*dials*
"Hello, NASA? About that $7,157,760 you owe me...
I'm sorry, where do you want me to jump?"
Linux #1 (Score:5, Interesting)
They'd run Windows XP... (Score:3, Funny)
Run Windows Update for each box.
Remove Windows Messenger.
Cancel the window telling you to take a tour of XP.
Cancel the window telling you to get a passport.
Run the net connection wizard.
Reboot after installing updates.
etc....
(I'm not being totally serious, I know you can deploy ghost images etc..)
Processors aren't relevant anymore? (Score:5, Interesting)
There was a time when different computers ran on different processors, and supported different OSes. Now what's happening? Itanic and Opteron running Linux seem to be the only growth players in the market; and the supercomputer world is completely dominated by throwing more processors together. Is there no room for substantial architectural changes? Have we hit the merging point of different designs?
Just some questions. Although it's not easy, I'm less excited by a supercomputer with 10k processors than I would be by one containing as few as 64.
I had a shell on the machine for day (Score:4, Funny)
# make -j 10534 bzImag
and even before I could hit the e and enter, it was done.
I was gonna build X but on this box the possible outcomes of "build World" scared me!
Units? (Score:3, Funny)
Yes, but what is that in bogomips?
Re:or (Score:2)
Re:Intent of NASA... (Score:4, Funny)
Re:Intent of NASA... (Score:2)
Re:Intent of NASA... (Score:4, Funny)
Re:Intent of NASA... (Score:4, Insightful)
What new private space industry? SpaceShipOne, for example, reached space. That's a long way from being able to do anything useful in space: they were nowhere near orbital velocity. We're still many years, if not decades, away from private industry being able to take over NASA's near-Earth space role.
Ways you are wrong (Score:4, Informative)
They don't carry schoolteachers.
They don't fly in the air.
This runs Linux, not Windows. It won't crash.
Re:Ways you are wrong (Score:3, Funny)
so it's not water-cooled?
[didn't RTFA]
Re:Ways you are wrong (Score:2)
Oh, yeah. Oops. I actually watched that from Japan when I was in the service. Not happy.
Re:Ok, what is the point of this? (Score:2)
Re:Ok, what is the point of this? (Score:4, Funny)
Re:Ok, what is the point of this? (Score:5, Insightful)
The supercomputer is a cluster (10k+ processors in 20 nodes).
Not all applications/computations scale by just adding computers to the cluster.
An example would be solving for z: x = 84 + 19, y = 5 * 3, z = x + y.
The ultimate solution z is limited by the speed at which x and y can be solved. You can have one computer solve for x and another solve for y in parallel. But no matter how many more computers you add, none of them can solve z until x and y are both solved first, and none of them would speed up the computation of x and y.
After a certain scale you get no further benefit from parallel processing, so the only way to speed things up is to make each individual computer faster.
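That dependency is easy to see in code. In this toy sketch (sleeps standing in for real work), a pool of 100 workers finishes no sooner than a pool of 2, because z has to wait for both x and y:

```python
# Toy illustration of the dependency above: x and y can run in parallel,
# but z has to wait for both, so workers beyond the second sit idle.
import time
from concurrent.futures import ThreadPoolExecutor

def compute_x():
    time.sleep(1)          # stand-in for real work
    return 84 + 19

def compute_y():
    time.sleep(1)
    return 5 * 3

with ThreadPoolExecutor(max_workers=100) as pool:   # 100 workers changes nothing
    start = time.time()
    fx, fy = pool.submit(compute_x), pool.submit(compute_y)
    z = fx.result() + fy.result()    # blocks until BOTH x and y are done
    print(f"z = {z}, elapsed ~{time.time() - start:.1f}s (not 100x faster)")
```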
Re:Ok, what is the point of this? (Score:2)
Ok, I expected someone would say that, and that's fine. But isn't that exactly the scenario that clustering technologies were created for?
I seriously have a hard time imagining what kind of problem could not be solved with a cluster of Pentium 4s, each with 4-5 CPUs (for
Re:Ok, what is the point of this? (Score:5, Interesting)
Slightly more concrete example - right now with my photonics simulations (finite element) on my dual-Opteron rig the max I can handle is about 180,000 elements (which means a (4*180000)x(4*180000) matrix with complex elements needs to be diagonalized, among other things), and it takes about half an hour for a standing-wave calculation. To do any time propagation, repeat the same calculation in picosecond increments. And with the gridding I can do, for a 100 micron disc resonator in 2-D I have to use light at about 40 microns. To go to the 320nm wavelength these resonators are operating at, I'd need roughly 2 orders of magnitude more memory. There's also the time factor to be considered. As with any design process, one must iterate. Tweak a little here, run the program, rinse, repeat. How long are you willing to spend in this process before you feel something is "good enough"? The faster the computer spits the answer out, the more things you can try, and the more you can think things over and hopefully make it better.
And this is a single component in what can be a fairly complex integrated-photonics chip. [And might I mention again I've been working in 2-D this entire time instead of doing a full 3-D simulation?] You give me the computational power and I'll use it. And I'm an experimentalist doing fairly basic research who just wants to check some stuff in the computer before sinking a lot of time and effort into fabricating a test device.
On the other hand, I actually don't want to have one of the T100 supercomputers in our lab. That would mean I'd be spending all day writing code and designing complex simulations instead of in the lab getting my hands dirty.
And as for how common problems requiring such computational power are, I think almost any sort of simulation can easily use it. Consider more terms (everything I've done to date is horribly linearized - let's see some more terms in the Taylor expansion) to account for nonlinear behavior, grid things up finer to get more accurate results, consider more possibilities when dealing with chaotic behavior... I would hope any good scientist would find the possibilities endless.
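To put a rough number on the (4*180000)-square complex matrix described a couple of paragraphs up: stored densely it would need on the order of 8 TB, which is why finite-element codes keep such matrices sparse, and why memory rather than raw CPU speed is the wall being described (back-of-the-envelope only):

```python
# Back-of-the-envelope memory estimate for the matrix described above,
# if it were stored densely (a real FEM code would keep it sparse).
n = 4 * 180_000                      # 720,000 unknowns
bytes_per_entry = 16                 # complex double: 2 * 8 bytes
dense_bytes = n * n * bytes_per_entry
print(f"dense storage: {dense_bytes / 1e12:.1f} TB")   # ~8.3 TB
```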
Re:NEC's seems to be faster (Score:3, Informative)
I think you meant to say... (Score:3, Funny)
ARRRGGGHHHH PEOPLE'S HEADS ARE EXPLODING!!!
Now you know that there's some engineer with access to this thing thinking about how he can jump to the front of SETI@home.
Re:20? Try 10420 (Score:2)
You may think that, but you'd be wrong. It's 20 machines. After all, you don't think of a 100 CPU Sun E15K as 100 machines, or even a dual CPU desktop as two machines. SSI on Linux has come a long way...
Re:20? Try 10420,no 2560, make it 20 after all. (Score:4, Interesting)
Re:The obligatory phrases... (Score:2)
I think you mean
All your node are belong to us.
Re:PowerPC just got 0wned! (Score:2)
The System X cluster contained 1,150 machines with 2 CPUs each, which equals 2,300 CPUs in total. You were saying? Not to mention you are comparing an expensive server CPU with a desktop/workstation CPU.
Why don't we wait for IBM to build a POWER4+ or POWER5 super cluster?
Re:windows (Score:4, Funny)
Re:The worst thing about this... (Score:4, Insightful)
Like what? Go out and look up SPEC results next time you're bored. I think you'll find that I2 is quite a bit more capable than you make out. IBM's dual-core POWER5 is just about the only thing out there that's even close to (a single-core) I2 in FP performance, and Opteron isn't even in the game at that level.
Is it a commercial failure? Probably, but so was Alpha - commercial success is not an indicator of actual performance.