SGI & NASA Build World's Fastest Supercomputer 417
GarethSwan writes "SGI and NASA have just rolled out the new world number one fastest supercomputer. Its performance test (LINPACK) result of 42.7 teraflops easily outclasses the previous mark of 35.86 teraflops set by Japan's Earth Simulator AND the 36.01 teraflops set by IBM's new BlueGene/L experiment. What's even more awesome is that each of the 20 512-processor systems runs a single Linux image, AND Columbia was installed in only 15 weeks. Imagine having your own 20-machine cluster!"
and that's only 4/5 of the performance! (Score:3, Informative)
Photos of System (Score:5, Informative)
Interesting Facts (Score:5, Informative)
(Link [sgi.com])
2) This number was achieved using only 16 of the 20 systems, so a full-system benchmark should score even higher.
(link [sgi.com])
3) The attached storage holds 44 Libraries of Congress worth of data (link [sgi.com])
Ways you are wrong (Score:4, Informative)
They don't carry schoolteachers.
They don't fly in the air.
This runs Linux, not Windows. It won't crash.
NEC's seems to be faster (Score:1, Informative)
Re:mankind has finally created... (Score:1, Informative)
Re:That's nothing... (Score:5, Informative)
They did! According to a C-Net article [com.com], they "quietly submitted another, faster result: 51.9 trillion calculations per second" (equivalent to 51.9 teraflops).
will soon be surpassed... (Score:5, Informative)
The amazing thing about it is that it's built at a fraction of the cost/space/size of the Earth Simulator. If I remember correctly, they already have some of the systems in place for 36 teraflops. It's the same Blue Gene/L technology from IBM, just at a larger scale.
Re:NEC's seems to be faster (Score:3, Informative)
This time there really is a turbo button! (Score:5, Informative)
There's also a dark horse in the supercomputer race: a cluster of low-end IBM servers using PPC970 chips that sits between the BlueGene/L prototype and the Earth Simulator. That pushes the last Alpha machine off the top 5 list, and gives Itanium and PowerPC each two spots in the top 5. It's amazing to see the Earth Simulator's dominance broken so thoroughly. After so long on top, in one list it goes from first to fourth, and it will drop at least two more spots in 2005.
Not fully true (Score:2, Informative)
Re:Read on to the next paragraph (Score:3, Informative)
NASA Secures Approval in 30 Days
To accelerate NASA's primary science missions in a timely manner, high-end computing experts from NASA centers around the country collaborated to build a business case that Brooks and his team could present to NASA headquarters, the U.S. Congress, the Office of Management and Budget, and the White House. "We completed the process end to end in only 30 days," Brooks said.
Wow. That's incredibly fast, IMHO.
As the article mentions, I suppose NASA owes this to the success of their 512-processor Kalpana system [nasa.gov], named in honor of the late astronaut Kalpana Chawla.
And look at this --
"In some cases, a new Altix system was in production in as little as 48 hours," said Jim Taft, task lead, Terascale Applications Group, NASA. "This is starkly different from implementations of systems not based on the SGI architecture, which can take many months to bring to a reliable state and ready for science."
w00t! That's like super-fast in terms of development time. Good job, NASA. Way to go.
And what about the other companies mentioned in the article?
In addition to Intel Itanium 2 processors, the Columbia installation features storage technology from Brocade Communications and Engenio Information Technologies, Inc., memory technology from Dataram Corporation and Micron Technology, Inc. and interconnect technology from Voltaire.
I've not heard of any of them other than Voltaire - are they well known in this area, or are they defense/NASA contractors of some kind?
Re:Read on to the next paragraph (Score:1, Informative)
Re:70.93 TeraFLOPs (Score:2, Informative)
Re:so, where's the pr0n? (Score:1, Informative)
Re:Here's the current list... (Score:3, Informative)
Slashdot may have announced the news at 10:45, but as this particularly silly post of mine [slashdot.org] demonstrates, I had the news six and a half hours early, from Dongarra's paper.
Re:Photos of System (Score:3, Informative)
Re:hmmmm...... (Score:5, Informative)
In other words: RTFA, that's exactly what they're using it for.
Re:Cost (Score:4, Informative)
http://news.com.com/Space+agency+taps+SGI,+Intel+
The cost is quoted in the article at $45 million over a three-year period, which works out to roughly one teraflop per million dollars for the "Columbia" supercluster. That seems impressive to me, considering the overall performance.
It would be interesting to see how well the Xserve-based architecture held its performance per dollar when scaled up to higher teraflop levels...
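As a quick sanity check on the price/performance claim above, here's the arithmetic on the two numbers quoted elsewhere in the thread ($45M over three years; 42.7 TFLOPS for the submitted 16-node run, 51.9 TFLOPS for the later full-system run):

```python
# Sanity check on cost per teraflop using figures quoted in the thread.
cost_millions = 45.0

for label, tflops in (("16-node LINPACK run", 42.7),
                      ("full-system run", 51.9)):
    tflops_per_million = tflops / cost_millions
    print(label, "->", round(tflops_per_million, 2), "TFLOPS per $M")
```

So the "about a teraflop per million dollars" figure holds up: 0.95 for the submitted result, 1.15 if you count the full-system run.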
More on the Storage (Score:3, Informative)
They use tape storage from StorageTek like this one [storagetek.com]
And hard drive storage from Engenio (formerly LSI Logic Storage Systems) like this [engenio.com].
Re:What is the stumbling block? (Score:2, Informative)
Actually, this method won't work for the benchmark that is used for the top 500 list, LINPACK. The difficulty is that to solve most problems in parallel, the processors need to talk to each other. This introduces an overhead into the program, and the amount of overhead depends on the interconnect. Programs which can be parallelised without a communication overhead are called trivially parallel. LINPACK is not trivially parallel, so if you took a whole lot of computers and banged them together over Ethernet, all you'd end up with is an expensive way to keep your network busy.
The beauty of the Altix systems is that their NUMAlink interconnect (the fabric behind SGI's Non-Uniform Memory Access architecture) is really fast (speaking as someone who gets to run on them).
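To make the point concrete, here's a toy model of the scaling argument. The constants are invented for illustration, not measured on any real machine: compute time shrinks as you add processors, but a LINPACK-style job pays a communication cost that grows with processor count, so a slow interconnect (Ethernet) swamps the gains while a fast one (NUMAlink-class) doesn't:

```python
# Toy model: trivially parallel work vs. communication-bound work.
# All constants are made up for illustration.

def trivially_parallel_time(work, n):
    """No communication: time is just the work divided across n CPUs."""
    return work / n

def linpack_like_time(work, n, comm_cost):
    """Compute shrinks with n, but communication overhead grows with n."""
    return work / n + comm_cost * n

work = 1000.0
for n in (1, 8, 64, 512):
    t_trivial = trivially_parallel_time(work, n)
    t_fast = linpack_like_time(work, n, comm_cost=0.01)  # fast interconnect
    t_slow = linpack_like_time(work, n, comm_cost=1.0)   # plain Ethernet
    print(f"n={n:4d}  trivial={t_trivial:8.2f}  "
          f"fast={t_fast:8.2f}  slow={t_slow:8.2f}")
```

With the slow interconnect, going from 64 to 512 CPUs makes the job *slower*: an expensive way to keep your network busy, exactly as described above.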
Re:Read on to the next paragraph (Score:5, Informative)
This is why SMP computers tend to have 2 or 4 processors, and 8 at a pinch, but no more. It's just not practical, using current methods, to directly wire up more than 8 processors in such a tight package.
Let's say you have N processors, each capable of executing I instructions per second. Your total theoretical throughput would be N x I. However, this would only be the case if the system were 100% parallel and no processor needed to communicate with any other, which is rarely the case.
In practice, the curve of performance versus processor count looks a bit like a squished bell curve. As you increase the number of processors, the performance gain decreases, reaches zero, and eventually becomes negative. At that point, adding more CPUs will actually SLOW the computer down.
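A minimal sketch of that squished-bell-curve behavior, using invented coefficients (a serial fraction plus a per-processor coordination overhead, in the spirit of Amdahl's law with an overhead term):

```python
# Toy speedup model with made-up coefficients: s is the serial fraction,
# k is a coordination overhead that grows with the processor count.
# Past some N, the overhead dominates and speedup falls.

def speedup(n, s=0.02, k=0.0005):
    time_on_n = s + (1.0 - s) / n + k * (n - 1)
    return 1.0 / time_on_n

best_n = max(range(1, 257), key=speedup)
print("speedup peaks around N =", best_n)
print("S(8)    =", round(speedup(8), 2))
print("S(best) =", round(speedup(best_n), 2))
print("S(256)  =", round(speedup(256), 2))
```

With these (arbitrary) numbers the curve peaks in the mid-40s; 256 processors deliver *less* speedup than 8 do.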
The exact shape and size of the curve is partly a function of the way the components are laid out. A good layout keeps the amount of traffic on any given line to a minimum, minimizes the distances between nodes, and minimizes the management and routing overheads.
However, layout isn't everything. If your software can't take advantage of the hardware and the topology, then all the layout in the world won't gain you a thing. To take advantage of the topology, though, the software has to comprehend some very complex networking issues. It has to send data by efficient pathways.
If connections are not all the same speed or latency, then the most efficient pathway may NOT be the shortest. This means that the software must understand the characteristics of each path and how to best utilize those paths, by appropriate load-balancing and traffic control techniques.
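Here's a small, self-contained example of "the most efficient pathway may NOT be the shortest": standard Dijkstra over per-link latencies, on an invented topology where a two-hop route crosses a slow link and loses to a three-hop route over fast links:

```python
# Fewest hops isn't always fastest. The graph and latencies (in
# arbitrary microsecond-like units) are invented for illustration.
import heapq

def cheapest_path_cost(graph, src, dst):
    """Dijkstra over per-link latencies; returns total latency src->dst."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for nxt, latency in graph[node].items():
            nd = d + latency
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")

# A->X->B is only 2 hops but crosses slow 10.0 links;
# A->C->D->B is 3 hops over fast 1.0 links.
graph = {
    "A": {"X": 10.0, "C": 1.0},
    "X": {"B": 10.0},
    "C": {"D": 1.0},
    "D": {"B": 1.0},
    "B": {},
}
print(cheapest_path_cost(graph, "A", "B"))  # 3.0, via the longer route
```

The hop-count-shortest route would cost 20.0; the latency-aware route costs 3.0, which is why the software has to know each path's characteristics rather than just counting hops.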
If you look at extreme-end networking hardware, they can be crudely split into two camps - those where the bandwidth is phenomenal, at the expense of latency, and those where the latency is practically zero but so's the bandwidth.
The "ideal" supercomputer is going to mix these two extremes. Some data you just need to get to point B fast, and sometimes you're less worried about speed, but do need to transfer an awful lot of information. This means you're going to have two physical networks in the computer, to handle the two different cases. And that means you need something capable of telling which case is which fast enough to matter.
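The "something capable of telling which case is which" can be as simple as classifying messages by size. A minimal sketch of the idea; the 64 KB cutoff and the network names are invented, not from the article:

```python
# Sketch of the two-network routing decision: small control/sync
# messages go over the low-latency fabric, bulk transfers over the
# high-bandwidth fabric. The threshold is an arbitrary assumption.
LOW_LATENCY_NET = "low-latency"
HIGH_BANDWIDTH_NET = "high-bandwidth"
BULK_THRESHOLD_BYTES = 64 * 1024

def pick_network(message_size_bytes):
    """Route for latency when small, for bandwidth when large."""
    if message_size_bytes < BULK_THRESHOLD_BYTES:
        return LOW_LATENCY_NET
    return HIGH_BANDWIDTH_NET

print(pick_network(256))           # a small sync message
print(pick_network(8 * 1024**2))   # an 8 MB bulk transfer
```

Real systems would classify on more than size (message type, destination, congestion), but the decision has to be cheap enough to make per-message, which is the hard part.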
Even when only one type of network is used, latency is a real killer. Software, being the slowest component in the machine, is where most of the latency is likely to accumulate. Nobody in their right mind is going to build a multi-billion dollar machine with superbly optimized hardware if the software adds so much latency to the system they might as well be using a 386SX with Windows 3.1.
Which tells you that Linux has damn good traffic control and very impressive latencies. And it looks like these are areas where the kernel is going to keep improving.
Re:Read on to the next paragraph (Score:4, Informative)
Slashdot carries grudges.
Re:Read on to the next paragraph (Score:3, Informative)
Their new machines still aren't clusters. Clusters don't generally run single system images on shared-memory computers. SGI's Altix systems use NUMAlink to efficiently access memory on remote nodes, making them a kind of distributed shared memory machine. And SGI's Origin systems are your traditional SMP machines. Neither the Altix nor the Origin systems are cheap, off-the-shelf hardware.
Regarding your comment about them ignoring Linux, what was fundamentally wrong with that? Irix was a very capable OS; why should they have just dumped it?
Re:Read on to the next paragraph (Score:3, Informative)
Umm, not true. Sun can hold up to 106 processors in its Sun Fire 15K product, or 72 dual-core processors in the E25K.
SGI's Origin systems are equally large I believe. And manufacturers like IBM also have large SMP machines.
Being able to efficiently use that many processors is a completely different matter that depends on the nature of the problem. It is possible to use more than 8 processors efficiently, though. I've heard of programs that scaled almost linearly up to at least 40.
Re:Cost (Score:3, Informative)
Of course, if BlueGene, Big Mac, and this supercomputer demonstrate one thing, it is that focusing on the processors exclusively is ridiculous. It is the processing element interconnect that really makes the difference in parallel computing. BlueGene has 16k 'pathetic' processors (700MHz PPC) with a focus on a really potent interconnect network, to be able to scale to 65k processors with very good scaling factors.
Big Mac leverages InfiniBand, a low-latency, expensive, high-bandwidth network, to get where it is.
And this machine has only 20 nodes, each with 512 processors in a box. I don't know what the box's interconnect strategy is, but you can bet the design is much better than Myrinet and InfiniBand, technologies that communicate via the PCI bus, are not hard-set in terms of processing element count, have longer cable lengths, etc.
Look at the top500: processors are important, but at such high node counts, the network technology is what truly makes or breaks the clusters in that realm.
Re:Read on to the next paragraph (Score:3, Informative)
As for linux, they stepped towards linux about the same time IBM, HP, and Oracle did. They've contributed a LOT of code to linux and GPL products. They have transitioned the bulk of their product-line to linux in the last year or so, but they started that process five years ago. They have a LOT of legacy customers and legacy code to transition. Linux is a stable and high performance OS, and it would be that way without SGI, but it got there a lot faster because of SGI's efforts.
Furthermore, SGI doesn't give a damn (nor does anyone else) if slashdot loves them or not. They care if nasa, boeing, the US navy, BP, and NBC love them. These are the people with the bucks, more interested in a solution to a problem than to any license or technology.
The real reason that SGI doesn't get the credit they should is much simpler: They put a crappy SCSI controller on the mezzanine bus of the Challenge S server in 1994. In the early 90s SGI was the darling of the multimedia world. Their workstations were everywhere, and they made pretty cool servers too. They were poised to ride the same dot-com wave that Sun rode. They introduced a single-CPU server called the Challenge S, which was derived from the Indy workstation. It was reasonably speedy and quite affordable (for a Unix server of the time). The SCSI controller, however, was quite prone to failure. They developed a bad reputation. While the world was busy buying Sun servers hand over fist, buyers avoided SGIs except in the technical/defense/media markets. That legacy shaped the company into what it is today: a niche player, struggling against giants like IBM and HP, in the relatively small market for high performance computers.