SGI & NASA Build World's Fastest Supercomputer

GarethSwan writes "SGI and NASA have just rolled out the new world's-fastest supercomputer. Its performance test (LINPACK) result of 42.7 teraflops easily outclasses the previous mark set by Japan's Earth Simulator of 35.86 teraflops AND that set by IBM's new BlueGene/L experiment of 36.01 teraflops. What's even more awesome is that each of the 20 512-processor systems runs a single Linux image, AND Columbia was installed in only 15 weeks. Imagine having your own 20-machine cluster?"
  • by m00j ( 801234 ) on Tuesday October 26, 2004 @10:48PM (#10638161)
    According to the article it got 42.7 teraflops using only 16 of the 20 nodes, so the performance is going to be even better.
  • Photos of System (Score:5, Informative)

    by erick99 ( 743982 ) <> on Tuesday October 26, 2004 @10:50PM (#10638178)
    This page [] contains images of the NASA Altix system. After reading the article I was curious as to how much room 10K or so processors take up.
  • Interesting Facts (Score:5, Informative)

    by OverlordQ ( 264228 ) * on Tuesday October 26, 2004 @10:50PM (#10638181) Journal
    1) This was fully deployed in only 15 weeks.
    (Link [])

    2) This result used only 16 of the 20 systems, so a full benchmark should score even higher.
    (link [])

    3) The attached storage holds 44 LoCs (Libraries of Congress) worth of data (link [])
  • Ways you are wrong (Score:4, Informative)

    by RealProgrammer ( 723725 ) on Tuesday October 26, 2004 @10:56PM (#10638228) Homepage Journal
    Computer superclusters don't even have O-rings.

    They don't carry schoolteachers.

    They don't fly in the air.

    This runs Linux, not Windows. It won't crash.
  • by Anonymous Coward on Tuesday October 26, 2004 @10:56PM (#10638231)
    Just wanted to remind you of an earlier post on Slashdot [] about NEC's SX-8, which has a peak performance of 65 TFLOPS. Now, which one is the fastest?
  • by Anonymous Coward on Tuesday October 26, 2004 @10:57PM (#10638236)
    Cool...something that won't slow to a crawl while playing Sims 2.
  • Re:That's nothing... (Score:5, Informative)

    by jm92956n ( 758515 ) on Tuesday October 26, 2004 @11:01PM (#10638258) Journal
    when they hit the "TURBO" button on the front of the boxes they'll really scream.

    They did! According to a CNET article [], they "quietly submitted another, faster result: 51.9 trillion calculations per second" (equivalent to 51.9 teraflops).
  • by Doppler00 ( 534739 ) on Tuesday October 26, 2004 @11:05PM (#10638283) Homepage Journal
    by a computer currently being set up at Lawrence Livermore National Lab: 360 teraflops []

    The amazing thing about it is that it's built at a fraction of the cost/space/size of the Earth Simulator. If I remember correctly, they already have some of the systems in place for 36 teraflops. It's the same BlueGene/L technology from IBM, just at a larger scale.

  • by toby ( 759 ) on Tuesday October 26, 2004 @11:06PM (#10638288) Homepage Journal
    NEC's is announced, this one is installed.
  • by Dink Paisy ( 823325 ) on Tuesday October 26, 2004 @11:06PM (#10638297) Homepage
    This result was from the partially completed cluster, at the beginning of October. At that time only 16 of the 20 machines were online. When the result is taken again with all 20 of the machines there will be a sizeable increase in that lead.

    There's also a dark horse in the supercomputer race; a cluster of low-end IBM servers using PPC970 chips that is in between the BlueGene/L prototype and the Earth Simulator. That pushes the last Alpha machine off the top 5 list, and gives Itanium and PowerPC each two spots in the top 5. It's amazing to see the Earth Simulator's dominance broken so thoroughly. After so long on top, in one list it goes from first to fourth, and it will drop at least two more spots in 2005.

  • Not fully true (Score:2, Informative)

    by ValiantSoul ( 801152 ) on Tuesday October 26, 2004 @11:12PM (#10638332)
    They were only using 16 of those 20 servers. With all 20, the system peaks at 61 teraflops. Check the article [] at CNET.
  • by Troll-a-holic ( 823973 ) <> on Tuesday October 26, 2004 @11:30PM (#10638460) Homepage
    From the article -

    NASA Secures Approval in 30 Days
    To accelerate NASA's primary science missions in a timely manner, high-end computing experts from NASA centers around the country collaborated to build a business case that Brooks and his team could present to NASA headquarters, the U.S. Congress, the Office of Management and Budget, and the White House. "We completed the process end to end in only 30 days," Brooks said.

    Wow. That's incredibly fast, IMHO.

    As the article mentions, I suppose NASA owes this to the success of their 512-processor Kalpana system [], named in honor of the late astronaut Kalpana Chawla.

    And look at this --

    "In some cases, a new Altix system was in production in as little as 48 hours," said Jim Taft, task lead, Terascale Applications Group, NASA. "This is starkly different from implementations of systems not based on the SGI architecture, which can take many months to bring to a reliable state and ready for science."

    w00t! That's like super-fast in terms of development time. Good job, NASA. Way to go.

    And what about the other companies mentioned in the article?

    In addition to Intel Itanium 2 processors, the Columbia installation features storage technology from Brocade Communications and Engenio Information Technologies, Inc., memory technology from Dataram Corporation and Micron Technology, Inc. and interconnect technology from Voltaire.

    I've not heard of any of them other than Voltaire - are they well known in this area, or are they defense/NASA contractors of some kind?
  • by Anonymous Coward on Tuesday October 26, 2004 @11:30PM (#10638462)
    FreeBSD hasn't broken any networking records.
  • Re:70.93 TeraFLOPs (Score:2, Informative)

    by Anonymous Coward on Tuesday October 26, 2004 @11:44PM (#10638548)
    ... also, Folding@home, as I just checked, is running at 196.463 TFLOPS, thankfully proving that the general population would rather solve real problems than fucking pretend they're Uhura.
  • by Jeremy Erwin ( 2054 ) on Tuesday October 26, 2004 @11:53PM (#10638601) Journal
    It may prove enlightening to check that paper for updates -- as the November 8 deadline approaches, particularly competitive teams may submit new scores as they jockey for position.

    Slashdot may have announced the news at 10:45, but as this particularly silly post of mine [] demonstrates, I had the news six and a half hours early, from Dongarra's paper.
  • Re:Photos of System (Score:3, Informative)

    by Jeffrey Baker ( 6191 ) on Wednesday October 27, 2004 @12:08AM (#10638688)
    Here are some much larger, high-quality images. []
  • Re:hmmmm...... (Score:5, Informative)

    by OblongPlatypus ( 233746 ) on Wednesday October 27, 2004 @12:17AM (#10638749)
    You asked for it: "...with Columbia, scientists are discovering they can potentially predict hurricane paths a full five days before the storms reach landfall."

    In other words: RTFA, that's exactly what they're using it for.
  • Re:Cost (Score:4, Informative)

    by MrMartini ( 824959 ) on Wednesday October 27, 2004 @12:27AM (#10638802)
    Since no one else has answered my question, I'll post the results of searching on my own:,+Intel+for+supercomputer/2100-1010_3-5286156.html []

    The cost is quoted in the article at $45 million over a three-year period, which indicates that the "Columbia" super cluster gets roughly a teraflop per million dollars. That seems impressive to me, considering the overall performance.

    It would be interesting to see how well the Xserve-based architecture held its performance per dollar when scaled up to higher teraflop levels...
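As a quick sanity check on that price/performance figure, here is a back-of-the-envelope sketch (using the $45 million quoted above and the 51.9-teraflop LINPACK submission mentioned elsewhere in the thread):

```python
# Back-of-the-envelope price/performance for Columbia.
# Assumptions: $45M total cost (quoted over three years) and the
# 51.9 TFLOPS LINPACK figure from the later submission upthread.
cost_millions = 45.0
linpack_tflops = 51.9

tflops_per_million = linpack_tflops / cost_millions
print(f"{tflops_per_million:.2f} TFLOPS per million dollars")
# → 1.15 TFLOPS per million dollars
```

Even at the original 42.7-teraflop score the ratio is about 0.95, so "roughly a teraflop per million dollars" holds either way.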
  • More on the Storage (Score:3, Informative)

    by Necroman ( 61604 ) on Wednesday October 27, 2004 @12:55AM (#10638941)
    Check out [] for some more info about the storage they are using. For those that don't want to wander around the site, there is a link under the picture of the storage array that says "Watch a Video" and it gives an overview of the technology that SGI uses in their storage solution.

    They use tape storage from StorageTek, like this one [],
    and hard-drive storage from Engenio (formerly LSI Logic Storage Systems), like this [].
  • by IncandescentFlame ( 773807 ) on Wednesday October 27, 2004 @01:18AM (#10639084)
    Well, are we talking about actual supercomputers, not just clusters? 'Cause if you're just trying to break these Teraflops records, you can just cram a ton of existing computers together into a cluster, and voila! lots of operations per second.

    Actually, this method won't work for the benchmark that is used for the top 500 list, LINPACK. The difficulty is that to solve most problems in parallel, the processors need to talk to each other. This introduces an overhead into the program, and the amount of overhead depends on the interconnect. Programs which can be parallelised without a communication overhead are called trivially parallel. LINPACK is not trivially parallel, so if you took a whole lot of computers and banged them together over Ethernet, all you'd end up with is an expensive way to keep your network busy.

    The beauty of the Altix systems is that their NUMA (non-uniform memory access) design rides on a really fast interconnect (speaking as someone who gets to run on them).
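The interconnect point can be made concrete with a rough efficiency estimate: parallel efficiency is the measured LINPACK rate divided by the theoretical peak, and communication overhead is what eats the difference. The per-processor figures below (1.5 GHz Itanium 2, two fused multiply-add units, so 4 flops/cycle) are assumptions, not from the article:

```python
# Parallel efficiency = measured LINPACK rate / theoretical peak.
# Assumed per-CPU specs: 1.5 GHz Itanium 2, 4 flops/cycle (two FMA units).
cpus = 16 * 512                          # processors used in the 42.7 TF run
peak_tflops = cpus * 1.5 * 4 / 1000.0    # sum of GFLOPS, converted to TFLOPS
measured_tflops = 42.7

efficiency = measured_tflops / peak_tflops
print(f"peak = {peak_tflops:.1f} TFLOPS, efficiency = {efficiency:.0%}")
# → peak = 49.2 TFLOPS, efficiency = 87%
```

A pile of the same processors "banged together over Ethernet" would run the same benchmark at a far smaller fraction of peak, which is exactly the poster's point.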
  • by jd ( 1658 ) <> on Wednesday October 27, 2004 @01:54AM (#10639243) Homepage Journal
    Hardware only takes you so far. Scalability comes largely from the efficiency of the software. Poor software results in large amounts of communication between nodes, slowing down a cluster.

    This is why SMP computers tend to have 2 or 4 processors, and 8 at a pinch, but no more. It's just not practical, using current methods, to directly wire up more than 8 processors in such a tight package.

    Let's say you have N processors, each capable of executing I instructions per second. Your total theoretical throughput would be N x I. However, this would only be the case if the system were 100% parallel and no processor needed to communicate with any other. That's rarely the case.

    In practice, the function of performance to processors follows a distribution that looks a bit like a squished bell curve. As you increase the number of processors, the performance gain decreases, reaches zero, and actually becomes negative. At that point, adding more CPUs will actually SLOW the computer down.

    The exact shape and size of the curve is partly a function of the way the components are laid out. A good layout keeps the amount of traffic on any given line to a minimum, minimizes the distances between nodes, and minimizes the management and routing overheads.

    However, layout isn't everything. If your software can't take advantage of the hardware and the topology, then all the layout in the world won't gain you a thing. To take advantage of the topology, though, the software has to comprehend some very complex networking issues. It has to send data by efficient pathways.

    If connections are not all the same speed or latency, then the most efficient pathway may NOT be the shortest. This means that the software must understand the characteristics of each path and how to best utilize those paths, by appropriate load-balancing and traffic control techniques.

    If you look at extreme-end networking hardware, it can be crudely split into two camps - those where the bandwidth is phenomenal, at the expense of latency, and those where the latency is practically zero but so is the bandwidth.

    The "ideal" supercomputer is going to mix these two extremes. Some data you just need to get to point B fast, and sometimes you're less worried about speed, but do need to transfer an awful lot of information. This means you're going to have two physical networks in the computer, to handle the two different cases. And that means you need something capable of telling which case is which fast enough to matter.
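The two-network idea above reduces to a simple cost model: the time to deliver a message is roughly latency plus size divided by bandwidth, and the router picks whichever network is cheaper for that message. The figures here are illustrative, not any real interconnect's specs:

```python
# Pick a network per message using t = latency + size/bandwidth.
# Both networks' numbers are made up for illustration.
NETS = {
    "low-latency": {"latency_us": 2.0, "gb_per_s": 0.5},
    "high-bandwidth": {"latency_us": 50.0, "gb_per_s": 10.0},
}

def transfer_time_us(net, msg_bytes):
    # 1 GB/s = 1000 bytes per microsecond, so bytes/(GB/s * 1e3) is in us
    return net["latency_us"] + msg_bytes / (net["gb_per_s"] * 1e3)

def best_net(msg_bytes):
    return min(NETS, key=lambda name: transfer_time_us(NETS[name], msg_bytes))

print(best_net(1_024))        # small message  → low-latency
print(best_net(10_000_000))   # bulk transfer  → high-bandwidth
```

The hard part, as the comment says, is classifying each message "fast enough to matter" - the decision logic has to be cheaper than the latency it saves.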

    Even when only one type of network is used, latency is a real killer. Software, being the slowest component in the machine, is where most of the latency is likely to accumulate. Nobody in their right mind is going to build a multi-billion-dollar machine with superbly optimized hardware if the software adds so much latency to the system that they might as well be using a 386SX with Windows 3.1.

    And that means Linux must have damn good traffic control and very, very impressive latencies. And it looks like these are areas in which the kernel is going to improve still further...
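The "squished bell curve" described above can be sketched with one simple analytic model, Gunther's universal scalability law, in which contention and coherency costs grow with processor count until adding CPUs actually slows things down. The coefficients below are illustrative, not measured:

```python
# Speedup under Gunther's universal scalability law:
#   S(n) = n / (1 + sigma*(n-1) + kappa*n*(n-1))
# sigma models contention (serialization); kappa models coherency
# (pairwise communication). Both values here are made up for illustration.
def speedup(n, sigma=0.02, kappa=0.0005):
    return n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# Speedup rises, peaks, then goes negative-return, just as described:
for n in (1, 8, 64, 256, 1024):
    print(n, round(speedup(n), 1))
```

With these coefficients the curve peaks around 44 processors; past that point, every added CPU costs more in communication than it contributes in compute.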

  • by AndyChrist ( 161262 ) <> on Wednesday October 27, 2004 @03:32AM (#10639540) Homepage
    The reason SGI isn't getting the kind of credit it should is probably how they resisted Linux and clustering for so long (before apparently caving, deciding to go the direction the wind was blowing, and putting their expertise into doing the fashionable thing BETTER).

    Slashdot carries grudges.
  • by CommieOverlord ( 234015 ) on Wednesday October 27, 2004 @09:33AM (#10640945)
    how they resisted ... clustering

    Their new machines still aren't clustered. Clusters don't generally run single system images on shared-memory computers. SGI's Altix systems use NUMAlink to let them efficiently access memory on remote nodes, making them a kind of distributed shared-memory machine. And SGI's Origin systems are your traditional SMP machines. The Altix and Origin systems are neither cheap nor off the shelf.

    Regarding your comment about them ignoring Linux, what was fundamentally wrong with that? IRIX was a very capable OS; why should they have just dumped it?
  • by CommieOverlord ( 234015 ) on Wednesday October 27, 2004 @09:40AM (#10641024)
    This is why SMP computers tend to have 2 or 4 processors, and 8 at a pinch, but no more

    Umm, not true. Sun can hold up to 106 processors in its Sun Fire 15K product, or 72 dual-core processors in the E25K.

    SGI's Origin systems are equally large I believe. And manufacturers like IBM also have large SMP machines.

    Being able to efficiently use that many processors is a completely different matter that depends on the nature of the problem. It is possible to efficiently use more than 8 processors, though. I've heard of programs that scaled almost linearly up to at least 40.
  • Re:Cost (Score:3, Informative)

    by Junta ( 36770 ) on Wednesday October 27, 2004 @09:41AM (#10641033)
    Actually, a lot of the top500 supercomputers achieve or beat $1 million per TFLOP. Even if the price points weren't that good on the component parts, marketing departments are inclined to give huge discounts for the press coverage. You can bet SGI and Intel both gave exorbitant discounts here: SGI's market presence has been dwindling, and overall the Itanium line has been a commercial failure. Being #1 on the top500 for six months (the interval between list compilations; BlueGene isn't even close to finished, the NEC supercomputer is likely to make the list after next, etc.) is very good marketing.

    Of course, if BlueGene, Big Mac, and this supercomputer demonstrate one thing, it is that focusing on the processors exclusively is ridiculous. It is the processing-element interconnect that really makes the difference in parallel computing. BlueGene has 16k 'pathetic' processors (700MHz PPC) with a focus on a really potent interconnect network, to be able to scale to 65k processors with very good scaling factors.
    Big Mac leverages InfiniBand, a low-latency, expensive, high-bandwidth network, to get where it is.
    And this one has only 20 nodes, each with 512 processors within a box. I don't know what the boxes' interconnect strategy is, but you can bet the design is much better than Myrinet and InfiniBand, technologies that communicate via the PCI bus, are not hard-set in terms of processing-element count, have longer cable lengths, etc.

    Look at the top500: processors are important, but the network technology is what truly makes or breaks the clusters in that realm, with such high node counts.
  • by flaming-opus ( 8186 ) on Wednesday October 27, 2004 @11:25AM (#10642161)
    Commodity Linux clusters are not the only kind of cluster out there. SGI has been building clusters since the late '80s. Their first supercomputer product, the Power Challenge clusters, were 16- and 36-way SMP boxes clustered together with HIPPI. Remember Terminator 2 and Jurassic Park? Those were rendered on clusters of Crimson and Indigo workstations. They may have been called NOWs (networks of workstations) instead of Beowulfs, but it was the same thing.

    As for Linux, they stepped toward Linux at about the same time IBM, HP, and Oracle did. They've contributed a LOT of code to Linux and GPL products. They have transitioned the bulk of their product line to Linux in the last year or so, but they started that process five years ago. They have a LOT of legacy customers and legacy code to transition. Linux is a stable, high-performance OS, and it would have gotten there without SGI, but it got there a lot faster because of SGI's efforts.

    Furthermore, SGI doesn't give a damn (nor does anyone else) whether Slashdot loves them or not. They care whether NASA, Boeing, the US Navy, BP, and NBC love them. These are the people with the bucks, more interested in a solution to a problem than in any license or technology.

    The real reason that SGI doesn't get the credit they should is much simpler: they put a crappy SCSI controller on the mezzanine bus of the Challenge S server in 1994. In the early 90s, SGI was the darling of the multimedia world. Their workstations were everywhere, and they made pretty cool servers too. They were poised to ride the same dot-com wave that Sun rode. They introduced a single-CPU server called the Challenge S, which was derived from the Indy workstation. It was reasonably speedy and quite affordable (for a Unix server of the time). The SCSI controller, however, was quite prone to failure. They developed a bad reputation. While the world was busy buying Sun servers hand over fist, it avoided SGI except in the technical/defense/media markets. That legacy shaped the company into what it is today: a niche player, struggling against giants like IBM and HP, in the relatively small market for high-performance computers.
