World's Fastest Supercomputer To Be Built At ORNL
Homey R writes "As I'll be joining the staff there in a few months, I'm very excited to see that Oak Ridge National Lab in Oak Ridge, Tennessee has won a competition within the DOE's Office of Science to build the world's fastest supercomputer. It will be based on the promising Cray X1 vector architecture. Unlike many of the other DOE machines that have at some point occupied #1 on the Top 500 supercomputer list, this machine will be dedicated exclusively to non-classified scientific research (i.e., not bombs)."
Cowards Anonymous adds that the system "will be funded over two years by federal grants totaling $50 million. The project involves private companies like Cray, IBM, and SGI, and when complete it will be capable of sustaining 50 trillion calculations per second."
good stuff (Score:4, Interesting)
Personally I'm happy to see Cray still making impressive machines. Not every problem can be solved by "divide and conquer" clusters.
Re:good stuff (Score:2)
Re:good stuff (Score:4, Insightful)
Obviously there is a lot more that could affect the performance, such as how memory is implemented. In general, though, the system will perform best when each processor is performing calculations, rather than looking after Ethernet connections.
Re:good stuff (Score:5, Interesting)
Re:good stuff (Score:2)
Last I heard, the VT cluster had done littl
Re:good stuff (Score:3, Insightful)
Re:good stuff (Score:2)
Re:good stuff (Score:5, Informative)
For other problems, where interprocess/internode communication is high or very high, you need a high-speed interconnect (like NUMAflex in SGI's machines) to get the scalability you need as the number of processors/nodes and the size of the data set increase. The big systems like Crays and the bigger SGI and IBM Power series machines have those high-speed interconnects and will allow you to scale more efficiently than the clusters. They cost a lot more, though.
A good book to read on the subject of HPC is High Performance Computing by Severance and Dowd (O'Reilly). It's a little old now, but it covers a lot of the concepts you need to know about building a truly HPC system (architecture as well as code).
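To make the interconnect point concrete, here's a rough toy model in Python of a fixed-size problem split across N nodes with a constant per-step communication cost. The cost numbers are made up purely for illustration and don't describe any real interconnect:

def step_time(n_nodes, compute_total=1.0, comm_per_step=0.002):
    # Time per simulation step: the compute part shrinks with N,
    # the communication part doesn't.
    return compute_total / n_nodes + comm_per_step

def speedup(n_nodes, **kw):
    return step_time(1, **kw) / step_time(n_nodes, **kw)

for n in (1, 16, 128, 1024):
    fast = speedup(n, comm_per_step=0.0002)   # assumed low-latency interconnect
    slow = speedup(n, comm_per_step=0.005)    # assumed commodity Ethernet-class link
    print(f"{n:5d} nodes: speedup {fast:7.1f} (fast link) vs {slow:7.1f} (slow link)")

The compute term keeps shrinking as you add nodes while the communication term doesn't, which is why the slower interconnect flattens out so much earlier.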
Wow... (Score:3, Funny)
Re:Wow... (Score:5, Funny)
Re:Wow... (Score:2, Insightful)
Re:Wow... (Score:2)
Re:Wow... (Score:2)
However, if it was moving AWAY at a high rate of speed, it would indeed result in red shift. And since we ALL want to move away from Outlook as fast as we can, this is indeed a grand achievement!
Re:Wow... (Score:2)
How's that for an anti-MS joke?
Re:So... (Score:2)
Someone mod the parent up, it's funny.
Qualifier (Score:5, Insightful)
What makes for fastest? (Score:2)
When complete it will be capable of sustaining 50 trillion calculations per second.
Screw that. How many fps can it manage in Quake III?
Re:Qualifier (Score:2)
However, to answer the original poster, our (Cray's) definition of "fast" is probably pretty close to NEC's definition. The goal here is to build a machine that's not just the fastest on benchmarks, but really is the fastest in the world on most real pr
Re:Qualifier (Score:2)
50 trillion (Score:2, Interesting)
Wow, that's darn fast.
I wonder if that processing power could be used for rendering like Weta did, and how the performance would compare to their renderfarm.
Re:50 trillion (Score:2, Funny)
Re:50 trillion (Score:4, Insightful)
Sure, but the real question is: why would you? The cost of this on a per-MIPS basis is sure to be much higher than a renderfarm's. In addition, ray tracing lends itself to parallelism. There are many other problems out there that don't, and those can use this kind of box.
Hmm (Score:5, Funny)
Hmm...I wonder if I could borrow it for a few days to give my dnet [distributed.net] stats a boost
Re:Hmm (Score:3, Funny)
Shamelessly plagiarized (Score:3, Funny)
Yeah... (Score:2)
Re:Yeah... (Score:3, Interesting)
Re:Yeah... (Score:2)
They've gone from giving me a mid to late April ship date to "Sometime in June".
Screw that. Apple is screwing the pooch if they're at all serious about getting into enterprise computing. It's one thing to slip one or two months, but now they're at four, and I wouldn't be surprised to see it go to six at this point.
Fartknockers.
Re:Yeah... (Score:2)
Doom III (Score:4, Funny)
They better hurry ... (Score:5, Interesting)
Still a whole year until they have a full machine, but the 512-way prototype reached 1.4 TFlops (LinPack). The complete machine will have 128 times the nodes and 50% higher frequency. So even with pessimistic scalability, this will be more than twice as fast.
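Rough arithmetic behind that claim, with the scaling efficiency as a deliberately pessimistic assumption rather than a measured figure:

proto_tflops = 1.4      # 512-way prototype, Linpack
node_factor  = 128      # complete machine: 128x the nodes
freq_factor  = 1.5      # plus 50% higher clock
efficiency   = 0.40     # pessimistic assumption: keep only 40% of ideal scaling

ideal = proto_tflops * node_factor * freq_factor
print(f"ideal scaling:     {ideal:.1f} TFlops")
print(f"pessimistic (40%): {ideal * efficiency:.1f} TFlops")   # still > 2 x 50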
Re:They better hurry ... (Score:5, Informative)
It's a mesh of a LOT of microcontroller-class processors. The theory is that these processors give you the best performance per transistor. Thus you can run them at a moderate clock, get decent performance out of them, and cram a whole hell of a lot of them into a cabinet. It's a cool design, and I'm interested to see what it will be able to do once it's deployed. However, for the problems they have at ORNL, I'm sure the X1 was a better machine. Otherwise they would have bought IBM. They already have a farm of p690s, so they have a working relationship.
50 trillion calcs/sec...how fast really? (Score:4, Insightful)
The article mentions that the new supercomputer will be used for non-classified projects. Does anyone have more exact details of what these projects may involve? Will it be a specific application, or more of a 'gun for hire' computing facility, with CPU cycles open to all comers for their own projects? It would be interesting to know what types of applications are planned for the supercomputer, as it may be possible to translate a raw measure of speed like the quoted '50 trillion calculations per second' into something more meaningful, like 'DNA base pairs compared per second', or 'weather cells simulated per hour'. Are there any specialists in these kinds of HPC applications who would like to comment? How fast do people think this supercomputer would run apt-get for instance? Would 50 trillion calculations per second equate to 50 trillion package installs per second? How long would it take to install all of Debian on this thing? Could the performance of the system actually be measured in Debian installs per second? I look forward to the community's response!
Re:50 trillion calcs/sec...how fast really? (Score:2)
Re:50 trillion calcs/sec...how fast really? (Score:2)
Other big things are weather prediction, fluid dynamics, classical (i.e. "Newtonian") molecular dynamics with some kind of empirical potentials (e.g. protein folding and stuff can be thought of as MD).
DOE "user facility" (Score:2)
This will be what is known as a "user facility" at DOE. CPU time will be doled out on a competitive basis, i.e., if someone has a project they would like to use it for, they will submit a proposal which will then be reviewed against others.
Re:50 trillion calcs/sec...how fast really? (Score:2)
Generally, these are voltage and force relaxations, with some areas of well defined voltage, some point charges thrown around, and very complex geometry. Basically, that means I set up a
Maybe it's me. (Score:2)
Re:Maybe it's me. (Score:2)
Re:Maybe it's me. (Score:2)
I don't know why they would need it, but that's just because I don't know anything about the work of the DOE (not being an American and all that)
as a former DOE employee (Score:5, Interesting)
The DOE does a lot of basic research in nuclear physics, quantum physics, et cetera. The FEL was used to galvanize power rods for VPCO (now Dominion Power) and made them last three times as long. Some William & Mary people use it for doing protein research, splicing molecules and stuff.
The DOE does a lot of very useful things that need high amounts of computing power, not just simulating nuclear bombs (although Oak Ridge does that sort of stuff, as does Los Alamos). We only had a lame Beowulf cluster at TJNAF. I wish we would have had something like this beast.
I want to know how it stacks up to the Earth Simulator.
Re:Maybe it's me. (Score:5, Informative)
But the computer's record will be short-lived... (Score:2, Funny)
It's Longhorn compatible then ? (Score:2, Funny)
or it certainly seems like it (reading the specs of the thing)
No bombs? (Score:2)
Re:No bombs? (Score:2)
Bomb usable life (Score:2)
Also, in nukes, the short-lived component is the initiator, which is based on an alpha emitter with a half-life of a few months. They have to be changed out regularly.
Cray X1.. What role do IBM and SGI have? (Score:2, Informative)
Oak Ridge has done extensive evaluations of recent IBM, SGI and Cray technology. Though I am still looking forward to data on IBM's Power5.
Cray X1 Eval [ornl.gov]
SGI Altix Eval [ornl.gov]
Re:Cray X1.. What role do IBM and SGI have? (Score:3, Informative)
It's probably just spin to call the project "A computer", rather than "several computers". Deep in one of those ORNL whitepaper
Grab that cash with both hands and make a stash (Score:2, Offtopic)
And I'm still waiting for my turn to drive one of the Mars rovers.
Actually, after auditing, it looks like you owe us (Score:2, Funny)
Thank you for your understanding in this matter,
Your friendly neighbourhood IRS agent.
3D torus topology (Score:5, Informative)
So each node is directly connected to six adjacent nodes. Contrast this with the Thinking Machines Connection Machine CM2 topology, which had 2^N nodes connected in an N dimensional hypercube. [base.com] So each node in a 16384 node CM2 was directly connected to 16 other nodes. There's a theorem that you can always embed a lower dimensional torus in an N dimensional hypercube, so the CM2 had all the benefits of a torus and more. This topology was criticized because you never needed as much connectivity as you got in the higher node-count machines, so the CM2 was in effect selling you too much wiring.
Thinking Machines changed the topology to fat trees [ornl.gov] in the CM5. One of the cool things about the fat tree is it allows you to buy as much connectivity as you need. I'm really surprised that it seems to have died when Thinking Machines collapsed. On the other hand, any kind of 3D mesh is probably pretty good for simulating physics in 3D. You can have each node model a block of atmosphere for a weather simulation, or a little wedge of hydrogen for an H-bomb simulation. But it might be useful to have one more dimension of connection for distributing global results to the nodes.
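For the curious, a small Python sketch of the two connection schemes being compared; the dimensions used are arbitrary examples, not the actual Cray or CM2 configurations:

def torus3d_neighbors(x, y, z, dims):
    # The six wraparound neighbors of node (x, y, z) on a 3-D torus.
    dx, dy, dz = dims
    return [((x + 1) % dx, y, z), ((x - 1) % dx, y, z),
            (x, (y + 1) % dy, z), (x, (y - 1) % dy, z),
            (x, y, (z + 1) % dz), (x, y, (z - 1) % dz)]

def hypercube_neighbors(node, n_dims):
    # In an N-dimensional hypercube of 2^N nodes, each node links to the N
    # nodes whose index differs from its own in exactly one bit.
    return [node ^ (1 << bit) for bit in range(n_dims)]

print(torus3d_neighbors(0, 0, 0, (8, 8, 8)))   # always 6 links, however big the torus
print(hypercube_neighbors(0, 4))               # 16-node hypercube: 4 links per node

On the torus every node keeps six links no matter how big the machine gets, while the hypercube's per-node link count grows with the log of the machine size, which is exactly the "too much wiring" complaint above.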
Re:3D torus topology (Score:3, Funny)
Excellent. We can finally solve the Optimal Dungeon Theorem on hex tile games.
Re:3D torus topology (Score:2)
(Oh, and if you meant something else
Re:3D torus topology (Score:2)
Re:3D torus topology (Score:2)
Fighting the temptation ... (Score:2)
Re:Fighting the temptation ... (Score:2, Informative)
Re:Fighting the temptation ... (Score:4, Informative)
For these sorts of machines, one can buy utilities for data migration, backup, debugging, etc. However, the production code is written in-house, and that's the way they want it. Weather forecasting, for example, uses software called MM5, which has been evolving since at least the Cray-2 days. A lot of this code is passed around between research facilities. It's not open source exactly, but the DOD plays nice with the DOE, etc.
The basic algorithms have been around for a long time. In the early '90s, when MPPs and then clusters came onto the scene, a lot of work was done on structuring the codes to run on a large number of processors. Sometimes this works better than other times. Most of the work isn't in writing the code, but rather in optimizing it. Minimizing the synchronous communication between nodes is of great importance.
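As a purely hypothetical illustration of what "minimize the synchronous communication" looks like in code, here is a made-up 1-D halo exchange using mpi4py; it is not anything ORNL actually runs, and the array size and update rule are invented:

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1000                          # cells owned by this rank (arbitrary)
u = np.random.rand(n_local + 2)         # one ghost cell at each end
left  = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL
recv_l, recv_r = np.empty(1), np.empty(1)

# Post the halo exchange without blocking.
reqs = [comm.Isend(u[1:2],   dest=left,   tag=0),
        comm.Irecv(recv_r,   source=right, tag=0),
        comm.Isend(u[-2:-1], dest=right,  tag=1),
        comm.Irecv(recv_l,   source=left,  tag=1)]

# Do the interior update while the boundary messages are still in flight
# (a toy relaxation step; real codes would keep separate old/new arrays).
u[2:-2] = 0.5 * (u[1:-3] + u[3:-1])

# Only now wait, then finish the two cells that needed remote data.
MPI.Request.Waitall(reqs)
if left  != MPI.PROC_NULL: u[0]  = recv_l[0]
if right != MPI.PROC_NULL: u[-1] = recv_r[0]
u[1]  = 0.5 * (u[0] + u[2])
u[-2] = 0.5 * (u[-3] + u[-1])

The point is that the sends and receives are posted non-blocking, the interior work proceeds while the messages are in flight, and the wait only covers the two edge cells.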
Re:Fighting the temptation ... (Score:2)
While much of the software will be custom applications, there are common packages that you'll find for simulating molecular interactions, doing sequence analysis, etc.
You can check out a list of software available on a CRAY T3E [psc.edu] to get an idea.
NOT the fastest! (Score:5, Interesting)
Re:NOT the fastest! (Score:2, Insightful)
Re:NOT the fastest! (Score:2)
Something like CFD or FEM is about in the middle, which is to say that clusters and SCs do about as well as each other. This is because, although there is a requirement that nodes communicate, the amount of communication is relatively low compared to the amount of internal computation, i.e. each cell is mostly affected by the cells direc
Un-classified research uses (Score:4, Interesting)
In the long run one would like to be able to get such simulations from the 10,000 atom level up to the billion-to-trillion (or more) atom level so you could simulate significant fractions of the volume of cells. Between now and then molecular biologists, geneticists, bioinformaticians, etc. would be happy if we could just get to the level of accurate folding (Folding@Home [standford.edu] is working on this from a distributed standpoint) and eventually to be able to model protein-protein interactions so we can figure out how things like DNA repair -- which involves 130+ proteins cooperating in very complex ways -- operate so we can better understand the causes of cancer and aging.
Folding@Home URL (Score:3, Informative)
I Guess the Real Question is... (Score:2)
Tim
Thinking Ahead (Score:2)
Considering the whole of spacetime as a single unit, with our perception limited to only one piece of it at a time, it occurs to me that perhaps everything in both our future and past exists all at once; we're just sliding down a scale as the next section is revealed to us.
That said, wouldn't it make sense that the world's fastest computer is among the very last "super" computers built, many years (centuries? millennia?) in our future (if you want to call it that)? No comp
Wow! (Score:2)
Open to all scientists (Score:3, Insightful)
The idea is to make it more like other national labs where - for example in neutron scattering - you don't have to be an expert on neutron scattering to use the facility. They have staff available to help and you may have a grant from NSF or NIH but you can use a facility run by DOE if that's the best one for the job.
I attended this session [aps.org] at the American Physical Society meeting this March and I'm assuming this is the project referred to in the talks - I apologize if I'm wrong there, but this is at least what is being discussed by people within DOE. I'm essentially just summarizing what I heard at the meeting so although it sounds like the obvious list of things to do, apparently it has not been done before.
The prospect of opening such facilities to all scientists from all nations is refreshing during a time when so many problems have arisen from lack of mobility of scientists. For example, many DOE facilities such as neutron scattering at Los Alamos (LANL) have historically relied on a fraction of foreign scientists coming to use the facility, and this helps pay to maintain it. Much of this income has been lost and is not being compensated from other sources. Further, many legal immigrants working within the physics community have had very serious visa problems preventing them from leaving the country to attend foreign conferences. The APS meeting was held in Canada this year, and the rate of people who could not show up to attend and speak was perhaps ten times greater than at the APS conferences I attended previously. Although moving it to Canada helped many foreign scientists attend, it prevented a great many foreign scientists living within the US from going. Even with a visa to live and work within the US, they were not allowed to return to the US without additional paperwork, which many people had difficulty getting.
Obviously, security is heightened after 9/11, as it should be. I'm bringing up the detrimental sides of such policies not to argue that no such policies should have been implemented, but to suggest the benefits be weighed against the costs - and the obvious costs, such as those to certain facilities, should either be compensated directly, or we should be honest and realize we are (indirectly) cutting funding to facilities which are (partly) used for defence in order to increase security.
I mention LANL despite its dubious history of retaining secrets because I have heard talks by people working there (this is after 9/11) on ways to detect various WMD crossing US borders. Even though they personally are (probably) well funded, if the facilities they need to use don't operate any more, this is a huge net loss. My understanding is that all national labs (in the US) have had similar losses from lost foreign use.
Re:Talking out my ass here, but (Score:5, Insightful)
Re:Talking out my ass here, but (Score:2)
I thought the age of the over-priced supercomputer was over, and the age of the cluster had begun?
Sure, I'd love to have one of those things in my house, but as long as the government is spending my money, I think I'd rather see them go for a more cost effective solution, rather than another 1 ton monster that'll be obsolete in two years.
If you think that $50 mil is overpriced for the fastest computer in the world, then the gu
Re:Talking out my ass here, but (Score:5, Informative)
The number of processors isn't as important as the memory architecture. Clusters of workstation-class machines have isolated memory spaces connected by I/O channels. Many non-clustered supercomputers have a single unified memory space where all processors have equal access to all of the memory in the system. This can be important for algorithms that heavily use intermediate results from all parts of the problem space.
Even so, for a given number of FLOPS, a vector machine would generally require fewer CPUs than a cluster of general-purpose machines. This reduces the amount of splitting that has to be done to the problem in the first place.
Re:Talking out my ass here, but (Score:3, Informative)
(Score:-10, Wrong)
I'm sorry dude, but this machine is going to have more than 1 CPU in it, and the work will have to be split among the processors and run in parallel.
(Score:-100, Wronger)
Sorry, but you have it all wrong. The parent is right. The parent stated that there are problems that can't be split in
Re:Talking out my ass here, but (Score:2)
The system features powerful vector processors combined with an interconnect that scales to peak performances of multiple tens of teraflops.
The Cray X1 programming environments include a powerful and complete set of compilers, libraries, debugger and performance analysis tools that have been designed to exploit its architecture.
The Cray X1 system provides support for a variety of parallel programming models, from traditional distributed memory parallel models, to shared memory parall
Re:Talking out my ass here, but (Score:2)
No. MPI vs. pthreads can hardly be called "about the same."
That sounds like an office LAN to me, not a cluster. Clusters haven't used 10 Mbit Ethernet in a long, long time. Many utilize interconnect technology like InfiniBand, Myrinet, or Dolphin, which can go up to 800 MByte/sec.
I wish you were correct. I know of a new Alpha cluster that uses 100MB ethernet. Specialized interconnects are certainly better, but those in charge d
Re:Talking out my ass here, but (Score:2)
2) It's not just the bandwidth, it's the latency between nodes that is helpful. If you have a large application that is sitting on a barrier waiting to proceed, you don't need much bandwidth to tell everyone to go, but you sure as heck want to be able to tell them quickly! Not having to go th
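A toy model of why latency dominates for a barrier; the per-hop latency figures below are rough assumed ballparks for illustration, not benchmarks of any particular product:

import math

def barrier_time_us(n_nodes, per_hop_latency_us):
    # A tree/dissemination barrier needs about log2(N) message rounds,
    # and each message is tiny, so bandwidth barely matters.
    return math.ceil(math.log2(n_nodes)) * per_hop_latency_us

for name, lat_us in [("commodity Ethernet (~50 us/hop, assumed)", 50.0),
                     ("proprietary interconnect (~2 us/hop, assumed)", 2.0)]:
    print(f"{name}: 1024-node barrier ~ {barrier_time_us(1024, lat_us):.0f} us")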
Re:Talking out my ass here, but (Score:2)
2 Years? (Score:3, Informative)
Clusters handle different jobs than supercomputers do. Sometimes they bleed into one another, but there are some things supercomputers will always be better at (because of higher memory bandwidth, for one thing).
Re:Talking out my ass here, but (Score:5, Interesting)
Firstly, the X1 has greater per-processor performance by a factor of 4. Then you add an interconnect that has half the latency, and 50 times the bandwidth of Myrinet or InfiniBand. It also has memory and cache bandwidth enough to actually fill the pipelines, unlike a Xeon, which can do a ton of math on whatever will fit in the registers. Some problems just don't work real well on clustered PCs; they need this kind of big iron.
Secondly, some problems cannot tolerate a failure in a compute node. If you cluster together 10,000 PCs, the average failure rate means that one of those nodes will fail about every 4 hours. If your problem takes three days to complete, the cluster is worthless to you. A renderfarm can tolerate this sort of failure rate; just send those frames to another node. Some problems can't handle it.
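Quick sanity check on that failure arithmetic; the per-node MTBF here is an assumption chosen to match the "one failure every ~4 hours" figure, not a measured number:

import math

nodes = 10_000
node_mtbf_h = 40_000                     # assumed per-node MTBF, hours
system_mtbf_h = node_mtbf_h / nodes      # ~4 h between failures somewhere
job_h = 3 * 24

# With exponentially distributed failures, P(no node dies during the job):
p_clean_run = math.exp(-job_h / system_mtbf_h)
print(f"system MTBF ~ {system_mtbf_h:.0f} h; "
      f"P(72 h job finishes with no failures) ~ {p_clean_run:.1e}")

Without checkpointing, the chance of a 72-hour job finishing cleanly on such a cluster is effectively zero.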
Oak Ridge is very concerned with getting the most bang for the buck.
Being Snide Here (Score:5, Insightful)
I think ORNL and PSC [psc.edu] know a lot more about supercomputing than you (or Internet rag pundits) do. As others have noted, there are real reasons for Big Iron.
Clusters are great for certain problems but for heavy computation -- think simulating two galaxies colliding or earthquake modeling -- off the shelf clusters don't cut it.
They're not wasting taxpayer money unless you consider basic research a waste.
Re:Talking out my ass here, but (Score:2, Informative)
The Civics might be fine for couriers, but if you need to move - say - an elephant, they're useless.
Analogies suck, though, and I'm pretty sure I got that one wrong.
Re:Talking out my ass here, but (Score:2)
Re:Talking out my ass here, but (Score:2, Informative)
Certain operations, though, are highly dependent upon each previous result. Physics and chemical simulations are a good example. When you have situations like this,
Re:Talking out my ass here, but (Score:3, Interesting)
Umm, bwah?
It's only going to be sitting there idle if you're not properly scheduling and queueing jobs. Also, you -CAN- do the kind of simulations (physics, chemicals) on a cluster *points at clusters at Ch
Re:Talking out my ass here, but (Score:2)
Re:It must be said: (Score:3, Funny)
Re:Hyphenation Troll (Score:2)
I don't know which is more accurate in this case, being the typical Slashdotter and not actually reading the article.
Which brings me to my point. Half the time I don't even bother trying to read the article, and the other half of the time it is slashdotted, which amounts to about the same result.
Re:Hyphenation Troll (Score:2)
In fact, the intelligence definition is the typical oxymoron. Classified as "unclassified" is typical government stupidity. Think about it.
Re:Cray X1 OS is.. (Score:2)
Cray is a company that sells to huge research labs and Fortune 500 companies. Just because they don't appear on TomsHardware, or do interviews for
UNICOS/mp is based on IRIX 6.5 (Score:2)
http://www.cray.com/craydoc/manuals/S-2346-23/htm
Re:How many Apples would it take? (Score:3, Informative)
The 128-node cluster was benchmarked at ~80% efficiency, or ~1.6 TFlops. The final cluster achieved an Rmax of 10.28 TFlops, ~60% of the 17.6 TFlops theoretical peak.
A 6000 node cluster would be very difficult to manage.
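Just re-deriving the ratios quoted above; no new measurements here, and the ~2 TFlops peak for the 128-node subset is implied by the 80% figure rather than stated:

rmax, rpeak = 10.28, 17.6                  # TFlops: achieved vs. theoretical peak
print(f"full machine:    {rmax / rpeak:.0%} of peak")              # ~58%, i.e. the "~60%"

small_run, small_peak = 1.6, 2.0           # 128-node subset, peak implied by 1.6 / 0.80
print(f"128-node subset: {small_run / small_peak:.0%} of peak")    # the ~80% figure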
Re:Huh? (Score:3, Informative)
You're correct that th
I grew up in Oak Ridge (Score:2)
X-10 (ORNL) has branched out into a lot of helpful areas. Some of its projects include environmental cleanup and alternative energy production. It also spends a lot of resources on testing how to safely store and transport dangerous waste (
NNSA vs. Office of Science (Score:3, Informative)
You're right, but let me clarify something:
The biggest weapons labs in the country are DOE, not DOD facilities. These are the "tri-labs": Los Alamos, Lawrence Livermore, and Sandia. They are operated by the DOE's NNSA (National Nuclear Security Administration).
The other major DOE labs (including ORNL) are operated by the DOE's Office of Science. These are non-weapons labs. For you conspiracy theorists out there, it
Re:I grew up in Oak Ridge (Score:2)
Re:I grew up in Oak Ridge (Score:2)
Re:classified nonsense (Score:2)
Re:World fattest people are USian GAYS ! (Score:2)
Re:cray and fast computing -- I don't think so (Score:3, Informative)
I believe the speed was due to many factors. Here are a few.
Re:cray and fast computing -- I don't think so (Score:3, Informative)
Comparing this to early Crays is a little difficult, though. For the early Crays, one advantage was vectors and the other was pipelines.
Vector processors are cool, because they tend to be much more tolerant of latency. You issue a load command, and it does loads until the vector register is full. That's equivalent to dozens of loads (and dozens of round trips to memory) on a scalar architecture. The same thing applies to the execution units. You tell the CPU ADD R1 R2 R3, and it pu
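A very loose numpy illustration of the programming-model difference; this models the vector style of issuing one operation over many elements, not actual Cray hardware, and the array size is arbitrary:

import time
import numpy as np

n = 2_000_000
a, b = np.random.rand(n), np.random.rand(n)

t0 = time.perf_counter()
# One "operation" per element: each iteration issues its own loads and add.
c_scalar = [a[i] + b[i] for i in range(n)]
t1 = time.perf_counter()
# One whole-array operation: the per-element bookkeeping is amortized away.
c_vector = a + b
t2 = time.perf_counter()

print(f"element-at-a-time loop: {t1 - t0:.2f} s, single vector op: {t2 - t1:.4f} s")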