Virginia Tech to Build Top 5 Supercomputer?
hype7 writes "Think Secret is running a story which might explain exactly why the dual 2GHz G5 machines have been delayed for the customers who ordered them minutes after the keynote was delivered. Apparently, Virginia Tech plans to build a G5 cluster of 1100 units. If it manages to complete the cluster before the cut-off date, it will score a Top 5 rank on the Linpack-based Top 500 supercomputer list. Both Apple and the university are playing mum on the issue, but there's talk of it all over the campus."
As a VT student... (Score:5, Informative)
Funny, I haven't heard anything about it prior to today. Guess I'm just out of the loop then...
Re:What? (Score:5, Informative)
I take it you don't look at Think Secret on a regular basis. It is easily the most accurate Mac rumor site out there. In fact, they have posted info on numerous occasions that caught the attention of Apple's lawyers, and they have been forced to pull it down and issue their standard disclaimer. Say what you will about other rumor sites (most of them simply feed off each other), but there are some startlingly reliable sources informing Think Secret. Frankly, I don't recall the last time they were wrong about anything they've posted.
Actually they're not that mum (Score:5, Informative)
Virginia Tech is in the process of building a Terascale Computing Cluster which will be housed in the Andrews Information Systems Building (AISB). For those who are interested in learning more about this project, we will host an information session on Thursday, September 4th from 11 a.m. to noon in the Donaldson Brown Hotel and Conference Center auditorium.
We look forward to seeing you there.
Terry Herdman, Director of Research Computing
I'll try to remember to take notes on this and let you all know if there's anything interesting...
Not fast enough (Score:3, Informative)
Re:Macs ? (Score:5, Informative)
I can see it making a lot of sense. NASA and lots of bio companies use the G4s this way.
Memory (Score:3, Informative)
Rus
I get to help build it =D (Score:2, Informative)
Orientation today was postponed, however, so I won't have more details until Wednesday =/ I'm looking forward to helping out, though.
Re:Do they have a need for it? (Score:3, Informative)
Re:Do they have a need for it? (Score:4, Informative)
Re:What about latency? (Score:5, Informative)
I suggest you look at the list of the top supercomputers [top500.org] in the world. Most are clusters, i.e., separate, distinct machines (a quick glance shows the top 25 all are). It's just too darn hard to make a shared-memory computer with thousands of processors, so the common architecture is a cluster of smaller shared-memory machines.
Besides, most clusters utilize special interconnects like Myrinet that offer low-latency connections. They're more expensive than Ethernet, but it's a supercomputer, so you spend the money.
>> All this "the internet is one giant distributed computer" doesn't acknowledge this.
On the contrary, people know this very well. That's why rendering and SETI processing work as distributed jobs: they don't really need to communicate with other nodes often.
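To make the distinction concrete, here's a minimal sketch of my own (assuming an MPI implementation such as MPICH is available) of how a cluster job works: every node computes on its own private memory, and the only inter-node traffic is an explicit message combining the results.

```c
/* Minimal cluster-style job: no shared memory, one explicit
 * communication step. Compile with mpicc, run with mpirun. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    long i, n = 1000000, local = 0, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* which machine am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* how many machines?  */

    /* Each node independently sums its own slice of 1..n. */
    for (i = rank + 1; i <= n; i += size)
        local += i;

    /* The only inter-node traffic: combine the partial sums. */
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %ld\n", total);
    MPI_Finalize();
    return 0;
}
```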
Re:What about latency? (Score:5, Informative)
For example, you could use Myrinet to get 2-gigabit, very-low-latency connectivity, or Quadrics, or InfiniBand, or just well-laid-out Gigabit Ethernet with high-end switches.
With multiple processors in a box, the processors have to fight for the resources that box has to offer. NUMA alleviates the demand on memory, but I/O operations (writing to disk or to the network) in a multiprocessor box block a good deal as the processor count in a node rises.
The idea with clusters is that inter-node communication can in most cases be kept low. Each system works on a HUGE chunk of a problem on its own, with its own dedicated hard drive and memory subsystem, and without much competition for the network card. A lot of problems are really hard to solve computation-wise but are *very* well suited to distributed computing. A prime example is rendering 3D movies. Perhaps oversimplifying: a central node divides the work into discrete parts (say, a segment of video), and each node works without talking to the others until it's done, so the negative impact is minimal.

Certain problems (e.g. nuclear explosion simulations, where chunks of time and space interact with one another) are much more sensitive to latency/throughput. Seti@Home and distributed.net are *extremely* insensitive to throughput/latency issues (not much traffic and very infrequent communication).
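As a sketch of that render-farm pattern (entirely illustrative: the frame count, tags, and the "render" step are stand-ins of mine, not anyone's real pipeline), a master node hands out independent chunks and the workers never talk to each other:

```c
/* Toy master/worker frame distribution over MPI. */
#include <mpi.h>

#define NFRAMES  100   /* made-up workload size */
#define TAG_WORK 1
#define TAG_DONE 2

int main(int argc, char **argv)
{
    int rank, size, frame, next = 0, done = 0;
    MPI_Status st;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                      /* master: deal out frames */
        while (done < size - 1) {
            MPI_Recv(&frame, 1, MPI_INT, MPI_ANY_SOURCE,
                     MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            frame = (next < NFRAMES) ? next++ : -1;   /* -1 = stop */
            if (frame < 0)
                done++;
            MPI_Send(&frame, 1, MPI_INT, st.MPI_SOURCE,
                     TAG_WORK, MPI_COMM_WORLD);
        }
    } else {                              /* worker: render frames */
        frame = -2;                       /* first send is just a hello */
        for (;;) {
            MPI_Send(&frame, 1, MPI_INT, 0, TAG_DONE, MPI_COMM_WORLD);
            MPI_Recv(&frame, 1, MPI_INT, 0, TAG_WORK,
                     MPI_COMM_WORLD, &st);
            if (frame < 0)                /* no more work */
                break;
            /* ...render the frame here; no inter-worker traffic... */
        }
    }
    MPI_Finalize();
    return 0;
}
```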
Re:AltiVec (Score:5, Informative)
Real-world numbers don't bear this out. Check the Photoshop and other application performance numbers. The gcc version used in Apple's SPEC benchmark runs didn't even take advantage of AltiVec. Once that's accounted for (and any institution making such a purchase would definitely have considered it), the AltiVec-enabled PowerPC chips totally spank x86 and others in number-crunching tasks.
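For the curious, a minimal sketch of the sort of hand-tuned code this refers to, using the AltiVec C intrinsics. The function is a toy example of mine (compile with gcc -maltivec; it assumes 16-byte-aligned arrays and n a multiple of 4):

```c
/* y[i] = a*x[i] + y[i] -- four single-precision FMAs per iteration. */
#include <altivec.h>

void saxpy_altivec(int n, float a, const float *x, float *y)
{
    vector float va = (vector float){a, a, a, a};  /* splat a */
    int i;
    for (i = 0; i < n; i += 4) {
        vector float vx = vec_ld(0, &x[i]);   /* load 4 floats      */
        vector float vy = vec_ld(0, &y[i]);
        vy = vec_madd(va, vx, vy);            /* a*x + y, 4 at once */
        vec_st(vy, 0, &y[i]);                 /* store 4 floats     */
    }
}
```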
What I am wondering is, what OS is this cluster going to run? I mean, have the BSD folks figured out how to scale? No chance it will be OS X...maybe AIX?
An OS doesn't need to 'scale' to be a member of a cluster. It just needs to run the code locally and send the result back to the cluster master node.
Re:Not fast enough (Score:1, Informative)
American Supremacy in Supercomputers (Score:1, Informative)
American fears are unfounded. Numerous universities like Virginia Tech have trained a generation of American (not foreign) students in building the finest supercomputers. MIT, Carnegie Mellon University (CMU), and Virginia Tech (to name just a few) have launched large-scale research projects staffed by top American graduate students. Their work became the foundation of several generations of multiprocessors.
By contrast, very few (if any) Japanese universities conduct large-scale research projects to build high-performance supercomputers. The Japanese government has tended to avoid funding this kind of research. Worse, there is little collaboration between industry and academia in Japan. Yet precisely this kind of collaboration is needed for such large-scale projects: e.g., Virginia Tech is enlisting the help of Apple Computer.
American companies led by scientists trained at MIT and CMU could easily design a computer that outperforms the Earth Simulator. These companies have simply chosen not to do so because there is far more profit to be garnered in building commercial supercomputers geared for database transactions. In fact, the highest-performance commercial supercomputers nearly all come from the United States of America (IBM).
The 21st century remains Pax Americana, not Pax Asia. The hordes of immigrants trying to get the hell out of Asia and into the USA underscore this fact.
Re:That's just Hokie (Score:3, Informative)
I'm at Duke...let's just say I'm "in" on a lot of computing stuff...and I don't know of any supercomputer on campus of any significant magnitude. There are a couple of clusters....
Maybe you were just making a joke....I had no idea. :)
Re:Not fast enough (Score:5, Informative)
Think Secret's Record (Score:5, Informative)
Bottom line? Like any other news organization, Think Secret has occasional misses. But those misses don't appear to include any of the items mentioned here. I think our record speaks for itself.
Nick dePlume
Publisher and Editor in Chief
Think Secret
Talk about a ton of desktops in a server room (Score:5, Informative)
The stated objective was to be on the next Top 500 list. Dell and HP were considered, but they couldn't fill the order in time (possibly because they have both announced other large clusters recently). After someone leaked the story of the cluster meetings with Dell and HP to Apple, Apple jumped at the chance and promised delivery.
Basically, the story is not a rumor from the point of view of the geeks on campus who have been affected by the preparations. I'll probably post the
I'm disappointed about this being only on the Apple section of
Re:As a VT student... (Score:5, Informative)
Re:Do they have a need for it? (Score:5, Informative)
Re:Macs ? (Score:5, Informative)
The G5's floating-point hardware is the most advanced to be found right now, either in standard double precision or vector double precision.
(FYI: yes, this cluster exists, or will exist. Unfortunately, I believe they will be using MPICH, which might put a dent in their numbers.)
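Some back-of-the-envelope arithmetic of my own, assuming each 2 GHz PowerPC 970 has two FPUs and each can retire one fused multiply-add (2 flops) per cycle:

```latex
\[
R_{\mathrm{peak}}
  = \underbrace{1100}_{\mathrm{nodes}}
  \times \underbrace{2}_{\mathrm{CPUs/node}}
  \times \underbrace{4}_{\mathrm{flops/cycle}}
  \times 2\times10^{9}\,\mathrm{cycles/s}
  = 17.6\ \mathrm{TFlop/s}
\]
```

The measured Linpack number (Rmax) would come in well below that theoretical peak, which is exactly where the choice of MPI library could put a dent in the results.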
Re:Macs ? (Score:4, Informative)
The dual floating-point units on the G5 will help, but that's nothing extraordinary: P4s and Athlons both have multiple floating-point units (the P4's are relatively orthogonal, the Athlon's less so). More importantly, SSE2 allows for vectorized double-precision operations. It is likely that for the Linpack benchmark, best-in-class P4- or Athlon-based machines would outperform best-in-class G5 machines.
AltiVec is extremely powerful, but it is only useful for applications that don't require double-precision floating point. SSE2 is less powerful, but it allows for double-precision SIMD processing.
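A matching sketch of what "double-precision SIMD" means in practice, using the SSE2 C intrinsics. Again a toy function of mine (compile with gcc -msse2; assumes 16-byte-aligned arrays and n a multiple of 2):

```c
/* y[i] = a*x[i] + y[i] -- two doubles per instruction, something
 * AltiVec cannot do at all. */
#include <emmintrin.h>

void daxpy_sse2(int n, double a, const double *x, double *y)
{
    __m128d va = _mm_set1_pd(a);   /* broadcast a into both lanes */
    int i;
    for (i = 0; i < n; i += 2) {
        __m128d vx = _mm_load_pd(&x[i]);          /* load 2 doubles  */
        __m128d vy = _mm_load_pd(&y[i]);
        vy = _mm_add_pd(_mm_mul_pd(va, vx), vy);  /* 2 muls + 2 adds */
        _mm_store_pd(&y[i], vy);                  /* store 2 doubles */
    }
}
```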
Re:AltiVec (Score:5, Informative)
Re:Talk about a ton of desktops in a server room (Score:4, Informative)
Do the math: the new generator is rated at 600 kVA and is already carrying several hundred machines (including a very power-hungry Sun E10K and a number of E6K-class machines). There's not enough capacity on that generator for 1,100 more systems.
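Rough numbers (the per-box draw is my assumption, not a measured figure): if a dual-2GHz G5 tower pulls something like 400 W under load, then

```latex
\[
1100 \times 400\,\mathrm{W} \approx 440\,\mathrm{kW}
\]
```

which by itself eats most of a 600 kVA generator before you count the existing Suns, the network gear, or any cooling.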
(And I just wanted to say "This is All Kevin's Fault" - except for a few unrelated parts we blame on Randy
G5 Vs. Itanium2 and Opteron: Some perspectives (Score:5, Informative)
Re:What about latency? (Score:2, Informative)
Seymour Cray once said that a supercomputer is a device for turning compute-bound problems into I/O-bound problems. In other words, if your job is I/O bound, great! That means you're making the best possible use of your compute resources.
In the real world, most jobs require far more compute resources than they do I/O resources. So scaling to a thousand processors or more makes sense, because we can already scale I/O up to gigabytes per second, either disk or network, very easily.
The idea with clusters is that inter-node communication in most cases can be kept low.
Alas, the idea with supercomputers is that inter-node communication cannot be kept low. Consider terabyte-scale data-set visualization, for example. There's simply no way to do that job without distributing a copy of the entire data set to every node. That makes it a really bad job for a cluster, but a perfect job for a supercomputer.
Do not be fooled into thinking that clusters are superior to supercomputers, or vice versa. There are tasks that clusters can do cost effectively that supercomputers cannot do as cost effectively. However, there are tasks that supercomputers can do that clusters simply cannot do, no matter the cost. So a one-to-one comparison between the two is inevitably going to be incomplete and misleading.
Why G5's? I helped set up the racks (Score:3, Informative)
They are having someone write InfiniBand drivers for OS X just for this cluster.
I look forward to helping install 4 GB of RAM plus the InfiniBand cards in each of these bad boys.
It is great having connections!
Re:a Hokie alumn speaks (Score:2, Informative)
Re:Macs ? (Score:3, Informative)
Re:Macs ? (Score:1, Informative)
Speaking as somebody who has written a lot of both AltiVec and SSE code, it is a bit more complex than that. Even the Intel compiler doesn't generate SIMD code in most circumstances. However, the SSE/SSE2 instruction sets include instructions that are essentially normal floating-point operations operating only on the first element of the XMMn registers.
The reason why Intel tells you to use SSE instead of x87 is that they have focused on optimizing the SSE unit and only provide the x87 registers for compatibility. This way all code will be fast (even if it is "normal" FP code and only uses the first element), and in the very few cases where the compiler or programmer can optimize and use all elements it will be very fast, without having to move stuff between the normal FP registers and the XMM ones.
Go ahead and compile with -S to see the assembly output - it's quite interesting.
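If you want to try it, here's a trivial function of mine to experiment on; the flags assume GCC on x86:

```c
/* Compile two ways and diff the assembly:
 *   gcc -O2 -S fp.c                       -> x87 code (fmul/fadd)
 *   gcc -O2 -S -msse2 -mfpmath=sse fp.c   -> scalar SSE2 (mulsd/addsd,
 *                                            first XMM element only) */
double axpb(double a, double x, double b)
{
    return a * x + b;   /* one multiply, one add: easy to spot */
}
```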
The fact that AltiVec actually scales pretty well for many tasks doesn't prove that the G4 is "unbalanced"; it shows that SSE2 isn't particularly good.
As a theoretical pissing contest about the cleanest vector implementation, I agree that AltiVec is much better. The problem is that most problems aren't perfectly vectorizable, and as soon as you need to access unevenly spaced (not merely unaligned) data, AltiVec sucks.
SSE, on the other hand, is an ugly hack, just like a lot of other things from Intel. But as with most of the other ugly things from Intel, in practice the performance often wins over a theoretically nicer architecture. For general algorithms, SSE can often be better than AltiVec. Intel isn't stupid; they've got some of the best engineers in the world.
It isn't a religion: OS X and Apple aren't bad just because SSE can be pretty good in practice. However, I do agree with the statement that the G4 is a bit "unbalanced": it is very fast for the small set of programs that have been manually tuned with AltiVec, but very slow (especially on double-precision FP) on the huge majority of compiled code.
In 20 years I think we're all going to be using the kind of explicitly parallel instructions present in IA-64. It is horribly difficult to program, but it lets you get much closer to the theoretical peak, and you can increase performance by adding more integer/FP units instead of only raising the frequency. (Normal compilers can't really schedule more than two FP units.)
Again: don't underestimate Intel. They've got WAY too much money, patents, and prestige invested in IA-64 to let it fail. With Madison they have just shown they can produce the fastest CPU in the world; all that remains now is getting the price down, and that's economies of scale (read: a matter of time).
Comment removed (Score:4, Informative)
Re:AMD x86/64 (Score:2, Informative)
Comment removed (Score:3, Informative)