Virginia Tech to Build Top 5 Supercomputer?
hype7 writes "ThinkSecret is running a story which might explain exactly why the dual 2GHz G5 machines have been delayed for the customers who ordered them minutes after the keynote was delivered. Apparently, Virginia Tech plans to build a G5 cluster of 1100 units. If it manages to complete the cluster before the cut-off date, it will score a Top 5 rank on the Linpack Top 500 supercomputer list. Both Apple and the university are keeping mum on the issue, but there's talk of it all over campus."
What about latency? (Score:2, Interesting)
Macs ? (Score:5, Interesting)
Wow, that'll make Apple's quarter for sure
Seriously though, why PowerMacs? I've always been under the impression that intelloid machines are the cheapest commodity hardware around for equivalent processing power, if not the most exciting. Would anybody know why PowerMac G5s are a better choice here?
(Note to computer zealots: it's not flamebait, it's a genuine question from someone who is rigorously ignorant of the Mac world. And just in case, the first sentence is a joke, too.)
I'm a Hokie (Score:1, Interesting)
Not! (Score:2, Interesting)
Re:What about latency? (Score:3, Interesting)
You win the moron of the article award. Congrats.
Re:Macs ? (Score:5, Interesting)
A couple of things make them suitable for clustering:
* There's heaps of processor-processor bandwidth and memory bandwidth.
* On board gigabit ethernet.
* Monster fast execution of properly written vector code.
* Well designed cooling.
Of course, the bang/buck ratio could be a matter of some debate, but there's little doubt that, in comparison to other commercial Unices, it's an absolute bargain.
Dave
Sounds familiar... (Score:2, Interesting)
Right after the Sony PlayStation 2 launch, there was a big shortage. Several media stories blamed it on some "unnamed" Middle East country buying them all up to power their missiles and supercomputers (because, the rumor claimed, the PS2 was just so powerful).
Wonder if Apple is trying to "pull a Sony" here...
What operating system will they be using? (Score:4, Interesting)
The article makes no mention of the operating system that will be running on this supercomputer. I for one would like to see them get this done w/ OS X rather than use GNU/Linux.
Top 5? I don't think so (Score:3, Interesting)
There is only one machine in the top 5 that this cluster could beat. The rest of the world has had 6 months to build machines too.
This should be a top 10 machine for sure. Good to see more fast machines being built every day.
Re:What? (Score:1, Interesting)
The Altivec stuff is the key, I'll bet. (Score:5, Interesting)
Re:What about latency? (Score:4, Interesting)
AltiVec (Score:3, Interesting)
What I am wondering is, what OS is this cluster going to run? I mean, have the BSD folks figured out how to scale? No chance it will be OS X...maybe AIX?
Re:Do they have a need for it? (Score:5, Interesting)
But I would bet this will not be too dissimilar in use from the HP Itanium2 cluster referenced earlier on Slashdot. I would bet one of the paramount concerns this cluster would look at is the effect of farm runoff, and probably climatology too, among other things.
Re:Macs ? (Score:1, Interesting)
The rules are extremely clear: the ranking is based on high performance parallel LINPACK in DOUBLE PRECISION.
Re:Macs ? (Score:5, Interesting)
2) You would be hard pressed to configure a dual-Opteron or dual-Xeon that trounces the G5 on speed while costing significantly less. Mac OS X Server also costs less than any version of Windows (pure capital cost here for an 1100-seat license), which may also have factored in.
3) My guess is that they have struck a fairly significant deal with Apple (perhaps even as low as Apple providing them at cost, though I doubt it's quite that low) in exchange for some degree of publicity when this thing is built.
Another brilliant idea, inspired by Apple (Score:5, Interesting)
Who cares?
APPLE G5'S DO NOT SUPPORT ECC.
The random bit error rate for 2200 DIMMs with 0.13u cells is roughly one '1' bit dropped to '0' every 9 hours. In other words: good luck getting any reliable, large-scale computation done with this cluster. (And I do mean "good luck" - they might get a run of two or three days without any problems once in a while.)
Now if only Apple would support PC3200 ECC DIMMS, which certainly do exist:
http://www.intel.com/technology/memory/ddr/vali
this cluster might be a bit more useful for real work.
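The failure-rate argument above can be made concrete with a simple Poisson model. Note that the one-flip-per-9-hours figure is the post's own back-of-envelope estimate for 2200 non-ECC DIMMs, not a measured rate, so the numbers below only illustrate the shape of the argument:

```python
import math

# Assumed figure from the post: one bit flipped per 9 hours
# across all 2200 DIMMs in the cluster (no ECC to correct it).
errors_per_hour = 1 / 9.0

def p_clean_run(hours):
    """Probability a run of `hours` sees zero flipped bits (Poisson model)."""
    return math.exp(-errors_per_hour * hours)

for hours in (9, 24, 72):
    print(f"{hours:3d} h run: P(no flipped bits) = {p_clean_run(hours):.4f}")
```

Under that assumed rate, even a 24-hour job has only a ~7% chance of finishing untouched, which is the post's point: without ECC, long computations need checkpointing or result verification to be trustworthy.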
Re:What about latency? (Score:5, Interesting)
Then, if it got popular, and they were really clever, they could sell off part of the computational power they amassed to solve other people's problems, providing funding for new versions and new supercomputing clusters.
Re:is that so? (Score:4, Interesting)
Stephen
Re:What about latency? (Score:4, Interesting)
Now, since today's supercomputers are *all* massively parallel constructions, the difference between a commercial design and an off-the-shelf cluster is in the quality and speed of the interconnects. NEC's Earth Simulator, the prime example of 'custom' supercomputer architecture, puts many processor units on *ridiculously* fast 'local' buses, and its racks are all interconnected with still_pretty_insanely_fast (and rather expensive) custom links.
Meanwhile, more 'commercial' designs use various interconnects. IIRC, NEC's 'regular' supercomputers, which formed the design basis for the Earth Simulator architecture, use Fibre Channel 'mesh' networks between racks. The Opteron - sure to be an up-and-coming player in this market - offers HyperTransport, which it looks like Cray will be stretching to its limits on Red Storm; I'm not sure *how* long an HT bus can be, but one gets the impression they'll be stretching it as far as possible, and it's certainly high throughput/low-latency versus the technologies you'd usually find in use for 'networking.'
Anyhow, point is, those designs pack a lot of CPUs together with *very* fast interconnects (equivalent to 16, 32, 64+-way SMP), and have lots and lots of racks of those. (The Opteron/Red Storm approach sounds sexy to me, because I think Hypertransport should let them pack 'lots and lots' of CPUs together versus existing designs. I've yet to read anything about what they're actually doing with it, though.)
Now.. In contrast, an 'off the shelf' cluster is usually going to stick with Ethernet, and will only have 1 to perhaps 4 processors per [node-unit-where-the-CPUs-are-connected-on-a-fast-local-bus], depending on how affordable 'cheap' multiprocessor systems are at the time. But *everyone* building supercomputers bumps up against the latency/routing problem; it's just a question of whether it's a problem for, say, 50 Earth Simulator racks (aren't there quite a few more?) vs. 1100 PowerMacs. Experimenting with 'lots of little nodes' has led us to better understand the problem, and learn how to produce tuned topologies that can compete favorably with 'purpose-built' hardware. See: http://aggregate.org/KASY0/ [aggregate.org]
Now, the question *is* one of cost-benefit. Large supercomputers tend to be built with maintenance features and power efficiency in mind. In turn, a totally 'off the shelf' cluster like KASY0 has some advantages because each machine is a cheap, practically disposable 'module' unto itself, and can doubtless be downed off the cluster, pulled out and replaced with another while being easily bench-repaired (since, after all, it's a self-contained PC, rather than a CPU blade or some other random card that would require an expensive test rack to troubleshoot). Meanwhile, if you absolutely demand low latency, you want one sort of design (Red Storm seems to be achieving it 'on the cheap,' by combining off-the-shelf - and thus cheap - chips and buses with smart 'custom-design' engineering), while if you can sacrifice some latency for throughput (jobs with few conditionals), you want another... (like 1100 G5 Macs on a shelf, wired with 'boring' gigabit ethernet, especially if Apple is giving you a bulk discount on the hardware).
So what I'm trying to say is... this is a *combination* of PR stunt and intelligent planning, and there's certainly a lot of 'good science' they could do with the beast - both in number-crunching and 'computer science' a-la cluster topologies. Whether they'll actually *use* it for such, or if it'll be solely a topology toy is anyone's guess.
I think there's some hope that it'll be the "Real Thing," though, since this would explain some of the weird rumors about FC-on-the-mainboard Macs. So they get a Real Monster, made of what will be revealed as "the new G5 Xserves" at the unveiling. The best of COTS *and* fresh d
so no G5 Xserves soon? (Score:5, Interesting)
unless there is some reason the desktops are better for this project that i did not pick up on?
as for the above question about Macs.... depending on what they want to really do with this, Altivec is really efficient for some computations. all flame wars aside there have always been people clustering Macs for certain uses. i do not know how much of it was user preference or the software they wanted to run or the simplicity of getting the cluster running.
it is supposedly VERY simple to cluster Macs. there was a story on
Re:As a VT student... (Score:5, Interesting)
Re:so no G5 Xserves soon? (Score:4, Interesting)
The heatsink is a large oblong, about 5"x4"x6", with a thin grille-like construction. It's just too big to go in the 1U Xserve. Give them some time to work on designing it to fit, though. The G5 is an ideal CPU for the Xserve, as you say.
Re:What about latency? (Score:3, Interesting)
Or how about this: your bandwidth is dependent upon the amount you contribute to the distributed processing.
Hopefully there would be some sort of minimum service level, maybe 64kbps; presumably people dropping tens of thousands expect at least a modicum of return on their investment. People who didn't want to install the client could trudge along at those speeds.
Eventually there would be a market system, whereby people would trade their completed blocks for other commodities, like food vouchers, prints, copies, cash, and sexual favours.
Good luck and godspeed,
Branch
1100 machines is ~1.1% of the placed orders (Score:2, Interesting)
According to Apple, there were "over 100,000" pre-orders for the G5. Now this includes single processor models, but the university's alleged order of 1100 machines is not going to make a big impact on everyone else.
Besides, the real reason that Apple's machines are late is case defects and AGP problems, amongst other issues that Apple has not been forthright about. At the keynote an honest Apple employee told me the machines wouldn't ship until October as there were many little problems and I should wait for the January refresh so I don't get a flaky machine.
And one has to wonder why anyone building a cluster would build it using desktop machines and not use the forthcoming G5 rackmount machines from Apple and IBM... which is supposed to include a quad-processor from IBM.
SGI Origin 3000, 1024 processors... (Score:5, Interesting)
It's hard, but not too hard or impossible. The Silicon Graphics Origin 3000 supports 512 processors in a single image system with the stock IRIX kernel and 1024 processors with the "XXL" kernel.
Rumor has it Origin 4000 will support 2048 processors, as will Altix once SGI has done some major work with their kernel patches. (Altix is currently limited to 64 processors per system image).
Re:Not fast enough (Score:1, Interesting)
Re:What about latency? (Score:1, Interesting)
Yes and no. The degree to which a cluster suffers from node-to-node latency depends on the inter-node interconnect used. However, the choice of interconnect cannot remove inter-node latency. When Node X needs to work on data set Y, the cluster controller has to send data set Y to Node X over the interconnect. Data has to be copied into Node X's memory (or onto Node X's disk, if the data set is large). In a traditional shared-memory supercomputer, by contrast, every processor can already address that data directly, so there is no copy to pay for. So while you can lower the node-to-node latency in a cluster, you cannot eliminate it.
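The copying cost described above can be sketched with the usual latency-plus-bandwidth ("alpha-beta") model: time = latency + bytes/bandwidth. The link figures below are rough illustrative assumptions (commodity gigabit ethernet vs. a hypothetical custom fabric), not measurements of any real cluster:

```python
# Alpha-beta model of shipping a data set to a node over an interconnect.
# Both sets of link numbers are assumptions for illustration only.

def transfer_time(nbytes, latency_s, bandwidth_bps):
    """Seconds to move `nbytes` over a link with the given latency and bandwidth (bytes/s)."""
    return latency_s + nbytes / bandwidth_bps

gig_e  = dict(latency_s=100e-6, bandwidth_bps=125e6)  # ~100 us, ~1 Gbit/s ethernet
custom = dict(latency_s=5e-6,   bandwidth_bps=1.6e9)  # hypothetical fast fabric

for size in (1_000, 1_000_000, 100_000_000):          # 1 KB, 1 MB, 100 MB
    t_e = transfer_time(size, **gig_e)
    t_c = transfer_time(size, **custom)
    print(f"{size:>11} B: gigE {t_e:.6f} s   custom {t_c:.6f} s")
```

Small messages are latency-dominated and big ones bandwidth-dominated, which is why fine-grained, chatty workloads punish ethernet clusters far more than large, infrequent transfers do.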
Isn't it premature to draw conclusions until we better understand how Apple and Virginia Tech plan to architect this new supercomputer cluster?
Depends on the conclusion.
If the data set can be parallelized that easily, then they probably wouldn't use G5's, because the problem wouldn't benefit from vectorization as much as we're all assuming theirs does.
Re:1100 machines is ~1.1% of the placed orders (Score:2, Interesting)
Now what may be true is that because these machines are going into a cluster, they don't care about cosmetic problems with a patched case door, and they don't care if AGP 8X doesn't work right for high-performance 3D cards. So Apple could be dumping 1100 "good for clusters" machines on the university while waiting for an inline revision of the first batch of G5's. That makes more sense.
And you know... I'm quite glad I took an honest Apple employee's advice and don't have to worry about this stuff. Come January (February is what I put on the calendar), I'll have a G5 with some of the bugs worked out, and for less money.
Re:Do they have a need for it? (Score:3, Interesting)
Is there something particularly about building any clusters today that is ill advised? Anything specifically about a cluster built with these parts? Why do any science that involves a large expense when the money could be applied to "lowering tuition"? Maybe because an important part of the mission of some universities is to advance the state of knowledge by performing research that would not be done by other segments of society.
Re:Macs ? (Score:3, Interesting)
Re:Macs ? (Score:2, Interesting)
I am one of the designers of KLAT2 [aggregate.org] and KASY0 [aggregate.org], and the guy who ran the Linpack benchmarks on both. Over 3 years ago, when we submitted our results for KLAT2 to the top500 list, there was no public indication that 64-bit floating point was required. It took them a while, but the top500 website now has a FAQ indicating that "full precision" [netlib.org] is required, and they interpret that as 64-bit for most machines. FYI, 32-bit FLOPs are useful in many situations, and machines that used 32-bit FLOPs had been on the top500 list. You might take a look at our KASY0 FAQ on GFLOPS [aggregate.org]. As a means to rank the top500, I think it is quite legitimate to require 64-bit FLOPs, but that doesn't make it "illegal" to use 32-bit Linpack FLOPs for other comparisons.
As for the G5, it won't need AltiVec to get good Linpack numbers, thanks to the fused multiply-add capability in its dual floating-point pipes. That's 4 FLOPs per clock peak! I hope VT was able to get Apple to leave out, and not charge for, the components not needed in a cluster node. The PCI-X slots in the G5 should allow VT to better use a high-speed cluster network technology. Commodity x86 boxes tend to have only 32-bit 33MHz PCI, limiting the usable link bandwidth between nodes to under a gigabit per second. For 64-bit Linpack GFLOPS per dollar, a cluster of G5s could be competitive. I look forward to seeing their results, and any similar work using the upcoming Athlon 64.
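Taking the rumored configuration at face value (4 FLOPs per clock per CPU, dual 2GHz processors, 1100 nodes, all assumed from the story rather than confirmed), the theoretical peak works out as below; a real Linpack run would achieve only some fraction of this:

```python
# Theoretical peak for the rumored cluster, using the post's own figures:
# each PowerPC 970 has two FPU pipes, each retiring one fused
# multiply-add (2 FLOPs) per clock, so 4 FLOPs/clock per CPU.
flops_per_clock = 2 * 2     # 2 pipes x 2 FLOPs per FMA
clock_hz        = 2.0e9     # 2 GHz G5 (rumored)
cpus_per_node   = 2         # dual-processor PowerMac
nodes           = 1100      # rumored cluster size

peak_flops = flops_per_clock * clock_hz * cpus_per_node * nodes
print(f"Theoretical peak: {peak_flops / 1e12:.1f} TFLOPS")  # 17.6 TFLOPS
```

Linpack efficiency on gigabit-ethernet clusters is typically well under 100% of peak, so the sustained number that would actually rank on the top500 list would be noticeably lower.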
Re:Macs ? (Score:2, Interesting)
So what? Did he say it was exclusive to PowerMacs? And Apple is a member of the HyperTransport consortium.
* On board gigabit ethernet.
Huh? Such things exist in the x86 world as well.
But they usually aren't standard on the motherboard.
AltiVec cannot be used; it can only perform 32 bit floating point calculations which is not legal for the Linpack benchmark used at the top500 site.
Wonderful. Once you've gotten your "top 500" rating you can turn it back on and start spanking other clusters. Next?
The Apple G5 has far more fans than your typical x86 box
Which was done to reduce the amount of noise, not because the 970 puts out enormous amounts of heat.
Oh yeah -- they can't strike out the cost of buying a copy of MacOS for each machine, can they?
Hmm, let's think about this for a second. They're buying over a thousand brand new machines and giving Apple some great PR. Do you really think that Apple wouldn't come down on the price of the included operating system?
Any more jerky questions?