Cray CTO Says Cray Computers Are Great
Jan Stafford writes "Linux clusters cannot offer the same price-performance as supercomputers, according to Paul Terry, chief technology officer of Burnaby, British Columbia-based Cray Canada. In this interview, Terry explains that assertion and describes Cray's new Linux-based XD1 system, which will be priced competitively with other types of high-end Linux clusters."
Theyyyyyy'rrrrrre Great! (Score:3, Funny)
I wonder how Cray computers are in milk...
Imagine... (Score:5, Funny)
The issues are progress and long-term usefulness (Score:5, Informative)
Given the difference in rate-of-evolution in the two camps, it can't be long before PC clusters, probably running Linux with PVM or BSP (that's bulk-synchronous parallel rather than the 3D-graphics kind), catch up with the dedicated supercomputers.
It's all very well to mock the I/O of PCI, but that's why we're all imminently moving to PCI Express, at a rather more respectable (current) maximum of 8+ GB/s rather than PCI's 133 MB/s... Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...
I notice he hasn't quoted the data-transfer rate on these new super-duper chips. The whole article does rather look like a piece of advertising on the cheap. Speaking of which, the cluster solution is (relatively) CHEAP. Did I mention that IT'S CHEAP...
Simon.
Re:The issues are progress and long-term usefulnes (Score:5, Interesting)
For some applications, a cluster of slow PCs is OK. But if you want to do real time-intensive computation, you really can't beat a good internal bus.
Re:The issues are progress and long-term usefulnes (Score:5, Informative)
Re:The issues are progress and long-term usefulnes (Score:2)
Re:The issues are progress and long-term usefulnes (Score:4, Informative)
Re:agreed (Score:3, Interesting)
A NUMA machine is just a cluster where the wire is in the form of a bus rather than copper or fibre cabling. The communications protocol for the bus may be better optimized for "supercomputing". However, you can do the same thing with an MPP-optimized network protocol.
It's all ultimately just wires and protocols.
The total lack of process migration between nodes in a cluster might actually give clusters an edge over some NUMA implementations.
Watching a single process dance around
Re:The issues are progress and long-term usefulnes (Score:5, Informative)
Re:The issues are progress and long-term usefulnes (Score:5, Informative)
It's not just the speed of the data transfer, it's also the latency of the interconnect. A lot of scientific codes will pass around a lot of little messages, and GigE is fast for bulk transfer, but it's not so good for that. That's why there are companies like Quadrics, Myricom, etc... Infiniband should fix this, but you'll want a big infiniband switch.
His point is that building fast machines is hard, and the fastest machines are really hard. Too many folks think all you have to do is throw enough PCs and GigE NICs at the problem. You can build a machine that way, but the codes don't scale well. Some scientific code will in fact quickly show negative scaling (where the more processes you add, the *slower* your code will run). MPI codes do that all the time, which is one of the reasons you'll see people running their code at sizes smaller than the whole machine, and at different sizes on different machines.
Yeah, you can build a Linux based world-class supercomputer as a cluster, but you better be willing to sweat the details is all. Or buy a Cray, I guess. ;-)
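For the curious, the usual way to see this is the simple alpha-beta cost model: the time to send an n-byte message is t(n) = alpha + n/beta, where alpha is per-message latency and beta is bandwidth. A quick sketch in Python (the latency and bandwidth numbers below are illustrative assumptions, not measured GigE or Cray specs):

    # Alpha-beta model: time to move one n-byte message.
    def transfer_time(n_bytes, alpha_s, beta_bytes_per_s):
        return alpha_s + n_bytes / beta_bytes_per_s

    # Achieved bytes/s once per-message latency is included.
    def effective_bandwidth(n_bytes, alpha_s, beta_bytes_per_s):
        return n_bytes / transfer_time(n_bytes, alpha_s, beta_bytes_per_s)

    gige = (50e-6, 125e6)   # assume ~50 us latency, ~125 MB/s peak
    hpc  = (2e-6,  2e9)     # assume ~2 us latency, ~2 GB/s link

    for size in (64, 1024, 1_000_000):
        print(size,
              round(effective_bandwidth(size, *gige) / 1e6, 2),
              round(effective_bandwidth(size, *hpc) / 1e6, 2))

A 64-byte message on the GigE numbers gets barely 1% of peak bandwidth; the low-latency link moves the same message about 25x faster. That's the whole Quadrics/Myricom/Infiniband business case in three lines of arithmetic.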
Re:The issues are progress and long-term usefulnes (Score:5, Insightful)
But even then, there are legitimate needs for supercomputers. A traditional PC-based server solution will address probably 99% of all problems. An inexpensive cluster will get you 99.9%. But there's that remaining 0.1%, and that's the target audience for whom Cray and similar companies exist.
The fact that PCs can be used almost unmodified to create supercomputers and high-speed clusters is remarkable, and says tremendously good things about the flexibility and power of the architecture as a whole. But there are just places it can't go, not yet. For example, you know how you never get 99% efficiency with 100 megabit ethernet? You're lucky to get 70% with gigabit, and 50% is a pretty common figure. PCI Express, at least at the speeds we're talking about here, is so rare now that it's hardly cheaper than custom supercomputer-style solutions - effectively because it is a custom supercomputer-style solution. I don't think we'll ever see common systems, even midrange servers, with more than one x16 PCI Express slot.
I really think this is what Cray means here. Not that Linux-based clusters have no use, but that there is still a significant market for which they are suboptimal. And, in all probability, will always remain suboptimal. However fast PCs get, however popular PCI Express and similar high-speed interconnects become, supercomputers will just get faster to match... and computational problems will get harder to go along with them. I just don't see the need for supercomputers, at some level, ever going away.
(I hope people find my comment useful in some way. I elected to post it rather than mod down the idiot posting flamebait about Macs in reply to you. And here's hoping people don't interpret this as karma whoring, since usually if you say "This will get modded down" it doesn't. But... oh, hell. I don't even know which Slashdot rule of thumb applies to my post at this point.)
Re:The issues are progress and long-term usefulnes (Score:3, Informative)
Re:The issues are progress and long-term usefulnes (Score:5, Insightful)
The coolest thing about this IMHO is that Cray are using Linux for their single image systems.
Yep, the performance of computers is always on the increase, but there will always be demand for more compute; the question is where you want to be on the performance curve, not the absolute performance. People solve increasingly difficult problems in increasing detail, and there looks to be no slowdown. They buy what suits their budget and solve as rigorously as they can on their hardware, and as hardware improves they redefine the types of problem they want to solve.
Yup, clusters are cheap and they're on the Top500, but nobody actually buys a supercomputer to run LINPACK. They use them to solve real problems; the list is just for bragging rights.
Re:The issues are progress and long-term usefulnes (Score:3, Funny)
Or for that matter, a warezed copy of Unicos....
Latency (Score:3, Informative)
The main reason for supercomputers to exist is not the high bandwidth, it's the latency of the switch. The network hardware that is used in clusters as the interconnect medium (switch) can provide very high bandwidth, but the latency is far worse.
Re:Latency (Score:4, Insightful)
(BTW, I get 100 us ping time on my GigE network, but you're right that that's still 100x too slow for HPC.)
Re:The issues are progress and long-term usefulnes (Score:4, Interesting)
http://www.sec.gov/Archives/edgar/data/949158/000
Here they discuss the limitations of clusters and vector-based supercomputing.
Basically, they offer three types of supercomputers aimed at different markets: vector, massively parallel, and multithreaded. Not really sure what multithreaded means in this context (a microkernel capable of threading itself across many processors, i.e. UNICOS/mk?), but they do a decent job of explaining the whole thing:
Re:The issues are progress and long-term usefulnes (Score:3, Informative)
Unless I'm now out of date, the last figures I saw said the CrayLink Interconnect can do 102 GB/sec. That's just a tad more, don't you think? No messing with masses of gig ethernet to cross-connect them. It's just done.
NO WAY! (Score:5, Funny)
Next you'll tell me the CEO of SCO thinks the lawsuit is completely valid and fair!
Re:NO WAY! (Score:3, Funny)
Re:NO WAY! (Score:5, Insightful)
How about... (Score:3, Funny)
Linux vs. linux (Score:5, Funny)
The difference (Score:4, Insightful)
Re:The difference (Score:2)
Re:The difference (Score:3, Informative)
Hm, I haven't played with Infiniband, but I have access to a small Myrinet cluster, and it takes a hell of a lot of effort to write your application in such a way as to overcome the big disparity between CPU power and network throughput and get a decent speed-up.
Paul Terry is right - if they remove the PCI bottleneck it will be much easier to write scalable high-performance applications, and then the costs will decrease.
editor training (Score:3, Interesting)
Oh, by the way, everyone who has a Slashdot account should go to their preferences and set the "light" layout. You won't suffer with the bad color schemes anymore, and the results are more printer-friendly too.
Re:editor training (Score:3, Funny)
Well, if I make a particularly witty comment, of course I'd like to frame it and hang it on the wall behind my desk...
Slashdot Poster Says Comment Is Funny (Score:2, Funny)
A better angle would have been... (Score:4, Funny)
He's right!! (Score:2)
He's completely right, just not in the way he intended. You'd have a hard time making the cluster as expensive as the supercomputer....
Re:He's right!! (Score:2)
Let me see, we'll take a quarter mill and use that to purchase the switches and cabling needed to interconnect everything. Might have to spend a bit to upgrade the power to our facilities, and speaking of facilities, we will probably need a warehouse some place to keep all the systems we are
Re:He's right!! (Score:3, Insightful)
No, he's right in the way he intended.
He just leaves out a lot of information. The business environment determines what is or is not expensive. The computational environment determines what will or will not run fast; together, the two determine how expensive something really is.
If you are crunching a big continuous stream of numbers with multiple small results which are then loo
Dupe! (Score:5, Informative)
Re:Dupe! (Score:5, Funny)
Maybe "APPLE" will buy another Cray! (Score:2, Interesting)
However, it spawned a popular story about how "Cray designs on Apple and Apple designs on Cray" (see link.) [tafkac.org]
And now for the REST of the story:
Did you know that Macintoshes are designed on PCs!? That's right--PCs running WINDOWS. You see, nobody makes software to burn EPROMs or design printed circuit boards that runs on MacOS, so the hardware group has a bunch of Windows PCs!
So now you know the *rest* of the story.
Re:Maybe "APPLE" will buy another Cray! (Score:2)
Re:Maybe "APPLE" will buy another Cray! (Score:5, Interesting)
Apple was trying to design a new CPU. It would have had vector processing capabilities not all that different from the Cray's, so they bought the Cray both to do circuit simulations on the chip and as a model for their own design.
The chip was going to be a 100 MHz chip (an astonishing speed for the time) with a four-pipeline vector processing unit.
They considered hiring us (but eventually declined) to develop some kind of 3D desktop for the Mac. The idea was that this would distinguish the Mac further from other computing systems, and competitors wouldn't be able to emulate the interface because they didn't have the horsepower.
Anyway, that's the Apple-Cray story as I understand it. I'm sure that there is a lot more to the story than I know, of course.
Thad Beier
Re:-1 Informative (Score:2)
Who cares? This is /.! Nobody reads the article, and I got modded up instantly! All it takes is a few lines of text with a few links [jerkcity.com] in it. Why bother doing any more?
heh (Score:2)
(Not a verbatim quote.)
Or for an alternative press release (Score:2)
You could look to SGI. Their Altix range goes up to 1024 Itanium 2 processors in a single supercomputer, and they are putting twenty 512-processor nodes together in a cluster of Linux supercomputers for NASA [sgi.com], while also working on doubling the maximum single-machine CPU count to 2048.
Clusters don't scale, huh? (Score:3, Informative)
Has this guy ever heard of Google? I can see his point to an extent; in fact his whole Q&A session/blatant advert really boiled down to a single point: if you need to move a lot of data between processors, then a cluster will fare worse than one of Cray's supercomputers, which have (obviously) more bandwidth between the CPUs and shared memory. It really does depend on the application, but for him to suggest an HPC is always a more economical, or even better, option than a cluster of cheap x86 boxes is demonstrably false...
Re:Clusters don't scale, huh? (Score:5, Insightful)
It would be if he'd said it, so it's a good thing he didn't. He even commented that there are applications (embarrassingly parallel algorithms) that clusters do very well at. And Google is a perfect example of that.
Geez (Score:5, Informative)
Cray is a great company, but I really hate that they have to come out with things like this every now and then. Most people in need of a lot of computing power already know the difference between your products and linux clusters and really, they're going to choose whichever's most appropriate for their problem regardless of what your CTO says.
Re:Geez (Score:5, Informative)
Indeed. He actually made that point himself: "There are some applications where a well-designed Linux cluster can deliver good price/performance on a particular application; those embarrassingly parallel applications where processors spend little time exchanging data."
Correction (Score:3, Funny)
Actually, I think he said that "Cray computers rock, eh?" or perhaps it was "Cray computers kick ass, eh?" or something like that.
- Leo
Not quite so simple really is it? (Score:5, Informative)
For a 12-CPU Opteron unit the academic pricing (admittedly lower than commercial, but where most of their sales will go) is about $45K. That's not too shabby. Before you bounce up and down and say "I can build four times the cluster for that price," it should be noted that the XD1 gives you a single system image, which simplifies programming and makes shared-memory applications practical (increasingly important for areas such as bioinformatics).
We have a cluster with Dolphin's Wulfkit, and using distributed shared memory slows us down. It's not an end-of-the-world kind of slowdown, but it's a factor. Our cluster is sixteen nodes of dual 2.2GHz Xeons with Wulfkit 3D torus interconnects. It cost us, at academic prices, $50K. Admittedly more CPU power than the 12 Opterons, but we find ourselves using distributed shared memory a lot - Wulfkit is great here - and that would probably be much better on the XD1. Had the XD1 been available a year ago we might have bought one instead.
It really depends on your application. Are Crays cheaper than clusters in terms of harnessable compute power per dollar? Maybe. Depends on your application. Surely that's the correct answer.
Also, buying Cray is about getting access to their software technology too.
R-S
The argument (Score:5, Informative)
He basically said faster communications needed (Score:3, Insightful)
So, the answer is to recognize that in a cluster most of the machines don't need video cards. That means somebody can design a fiber-optic communications card that plugs into the AGP slot (or maybe a PCI Express slot). Then, Cray, look out!
Re:He basically said faster communications needed (Score:3, Interesting)
2 word summary of article (Score:2)
I for one ... (Score:3, Funny)
I, for one, welcome our new story-duplicating, supercomputer-mocking, Slashdot editor overlords ...
Why are Linux clusters' interconnects slow? (Score:2)
Why can't Linux clusters use the same high performance interconnects? Is it because of cable overhead (length, signal travel, insulation, etc...) or is it bec
Re:Why are Linux clusters' interconnects slow? (Score:2)
They can. It's just a matter of how much you want to spend, and the result wouldn't necessarily be a "cluster" any more. It's distance, bus overhead, network overhead, chipset architecture, everything you listed and more.
ALERT!!! BREAKING NEWS!!! (Score:2)
Seriously, this is news?
A little inaccurate... (Score:2)
Re:A little inaccurate... (Score:3, Insightful)
Something like SETI@home could scale almost infinitely. The data elements are completely unrelated.
But if every node needed access to the same chunk of data, then the more nodes you add, the more they "fight" over that chunk of data.
Ultimately, with a PC cluster solution, only one node at a time can be accessing any given section of "shared" memory.
That's what he means, and he's right.
Look at the slashbots who can't understand
In other news... (Score:5, Insightful)
It ain't religion. (Score:5, Insightful)
Yes, clusters are good for some stuff, but we should be rooting for Cray if they're creating interesting products that fill a need, and that's exactly what they do.
It is a fact that supercomputers have an architecture that clusters cannot compete with for some classes of problem. Get over it, live with it and enjoy the fact that supercomputers are running Linux too.
It's pretty darned cool that Cray survived until now and that they still have a market for large single image systems.
Comment removed (Score:5, Funny)
Let's do some bandwidth math... (Score:4, Interesting)
"A 96 GB per second, nonblocking, crossbar switching fabric in each chassis provides four 2 GB per second links to each two-way SMP and twenty-four 2 GB per second interchassis links."
-So for a dual-Opteron XD1 processor unit, there is 8 GB/s of total bandwidth available.
Total aggregate PCI bandwidths (accepted standards):
PCI32 33MHz = 133 MB/s
PCI32 66MHz = 266 MB/s
PCI64 33MHz = 266 MB/s
PCI64 66MHz = 533 MB/s
PCI-X 133MHz = 1066 MB/s
PCI Express = 200 MB/s (per x1 slot)
PCI Express x16 = 3000 MB/s (usable bandwidth)
-So for PCI Express x16 we're talking 3 GB/second.
An SMP Opteron with two PCI Express x16 slots can do 6 GB/second of aggregate bandwidth. A couple of Infiniband links can easily saturate that. I'm sure this all costs quite a bit less than Cray's proprietary stuff.
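If anyone wants to check the table, the arithmetic is straightforward; here's a small sketch (PCI-X clocks rounded, and note the table's PCI Express figures appear derated for protocol overhead, since the raw per-lane numbers work out higher):

    # Parallel PCI: aggregate bandwidth = bus width (bytes) x clock.
    def pci_mb_per_s(width_bits, clock_mhz):
        return width_bits / 8 * clock_mhz              # MB/s

    print(round(pci_mb_per_s(32, 33.33)))              # 133  -> PCI32 33MHz
    print(round(pci_mb_per_s(64, 133.33)))             # 1067 -> PCI-X 133MHz

    # PCI Express is serial: 2.5 GT/s per lane with 8b/10b encoding
    # leaves 250 MB/s usable per lane, per direction.
    def pcie_mb_per_s(lanes, gt_per_s=2.5, encoding=0.8):
        return lanes * gt_per_s * 1000 / 8 * encoding  # MB/s per direction

    print(round(pcie_mb_per_s(1)))                     # 250
    print(round(pcie_mb_per_s(16)))                    # 4000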
Taken a little out of context (Score:4, Insightful)
As always, it depends on the application (Score:5, Insightful)
I am currently running a model using legacy FORTRAN 90 code which was written before there were clusters. It does use OpenMP, but OpenMP sucks and is no substitute for code written with MPI in mind. The model as it currently stands requires big iron to do big runs, and it is inefficient, but it works, and sometimes I just need to do science and not model development. I am working on MPI-izing the code; no small feat, but the rewards would be quite worth the effort.
In summary, both clusters and big iron have their place. Folks have a habit of making a false dichotomy with regards to these two options. I wouldn't trade my cluster for the world (currently doing parallel POV-Ray rendering of my 3D thunderstorm data, see my web link and an upcoming [not sure what month] Linux Journal article if interested) as it is perfect for much of what I am doing right now and I don't have to share it with anyone. But I will also use big iron when necessary.
Doom III (Score:3, Funny)
Target audience... (Score:4, Insightful)
Most slashdotters are technical enough to realise this...but...we are not the target audience of the original article. Such articles are meant for high level executives and relatively non-specialist managers who don't always hear all sides of the story. Every day these people are seeing articles and news blurbs stating how the latest linux cluster is as good or better than a supercomputer, and gee isn't that swell! While such press is good, and important, not everyone hearing that implicitly understands that such reports only apply to SOME applications.
So the original article is really a message from one executive to other executives, trying to clarify the situation. Basically it's saying, "Hey, just because Wired ran a story that says Linux clusters are the best thing since sliced bread doesn't mean that this is the best solution for you. Now, let us talk about what you need."
I see nothing wrong with this. I read the article, and found nothing in it that was false.
It is good because sometimes an exec will listen to a fellow exec when they won't listen to the advice of their own techs, because of something said exec read in Scientific American.
Welcome to corporate America, boys and girls.
(Disclaimer: Wired and Scientific American were random examples. I know of no articles in either publication about Linux clusters. Both are fine publications.)
Re:*Shock* (Score:5, Insightful)
Re:*Shock* (Score:5, Insightful)
There are tasks that a cluster of Linux shitboxen will do well, and tasks where the cluster will not hold up so well against a real supercomputer. Google is an example of a perfect application for networked Linux servers. If you're simulating cloud physics one molecule at a time, though, you are a lot better off using the right tool for the job instead of 1,024 wrong ones.
Re:*Shock* (Score:5, Insightful)
In this case the right tool is a vector-based supercomputer like the SV1 (8 vector processors at 2 Gflops each . . . MMmmmmmmm). A cluster-based approach would waste more processing time on the message passing than anything else. Cheaper maybe, but grossly inefficient.
-nB
Re:However (Score:3, Informative)
Um, yes. The grandparent and ggp were (I think) implying, though, that for that particular application you actually won't be able to be both better and cheaper with a clustering solution.
i.e. if you throw enough Linux boxes into the cluster to be able to achieve the "better (faster)" solution, you will no longer be cheaper.
But I don't think anyone was arguing that even if a cluster is cheaper and faster you should still go with a super
Re:*Shock* (Score:2, Interesting)
If you want a Cray supercomputer, you have to buy it from Cray. If you want a Linux cluster, you can buy it (or build it) from anyone.
I'm sure there are applications for a supercomputer, but I see universities, production studios (Pixar!), and research labs moving toward clusters. The supercomputer companies will do anything it takes either to stop that from happening or to gain in that market.
Re:*Shock* (Score:2)
There are very few situations in which there isn't a single parallelizable task, and if there is one, a cluster is probably your best bet.
Re:*Shock* (Score:5, Insightful)
Re:*Shock* (Score:2)
That's the thing about clustering - you only need *one* parallelizable cpu-intensive task, and a cluster becomes worth it.
Re:*Shock* (Score:4, Insightful)
Of course, it really does depend on the problem you're facing. Most people who pay for results, though, want results as fast as possible, and that's why supercomputers win for problems that aren't "embarrassingly parallel".
Exploiting parallelism vs. efficient computation (Score:3, Interesting)
Re:*Shock* (Score:3, Insightful)
But if the supercomputer is more efficient per raw unit of power, then the price per unit doesn't matter.
I work in HPC for a living, both with clusters and with large SMP machines. The cluster is nice, but there are some things that can _only_ be run on a large SMP machine, or are much, much faster on one.
Re:*Shock* (Score:3, Insightful)
a) (google-like) jobs well suited to a high degree of parallel processing.
b) complicated problems that can't easily be broken down to make use of a large number of CPUs, but require a lot of operations to be completed in the proper sequence.
On the first, a cluster is a great idea.
On the second, a reaaaaaallly fast CPU is a great idea.
Re:*Shock* (Score:5, Informative)
However, if your supercomputer goes down... well, you're screwed
Cray supercomputers have built-in redundancies. All the subsystems are separate from the processors and memory, which are actually "clustered" (depending on the model). Even the OS has built-in means to survive the harshest hardware catastrophe by checkpointing the running jobs regularly to off-site disks.
1000 machines are more reliable than 1 big machine
Wrong again. With 1000 lousy cheap machines, you need an on-site team of technicians to keep them all up. Supercomputers (with built-in redundancy etc.) have equal or lower maintenance requirements.
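The checkpoint/restart idea itself is nothing exotic; stripped to its bones it's just a loop like this sketch (the file name, interval, and "work" are all made up for illustration):

    import os, pickle

    CHECKPOINT = "job.ckpt"   # hypothetical checkpoint file
    INTERVAL = 1000           # steps between checkpoints

    # Resume from the last checkpoint if one exists.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            step, state = pickle.load(f)
    else:
        step, state = 0, 0.0

    while step < 100_000:
        state += 0.5 * step          # stand-in for one unit of real work
        step += 1
        if step % INTERVAL == 0:
            # Write to a temp file and rename, so a crash mid-write
            # can't corrupt the last good checkpoint.
            with open(CHECKPOINT + ".tmp", "wb") as f:
                pickle.dump((step, state), f)
            os.replace(CHECKPOINT + ".tmp", CHECKPOINT)

The point of the OS doing this for you is that jobs don't have to be written this way to survive a failure.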
I dont have enough money.. (Score:4, Funny)
Re:*Shock* (Score:5, Insightful)
P.S. You are so l33t for using TT.
Re:*Shock* (Score:5, Informative)
He's saying that Linux-based *supercomputers* are faster than Linux-based *clusters*.
(although, you can probably cluster those supercomputers...)
My understanding of what Cray is actually saying (Score:3, Insightful)
Cray's competitors in the cluster markets include IBM, and their main competitor in the vector-based market is NEC.
Re:*Shock* (Score:2, Informative)
Re:*Shock* (Score:5, Informative)
Re:*Shock* (Score:2)
Unless you have a single monolithic entangled run, you don't need a supercomputer - hence, the surging popularity of clusters. Yes, not everything is suited for clusters... but most things are, because most have parallelizable components at least *somewhere* in the process.
Re:*Shock* (Score:2)
Depending on the particular Cray, the tech may or may not be significantly different from a Beowulf cluster. Let's take NUMA as an example. NUMA started at Cray, was acquired by SGI and then sold to Sun.
In those examples, the "supercomputer" is nothing more than what amounts to a fancy cluster. The interconnects are faster. However, you are still just tying together a bunch of big bricks that l
Re:Unfuglify (Score:2, Informative)
Not to nitpick, but a viola is a string instrument in the violin family; the word you want is voilà.
Re:Unfuglify (Score:2)
Found this a while back, and now have it in my Firefox Toolbar - works great.
No ... (Score:5, Informative)
It means it is so trivial to parallelize the problem and get gains from it (think SETI@Home) that it's a no-brainer.
Other computational problems don't just simply fan out to bazillions of nodes with tiny independent pieces of data.
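In code terms, the SETI@Home-style case is just a fan-out, something like this minimal sketch (the function and chunk sizes are made up for illustration):

    from multiprocessing import Pool

    def analyze(chunk_id):
        # Stand-in for crunching one independent slice of the data;
        # no chunk ever needs to see another chunk's results.
        return sum(i * i for i in range(chunk_id * 1000, (chunk_id + 1) * 1000))

    if __name__ == "__main__":
        with Pool(4) as pool:             # pretend each worker is a node
            results = pool.map(analyze, range(100))
        print(len(results), "chunks done with zero inter-node chatter")

The scheduler hands out chunks and collects answers; the workers never talk to each other. That's exactly the property most scientific codes don't have.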
Your assertion that the Cray CTO is talking FUD when he uses the actual term is just plain wrong and unfair to him. He actually knows what he's talking about.
Re:No ... (Score:2)
Re:Yes he's talking FUD (Score:4, Informative)
There's an entire class of parallel applications which are labeled "embarrassingly parallel". This description simply means that such programs are trivially parallelized and achieve as close to linear speedup as possible when scaled across many nodes, because of their low inter-node communications.
For "embarrassingly parallel" applications, a cluster is a really good tool. For programs that don't parallelize as nicely, a big vector or SMP machine will do better. Some code will run better on a small 20-CPU SMP machine than on a 1000-node cluster.
Re:Yes he's talking FUD (Score:2)
Re:Whee! (Score:3, Informative)
Hi, clueless Slashbot. This is a quick rundown of why your post was stupid, and why Cray supercomputers do, in fact, do some things better than a PC cluster regardless of price.
If you have a supercomputer, you have a very, very, very fast internal bus handling all necessary data transfer. Even with the advent of PCI Express, a cluster of PCs must run in a network model. Therefore, any data crunching that occurs must pass through a network layer, the bus, the physical medium, and back through those limiters
Re:Cool, competition (Score:2)
Re:Cool, competition (Score:2)
No kidding.
Even for geeks, this isn't really news.
Re:Cool, competition (Score:2)
Re:Two words: (Score:4, Insightful)
If your goal is to run simulations where each piece of the simulation depends on a large subset of the other pieces, then you will need ridiculous interconnect speeds, and you're likely to end up with something you could have bought from Cray or SGI or one of the other remaining supercomputer manufacturers for a fraction of the price.
Luckily for you and the rest of us many problems can be split into relatively independent pieces, in which case a Beowulf cluster or similar is more than adequate.
If you seriously believe that clusters can compete with supercomputers for every type of problem, you need to think again.
Re:Two words: (Score:3)
1. Cray is definitely pro-Linux. It's what their XD1 runs, though not their bigger computers.
2. There are some problems for which a cluster cannot even come close to achieving the performance of a supercomputer. For a lot of problems yes, for some maybe if you spend a fortune on fancy interconnects, and for some no.
3. If you're commercially building clusters, let me know which company it is. I'm in the market for a 128-CPU cluster and I want to know who not to buy from.
Re:Two words: (Score:5, Interesting)
Not quite true. First off, you get much higher bandwidth between processors using proprietary (NUMA-based) interconnects than you can with commodity hardware. Why? Because you can optimize for your situation. Second, you can exploit things like cache coherency between processors (even if they're in different "nodes") and therefore true shared memory. So a 1024-processor SGI Altix, or a 256-processor Cray, is one computer as far as the OS and user-land stuff is concerned.
There's another advantage Cray has on the SV and X series, and that's a vector unit on the processor. That allows you to conduct operations on whole arrays of numbers at once instead of having to cycle through the numbers in a loop. For example, the dot product of two small arrays might be accomplished with one or two instructions, as opposed to a loop. Apple's AltiVec is also a vector unit.
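As a toy illustration of the difference (NumPy's vectorized call standing in for a hardware vector instruction here; it's an analogy, not the real thing):

    import numpy as np

    a = np.random.rand(4096)
    b = np.random.rand(4096)

    # Scalar style: one multiply-add per loop iteration.
    total = 0.0
    for i in range(len(a)):
        total += a[i] * b[i]

    # Vector style: the whole dot product in a single call,
    # the way a vector unit chews through whole arrays at once.
    vec_total = np.dot(a, b)

    assert abs(total - vec_total) < 1e-6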
If you took money out of the picture it would be easier to deal with a big-honkin' super computer like an SGI or Cray rather than a cluster. One computer is easier to manage and you could always use threads and plain old heap memory (which is much faster than message passing over a network).
Add money back in, and $500,000 goes a lot farther in raw compute power when you're buying racks of Dells and Infiniband interconnects. However, depending on the application, you may be faster, slower, or even dog-slow compared to the Cray. If you need the answer today and money is not a factor, go to Cray or SGI with a blank check. If you have to balance cost and time, then a cluster might be better.
Essentially, it boils down to how much communication you do between nodes. Cray does it orders of magnitude faster than off-the-shelf stuff. If you hardly ever pass messages between nodes, clusters are fast. If you have to pass a lot of messages between nodes, one big computer will trounce lots of little ones.
So why did you say it was FUD? (Score:3, Insightful)