Linux Business Hardware

Cray CTO Says Cray Computers Are Great 338

Jan Stafford writes "Linux clusters can not offer the same price-performance as supercomputers, according to Paul Terry, chief technology officer of Burnaby, British Columbia-based Cray Canada. In this interview, Terry explains that assertion and describes Cray's new Linux-based XD1 system, which will be priced competitively with other types of high-end Linux clusters."

  • by ackthpt ( 218170 ) * on Friday August 20, 2004 @10:07AM (#10023250) Homepage Journal
    I thought that was Tony the Tiger.

    I wonder how Cray computers are in milk...

  • Imagine... (Score:5, Funny)

    by Rosco P. Coltrane ( 209368 ) on Friday August 20, 2004 @10:07AM (#10023254)
    no nevermind.
  • by Space cowboy ( 13680 ) * on Friday August 20, 2004 @10:07AM (#10023256) Journal

    Given the difference in rate-of-evolution in the two camps, it can't be long before PC clusters, probably running Linux / with PVM or BSP (that's bulk-synchronous parallel rather than 3D graphics :-) are perfectly capable of doing what supercomputers do today. Of course, there'll be new really-super computers then, but that's a different story :-)

    It's all very well to mock the I/O of PCI, but that's why we're all imminently moving to PCI Express, at a rather more respectable (current) maximum of 8+GBps rather than 133MBps... Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...

    I notice he hasn't quoted the data-transfer rate on these new super-duper chips. The whole article does rather look like a piece of advertising on the cheap, speaking of which, the cluster solution is (relatively) CHEAP. Did I mention that ITS CHEAP...

    Simon.
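
For readers unfamiliar with the "hypercube formation" mentioned above, here is a minimal Python sketch of how such a layout is usually wired. It assumes nodes are numbered 0 to 2^d - 1 and that two nodes share a link when their IDs differ in exactly one bit. The function names are made up for illustration and do not come from any cluster toolkit.

```python
def hypercube_neighbors(node, dimensions):
    """Return the IDs of the nodes directly linked to `node`.

    Flipping each of the `dimensions` bits of the ID in turn gives one neighbor,
    so every box needs `dimensions` network ports.
    """
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hop_count(src, dst):
    """Fewest links a message must cross: the number of bits in which the IDs differ."""
    return bin(src ^ dst).count("1")


if __name__ == "__main__":
    dims = 4  # illustrative: a 16-node cluster with 4 gigabit ports per box
    print("neighbors of node 5:", hypercube_neighbors(5, dims))
    print("worst-case hops:", hop_count(0, 2 ** dims - 1))
```

The appeal of the layout is that the worst-case hop count grows only with the number of dimensions, but so does the number of network cards per box.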
    • by Marx_Mrvelous ( 532372 ) on Friday August 20, 2004 @10:15AM (#10023362) Homepage
      There are some limitations to clusters that "supercomputers" don't have. Even if your network were exactly as fast as the internal bus of one of the Cray supercomputers (which I highly doubt it is), you still have a logical layer on top of it (TCP/IP/UDP etc). This slows it down.

      For some applications, a cluster of slow PCs is ok. But if you want to do real time-intensive computation, you really can't beat a good internal bus.
    • by vondo ( 303621 ) * on Friday August 20, 2004 @10:18AM (#10023393)
      The latency on Ethernet is too high for many tightly coupled applications (lattice QCD for example). This is why people who need better networking use something like Myrinet. I would assume that these Cray machines have very high bandwidth, low-latency communications. This is where supercomputers distinguish themselves from clusters.
    • by PythonCodr ( 731083 ) * on Friday August 20, 2004 @10:22AM (#10023444)

      It's not just the speed of the data transfer, it's also the latency of the interconnect. A lot of scientific codes will pass around a lot of little messages, and GigE is fast for bulk transfer, but it's not so good for that. That's why there are companies like Quadrics, Myricom, etc... Infiniband should fix this, but you'll want a big infiniband switch.

      His point is building fast machines is hard, and the fastest machines are really hard. Too many folks think all you have to do is throw enough PCs and GigE nics at the problem. You can build a machine that way, but the codes don't scale well. Some scientific code will quickly show negative scaling in fact (where the more processes you add, the *slower* your code will run.) MPI codes do that all the time, which is one of the reasons you'll see people running their code at sizes smaller than the whole machine, and different sizes on different machines.

      Yeah, you can build a Linux based world-class supercomputer as a cluster, but you better be willing to sweat the details is all. Or buy a Cray, I guess. ;-)
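
The latency and negative-scaling points in the comment above can be illustrated with a rough cost model. The Python sketch below assumes the common "latency plus size over bandwidth" estimate for one message and an all-to-all pattern in which every process sends a fixed number of small messages to every other process. All numbers (50 microseconds for GigE, 5 microseconds for a low-latency interconnect, 1 KB messages, 100 seconds of serial work) are illustrative assumptions, not measurements of any real machine.

```python
def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time for one message: fixed latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def run_time(procs, serial_work_s, msgs_per_peer, msg_bytes,
             latency_s, bandwidth_bytes_per_s):
    """Toy model: compute time shrinks with procs, communication grows with procs."""
    compute = serial_work_s / procs
    per_message = message_time(msg_bytes, latency_s, bandwidth_bytes_per_s)
    communicate = (procs - 1) * msgs_per_peer * per_message
    return compute + communicate


if __name__ == "__main__":
    # Assumed interconnect figures, chosen only to show the shape of the curve.
    gige = dict(latency_s=50e-6, bandwidth_bytes_per_s=125e6)
    low_latency = dict(latency_s=5e-6, bandwidth_bytes_per_s=1e9)
    for procs in (1, 8, 16, 32, 64):
        slow = run_time(procs, 100.0, 10_000, 1024, **gige)
        fast = run_time(procs, 100.0, 10_000, 1024, **low_latency)
        print(f"{procs:3d} procs: GigE {slow:7.2f} s, low-latency {fast:7.2f} s")
```

With these assumed figures the GigE curve bottoms out and then climbs as processes are added, which is the negative scaling described above. For 1 KB messages the latency term dominates the cost, so extra bandwidth alone would not help much.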

    • by ctr2sprt ( 574731 ) on Friday August 20, 2004 @10:23AM (#10023460)
      You're right, the key is "cheap." Clusters don't offer the same level of performance as supercomputers. I don't think you'd disagree with that statement. What they do is offer a similar level of performance - once unattainable by desktops or even high-end servers, and here I mean real high-end servers instead of just quad Opterons or the like - for probably a tenth the cost.

      But even then, there are legitimate needs for supercomputers. A traditional PC-based server solution will address probably 99% of all problems. An inexpensive cluster will get you 99.9%. But there's that remaining 0.1%, and that's the target audience for whom Cray and similar companies exist.

      The fact that PCs can be used almost unmodified to create supercomputers and high-speed clusters is remarkable, and says tremendously good things about the flexibility and power of the architecture as a whole. But there are just places it can't go, not yet. For example, you know how you never get 99% efficiency with 100 megabit ethernet? You're lucky to get 70% with gigabit, and 50% is a pretty common figure. PCI-X, at least at the speeds we're talking about here, is so rare now that it's hardly cheaper than custom supercomputer-style solutions - effectively because it is a custom supercomputer-style solution. I don't think we'll ever see common systems, even midrange servers, with more than one 16X PCI-X slot.

      I really think this is what Cray mean here. Not that Linux-based clusters have no use, but that there is still a significant market for which they are suboptimal. And, in all probability, will always remain suboptimal. However fast PCs get, however popular PCI-X and similar high-speed buses become, supercomputers will just get faster to match... and computational problems will get harder to go along with them. I just don't see the need for supercomputers, at some level, ever going away.

      (I hope people find my comment useful in some way. I elected to post it rather than mod down the idiot posting flamebait about Macs in reply to you. And here's hoping people don't interpret this as karma whoring, since usually if you say "This will get modded down" it doesn't. But... oh, hell. I don't even know which Slashdot rule of thumb applies to my post at this point.)

    • by Performer Guy ( 69820 ) on Friday August 20, 2004 @10:42AM (#10023694)
      Clusters are nice for some problems, but message passing and memory copying over a network is not ideal even when you have what *you* think is a lot of bandwidth. Latency, cache coherency and having a single image system can be critical factors in some classes of supercomputing problem, not to mention ease of use and the specialized fp vector instructions that are often supported. The topology in large systems is often built (flexibly) into the memory controller hardware: the CPU writes to memory and the write finds the right node. Page migration and process affinity, along with other advanced features like hardware-level cache coherency, help these systems outperform clusters with ease given the right problems.

      The coolest thing about this IMHO is that Cray are using Linux for their single image systems.

      Yep the performance of computers is always on the increase but there will always be demand for more compute, the question is where do you want to be on the performance curve, not the absolute performance. People solve increasingly difficult problems with increasing detail and there looks to be no slowdown. They buy what suits their budget and solve as rigorously as they can for their hardware, and as hardware improves they redefine the types of problem they want to solve.

      Yup clusters are cheap and they're on the top 500 but nobody actually buys a supercomputer to run LINPACK. They use them to solve real problems, the list is just for bragging rights.
    • Don't suppose anyone has an old YMP or whatever that they'd be willing to give to a good home in Virginia?

      Or for that matter, a warezed copy of Unicos....
    • Latency (Score:3, Informative)

      by khrtt ( 701691 )
      It's all very well to mock the I/O of PCI, but that's why we're all imminently moving to PCI Express, at a rather more respectable (current) maximum of 8+GBps rather than 133MBps... Run a few gigabit ethernets in a hypercube formation and you have some rapid data transfer...

      The main reason for supercomputers to exist is not the high bandwidth, it's the latency of the switch. The network hardware that is used in clusters as the interconnect medium (switch) can provide very high bandwidth, but the latency
    • You might want to read the latest 10-K form from CRAY.

      http://www.sec.gov/Archives/edgar/data/949158/000089102004000325/v96761e10vk.htm [sec.gov]

      Here they discuss the limitations of clusters and vector-based supercomputing.

      Basically, they offer three types of supercomputers aimed at different markets: vector, massively parallel, and multithreaded. Not really sure what multithreaded means in this context (Microkernel capable of threading itself across many processors i.e. UNICOS/mk?) but they do a decent job of explaining the whole thing:


      Cray Research pioneered the use of vector systems, from the Cray-1 to the Cray C90 and T90 systems. These systems typically use a moderate number (one to 32) of very fast custom processors in connection with a shared memory. Vector processing has proven to be highly effective for many scientific and engineering application programs which over the years have been written to maximize the number of long vectors. Traditional vector systems do not scale effectively (that is, increase performance by increasing the number of processors) past a limited number of processors. We currently market one classic vector supercomputer, the Cray SX-6 system.

      Massively parallel processing architectures typically link tens, hundreds or thousands of standard or commodity processors to act either on multiple tasks at the same time or together in concert on a single computationally-intensive task. Type T systems connect each processor directly to its own private memory and the programmer must manage the movement of data among memory units and processors. Consequently these systems can be difficult to program. Type C massively parallel systems, unlike low bandwidth clusters, have high bandwidth and low latency interconnect systems and are said to be "tightly coupled" -- the Cray T3E, Red Storm and the OctigaBay product are examples of balanced high bandwidth purpose built systems that employ standard microprocessors.

      The Cray X1 system is revolutionary in that it is the first supercomputer that combines the attributes of both vector and high bandwidth massively parallel systems. The Cray X1 system has up to 64 processors per cabinet and a shared memory. The Cray X1 system can run small problems as a vector processor would or, by focusing many processors on a task, the Cray X1 system operates as a massively parallel system with a system-wide shared memory and a single-system image. The Cray X1 system is designed to provide efficient scalability and high bandwidth to run complex applications at high sustained speeds. The Cray X1E system furthers this architectural design with increased processor speed and capability.

      Our MTA-2 project for NRL is designed to have sustainable high speed, be broadly applicable and easy to program, provide scalability as systems increase in size and have balanced I/O capability. The multithreading processors make the MTA-2 system latency tolerant and, with the system's flat shared memory, able to address data anywhere in the system.
    • Wow. 8+ GB/s. Nice.

      Unless I'm now out of date, the last figures I saw said the CrayLink Interconnect can do 102 GB/sec. That's just a tad bit more, don't you think? No messing with masses of gig ethernet to crossconnect them. It's just done.

  • NO WAY! (Score:5, Funny)

    by FortKnox ( 169099 ) on Friday August 20, 2004 @10:07AM (#10023258) Homepage Journal
    The CTO from Cray said Crays are great machines and are priced competitively!

    Next you'll tell me the CEO of SCO thinks the lawsuit is completely valid and fair!
  • by OxygenPenguin ( 785248 ) <mrunyon@gmail.com> on Friday August 20, 2004 @10:09AM (#10023276) Homepage
    a Linux cluster of Cray's?
  • by Anonymous Coward on Friday August 20, 2004 @10:10AM (#10023281)
    Is MS somehow involved? Who am I supposed to hate? Editors?
  • The difference (Score:4, Insightful)

    by rwven ( 663186 ) on Friday August 20, 2004 @10:11AM (#10023297)
    The difference is that linux clusters aren't really designed for supercomputing... more of distributed computing. Cray specializes in it. Of course they're going to come out on top....
    • Huh? Umm, no, clusters are heavily used for supercomputing. Take a glance at the top500 and see for yourself. With high-speed interconnects (i.e. infiniband/myrinet), it is very feasible.
      • Re:The difference (Score:3, Informative)

        by trifakir ( 792534 )
        With high-speed interconnects (i.e. infiniband/myrinet), it is very feasible.

        Hm, I haven't played with infiniband, but I have access to a small Myrinet cluster and it takes a hell of a lot of effort to write your application in such a way as to overcome the big disparity between CPU power and network throughput and get a reasonable speed-up.

        Paul Terry is right - if they remove the PCI bottleneck it will be much easier to write scalable high-performance applications and then the costs will decrease.

  • editor training (Score:3, Interesting)

    by Knights who say 'INT ( 708612 ) on Friday August 20, 2004 @10:11AM (#10023304) Journal
    You really shouldn't place commentary in a story title, unless it's an "it's funny, laugh" one.

    Oh, by the way, everyone who has a slashdot account should go to their preferences and set the "light" layout. You won't suffer with the bad color schemes anymore, and the results are more printer-friendly too.
  • by Linker3000 ( 626634 ) on Friday August 20, 2004 @10:14AM (#10023341) Journal
    ...Your square boxes will never look as sexy as our 'Love Seat' [computerhistory.org]
  • "Linux clusters can not offer the same price-performance as supercomputers"

    He's completely right, just not in the way he intended. You'd have a hard time making the cluster as expensive as the supercomputer....

    • No, I can easily make a cluster as expensive as a super computer. Well, assuming I can spend all the money on the hardware necessary. Of course for a couple of million dollars, you can expect a rather impressive cluster.

      Let me see, we'll take a quarter mill and use that to purchase the switches and cabling needed to interconnect everything. Might have to spend a bit to upgrade the power to our facilities, and speaking of facilities, we will probably need a warehouse some place to keep all the systems we are
    • He's completely right, just not in the way he intended. You'd have a hard time making the cluster as expensive as the supercomputer....

      No, he's right in the way he intended.
      He just leaves out a lot of information. The business environment determines what is or is not expensive. The computational environment determines what will or will not run fast, the two make a measure of how expensive something is.
      If you are crunching a big continuous stream of numbers with multiple small results which are then loo
  • Dupe! (Score:5, Informative)

    by Xpilot ( 117961 ) on Friday August 20, 2004 @10:16AM (#10023369) Homepage
    Yeah, no wonder this post looked familiar [slashdot.org]. Yup, it's a dupe, folks.

  • Remember when Apple [tafkac.org] bought a Cray? It was mostly for show, so their R&D group could have the blinkenlights.

    However it spawned a popular story about how "Cray designs on Apple and Apple designs on Cray" (see link.) [tafkac.org]

    And now for the REST of the story:

    Did you know that Macintoshes are designed on PCs!? That's right--PCs running WINDOWS. You see, nobody makes software to burn eproms or design printed circuit boards that runs on MacOS, so the hardware group has a bunch of Windows PCs!

    So now you know the *

    • Bullshit. A lot of the high end auto-routing stuff for PCB design runs on HPUX, AIX, even some Solaris, if I remember. Windows is an also-ran in this category, but mostly for the small developer. I doubt that Macs are designed on Windows....
    • by Thagg ( 9904 ) <thadbeier@gmail.com> on Friday August 20, 2004 @11:00AM (#10023922) Journal
      As usual, there is more to the story. Apple brought my company in on a project back in the mid 80's when they bought the Cray. While we had to sign an NDA in blood, I doubt anybody will mind me talking about it now, almost 20 years later.

      Apple was trying to design a new cpu chip. It would have had vector processing capabilities not all that different from the Cray, so they bought the Cray both to do circuit simulations on the chip and as a model for their own design.

      The chip was going to be a 100 MHz chip (an astonishing speed for the time) with a four-pipeline vector processing unit.

      They considered (but eventually declined to) hire us to develop some kind of 3D desktop for the Mac. The idea was this would distinguish the Mac further from other computing systems, but they wouldn't be able to emulate the interface because they didn't have the horsepower.

      Anyway, that's the Apple-Cray story as I understand it. I'm sure that there is a lot more to the story than I know, of course.

      Thad Beier
  • I saw this MST3k blooper once where Tom called out "Cray" instead of "Crow". Still in character, and with false modesty, Crow replied with "Well that's very nice of you, Tom. I'm really more of a PC though."

    (Not a verbatim quote.)
  • You could look to SGI. Their Altix range is up to 1024 Itanium 2 processors in a single supercomputer, and they are putting twenty 512-processor nodes together in a cluster of linux supercomputers for NASA [sgi.com] while also working on doubling up the maximum single machine cpu count to 2048.

  • by FyRE666 ( 263011 ) * on Friday August 20, 2004 @10:22AM (#10023447) Homepage
    Scaling or upgrading these systems requires much more than simply ordering more parts; it opens up the whole integration exercise. From an application perspective, clusters limit application scaling. Bandwidth and latency restrictions significantly constrain performance as more processors are applied to a problem.

    Has this guy ever heard of Google? I can see his point to an extent; in fact his whole q&a session/blatant advert really boiled down to a single point: If you need to move a lot of data between processors, then a cluster will fare worse than one of Cray's supercomputers, which have (obviously) more bandwidth between the CPUs and shared memory. It really does depend on the application, but for him to suggest an HPC is always a more economic, or even better option than a cluster of cheap x86 boxes is demonstrably false...
  • Geez (Score:5, Informative)

    by iamdrscience ( 541136 ) on Friday August 20, 2004 @10:23AM (#10023462) Homepage
    Being the CTO of Cray, can you expect him to say anything less? Now while his points are often valid, I think his conclusion, that supercomputers outshine linux clusters, is a little inaccurate. Rather, I think the real conclusion is that linux clusters and supercomputers are both good, but at slightly different things. Which one you need to solve your problem depends, ultimately, on the specific details of your problem. Again, though, being the CTO of the company, can you really expect him to give a balanced opinion like that, rather than the skewed opinion that his company is always on top?

    Cray is a great company, but I really hate that they have to come out with things like this every now and then. Most people in need of a lot of computing power already know the difference between your products and linux clusters and really, they're going to choose whichever's most appropriate for their problem regardless of what your CTO says.
    • Re:Geez (Score:5, Informative)

      by argent ( 18001 ) <peter&slashdot,2006,taronga,com> on Friday August 20, 2004 @10:38AM (#10023645) Homepage Journal
      I think the real conclusion is that linux clusters and supercomputers are both good, but at slightly different things. Which one you need to solve your problem depends ultimately, on the specific details of your problem

      Indeed. He actually made that point himself: "There are some applications where a well-designed Linux cluster can deliver good price/performance on a particular application; those embarrassingly parallel applications where processors spend little time exchanging data."
  • Correction (Score:3, Funny)

    by Leomania ( 137289 ) on Friday August 20, 2004 @10:26AM (#10023501) Homepage
    Cray CTO Says Cray Computers Are Great

    Actually, I think he said that "Cray computers rock, eh?" or perhaps it was "Cray computers kick ass, eh?" or something like that.

    - Leo
  • by Anonymous Coward on Friday August 20, 2004 @10:27AM (#10023508)
    I don't think the Cray assertion is that crazy.

    For a 12 CPU opteron unit the academic pricing (admittedly lower than commercial but where most of their sales will go) is about 45K. That's not too shabby. Before you bounce up and down and say I can build four times the cluster for that price, it should be noted that the XD1 gives you a single system image, which simplifies programming and makes shared memory applications (increasingly important for areas such as bioinformatics) practical.

    We have a cluster with dolphinics wulfkit, and using distributed shared memory slows us down. It's not an end-of-the-world type of slowdown, but it's a factor. Our cluster is a sixteen node, dual xeon 2.2GHz with wulfkit 3d torus interconnects. It cost us, at academic prices, $50K. Admittedly more CPU power than the 12 Opterons, but we find ourselves using distributed shared memory a lot. Wulfkit is great here, but that would probably be much better on the XD1. Had the XD1 been available a year ago we may have bought one instead.

    It really depends on your application. Are Crays cheaper than clusters in terms of harnessable compute power per dollar? Maybe. Depends on your application. Surely that's the correct answer.

    Also, buying Cray is about getting access to their software technology too.

    R-S
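
The price comparison in the comment above can be worked through in a few lines of Python. The sketch takes the two quoted figures at face value (about 45K for 12 Opteron CPUs, $50K for sixteen dual-Xeon nodes, i.e. 32 CPUs) and assumes both are in the same currency. The function names are made up for illustration. Treating all CPUs as interchangeable is itself a simplification, since the comment's point is that per-CPU efficiency depends on the application.

```python
def cost_per_cpu(total_price, cpus):
    """Purchase price divided by CPU count, ignoring how well the CPUs are used."""
    return total_price / cpus


def breakeven_efficiency_ratio(price_a, cpus_a, price_b, cpus_b):
    """How many times more useful work per CPU system A must deliver
    to match system B on useful work per unit of money."""
    return cost_per_cpu(price_a, cpus_a) / cost_per_cpu(price_b, cpus_b)


if __name__ == "__main__":
    xd1 = (45_000, 12)          # figures quoted in the comment
    cluster = (50_000, 16 * 2)  # sixteen dual-CPU nodes
    print("XD1 per CPU:    ", cost_per_cpu(*xd1))
    print("Cluster per CPU:", cost_per_cpu(*cluster))
    print("Break-even ratio:", breakeven_efficiency_ratio(*xd1, *cluster))
```

On raw CPU count the cluster is cheaper, so the XD1 only wins for applications where each of its CPUs gets proportionally more useful work done, for example because the cluster spends much of its time in distributed shared memory traffic.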

  • The argument (Score:5, Informative)

    by manavendra ( 688020 ) on Friday August 20, 2004 @10:40AM (#10023659) Homepage Journal
    is based on :
    1. Heritage and resultant architecture: In Linux clusters, processors are typically connected through I/O links, whereas in supercomputing machines processors exchange data and instructions through shared memory.
    2. PCI bottlenecks: This is the key argument made - the bottlenecks introduced by PCI communication. He goes on to say that performance problems in any given such cluster tend to remain with any other such cluster. I agree with that.
    3. High Availability: He then goes on to talk about the reliability, availability and manageability of the supercomputers against typical clusters. I think this is where the FUD creeps in, along with marketing BS.
    In all fairness, he does raise a critical point. Overall, however, considering the relative ease and popularity of building, administering and growing a cluster these days, I think the cost-effectiveness of a single monolithic machine is a moot point.
  • by VernonNemitz ( 581327 ) on Friday August 20, 2004 @10:40AM (#10023661) Journal
    That is, for a Linux cluster to keep up with a supercomputer, the cluster needs faster communications between processors. The bottleneck of going from processor to South Bridge to PCI Bus to Ethernet card, and back again at another processor, is the problem.

    So, the answer is to recognize that in a cluster most of the machines don't need video cards. That means somebody can design a fiber-optic communications card that plugs into the AGP slot (or maybe a PCI Express slot). Then, Cray, look out!
  • by cascadingstylesheet ( 140919 ) on Friday August 20, 2004 @10:43AM (#10023711) Journal

    I, for one, welcome our new story-duplicating, supercomputer-mocking, Slashdot editor overlords ...

  • On the other hand, supercomputers are purpose-built to handle HPC applications, which place enormous demands on both processing power and inter-processor communication. Their design includes high performance interconnects that provide high bandwidth, low-latency communications across the entire system, regardless of the number of processors required.

    Why can't Linux clusters use the same high performance interconnects? Is it because of cable overhead (length, signal travel, insulation, etc...) or is it bec
    • Why can't Linux clusters use the same high performance interconnects

      They can. It's just a matter of how much you want to spend, and the result wouldn't necessarily be a "cluster" any more. It's distance, bus overhead, network overhead, chipset architecture, everything you listed and more.
  • This just in! Company exec. says their products are great!!!

    Seriously, this is news?
  • While many things that the Cray CTO said are true, I think the issue (obviously) has been skewed some. It really depends on the problem you are solving. Some problems will need to have data shared between all of the nodes, but others will require that each node only has access to the data that is important to the small part of the problem that it solves. Also, the CTO mentioned that clusters don't scale very well. I don't really know what made him think this, but it seems to me that clusters do scale
    • They don't scale for applications that require shared memory access.

      Something like SETI@home could scale almost infinitely. The data elements are completely unrelated.

      But if every node needed access to the same chunk of data, then the more nodes you add, the more they "fight" over that chunk of data.

      Ultimately, with a PC cluster solution, only one node at a time can be accessing any given section of "shared" memory.

      That's what he means, and he's right. ..offtopic..

      Look at the slashbots who can't under
  • In other news... (Score:5, Insightful)

    by mrjb ( 547783 ) on Friday August 20, 2004 @10:48AM (#10023774)
    MS says their operating system is great. McDonald's says their food is great *and* cheap.
  • It ain't religion. (Score:5, Insightful)

    by Performer Guy ( 69820 ) on Friday August 20, 2004 @10:51AM (#10023801)
    It's a bit depressing to watch everyone jump on Cray here despite having no clue what the key differences between supercomputers and clusters are. All this cheerleading for clusters in various posts here illustrates how thoughtless some of these posts are. Why the heck should you care if someone makes a supercomputer or a cluster? Both clusters and supercomputers lose value fast over time.

    Yes clusters are good for some stuff but we should be rooting for Cray if they're creating interesting products that fill a need, and that's exactly what they do.

    It is a fact that supercomputers have an architecture that clusters cannot compete with for some classes of problem. Get over it, live with it and enjoy the fact that supercomputers are running Linux too.

    It's pretty darned cool that Cray survived until now and that they still have a market for large single image systems.
  • by account_deleted ( 4530225 ) on Friday August 20, 2004 @10:54AM (#10023832)
    Comment removed based on user account deletion
  • by JBMcB ( 73720 ) on Friday August 20, 2004 @11:26AM (#10024282)
    From Cray (From XD1 page):
    "A 96 GB per second, nonblocking, crossbar switching fabric in each chassis provides four 2 GB per second links to each two-way SMP and twenty-four 2 GB per second interchassis links."

    -So for a dual-opteron XD1 processor unit, there is 8GB/s total bandwidth available.

    Total aggregate PCI bandwidths (Accepted standards):

    PCI32 33MHz = 133MB/s
    PCI32 66MHz = 266MB/s
    PCI64 33MHz = 266MB/s
    PCI64 66MHz = 533MB/s
    PCI-X 133MHz = 1066MB/s
    PCI Express = 200MB/s (Per slot)
    PCI Express x16 = 3000MB/s (Usable bandwidth)

    -So for PCI Express x16 we're talking 3GB/second

    SMP Opteron with two PCI Express x16 slots can do 6GB/second aggregate bandwidth. A couple of Infiniband links can easily saturate that. I'm sure this all costs quite a bit less than Cray's proprietary stuff.
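
The arithmetic behind the figures in the comment above can be reproduced with a short Python sketch. It assumes the usual peak-rate estimate for a parallel bus (bus width in bytes times clock rate, one transfer per clock) and nominal clocks that are multiples of 33.33 MHz. The PCI Express x16 figure of 3 GB/s usable is taken from the comment as given rather than derived. Function names are made up for illustration.

```python
def parallel_bus_mb_per_s(width_bits, clock_mhz):
    """Peak rate of a parallel bus that moves one word per clock, in MB/s."""
    return width_bits / 8 * clock_mhz


def aggregate_gb_per_s(links, gb_per_s_each):
    """Total bandwidth of several identical links used side by side."""
    return links * gb_per_s_each


# (width in bits, nominal clock in MHz); clocks are multiples of 100/3 MHz.
BUSES = {
    "PCI32 33MHz": (32, 100 / 3),
    "PCI32 66MHz": (32, 200 / 3),
    "PCI64 33MHz": (64, 100 / 3),
    "PCI64 66MHz": (64, 200 / 3),
    "PCI-X 133MHz": (64, 400 / 3),
}

if __name__ == "__main__":
    for name, (width, clock) in BUSES.items():
        # int() truncates, matching the whole-number figures in the list above.
        print(f"{name}: {int(parallel_bus_mb_per_s(width, clock))} MB/s")
    # Four 2 GB/s links per two-way SMP, as in the quoted XD1 description.
    print("XD1 two-way SMP:", aggregate_gb_per_s(4, 2), "GB/s")
    # Two x16 slots at the comment's 3 GB/s usable figure.
    print("2x PCI Express x16:", aggregate_gb_per_s(2, 3), "GB/s")
```

These are peak numbers for shared parallel buses. Sustained throughput is lower in practice, and the model says nothing about latency.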

  • by UnknowingFool ( 672806 ) on Friday August 20, 2004 @11:28AM (#10024302)
    In a way he's right. Reading the whole article, it seems apparent that he's talking about certain high performance applications. Clusters are not always the best way to solve a problem. For problems that can be broken down into small independent tasks like SETI, clusters are a good solution. Clusters do have their optimization challenges with latency, bottlenecks, etc. For simulations where the tasks are dependent on each other, these bottlenecks add up. The individual nodes spend as much time communicating with each other as they do computing. There are also problems that cannot be distributed. In these cases clusters are not the right solution and it may not be cost effective to use a cluster.
  • by Orp ( 6583 ) on Friday August 20, 2004 @11:53AM (#10024596) Homepage
    Both clusters and big iron have their place. I am a meteorology professor and my current research involves high-resolution numerical modeling of thunderstorms. For a problem where the domain decomposition is straightforward and internode communication isn't your bottleneck, clusters are great. One huge advantage of clusters is that they are cheap and it isn't too big of a deal to get a grant together to buy the hardware, and it's YOURS and nobody else's. A huge disadvantage to big iron is that you have to share it with about a hundred other researchers. Waiting in a queue for three days only to find you goofed up in your startup script (and the model exits immediately) is NO FUN (cf the Regatta at NCSA).

    I am currently running a model using legacy FORTRAN 90 code which was written before there were clusters. It does use OMP but OMP sucks and is no substitute for code which is written with MPI in mind. The model as it currently stands requires big iron to do big runs, and it is inefficient, but it works and sometimes I just need to do science and not model development. I am working on MPI-izing the code; no small feat, but the rewards would be quite worth the effort.

    In summary, both clusters and big iron have their place. Folks have a habit of making a false dichotomy with regards to these two options. I wouldn't trade my cluster for the world (currently doing parallel POV-Ray rendering of my 3D thunderstorm data, see my web link and an upcoming [not sure what month] Linux Journal article if interested) as it is perfect for much of what I am doing right now and I don't have to share it with anyone. But I will also use big iron when necessary.

  • Doom III (Score:3, Funny)

    by Yousef ( 66495 ) on Friday August 20, 2004 @12:22PM (#10024903)
    Finally, a machine capable of running Doom 3!
  • Target audience... (Score:4, Insightful)

    by umshaggy ( 460672 ) <damadpoet@gmail.com> on Friday August 20, 2004 @02:12PM (#10026245) Journal
    Many posts have pointed out the true fact that supercomputers are better for certain jobs that are not suited to clustered solutions (and vice versa).

    Most slashdotters are technical enough to realise this...but...we are not the target audience of the original article. Such articles are meant for high level executives and relatively non-specialist managers who don't always hear all sides of the story. Every day these people are seeing articles and news blurbs stating how the latest linux cluster is as good or better than a supercomputer, and gee isn't that swell! While such press is good, and important, not everyone hearing that implicitly understands that such reports only apply to SOME applications.

    So what the original article is, is a message from one executive to other executives trying to clarify the situation. Basically saying "hey, just because Wired ran a story that says linux clusters are the next best thing since sliced bread, doesn't mean that this is the best solution for you. Now, let us talk about what you need."

    I see nothing wrong with this. I read the article, and found nothing in it that was false.
    It is good because sometimes an exec will listen to a fellow exec when they won't listen to the advice of their own techs because of something said exec read in Scientific American.

    Welcome to corporate america boys and girls.

    (Disclaimer: Wired and Scientific American were random examples. I know of no articles in either publication about linux clusters. Both are fine publications.)
