Hardware Technology

Where to Spend $1M on a Cluster?

Natchswing asks: "My university has been given a $757,825 NSF grant to build, 'A 256 node (128 pair) Beowulf parallel computing cluster ... to improve the realism of gravity-wave modeling by permitting treatment of the three dimensional problem and multiple wave interactions.' They want to pay a company to just show up and drop off a functional cluster rather than build it themselves. Since word has leaked out regarding the purchase intent, every computer manufacturer under the sun (including Apollo himself) has called up trying to sell their cluster. Since I'm no cluster expert, I'm writing Slashdot. If you had $0.7 mil to buy a pre-built cluster who would you go with and why?"
This discussion has been archived. No new comments can be posted.

  • Competitive Bidding (Score:5, Interesting)

    by duffbeer703 ( 177751 ) on Wednesday August 04, 2004 @07:47PM (#9884507)
    Have the companies submit bids... then compare them and make a decision.

    This isn't rocket science.
    • by Kris_J ( 10111 ) *
      Buying from the lowest tender is rarely a good idea.
      • [sarcasm]My god! what a concept![/sarcasm]

        Yeah, so if you know how to write a contract, the lowest bidder is always the best choice. Think of terms like this: contract price is $700,000 if the following conditions are met: (a,b,c) by date x and $600,000 if met by date y. System must be free of manufacturer defects until date z. Manufacturer defects are defined with high specificity here...

        But in any case, if you tell the companies what to bid to, they will all bid to that. Then you can pick the company tha

      • No shit.

        Here's how it works. You get 5 or 6 technical staff and managers, at least 3 of whom are not involved with the proposals.

        Then you Request Proposals via a sealed bid.

        You then come up with a scoring worksheet; you weigh cost, implementation track record, hardware or whatever other factors are important to you.

        Then each person scores the proposals and you meet to go over them and come up with an overall ranking.

        It may seem drawn out, but it's a system that works well AND controls costs.
      • OK, this is the worst joke I've ever made:
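The scoring worksheet described above can be sketched roughly as follows. This is only an illustration: the factor names come from the comment, but the weights, the 0-10 scale, the vendor names, and every score are made up for the example.

```python
# Hypothetical weights: the comment names the factors but gives no numbers.
WEIGHTS = {"cost": 0.40, "track_record": 0.35, "hardware": 0.25}


def weighted_score(scores, weights=WEIGHTS):
    """Combine one reviewer's 0-10 factor scores into a single weighted score."""
    return sum(weights[factor] * scores[factor] for factor in weights)


def rank_proposals(reviews):
    """Average each vendor's weighted score across reviewers, best first.

    `reviews` maps a vendor name to a list of per-reviewer score dicts.
    """
    averages = {
        vendor: sum(weighted_score(sheet) for sheet in sheets) / len(sheets)
        for vendor, sheets in reviews.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Made-up scores from two reviewers for two fictional bidders.
reviews = {
    "Vendor A": [
        {"cost": 9, "track_record": 4, "hardware": 6},
        {"cost": 8, "track_record": 5, "hardware": 6},
    ],
    "Vendor B": [
        {"cost": 6, "track_record": 9, "hardware": 8},
        {"cost": 7, "track_record": 8, "hardware": 8},
    ],
}

ranking = rank_proposals(reviews)
for vendor, score in ranking:
    print(f"{vendor}: {score:.3f}")
```

With these invented numbers the cheaper bid does not come out on top, which is the point of weighing more than price.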

        it "worked" for the space shuttle :(

        • Take this ticket to hell and go wait for the train on Fark.
        • Actually, there was no real competitive bidding for Shuttle.

          The players were the same ones from Gemini and Apollo, they submitted proposals and NASA decided on one, then it went through years of changes and redesign, but in the end, almost everyone who had a piece of Apollo and Gemini were in on Shuttle in some form or another.

          The failures of Shuttle weren't from the bidding process; they were from engineering tradeoffs.
    • Even if what you said were true, it's a pretty useless statement. Like reducing capitalism to "buy low, sell high."

      But there's more here than figuring out who can plunk down the best system for the specified price. There's the maintenance/support costs. And picking a particular hardware platform kind of defines your choices for software -- so whose compiler do you like best? And any serious school needs to ask: can we maybe do a better job, more cheaply, cobbling together a cluster from cheap (abandoned,

      • Capitalism is "buy low, sell high". The rest is detail.

        High-dollar Federal grants generally require that you adhere to some sort of standardized purchasing practice.

        Competitive bidding isn't simply "Ok, this guy said he can do it for $50, he wins."

        When you issue an RFP for others to come in and do work, you have to weigh various factors in your scoring.

        Price is one factor. Experience and hardware features are another. You might assign bonus points to companies that allowed for a few students to take par
      • That is a horrible definition of capitalism, maybe a good one for the markets though.

        Capitalism would be invest in the opportunities offering the best return.
  • Me! (Score:4, Funny)

    by Jahf ( 21968 ) on Wednesday August 04, 2004 @07:50PM (#9884540) Journal
    I'll do it ... send via paypal to my /. username @ yahoo.com ...
  • Penguin Computing (Score:4, Informative)

    by retostamm ( 91978 ) on Wednesday August 04, 2004 @07:53PM (#9884561) Homepage
    Penguin Computing [penguincomputing.com] does this kind of stuff for a living. I think they are an all open source shop, too... There may be others, too.
    • Re:Penguin Computing (Score:4, Informative)

      by DA-MAN ( 17442 ) on Thursday August 05, 2004 @03:27PM (#9892441) Homepage
      Penguin Computing does this kind of stuff for a living. I think they are an all open source shop, too... There may be others, too.

      As a Systems Engineer who has worked with a number of vendors, I would say that Penguin is the bottom of the barrel in service and quality control.

      We have five clusters at our facility, the slowest of which is on the top500 in the 150 range. We've tried big and small vendors.

      Penguin is the absolute worst. No two scsi hard disks had the same firmware version, the raid controller was DOA, etc. We buy/borrow a node from each vendor and evaluate them before buying clusters, and out of all the vendors the Penguin is the one that would crash or hang all the time. After months of trying, they were never able to get this going properly. Regardless of the fact that we shipped it back twice and were told each time that we'd get back a whole new machine (it wasn't).

      I would personally recommend Appro, IBM or Western Scientific in that order. Service and quality hardware are their game.
  • ...you seriously need to put this out for RFP (Request for proposal).

    Once you've done that, look through the proposals and pick which one sounds the best.
  • Does anyone else see the VERY obvious discrepancy between the submission title and the submission? Where are the editors? Last time I checked, 0.7 mil != 1 mil.
  • by Txiasaeia ( 581598 ) on Wednesday August 04, 2004 @08:02PM (#9884630)
    could you say, "Imagine a beowulf cluster of those!" and actually be asking a legitimate question!
  • Intergalactic NatchswingCo Research.
  • by Anonymous Coward on Wednesday August 04, 2004 @08:09PM (#9884706)
    Shit, my department needs to take lessons from you guys. We need to specify the budget down to the last rubber foot and network cable just to have them review the application, and you got a grant without even having an idea what you were going to spend it on?
  • "gravity-wave modeling"?

    I thought that E-R was an aviation school.
    Are they developing an anti-gravity levitation vehicle?
  • RFP is the answer (Score:5, Interesting)

    by hectorh ( 113198 ) on Wednesday August 04, 2004 @08:15PM (#9884766) Homepage
    I think that you should look at your intended application.

    - How much disk space are you going to need in total?
    - How much disk space are you going to need per node?
    - How much RAM is each node going to need?
    - Is your application going to benefit from a low-latency or a high-bandwidth connection between nodes?
    - What about cpu? which cpu family will provide the best bang/$ for your calculations? PPC or X86? x86-64 maybe?

    Once you know what you need, put it together in an RFP and send it out to every company that shows up under a google search for "beowulf cluster"

    Review the responses and pick the best.

    Since you are asking this question here, I'm going to refrain from suggesting the better option which is to build your own.

    Hector

    • I think that's pretty much right. Two suggestions though...

      Firstly, the more you put into the process the more you'll get out of it so be prepared to come up with a good RFP. If you're not an expert in clusters then you might well not know the answers to some of these questions so be prepared to take advice from suppliers. Sure, some of them may try and rip you off but most will be honest and helpful which will make the dodgy ones pretty easy to spot. Alternatively, look for some external, independent hel

    • I've been privileged to answer a lot of RFPs in my career, so here's some tips from the other side to make the process go a little smoother:

      Corporate background questions are fine, but please stick to general stuff that can be answered with boilerplate. No one at the vendor knows or cares where our executive team went to college, and it's going to be a huge PITA to track that sort of BS down.

      Ask what you want to know, but please re-read the RFP when you're done writing it. If you've asked the same questio
  • Microway (Score:5, Informative)

    by brsmith4 ( 567390 ) <.brsmith4. .at. .gmail.com.> on Wednesday August 04, 2004 @08:26PM (#9884859)
    I run a 48-node Microway [microway.com] Beowulf and I must say that it is the most stable system available. Everything came assembled and ready to go (of course, I built the enclosure and did the networking, but they will do that for you if you'd like). If you're not very knowledgeable about Beowulfs, how do you know you'll need so much power? Do you know how well the software you will be using will scale? Is it close to embarrassingly parallel, or does it lose efficiency over X number of nodes? What type of resources and consumption does the program use? Is it extremely processor hungry, or does it deal with dense matrices and require low memory latency and high bandwidth, or both? Do you know if you will need the power of Myranet or will you be able to get by on GigE?

    These are important questions you must ask your researchers and yourself before you purchase this cluster. But, to answer your question, I believe Microway is the best choice and I plan on having them build our next cluster in the next fiscal year.

    -brian
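One rough way to frame the "how well will it scale" question is Amdahl's law. The sketch below is not from the thread; the parallel fractions are assumed values chosen for illustration, and the formula ignores communication cost entirely, so a real code on a real interconnect will do worse than this.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Ideal speedup on `nodes` processors when only `parallel_fraction`
    of the single-processor runtime can be parallelized (Amdahl's law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


def parallel_efficiency(parallel_fraction, nodes):
    """Speedup divided by node count: 1.0 means perfect scaling."""
    return amdahl_speedup(parallel_fraction, nodes) / nodes


# Assumed parallel fractions; measure your own code to get real ones.
for fraction in (0.95, 0.99, 0.999):
    for nodes in (48, 256):
        speedup = amdahl_speedup(fraction, nodes)
        eff = parallel_efficiency(fraction, nodes)
        print(f"p={fraction:<5} nodes={nodes:<3} "
              f"speedup={speedup:6.1f} efficiency={eff:5.1%}")
```

Even a code that is 99% parallel tops out far below 256x on 256 nodes, which is why it is worth measuring before sizing the purchase.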
    • You mean Myrinet [myri.com]?
    • Uh... this dude doesn't know how to operate his pants' fly. Give him a break.
  • by elliotj ( 519297 ) <slashdot&elliotjohnson,com> on Wednesday August 04, 2004 @08:31PM (#9884896) Homepage
    Whoever you choose to go with (I'm partial to Apple, but that's just me - and just because they have sexy hardware), see if you can get them to give you either more for your money, or free implementation/consulting help, or something like that in exchange for using your implementation as a success story. I think Virginia Tech got a bunch of free stuff from Apple when they decided to build their supercomputer.

    All these vendors want to be able to talk about their work. Letting them use you for marketing may help you get more for your money.
  • Not Angstrom (Score:4, Informative)

    by Anonymous Coward on Wednesday August 04, 2004 @08:33PM (#9884916)
    I currently maintain some Opteron based Angstrom Microsystems Linux clusters. We've had them for less than a year, and already 30% of our nodes have had to be replaced. Support has been a nightmare.

    Sadly, I was not around when the proposal was made, otherwise I would have rejected this cluster outright. There is no way to hook external storage up to this beast. There is no USB, Firewire, SCSI, external SATA, or fibre channel options. You can't even run an ATA cable out of the thing without drilling holes into the blade walls.

    Personally? I'm looking at an XServe or an IBM Bladecenter.. but maybe it's just because I'd like some real support.
    • Actually, the professor awarded the grant was going to choose that company. Please provide more information so I can approach him with some facts. I'm sure he would be very appreciative of the advice.
  • cluster experience (Score:3, Interesting)

    by Robbat2 ( 148889 ) on Wednesday August 04, 2004 @08:53PM (#9885028) Homepage Journal
    First of all, you really should put out an RFP for your cluster.

    We've got a 128 node (1 cpu per node) cluster from Atipa http://www.atipa.com/ that cost CDN$ 0.25M.
    128 P4 Xeon, 1GB RAM, 120GB IDE, Gigabit Ethernet.
    I'd expect you to get a lot more for your USD$ .75M, like maybe doubling your size and getting AMD64 nodes. Look at your primary problemset first, see if it's IO-intensive or CPU-intensive to figure out what you want in the way of disk/networking.

    The only thing I don't like about it is Atipa's configuration of Redhat8 (they didn't offer anything newer at the time). Look for something newer there.

    Atipa is one of the suppliers for SGI-branded clusters as well.

    I'd really like a cluster from http://adelielinux.com/en/, but I wasn't aware of them at the time we did our RFP and cluster purchase.
    • by Robbat2 ( 148889 )
      Furthermore, make SURE you have sufficient physical space and air-conditioning capacity for your new cluster.
      • Also make sure that the power feed to your computer room will handle the load required for running the machines in the cluster in addition to everything else in there. I have a friend who ran into this with a beowulf cluster where he was working...

        t
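A back-of-the-envelope check of the power and cooling point might look like the sketch below. The 250 W per node figure is purely an assumption for illustration (get the real draw from the vendor's spec sheet), and the result covers only the compute nodes, not switches, storage, or whatever else is already in the room.

```python
BTU_PER_HOUR_PER_WATT = 3.412    # 1 W of load dissipates about 3.412 BTU/hr
BTU_PER_HOUR_PER_TON = 12_000    # 1 ton of cooling removes 12,000 BTU/hr


def cluster_load(nodes, watts_per_node):
    """Estimate electrical load and the cooling needed to remove its heat."""
    total_watts = nodes * watts_per_node
    btu_per_hour = total_watts * BTU_PER_HOUR_PER_WATT
    return {
        "kilowatts": total_watts / 1000,
        "btu_per_hour": btu_per_hour,
        "cooling_tons": btu_per_hour / BTU_PER_HOUR_PER_TON,
    }


# 256 nodes is from the grant description; 250 W per node is assumed.
load = cluster_load(nodes=256, watts_per_node=250)
print(f"{load['kilowatts']:.1f} kW, "
      f"{load['btu_per_hour']:,.0f} BTU/hr, "
      f"{load['cooling_tons']:.1f} tons of cooling")
```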
  • Maybe I'm dumb but I always thought apollo was bought by HP not sun ;) [obsoleteco...museum.org]

    Peter.
    • I think he means: Apollo - the sun god
      as is apparent from his words:
      ... every computer manufacturer under the sun (including Apollo himself)...
  • ...f'ked. Like, seriously arse backwards.

    You got the grant *then* went shopping? Does all US academia work like this? Aren't you supposed to work out what you want to do, how to do it, how much and only then apply for the grant?

    Dave
    • Uhhhh...it looks like they did work out what they want to do (gravity wave research), and how to do it (with a 256 node beowulf cluster), then they got the grant. The only thing left is to find a vendor for their hardware. The guy writing this probably isn't the researcher who got the grant, he's an IT person who needs to help figure out who to buy it from.

      Since it can take over a year to get a grant in some cases, picking out the vendor before the grant arrives is usually stupid. By the time it arrives, t
  • cluster problem set (Score:2, Informative)

    by Raleel ( 30913 )
    I see you mentioned the problem set, which is good. To me and my only somewhat novice mind (I work with scientists all day and hear all kinds of stuff), this sounds suspiciously like a fine-grained problem. That is to say, there will be a lot of interprocess communication, so don't skimp on the network. I'm not talking "get gigE". I'm talking "look at Myrinet, or Quadrics, or InfiniBand".

    Most people can do you up a 256 node cluster for under half a million, but doing up one with high speed and low latency netw
  • I'd probably just buy 10 blade enclosures with 14
    2-way Xeon blades each from ibm off the shelf.
    They have blades with dual gigabit nics. A Pair of
    3-Com 16-ways nics give you 2 parallel networks,
    which makes it flexible. Run OpenMOSIX.
    I'm pegging the whole shooting match at roughly
    $420k. Spend the rest on NAS, pack out the RAM,
    get a nice visualization wall, etc.
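The arithmetic behind that blade suggestion, worked through as a quick sketch. The enclosure count, blades per enclosure, and the $420k estimate are the commenter's figures and the grant amount is from the submission; none of them are vendor quotes.

```python
GRANT = 757_825              # NSF grant amount from the submission
ENCLOSURES = 10
BLADES_PER_ENCLOSURE = 14
CPUS_PER_BLADE = 2           # "2-way Xeon blades"
HARDWARE_ESTIMATE = 420_000  # the commenter's rough figure

blades = ENCLOSURES * BLADES_PER_ENCLOSURE
cpus = blades * CPUS_PER_BLADE
cost_per_blade = HARDWARE_ESTIMATE / blades
remaining = GRANT - HARDWARE_ESTIMATE

print(f"{blades} blades, {cpus} CPUs")
print(f"about ${cost_per_blade:,.0f} per blade")
print(f"${remaining:,} left for NAS, RAM, and the rest")
```

Whether 140 two-way blades satisfies a "256 node" requirement depends on whether a node means a box or a processor, which is worth pinning down in the RFP.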
  • First off, it's disturbing that you got this grant. The NSF should be ashamed of themselves for giving that much cash to someone so clueless.

    Second: you're almost certainly going to have to put it out to bid. For example, at UIUC [uiuc.edu], the bid limit is $28,100. Anything over that *must* go to bid unless you can provide a really good reason why you have to "sole source" it.

    Now, you need to start thinking about stuff. First off, forget the number of nodes. You need to start by thinking about how they'll b

    • Come on. He asked the slashdot community for help because he needed it. If you think it's wasteful government spending, then write to your congressman. Don't put this guy down because he admits he would like some technical input by people who he knows have more experience than he does.

      You did give some useful information, but there was no need to start it off by calling him clueless.
  • ...but I need it to run Doom3 in "Ultra". Sorry.
  • by Parsec ( 1702 ) on Thursday August 05, 2004 @01:06AM (#9886213) Homepage Journal
    If you haven't already, google for beowulf clusters at other universities and contact those departments.
  • what you buy depends mostly on how much you want to spend on your interconnect, which in turn depends on your applications. You can spend >50% of your cash on the interconnect - but do you need to?
    • are your apps parameter study serial jobs? (interconnect doesn't matter much - just use gigE)
    • already written MPI apps? (few large messages? many small?)
    • OpenMP only? (you need large SMP nodes)
    • do they need large bandwidth or low latency or both?

    Infiniband or gigabit ethernet are your main options. IB is low

  • by Apollo ( 15220 ) on Thursday August 05, 2004 @02:33AM (#9886482) Journal
    Hey, you -- yeah, you. Wanna buy a cluster? I know you'd like some UltraSPARC IVs. No? Come on. I've got great deals on last year's hardware, too. For the low, low price of $757,825, you, too, can own a piece of precision equipment from the Sun Fire line. OK, OK, fine! Go to that guy across the street. But make sure you come back here before you decide, because I've been authorized to toss in some incentives.

    You'll be back, believe me. You'll be back in no time.

  • ring.. (Score:3, Insightful)

    by ivano ( 584883 ) on Thursday August 05, 2004 @03:43AM (#9886767)
    ...Apple
    ..Dell
    ...IBM
    and *talk* to a sales rep. I know how hard this is (not!) but asking Slashdot is kinda silly. Sure you might want some impartial advice but /. might not be the right place :) Ring these people and decide for yourself (you're a smart man, no?). From the media Apple is getting for its "out-of-the-box" clusters I would seriously put them as an option.

    ..and good luck ! it sounds like a good project

    ciao

  • Warning ! (Score:3, Interesting)

    by dargaud ( 518470 ) <[ten.duagradg] [ta] [2todhsals]> on Thursday August 05, 2004 @03:44AM (#9886771) Homepage
    Commercial clusters, hah ! My university did exactly that and they've had only problems. There was specialised hardware in it. It was never well supported by the Linux they installed on it, which was impossible to upgrade or change according to the admin who kept losing hair on it. In other words that system never worked properly.

    When my research group decided to build one [gdargaud.net], I was in charge and opted for OpenMosix [sourceforge.net], which after a tweaking period worked really well. Now with the various bootable CDs with OpenMosix (PlumpOS, BCCD, Quantian, ClusterKNOPPIX...), tests and upgrades are done by just pressing reset !

    Of course with clusters your mileage may vary.

  • I am serious.

    Building or buying a cluster is serious business. Talk to supercomputing experts. Issues involved are numerous. Just a short list:

    • What applications will this cluster run? Just the one you mentioned or will you be running more than just that one?
    • Will you need a low-latency network (hint: you'll want one)? Will this be the current safe choice Myrinet or the up-and-coming Infiniband? This is, again, application dependent.
    • Who will do the hardware support? Are you allowed to change disks and
  • I found mine on eBay... but make sure you check for bad RAM first (use Knoppix)
  • I would very much recommend this research site from one of my professors at the University of Kentucky. He has been doing work with cluster supercomputing for quite some time now and has managed to build some very impressive systems at low costs. Much lower costs than what your current grant is for. With a grant of that size using this professor's techniques you could build a whole bunch of clusters. I would suggest taking a look at his group's research site aggregate.org [aggregate.org].

    You can also see one of
  • Chances are, your school has a hefty existing contract with one of the vendors bidding on your cluster. If you like that vendor, and they haven't fucked you over in the past, why not go with them?

    They are less likely to take advantage, since they want to continue doing business with you. Your existing relationship will give you a little leverage.
  • One option is to pay an experienced independent (i.e. someone who doesn't sell their own hardware or have affiliation with a hardware vendor) person to make the decision for you. If it costs 5% of the purchase price and the person saves you from buying a lemon (or even saves you 6% on the system) hasn't that been a good investment?

    Perhaps consider using a team member from a free software clustering project as your consultant (check credentials though)? That way you hopefully get someone who is an expert an

  • I would have the research group that I work with at the University of Kentucky build it. Maybe you should contact my professor, Dr. Hank Dietz.

    KAYS0 [aggregate.org]
    University Of Kentucky Supercomputer Breaks The $100 Per GFLOPS Barrier [aggregate.org]

    They built the supercomputer for under $40,000 with 128 nodes + 4 spare nodes, just think how many nodes and how powerful it could be with $700,000!
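To put a number on "just think how many nodes", here is a naive linear extrapolation from the figures quoted above. It assumes per-node cost stays flat as the machine grows, which is unlikely to hold: switch ports, cabling, power, cooling, and floor space all get harder at scale, and "under $40,000" is treated here as exactly $40,000.

```python
KENTUCKY_COST = 40_000        # "under $40,000", treated as an upper bound
KENTUCKY_NODES = 128 + 4      # 128 nodes plus 4 spares
BUDGET = 700_000              # the round figure used in the comment

cost_per_node = KENTUCKY_COST / KENTUCKY_NODES

# Integer arithmetic avoids a floating-point off-by-one in the node count.
nodes_at_same_price = BUDGET * KENTUCKY_NODES // KENTUCKY_COST

print(f"about ${cost_per_node:.2f} per node")
print(f"roughly {nodes_at_same_price} nodes for ${BUDGET:,} if cost scaled linearly")
```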

    • I've been keeping up with Dr. Dietz's work since Purdue. I really admire his work, and I even ran a small 2-node PAPERS cluster at home using his AFAPI library.

      PeTS [aggregate.org] may be applicable here, especially his research into Flat Neighborhood Networks (FNNs). However, I think that AMD/Intel systems use too much power (70 watts or so each). A computationally-equivalent cluster of VIA EPIA motherboards (maybe 10 watts each) would be both physically smaller and much easier on the electric bill. At $100 each for a VI [axiontech.com]
  • Let me start off with a disclaimer: I do work for IBM however the following represents my opinions, not that of IBM.

    There's a reason that they say you never get fired for going with IBM. IBM has more super-computing experience than anyone. We've got an amazing turn-around capability when it comes to building clusters. But perhaps the best thing with going with IBM is the fact that it builds the relationship.

    IBM is very involved with universities especially in the areas of high performance computing. W
  • Do what Celera Genomics did for their equipment bids for human genome computing resources. Develop a benchmark test run representing sample code and data. Have each vendor run your benchmark in time trials.

    "People asked me why we chose Compaq," says Marshall Peterson, Celera's vice president of infrastructure technology. "The answer is simple. We took a benchmark and gave it to all the vendors. Only two could run it. One ran it in 87 hours.

    Compaq ran it in seven." Peterson didn't disclose the name of
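Comparing vendor benchmark runs the way Celera did could be tabulated like this. The 87-hour and 7-hour figures are from the quote; the second vendor is not named in the text above, so it appears here under a placeholder label.

```python
def rank_by_runtime(results_hours):
    """Sort vendors by wall-clock time and report each one's slowdown
    relative to the fastest run.

    Returns a list of (vendor, hours, slowdown) tuples, fastest first.
    """
    fastest = min(results_hours.values())
    ranked = sorted(results_hours.items(), key=lambda item: item[1])
    return [(vendor, hours, hours / fastest) for vendor, hours in ranked]


# "Undisclosed vendor" is a placeholder, not a name from the source.
results = {"Undisclosed vendor": 87, "Compaq": 7}

ranking = rank_by_runtime(results)
for vendor, hours, slowdown in ranking:
    print(f"{vendor}: {hours} h ({slowdown:.1f}x the fastest run)")
```

Vendors that cannot run the benchmark at all simply never make it into the table, which is itself a useful filter.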

  • by KangXii ( 785324 )
    Whoa, I've been thinking about going to Embry-Riddle, except the one in Daytona Beach, Florida. Maybe I should think about the Arizona one now.

"When the going gets tough, the tough get empirical." -- Jon Carroll
