Ask Slashdot: Clusters On the Cheap?

First time accepted submitter serviscope_minor writes "A friend of mine has recently started a research group. As usual with these things, she is on a shoestring budget and has computational demands. The computational task is very parallel (but implementing it on GPUs is an open research problem and not the topic of research), and very CPU bound. Can slashdotters advise on a practical way of getting really high bang for buck? The budget is about £4000 (excluding VAT/sales tax), though it is likely that the system will be expanded later. The computers will probably end up running a boring Linux distro and Sun GridEngine to manage batch processing (with home directories shared over NFS)."

  • by Anonymous Coward on Thursday September 15, 2011 @12:44AM (#37406580)

    Why waste money on building a cluster when you can rent the best in the world * by the hour * ?

  • by jpedlow ( 1154099 ) on Thursday September 15, 2011 @12:48AM (#37406606)
    AWS EC2 was my response as well. :)

    for raw horsepower in the short to medium term, use AWS http://aws.amazon.com/ec2/ [amazon.com]

    ec2 should do well for this, imho :)

  • by Goonie ( 8651 ) <robert DOT merkel AT benambra DOT org> on Thursday September 15, 2011 @12:55AM (#37406638) Homepage
    Many universities/consortia have supercomputers available on which researchers can apply for (or buy) time. For example, my university is a member of VPAC [vpac.org], which has a big-arse cluster shared between a number of institutions. She might get much better bang for buck if she uses the money for that, rather than splashing out for dedicated hardware.
  • by Chalex ( 71702 ) on Thursday September 15, 2011 @01:04AM (#37406702) Homepage

    You can get a SuperMicro reseller to sell you one workstation with 4 sockets of CPUs and a bunch of RAM. £4000 = about US$6,299.

    That buys you a box with 4 x Opteron 6134 (32 cores) and 128GB RAM (32 x 4GB sticks). And some hard disks.

  • by subreality ( 157447 ) on Thursday September 15, 2011 @01:05AM (#37406710)

    +1. It is very nice to be able to spin up 50 instances, run the hell out of your job, then delete them. It gets done faster, and you don't have to deal with maintenance, upgrades, and obsolescence. Realized you need more RAM? Just adjust it! And so on. It'll likely come out cheaper than owning your own after you add up all the hidden costs (power, cooling, space, time, etc).

    The only downside is there are no GPUs. But that's not really a downside: if you do end up developing a GPU version, your cluster configuration would completely change (1x2 cores per box, 3-4U boxes with many PCI-E slots, instead of 2x8 cores or however many you can economically cram into a 1-2U pizza box), so the investment you'd make now would be completely wrong for that future development. With cloud servers you minimize sunk costs.

    I use Rackspace Cloud [rackspace.com] and it performs as promised. It's definitely worth a look.

  • Theoretical analysis (Score:4, Informative)

    by Warlord88 ( 1065794 ) on Thursday September 15, 2011 @01:13AM (#37406742)
    OP hasn't mentioned a lot except budget. Since you are on such a tight budget, I would highly recommend doing some theoretical analysis first. Do you have a serial code? How much parallelism exists in the code? You say the task is 'very parallel', but Amdahl's law [wikipedia.org] (which is really common sense) will tell you that even for small amounts of serial sections of code, your speedup will be limited. You should also consider the amount of time the code actually runs. Achieving a speedup of 2 for a serial code that runs for one minute is near worthless.

    After you estimate speedup, do some rough calculations on the basis of the average cost of a processor and the number of processors required. This should give you an estimate of the hardware cost required. Compare that with the cost of CPU cycles per dollar you get using a cloud service such as Amazon.
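    A rough sketch of the speedup estimate described above, using the standard form of Amdahl's law. The function name and the sample parallel fractions are illustrative assumptions; the real fraction has to be measured on the actual code.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Upper bound on speedup from Amdahl's law.

    parallel_fraction: share of the serial runtime that can run in parallel (0..1).
    n_cores: number of cores the parallel part is spread across.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions; core counts taken from boxes discussed in the thread.
    for p in (0.90, 0.99, 0.999):
        for n in (32, 48, 96):
            print(f"parallel={p:.3f} cores={n:3d} speedup<={amdahl_speedup(p, n):6.1f}")
```

    Even at 95% parallel the bound never reaches 20x however many cores are bought, which is why the serial fraction is worth measuring before spending the budget.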
  • Amazon AWS. (Score:5, Informative)

    by Haven ( 34895 ) on Thursday September 15, 2011 @01:13AM (#37406744) Homepage Journal

    $1.60 / hour for the largest non-GPU cluster instance. This also provides you with rather fast interconnects and scalability with multiple instances.

    Only £4,000 in hardware would be a waste of money. You wouldn't have all that much computing power, and it would be obsolete immediately.

  • by Anonymous Coward on Thursday September 15, 2011 @01:42AM (#37406842)

    parent is correct.

    or for some more get the 4x6168: 48 cores, with the CPUs at about $770 each (~$3000), the MB around $800, and a supermicro cabinet at $800 (OR use a chenbro fileserver for $300). 8GB sticks are pretty cheap these days and almost proportional in price compared to the 4GB - get the KVR1333D3Q8R9S/8G for about $90 each, or about 90*8 = $720 for 64GB. Around $5000 in total; there are some additional costs like coolers etc. but it still won't break the budget.

    i got a similar config for doing something that is cpu intensive (branches, fp and mem intensive) and not easily portable to gpus.

    the resultant machine hardly cracks 600w, and is 4 times as fast as an i7 960 on my application.

  • by toruonu ( 1696670 ) on Thursday September 15, 2011 @04:33AM (#37407416)

    Yes, that would be my recommendation too. We do loads of LHC data analysis and simulations and have found that for real science real cores outweigh hyperthreaded ones, so we run 2x Opteron 6172 per server in a supermicro chassis that fits 4 servers into 2RU. Such a box of course costs ca. 11k EUR, but it gives 96 cores and 192GB RAM. Her budget is about half of that, so she can get about half the box: 48 cores and 96GB RAM should be doable using SM boxes, and you can scale the CPU frequency up/down to adjust the cost and maybe adjust total RAM alongside to fit in the budget. If she plans to expand later, she may actually want to spend the money on the 2U chassis with only 2 of the 4 machines present and later add one or two more by just buying the board with CPU/RAM.

  • Re:trade-off (Score:4, Informative)

    by pz ( 113803 ) on Thursday September 15, 2011 @07:53AM (#37408102) Journal

    Don't forget to add up power, cooling, sysadmin time...

    If the friend's research group is in an academic institution, power and cooling are outside of the acquisition budget, along with space, network, etc., as those are typically part of overhead. Depending on the institution, sysadmin services are too. Often the institution will even have embarrassingly large discounts with hardware and software vendors (at my institution, a licensed copy of Matlab, for example, is about $100 per seat per year).

    GBP 4000 buys a rackful of modern computers that can be run as long as you want. It can be used to explore ideas without concern for cost. In contrast, once the GBP 4000 has been paid to a cloud service, the money is gone. Given that the pressures for a new researcher are already immense (and I speak from recent first-hand experience), not worrying about running out of compute resources is worth a lot, even if it means the instantaneously available compute power is somewhat lower than what you could get from a cloud service.

    If this new research group is going to be competing for research funds, for example, then the compute resource is going to be highly utilized for the first 12-18 months to get preliminary results in order to write grants. I can't imagine that GBP 4000 is going to last long enough. Looking at Rackspace, as another poster suggested, they charge about USD 350 per month for a decent configuration (8GB RAM / 320 GB disk). That single server is going to last 18 months before the money is gone. If the memory demands of the computation aren't so large, the charges are lower, say USD 45 per month (1GB RAM / 40 GB disk), and you get to use 7 virtual machines for the same 18 months.

    Given that a highly capable system can be purchased new for USD 500, the same money gives the researcher about a dozen real machines for 18 months, and beyond (buying off-lease machines can easily double the amount of hardware). From my perspective as a researcher, there's no comparison: when money is tight, buy your own hardware and take advantage of the services provided by your institution.
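    The arithmetic behind that comparison can be sketched as follows. The prices are the ones quoted in this comment; the USD value of the budget is an assumption taken from the £4000 conversion quoted earlier in the thread, and the function names are illustrative only.

```python
# GBP 4000 converted at the rate quoted elsewhere in the thread (assumption).
BUDGET_USD = 6299.2


def months_of_rental(budget, monthly_price, instances=1):
    """How many months the budget covers for a fixed number of rented servers."""
    return budget / (monthly_price * instances)


def instances_for_duration(budget, monthly_price, months):
    """How many rented servers the budget covers for a fixed number of months."""
    return int(budget // (monthly_price * months))


def machines_purchasable(budget, unit_price):
    """How many machines can be bought outright (power and admin time not included)."""
    return int(budget // unit_price)


if __name__ == "__main__":
    print("8GB server, months:", round(months_of_rental(BUDGET_USD, 350), 1))
    print("1GB VMs for 18 months:", instances_for_duration(BUDGET_USD, 45, 18))
    print("USD 500 machines bought:", machines_purchasable(BUDGET_USD, 500))
```

    The sketch ignores power, cooling, and sysadmin time for owned machines, which the comment argues are covered by institutional overhead.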

  • Re:EC2 is expensive (Score:5, Informative)

    by TheRaven64 ( 641858 ) on Thursday September 15, 2011 @07:53AM (#37408104) Journal

    The problem is the constraints. The cheap cluster in my old department cost £100k. £4k does not buy you a lot of hardware. You will probably find a lot more lying around in the undergrad labs. For some of my work as a PhD student, that's exactly what I used - each lab had 40 machines on a GigE network and closed overnight, and for work that wasn't that latency sensitive, I could distribute it across the machines there and run it at night without anyone minding.

    If you're serious about needing a cluster, then you need to spend a lot more than £4K. If you only need a cluster for a short time, then £4K can buy you a chunk of time on someone else's hardware. Since this is the UK, they should contact the Manchester Supercomputing Centre, which provides this kind of service to UK universities at quite a reasonable price (and will also lend you people who are good at optimising code for their systems). If the university doesn't already have some clusters lying around, then you should get in contact with a few other research groups. £4K won't go very far, but if half a dozen research groups each put in £4K then that gives you enough for a reasonable cluster to share between the various users.

  • Re:trade-off (Score:4, Informative)

    by cas2000 ( 148703 ) on Thursday September 15, 2011 @05:47PM (#37414664)

    or use a 16GB or 32GB USB flash (or better yet, a small SSD - swapping to USB flash would suck) as the boot drive on most machines and have one machine (the head node) with hard disks as a file server - NFS will do for small to medium size clusters (anywhere from a handful of nodes to a few hundred nodes). The OP is going to need a head node anyway to run Slurm or Torque as the scheduler/resource-manager (yes, i have built clusters before).

    put a 2nd NIC in the head node, so the compute nodes can run on a private 192.168 network (you'll need a 24 or 48 port switch as well), and also install DHCP, tftp, and apache. Set up the last three to allow the compute nodes to netboot clonezilla. Install everything you'll need on one compute node (openmpi, libatlas, octave, R, open source and proprietary scientific software as needed, etc) and use clonezilla to mass produce the rest (this also allows you to quickly and easily add new nodes or replace failed nodes). LDAP or NIS will be needed for sharing account/auth details between machines.

    i built something quite similar to this last year (but using some sunfire 1RU opteron rackmount servers as the compute nodes)

    I'd go for an x4 CPU, they're not that much more than an x3 and the extra core is useful. 8GB RAM too (2x4GB only costs about $40). given the budget, it's probably not worth getting a custom power supply for the tray-mounted motherboards, so each will need its own dedicated PSU

    each node is going to cost somewhere around $250 (very rough estimates: $50 for the m/b, $40 for 8GB RAM, $50 CPU, $50 PSU, $60 for 32GB SSD - but possibly a fair bit cheaper as a bulk purchase), and the head node will cost roughly triple that (you'll need a case w/ hot-swap bays for the drives - a Norco 4224 is probably overkill but at well under $400 for 4RU with 24 SAS/SATA hot-swap bays, it would be hard to find a significantly cheaper case even with fewer drive bays), so for $6K you can build a cluster with 20 x 4 core compute nodes plus a good head node for the scheduler & file server. 80 compute cores for $6K. that's good, even considering that with cheap crap motherboards you'll have a noticeable failure rate. the cluster i built last year with name brand hardware cost closer to $50K. I could build a better system today (far fewer nodes with a lot more cores and RAM each), also with name brand hardware, for about $20K - $30K

    trays for the motherboards, the rack(s), and cooling will cost extra. as will licenses for any proprietary software they might need to run (could easily cost as much - or more! - as the hardware). if the OP's friend is at a university, she can probably scavenge an old rack or two from another dept, but even if she has to buy one new she could easily build 15+ compute nodes entirely within the $6K budget
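    A back-of-the-envelope version of that estimate, using the very rough part prices quoted above. The names are illustrative, and the switch, trays, rack, cooling, and software licenses are deliberately left out because the comment gives no prices for them.

```python
# Per-node part prices in USD, as roughly estimated in the comment above.
NODE_PARTS_USD = {
    "motherboard": 50,
    "ram_8gb": 40,
    "cpu_x4": 50,
    "psu": 50,
    "ssd_32gb": 60,
}
HEAD_NODE_MULTIPLIER = 3  # "roughly triple" the price of a compute node
CORES_PER_NODE = 4


def node_cost():
    """Cost of one compute node."""
    return sum(NODE_PARTS_USD.values())


def cluster_cost(n_nodes):
    """Cost of n compute nodes plus one head node."""
    return n_nodes * node_cost() + HEAD_NODE_MULTIPLIER * node_cost()


def max_nodes(budget):
    """Largest number of compute nodes that fits the budget after the head node."""
    remaining = budget - HEAD_NODE_MULTIPLIER * node_cost()
    return max(0, remaining // node_cost())


if __name__ == "__main__":
    n = 20
    print(f"{n} nodes: ${cluster_cost(n)}, {n * CORES_PER_NODE} compute cores")
    print("max nodes for $6000:", max_nodes(6000))
```

    Any money reserved for the rack, trays, or switch comes straight out of the node count, which is consistent with the more cautious "15+ compute nodes" figure.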
