Supercomputing Hardware

Ask Slashdot: Parallel Cluster In a Box? 205

QuantumMist writes "I'm helping someone accelerate an embarrassingly parallel application. What's the best way to spend $10K to $15K to get the maximum number of simultaneous threads of execution? The focus is on thread count, since memory requirements are decently low, e.g. ~512MB in memory at any given time (maybe 2 to 3X that at the very high end). I've looked at the latest Tesla card, as well as the four-Teslas-in-a-box solutions, and am having trouble justifying the markup for what's essentially 'double-precision FP being enabled, some heat improvements, and ECC, which actually decreases available memory (I recognize ECC's advantages, though).' Spending close to $11K for the four Teslas in a 1U setup seems to be the only solution at this time. GTX cards can be replaced for a fraction of the cost, so should I just stuff four or more of them in a box? Note, they don't have to pay the power/cooling bill. Amazon is too expensive at this level of performance, so the cloud via EC2 is out. Are there any parallel architectures at this price point, even for $5K more? Any good manycore offerings I've missed? E.g. somebody who can stuff a ton of ARM or other CPUs/GPUs in a server (cluster in a box)? It would be great if this could be easily addressed via PCI or another standard interface. Should I just stuff four GTX cards in a server and replace them as they die from heat? Any creative solutions out there? Thanks for any thoughts!"
This discussion has been archived. No new comments can be posted.


  • by zeldor ( 180716 ) on Saturday December 03, 2011 @01:42PM (#38250934)

    Do you or they know how to program a GPU?
    If it's really embarrassingly parallel, EC2 spot instances and GNU parallel will work quite nicely.
    But if coding changes are required, then the hardware is the least of your expenses.
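
    A minimal sketch of that fan-out (assuming each work unit is one input and `./job.sh` is a hypothetical worker script; GNU parallel may not be installed, so POSIX `xargs -P` is shown as a no-install stand-in):

```shell
# With GNU parallel (if installed): one job per input file, 8 at a time
#   parallel -j8 ./job.sh {} ::: inputs/*.dat
# Portable equivalent using POSIX xargs: 4 concurrent workers,
# one argument per invocation ($0 receives the argument)
printf '%s\n' a b c d | xargs -n1 -P4 sh -c 'echo "processed $0"'
```

    The same pattern scales past one box by pointing the job runner at spot instances instead of local cores.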
     

  • Re:AMD (Score:5, Insightful)

    by tempest69 ( 572798 ) on Saturday December 03, 2011 @02:15PM (#38251172) Journal
    Because it's new, and finding someone who's done it and can give you some pointers is really hard.
    CUDA has been around a while, so figuring it out isn't such a rough learning curve.

    Overall I'm a little suspicious of someone looking to use a GPU for more threads on a problem, since going the GPU route is a real commitment and the programming gets a new level of complicated. Using multiple cards has some odd issues in CUDA; e.g., if you exceed the card index it silently defaults to card 0 rather than crashing. There are more places to screw up with a GPU: transferring memory; organizing blocks, threads, and warps (done properly this hides all sorts of latency in calculations; done poorly it's worse than a CPU); and avoiding memory contention (the memory scheme isn't bad, but it needs to be understood).

    So in most cases I'd start with this chart http://www.cpubenchmark.net/cpu_value_available.html [cpubenchmark.net] and tell them to cut their teeth on a GPU with a smaller (cheaper) test case.
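
    The silent fall-back to card 0 mentioned above can be guarded against explicitly. A minimal sketch using the CUDA runtime API (the function name `pick_device` is illustrative, not from any library):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Refuse to run on an out-of-range device instead of silently landing on card 0.
int pick_device(int wanted) {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess || wanted < 0 || wanted >= count) {
        fprintf(stderr, "device %d unavailable (%d found)\n", wanted, count);
        return -1;  // caller should abort rather than fall through to card 0
    }
    cudaSetDevice(wanted);  // safe: index verified against the device count
    return wanted;
}
```

    Checking the index against `cudaGetDeviceCount` up front turns the quiet wrong-card behavior into a hard error you can see.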
  • Re:AMD (Score:2, Insightful)

    by Anonymous Coward on Saturday December 03, 2011 @03:51PM (#38251990)

    Why not use AMD and OpenCL?

    Sure: use two AMD 6990s, with 3072 stream processors each, for a total of 6144 ALUs per box (with double-precision FP) under OpenCL 1.1.
    Cost is about $2500 per box: $700 per card plus $1000 for a host system with a 1000W PSU.

  • Re:AMD (Score:2, Insightful)

    by Anonymous Coward on Saturday December 03, 2011 @06:20PM (#38252944)

    That's why you would use OpenCL instead. It's a bit newer and still a little rough around the edges, but it works on CPUs and GPUs, on Windows or *nix.
