QuantumMist writes: "I'm helping someone accelerate an embarrassingly parallel application. What's the best way to spend $10K to $15K to get the maximum number of simultaneous threads of execution? The focus is on thread count, since memory requirements are fairly low: roughly 512 MB resident at any given time, perhaps 2 to 3 times that at the very high end. I've looked at the latest Tesla card, as well as the four-Teslas-in-a-box solutions, and I'm having trouble justifying the markup for what's essentially 'double-precision FP enabled, some thermal improvements, and ECC, which actually reduces available memory (though I recognize ECC's advantages).' Is spending close to $11K for four Teslas in a 1U chassis really the only option right now? Since GTX cards can be replaced for a fraction of the cost, should I just stuff four or more of them in a box? Note that they don't have to pay the power/cooling bill. Amazon EC2 is too expensive for this level of performance, so the cloud is out. Are there any other parallel architectures at this price point, even for $5K more? Any good manycore offerings I've missed, e.g. somebody who can pack a ton of ARM or other CPUs/GPUs into a server (a cluster in a box)? Ideally it would connect over PCI or another standard interface. Should I just stuff four GTX cards in a server and replace them as they die from heat? Any creative solutions out there? Thanks for any thoughts!"
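For readers weighing in: "embarrassingly parallel" means every task runs independently, with no communication between tasks, so throughput scales almost linearly with the number of execution units you can buy. A minimal sketch of the pattern in Python (the `simulate` work function here is hypothetical, standing in for whatever the submitter's application actually computes):

```python
# Sketch of an embarrassingly parallel workload: each task is fully
# independent, so it can be farmed out to as many workers as the
# hardware provides with no inter-task communication or shared state.
from concurrent.futures import ProcessPoolExecutor

def simulate(seed):
    """Hypothetical independent unit of work, e.g. one Monte Carlo trial."""
    x = seed
    for _ in range(1000):
        # Simple linear congruential generator step as stand-in compute.
        x = (1103515245 * x + 12345) % (2 ** 31)
    return x % 100

if __name__ == "__main__":
    inputs = range(64)
    # Workers share nothing; scaling is limited only by worker count,
    # which is exactly why raw thread/core count is what matters here.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(simulate, inputs))
    print(len(results))
```

Because each `simulate` call touches only its own state, the same structure maps onto GPU threads (one CUDA thread per task) just as well as onto CPU cores, which is why the Tesla-vs-GTX question comes down to price, reliability, and double-precision needs rather than algorithm changes.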