Parallella: an Open Multi-Core CPU Architecture
First time accepted submitter thrae writes "Adapteva has just released the architecture and software reference manuals for their many-core Epiphany processors. Adapteva's goal is to bring massively parallel programming to the masses with a sub-$100 16-core system and a sub-$200 64-core system. The architecture has advantages over GPUs in terms of future scaling and ease of use. Adapteva is planning to make the products open source. Ars Technica has a nice overview of the project."
Re:Hmmm... (Score:2, Insightful)
A big problem here is that classical GPUs only have two kinds of I/O ports: video output and PCI Express. Neither is very good for an embedded application, unless you have a big power budget and also have a board with an x86 processor. (Unfortunately you need x86, since you need binary drivers for your GPU to get good GPGPU performance...)
Cheap or High Performance, Pick One (Score:3, Insightful)
Adapteva is creating false expectations here. Their chip won't deliver performance on par with GPUs (or CPUs, for that matter) and still be cheap. Why? Because it's not a thing that a startup can do in today's world of computing. For such a chip you need to use the latest CMOS processes and a huge team to design/optimize the ASIC (especially if it's meant to be a low-power chip), both of which are extremely costly. If it were that easy, then we'd see more competition and not Intel, AMD, Nvidia and IBM as the only global players in the HPC arena.
If you're a small startup, then you'll be bound to 100 nm processes (at best), and have to use automated layouts (not the hand-optimized ones that e.g. Intel uses). Both reduce performance and increase power consumption.
I work at the Chair for Computer Architecture at FAU. We have some of the very brightest minds working on custom chips for industry solutions. This 2D CPU matrix that Adapteva proposes is something that my colleagues played with years ago. It's a good approach and I personally believe that this will be the shape of CPUs to come. It started with the ring bus on the IBM Cell, now Intel's Nehalem has got a partitioned L3 cache connected with a... ring bus, and Intel's Xeon Phi (MIC) even got a 2D on-chip grid network. But even my colleagues concede that a) on FPGAs you'll always be trailing GPUs in floating-point performance (it's something FPGAs are particularly bad at) and b) even when designing an ASIC you'll always be beaten by GPUs in terms of performance, assuming similar prices and power consumption. Those are simply beasts, optimized down to the bone. It's the result of a multi-billion mass market. That's also the reason why there is no next IBM Cell chip for a PlayStation 4: Cell was too expensive to develop to keep up with the competition. Its market is too small compared to the ubiquitous GPUs.
For teaching parallel computing I'd always suggest a GPU. The tools are there, the performance is great, and you'll be able to use the knowledge gained in real-world projects.