Supercomputing Hardware

Parallella: an Open Multi-Core CPU Architecture 103

First time accepted submitter thrae writes "Adapteva has just released the architecture and software reference manuals for their many-core Epiphany processors. Adapteva's goal is to bring massively parallel programming to the masses with a sub-$100 16-core system and a sub-$200 64-core system. The architecture has advantages over GPUs in terms of future scaling and ease of use. Adapteva is planning to make the products open source. Ars Technica has a nice overview of the project."
This discussion has been archived. No new comments can be posted.

  • Kickstarter (Score:4, Informative)

    by Trecares ( 416205 ) on Sunday October 07, 2012 @04:00PM (#41578375)

    I checked their front page and they have a kickstarter going to fund further development.

    Might want to check it out and chip in if you're interested.

  • by viperidaenz ( 2515578 ) on Sunday October 07, 2012 @04:10PM (#41578439)
    and the architecture is also very limiting.

    16TFLOPS for $3000 or 0.09TFLOPS for $200. I'll stick to current hardware, thanks. 178x more processing power for 15x more money. I would also prefer a "super computer" that can address more than 4GB of RAM, with a memory bus wider than 64 bits. The architecture also limits the per-core local memory to 64k.
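The price/performance ratios quoted above can be sanity-checked with a quick calculation (using the figures exactly as the comment states them; they are approximate):

```python
# Sanity check of the comment's price/performance comparison.
epiphany_tflops, epiphany_cost = 0.09, 200   # 64-core Parallella board, as quoted
gpu_tflops, gpu_cost = 16.0, 3000            # commodity GPU setup, as quoted

perf_ratio = gpu_tflops / epiphany_tflops    # raw throughput advantage
cost_ratio = gpu_cost / epiphany_cost        # price premium

print(f"{perf_ratio:.0f}x the FLOPS for {cost_ratio:.0f}x the money")
```

This reproduces the "178x more processing power for 15x more money" figure, though it compares peak numbers only and says nothing about ease of programming or power draw.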
  • Parallax Propeller (Score:5, Informative)

    by Y2K is bogus ( 7647 ) on Sunday October 07, 2012 @04:14PM (#41578455)

    The Parallax Propeller is a great multi-core chip to get started with. The chip is $7.95 and has 8 cores running at 80MHz. You can pick up the Quickstart board at Radio Shack for $40, including an overpriced RS USB cable (they normally retail for $25).

    The Parallax Propeller is a much more economical way of getting started with multi-core programming. Parallax offers the PropTool, which provides SPIN and PASM language support. For C development you can get SimpleIDE which is a great IDE to get started with C programming on the Propeller, which uses a port of GCC.

  • by Anonymous Coward on Sunday October 07, 2012 @04:25PM (#41578521)

    They have released their SDK and architecture documentation, worth a read.
    Looks like an interesting platform, but the current performance does feel a bit lackluster...

  • Re:Hmmm... (Score:5, Informative)

    by viperidaenz ( 2515578 ) on Sunday October 07, 2012 @04:32PM (#41578553)
    If you've got $100 to spare, a Radeon 7750 provides over 800GFLOPS. If you've got more money a 7970 will give you 4.3TFLOPS for $550.
    A GTX650 will give you 800GFLOPS for $100, and a GTX680 will give you 3TFLOPS for $500.
  • Re:Hmmm... (Score:2, Informative)

    by Anonymous Coward on Sunday October 07, 2012 @05:03PM (#41578739)

    Total on-chip, inter-core bandwidth is 64 GBytes/sec, with 8 GBytes/sec of off-chip bandwidth.

  • by IAmR007 ( 2539972 ) on Sunday October 07, 2012 @06:48PM (#41579389)
    I agree. A 32-bit PGAS memory model is silly. Giving each core its own 32-bit address space and using MPI for communication would be much more useful. Then it could at least be a good learning tool for HPC programming techniques. Right now, it looks pretty useless.

    Even GPGPU is limited in what it can do for HPC. There's a lot more to HPC than raw mathematical power. Memory is often the bottleneck, not the FPUs. The only reason we deal with multiple processors at all is that single-core performance gains have nearly stalled. Communication between cores and processors is also very complicated, and getting good performance takes a lot more than hooking up a bunch of processors in a grid.

    For example, the supercomputer I work with has 90,112 2.3GHz cores and 90TB of RAM: 16 cores per chip in 704 blades, interconnected with a 3D torus network topology. It's the memory/cache size and speed and the network topology that make it a supercomputer. You could get the 800TFLOP/s in a much smaller package using GPUs, but the real-world performance would be drastically less.

    Even with the 64 cores Parallella could have, distributing a workload across a 64-core grid isn't easy. GPGPUs use work groups of smaller numbers of cores to make this sharing a bit easier to manage. They should have at least made the interconnect a 2D torus rather than a grid, halving the maximum path length. For stuff like quantum mechanics, a 5D torus is optimal. Memory access is the key. This is a bit like comparing apples to oranges, but that's exactly my point: the thing is not a supercomputer.
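The torus-versus-grid point above is easy to verify with a hop-count sketch (generic network math, not anything Epiphany-specific):

```python
# Network diameter (worst-case hop count) for an n x n 2D mesh vs. 2D torus.
# A mesh's worst case is corner to corner; a torus wraps each dimension,
# so no two nodes are more than n//2 hops apart per dimension.
def mesh_diameter(n):
    return 2 * (n - 1)

def torus_diameter(n):
    return 2 * (n // 2)

for n in (4, 8):
    print(f"{n}x{n}: mesh {mesh_diameter(n)} hops, torus {torus_diameter(n)} hops")
```

For an 8x8 array (64 cores) the mesh diameter is 14 hops against 8 for the torus, which is roughly the halving of the maximum path length the comment describes.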
  • Re:Hmmm... (Score:3, Informative)

    by Anonymous Coward on Sunday October 07, 2012 @07:01PM (#41579523)
    As soon as you have branches in your GPU code, the performance drops like a brick. GPUs also only work well with sequential data. What it comes down to, is GPUs only do well with matrix math.
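The divergence penalty the comment describes can be illustrated with a toy SIMD model (hypothetical costs, not tied to any real GPU): lanes in a warp share one program counter, so a warp whose lanes disagree at a branch executes both sides serially with inactive lanes masked off.

```python
# Toy model of SIMD branch divergence: a warp pays for every branch side
# that at least one of its lanes takes.
def warp_cycles(lane_predicates, then_cost, else_cost):
    takes_then = any(lane_predicates)        # some lane takes the then-branch
    takes_else = not all(lane_predicates)    # some lane takes the else-branch
    return then_cost * takes_then + else_cost * takes_else

uniform   = warp_cycles([True] * 32, then_cost=10, else_cost=10)
divergent = warp_cycles([i % 2 == 0 for i in range(32)], 10, 10)
print(uniform, divergent)   # divergent warp pays for both sides
```

In this sketch the uniform warp costs 10 cycles while the divergent one costs 20, even though each individual lane only wanted one side of the branch.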
  • Re:Kickstarter (Score:4, Informative)

    by naeger ( 136710 ) on Sunday October 07, 2012 @07:28PM (#41579781)

    I really like the parallella project. Due to its low power consumption (2 watts for the 64-core version), it is the only option to bring significant processing power to mobile devices (e.g. mobile robots/quadrocopter/drones) and would be ideally suited to implement machine vision and neural network/machine learning algorithms for those mobile devices.

    That said, their kickstarter initiative has some serious flaws:

    1. They are only offering the 16-core version for a goal of $750k. The much more interesting 64-core version is available only if a whopping $3m goal is met. Way out of reach for such a specialized interest project. And everyone who reads information about the parallella reads about the "sexy" 64-core version everywhere but can only fund the "just nice" 16-core version. From the comments it is clear: everyone wants the 64-core version.
    2. There is only one interesting pledge: $99 for the 16-core version. No addons. No extras etc.
    3. The information from Adapteva is lacking. Only today did they make the documentation available, and there are still no demos, while dozens of questions in the comments go unanswered.

    Compare this to a greatly successful campaign like the Digispark (a low-cost "mini-Arduino"): a lower, easily reachable goal; lots and lots of extras and addons developed together with and in response to the backers; and constant communication with the backers. I wanted to spend $20 on that project but ended up spending $70 because of all the addons and how responsive the team was. Digispark achieved more than 6000% of its initial goal!

    That said, what would I suggest for the Parallella kickstarter:

    1. Go for the 64-core version. Bring the goal from $3m down to, say, $1.5m by dropping the 16-core version (should save almost $1m) and taking some bank loan (if you can present >1000 backers who pay >$1.5m, that should be no problem).
    2. Offer more than just a 64-core Parallella for $199. Offer special versions at higher prices. Offer a dual-64-core version (with two Epiphanies on it). Offer a "compute cluster": a little laser-cut box with networking, a power supply, and slots for up to 8 Parallellas, and offer those clusters equipped with 1-8 Parallellas. Offer a "machine vision" Parallella with a camera sensor attached to it... and so on.
    3. Be more open and communicating with the community. Answer all questions in the comments. Put up some polls what backers want. Provide demos/tutorials etc.

    Please don't take this personally. I would really like to see this project succeed... and I want machine vision and a neural network brain for my quadrocopter (yep, world domination... that's the plan!) ;)

  • Re:Hmmm... (Score:4, Informative)

    by viperidaenz ( 2515578 ) on Sunday October 07, 2012 @07:46PM (#41579931)

    OpenCL is supported by the PowerVR GPUs, but it depends on the SoC vendor.
  • by metatheism ( 1747884 ) on Sunday October 07, 2012 @10:19PM (#41580825)
    Have a look at The Register's article for some details.

    The Epiphany core has a mere 35 instructions – yup, that is RISC alright – and the current Epiphany-IV has a dual-issue core with 64 registers and delivers 50 gigaflops per watt. It has one arithmetic logic unit (ALU) and one floating point unit and a 32KB static RAM on the other side of those registers.

    Each core also has a router with four ports that can be extended out to a 64x64 array of cores for a total of 4,096 cores. The currently shipping Epiphany-III chip is implemented in a 65 nanometer process and sports 16 cores, and the Epiphany-IV is implemented in a 28 nanometer process and offers 64 cores.

    The secret sauce in the Epiphany design is the memory architecture, which allows any core to access the SRAM of any other core on the die. This SRAM is mapped as a single address space across the cores, greatly simplifying memory management. Each core has a direct memory access (DMA) unit that can prefetch data from external flash memory.

    The initial design didn't even have main memory or external peripherals, if you can believe it, and used an LVDS I/O port with 8GB/sec of bandwidth to move data on and off the chip from processors. The 32-bit address space is broken into 4,096 1MB chunks, one potentially for each core that could in theory be crammed onto a single die if process shrinking continues.
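The single-address-space scheme described above (4,096 chunks of 1MB each, one per potential core in a 64x64 array) can be sketched as a bit-field decode. The exact field layout below -- 6 bits of row, 6 bits of column, 20 bits of local offset -- is an assumption consistent with those numbers, not a claim about Adapteva's precise encoding:

```python
# Sketch: mapping a flat 32-bit address onto a 64x64 grid of cores, each
# owning a 1MB (20-bit) local window. Assumed layout: [row:6][col:6][offset:20].
def decode(addr):
    offset = addr & 0xFFFFF            # low 20 bits: byte within the core's 1MB
    core_id = addr >> 20               # high 12 bits: which of the 4096 chunks
    row, col = core_id >> 6, core_id & 0x3F
    return row, col, offset

# Address of byte 0x100 in the local SRAM of the core at grid position (32, 8):
addr = (32 << 26) | (8 << 20) | 0x100
print(decode(addr))                    # (32, 8, 256)
```

Under this scheme any core can reach any other core's SRAM with an ordinary load or store, which is what makes the flat mapping attractive: no message-passing calls are needed for remote reads, only an address in the right 1MB window.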

"Well, it don't make the sun shine, but at least it don't deepen the shit." -- Straiter Empy, in _Riddley_Walker_ by Russell Hoban