Please create an account to participate in the Slashdot moderation system


Forgot your password?
Hardware Technology

Researchers Unveil Experimental 36-Core Chip 143

rtoz writes The more cores — or processing units — a computer chip has, the bigger the problem of communication between cores becomes. For years, Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, has argued that the massively multicore chips of the future will need to resemble little Internets, where each core has an associated router, and data travels between cores in packets of fixed size. This week, at the International Symposium on Computer Architecture, Peh's group unveiled a 36-core chip that features just such a "network-on-chip." In addition to implementing many of the group's earlier ideas, it also solves one of the problems that has bedeviled previous attempts to design networks-on-chip: maintaining cache coherence, or ensuring that cores' locally stored copies of globally accessible data remain up to date.
This discussion has been archived. No new comments can be posted.

Researchers Unveil Experimental 36-Core Chip

Comments Filter:
  • by SirDrinksAlot ( 226001 ) on Monday June 23, 2014 @09:09AM (#47297289) Journal

    So what's special about this chip that Intel's Xeon Phi (first demonstrated in 2007 as Knights Landing with 80 or so cores) isn't already doing? Or is this just a rehash of 7 year old technology that's already in production? It sounds like a copy/paste of Intel's research.

    "Intel's research chip has 80 cores, or "tiles," Rattner said. Each tile has a computing element and a router, allowing it to crunch data individually and transport that data to neighboring tiles." - Feb 11, 2007

  • by Trepidity ( 597 ) <> on Monday June 23, 2014 @09:24AM (#47297359)

    Yes, as usual, the MIT press release oversells the research, while the original paper [pdf] [] is a bit more careful in its claims. The paper makes clear that the novel contribution isn't the idea of putting "little internets" (as the press release calls them) on a chip, but acknowledges that there is already a lot of research in the area of on-chip routing between cores. The paper's contribution is to propose a new cache coherence scheme which they claim has scalability advantages over existing schemes.

  • Interesting (Score:4, Informative)

    by Virtucon ( 127420 ) on Monday June 23, 2014 @09:31AM (#47297395)

    Cache coherency has been one of the banes of multicore architecture for years. It's nice to see a different approach but chip manufacturers are already getting high performance results without introducing additional complexity. The Oracle (Sun) Sparc T5 [] architecture has 16 cores with 128 threads running at 3.6Ghz. It gives a few more years to Solaris at least but it's still a hell of a processor. For you Intel fans the E7-2790 v2 [] sports 15 cores with 30 threads with a 37.5MB cache so they're doing something right because it screams and is capable of 85GB/s memory throughput.

    I'm sure the chip architects are looking at this research but somehow I think they're already ahead of the curve because these kinds of cores/threads are jumps ahead of where we were just a few years ago. Anybody remember the first Pentium Dual Core [] and The UltraSparc T1 []?

  • by Trepidity ( 597 ) <> on Monday June 23, 2014 @09:39AM (#47297429)

    The basic idea isn't new. What the paper is really claiming is new is their particular cache coherence scheme, which (to quote from the Conclusion) "supports global ordering of requests on a mesh network by decoupling the message delivery from the ordering", making it "able to address key coherence scalability concerns".

    How novel and useful that is I don't know, because it's really a more specialist contribution than the headline claims, to be evaluated by people who are experts in multicore cache coherence schemes.

  • by TheRaven64 ( 641858 ) on Monday June 23, 2014 @01:00PM (#47298847) Journal

    The core count isn't the interesting thing about this chip. The cores themselves are pretty boring off-the-shelf parts too. I was at the ISCA presentation about this last week and it's actually pretty interesting. I'd recommend reading the paper (linked to from the press release) rather than the press release, because the press release is up to MIT's press department's usual standards (i.e. completely content-free and focussing on totally the wrong thing). The cool stuff is in the interconnect, which uses the bounded latency of the longest path multiplied by single-cycle one-hop delivery times to define an ordering, allowing you to implement a sequentially consistent view of memory relatively cheaply.

    Since I'm here, I'll also throw out a plug for the work we presented at ISCA, The CHERI capability model: Revisiting RISC in an age of risk []. We've now open sourced (as a code dump, public VCS coming soon) our (64-bit) MIPS softcore, which is the basis for the experimentation in CHERI. It boots FreeBSD and there are a few sitting around the place that we can ssh into and run. This is pretty nice for experimentation, because it takes about 2 hours to produce and boot a new revision of the CPU.

  • by enriquevagu ( 1026480 ) on Monday June 23, 2014 @02:14PM (#47299371)

    Some knowledge about multicore cache coherence here. You are completely right, Slashdot's summary does not introduce any novel idea. In fact, a cache-coherent mesh-based multicore system with one router associated to each core was presented on the market years ago by a startup from MIT, Tilera []. Also, the article claims that today's cores are connected by a single shared bus -- that's far outdated, since most processors today employ some form of switched communication (an arbitrated ring, a single crossbar, a mesh of routers, etc).

    What the actual ISCA paper [] presents is a novel mechanism to guarantee total ordering on a distributed network. Essentially, when your network is distributed (i.e., not a single shared bus, basically most current on-chip network) there are several problems with guaranteeing ordering: i) it is really hard to provide a global ordering of messages (like a bus) without making all messages cross a single centralized point which becomes a bottleneck, and ii) if you employ adaptive routing, it is impossible to provide point-to-point ordering of messages.

    Coherence messages are divided in different classes in order to prevent deadlock. Depending on the coherence protocol implementation, messages of certain classes need to be delivered in order between the same pair of endpoints, and for this, some of the virtual networks can require static routing (e.g. Dimension-Ordered Routing in a mesh). Note a "virtual network" is a subset of the network resources which is used by the different classes of coherence messages to prevent deadlock. This is a remedy for the second problem. However, a network that provided global ordering would allow for potentially huge simplifications of the coherence mechanisms, since many races would disappear (the devil is in the details), and a snoopy mechanism would be possible -- as they implement. Additionally, this might also impact the consistency model. In fact, their model implements sequential consistency, which is the most restrictive -- yet simple to reason about -- consistency model.

    Disclaimer: I am not affiliated with their research group, and in fact, I have not read the paper in detail.

"My sense of purpose is gone! I have no idea who I AM!" "Oh, my God... You've.. You've turned him into a DEMOCRAT!" -- Doonesbury