Multicore Chips As 'Mini-Internets'

An anonymous reader writes "Today, a typical chip might have six or eight cores, all communicating with each other over a single bundle of wires, called a bus. With a bus, only one pair of cores can talk at a time, which would be a serious limitation in chips with hundreds or even thousands of cores. Researchers at MIT say cores should instead communicate the same way computers hooked to the Internet do: by bundling the information they transmit into 'packets.' Each core would have its own router, which could send a packet down any of several paths, depending on the condition of the network as a whole."
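
(For the curious: the adaptive, congestion-aware routing the summary describes can be sketched in a few lines of C. This is a toy illustration under assumed conventions, a 2D mesh with four output ports and queue length as the congestion signal; it is not MIT's actual design.)

    #include <stdio.h>

    #define NPORTS 4  /* +x, -x, +y, -y links on an assumed 2D mesh */

    struct packet { int dst_x, dst_y; };  /* destination core coordinates */

    /* Hypothetical per-core router: among the ports that make progress
       toward the destination, pick the one with the shortest queue,
       i.e. route "depending on the condition of the network". */
    static int route(const struct packet *p, int x, int y,
                     const int qlen[NPORTS]) {
        int useful[NPORTS] = { p->dst_x > x, p->dst_x < x,
                               p->dst_y > y, p->dst_y < y };
        int best = -1;
        for (int i = 0; i < NPORTS; i++)
            if (useful[i] && (best < 0 || qlen[i] < qlen[best]))
                best = i;
        return best;  /* -1 means the packet is already home */
    }

    int main(void) {
        struct packet p = { 3, 1 };            /* headed for core (3,1) */
        int queues[NPORTS] = { 5, 0, 1, 2 };   /* current congestion */
        printf("forward out port %d\n", route(&p, 1, 1, queues));
        return 0;
    }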
This discussion has been archived. No new comments can be posted.


  • by Anonymous Coward on Tuesday April 10, 2012 @10:34PM (#39640323)

    This technology that networks different cores could also serve another purpose: preventing damage from core failure, and diagnosing such failures. If the cores are connected to other cores, the same data can be processed by bypassing a damaged core, making overheating or manufacturing problems still important, but almost treatable. Who knows, cores might even become replaceable.

    • What are the chances you damage the chip without damaging enough of it to render the whole thing inoperable?
      • by Osgeld ( 1900440 ) on Tuesday April 10, 2012 @11:19PM (#39640611)

        pretty good. A few years ago I ran for months on a dual core with one core blown out; it worked fine until I fired up something that used both, then it would die.

        • by Electricity Likes Me ( 1098643 ) on Tuesday April 10, 2012 @11:31PM (#39640681)

          Also this is exactly what chip makers already do to a great extent: the binning of CPUs by speed is not a targeted process. You make a bunch of chips, test them, and then sell them at whatever clock speed they are robustly stable at.

          • by Osgeld ( 1900440 )

            yep, it's also why overclocking is possible/popular. "Robustly stable" and "stable" are 2 different things, depending on where the chips end up and the testing tolerances. That 2.5GHz chip may run at 2.7GHz just fine and dandy, but go out of spec with regards to voltage or temperature, even by a little.

            you don't want Dell refusing a gigantic pile of chips because of a few bad products, triggering a quality alert, which is very costly and time consuming for both parties

          • by Joce640k ( 829181 ) on Wednesday April 11, 2012 @02:30AM (#39641443) Homepage

            Also this is exactly what chip makers already do to a great extent: the binning of CPUs by speed is not a targeted process. You make a bunch of chips, test them, and then sell them at whatever clock speed they are robustly stable at.

            Nope. The markings on a chip do NOT necessarily indicate what the chip is capable of.

            Chips are sorted by ability, yes, but many are deliberately downgraded to fill incoming orders for less powerful chips. Bits of them are disabled/underclocked even though they passed all stability tests, simply because that's what the day's incoming orders were for.

            • by TheLink ( 130905 )
              Also depends on how competitive the market is. Currently AMD isn't a strong competitor, so Intel can do stuff like release software-upgradeable CPUs. So it's no surprise if many recent Intel CPUs can be overclocked significantly. Seems like we're back in the days of the 50% overclock (anyone remember the Celeron 300A?). Even Intel is officially selling overclockable CPUs.
      • by morgauxo ( 974071 ) on Wednesday April 11, 2012 @08:05AM (#39642903)
        Years ago I had a single-core chip with a damaged FPU. It took me forever to figure out the problem: my computer could only run Gentoo. Windows and Debian, both of which it had run previously, gave me all sorts of weird errors I had never seen before. I had to keep using it because I was in college and didn't have money for another one, so I just got used to Gentoo. Even in Gentoo, anything which wasn't compiled from scratch was likely to crash in weird ways (a clue). I finally diagnosed the problem a couple of years later when a family member gave me a disk that boots up and runs all sorts of tests on the hardware. It turned out Gentoo worked because when software was compiled, the build recognized the lack of an FPU and compiled in floating-point emulation, as if it were dealing with an old 486SX chip.

        So, anyway, if that can happen I would imagine damaging a single core of a multicore chip is quite possible.
    • by AdamHaun ( 43173 ) on Tuesday April 10, 2012 @11:23PM (#39640643) Journal

      This sort of technology already exists to an extent. TI's Hercules TMS570 [ti.com] microcontrollers have two CPUs that run in lockstep along with a bus comparison module. I think full fault tolerance might take three CPUs, but this provides strong hardware fault detection in addition to the usual ECC and other monitoring/correction stuff.

      Note that run-time fault tolerance is mostly needed for safety-critical systems. The customers who buy these products do not do so to get better yield; they do so to guarantee that their airbags, anti-lock brakes, or medical devices won't kill anyone. As such, manufacturing quality is very high. Also, die size is significantly larger than comparable general-market (non-safety) devices. This means they cost a small fortune. The PC equivalent would be MLC vs. SLC SSDs. Consumer products usually don't waste money on that kind of reliability unless they need it. Now a super-expensive server CPU, maybe...

      [Disclaimer: I am a TI employee, but this is not an official advertisement for TI. Do not use any product in safety-critical systems without contacting the manufacturer, or at least a good lawyer. I am not responsible for damage to humans, machinery, or small woodland creatures that may result from improper use of TI products.]

      • by Joce640k ( 829181 ) on Wednesday April 11, 2012 @02:51AM (#39641483) Homepage

        This sort of technology already exists to an extent. TI's Hercules TMS570 [ti.com] microcontrollers have two CPUs that run in lockstep along with a bus comparison module. I think full fault tolerance might take three CPUs....

        This is just to detect when an individual CPU has failed. To build a fault-tolerant system you need multiple CPUs.

        nb. The 'three CPUs' thing isn't done for detection of hardware faults; it's for software faults. The idea is to get three different programmers to write three different programs with a specified output. You then compare the outputs of the programs, and if one is different it's likely to be a bug.

        • nb. The 'three CPUs' thing isn't done for detection of hardware faults; it's for software faults.

          ...although it will detect non-catastrophic hardware faults as well, obviously.

        • by Thiez ( 1281866 )

          > nb. The 'three CPUs' thing isn't done for detection of hardware faults; it's for software faults. The idea is to get three different programmers to write three different programs with a specified output. You then compare the outputs of the programs, and if one is different it's likely to be a bug.

          Why would you need three CPUs when you can just have three threads that run on any number of CPUs?

        • nb. The 'three CPUs' thing isn't done for detection of hardware faults; it's for software faults. The idea is to get three different programmers to write three different programs with a specified output. You then compare the outputs of the programs, and if one is different it's likely to be a bug.

          Yes it is. Specifically, you need three not only to detect that one is misbehaving, but also to determine which one it is. That is, if you can trust your comparison node. If you cannot, then in general you need a minimum of 3n+1 nodes to detect 'n' nodes misbehaving, given a Byzantine failure formulation. (That's why the Space Shuttle had 4 primary flight control computers all running the same software. And a fifth one that didn't, but that was different.) Many systems, e.g. in telecoms, s
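
          (A minimal sketch of the voting idea in C; the names are hypothetical, real TMR voters live in hardware, and a real system votes continuously, not once:)

            #include <stdio.h>

            /* 2-of-3 majority voter: returns the value at least two replicas
               agree on, or -1 if all three disagree (no majority). */
            static int vote3(int a, int b, int c, int *fault) {
                *fault = !(a == b && b == c);  /* any mismatch flags a fault */
                if (a == b || a == c) return a;
                if (b == c) return b;
                return -1;                     /* Byzantine-style: no two agree */
            }

            int main(void) {
                int fault;
                int v = vote3(42, 42, 41, &fault);       /* one replica off by one */
                printf("voted=%d fault=%d\n", v, fault); /* voted=42 fault=1 */
                return 0;
            }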

  • This would work perfectly with a series of (very small) tubes.
  • way back machine (Score:5, Insightful)

    by Anonymous Coward on Tuesday April 10, 2012 @10:36PM (#39640341)

    I guess MIT has forgotten about the Transputer....

  • by GumphMaster ( 772693 ) on Tuesday April 10, 2012 @10:37PM (#39640353)

    I started reading and immediately had flashbacks to the Transputer [wikipedia.org]

    • by tibit ( 1762298 ) on Tuesday April 10, 2012 @10:46PM (#39640403)

      Alive and well as XMOS [xmos.com] products. I love those chips.

      • Yep, thought of XMOS immediately when I saw the title. 16 quad-core CPUs linked together in a 4D hypercube: https://www.xmos.com/products/development-kits/xmp-64 [xmos.com]
        • by tibit ( 1762298 )

          I don't see an immediate use for the hypercube, but the individual 1-, 2- and 4-core chips are phenomenal for implementing realtime Ethernet devices, such as IEEE-1588 switches, realtime industrial Ethernet protocols, etc. It's not hard to make a very low latency timestamping switch using one of these. The hardware-assisted serialization, deserialization, and time-triggered sampling and update of ports lets you be quite creative, because it decouples the timing of the I/O from the timing of the software. There are many a

          • Minor correction: It's 64KB of memory per core. There are also software libraries for interfacing with an external SRAM chip, but you need to use something like two 16-bit ports (or a 16 and an 8 for lower capacity chips) and a few 1-bit ports.
            • by tibit ( 1762298 )

              Good catch, I forgot to say I meant it per core (a core has up to 8 threads running on it). The "libraries" for SRAM are an overstatement; you need a dozen or two lines of XC for async SRAM, and maybe 2-3x that for a synchronous one, even if you want it running in a separate thread and communicating via a channel with other threads. It's a good tutorial exercise, if one needs a tutorial that is.

              You're free to use a 4-bit port for SRAM control, of course, and it'll be sufficient for async SRAM. For sync SRAM y

    • Or, more recently, Intel's many-core prototypes used this. At the very least, the "Single-Chip Cloud Computer" used a mesh network, and I think Larrabee had such a thing as well...

    • by jd ( 1658 ) <imipakNO@SPAMyahoo.com> on Tuesday April 10, 2012 @11:46PM (#39640771) Homepage Journal

      The Transputer was a brilliant design. Intel came up with a next-gen variant, called the iWarp, but never did anything with it and eventually abandoned the concept.

      IIRC, each Transputer had four serial lines where each could be in transmit or receive mode. They each had their own memory management (16K on-board, extendable up to 4 gigs - it was a true 32-bit architecture) so there was never any memory contention. Arrays of thousands of Transputers, arranged in a Hypercube topology, were developed and could out-perform the Cray X-MP at a fraction of the cost.

      Having a similar communications system in modern CPUs would certainly be doable. It would have the major benefit over a bus in that it's a local communications channel so you always have maximum bandwidth. Having said that, a switched network would have fewer interconnects and be simpler to construct and scale since the switching logic is isolated and not part of the core. You can also multicast and anycast on a switched network - technically doable on the Transputer but not trivial. Multicasting is excellent for MISD-type problems (multi-instruction, single-data) since you can have the instructions in the L1 cache and then just deliver the data in a single burst to all applicable cores.

      (Interestingly, although PVM and MPI support collective operations of this kind, they're usually done as for loops, which - by definition - means your network latency goes up with the number of processes you send to. Since collective operations usually end in a barrier, even the process you first send to has this extra latency built into it.)
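
      (A minimal MPI/C sketch of the difference: the for loop is the naive pattern described above, with an O(N) critical path at the root, while the library collective is free to use a tree and cut that to O(log N).)

        #include <mpi.h>

        int main(int argc, char **argv) {
            int rank, size, data = 0;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            /* Naive "collective": root sends point-to-point in a for loop,
               so the last receiver waits behind size-2 earlier sends. */
            if (rank == 0) {
                data = 42;
                for (int i = 1; i < size; i++)
                    MPI_Send(&data, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            } else {
                MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            }

            /* The real collective: implementations may use a tree, or
               hardware multicast where it exists. */
            MPI_Bcast(&data, 1, MPI_INT, 0, MPI_COMM_WORLD);

            MPI_Barrier(MPI_COMM_WORLD);  /* the barrier everyone pays for */
            MPI_Finalize();
            return 0;
        }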

      It's also arguable that it would be better if the networking in the CPU was compatible with the networking on the main bus since this would mean core-to-core communications across SMP would not require any translation or any extra complexities in the support chips. It would also mean CPU-to-GPU communications would be greatly simplified.

    • Re: (Score:3, Interesting)

      by 91degrees ( 207121 )
      My Computer Architecture lecturer at University was David May - lead architect for the Transputer. Our architecture notes consisted of a treatise on transputer design.

      Now that multi-processor is becoming standard, it's interesting to see the same problems being rediscovered, and often the same solutions reinvented. Their next problem will be contention between two cores that happen to be running processes that require a lot of communication. Inmos had a simple solution to this one as well.

      Rather a
      • The reason they weren't a huge success was because nobody had found a need for them yet

        It was more the fact that processors at the time kept getting faster. The number of transistors doubled every 12-18 months, and this translated to at least a doubling in performance. As with other massively parallel systems, you needed to rewrite your software to take advantage of it, while you could just wait a year and your single-threaded system got faster. This is why multicore is suddenly interesting: chip designers have run out of obvious (and even not-so-obvious) things to do with extra transistor

  • by keekerdc ( 2504208 ) on Tuesday April 10, 2012 @10:38PM (#39640359)
    Ah, you're clever; but it's internets all the way down.
    • It is lolcats all the way down, in a pool of porn with an essence of "Me too" posts.

      Anyway, I think the original poster needs to read up on what the Internet is. It is a network of networks. A number of CPUs networked together is just a network. If you could mix many different systems together, it would be an Internet.

      If you could put an Intel CPU next to an AMD one and they would just work together seamlessly, THAT would be an Internet.

      • I think the original poster needs to read up on what the Internet is

        i think armchair experts are really just wankers with big hats

  • Say what? (Score:2, Insightful)

    by Anonymous Coward

    Errr... the internal "bus" between cores on modern x86 chips already is either a ring of point to point links or a star with a massive crossbar at the center.

    • Re:Say what? (Score:5, Interesting)

      by hamjudo ( 64140 ) on Tuesday April 10, 2012 @10:51PM (#39640437) Homepage Journal

      Errr... the internal "bus" between cores on modern x86 chips already is either a ring of point to point links or a star with a massive crossbar at the center.

      The researchers can't be this far removed from the state of the art, so I am hoping that it is just a really badly written article. I hope they are comparing their newer research chips with their own previous generation of research chips. Intel and AMD aren't handing out their current chip designs to the universities, so many things have to be re-invented.

      • Re:Say what? (Score:4, Insightful)

        by TheRaven64 ( 641858 ) on Wednesday April 11, 2012 @06:06AM (#39642163) Journal

        The researchers can't be this far removed from the state of the art

        They aren't. The way this works is a conversation something like this:

        MIT PR: We want to write about your research, what do you do?
        Researcher: We're looking at highly scalable interconnects for future manycore systems.
        MIT PR: Interconnects? Like wires?
        Researcher: No, the way in which the cores on a chip communicate.
        MIT PR: So how does that work?
        Researcher: {long explanation}
        MIT PR: {blank expression}
        Researcher: You know how the Internet works? With packet switching?
        MIT PR: I guess...
        Researcher: Well, kind-of like that.
        MIT PR: Our researchers are putting the Internet in a CPU!!1!111eleventyone

    • What AC said. It's the one and only comment on this story you need to read.

  • by ArchieBunker ( 132337 ) on Tuesday April 10, 2012 @10:46PM (#39640409)

    ccNUMA?

    • Re:Sounds like... (Score:5, Interesting)

      by jd ( 1658 ) <imipakNO@SPAMyahoo.com> on Wednesday April 11, 2012 @12:02AM (#39640871) Homepage Journal

      For low-level ccNUMA, you'd want three things:

      • A CPU network/bus with a "delay tolerant protocol" layer and support for tunneling to other chips
      • An MTU-to-MTU network/bus which used a compatible protocol to the CPU network/bus
      • MTUs to cache results locally

      If you were really clever, the MTU would become a CPU with a very limited instruction set (since there's no point re-inventing the rest of the architecture and external caching for CPUs is better developed than external caching for MTUs). In fact, you could slowly replace a lot of the chips in the system with highly specialized CPUs that could communicate with each other via a tunneled CPU network protocol.

  • And then each router, which is a processing unit in its own right, could have multiple cores, which would exhibit the same drawbacks... until you put a network of processors inside that!

    • by ExploHD ( 888637 )

      And then each router, which is a processing unit in its own right, could have multiple cores, which would exhibit the same drawbacks... until you put a network of processors inside that!

      We need to go deeper!

  • Sounds like history... the history of the Hub in LAN technology.

    Maybe it's time to move to a Switch, that can keep multiple core-pairs communicating simultaneously.

    • by Osgeld ( 1900440 )

      I still think switches on tiny low-traffic networks are a silly notion, though now that the cost of switches is insignificant (and when was the last time you saw a hub for sale?) I just go with the flow.

      Back in the day we had a client who dumped the hubs in each branch for (at the time) much more expensive switches, then whined that there was no advantage. I replied: you insisted on putting your 2 386s and a dot-matrix printer on it, and even threatened to take your biz elsewhere; you got what you wanted, enj

      • Switches are much better when any two or more hosts on the network can use a significant percentage of the total bandwidth at once. Since just about every device on a modern network can transfer at a full 100Mbps easily (at least until memory buffers fill or empty on the slowest device), a hubbed network would behave terribly. WiFi kind of works in the same way as the collision domain on a hub, and you see this reflected in the raw throughput between hosts.

        The other thing about a hub is it sends all traffic to all ho

        • by Osgeld ( 1900440 )

          I agree to a point; in the 2000s people were going apeshit for switches, for what? 2-3 computers on a 768k internet connection?

          Heck, even today that's all I really use (OK, maybe 4 computers on a 10 meg connection, but not all at the exact same time). It's actually rare that I transfer mass amounts of data over my home network; frankly it's just faster to pop in a 120 gig hard disk and make backups rather than slog through the network when I have to do a serious full backup of personal data.

          but heck I was talkin

  • Buses are so '90s (Score:5, Informative)

    by rrohbeck ( 944847 ) on Tuesday April 10, 2012 @10:54PM (#39640447)

    AMD uses HT and Intel has its ring bus, both of which use point-to-point links. Buses have serious trouble with the impedance jumps at the taps and clock skew between the lines; that's why nobody is using them in high-speed applications any more. Even the venerable SCSI and ATA buses went the way of the dodo. The only bus I can see in my system is DDR3 (and I think that will go away with DDR4 due to the same problems).

    • Bus? That is so 70's and 80's!

      What about the crossbar switch? They were in fashion in the 90's and are pretty much the core architecture of any multi CPU system.

      Next they'll be saying you can have multiple users on the same computer!!

      • by tamyrlin ( 51 )

        Actually, even the first computers used buses. For example the Z3, which was built in the early 40s, used buses to transport data. (In fact, the Z3 architecture was very advanced for its time, and it is much closer to a modern simple processor than, for example, ENIAC.)

        Regarding the article summary, I would note that it is not only researchers from MIT who say that a network-on-chip (NoC) is a promising concept for the future of chip design. Almost every researcher I've talked to seems to agree that NoCs of

  • by solidraven ( 1633185 ) on Tuesday April 10, 2012 @11:06PM (#39640521)
    That's just plain inefficient use of silicon area. They want to waste some of that limited space on additional logic that isn't strictly necessary, and it will create a significant bottleneck. Did they forget about DMA controllers or something? You already need a DMA controller no matter what, and it's perfectly capable of accessing the necessary memories as it is. Adding some extra capabilities to the DMA controller would be far more efficient in logic area and would most likely lead to better performance than this bad idea.
    • by Theovon ( 109752 )

      Silicon AREA is cheap, and it's getting cheaper. Today's processors dedicate half their die space to CACHE. Transistors per die, cores per die, and transistors per core are all increasing at (different) exponential rates. And with power density increasing at a quadratic rate, we're already facing the dark silicon problem, where if we power on the entire chip at nominal voltage, we have trouble delivering the power, and we can't dissipate the heat.

      With 16 cores, a bus is tolerable. At 64, it's a liabilit

      • by tlhIngan ( 30335 )

        Silicon AREA is cheap, and it's getting cheaper. Today's processors dedicate half their die space to CACHE. Transistors per die, cores per die, and transistors per core are all increasing at (different) exponential rates. And with power density increasing at a quadratic rate, we're already facing the dark silicon problem, where if we power on the entire chip at nominal voltage, we have trouble delivering the power, and we can't dissipate the heat.

        Actually, no.

        Silicon area is *extremely* expensive. The large

      • That's where you're wrong: the cost per area is actually increasing significantly. It is indeed true that the cost per transistor is decreasing (at the moment at least). But Intel and AMD, for example, want performance, so they're willing to trade in significant portions of area for an increase in speed. So the cost of your average desktop processor should actually keep increasing; something that luckily hasn't been happening considerably compared to the increasing costs of the new lithography machines
  • because data that is chopped up, formatted, and sent down a narrow serial pipe is so much faster than data sent directly over a parallel link. And besides, no: a TYPICAL chip has 2 to 4 cores; 6-8 would imply a higher-end chip that currently is quite expensive and not in TYPICAL use by TYPICAL people.

    MIT please get out of the dreams lab once in a while

    • No "typical" consumer chip 10 years ago had even 4 cores.

      • by Osgeld ( 1900440 )

        who said anything about 10 years ago? and do you think in 10 years we will have typical consumer machines with "chips with hundreds or even thousands of cores"?

        in 10 years we will honestly be lucky to have serious machines with "hundreds or even thousands of cores" on the same plane and not strung together with networking.

        • What are you even referring to?

          Your OP was implying this is all garbage because 6-8 cores is a high-end chip, not a "typical" one.

          Yet 10 years is not a long time - in the past decade 4 would've been a high-end chip, and before that having 2 physical processors would've been significant as well.

          So I would think there is in fact a great deal of importance to this kind of work, seeing as how the number of cores per chip in consumer items has grown and grown. And then you undermine your own point by implying

          • by Osgeld ( 1900440 )

            what are you replying to? nowhere does it state "in 10 years"

            here just in case you missed it, the very first sentence of the headline

            ""Today, a typical chip might have six or eight cores, all communicating with each other over a single bundle of wires, called a bus"

            in case you missed it again let me point it out to you TODAY, A TYPICAL CHIP MIGHT HAVE SIX OR EIGHT CORES

    • by tamyrlin ( 51 )

      > MIT please get out of the dreams lab once in a while

      Actually, no chip designer wants to use a network-on-chip if they can avoid it, due to the added complexity. However, for future SoC designs with hundreds of modules it will simply not be efficient to have direct parallel links between every module on the chip. A network will in many cases therefore be the best trade-off between silicon area, bandwidth, and energy efficiency.

      Also, note that a typical SoC used in for example a mobile phone already have s

  • I admit that despite being a technical user, I was not aware that only 2 chips are allowed to "talk" at a given time. I had (erroneously, it would seem) assumed that in order for a 3+-core chip to be fully useful, such a switch/router would have to already be in place.

    So, have Intel, AMD, and others simply tricked us into thinking that a 3+-core chip can actively use all its cores at once (as is the natural assumption), or am I misinterpreting something? If they have, why on earth didn't they include a "r

    • Re: (Score:2, Informative)

      by Anonymous Coward

      You are misinterpreting it. The chips CAN work independently. It is only when one needs to talk to another or use a shared resource (hard drive, main memory, network) that this becomes a potential issue. It is like a family of three sharing a single bathroom - not such a big deal. Bump that up to 20 using the same bathroom, and you start having serious issues.

      • by DaneM ( 810927 )

        OK, I see. Thanks for the clarification. (Why post such an intelligent remark as Anonymous Coward?) This being an issue concerned only with shared resources seems to make the lack of concurrent interaction less of an issue, but as with your family/bathroom analogy, it will (predictably) become a major problem as the number of cores/processors in a system continues to increase.

        So, while I yet wonder why this hasn't already been thought-of and solved, I can see that it hasn't been a place that a (typically

    • by Forever Wondering ( 2506940 ) on Wednesday April 11, 2012 @01:16AM (#39641205)

      I admit that despite being a technical user, I was not aware that only 2 chips are allowed to "talk" at a given time. I had (erroneously, it would seem) assumed that in order for a 3+-core chip to be fully useful, such a switch/router would have to already be in place.

      For [most] current designs, Intel/AMD have multilevel cache memory. The cores run independently and fully in parallel and if they need to communicate they do so via shared memory. Thus, they all run full bore, flat out, and don't need to wait for each other [there are some exceptions--read on]. They have cache snoop logic that keeps them up-to-date. In other words, all cores have access to the entire DRAM space through the cache hierarchy. When the system is booted, the DRAM is divided up (so each core gets its 1/N share of it).

      Let's say you have an 8 core chip. Normally, each program gets its own core [sort of]. Your email gets a core, your browser gets a core, your editor gets one, etc. and none of them wait for another [unless they do filesystem operations, etc.] Disjoint programs don't need to communicate much usually [and not at the level we're talking about here].

      But, if you have a program designed for heavy computation (e.g. video compression or transcoding), it might be designed to use multiple cores to get its work done faster. It will consist of multiple sections (e.g. processes/threads). If a process/thread so designates, it can share portions of its memory space with other processes/threads. Each thread takes input data from a memory pool somewhere, does some work on it, and deposits the results in a memory output pool. It then alerts the next thread in the processing "pipeline" as to which memory buffer it placed the result. The next thread does much the same. x86 architectures have some locking primitives to assist this. It's a bit more complex than that, but you don't need a "router". If the multicore application is designed correctly, any delays for sync between pipeline stages occur infrequently and are on the order of a few CPU cycles.
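
      (A minimal C/pthreads sketch of that hand-off: a one-slot pool guarded by a mutex and condition variable. Real pipelines use ring buffers and the x86 locking primitives mentioned above; the names here are illustrative.)

        #include <pthread.h>
        #include <stdio.h>

        /* One-slot "memory pool" shared by two pipeline stages. */
        static int slot;
        static int slot_full = 0;
        static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
        static pthread_cond_t ready = PTHREAD_COND_INITIALIZER;

        static void *stage1(void *arg) {       /* producer: fills the pool */
            for (int i = 0; i < 4; i++) {
                pthread_mutex_lock(&lock);
                while (slot_full) pthread_cond_wait(&ready, &lock);
                slot = i * i;                  /* the "work" */
                slot_full = 1;
                pthread_cond_signal(&ready);   /* alert the next stage */
                pthread_mutex_unlock(&lock);
            }
            return NULL;
        }

        static void *stage2(void *arg) {       /* consumer: drains the pool */
            for (int i = 0; i < 4; i++) {
                pthread_mutex_lock(&lock);
                while (!slot_full) pthread_cond_wait(&ready, &lock);
                printf("stage2 got %d\n", slot);
                slot_full = 0;
                pthread_cond_signal(&ready);
                pthread_mutex_unlock(&lock);
            }
            return NULL;
        }

        int main(void) {
            pthread_t t1, t2;
            pthread_create(&t1, NULL, stage1, NULL);
            pthread_create(&t2, NULL, stage2, NULL);
            pthread_join(t1, NULL);
            pthread_join(t2, NULL);
            return 0;
        }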

      This works fine up to about 16-32 cores. Beyond that, even the cache becomes a bottleneck. Or, consider a system where you have a 16-core chip (all on the same silicon substrate). The cache works fine there. But now suppose you want to have a motherboard that has 100 of these chips on it. That's right--16 cores/chip X 100 chips for a total of 160 cores. Now, you need some form of interchip communication.

      x86 systems already have this in the form of HyperTransport (AMD) or the PCI Express bus (Intel) [there are others as well]. PCIe isn't a bus in the classic sense at all. It functions like an onboard store-and-forward point-to-point routing system with guaranteed packet delivery. This is how a SATA host adapter communicates with DRAM (via a PCIe link). Likewise for your video controller. Most current systems don't need to use PCIe beyond this (e.g. to hook up multiple CPU chips) because most desktop/laptop systems have only one chip (with X cores in it). But in the 100-chip example you would need something like this, and HT and PCIe already do something similar. Intel/AMD are already working on enhancements to HT/PCIe as needed. Actually, Intel [unwilling to just use HT] is pushing "QuickPath Interconnect" or QPI.

      • by DaneM ( 810927 )

        Thanks for the enlightening "sip from the fire hose," Forever Wondering. I appreciate the explanation.

        • Thanks for the enlightening "sip from the fire hose," Forever Wondering. I appreciate the explanation.

          You're quite welcome. It's refreshing to get a thank you on slashdot--and much appreciated.

          It seemed like you had shelled out good money for a multicore system and were concerned that you weren't getting your money's worth.

          In fact, Intel/AMD cores work even harder for you than that using several techniques:
          Hyperthreading (http://en.wikipedia.org/wiki/Hyper-threading)
          out of order execution (http://en.wikipedia.org/wiki/Out-of-order_execution)

          Because of this and the sheer speed of a 3+ GHz CPU, the m

      • by dkf ( 304284 )

        That's right--16 cores/chip X 100 chips for a total of 160 cores.

        16 * 100 = 160?

        You must be a hardware engineer. Did you work for Intel on the early Pentium floating point unit?

        • That's right--16 cores/chip X 100 chips for a total of 160 cores.

          16 * 100 = 160?

          You must be a hardware engineer. Did you work for Intel on the early Pentium floating point unit?

          Yep, I caught the math error, too, but only after posting. I was debating a one-liner reply to correct it, but didn't want to clutter things up with a reply just to correct the typo.

          I'm a computer engineer, which is 50% software, 50% hardware. While I could forgive a hardware engineer, a software engineer never makes misteaks [pun intended].

  • by holophrastic ( 221104 ) on Tuesday April 10, 2012 @11:37PM (#39640717)

    Yeah, great idea. Take the very fastest communication that we have on the entire planet, and replace it with the absolute slowest communication we have on the planet. Great idea. And with it, more complexity, more caches, more lookup tables, and more things to go wrong.

    The best part is that it's totally unbalanced. Internet protocols are based on a network that's ever-changing and totally unreliable. The bus, on the other hand, is based on total reliability and a static topology.

    I'd have thought that a pool concept, or a mailbox metaphor, or a message-board analog would have been more appropriate. Something where streams are naturally quantized and sending is unpaired from receiving. Where a recipient can operate at its own rate, independent of the sender.

    You know, like typical linux interactive sockets, for example. But what do I know.

    • by tamyrlin ( 51 )

      Actually, the networks used in Network-on-Chips are quite unlike the networks used for TCP/IP. For example, when you develop a System-on-Chip you have a very good idea of your workload, so you can optimize the network topology based on that information. The networks proposed in NoC research typically also have other features not found on the Internet such as guaranteed and in-order delivery of packets. (Which is fairly easy to do in a small network with low latencies.) In many cases you can also reserve ban

  • by Sarusa ( 104047 ) on Tuesday April 10, 2012 @11:49PM (#39640797)

    As mentioned in other comments, this has been done before. The method of message passing isn't as fundamental as one key point - that it is all explicit message passing.

    Intel and AMD x86/x64 CPUs use coherent caches between cores to make sure that a thread running on CPU 1 sees the same RAM as a thread running on CPU 3. This leads to horrible bottlenecks and huge amounts of die area tied up in trying to coordinate the writes and maintain coherency between N cores ((N-1)^2 connections!), and it all just goes to hell pretty fast. Intel has this super new transactional memory rollback thing, but it's turd polishing.

    The next step is pretty obvious (see Barrelfish) and easy: no shared coherency. Everything is done with message passing. If two threads or processes (it doesn't really matter at that point) want to communicate they need to do it with messages. It's much cleaner than dealing with shared memory synchronization, and makes program flow much more obvious (to me at least - I use message queues even on x86/x64). If you need to share BIG MEMORY between threads, which is reasonable for something like image processing, you at least use messages to explicitly coordinate access to shared memory and the cores don't have to worry about coherency.
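
    (As a minimal sketch of the shared-nothing style in C, using POSIX message queues purely for illustration; Barrelfish's actual channels are a different, lighter-weight mechanism. Both ends are shown in one process for brevity.)

      #include <fcntl.h>    /* O_* flags */
      #include <mqueue.h>   /* POSIX message queues; link with -lrt on Linux */
      #include <stdio.h>
      #include <string.h>

      int main(void) {
          struct mq_attr attr = { .mq_flags = 0, .mq_maxmsg = 8,
                                  .mq_msgsize = 64, .mq_curmsgs = 0 };
          /* Shared-nothing: the queue name is the only rendezvous point. */
          mqd_t q = mq_open("/demo_q", O_CREAT | O_RDWR, 0600, &attr);

          const char *msg = "pixel block 17 is yours";
          mq_send(q, msg, strlen(msg) + 1, 0);   /* explicit hand-off... */

          char buf[64];
          mq_receive(q, buf, sizeof buf, NULL);  /* ...not a shared flag */
          printf("got: %s\n", buf);

          mq_close(q);
          mq_unlink("/demo_q");
          return 0;
      }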

    This scales extremely well for at least a couple thousand CPUs, which is where the 'local internet' becomes useful.

    Where it becomes not easy is that almost all programs written for x86/x64 assume threads can share memory at will. They'd need to be rewritten for this model or would suddenly run a whole lot slower since you'd have to lock them to one core or somehow do the coordination behind their back. It'd be worth it for me!

    • by Anonymous Coward

      IPC has been a PITA and slow for decades; you don't want that to be the only option in the future.

      • by Sarusa ( 104047 )

        As part of this the messaging has to be as fast as possible, which is where the article comes in. Newer cores/chips designed for this kind of thing have multi-gigabytes/sec just for the messaging and tiny latencies.

        The threads/processes still shouldn't be so tightly coupled that they're talking more than working (or waiting), or something's probably wrong with the design. Even in a shared memory model it's probably spending massive amounts of time twiddling mutexes and trying to keep memory synced between t

    • by Anonymous Coward
      Seems like you are talking about switching from a "strong memory model" to a "weak memory model" and TBQH I know my share of developers that can barely handle multithreaded programming as it is... throwing this at them could be a disaster on the software side.
      • by dkf ( 304284 )

        Seems like you are talking about switching from a "strong memory model" to a "weak memory model" and TBQH I know my share of developers that can barely handle multithreaded programming as it is... throwing this at them could be a disaster on the software side.

        Depends on the model. If the model is "oh, you got one big space of memory; anything goes but you'd better sprinkle a few locks in" then yes, that will suck boulders when the hardware switches to message passing, but there are other parallelism models in use in programming. Those that have each thread as being essentially isolated and only communicating with the other threads by sending messages will adapt much more easily; that's basically MPI, and that's known to scale massively. It's also a heck of a lot

  • oh, come on. buses have been dead for years (sata and pcie are great examples of the prevalence of point-to-point links). no reason we can't think of cachelines as packets (bigger than ATM packets were!). how about hypertransport and QPI?

    • Everything you do deals with a bus somewhere. They're still hugely relevant, particularly in very dense, very fast electronics.

  • by bill_mcgonigle ( 4333 ) * on Wednesday April 11, 2012 @12:12AM (#39640911) Homepage Journal

    I can't seem to find the old story or my comment on it, but when Google acquired a 'stealth' startup a year or so ago, the most interesting thing about it was that the primary investigator had a few patents for packet-switched CPUs.

  • Come on, people. Cores share information and suddenly it's just like the Internet? Are these journalists' experiences so narrow that they have no other analogy? It's just a fricking bus! There are networks that exist which are not "the Internet". Using the term "internet" implies global connectivity. OK, I expect journalists to be ignorant, but please, are Slashdot editors this confused about basic technology as well?

  • Didn't AMD just buy a company that did something similar to this? While not at the chip or core level, it seems kinda related

  • The problem is not the hardware but the software. The hardware has been parallel for ages, even locally (GPU, GPU-memory, CPU, memory, HDD, DMA - memory processor, ...).

    Software is a different problem across the networked/parallel arena. If you really think about it, an SMS is not much more than 'hello world'. You type it and you see text (no function other than transport, which isn't really a function, has been performed), and testing it should be easy. This is not even about parallelism but about communicat
  • isn't this a variation on the Cell architecture? except no one could figure out how to write the OS and compiler to fully realize the goal of programs that could be farmed out by the PowerPC core to the special processors (SPEs) on one chip, let alone farmed out to multiple Cells over a network.
  • The seminal paper proposing the use of switched/routed interconnection networks on-chip (NoCs) was published by Dally and Towles 11 years ago in DAC'01: Route packets, not wires: On-chip interconnection networks [ieee.org]. The idea of associating a router with each core and replicating it in "tiles" is not new either; Tilera [tilera.com] was (IIRC) the first company to sell processors based on a tiled design, which was an evolution of the RAW [mit.edu] research project. A related research project, TRIPS [utexas.edu], replicated functional units on ea

  • cores should instead communicate the same way computers hooked to the Internet do

    apparently never heard of beowulf clustering

  • Then why do all Intel CPUs, except a very small number of Xeon CPUs, have only 4 cores max, even the new Ivy Bridge ones to be released this year, even though 5 years ago they also had chips with 4 cores already?

  • ... now my mother will finally have Internet in her computer!

  • I was going to say this seems to be a realisation that the Transputer had the answers decades ago, but it seems many others have said exactly the same thing. I shall resume my nap...

  • YOU HAVE INVENTED A BUS! It's time to start working on the first multitasking OS!

    What is it with idiots coming out of the woodwork presenting old (and often obsolete and abandoned, such as virtualization) technologies as some kind of new development?

  • The idea only works until one of the cores starts sending spam. Hey core, want Vi@gra?

  • The network-on-chip has been around as a concept for so long that we even have an abbreviation for it (NoC). Maybe this isn't in commodity products, but basically if you want to do an NoC, you don't have to invent anything yourself. There are several conferences and journals that have been publishing papers on this for decades. But, OH, if a professor from MIT mentions it, it must be something NEW. Sheesh.

  • Looks like déjà vu, considering MIT's Connection Machine [wikipedia.org]. While the interconnect will be less regular (not a hypercube), the message passing between cores will have to be routed one way or another, just as with the CM. So how is that news?
  • Sun pegged it right when they said "The Network Is The Computer."

    The specific speed of the network interconnect, the topology of the network fabric, and whether you normally think of it as a network connection are all that distinguish any multi-core system from a distributed cluster. Cloud computing begins to scratch the surface of the implications of this at the cluster/site level, and now it would seem some VLSI gearheads are thinking in the same abstract model at the chip level.

    Once you start thinking of all your

  • XDBus: a high-performance, consistent, packet-switched VLSI bus

    Sindhu, P.; Frailong, J.-M.; Gastinel, J.; Cekleov, M.; Yuan, L.; Gunning, B.; Curry, D. (Xerox Palo Alto Research Center, CA)
    Compcon Spring '93, Digest of Papers, 22-26 Feb 1993, pp. 338-344.

    The XDBus is a low-cost, synchronous, packet-switched VLSI bus designed for use in high-performance multiprocessors. The bus provides an efficient coherency protocol which guarantees processors a c
