MIT's Swarm Chip Architecture Boosts Multi-Core CPUs, Offering Up To 18x Faster Processing (gizmag.com) 55

An anonymous reader writes from a report via Gizmag: MIT's new Swarm chip could help unleash the power of parallel processing for up to 75-fold speedups, while requiring programmers to write a fraction of the code that is usually necessary for programs to take full advantage of their hardware. Swarm is a 64-core chip developed by Prof. Daniel Sanchez and his team that includes specialized circuitry for both executing and prioritizing tasks in a simple and efficient manner. Neowin reports: "For example, when using multiple cores to process a task, one core might need to access a piece of data that's being used by another core. Developers usually need to write code to avoid these types of conflicts, and direct how each part of the task should be processed and split up between the processor's cores. This almost never gets done for normal consumer software, which is why Crysis isn't running better on your new 10-core Intel. Meanwhile, when such optimization does get done, mainly for industrial, scientific and research computers, it takes a lot of effort on the developer's side, and the efficiency gains may still be minimal." Swarm takes care of all of this, mostly through its hardware architecture and customizable profiles that developers can write in a fraction of the time needed for regular multi-core silicon. The 64-core version of Swarm came out on top when MIT researchers tested it against some highly optimized parallel-processing algorithms, running them three to 18 times faster. The most impressive result came when Swarm ran one algorithm 75 times faster than regular chips could, because that particular algorithm had resisted parallelization on classic multi-core processors. There's no indication as to when this technology will be available for consumer devices.
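
To make concrete the kind of conflict-avoidance code the summary says developers normally have to write by hand, here is a minimal C++ sketch (illustrative only, not MIT's code): several threads update shared data, and an explicit mutex is what keeps the cores from stepping on each other's updates.

    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    // Shared state that two cores could otherwise race on.
    long long counter = 0;
    std::mutex counter_mutex;

    void add_many(int iterations) {
        for (int i = 0; i < iterations; ++i) {
            // Without this lock, concurrent increments can be lost.
            std::lock_guard<std::mutex> lock(counter_mutex);
            ++counter;
        }
    }

    int main() {
        std::vector<std::thread> workers;
        for (int t = 0; t < 4; ++t)
            workers.emplace_back(add_many, 1000000);
        for (auto& w : workers)
            w.join();
        std::cout << counter << "\n";  // Always 4000000 with the lock in place.
    }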
  • Parallelization... (Score:5, Insightful)

    by Timothy2.0 ( 4610515 ) on Monday July 04, 2016 @07:42PM (#52445433)
    It's important for the average consumer to realize that not all processing tasks are easily parallelizable, and some downright aren't. In those cases, additional cores aren't going to give you much in the way of speed increases. Of course, your average consumer *doesn't* realize that, and when they go to their favourite big-box store for a new computer, the sales associate isn't going to sit down and discuss the reality of the situation either.
    • by HiThere ( 15173 )

      While true, multi-processor systems are considerably more responsive while busy with another task. So, e.g., you can be downloading upgrades, compressing files, and word processing all at the same time without penalty. Admittedly, it's hard to see how that particular scenario would be better with 100 cores than with 5 or 6. But a batch of them could be rendering an animation or some such.

      FWIW, I have a task in mind where 1,000 cores would not be overkill, but most users would never do it. However they m

    • I think the salesman handles the 'additional cores' question when they ask "What will you be using your computer for?" (while they size up the mark and frankly work on Internal Problem #1 "How much can I upsell to this sucker?")
    • The reason I want multiple cores is due to all the other processes that go on in the background. I know there are programs that shut down background tasks, but who knows?
    • The summary is bad. My understanding of what they did (after reading the article): they implemented a shortest-path algorithm in software, but parallelized it by putting a priority queue into hardware to allocate tasks.
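
      For illustration, a plain software version of such a priority-queue-driven shortest-path loop might look like the sketch below (hypothetical code, not MIT's; the article's claim is that Swarm moves the queueing and task ordering into silicon):

        #include <cstdio>
        #include <functional>
        #include <queue>
        #include <utility>
        #include <vector>

        using Edge = std::pair<int, int>;  // (neighbor, weight)

        // Plain software Dijkstra: the priority queue that Swarm
        // reportedly implements in hardware is std::priority_queue here.
        std::vector<int> dijkstra(const std::vector<std::vector<Edge>>& graph, int src) {
            const int INF = 1 << 30;
            std::vector<int> dist(graph.size(), INF);
            // Min-heap of (tentative distance, node), smallest first.
            std::priority_queue<std::pair<int, int>,
                                std::vector<std::pair<int, int>>,
                                std::greater<>> pq;
            dist[src] = 0;
            pq.push({0, src});
            while (!pq.empty()) {
                auto [d, u] = pq.top();
                pq.pop();
                if (d > dist[u]) continue;  // Stale entry; skip it.
                for (auto [v, w] : graph[u]) {
                    if (dist[u] + w < dist[v]) {
                        dist[v] = dist[u] + w;
                        pq.push({dist[v], v});
                    }
                }
            }
            return dist;
        }

        int main() {
            // Tiny 4-node example graph.
            std::vector<std::vector<Edge>> g(4);
            g[0] = {{1, 1}, {2, 4}};
            g[1] = {{2, 2}, {3, 6}};
            g[2] = {{3, 3}};
            auto d = dijkstra(g, 0);
            std::printf("dist to 3 = %d\n", d[3]);  // 1 + 2 + 3 = 6
        }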
      • How did they claim a 75x speedup using 64 cores?

        • I didn't read TFA, but there are quite a lot of algorithms that exhibit superlinear speedup. The costs for a parallel algorithm are typically related to communication, but there's nothing magical about a sequential algorithm that means that it doesn't have its own costs. Storing temporary results and managing the queue of work to do are still requirements for a sequential one and often the parallel version can benefit from better locality of reference and so make better use of caches.
        • by cdrudge ( 68377 )

          Don't underestimate the overhead expense of context switching.

        • by JanneM ( 7445 )

          When you subdivide a problem, each core works on a smaller subset. If those subsets fit into a cache that the bigger problem didn't, you can easily get a superlinear increase as a result. In many cases you could actually rewrite the bigger problem to be more cache-friendly and get a similar speedup, so you generally shouldn't make too much of such "extra" performance increases.
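
          As a concrete (and purely illustrative) example of that effect, compare a naive matrix transpose with a blocked one. The blocked version touches memory in cache-sized tiles, which is the same locality win that a per-core subdivision of the problem gets for free:

            #include <vector>

            // Naive transpose: the strided writes to 'out' miss the cache
            // on nearly every access once n is large.
            void transpose_naive(const std::vector<double>& in,
                                 std::vector<double>& out, int n) {
                for (int i = 0; i < n; ++i)
                    for (int j = 0; j < n; ++j)
                        out[j * n + i] = in[i * n + j];
            }

            // Blocked transpose: each B x B tile fits in cache, so the
            // working set stays hot while a tile is being processed.
            void transpose_blocked(const std::vector<double>& in,
                                   std::vector<double>& out, int n, int B = 64) {
                for (int ii = 0; ii < n; ii += B)
                    for (int jj = 0; jj < n; jj += B)
                        for (int i = ii; i < ii + B && i < n; ++i)
                            for (int j = jj; j < jj + B && j < n; ++j)
                                out[j * n + i] = in[i * n + j];
            }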

    • by Anonymous Coward

      Most apps that need more processing benefit from multithreading, which you get with multiple cores. Parallel code is when a thread is broken down into mini threads and spread over multiple cores and then recombined to get the result. It creates overhead in a way, but for BIG number crunching it's especially useful.

      If you want to run a simple process as fast as possible, you just run it and it runs and it's done. You can't really benefit from parallel code for most tasks, as you say.

      But you seem to fail to re
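
      A minimal sketch of the fork/join pattern described above, assuming nothing beyond the standard library (illustrative code, not from the article): the work is split into per-thread slices and the partial results are recombined at the end.

        #include <algorithm>
        #include <iostream>
        #include <numeric>
        #include <thread>
        #include <vector>

        int main() {
            std::vector<int> data(10000000, 1);
            unsigned n = std::max(1u, std::thread::hardware_concurrency());
            std::vector<long long> partial(n, 0);
            std::vector<std::thread> pool;

            // Fork: each thread sums one contiguous slice of the input.
            size_t chunk = data.size() / n;
            for (unsigned t = 0; t < n; ++t) {
                size_t lo = t * chunk;
                size_t hi = (t + 1 == n) ? data.size() : lo + chunk;
                pool.emplace_back([&, t, lo, hi] {
                    partial[t] = std::accumulate(data.begin() + lo,
                                                 data.begin() + hi, 0LL);
                });
            }
            // Join and recombine the per-thread results.
            for (auto& th : pool) th.join();
            long long total = std::accumulate(partial.begin(), partial.end(), 0LL);
            std::cout << total << "\n";  // 10000000
        }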

    • One wonders how much of a speed boost those algorithms would have got if they'd written them to run on a fairly average GPU.
  • by ThosLives ( 686517 ) on Monday July 04, 2016 @07:54PM (#52445479) Journal

    I guess the world is rediscovering that special-purpose chips will always be faster at their special purpose than a general-purpose chip will be.

    • The win for special-purpose chips has always been obvious. The recent change is in the economics. It used to be very expensive to have any functionality in an IC. One of the driving forces behind the original RISC and VLIW chips was to devote as much of your transistor budget to execution units and remove anything that didn't directly contribute to performance. Now, the economics are quite different. Transistors are cheap but power dissipation is hard. It's easy to stick more execution units on an SoC
  • I am dumb on this, but if 'hardware architecture' can be made to take care of avoiding conflicts and "direct how each part of the task should be processed and split up between the processor's cores", couldn't the same be done through software that imitates whatever the 'hardware architecture' is doing?
    If this can be done, basically this software would be another step in the compiling/assembling process?

    As I said, I am ignorant on this, but why not?

    • I've only had a quick look at their press release; is there a pre-print of their paper anywhere?

      This looks like a hardware implementation of something like "Grand Central Dispatch". Combined with transactional memory.

      The basic idea seems to be that you can take a serial-ish process and break it up into tasks. Start running the first few tasks that should obviously run first. Then, if you have spare CPU cores, you can also start speculatively executing later tasks. But if these speculative tasks hit a conflict

      • by imgod2u ( 812837 )

        http://livinglab.mit.edu/wp-co... [mit.edu]

        They use individual cores to speculatively execute very short sequences of instructions, for instance a function call or a loop iteration. The algorithms they benchmark are a good match for the architecture -- lots of very small code sequences that usually aren't very dependent on each other, yet individually too small for traditional thread-based solutions, with their high synchronization overhead, to handle. (A toy software version of the optimistic execute-and-retry idea is sketched below.)

        One wonders how this would compare to a loc
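
        The following is only a loose illustration of that optimistic pattern, using ordinary compare-and-swap rather than Swarm's actual mechanism; every name in it is invented. A task computes speculatively from a snapshot and commits only if nothing conflicted in the meantime, retrying otherwise:

          #include <atomic>
          #include <iostream>
          #include <thread>
          #include <vector>

          std::atomic<int> shared_value{0};

          // Each "task" speculatively computes a new value from a snapshot,
          // then tries to commit. If another core changed the data in the
          // meantime (a conflict), the work is thrown away and retried.
          void speculative_increment(int times) {
              for (int i = 0; i < times; ++i) {
                  int snapshot, result;
                  do {
                      snapshot = shared_value.load();
                      result = snapshot + 1;  // The speculative computation.
                  } while (!shared_value.compare_exchange_weak(snapshot, result));
              }
          }

          int main() {
              std::vector<std::thread> pool;
              for (int t = 0; t < 4; ++t)
                  pool.emplace_back(speculative_increment, 100000);
              for (auto& th : pool) th.join();
              std::cout << shared_value << "\n";  // Always 400000.
          }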

    • Sure it -could- be done in software. Essentially any design can be implemented as hardware, software, or a hybrid of the two. (A major problem for those complaining about "software patents".) I wouldn't be surprised if someone does take some of their ideas and implement them in software.

      In general, hardware will be faster and in some ways more reliable than a software implementation of the same algorithm. It also means software doesn't have to be recompiled for lots of different types of hardware, if the

      • Actually, not so much a problem for software patents. Software is, generally speaking, a general solution, an algorithm. That is to say, math - something explicitly exempted from patent protection because it would inherently be overbroad and cut off all further development in that direction. Hardware is a machine - a specific implementation. Make some slight modifications, and it's no longer protected by the original patent.

        If software patents followed the same rules as hardware patents they'd be far les

        • To your point, look up "SystemC". It's a C++ class library used to write programs which are often compiled as pure hardware. Often, but not always - the same code can be rendered as either pure hardware or pure software. See also Verilog and PLAs. PLAs start and end their life as pure hardware devices. In between, connections in the hardware are destroyed to create a new hardware array as specified by programming-language code.

          What you're missing is that any algorithm, most any code, can be c

          • The thing is - the compiler could potentially generate a long list of different binaries or hardware configurations that all result in the same functionality within some performance envelope. As hardware, every one of those different assemblies would potentially require a separate patent as it does the same thing in a different manner, and hardware patents only protect specific implementations. As a software patent though, as they stand now, you don't even need to offer the source code that could generate

    • I am dumb on this, but if 'hardware architecture' can be made to take care of avoiding conflicts and "direct how each part of the task should be processed and split up between the processor's cores", couldn't the same be done through software that imitates whatever the 'hardware architecture' is doing?

      From reading the MIT page, I gather that it should be possible, but it would result in substantial overhead. The bloom filter alone would also need its own core. (A minimal software version is sketched below this comment.)

      If this can be done, basically this software would be another step in the compiling/assembling process?

      Yes, however, this would not be helpful for 99% of software because most software simply cannot benefit from parallel processing. The one area that benefits the most from parallel processing is graphics, specifically manipulation and rendering. That said, where this may be able to help is in creating a better GPU, so it should be no surprise that
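
      For reference, the data structure mentioned above (a bloom filter) answers "possibly seen" or "definitely not seen" for set membership using only a bit array and a few hashes; false positives are possible, false negatives are not. A minimal, hypothetical software version:

        #include <bitset>
        #include <functional>
        #include <iostream>
        #include <string>

        // Tiny bloom filter: two hash probes into an M-bit array.
        class BloomFilter {
            static constexpr size_t M = 8192;
            std::bitset<M> bits;
            size_t h1(const std::string& s) const {
                return std::hash<std::string>{}(s) % M;
            }
            size_t h2(const std::string& s) const {
                // Cheap second hash: salt the input before hashing.
                return std::hash<std::string>{}(s + "#salt") % M;
            }
        public:
            void insert(const std::string& s) {
                bits.set(h1(s));
                bits.set(h2(s));
            }
            bool maybe_contains(const std::string& s) const {
                return bits.test(h1(s)) && bits.test(h2(s));
            }
        };

        int main() {
            BloomFilter f;
            f.insert("0xdeadbeef");  // e.g. record that an address was touched
            std::cout << f.maybe_contains("0xdeadbeef") << "\n";  // 1
            std::cout << f.maybe_contains("0xcafebabe") << "\n";  // almost surely 0
        }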

      • I am going to assume that you are new to programming, because claiming that most software cannot benefit from parallel processing is hilariously false. It's just that most programmers can't do it, or can't do it well, that's the issue. Almost all software today can benefit from parallel processing; it's just a matter of how much, and of whether it's worth the expense of getting a programmer who can actually do it rather than throwing 10 code monkeys in a room to bang out barely functional code.

    • by Anonymous Coward

      If this hardware does something that could be done at compile time, it is IMHO indeed pretty useless. That's why I hope it is "runtime-smart", meaning that it reacts to data-access conflicts as they actually happen while the program is running. That would be something much harder to achieve, at least in an efficient manner, through software. The talk about the profiles devs have to declare doesn't sound good to me: people who don't bother writing software that uses proper locking or libs implementin

  • Sounds much more like something that should be a refinement to code generation than baked into chip architecture. That said, it's good to see work being done on better parallel methods rather than just bigger ones.

  • by DidgetMaster ( 2739009 ) on Tuesday July 05, 2016 @10:19AM (#52448243) Homepage
    There are two ways that multiple cores can help the average user. First, they allow multiple different processes to run at the same time. You can run a word processor, spreadsheet, browser, etc. all at once. Unless these processes are all waiting on the same resource (e.g. all trying to write to the disk at the same time, or all waiting for the user to press a key), they can complete tasks much faster than they would on a machine with fewer cores.

    Second, they allow a single program to do more than one thing at a time. Lots of programs will have a separate thread to handle the user interface while another does background tasks, but few will try and break big tasks into multiple pieces. For example, many database programs will be able to run several independent queries at the same time, but few will run a single query faster on a multi-core machine than on a single core one.

    I am working on a new data management system that does both. It can let lots of queries run at the same time, and it can break a single query into smaller pieces. The more cores the better. A query that takes 1 minute on a single core can often do the same thing in about 1/5 the time on a quad core (8 threads).
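
    A rough sketch of that second pattern (splitting one query across cores and merging the partial results); all names and numbers below are invented for illustration, not taken from the parent's system:

      #include <algorithm>
      #include <cstddef>
      #include <future>
      #include <iostream>
      #include <thread>
      #include <vector>

      // One "query": count records matching a predicate, split across cores.
      long long parallel_count(const std::vector<int>& records, int threshold) {
          unsigned n = std::max(1u, std::thread::hardware_concurrency());
          std::size_t chunk = records.size() / n;
          std::vector<std::future<long long>> parts;

          // Fork: each async task scans one contiguous slice.
          for (unsigned t = 0; t < n; ++t) {
              std::size_t lo = t * chunk;
              std::size_t hi = (t + 1 == n) ? records.size() : lo + chunk;
              parts.push_back(std::async(std::launch::async,
                  [&records, lo, hi, threshold] {
                      return (long long)std::count_if(
                          records.begin() + lo, records.begin() + hi,
                          [&](int r) { return r > threshold; });
                  }));
          }
          // Join: merge the per-core partial counts.
          long long total = 0;
          for (auto& p : parts) total += p.get();
          return total;
      }

      int main() {
          std::vector<int> records(1000000);
          for (std::size_t i = 0; i < records.size(); ++i) records[i] = i % 100;
          std::cout << parallel_count(records, 89) << "\n";  // 100000
      }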
  • ...than other articles. No, really, "more" and "less" only work when comparing things.

  • I'm pretty sure the developers of Crysis did put in the work to parallelize it effectively. Game engines are among the most heavily optimized types of software, and CryEngine is one of the fastest out there.

  • Could you imagine a Beowulf cluster of these?
