MIT Develops New Chip That Reduces Neural Networks' Power Consumption by Up to 95 Percent

MIT researchers have developed a special-purpose chip that increases the speed of neural-network computations by three to seven times over its predecessors, while reducing power consumption 94 to 95 percent. From a report: That could make it practical to run neural networks locally on smartphones or even to embed them in household appliances. "The general processor model is that there is a memory in some part of the chip, and there is a processor in another part of the chip, and you move the data back and forth between them when you do these computations," says Avishek Biswas, an MIT graduate student in electrical engineering and computer science, who led the new chip's development. "Since these machine-learning algorithms need so many computations, this transferring back and forth of data is the dominant portion of the energy consumption. But the computation these algorithms do can be simplified to one specific operation, called the dot product. Our approach was, can we implement this dot-product functionality inside the memory so that you don't need to transfer this data back and forth?"
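For readers unfamiliar with the operation Biswas mentions, here is a minimal sketch of a dot product for one network node. It is plain Python for illustration only, not a model of the chip; the function name and the sample values are hypothetical.

```python
def dot_product(inputs, weights):
    """Weighted sum for one node: inputs[0]*weights[0] + inputs[1]*weights[1] + ..."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


# Illustrative values only; they are not taken from the article.
node_inputs = [1.0, 0.5, -2.0]
node_weights = [0.25, 4.0, 1.0]
print(dot_product(node_inputs, node_weights))  # prints 0.25
```

On a conventional processor, every input and weight in that loop has to be fetched from memory first, which is the data movement the summary describes as the dominant energy cost.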

  • by Mostly a lurker ( 634878 ) on Thursday February 15, 2018 @11:07AM (#56128304)

    The tensor processing units Google developed also seem very capable compared to regular processors. Does anyone know how MIT's new chips stack up against what Google already has in operation?

    • by DrTJ ( 4014489 ) on Thursday February 15, 2018 @11:19AM (#56128392)

      The MIT press release says next to nothing, unfortunately. AFAICT, they don't reference any published article, or any kind of link to more information, so it is hard to assess. I really wanted to know more so I'm a little disappointed with MIT.

      There are a few things that indicate that this is not even comparable to Google's TPU:
      1. The lack of more information.
      2. They label it as a prototype.
      3. The top person link goes to a first year graduate student (making a real ASIC takes a slightly larger team, I hear).

      Without more detailed information, this is hard to distinguish from PR.

      • by Ayano ( 4882157 )
        It's probably funded by a company or the team/professor is working on patenting it. This is typical of emergent technology not funded with public money.
        • Re: (Score:2, Troll)

          by burtosis ( 1124179 )

          It's probably funded by a company or the team/professor is working on patenting it. This is typical of emergent technology not funded with public money.

          Not yet funded with public money. Don't worry, some dumbass grad student(s) who singlehandedly redesign and optimize processors for neural networks won't keep that company position for long. No, as soon as it looks remotely profitable, and after their dumb ass works 100hrs/week for three years to get a thesis to market, they will comfortably find themselves out the door with a firm boot mark in their butt as their only asset remaining from the company. Don't worry, big money can commit financial crimes

    • I'd hope the MIT chip could do math better. 3-7 times faster, 5%-6% as much power draw? That's 0.7%-2% as much energy per computational operation.
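The arithmetic in the comment above can be checked with a short sketch. It assumes the speedup and the power reduction apply at the same time to the same workload, so that energy per operation scales as relative power divided by speedup; the function name is hypothetical.

```python
def relative_energy_per_op(relative_power, speedup):
    """Energy per operation relative to the predecessor.

    Energy = power * time, and time per operation shrinks by `speedup`,
    so the ratio is relative_power / speedup.
    """
    return relative_power / speedup


# Best case: 95% power reduction (5% of original) and 7x faster.
best = relative_energy_per_op(0.05, 7)
# Worst case: 94% power reduction (6% of original) and 3x faster.
worst = relative_energy_per_op(0.06, 3)

print(f"{best:.1%} to {worst:.1%} of the original energy per operation")
# prints: 0.7% to 2.0% of the original energy per operation
```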
    • by ShanghaiBill ( 739463 ) on Thursday February 15, 2018 @12:31PM (#56128870)

      Does anyone know how MIT's new chips stack up against what Google already has in operation?

      This seems to be different.

      Google's TPUs reduce power and increase speed, but are targeted for internal use in data centers. You can't buy one.

      This MIT chip is targeted toward home use and mobile devices.

      Both chips do fast, low-precision matrix ops. The TPU uses eight-bit multipliers. TFA is poorly written, but it appears that the MIT chip does analog multiplication. From TFA: "In the chip, a node’s input values are converted into electrical voltages and then multiplied by the appropriate weights. Summing the products is simply a matter of combining the voltages. Only the combined voltages are converted back into a digital representation and stored for further processing."

      If this is true, then that could be a huge boost in efficiency, but results would not be exactly repeatable: You could get different results for the exact same inputs.

      Another feature is that the neurons in each layer produce a single binary output. That is obviously simpler than the TPU's 8-bit outputs, and is analogous to how biological neurons work. But it limits which algorithms can be used. RBMs (Restricted Boltzmann Machines) use single-bit outputs, and were used in the first successful "deep" networks, but have more recently fallen out of favor. Single-bit outputs make backprop more difficult, although it sounds like this chip is targeted more for deployment than for learning.
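To make the single-bit-output idea above concrete, here is a rough sketch of one layer whose nodes each emit one bit. It is an idealized, noise-free digital stand-in, not the chip's analog circuit. The +1/-1 weights, the zero threshold, and the function name are assumptions made for illustration.

```python
def binary_layer(inputs, weight_rows, threshold=0.0):
    """Return one bit per node: 1 if that node's dot product exceeds threshold."""
    outputs = []
    for weights in weight_rows:
        # Multiply-and-sum step; on the chip this is reportedly done with voltages.
        total = sum(x * w for x, w in zip(inputs, weights))
        # Only the thresholded result is kept, as a single bit.
        outputs.append(1 if total > threshold else 0)
    return outputs


# Illustrative values only: two nodes, each with +1/-1 weights (an assumption).
layer_inputs = [0.5, -1.0, 2.0]
layer_weights = [
    [1, -1, 1],    # sums to 3.5, so the output bit is 1
    [-1, 1, -1],   # sums to -3.5, so the output bit is 0
]
print(binary_layer(layer_inputs, layer_weights))  # prints [1, 0]
```

In a real analog implementation, noise in the summed voltage could flip a bit when the total sits close to the threshold, which is the repeatability concern raised above.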

      • by bondsbw ( 888959 )

        You could get different results for the exact same inputs.

        Great, now we can have insane AIs.

      • Price and computational utility aside, they sound GREAT for researching how biological neural networks work.

        • Price and computational utility aside, they sound GREAT for researching how biological neural networks work.

          I doubt that. This chip is designed to do fast and efficient matrix operations, which only work well if the neurons are in distinct and ordered layers. Biological brains don't do that. Also, biological brains learn by strengthening connections as they are used, in a process very different from the backprop algorithm used in ANNs, and it isn't clear if this new chip actually does any learning rather than just running a pre-programmed network.

          We will learn much more about biological brains from projects lik

  • by pablo_max ( 626328 ) on Thursday February 15, 2018 @11:09AM (#56128324)

    Just imagine a Beowulf cluster of these things ;)

  • mine cryptocurrencies?
    Some new ranking for chips in terms of minability.
  • by pablo_max ( 626328 ) on Thursday February 15, 2018 @11:21AM (#56128398)

    Looking at what is available today, I would have to say that today's smart world is incredibly stupid. Not to mention fractured with loads of standards, apps and do-dads.
    When Google took over Nest I had high hopes.
    I had imagined they would do something clever like install their phased-array mics into the "smart" fire alarms that could be in almost every room. Then from anywhere you could ask Google something. But no... you need to find some stupid crappy little speaker and keep shouting HEY, GOOGLE, HEY GOOGLE, HEY GOOGLE, until it finally can hear you.
    With these chips, they could take that idea even further. Install connected appliances, connected switches and sockets, and then figure out the patterns of usage and voices to "learn your ways" and begin to anticipate things.
    Oh.. Bob always turns on the TV right after he grabs a beer from the fridge around 6pm. The fridge just opened, so I will turn on the TV for him.. also it was cold today, so I will adjust the heat in that room so Bob's ass doesn't get too cold on his leather La-Z-Boy.
    I should think that is all totally possible today.

    • by mikael ( 484 )

      Add custom "voices" to these devices like Cylons, Cybermen, Daleks, Basil Fawlty, Village Idiot and Zen from Blake's 7, and they would probably sell infinitely more when combined with locally processed speech recognition.

  • Since video cards have specialized processors that handle dot products (and all sorts of other matrix computations) like mad, how is what they are proposing much better than existing GPUs? In particular it seems like nVidia has been doing a ton of work tailoring GPUs to be used with neural networks.

    • Re: (Score:2, Informative)

      by Anonymous Coward

      how is what they are proposing much better than existing GPUs?

      How about reading the summary?

      GPUs aren't exactly known for being energy efficient.
      This chip is more energy efficient since it doesn't need to move the data to a central processor that might even be on another chip.
      It distributes the ALUs among the memory so it doesn't have to move the data as far.

      Also to get an idea of the scale we are working with here, speed of light / 5 cm is about 6 GHz.
      If you want to work fast you don't want to move data long distances.
      There is a limit to how fast information can tr

    • For starters, video cards have too much precision.

    • by mikael ( 484 )

      GPUs have vector processors for their shader cores that do 32-bit and 64-bit floating point processing (IEEE 754 standard). To do a floating point calculation you have to align the two floating-point values by exponent, do the calculation and recalculate the new exponent and mantissa.

      Other solutions are to use fixed-point integers (8-bit, 16-bit and 32-bit) and even actual digital-to-analog conversion and back again.
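The align, compute, and renormalize steps described above can be sketched for addition as follows. This is a software illustration using Python's `math.frexp` and `math.ldexp`, not how GPU hardware is built; the function name is hypothetical, and special cases such as infinities and NaNs are ignored.

```python
import math


def fp_add_by_steps(a, b):
    """Add two floats by explicitly aligning exponents, then renormalizing."""
    # Split each value into mantissa and exponent: value == m * 2**e.
    mant_a, exp_a = math.frexp(a)
    mant_b, exp_b = math.frexp(b)

    # Step 1: align both mantissas to the larger exponent.
    exp_common = max(exp_a, exp_b)
    mant_a = math.ldexp(mant_a, exp_a - exp_common)
    mant_b = math.ldexp(mant_b, exp_b - exp_common)

    # Step 2: do the calculation on the aligned mantissas.
    mant_sum = mant_a + mant_b

    # Step 3: renormalize to get the new mantissa and exponent.
    mant_norm, exp_adjust = math.frexp(mant_sum)
    return math.ldexp(mant_norm, exp_common + exp_adjust)


print(fp_add_by_steps(1.5, 0.25))  # prints 1.75
```

Fixed-point integer arithmetic skips steps 1 and 3 entirely, which is part of why low-precision integer designs can be cheaper per operation.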

      • and even actual digital-to-analog conversion and back again.

        Which is exactly what they do in the article. But the restriction to binary weights means the chip as designed is incompatible with current mainstream neural networks.

  • The major vendors aren't nearly as interested in dropping the system hardware cost as they are in having plausible access to live microphone streams. Since the user is the product, and privacy is irrelevant, it's now all about the data mining for advertising and related behavioral research. This also keeps the IP in the neural networks away from competitors' and open source developers' prying eyes. These chips might be used for some preprocessing, but these vendors want that data stream to continue as long a

  • Interesting development...
    But my understanding of this whole deal, and I might be wrong, is that we already have more than enough to make AIs local... this isn't a problem of capability, this is companies behind AI assistants trying to harvest as much data as possible from their customers and turn a profit from it, and/or to use it for themselves.

  • So... how good is it at computing SHA256 hashes? ;)

    • Not at all useful. Neural nets require large memory bandwidth and multiplications. SHA256 hashing needs dedicated logic for SHA256 hashing, and very little memory bandwidth. Besides, there are already much better chips for hashing.

  • Color me skeptical (Score:5, Interesting)

    by Chris Katko ( 2923353 ) on Thursday February 15, 2018 @11:47AM (#56128560)

    That sounds like something an FPGA could do from the very beginning.

    The only new thing here would be possibly LARGER amounts of memory stored in between the fabric (reducing off-chip access, and increasing the number of LUTs not tied up as memory cells), and possibly, like they said, combined "access and modify" operations.

    But I think the article itself doesn't understand what it's talking about then.

    And as general-purpose as FPGAs are in concept, they have been "custom adapted" to different tasks (and layout/fabric) since inception. So the question here is, are they talking about some kind of ASIC advancement that they didn't have before?

    >The chip can thus calculate dot products for multiple nodes — 16 at a time, in the prototype — in a single step, instead of shuttling between a processor and memory for every computation.

    This appears to be the only actual advancement/tech/change, being extruded out into an entire fluff article for college PR purposes.

    Personally, I'm way more interested in getting my hands on an "FPGA in CPU" ever since back in college when Altera was bought by Intel. Imagine a CPU that can be told to add CUDA cores when you start a game, or SHA cores when you start a server. Altera specializes in live-reconfigurable FPGAs: FPGAs that can be "flashed" in whole or in part while still running.

    • by boa ( 96754 )

      Personally, I'm way more interested in getting my hands on an "FPGA in CPU"...

      Maybe something like this? []

    • FPGAs aren't really good for massive amounts of multiplications. Modern FPGAs have dedicated multipliers, but they only have a few of them. And the reason they have dedicated multipliers is because the general FPGA fabric sucks at doing multiplications.

    • by Anonymous Coward

      ARM processor + FPGA fabric?

      And of course, the Virtex series has had (not incredibly well supported) partial reconfiguration on the fly for at least 10 years, and you can instantiate a CPU core of your choice.

    • by mikael ( 484 )

      Sun Microsystems used to have a patent for smart VRAM for video cards back in the 1990s. These put the basic OpenGL logic ops (and, or, xor etc...) onto the video memory so that entire blocks of pixels could be processed at accelerated pixblt speeds. Basically as fast as rows of chip memory were being pulled out and sent to the video output, and as deep as how many bit-planes you had. Back then that was 32-bits.

      FPGAs are used to simulate CPU and GPU cores. Sometimes they are bundled inside a PCI slot board

  • by wierd_w ( 1375923 ) on Thursday February 15, 2018 @12:10PM (#56128718)

    Such things include "Computational Ram" []

    There is also a very old idea of using memory elements directly to compute results, which is true memputing. (There are few examples of this, because it is costly as an architecture-- but your brain is a pretty good biological example. The same components are used for data storage, as well as data processing.)

    Given that such "Computational Ram" devices already exist in the wild, I fail to see why more novel hardware is needed, excepting as a refinement of concept?

    • by mikael ( 484 )

      Most brains are basically a sheet of computational neurons (outer periphery) while the interconnects warp this into a wrinkly structure like a Hilbert space filling curve in order to reduce connection distances. Internally, oxygenated blood is pumped outwards to bring in glucose and oxygen while taking away excess heat. Empty spaces are filled with fluid in order to maintain constant internal temperature.

    • Prior art from 1972

      A Logic-in-Memory architecture for large-scale-integration technologies []

      A computing machine is described which is structured around a distributed logic storage device called the Processing Memory. This machine, the Brookhaven Logic-In-Memory Processor (BLIMP), is meant only as a vehicle for simulating and evaluating its concepts, rather than for eventual fabrication. In particular, it is shown that the architecture used is very well suited to large-scale-integration (LSI) implementation

  • That could make it practical to run neural networks locally on smartphones

    I thought EVERY smartphone had a neural network chip in it already: that's how modern auto-focus works, unless I was misinformed.

    • by Anonymous Coward
      you were misinformed
      • Partly? Some quick research shows that many phones DO have neural-network code running on them, sometimes for passive auto-focus (active auto-focus, the type that uses range-finding sensors, does not seem to be very common on smartphones). Apple and Huawei actively advertise the use of NN tech in their phones, but not necessarily for auto-focus (face recognition and image enhancement seem to be what they are selling). The dedicated chip part is usually referred to as the ISP, or Image Signal Processor, a de

  • Oh just great! Looks like I'll soon have to replace all my old appliances which employ “fuzzy logic” with new appliances that employ neural networks.
