Google Announces 8x Faster TPU 3.0 For AI, Machine Learning (extremetech.com) 27

At its developer conference yesterday, Google announced third-generation TPUs (Tensor Processing Units) for AI and machine learning, which it says are eight times more powerful than its TPU 2.0 pods, with up to 100 petaflops of performance. They're so power-hungry that they require water cooling -- something previous TPUs haven't needed. ExtremeTech reports: So what do we know about TPU 3.0? Not much -- but we can make a few educated guesses. According to Google's own documentation, TPU 1.0 was built on a 28nm process node at TSMC, clocked at 700MHz, and consumed 40W of power. Each TPU PCB connected via PCIe 3.0 x16. TPU 2.0 made some significant changes. Unlike TPU v1, which could only handle 8-bit integer operations, Google added support for single-precision floats in TPU v2 and added 8GB of HBM memory to each TPU to improve performance. A TPU v2 cluster delivers 180 TFLOPS of total computational power, 64GB of HBM memory, and 2,400GB/s of memory bandwidth in total (the last figure thrown in purely for the purpose of making PC enthusiasts moan with envy).

No word yet on other advanced capabilities of the processors, and they are supposedly still for Google's own use rather than wider adoption. Pichai claims TPU v3 can handle 100 PFLOPS, but that has to be the clustered variant, unless Google is also rolling out a new tentative project we'll call "Google Stellar-Equivalent Thermal Density." We would've expected to hear about it if that were the case. As more companies flock to the AI / ML banner, expect to see more firms throwing their hats into this proverbial ring.
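To make the TPU v1 / v2 distinction in the summary concrete, here is a minimal NumPy sketch of inference-style 8-bit integer matrix multiplication (the only arithmetic TPU v1 supported) against a float32 reference (the path TPU v2 added). The layer shapes and the per-tensor scaling scheme are illustrative assumptions, not a description of Google's actual hardware or software stack.

```python
import numpy as np

# Hypothetical activations and weights for one fully connected layer (float32).
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 256)).astype(np.float32)
w = rng.standard_normal((256, 64)).astype(np.float32)

# Float reference result -- roughly what a float-capable TPU v2 path computes.
y_fp32 = x @ w

def quantize_int8(a):
    """Symmetric per-tensor int8 quantization (illustrative, not TPU-specific)."""
    scale = np.abs(a).max() / 127.0
    q = np.clip(np.round(a / scale), -127, 127).astype(np.int8)
    return q, scale

# TPU v1-style path: int8 operands, integer multiply-accumulate (done in int32
# to avoid overflow), then rescale back to float at the end.
xq, xs = quantize_int8(x)
wq, ws = quantize_int8(w)
y_int8 = (xq.astype(np.int32) @ wq.astype(np.int32)) * (xs * ws)

print("max abs error of int8 path vs float32:", np.abs(y_fp32 - y_int8).max())
```

The integer path needs far less silicon and memory bandwidth per operation, which is why it suffices for inference; training, with its small gradient updates, is where the added float support (and the per-chip HBM) earns its keep.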

  • HBM memory (Score:2, Informative)

    by Anonymous Coward

    High Bandwidth Memory memory

  • by lazarus ( 2879 ) on Wednesday May 09, 2018 @05:49PM (#56584142) Journal

    In this particular case they seem to be bucking the silicon trend:

    "At its annual Build conference Monday, Microsoft will suggest companies with big AI ambitions should steer clear [wired.com] of chips like Google’s. It says machine learning is evolving so fast that it doesn’t make sense to burn today’s ideas permanently into silicon chips that could soon prove limiting or obsolete."

    • by Anonymous Coward

      *cough* 80x86 *cough*

      "Microsoft will suggest companies with high-performance ambitions should steer clear [wired.com] of chips like Intel’s. It says high-performance computing is evolving so fast that it doesn’t make sense to burn today’s ideas permanently into silicon chips that could soon prove limiting or obsolete."

    • by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Wednesday May 09, 2018 @07:14PM (#56584614) Homepage Journal

      In this particular case they seem to be bucking the silicon trend:

      They're not bucking anything. They're stumping for Azure. They want people to do their AI in the cloud, because there's a chance they'll do it in Microsoft's cloud.

      In any case, they're also wrong. If you want to do AI without the cloud, and you need high performance, you need specialized hardware. If you have a concept which can lead to a product right now, and it needs to work without the cloud, then you probably need this hardware (or something like it) right now.

      • by mikael ( 484 )

        For their business interests they are right. They can't pull AI back onto the desktop or the mobile device if the algorithms are locked into custom instruction sets or languages. The customer, meanwhile, gets a higher performance/price ratio with custom cloud hardware.

    • Microsoft also tried to do music players and phones differently.

    • TensorFlow is generic enough that it will likely be around for at least a decade, and these particular chips will be made obsolete by newer technology long before then. It's a sweet framework.

      Also, I don't know if it's accurate to say machine learning is evolving quickly... it would be more accurate to say researchers are exploring the solution space that recently became accessible as a result of recent increases in processing power. To really make an evolution we'd have to figure out how to break out of that
    • Are they right? It depends on what you are looking for.

      The solution Google is offering is actually overly generic, to the point of needing large die areas to solve anything useful in a reasonable amount of time. Google's example isn't one of efficiency. They are still throwing large racks of silicon at the same problems, and any honest comparison is surely going to include a discussion of cost per solution.
      • Microsoft's solution uses FPGAs. That's even more generic and inefficient. Neural net processing requires fast multiplications as well as memory access, two areas in which FPGAs are not particularly good.

  • What is this, 2011?

  • So at the 100 PFLOPS stated in the article, this thing ties with the world's top supercomputer (https://en.wikipedia.org/wiki/TOP500#Top_10_ranking)? That's pretty nuts.
    • by slew ( 2918 )

      So at the 100 PFLOPS stated in the article, this thing ties with the world's top supercomputer (https://en.wikipedia.org/wiki/TOP500#Top_10_ranking)?

      That's pretty nuts.

      Actually, this is 100 P-DL-FLOPS (DL = deep learning, meaning 8-bit with a shared exponent). Although the second-generation (and presumably third-gen) TPU can also do 16-bit floating point (and maybe FP32) for training, the quoted (i.e., not-to-be-exceeded) number is the deep-learning flops for inference/recall...

      In contrast, a typical supercomputer generally describes its performance in terms of IEEE 64-bit double-precision floating point (FP64), so the two figures aren't directly comparable (see the numeric sketch below this thread).

      No doubt the later generations of TPUs will support some reasonable lev
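As a rough illustration of slew's point that low-precision "deep learning" FLOPS and the FP64 FLOPS quoted for TOP500 machines measure different things, here is a small NumPy sketch. It uses float16 purely as a stand-in for reduced-precision formats (TPUs actually use int8 and bfloat16-style formats) and naive sequential accumulation for clarity; the numbers are illustrative only.

```python
import numpy as np

# Sum 100,000 copies of 0.001. The exact answer is 100.0.
vals64 = np.full(100_000, 0.001, dtype=np.float64)
vals16 = vals64.astype(np.float16)

# float64 -- the precision Linpack/TOP500 results are quoted in -- is essentially exact here.
print("float64 sum:", vals64.sum())

# Naive float16 accumulation stalls once the running total grows large
# relative to the 10-bit mantissa, so most of the sum is simply lost.
total = np.float16(0.0)
for v in vals16:
    total = np.float16(total + v)
print("float16 sum:", total)
```

The float16 total stalls at about 4 instead of reaching 100, which is one way of seeing why a petaflop of low-precision inference arithmetic is not interchangeable with a petaflop of FP64 Linpack.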
