Tesla Unveils Dojo Supercomputer: World's New Most Powerful AI Training Machine (electrek.co) 32
New submitter Darth Technoid shares a report from Electrek: At its AI Day, Tesla unveiled its Dojo supercomputer technology while flexing its growing in-house chip design talent. The automaker claims to have developed the fastest AI training machine in the world. For years now, Tesla has been teasing the development of a new in-house supercomputer optimized for neural net video training. Tesla handles an enormous amount of video data from its fleet of over 1 million vehicles, which it uses to train its neural nets.
The automaker found itself unsatisfied with current hardware options for training its computer vision neural nets and believed it could do better internally. Over the last two years, CEO Elon Musk has been teasing the development of Tesla's own supercomputer, called "Dojo." Last year, he even teased that Dojo would have a capacity of over an exaflop, which is one quintillion (10^18) floating-point operations per second, or 1,000 petaflops. That could potentially make Dojo the most powerful supercomputer in the world.
Ganesh Venkataramanan, Tesla's senior director of Autopilot hardware and the leader of the Dojo project, led the presentation. He started by unveiling Dojo's D1 chip, which uses 7-nanometer technology and delivers breakthrough bandwidth and compute performance. Tesla designed the chips to "seamlessly connect without any glue to each other," and the automaker took advantage of that by connecting 500,000 nodes together. It adds the interface, power, and thermal management, and the result is what it calls a training tile: 9 petaflops of compute and 36 TB per second of bandwidth in a package smaller than one cubic foot. Tesla still has to assemble those training tiles into a compute cluster to truly build the first Dojo supercomputer. It hasn't put that system together yet, but CEO Elon Musk claimed that it will be operational next year.
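Taking the claimed 9 petaflops per training tile at face value, a quick back-of-the-envelope sketch (figures as reported from the presentation, not independently verified) shows roughly how many tiles an exaflop-class Dojo would need:

```python
# Figures as reported from Tesla's AI Day presentation (unverified claims).
PFLOPS_PER_TILE = 9            # claimed compute per training tile
TARGET_EXAFLOPS = 1.0          # the teased exaflop-class capacity

target_pflops = TARGET_EXAFLOPS * 1000
tiles_needed = target_pflops / PFLOPS_PER_TILE
print(f"Tiles for {TARGET_EXAFLOPS} EFLOPS: ~{tiles_needed:.0f}")  # ~111 tiles
```

So on the claimed numbers, an exaflop cluster is on the order of a hundred-plus tiles, before any efficiency losses from interconnect or software.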
Matrix Dojo. (Score:2)
A rather impressive piece of hardware [youtu.be].
Re: (Score:2)
Re: (Score:2)
Except, newer and faster.
sure if you ignore TPU, Cerebras and others (Score:3, Informative)
Re: (Score:2)
google's alphago was a really big deal in the field and they basically brute-forced the first release to beat everyone else who had to be clever about it. (they later developed a much more efficient training algorithm.) i suspect it paid off in terms of attracting talent and advertising for their cloud/ML services.
it's easy to say "throwing compute at it isn't the solution," except for when it is. as for wasting energy, well, all competition wastes energy strictly speaking; nonetheless it's what humans do.
I thought Musk was afraid of AI? (Score:1)
Re: (Score:2)
But he is building one? I am confused.
Musk is scared of AGI [wikipedia.org].
A chip for image processing using gradient descent DL is unlikely to lead to AGI.
Re: (Score:2)
He did say that they're going to make their androids run slower than humans and also make sure we can over-power them physically.
Re: (Score:2)
Technical analysis (Score:2)
https://semianalysis.com/tesla... [semianalysis.com]
The article claims 11 cabinets are enough to reach 1.1 exaflops.
Re: (Score:1)
Beyond hype (Score:4, Informative)
Let us compare with Fugaku supercomputer:
Fugaku: ~100k general-purpose chips --> 500 petaflops, actually measured.
Tesla: 3k custom AI chips --> 1.1 exaflops, but at 8/16-bit precision. With a 32-bit to 8/16-bit throughput ratio of about 1:16, the 32-bit equivalent is roughly 70 petaflops.
That works out to close to a 4x theoretical per-chip advantage over a two-year-old chip (Fugaku's was introduced in 2020; Tesla's is expected to be available next year).
So still good performance, but not as much as hyped.
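The commenter's arithmetic can be sketched as follows; every input here is the poster's own assumption (the chip counts, the 1:16 precision ratio), not a verified spec:

```python
# Reproducing the commenter's back-of-the-envelope comparison.
# All inputs are the commenter's assumed figures, not verified specs.
fugaku_pflops = 500          # measured, across ~100k general-purpose chips
fugaku_chips = 100_000
dojo_eflops_low_prec = 1.1   # claimed, assumed to be 8/16-bit throughput
dojo_chips = 3_000
ratio_fp32_to_low = 16       # assumed 32-bit : 8/16-bit throughput ratio

dojo_pflops_fp32 = dojo_eflops_low_prec * 1000 / ratio_fp32_to_low
per_chip_advantage = (dojo_pflops_fp32 / dojo_chips) / (fugaku_pflops / fugaku_chips)

print(f"Dojo at FP32: ~{dojo_pflops_fp32:.0f} PFLOPS")      # ~69 PFLOPS
print(f"Per-chip advantage: ~{per_chip_advantage:.1f}x")    # ~4.6x
```

The exact figure depends heavily on the assumed 1:16 precision ratio, which the thread later disputes.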
Re: (Score:2)
Re: (Score:3)
Sure, if your math is poor.
3k CPUs x 400 W per CPU = 1.2 MW. Its performance is about 1/7th of Fugaku's, so at Fugaku speed that's 8.4 MW, and that's just the CPU chips. Fugaku is 30 MW total. Go figure. At most a 2x improvement over a two-year-old machine. That is not what I'd call "absolutely destroys".
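Sketching that power arithmetic, again using only the poster's assumed figures:

```python
# Reproducing the commenter's power estimate. All inputs are the
# commenter's assumptions, not official specifications.
dojo_chips = 3_000
watts_per_chip = 400
dojo_chip_power_mw = dojo_chips * watts_per_chip / 1e6   # 1.2 MW, chips only

perf_deficit = 7   # Dojo assumed to be ~1/7th of Fugaku's FP32 throughput
scaled_power_mw = dojo_chip_power_mw * perf_deficit      # 8.4 MW at Fugaku speed

# Fugaku's ~30 MW is total system power, while the 8.4 MW above covers
# only the chips, so the commenter pegs the full-system gain at ~2x at most.
print(round(dojo_chip_power_mw, 2), round(scaled_power_mw, 2))
```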
It's FP32 Re:Beyond hype (Score:2)
These are not "8/16 bits". It's exaflop at FP32. (According to Elon Musk's tweet.)
Re: (Score:3)
Well, looking at the details, he may have exaggerated in that tweet. I'll have to look at the calculation.
Re: (Score:2)
Either you didn't read the tweet or he got it wrong. The article says 22 TFLOPS (32-bit) per chip, and there are 3,000 chips, so that's 66 petaflops. Anyway, I did further research on Fugaku. See https://www.fujitsu.com/global... [fujitsu.com] Based on that, it's 6.8 TFLOPS per CPU (32-bit). So Dojo is about 3 times faster at 32-bit, and maybe 2 times more power efficient. But this is in comparison to a two-year-old general-purpose 64-bit chip (Tesla doesn't even mention 64-bit for its chip).
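That per-chip comparison works out as follows (all figures as cited by the poster from the linked article and Fujitsu's page; treat them as approximate):

```python
# Commenter's FP32 per-chip comparison (figures from the linked article
# and Fujitsu's page as cited in the thread; approximate, not verified).
dojo_tflops_per_chip = 22       # FP32, per the semianalysis article
dojo_chips = 3_000
dojo_total_pflops = dojo_tflops_per_chip * dojo_chips / 1_000   # 66 PFLOPS

fugaku_tflops_per_cpu = 6.8     # FP32, per Fujitsu's spec page
speedup = dojo_tflops_per_chip / fugaku_tflops_per_cpu

print(f"Dojo total: {dojo_total_pflops:.0f} PFLOPS, per-chip speedup ~{speedup:.1f}x")
```

66 petaflops FP32 is well short of the tweeted "exaflop at FP32," which is the discrepancy this subthread is arguing about.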
See the image https://electrek.co/wp-content... [electrek.co] It says
Will this prevent . . . (Score:1)
his cars from plowing into emergency vehicles [cnn.com] at crash sites?
Re: (Score:2)
I think the plan is to understand why his cars are plowing into emergency vehicles at crash sites.
Re: (Score:2)
"Musk tweeted last month that Tesla's advanced camera-only driver assistance system, known as "Tesla Vision," will soon "capture turn signals, hazards, ambulance/police lights & even hand gestures."
https://www.reuters.com/busine... [reuters.com]
Re: (Score:2)
It was also NOT in general use either. All of the publicity to date has been around idiots with autopilot thinking THAT was FSD...
Re: (Score:2)
But call it autopilot, FSD, or driver assist, you are agreeing that Tesla's system was blind to turn signals, hazards, and ambulance/police lights right ?
Re: (Score:2)
The radar has to ignore objects that are stationary, and the camera doesn't seem to be contributing to the model of cars. I can take my car down the road and it'll see almost every car next to or in front of me, even at stop lights. If I take it down a residential road with cars parked on the side, it completely ignores them. Non-radar targets are fine, however, so it can correctly map trash cans, traffic cones, and a few other objects that are also stationary.
https://youtu.be/jQioNtg4oq4?t... [youtu.be]
A few-second clip of driving.
Re: (Score:2)
"Non-radar targets are fine however so it can correctly map trash cans, traffic cones, and a few other objects that are also stationary."
It sees those hazards. It may be intermittent at times, but largely any non-car object is detected, which is why I'm confused that it doesn't seem to see cars that aren't visible to the radar system.
Re: (Score:2)
Is this approach fundamentally flawed? (Score:2)
Re: (Score:2)
Their new system is doing predictive 3d modeling, unlike their old one. It's taking images and finding embeddings in a physical 3d space; it's rather impressive.
Solving that problem is a primary goal of the newer modeling process they discussed at their "AI day". Their system is now trained to assume object constancy despite intermittent occlusion, and there are planning neural networks that make predictions for the car itself as well as for other objects.
I think there is enough in the da