AI Hardware

Tesla Unveils Dojo Supercomputer: World's New Most Powerful AI Training Machine (electrek.co)

New submitter Darth Technoid shares a report from Electrek: At its AI Day, Tesla unveiled its Dojo supercomputer technology while flexing its growing in-house chip design talent. The automaker claims to have developed the fastest AI training machine in the world. For years now, Tesla has been teasing the development of a new supercomputer in-house optimized for neural net video training. Tesla is handling an insane amount of video data from its fleet of over 1 million vehicles, which it uses to train its neural nets.

The automaker found itself unsatisfied with current hardware options to train its computer vision neural nets and believed it could do better internally. Over the last two years, CEO Elon Musk has been teasing the development of Tesla's own supercomputer called "Dojo." Last year, he even teased that Tesla's Dojo would have a capacity of over an exaflop, which is one quintillion (10^18) floating-point operations per second, or 1,000 petaFLOPS. That could potentially make Dojo the new most powerful supercomputer in the world.

Ganesh Venkataramanan, Tesla's senior director of Autopilot hardware and the leader of the Dojo project, led the presentation. The engineer started by unveiling Dojo's D1 chip, which is using 7 nanometer technology and delivers breakthrough bandwidth and compute performance. Tesla designed the chip to "seamlessly connect without any glue to each other," and the automaker took advantage of that by connecting 500,000 nodes together. It adds the interface, power, and thermal management, and it results in what it calls a training tile. The result is a 9 PFlops training tile with 36TB per second of bandwidth in a less than 1 cubic foot format. But now it still has to form a compute cluster using those training tiles in order to truly build the first Dojo supercomputer. Tesla hasn't put that system together yet, but CEO Elon Musk claimed that it will be operational next year.
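
To put the quoted figures side by side, here is a minimal sketch that converts between the units mentioned above and estimates how many 9 PFlops training tiles it would take to reach one exaflop. It assumes peak throughput adds up perfectly across tiles, which real clusters do not achieve, and the helper name `tiles_for` is made up for illustration.

```python
import math

# Figures quoted in the summary above.
EXAFLOP = 10**18    # one quintillion floating-point operations per second
PETAFLOP = 10**15
TILE_PFLOPS = 9     # claimed peak of one training tile, in petaFLOPS


def tiles_for(target_flops: float, tile_pflops: float = TILE_PFLOPS) -> int:
    """Smallest whole number of tiles whose combined peak meets target_flops.

    Assumption: peak throughput scales linearly with the number of tiles.
    """
    return math.ceil(target_flops / (tile_pflops * PETAFLOP))


print(EXAFLOP // PETAFLOP)  # 1000 petaFLOPS in one exaFLOP
print(tiles_for(EXAFLOP))   # 112 tiles at 9 PFlops each
```

Under that idealized assumption, a little over a hundred tiles would be needed, which is the order of magnitude of cluster the summary says Tesla has yet to assemble.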

This discussion has been archived. No new comments can be posted.


  • A rather impressive piece of hardware [youtu.be].

  • by f00zbll ( 526151 ) on Friday August 20, 2021 @07:30PM (#61713399)
    It's cool Tesla is building a huge data center to do ML training, but Google already has TPU v3 running in their data centers. TPU v4 is rolling out now. As usual, ignore the hype and look at the facts. Tesla is building 1 huge data center. Google has data centers around the world and TPU is available in the US, Asia and EU.
  • But he is building one? I am confused.
  • https://semianalysis.com/tesla... [semianalysis.com]

    The article claims 11 cabinets are enough to get 1.1 exaflops.

  • Beyond hype (Score:4, Informative)

    by u19925 ( 613350 ) on Friday August 20, 2021 @07:42PM (#61713425)

    Let us compare with Fugaku supercomputer:
    Fugaku about 100k general purpose chips --> 500 Peta flops actual measurements
    Tesla 3k custom AI chips --> 1.1 eflop. But these are 8/16-bit. Ratio of 32 bit to 8/16 bit is 1:16. So actual is 70 Peta flops 32bit

    This gives close to 4x theoretical perf advantage over a 2 year old chip (introduced in 2020, Tesla is expected to be available next year).

    So still a good performance but not as much as hyped.

    • Since Dojo is designed for a particular algorithm, I would bet its power efficiency absolutely destroys Fugaku for doing what Dojo does.
      • by u19925 ( 613350 )

        Sure, if you are math poor.

        3k cpu x 400 w/cpu = 1.2 MW. Its performance is about 1/7th. So at Fugaku speed, it is 8.4 MW and this is just the CPU chips. Fugaku is 30 MW total. Go figure. At most 2x improvement over 2 year old machine. That is not called "absolutely destroys".

    • These are not "8/16 bits". It's exaflop at FP32. (According to Elon Musk's tweet.)

      • by ET3D ( 1169851 )

        Well, looking at the details, he may have exaggerated with that tweet. I'll have to look at the calculation.

      • by u19925 ( 613350 )

        Either you didn't read the tweet or he put it wrong. The article says 22 TFlops (32 bit) per CPU and there are 3000 CPUs. So it is 66 petaflops. Anyway, I did further research on Fugaku. See https://www.fujitsu.com/global... [fujitsu.com] Based on this, it is 6.8 TFlops per CPU (32 bit). So Dojo is about 3 times faster for 32 bit. Maybe 2 times as power efficient. But this is in comparison to a 2 year old general purpose 64 bit chip (the Tesla chip doesn't even mention 64 bit).

        See the image https://electrek.co/wp-content... [electrek.co] It says

  • his cars from plowing into emergency vehicles [cnn.com] at crash sites?

    • I think the plan is to understand why his cars are plowing into emergency vehicles at crash sites.

      • "Musk tweeted last month that Tesla's advanced camera-only driver assistance system, known as "Tesla Vision," will soon "capture turn signals, hazards, ambulance/police lights & even hand gestures."

        https://www.reuters.com/busine... [reuters.com]

        The radar has to ignore objects that are stationary and the camera doesn't seem to be contributing to the model of cars. I can take my car down the road and it'll see almost every car next to or in front even at stop lights. I take it down a residential road with cars parked on the side and it completely ignores them. Non-radar targets are fine however so it can correctly map trash cans, traffic cones, and a few other objects that are also stationary.

        https://youtu.be/jQioNtg4oq4?t... [youtu.be]

        Few second clip driving

    • If that car has some sort of black box they can learn from, then yes. Perversely, that data is gold because of its value in handling oddball confusing situations like nighttime emergency setups.
  • Learning from billions of miles of travelling car video might not be good enough without some sort of predictive 3d modelling. Let's say that I am driving in a residential neighborhood with cars parked along the roadside and I see a child chasing a ball, running toward the street. I am making predictive analog calculations in my mind as to whether this kid can possibly run out in front of me from between the cars in the future. There may not be enough incidents like this to learn from in the billions of mil
    • by mbkennel ( 97636 )

      Their new system is doing predictive 3d modeling, unlike their old one. It's taking images and finding embeddings in a physical 3d space; it's rather impressive.

      Solving for that problem is a primary goal of the newer modeling process that they discussed at their "AI day". Their system does now presume and is trained to emphasize object constancy despite intermittent occlusion, and there are planning neural networks which make predictions for self as well as other objects.

      I think there is enough in the da
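
The back-of-envelope Dojo versus Fugaku comparison in u19925's comments above can be reproduced with a short script. This is only a sketch: every input is a rough figure quoted in the thread (chip counts, the 1:16 precision ratio, 400 W per chip, 30 MW for Fugaku, 22 and 6.8 TFlops per chip), not a measured value, and the variable names are made up for illustration.

```python
# All inputs are the rough figures quoted in the thread, not measurements.
DOJO_CHIPS = 3_000
DOJO_LOW_PRECISION_PFLOPS = 1_100  # "1.1 eflop" at 8/16 bit
LOW_TO_FP32_RATIO = 16             # commenter's assumed 1:16 ratio
FUGAKU_CHIPS = 100_000
FUGAKU_PFLOPS = 500
DOJO_WATTS_PER_CHIP = 400
FUGAKU_TOTAL_MW = 30

# First comment: convert to a 32-bit equivalent, then compare per chip.
dojo_fp32_pflops = DOJO_LOW_PRECISION_PFLOPS / LOW_TO_FP32_RATIO
dojo_tflops_per_chip = dojo_fp32_pflops * 1000 / DOJO_CHIPS
fugaku_tflops_per_chip = FUGAKU_PFLOPS * 1000 / FUGAKU_CHIPS
per_chip_ratio = dojo_tflops_per_chip / fugaku_tflops_per_chip

# Power reply: chip power only, scaled up to Fugaku's throughput.
dojo_chip_mw = DOJO_CHIPS * DOJO_WATTS_PER_CHIP / 1e6
dojo_mw_at_fugaku_speed = dojo_chip_mw * FUGAKU_PFLOPS / dojo_fp32_pflops

# Later reply: per-chip 32-bit figures taken from the linked articles.
dojo_total_pflops = 22 * DOJO_CHIPS / 1000
later_per_chip_ratio = 22 / 6.8

print(f"Dojo FP32 equivalent: {dojo_fp32_pflops:.2f} PFlops")
print(f"Per-chip ratio (first comment): {per_chip_ratio:.2f}x")
print(f"Dojo chip power at Fugaku speed: {dojo_mw_at_fugaku_speed:.2f} MW "
      f"vs {FUGAKU_TOTAL_MW} MW total for Fugaku")
print(f"Dojo total from 22 TFlops/chip: {dojo_total_pflops:.0f} PFlops")
print(f"Per-chip ratio (later reply): {later_per_chip_ratio:.2f}x")
```

The unrounded results differ slightly from the comments because the commenter rounds 68.75 PFlops up to 70 and the 500/68.75 speed ratio down to 7. The power figure covers the chips alone, which is why the commenter treats the gap to Fugaku's 30 MW total as an upper bound.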
