World's Fastest AI Supercomputer Built from 6,159 NVIDIA A100 Tensor Core GPUs (nvidia.com)
Slashdot reader 4wdloop shared this report from NVIDIA's blog, joking that maybe this is where all NVIDIA's chips are going:
It will help piece together a 3D map of the universe, probe subatomic interactions for green energy sources and much more. Perlmutter, officially dedicated Thursday at the National Energy Research Scientific Computing Center (NERSC), is a supercomputer that will deliver nearly four exaflops of AI performance for more than 7,000 researchers. That makes Perlmutter the fastest system on the planet on the 16- and 32-bit mixed-precision math AI uses. And that performance doesn't even include a second phase coming later this year to the system based at Lawrence Berkeley National Lab.
More than two dozen applications are getting ready to be among the first to ride the 6,159 NVIDIA A100 Tensor Core GPUs in Perlmutter, the largest A100-powered system in the world. They aim to advance science in astrophysics, climate science and more. In one project, the supercomputer will help assemble the largest 3D map of the visible universe to date. It will process data from the Dark Energy Spectroscopic Instrument (DESI), a kind of cosmic camera that can capture as many as 5,000 galaxies in a single exposure. Researchers need the speed of Perlmutter's GPUs to capture dozens of exposures from one night to know where to point DESI the next night. Preparing a year's worth of the data for publication would take weeks or months on prior systems, but Perlmutter should help them accomplish the task in as little as a few days.
"I'm really happy with the 20x speedups we've gotten on GPUs in our preparatory work," said Rollin Thomas, a data architect at NERSC who's helping researchers get their code ready for Perlmutter. DESI's map aims to shed light on dark energy, the mysterious physics behind the accelerating expansion of the universe.
A similar spirit fuels many projects that will run on NERSC's new supercomputer. For example, work in materials science aims to discover atomic interactions that could point the way to better batteries and biofuels. Traditional supercomputers can barely handle the math required to generate simulations of a few atoms over a few nanoseconds with programs such as Quantum Espresso. But by combining their highly accurate simulations with machine learning, scientists can study more atoms over longer stretches of time. "In the past it was impossible to do fully atomistic simulations of big systems like battery interfaces, but now scientists plan to use Perlmutter to do just that," said Brandon Cook, an applications performance specialist at NERSC who's helping researchers launch such projects. That's where Tensor Cores in the A100 play a unique role. They accelerate both the double-precision floating point math for simulations and the mixed-precision calculations required for deep learning.
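(A rough illustration, not from the article: the 16/32-bit mixed-precision math the summary refers to looks roughly like this in PyTorch's automatic mixed precision. The model, sizes, and data below are placeholders.)

```python
# Minimal sketch of mixed-precision training with PyTorch AMP (placeholder model/data).
import torch
import torch.nn as nn

model = nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()               # rescales FP16 gradients to avoid underflow
inputs = torch.randn(64, 1024, device="cuda")
targets = torch.randint(0, 10, (64,), device="cuda")

for step in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                # matrix math runs in FP16/BF16 on Tensor Cores
        loss = nn.functional.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```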
Re: Mostly useless (Score:2)
Re: (Score:2)
"AI" cannot do either of these things. The only thing "AI" can do is replicate decisions when there is an ample body of examples, and "AI" always does so with reduced quality.
Re: (Score:2)
I don't think you either map out the universe with AI or study subatomic particles 'for green energy' (is that really a thing?).
I suspect they are talking about zero-point energy.
Which is strange, because many rednecks in the weird corners of YouTube have already discovered ZPE and perpetual motion.
Re: (Score:2)
It will help piece together a 3D map of the universe, probe subatomic interactions for green energy sources and much more.
Materials science and particle physics. No ZPE required.
Re: (Score:1)
I don't think you either map out the universe with AI or study subatomic particles 'for green energy' (is that really a thing?). I may be wrong
OK, think about how data is stored for the purposes of Google Earth.
Now, since it doesn't require any math to do, please plot a vector in 3D space to calculate the position of this new object just detected an hour ago, relative to the velocities and spatial positions of existing map objects, and predict where it should be 10, 11, 12, 13, and 14 hours from now.
Oh, and we need the results by our next observation window, in an hour.
As for the particle physics, they are brute-forcing material properties from adjusting another
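(For what it's worth, the naive version of that prediction, ignoring everything that makes it actually hard, is just constant-velocity extrapolation. A sketch with made-up numbers:)

```python
# Hypothetical constant-velocity extrapolation; position and velocity are invented numbers.
import numpy as np

r0 = np.array([1.2e6, -3.4e5, 7.8e5])   # position at detection time, km (assumed)
v = np.array([12.0, -3.5, 0.8])         # velocity, km/s (assumed)

for hours in (10, 11, 12, 13, 14):
    r = r0 + v * hours * 3600.0          # r(t) = r0 + v*t
    print(hours, r)
```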
Re: (Score:1)
Re:Mostly useless (Score:4, Informative)
Ah, read the article. It has lots of details about what this supercomputer is going to be used for. It isn't there to make politicians feel good about themselves; it is primarily for rapid, daily analysis of the massively detailed astronomy photos that are coming online, but there are a good handful of secondary users who have already queued up as well.
Re: (Score:1)
So basically useless except to satisfy some specialized curiosity? Figures.
Re:Mostly useless (Score:5, Funny)
Stand back everyone. If he handwaves any harder he'll blow the forum over.
Re: (Score:2)
Sure, some small advances may be made in the misnamed "AI" field, but research there is mostly not a question of computational power, and that has been the case for some time. This thing merely serves to let some politicians maintain the illusion that they are doing something useful.
This is not entirely true. A PhD student will be able to take on a problem proportional to the computing resources available. This will open up some areas of research that would not otherwise be tackled.
Re: (Score:2)
The summary actually does a good job listing all the use cases, of which "AI" is a small part. Deep learning is more descriptive of a larger field. [wikipedia.org] Not that "AI" [wikipedia.org] as a larger body can't be done on this machine; after all, it's a computer. It's just that its strengths favor some problems more than others.
Re:Mostly useless (Score:5, Informative)
NERSC mostly does science using supercomputers rather than research on supercomputers. (Though they also do HPC research, and often publish at SC.) Machine learning is pretty good at solving lots of classic science problems. The best protein-folding methods we have today are ML-based.
Astronomy uses a lot of ML nowadays. We generate more astronomy data per day than we can process in a day, so astronomers use ML to separate the parts of the data that are potentially interesting from the parts that are not. That narrows down the amount of data worth looking at with more expensive methods.
In AI in general, more computational power is useful. Each time we decrease the turnaround time of training a model, we get a more responsive development cycle. There was a great talk on that at GTC13 by a Facebook engineer.
So yeah, we do need that computational power.
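(To make that triage idea concrete, here is a toy sketch: train a classifier on labeled detections, then keep only the candidates it scores as likely interesting. The features and labels are synthetic; this is nothing like a real survey pipeline.)

```python
# Toy ML triage: score detections and keep only the promising ones for expensive follow-up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 8))                        # 8 made-up features per detection
y_train = (X_train[:, 0] + X_train[:, 3] > 1).astype(int)   # fake "interesting" label

clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

X_tonight = rng.normal(size=(100_000, 8))                   # tonight's detections
scores = clf.predict_proba(X_tonight)[:, 1]
keep = X_tonight[scores > 0.9]                              # only these go to the slow pipeline
print(f"kept {len(keep)} of {len(X_tonight)} detections")
```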
Re: (Score:2)
Re: (Score:2)
OK.
But can it give me 120 frames per second when I play Call of Duty: Warzone?
Otherwise,
so what?
In other news (Score:5, Informative)
The real news here is that while Perlmutter was completed mostly on time, Aurora - featuring Intel's Ponte Vecchio accelerators - was not. Perlmutter features AMD EPYC 7003 CPUs:
https://www.amd.com/en/press-r... [amd.com]
Here we have AMD and NVIDIA rolling out their hardware more or less on schedule while Intel continues to struggle to produce anything meaningful in the HPC market. Or the cloud/hyperscaler market. Or really any market other than 4-core laptops. Speculation was that Frontier might go online before Aurora, and that's looking increasingly likely:
https://www.hpcwire.com/2020/1... [hpcwire.com]
Re: (Score:3)
Specifically, they're EPYC 7763 CPUs from the 7003 series.
Re: (Score:1)
With maybe a bit of cryptocurrency mining on the side.
Why (Score:1)
Re: (Score:2)
+15 for management of the others?
Re: (Score:2)
That's 6159 kids with no GPU because all the silicon went into this machine.
Re: (Score:2)
"Kids" :D
Re: (Score:3)
Why 6159? What a strange number.
Not really. None of these systems have been built in powers of two for about 20 years. Really, people think in terms of cabinets and what you can fit in them. The specs of the machine are here:
https://docs.nersc.gov/systems... [nersc.gov]
It is 1536 compute nodes in 12 cabinets with 4 GPUs each, plus one GPU in each of the 15 login nodes.
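(The arithmetic checks out; the node counts below are the ones quoted from the NERSC spec page above.)

```python
# GPU count sanity check, using the node counts from docs.nersc.gov quoted above.
compute_nodes = 1536
gpus_per_compute_node = 4
login_nodes = 15                                              # one GPU each
print(compute_nodes * gpus_per_compute_node + login_nodes)    # 6159
```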
How? (Score:2)
Re: How? (Score:2)
Your question seems to lack specificity. It's as if you are asking how data is shared. Maybe it's just me but the HPC answer to this is pretty straightforward.
State is shared as infrequently as possible by finding problems that can be easily divided into smaller sets of work with many computations each. I didn't RTFA, but systems with unique shared-memory designs are generally less common. Since these kinds of problems rarely require state to be distributed to individual machines, the network might not even n
Re: (Score:2)
Re: (Score:2)
The fact that you put quotes around "computer" means you're aware that the idea of multiple computers acting as a single computer is a subjective judgment. Is it not enough to say they're working on a common problem and the GPUs don't require their working sets of data to be input and/or collected manually? Folding@home was commonly described as a computer because it was a bunch of computers independently working on the same problem, coordinated/managed from one place. These are pretty abstract c
Re: (Score:2)
Re: (Score:2)
You don't. It's a typical cluster, including the GPUs. If you want to use more than one GPU at a time you need to write your code to do so.
This works pretty well for deep learning because training involves repeatedly showing a bunch of examples and computing gradients. You can run that in parallel and just average the gradients with only a slight loss in efficiency.
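(A conceptual sketch of that gradient-averaging trick, using plain NumPy as a stand-in for real workers; an actual multi-node run would use something like torch.distributed or Horovod, and the model here is just toy least squares.)

```python
# Data-parallel training in miniature: each "worker" computes a gradient on its own shard,
# and the shard gradients are averaged before the update.
import numpy as np

rng = np.random.default_rng(1)
w = np.zeros(3)                                    # toy linear model
X, y = rng.normal(size=(8000, 3)), rng.normal(size=8000)

def shard_gradient(Xs, ys, w):
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)      # gradient of mean squared error

for step in range(100):
    grads = [shard_gradient(Xs, ys, w)             # one gradient per worker's shard
             for Xs, ys in zip(np.array_split(X, 4), np.array_split(y, 4))]
    w -= 0.01 * np.mean(grads, axis=0)             # average the gradients, then update
```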
Re: (Score:2)
Re: (Score:3)
Essentially yes.
It is similar to what you would build at home by connecting a few machines with GPUs using an Ethernet cable. Now they are using "fancy" Ethernet that enables GPU-to-GPU communication even when the GPUs are not in the same compute node. (They call it RDMA.)
You program a system like this very similarly to how you would program any distributed application, usually with MPI, or with frameworks like Hadoop or Spark.
Actually, most multi-GPU applications are programmed in a distributed way even when the GPUs are sitting in the same machine.
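(A minimal mpi4py sketch of that distributed style, assuming MPI and mpi4py are available: every rank works on its own chunk and a collective combines the results. The launch command varies by cluster, something like `mpirun -n 4 python script.py`.)

```python
# Each MPI rank computes a partial result; Allreduce combines them on every rank.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local = np.full(1000, rank, dtype=np.float64)      # this rank's piece of the work
local_sum = np.array([local.sum()])
total = np.zeros(1)
comm.Allreduce(local_sum, total, op=MPI.SUM)       # combine partial sums across ranks

if rank == 0:
    print("global sum:", total[0])
```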
Re: (Score:2)
Typically you have a few to a few dozen CPU cores plus several GPUs in a node, all of which are connected by a very fast bus, like a single "computer." Then you usually connect the nodes with a fast interconnect. It looks like Perlmutter uses Cray Slingshot for that, which looks sort of like Ethernet, but at 200 Gb/s and 1.2 billion packets/s, with switches that can do 12.8 Tb/s.
But yes, all of the top supercomputers are clusters.
Re: How? (Score:2)
One thing I think others haven't mentioned that might better fill in the gap is the existence of a job manager. The supercomputer is effectively a bunch of machines networked together. The summary mentioned an expansion still planned, which sounded like it would be at a different site, so what we call a supercomputer is really a bunch of networked machines you can leverage together to solve a problem. Those machines are generally in a single location, but I don't think that's a requirement. The machines are leveraged
Re: (Score:2)
The same way we linked machines together back in the nineties, by carving a job up into smaller pieces and handing those pieces off to nodes. Back then it was predominantly DQS, a nifty system that would send your job to nodes with the necessary keywords to support them automatically. Today I have no idea what software is used specifically, though DQS still works :)
Re: (Score:2)
Kubernetes calls them labels. But your jobs are containers and are distributed according to how you've labeled the nodes in your cluster. Standing on the shoulders of giants ...
Re: (Score:2)
RDMA (remote direct memory access) plays a big part [nvidia.com] in their current HPC lineup. Basically, one GPU can access another GPU's memory over a switched fabric. For GPUs that are close by, such as in the same system or at least the same rack, there is NVLink [nvidia.com], which has much lower latency than RDMA.
The hard part, of course, is the software making good decisions about where to keep data so that it is not costly to fetch when it is needed.
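(As a sketch of what that buys you in practice: with an MPI library built with CUDA support, mpi4py can take GPU arrays such as CuPy arrays directly, and the transfer can go GPU-to-GPU over the fabric via GPUDirect RDMA instead of staging through host memory. This assumes such a build is available; the code below is illustrative, not Perlmutter's actual configuration.)

```python
# GPU-to-GPU message passing, assuming a CUDA-aware MPI build and mpi4py >= 3.1 with CuPy.
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
cp.cuda.Device(rank % cp.cuda.runtime.getDeviceCount()).use()

buf = cp.full(1_000_000, rank, dtype=cp.float32)   # data lives in GPU memory
if rank == 0:
    comm.Send(buf, dest=1, tag=0)                  # device buffer handed straight to MPI
elif rank == 1:
    comm.Recv(buf, source=0, tag=0)
    print("rank 1 received, first element:", float(buf[0]))
```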
named after Saul Perlmutter (Score:3)
From the article:
Dark energy was largely discovered through the 2011 Nobel Prize-winning work of Saul Perlmutter, a still-active astrophysicist at Berkeley Lab who will help dedicate the new supercomputer named for him.
Re: named after Saul Perlmutter (Score:2)
I had to look it up (Score:2)
I was going to write the same thing, as I had to look it up - I had found it quite odd they would make a supercomputer to run Perl.
Re: (Score:1)
I used to be affiliated with the Lab and took a lot of data for the "follow-on" research mentioned in the press release, collaborating with Saul and Rollin and a few dozen other people. But I'm not used to living people getting things like this named after them, so I scrambled to Google, fearing he had died. Glad to hear he is indeed still alive, and got to kick off the first compute job.
Sounds perfect... (Score:2)
The AI is alive (Score:2)
It mines Bitcoin on the side for its dark purposes.
The next NERSC announcement: (Score:1)
But can it play Doom? (Score:1)
It might be better used by a kid to play Fortnite.
A Beowulf Cluster (Score:2)
That explains the GPU shortage? (Score:1)
So that's where my god damn GPU stock went....
Who wants to be the person to explain ... (Score:3)