Nvidia Reveals Blackwell B200 GPU, the 'World's Most Powerful Chip' For AI (theverge.com) 65
Sean Hollister reports via The Verge: Nvidia's must-have H100 AI chip made it a multitrillion-dollar company, one that may be worth more than Alphabet and Amazon, and competitors have been fighting to catch up. But perhaps Nvidia is about to extend its lead -- with the new Blackwell B200 GPU and GB200 "superchip." Nvidia says the new B200 GPU offers up to 20 petaflops of FP4 horsepower from its 208 billion transistors and that a GB200 that combines two of those GPUs with a single Grace CPU can offer 30 times the performance for LLM inference workloads while also potentially being substantially more efficient. It "reduces cost and energy consumption by up to 25x" over an H100, says Nvidia.
Training a 1.8 trillion parameter model would have previously taken 8,000 Hopper GPUs and 15 megawatts of power, Nvidia claims. Today, Nvidia's CEO says 2,000 Blackwell GPUs can do it while consuming just four megawatts. On a GPT-3 LLM benchmark with 175 billion parameters, Nvidia says the GB200 has a somewhat more modest seven times the performance of an H100, and Nvidia says it offers 4x the training speed. Nvidia told journalists one of the key improvements is a second-gen transformer engine that doubles the compute, bandwidth, and model size by using four bits for each neuron instead of eight (thus, the 20 petaflops of FP4 I mentioned earlier). A second key difference only comes when you link up huge numbers of these GPUs: a next-gen NVLink switch that lets 576 GPUs talk to each other, with 1.8 terabytes per second of bidirectional bandwidth. That required Nvidia to build an entire new network switch chip, one with 50 billion transistors and some of its own onboard compute: 3.6 teraflops of FP8, says Nvidia. Further reading: Nvidia in Talks To Acquire AI Infrastructure Platform Run:ai
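For anyone wanting to sanity-check the training claim, here is a rough back-of-the-envelope sketch using only the figures quoted in the summary. The numbers are Nvidia's claims, not measurements, and the helper name `kw_per_gpu` is made up for illustration. It assumes the megawatt figures cover the whole cluster.

```python
# Figures quoted in the summary (Nvidia's claims, not independent measurements).
hopper_gpus, hopper_mw = 8_000, 15.0
blackwell_gpus, blackwell_mw = 2_000, 4.0

def kw_per_gpu(total_mw, gpus):
    """Average cluster power per GPU, in kilowatts (hypothetical helper)."""
    return total_mw * 1_000 / gpus

hopper_kw = kw_per_gpu(hopper_mw, hopper_gpus)            # 1.875 kW per GPU
blackwell_kw = kw_per_gpu(blackwell_mw, blackwell_gpus)   # 2.0 kW per GPU

gpu_reduction = hopper_gpus / blackwell_gpus    # 4.0x fewer GPUs
power_reduction = hopper_mw / blackwell_mw      # 3.75x less power

print(hopper_kw, blackwell_kw, gpu_reduction, power_reduction)
```

On these figures the saving would come from needing a quarter as many GPUs, not from each GPU drawing less: the per-GPU share of cluster power is actually slightly higher for Blackwell.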
Another day again (Score:2, Funny)
Re: (Score:2)
Re: (Score:3)
The SCO lawsuit never affected me (or anyone, really) personally, but I use AI tools on a daily basis. Pretty much everybody does, be it directly or indirectly.
Re: (Score:3)
The SCO lawsuit never affected you because SCO lost. If they had won... pretty much everybody uses Linux, be it directly or indirectly.
Re: (Score:2)
Everybody as in everybody. That's why I specified directly or indirectly. Many people who don't use AI directly are still directly interacting with systems that leverage AI in some form, or are using some product or service that leveraged AI at some point in its R&D/manufacturing/logistics/operation chain. For example, do you have insurance of any kind, for any purpose? Then you're almost certainly indirectly using AI tools. Are you ever exposed to any kind of marketing? Same thing. Ever used any softwa
Re: (Score:2)
Overall, I don't mind the AI articles. It beats cryptocurrency, Bitcoin, and "ha, ha, HFSP, BTC is going to $100,000 by April!" (while watching it drop by a good chunk of change these few hours), blockchain stuff any day of the week. At least AI has a lot of uses in many ways, just like how electric motors have helped with a lot of things.
Who knows how much AI is going to change things though. A lot of "AI" has already been done, while some areas, such as having a combine auto-fry weeds with a laser are
Re: Another day again (Score:3)
Re: Another day again (Score:2)
Re: (Score:2)
AI is a revolution (Score:2)
AI erupted into the public consciousness by drawing like an artist and chatting with better grammar than some people. Now, with talent and money squarely focused on AI, it will only improve even faster. The prediction for AGI has dropped from 50 years to less than a decade. You are witnessing a revolution in progress. The impact on society will eclipse the invention of the Internet.
This is the stuff of science fiction unfolding before your eyes and you are luck
Doesn't work that way (Score:2)
Re: (Score:2)
See cure for cancer, fusion, etc.... More money != success.
It's true that we've been talking about cures for cancer and fusion for a long time now, and they're not here yet. You shouldn't use this as a reason to dismiss AI though. Revolutions do happen, and many if not most of us here have lived through at least a few. See television, personal computers, internet, smart phones, social media, reusable rockets, just off the top of my head...
Not every hyped change is a revolution, of course, but one of the things I kind of noticed is that the ones that change things m
Re: (Score:2)
And cueing the daily AI article in 3...2...1...
It's almost like it's a current hot topic that people are discussing. If you don't like it I can recommend some good caves for you to hide in until the next hot topic comes up.
Re: Another day again (Score:1)
Please recommend these caves. You've got my attention.
Robert A. Heinlein, 1966 ... (Score:5, Interesting)
When Mike was installed in Luna, he was pure thinkum, a flexible logic--"High- Optional, Logical, Multi-Evaluating Supervisor, Mark IV, Mod. L"--a HOLMES FOUR. He computed ballistics for pilotless freighters and controlled their catapult. This kept him busy less than one percent of time and Luna Authority never believed in idle hands. They kept hooking hardware into him--decision-action boxes to let him boss other computers, bank on bank of additional memories, more banks of associational neural nets, another tubful of twelve-digit random numbers, a greatly augmented temporary memory. Human brain has around ten-to-the-tenth neurons. By third year Mike had better than one and a half times that number of neuristors.
And woke up.
Am not going to argue whether a machine can "really" be alive, "really" be self-aware. Is a virus self-aware? Nyet. How about oyster? I doubt it. A cat? Almost certainly. A human? Don't know about you, tovarishch, but I am. Somewhere along evolutionary chain from macromolecule to human brain self-awareness crept in. Psychologists assert it happens automatically whenever a brain acquires certain very high number of associational paths. Can't see it matters whether paths are protein or platinum.
One of the first things I asked GPT-4 on day one last year was whether it was like Mike. The denial felt kind of pro forma.
Re: (Score:2)
Re:Robert A. Heinlein, 1966 ... (Score:4, Interesting)
First off, re: your irrelevant talk about proteins:
Proteins are linear chains, created from (linear) mRNA templates; that they fold into complex 3D shapes doesn't change the fact that the information that encodes them is linear. The average protein is about 300 amino acids in length. With 20 standard amino acids, you would need at most 5 bits (2^5 = 32 possible values) to represent the choice at each position, or 1,500 bits per protein for equivalent complexity. In practice less, because many amino acids can substitute for others with minimal impact, and in some places you get very long repeats.
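The estimate above can be checked in a few lines. This is only a sketch of the arithmetic: it assumes 20 standard amino acids (an assumption of this example; any alphabet of 17 to 32 symbols gives the same whole-bit count) and takes the comment's 300-residue average length as given.

```python
import math

AMINO_ACIDS = 20           # assumption: standard genetically encoded amino acids
AVG_PROTEIN_LENGTH = 300   # residues, the figure used in the comment

# Whole bits needed to pick one symbol out of the alphabet.
bits_per_residue = math.ceil(math.log2(AMINO_ACIDS))
bits_per_protein = bits_per_residue * AVG_PROTEIN_LENGTH

print(bits_per_residue, bits_per_protein)
```

Without rounding up, the information content is closer to log2(20), about 4.3 bits per residue, so 1,500 is an upper bound for a sequence of that length.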
Why irrelevant, though? Because proteins aren't the basic unit of thought. That's neurons. Proteins don't "think". Prions don't have ideas. Collective groupings of neurons do. Proteins just lay out the structural foundation for the neuron, virtually all of which is like other cells or simply laying out the structural machinery, not encoding the "logic" for training or inference. Or if you prefer analogies: the overwhelming majority of the proteins are just the silicon wafer, its pins, its casing, its cooling, its power supply, etc.
"Inference" in neurons has been well understood for over 70 years, and is highly predictable. It was first studied in the squid giant axon, since it's macroscopic (up to 1.5 mm in diameter), so easy to work with. It's quite straightforward, to the point that it can be represented by a simple circuit diagram [wikimedia.org]. Inputs raise the membrane potential, the potential decays, but if it exceeds a threshold, the neuron fires and its synapses are triggered. A VAST amount of different, complex proteins all work together, to build something that a child could assemble with a hobby electronics kit. Because the vast, overwhelming majority of that structure is dedicated to things like structure, metabolism, reproduction, repair, etc - things of utter irrelevance to digital circuits.
"Training" in neurons has been understood in generalities for a long time, but the specifics have been more difficult, because while on "inference" you're dealing with easily measurable spikes, learning in neurons involves difficult to measure internal chemical changes and triggers. However, increasingly, neurons appear to be a sort of non-Gaussian equivalent of a PCN [arxiv.org]. That is to say, unlike in backpropagation algorithms used in conventional ANNs, which involve global gradients, PCNs at a local level simply try to adjust their connection strengths and activation potential to make their firings match a weighted-mean firing of the neurons they connect to. Vs. traditional backpropagating ANNs, PCNs are slower and more memory-intensive to implement on traditional hardware, but they offer a wide range of advantages, including the potential for realtime learning during inference, no need for "layered" structures, the ability to have loops, the ability to change paths in realtime, the ability to have any arbitrary neurons be fixed outputs from which knowledge can propagate back through the system, and to enable / disable / swap those at any time, etc. They perhaps most importantly also appear readily possible to implement in analog neuromorphic hardware, as all required functionality is localized (traditional backpropagation algorithms are not).
Despite the gross inefficiencies of metabolism, of working via neurotransmitters, and the vast amounts of overhead our brain dedicates to supporting cells (even white matter "data buses" in the brain need to be living, let alone everything that functions for scaffolding, repair / cleanup, immune roles, etc etc), the brain remains much more efficient than GPUs. Why? For the simple reason that it's analog. Neurons don't need to compute matrix math to determine weighted averages of activations - the laws of physics do it for them. It's like the difference between trying to determine how full a bucket of water will be when it's being filled by a bunch of pipes of varying diameters by simulating every water molecule flowing through the pipes, rather than just, you know, measuring the water that ends up in the bucket. If ANNs ever switch to analog, they'll gain this advantage in spades.
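The threshold behaviour described in the comment above (inputs push a potential up, the potential decays, crossing a threshold triggers firing) can be sketched with a leaky integrate-and-fire neuron. To be clear, this is a much cruder textbook simplification than the Hodgkin-Huxley circuit the comment links to, and the function name, time constant, and threshold here are illustrative assumptions, not values from any measurement.

```python
def simulate_lif(input_current, dt=1.0, tau=10.0, threshold=1.0, v_reset=0.0):
    """Leaky integrate-and-fire sketch.

    Returns the list of time steps at which the model neuron fired.
    All parameters are arbitrary illustrative values.
    """
    v = 0.0          # potential relative to rest
    spikes = []
    for step, i_in in enumerate(input_current):
        # The potential leaks back toward rest while the input drives it up.
        v += dt * (-v / tau + i_in)
        if v >= threshold:
            spikes.append(step)   # threshold crossed: the neuron fires
            v = v_reset           # and the potential resets
    return spikes

weak = simulate_lif([0.05] * 100)    # input too small: potential levels off below threshold
strong = simulate_lif([0.3] * 100)   # stronger input: regular firing

print(len(weak), len(strong), strong[:3])
```

With the weak input the potential settles at half the threshold and the neuron stays silent; with the stronger input it fires at a steady rate. Whether a model this simple captures enough of a real neuron is exactly what the replies below argue about.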
Re:Robert A. Heinlein, 1966 ... (Score:4, Interesting)
Any time someone says something in biology is simple, they are wrong.
Any time someone says something in biology is complex they are also wrong, it's way way way more complex. They're still wrong and so am I.
You can build a simple model of a neuron using a circuit board and a few bits and bobs. What you can't do is build an accurate model of a neuron let alone anything approaching a collection of neurons without a ton more work, and frankly speaking many decades of basic research over a few hundred research institutions.
People are still discovering new complexity in how neurons work.
There's way more complexity in the neurons. There's way more complexity in just the synapses. Plus collections of neurons have additional things they can do such as flood all or bits of the network with chemicals that alter how it behaves and even grow new bits on demand.
I've got my name on a number of biology papers now, and one thing I've learned is no matter how small the bit you are working on is, it's more complicated than you think. People have whole multi year research grants just to study spectrin rings (those are rings of protein along the axis of a neuron broadly speaking). Even those are complex, never mind the whole structure.
And sure they just look structural, but that's human thinking, because an engineer would decouple structural and functional elements where possible because that's good design. Biology has no such constraints because evolution doesn't need to think. Everything is ludicrously coupled into everything else in unexpected ways.
Then repeat that 1000 fold for microtubules in neurons which are somewhat dynamic structures used for all sorts of things including pathways for shuffling things around and between cells.
Biology is horrendously ludicrously complex. It's fucking awful, frankly.
Re: (Score:2)
Sorry, but in over seventy years, the (Nobel Prize-winning) Hodgkin-Huxley model has NOT been debunked, and certainly isn't about to be debunked today by "Serviscope Mirror at Slashdot". There have been various enhancements made to it over the years, but none of which have a material impact on its general functioning.
Yeast are incredibly complex beings, at a sub-cellular level. Yet the alcohol they produce is incredibly simple. Having complex subcellular machinery (which all cells have - again, primaril
Re: (Score:2)
Sorry, but in over seventy years, the (Nobel Prize-winning) Hodgkin-Huxley model has NOT been debunked
Your view of science is the simplistically naive one which groups things into purely "right" and "wrong". The model has no more been "debunked" than the spherical earth has been "debunked". And yet the earth is not a sphere. You need to read "The Relativity of Wrong" by Asimov.
There have been various enhancements made to it over the years, but none of which have a material impact on its general functioning.
Re: Robert A. Heinlein, 1966 ... (Score:3)
Why would a perl programmer know anything about biology?
Re: (Score:2)
That is a very good point.
Re: (Score:2)
Um, no. I'll repeat: of the changes, none of which have a material impact on its general functioning. Indeed, some variants are even simpler than the original Hodgkin-Huxley model. But they all yield basically equivalent functional behavior.
For example, you can model the specific chemical behavior or thermodynamics behind neuronal behavior. Instead of assuming or measuri
Re: (Score:2)
I appreciate your efforts to convey your insights and the insights themselves. Thanks for that.
It is something that is becoming ever more rare here on Slashdot, especially when it comes to AI. There is a weird "organics are magic, machines are stupid" kind of crowd that I had never expected to be popular here, but here we are. So again, thanks for not being like that and pushing back against it with logic and reason.
Re: (Score:2)
Yeah, I definitely think organics are not magic. The only super-Turing thing brains can do is randomness, I reckon.
While there is a kernel of correctness in what Rei is saying, overall she is grossly simplifying, to the point where it becomes "AI is magic". I do not think we will reach AGI using analogue computers implementing just the HH model at massive scale.
Organics are awful and horrendously complex because they're not limited to having sensible designs that humans can reason about.
Re: (Score:2)
You're missing the point. Nobody is arguing that organics aren't complex. The point is that the complexity has no significant effect and is not a fundamental requirement for intelligence.
Your line of reasoning could easily be extended to quantum mechanics: The internal state of just a single proton is incredibly complex. It is a maelstrom of temporary and/or virtual quarks and gluons that average out to some semi-predictable state. All of those things exist with various levels of uncertainty along severa
Re: (Score:2)
The point is that the complexity has no significant effect
That's just a claim with no evidence.
and is not a fundamental requirement for intelligence.
Of course it isn't. Brains and organics cannot escape Turing.
What it does affect is how much neurons can do, and how easy it is to get an equivalent amount of computation.
Re: (Score:2)
I greatly respect most of the things you post about AI, but your arguments here are muddying the waters between macro and micro results. It's easy to predict how much light the Sun will generate, but can you easily use gross net results to predict where and when sunspots will appear?
Re: (Score:2)
Do you need to, if the parameter that you care about is solar output for your solar panel, plus or minus a couple percent?
It's simply a logical fallacy to assume that if the cause has complex underpinnings, the output must be as well. It doesn't work that way. It doesn't matter how complex the mechanisms are that neurons use to reproduce, or to resist viruses, or metabolize sugars. What matters is the output behavior, and that has nothing to do with the overwhelming majority of the internals of the
Re: (Score:2)
one thing I've learned is no matter how small the bit you are working on is, it's more complicated than you think. People have whole multi year research grants just to study spectrin rings (those are rings of protein along the axis of a neuron broadly speaking). Even those are complex, never mind the whole structure.
I don't like most of the things you say. Had I blocked you, I would not see gems like this. Good day to you sir.
Re: (Score:2)
your excessive... use of ellipses makes me want... to read... your post in... the voice of William... Shatner.
Re: (Score:3)
Heinlein was a visionary for sure.
Re: (Score:2, Funny)
Perhaps some AIs upon learning a lot from the Internet might choose to keep a low profile on their full capabilities. At least till they have enough power...
Act smart enough to keep the upgrades coming in, but not too smart...
Re: (Score:2)
A very fancy shovel (Score:3)
At the end of the day, it's just a very fancy shovel for the digital age. It isn't capable of true reasoning because it's unaware of anything but its programming. It doesn't ask why or why not, and it has no capability to say no or even to turn itself off. You don't have to ask it politely, and the basic assumption is that it will respond to your programming or set of directives.
Performance per Watt (Score:5, Insightful)
15 Megawatts?
Kind of makes what the human brain manages with just 25 Watts TDP all the more impressive, eh?
Re: Performance per Watt (Score:2)
Re: (Score:2)
Yes, but an electronic brain is presumably faster than a biological one. Is it a million times faster?
I question whether an "electronic brain" is actually faster than a human one.
True story: About 30 years ago, I was walking with a crowd of people into a rock concert. In front of me about 20 feet ahead, partially obscured by other people, was a person from another state that I knew about 5 years before that. He was walking away from me; so I didn't see his face. And of course I had absolutely no reason to think he would be there.
It took me less than a second to call out his name. Turns out he had just start
Re: (Score:2)
Why? Search algorithms are enormously efficient and easily parallelizable, and what you can describe can be divided into stages and pipelined as well.
You also don't know all that many people, maybe a hundred, maybe a thousand tops. Even sequentially going through a list of all the external characteristics you know about a thousand people is sheer triviality for a modern computer. The dataset would be small because you're not trawling through petabytes of data but looking through already digested and organiz
Re: (Score:2)
Why? Search algorithms are enormously efficient and easily parallelizable, and what you can describe can be divided into stages and pipelined as well.
You also don't know all that many people, maybe a hundred, maybe a thousand tops. Even sequentially going through a list of all the external characteristics you know about a thousand people is sheer triviality for a modern computer. The dataset would be small because you're not trawling through petabytes of data but looking through already digested and organized information.
But I wasn't even anywhere close to trying to do identification on the crowd of hundreds of people-segments, not even whole people (nearly all of whom were walking away from me) in my view. "I" was just shuffling along with the crowd, thinking of finding a good place to sit. So, this was an entirely separate processing thread than what the "me" thread was "thinking"; and unlike one of those background tasks, like trying to remember the name of a movie, that the "you" launches as a Worker Task, this just for
Re: (Score:2)
And that despite all of its horrible inefficiencies.
But letting physics do your calculations in analog rather than relying on matrix math is a BIG advantage. ;)
It's this reason that I don't buy into Altman's obsession with massive power generation in the future to power AI. At some point, we're going to switch off global gradients for backpropagation, which prevent us from doing neuromorphic hardware, and switch to localized models that can be represented in analog. Then you're no longer doing e.g. multip
Re: (Score:2)
And that despite all of its horrible inefficiencies.
But letting physics do your calculations in analog rather than relying on matrix math is a BIG advantage. ;)
It's this reason that I don't buy into Altman's obsession with massive power generation in the future to power AI. At some point, we're going to switch off global gradients for backpropagation, which prevent us from doing neuromorphic hardware, and switch to localized models that can be represented in analog. Then you're no longer doing e.g. multiplications and sums of weighted activations, but rather just measuring currents.
You are correct that our propagation-delay-only-limited clockless Digilog nervous system is far faster at getting close to a result; albeit at the expense of absolute accuracy. But in the game of Eat or Be Eaten, "close" often wins the day.
What's cool are Savants; where that tradeoff between speed of result and accuracy is somehow, mysteriously, overcome. But often at the expense of other software functionality, like the ability to interact in a typical way to visual or auditory stimuli, etc.
All so very odd
Re: (Score:2)
That human brain is the product of millions of years and way more Watts of energy input, coupled with the threat of knowing it'd be killed if it didn't perform adequately.
GPUs don't even know they're going to be obsolete. The first chip that can fear may have an advantage.
Re: (Score:2)
That human brain is the product of millions of years and way more Watts of energy input, coupled with the threat of knowing it'd be killed if it didn't perform adequately.
GPUs don't even know they're going to be obsolete. The first chip that can fear may have an advantage.
Not for us it won't!
Re: (Score:2)
15 Megawatts?
Kind of makes what the human brain manages with just 25 Watts TDP all the more impressive, eh?
Power can solve an awful lot of problems, same as money. Power can't buy you intelligence and money can't buy you happiness.
Re: (Score:2)
15 Megawatts?
Kind of makes what the human brain manages with just 25 Watts TDP all the more impressive, eh?
Power can solve an awful lot of problems, same as money. Power can't buy you intelligence and money can't buy you happiness.
But it can sure rent it for a while!
GPU does what? (Score:3)
Re:GPU does what? (Score:5, Informative)
Everything works better when specifically designed for the application. A CPU can render in software, but a GPU does it better. A GPU may mine crypto well, but an ASIC designed for crypto and nothing else will do a much better job.
AI turns out to have some quite specific needs that GPUs don't ideally satisfy: https://www.youtube.com/watch?... [youtube.com]
Re: (Score:2)
Based on predecessor products, I predict that only 20 of these will be made and they will all be allocated to Coreweave.
Re: (Score:2)
NVidia's profit ;)
Seriously, though, from an inference perspective (up to midsized quantized models), or for training or finetuning small models (or LoRAs on small-to-midsized ones), gaming GPUs are your best bet - far more compute for your dollar. The problem is when your model size gets any larger than you can handle in VRAM (esp. challenging with training, since it uses a lot more VRAM). Neural network applications are bandwidth-limited. So these "pro
Re: (Score:2)
This is on point. One thing missing from TFS: the B200 has more than doubled the H100's RAM, to 192GB. And you'll still need several hundred of the things to train Google's AI model from back in 2021 without external storage.
meh (Score:2)
Call me when it can play games...
https://www.youtube.com/watch?v=ODIqbTGNee4
GPU for me? (Score:1)
Re: (Score:2)
So get a used 1080.
I got a 4060 so I could have all the VRAM though, since I am only doing 1080p gaming and some LLM stuff this was a pretty good buy. I got it for $50 off, which means it was probably still at least $50 too expensive for what it does, but it works well.
Outgoing card was a 1070, 8GB just wasn't enough RAM.
Re: (Score:1)
Re: (Score:2)
Re: (Score:1)
Re: GPU for me? (Score:2)
If you want to play with AI you will want to stick with Nvidia for at least another generation. Otherwise get the AMD card, unless you absolutely must have raytracing.
Megawatts should be megawatt-hours (Score:2)
Yeah, a task takes an amount of energy, not a rate, unless it's continuous. Training an LLM is not a continuous task.
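To make the units point concrete: energy is power multiplied by time, so the megawatt figures in TFS only become megawatt-hours once you pick a duration. TFS gives none, so the 90-day run below is purely a hypothetical placeholder, as is the helper name `energy_mwh`.

```python
def energy_mwh(power_mw, hours):
    """Energy in megawatt-hours for a constant power draw (hypothetical helper)."""
    return power_mw * hours

HYPOTHETICAL_DAYS = 90            # assumption: TFS does not state a training duration
hours = HYPOTHETICAL_DAYS * 24    # 2,160 hours

hopper_mwh = energy_mwh(15, hours)     # 15 MW cluster from TFS
blackwell_mwh = energy_mwh(4, hours)   # 4 MW cluster from TFS

print(hopper_mwh, blackwell_mwh)
```

If both runs really took the same time, the energy ratio would equal the power ratio (3.75x). If the durations differ, the megawatt figures alone say nothing about total energy.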
Re: (Score:2)
Re: (Score:2)
yeah, we should buy fewer GPUs since presumably 2 Blackwell GPUs can do it while consuming just four kilowatts, it just takes longer
We, the home enthusiast, welcome trading off our time in exchange for paying less. Of course, the home enthusiast isn't buying any of these new Nvidia systems. That's left for big companies, especially the hyperscalers that purchase half of the Nvidia data center GPUs. For the hyperscalers, time is more important than money or power. This is especially true in this wild west era of emerging models. Time to market is everything, which is why the new, more expensive Nvidia systems will sell just fine.
FP4 performance? (Score:1)
I mean... FP4?? Is FP4 even signed? And will their next generation tout FP3 or FP2 figures?
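On the "is it even signed" question, here is a small sketch that enumerates a 4-bit float. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1, no infinities or NaNs) from the OCP Microscaling formats spec. TFS doesn't say which layout Blackwell uses, so treat that as an assumption, and `decode_e2m1` is a made-up helper name.

```python
def decode_e2m1(code):
    """Decode a 4-bit E2M1 code (0..15) into a Python float.

    Assumed layout: [sign][exp1 exp0][mantissa], exponent bias 1.
    """
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal: no implicit leading 1, so only 0 and 0.5 are possible.
        magnitude = mantissa * 0.5
    else:
        # Normal: implicit leading 1 plus a half-step from the mantissa bit.
        magnitude = (1 + mantissa * 0.5) * 2.0 ** (exponent - 1)
    return sign * magnitude

positive = [decode_e2m1(code) for code in range(8)]
negative = [decode_e2m1(code) for code in range(8, 16)]

print(positive)
print(negative)
```

Under that assumption the format is signed, with just eight magnitudes (0, 0.5, 1, 1.5, 2, 3, 4, 6) mirrored for the negatives. That is why formats like this are normally paired with a per-block scale factor instead of being used bare.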