AI Hardware Technology

Nvidia Upgrades Processor as Rivals Challenge Its AI Dominance (bloomberg.com) 39

Nvidia, the world's most valuable chipmaker, is updating its H100 artificial intelligence processor, adding more capabilities to a product that has fueled its dominance in the AI computing market. From a report: The new model, called the H200, will get the ability to use high-bandwidth memory, or HBM3e, allowing it to better cope with the large data sets needed for developing and implementing AI, Nvidia said Monday. Amazon's AWS, Alphabet's Google Cloud and Oracle's Cloud Infrastructure have all committed to using the new chip starting next year.

The current version of the Nvidia processor -- known as an AI accelerator -- is already in famously high demand. It's a prized commodity among technology heavyweights like Larry Ellison and Elon Musk, who boast about their ability to get their hands on the chip. But the product is facing more competition: AMD is bringing its rival MI300 chip to market in the fourth quarter, and Intel claims that its Gaudi 2 model is faster than the H100. With the new product, Nvidia is trying to keep up with the size of data sets used to create AI models and services, it said. Adding the enhanced memory capability will make the H200 much faster at bombarding software with data -- a process that trains AI to perform tasks such as recognizing images and speech.

This discussion has been archived. No new comments can be posted.

  • .... for us mere mortals? :P

    The last series "upgrade" on consumer cards was actually a downgrade for AI people: the 4000 series didn't increase VRAM beyond 24GB and, on top of that, ditched NVLink support.

    • Guess the only thing left now is AMD.

      • by Rei ( 128717 )

        For AMD, we need better software support. They can't always be lagging years behind.

        NVidia recognized the importance of developing software stacks long ago, and it paid off in spades. AMD always seems to have viewed it as "someone else's problem".

        If I were to buy an AMD card today for AI tasks, I'd know that my life is going to be headaches. It's not worth it.

        • AMD outsourced it to the public, which led to AMD drivers for Linux actually being quite usable compared to the open source nVidia versions (which suck donkey balls).

          • It's true, Nouveau isn't great.
            However, the closed source NV drivers are far above the open-source AMD drivers in terms of quality.
            It's unfortunate, but anyone who uses both will agree.
            • Not in terms of auditability.

              Even AMD cards have firmware, but common Nvidia card firmware blobs are 62MB+. They're encrypted and signed, and cards will only run firmware signed by Nvidia. Reverse engineering is not practical. Nouveau was effectively dead when this was done. On the plus side, it's hard for China to reverse too.

              A great vector for state actors who can instruct Nvidia to make/sign a compromised firmware and gag them from saying anything, though. When your video card runs its own OS you can't control that's

              • When your video card runs its own OS you can't control that's pretty crazy.

                You just described every single GPU in production- from AMD, Intel, and even Apple Silicon's iGPU block.
                Crazy it may be, but what it is, is normal.
                GPUs are far too complex to be set up with simple registers. They have CPUs on them that run operating systems that handle the state of the GPU resources, and communications with the other bus masters.

                • Would you contend that AMD cards are less or equally as auditable as nvidia?

                  If so, this is a rabbit hole.

                  There are shades between completely open and completely black box.

                  • I would say equally.
                    They both require firmware blobs to be uploaded to them that are signed and non-modifiable.
                    I don't know if they're encrypted, but that's because I have no fucking idea what the instruction set for the CPUs is. Which makes sense, since they don't document it.
                    • The updated amdgpu firmware for my current card is approximately 700KB, the largest single chunk being 350KB. Notably, the card still works perfectly fine without it; these are just updates. The firmware for a single Nvidia GSP card is 62MB+.

                      What the firmwares do is worlds apart: a lot of the amdgpu driver's functionality is in the kernel, where it arguably should be, with firmware being mostly for initialization.

                      Nvidia keeps a lot more functionality in its firmware, with the card itself acting a lot more like a "system on chip".

                    • And they are signed, but a correction on my behalf, they are not encrypted (earlier reporting said they were, just tore up some files).

                      The guts of the nvidia firmware is a 34mb risc-v ELF binary.

                    • The updated amdgpu firmware for my current card is approximately 700KB, the largest single chunk being 350KB. Notably, the card still works perfectly fine without it; these are just updates. The firmware for a single Nvidia GSP card is 62MB+.

                      So?
                      Are you trying to imply that the multiple discrete ME and MEC blocks on an AMD are somehow more "something" than the GSP on an NVidia, or the little Arm processor on an Apple GPU, due to the file size sent to them?

                      What the firmwares do is worlds apart: a lot of the amdgpu driver's functionality is in the kernel, where it arguably should be, with firmware being mostly for initialization.

                      lol- that is not an assertion you can make based on the file sizes.
                      And further, firmwares aren't "mostly for initialization". The CE and MECs on an AMD are CPUs that run code, and must run that code constantly to make the GPU, well, GPU.
                      Does it seem to require less code than the GSP on an NV

                    • 34MB now, from 62MB?
                      Earlier reports?

                      It sounds to me like you read some shit and got very confused.
                      The 62MB number you got is from the headline regarding NV just pushing 62MB of binary blobs to linux-firmware.git.
                      That is for all of their cards, and is very similar to amdgpu's load in that .git as well (57MB)
                      These are the firmwares for the various command controller CPU blocks within each GPU (which really are little more than microcode, they're not full operating systems, but they are code that runs on
                    • 34MB now, from 62MB

                      That's one component, the GSP firmware specifically, there is more than one component

                      The 34MB you're thinking of is for the GSP, which is optional (and in fact, not supported by nouveau anyway)

                      The reason for it being added was to add support in nouveau. It has to be a fixed version of it because the interface isn't fixed and changes constantly, version 515 is what they've decided to support.

                      Basically, it loads the little RISC-V in the NV with a full operating system that runs the driver for you, and just presents a very simplified command queue to the main CPU.

                      Yes, you're arguing what I'm saying: a full OS and its own full system.

                      That is for all of their cards, and is very similar to amdgpu's load in that .git as well (57MB)

                      have you looked at the amdgpu firmware directory? separate files for each card, around 700kb or less for each card. No giant full os system.

                      Does it seem to require less code than the GSP on an NV? Sure does. Is there something that can be implied from that? Na, not even a little.

                      So just prior yo

                    • Forgot this part

                      If you dislike this feature, you don't use it.

                      This is the 'cannot change gpu clock and runs like ass' version of nouveau.

                      If you want to be able to clock the gpu to normal performance speeds, you need to use the nvidia giant firmware. While it will technically run without, this is not useful to people and why people won't run nouveau if they can avoid it. This is why GSP support is being sought out so nouveau can run at useful speeds.

                      Will it fix out of the box nvidia performance when it's integrated? yep. At the cost of running a full bl

                    • That's one component, the GSP firmware specifically, there is more than one component

                      Uh, ya... One component of exactly 2 cards, and an optional one at that. No individual card uses 62MB, and only 2 cards have the option of using the 20MB, and 32MB component.
                      If you go ahead and clone linux-firmware.git, you'll see that the average binary blob load for any card, not including the 2 that support GSP, or those 2 with GSP disabled, is actually larger for any AMD card.
                      We can use my RTX2060 laptop GPU as the nvidia example:

                      *@primary:~/linux-firmware/nvidia/tu106$ du --max-depth=1 -h
                      112K ./gr
                      116K .

                      And my old HP with a Vega:

                      *@primary:~/linux-firmware/amdgpu$ du --max-depth=1 -h
                      948K ./vegam

                      What is this?! 817% more bytes of firmware?!?!
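
As a quick check of the arithmetic in the comparison above, here is a minimal Python sketch. It assumes the two `du` totals shown (116K for `nvidia/tu106`, 948K for `amdgpu/vegam`) are the figures being compared; the function name `percent_more` is hypothetical and only for illustration.

```python
def percent_more(larger_kib: float, smaller_kib: float) -> float:
    """Return how many percent larger `larger_kib` is than `smaller_kib`."""
    return (larger_kib / smaller_kib - 1.0) * 100.0


# Totals copied from the two `du --max-depth=1 -h` listings (assumed to be KiB).
nvidia_tu106_kib = 116
amdgpu_vegam_kib = 948

# Plain ratio of the two directory sizes.
ratio = amdgpu_vegam_kib / nvidia_tu106_kib
# Relative increase of the larger over the smaller, in percent.
more = percent_more(amdgpu_vegam_kib, nvidia_tu106_kib)

print(f"ratio: {ratio:.2f}x")
print(f"percent more: {more:.0f}%")
```

On these numbers the ratio is about 8.17x, so the larger directory is roughly 817% of the smaller one, which is about 717% more.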

                    • If you dislike this feature, you don't use it.

                      and be unable to reclock the gpu, rendering the video card relatively useless, as it is now with nouveau when using without GSP support.

                      When not using the "optional" thing it becomes a paperweight, that isn't so optional is it?

                      Further, as has been widely published, AMD appears to be looking for RISC-V engineers with a combined expertise in GPUs, indicating they plan to follow NV's lead in the offload-to-the-GPU-CPU.

                      Hopefully they at least document and provide buildable-from-scratch toolchains if they do. Their GPU documentation might have been a complete cluster when dumped, being so detailed with register listings, but better that than nothing at all like Nvidia.

                      Your argument is complete and total fucking bullshit, and you know it.

                      Saying GSP is "optional" when

                    • and be unable to reclock the gpu, rendering the video card relatively useless, as it is now with nouveau when using without GSP support.

                      lol, moving the goalposts is one thing, but you just launched them into fucking orbit. Classy.
                      I thought our problem was auditing?
                      Did we drop that since we have confirmed that AMD has on-average ~800% more binary blobbage per card?

                      Hopefully they at least document and provide buildable-from-scratch toolchains if they do. Their GPU documentation might have been a complete cluster when dumped, being so detailed with register listings, but better that than nothing at all like Nvidia.

                      Yes, you'll be able to find them in the same place as their PSP toolchains </sarcasm>
                      AMD's GPU documentation is still a flaming pile of shit. amdgpu only exists because 99% of the code was supplied by AMD.

                      Saying GSP is "optional" when not using it guts performance because the card can't be reclocked is pretty bullshit too imho.

                      You've brushed upon the problem above.
                      Nouveau is a desperate (and p

                    • Your solution to a trust problem is to taint the kernel with the proprietary driver?

                      That's a complete non-starter from the get go.

                      If a completely open source kernel driver were fully functioning without GSP, that would be far less of an issue.

                      If that's something you have and aren't sharing to the world, please go ahead.

                    • Your solution to a trust problem is to taint the kernel with the proprietary driver?

                      Of course it isn't.
                      But you're moving the goalposts, again.

                      That's a complete non-starter from the get go.

                      Well, judging from your original arguments, all non-auditable binary blobs are problematic.
                      So using your argument, we can assert that a non-GSP NV card using nouveau is actually the least problematic.
                      Do you agree?

                      If a completely open source kernel driver were fully functioning without GSP, that would be far less of an issue.

                      Well, it does, except for on a small subset of cards.
                      There are lots of cards that amdgpu doesn't support at all (I happen to own one). They're old, but they exist.
                      At least for 4xxx cards, nouveau support is mostly functional without GSP

                    • There are lots of cards that amdgpu doesn't support at all (I happen to own one). They're old, but they exist.

                      If it's that old you probably want the radeon driver rather than amdgpu, unless you're talking 90's old.

                      Of course I don't. Again, you're trying to draw an AMD shaped line in the sand.

                      Tell that to the Intel, Arm Mali, and every other in-mainline DRM driver. Nouveau would be fine if it were performant, which is no fault of their own really. The last cards it decently supports are from seven years ago. Nvidia could fix this easily if they wanted to, but they don't. When the vendor gives no shits, why support them?

                      Even some mobile gpus are becoming open/mainline now, which is nice for long

                    • If it's that old you probably want the radeon driver rather than amdgpu, unless you're talking 90's old.

                      Correct ;)

                      Tell that to the Intel, Arm Mali, and every other in-mainline DRM driver. Nouveau would be fine if it were performant, which is no fault of their own really. The last cards it decently supports are from seven years ago. Nvidia could fix this easily if they wanted to, but they don't. When the vendor gives no shits, why support them?

                      Of course they can. Again with the goalposts. You have morphed your original argument against binary blobs to attacking the closed source nvidia kernel driver.
                      Mali, like Nouveau still requires binary blobs uploaded to the cards.
                      As does amdgpu.
                      Of all of them, amdgpu requires the largest binary blobs.
                      By your original argument, amdgpu is the worst.

                      And hey, at least the closed-source kernel driver and GSP are auditable since their CPU architectures are known, unlike the binary blobs that get p

                    • You have morphed your original argument against binary blobs to attacking the closed source nvidia kernel driver.

                      Since you mentioned using it in order to avoid having to use GSP.

                      Of all of them, amdgpu requires the largest binary blobs.

                      more than the 34mb gsp blob? not at all

                      If nouveau can run at a decent speed without GSP I'd be fairly happy. That is not the case.

                    • I think we're done here though, there is no path forward from saying a 700kb firmware is larger than a 34mb full mini os load.

                      good day.

                    • Since you mentioned using it in order to avoid having to use GSP.

                      Well yes, because we should compare the open source to open source drivers, of course.

                      more than the 34mb gsp blob? not at all

                      Optional, and only on 1 single family of cards.

                      If nouveau can run at a decent speed without GSP I'd be fairly happy. That is not the case.

                      It can, except on one single family of cards. That will change, of course.

                    • Not a very clever dodge.
                      The 34MB OS load is a tangential discussion item vs your initial complaint, which is non-auditable binary blobs.
                      GSP exists on ~1% of all extant NV GPUs.
                      So, say I concede that GSP sucks. Back we go to your original beef.
                      Non-auditable binary blobs.
                      In this instance, the 700kb of firmware for amdgpu cards grossly outweighs the ~100-150kb firmware used on 99% NV cards.
          • by ceoyoyo ( 59147 )

            The GP isn't talking about drivers.

        • Given your complaint was about the loss of NVLink, I'd think you'd have pointed out that as the major shortcoming of AMD devices in an equivalent situation, as there is no analogue.
          • by Rei ( 128717 )

            That'd be acceptable if they'd offer more VRAM in consumer cards than NVIDIA does, but they don't do that either :

      • Given that AMD has always lacked the high-speed interconnect for its cards, I don't really see how this solves the gripe.
      • Guess the only thing left now is AMD.

        At present AMD is quite far behind in capabilities. Though this could be seen as NVIDIA resting on their laurels and maybe AMD will actually catch up.

    • by Tablizer ( 95088 )

      The industry's message is "go cloud or go away"

    • That was the plan. Can’t let the common rabble use cheap ass video cards to do the work of gold plated AI devices.
    • by Njovich ( 553857 )

      You can get 8x H100 for only 400k. You are a low 6 digit /. user. Your stock options alone must be worth at least 8 digits.
