Cloud Hardware

Alibaba Cloud Says It Cut Nvidia AI GPU Use By 82% With New Pooling System (tomshardware.com) 27

Alibaba Cloud claims its new Aegaeon GPU pooling system cuts Nvidia GPU use by 82%, letting 213 H20 accelerators handle workloads that previously required 1,192. The advancements have been detailed in a paper (PDF) at the 2025 ACM Symposium on Operating Systems Principles (SOSP) in Seoul. Tom's Hardware reports: Unlike training-time breakthroughs that chase model quality or speed, Aegaeon is an inference-time scheduler designed to maximize GPU utilization across many models with bursty or unpredictable demand. Instead of pinning one accelerator to one model, Aegaeon virtualizes GPU access at the token level, allowing it to schedule tiny slices of work across a shared pool. This means one H20 could serve several different models simultaneously, with system-wide "goodput" -- a measure of effective output -- rising by as much as nine times compared to older serverless systems.

The system was tested in production over several months, according to the paper, which lists authors from both Peking University and Alibaba's infrastructure division, including CTO Jingren Zhou. During that window, the number of GPUs needed to support dozens of different LLMs -- ranging in size up to 72 billion parameters -- fell from 1,192 to just 213. While the paper does not break down which models contributed most to the savings, reporting by the South China Morning Post says the tests were conducted using Nvidia's H20, one of the few accelerators still legally available to Chinese buyers under current U.S. export controls.
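
The headline percentage follows directly from the two GPU counts quoted above. A quick check, as a sketch only: the variable names are illustrative and nothing here comes from the paper beyond the figures 1,192 and 213.

```python
# GPU counts quoted in the summary (before and after Aegaeon).
gpus_before = 1192
gpus_after = 213

# Fraction of GPUs no longer needed for the same workload.
reduction = 1 - gpus_after / gpus_before

# How many of the old GPUs each remaining GPU effectively replaces.
consolidation_ratio = gpus_before / gpus_after

print(f"reduction: {reduction:.1%}")                   # reduction: 82.1%
print(f"consolidation: {consolidation_ratio:.2f}x")    # consolidation: 5.60x
```

So the "82%" claim and the roughly 5.6x load per remaining GPU are two views of the same pair of numbers.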

  • by Kokuyo ( 549451 ) on Tuesday October 21, 2025 @05:03AM (#65740152) Journal

    But it's sure good to see for the rest of us.

    • by AleRunner ( 4556245 ) on Tuesday October 21, 2025 @05:56AM (#65740180)

      If you increase the efficiency of use of a resource you also increase the number of use cases that can be addressed. That can easily end up with more of the resource being used. An Nvidia GPU is now 5.5 times as valuable as it would have been before. If Nvidia has any sense they will try to build an open source scheduler like this that anyone can drop into any cloud.

      • by Kokuyo ( 549451 )

        I think that's a big non-sequitur you've produced there.

        • by Smidge204 ( 605297 ) on Tuesday October 21, 2025 @07:58AM (#65740352) Journal

          It's called Jevons Paradox [wikipedia.org]

          In short: the more efficiently you can use a resource, the better the ROI you get for investing in the utilization of that resource, and the more people consume.

          This applies to computing power. Maybe it doesn't make sense in 1974 for a small business to invest in computer workstations for their staff. But by 1994 computers were so much more powerful, so much more capable, and actually cheaper relative to that capability (read: more efficient) that it now makes no sense to NOT invest in the technology for your business.

          If this succeeds in lowering the barrier to entry for leasing AI data center resources, expect demand to go up as more people try to do more things.
          =Smidge=

      • If you increase the efficiency of use of a resource you also increase the number of use cases that can be addressed.

        But those use cases are also less valuable or they would have already been addressed. It's a little like those silly claims by some economists that people who value the dollar they have to lose more than the dollar they stand to win are irrational. In fact, what you lose with the dollar you have is always more valuable than what you get with the dollar you win. That's why you choose it when you don't have another dollar.

        • But those use cases are also less valuable or they would have already been addressed.

          Correct, but just as small businesses give much more value to most countries than large ones do, especially in higher Western democracies, the slightly less valuable smaller use cases normally end up much more valuable in total than the original bigger ones.

      • by AmiMoJo ( 196126 )

        In this case the more likely outcome is that it accelerates the replacement of Nvidia hardware with inferior but cheaper domestic GPUs. Those will continue to improve at pace too.

  • by derplord ( 7203610 ) on Tuesday October 21, 2025 @05:27AM (#65740164)

    I cut GPU usage by 100% by not having anything to do with this useless bubble shit.

    • Re: That's nothing. (Score:5, Interesting)

      by EldoranDark ( 10182303 ) on Tuesday October 21, 2025 @06:13AM (#65740196)
      Which is kinda relevant. Does this better utilisation of hardware mean we still use nearly as much energy? And now in a denser configuration?
      • Fewer GPUs = lower power consumption, for sure, although of course, more of the transistors in those remaining GPUs are active now. Either way, the difference in quantity is so great, there *must* be a power saving here. The cost of keeping them switched on is appreciable, and I assume you can only put a certain number of GPUs on a given motherboard, so you presumably can have fewer servers running the GPUs as well.

        Either way, this sort of thing can only be a good thing for the world because of reduced consu

        • It sounds like the problem this addresses is that when you have multiple models available for use, lots of cards sit idle. Judging by consumer GPUs, that could be the difference between 15 W and 500 W. The cynical take is that it's not a way to build fewer GPUs. It's a way to run more power through the existing ones. I don't think we're running out of ideas on where else to cram more AI output.
      • by allo ( 1728082 )

        The amount of compute is the same as before, but fewer cards have to idle to receive a new load, so you can shut down some devices. So it reduces energy use. Clever scheduling also allows for better batching. Batch-processing GPU loads means something like needing 120% of the load for one job to handle two, with the only downside that job one has to wait until job two is received (or be processed without batching if there is too much wait time).
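
The batching figure in the comment above can be put into a toy cost model. This is a sketch under stated assumptions: the "120% for two jobs" number is the commenter's rough estimate, extending it linearly to larger batches is an assumption made here for illustration, and `batch_cost` and `per_job_cost` are hypothetical helper names, not part of any real scheduler.

```python
def batch_cost(n_jobs: int, marginal: float = 0.2) -> float:
    """Relative GPU load for a batch, where one unbatched job costs 1.0.

    Assumes each extra job in the batch adds `marginal` (20% by default,
    from the comment's "120% for two"); the linear extension is an assumption.
    """
    if n_jobs < 1:
        raise ValueError("a batch needs at least one job")
    return 1.0 + marginal * (n_jobs - 1)


def per_job_cost(n_jobs: int, marginal: float = 0.2) -> float:
    """Average load attributed to each job in the batch."""
    return batch_cost(n_jobs, marginal) / n_jobs


for n in (1, 2, 4):
    print(n, round(batch_cost(n), 2), round(per_job_cost(n), 2))
```

Under these assumptions, two batched jobs cost 1.2 units instead of 2.0, so each job effectively costs 0.6. The trade-off the comment names still applies: the first job waits for the second to arrive.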

  • Instead of pinning one accelerator to one model, Aegaeon virtualizes GPU access at the token level, allowing it to schedule tiny slices of work across a shared pool.

    Does that risk a Row-Hammer-like breach whereby one customer's query can snoop on another's?

  • by marcle ( 1575627 ) on Tuesday October 21, 2025 @09:45AM (#65740614)

    The US hasn't had to be competitive in that way, and China now has more fire in the belly. I for one welcome our new Chinese AI overlords.

    • by cusco ( 717999 ) <brian.bixbyNO@SPAMgmail.com> on Tuesday October 21, 2025 @10:01AM (#65740652)

      Behold the power of the all-purpose diplomatic tool, Sanctions! If you want to make your enemy more self-reliant and independent of you just sanction the shit out of their country and before long they won't need you at all. If you want to promote innovation in your enemy prevent them from buying the tools they need so that they develop their own that are superior to what you sell. The brain trust in Washington DC have managed to come up with a program to promote innovation and self-reliance and make the US no longer necessary to the world. It's ingenious!

      Oh, what's that you say? That wasn't really their goal? Really? Seems like the entirely predictable result of implementing tens of thousands of sanctions all over the world. Are our "leaders" really that stupid? We need better leaders then.

      • If you want to make your enemy more self-reliant and independent of you just sanction the shit out of their country and before long they won't need you at all.

        Rest assured it is not just America's enemies working to remove America from our lives as much as possible. Your former allies are all doing the same. Quietly, behind closed doors, but the same nonetheless. The US is no longer a trustworthy partner for pretty much anything.

        • by cusco ( 717999 )

          Indeed, the only actual "allies" the US seems to have left are the EU countries, and not even all of them, and then only because the "leadership" is too heavily invested in their overlord to try to break away. In much of Europe the public's opinion of the US is the lowest it's ever been since the advent of methodical surveys, even places traditionally seen as reliable lackeys like England, Germany and Poland. It seems to be the reason for the conservative backlash that's hitting elections throughout the s

        • US politics are far too volatile and US foreign policy is far too unpredictable. Parking your money in US assets seems incredibly risky for foreign investors right now.

          Someone shit the bed, or fighter jet.

  • And thus require better cooling? Granted, there are still structures whose activity will not increase with the 5x greater load per GPU but 5x is big enough that I would expect some marginal or even not so marginal cooling solutions to fail.
