

Startup Claims Its Upcoming (RISC-V ISA) Zeus GPU is 10X Faster Than Nvidia's RTX 5090 (tomshardware.com)
"The number of discrete GPU developers from the U.S. and Western Europe shrank to three companies in 2025," notes Tom's Hardware, "from around 10 in 2000." (Nvidia, AMD, and Intel...)
No company in recent years — at least outside of China — has been bold enough to compete against these three contenders, so the very emergence of Bolt Graphics seems like a breakthrough. However, the major focuses of Bolt's Zeus are high-quality rendering for the movie and scientific industries as well as high-performance supercomputer simulations. If Zeus delivers on its promises, it could establish itself as a serious alternative for scientific computing, path tracing, and offline rendering. But without strong software support, it risks struggling against dominant market leaders.
This week the Sunnyvale, California-based startup introduced its Zeus GPU platform designed for gaming, rendering, and supercomputer simulations, according to the article. "The company says that its Zeus GPU not only supports features like upgradeable memory and built-in Ethernet interfaces, but it can also beat Nvidia's GeForce RTX 5090 by around 10 times in path tracing workloads, according to a slide published by technology news site ServeTheHome." There is one catch: Zeus can only beat the RTX 5090 GPU in path tracing and FP64 compute workloads. It's not clear how well it will handle traditional rendering techniques, as that was less of a focus. In speaking with Bolt Graphics, Tom's Hardware learned that the card does support rasterization, but there was less emphasis on that aspect of the GPU, and it may struggle to compete with the best graphics cards when it comes to gaming. And when it comes to data center options like Nvidia's Blackwell B200, it's an entirely different matter.
Unlike GPUs from AMD, Intel, and Nvidia that rely on proprietary instruction set architectures, Bolt's Zeus relies on the open-source RISC-V ISA, according to the published slides. The Zeus core pairs an open-source out-of-order general-purpose RVA23 scalar core with FP64 ALUs and the RVV 1.0 (RISC-V Vector Extension Version 1.0) unit that can handle 8-bit, 16-bit, 32-bit, and 64-bit data types, as well as Bolt's additional proprietary extensions designed for accelerating scientific workloads... Like many processors these days, Zeus relies on a multi-chiplet design... Unlike high-end GPUs that prioritize bandwidth, Bolt is evidently focusing on greater memory size to handle larger datasets for rendering and simulations. Also, the built-in 400GbE and 800GbE ports, which enable faster data transfer across networked GPUs, indicate the data center focus of Zeus.
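To make the RVV 1.0 part concrete: the defining property of the vector extension is that code is vector-length-agnostic, i.e. it asks the hardware how many elements fit per pass instead of hard-coding a width. A rough conceptual sketch in Python (the chunk size below stands in for whatever vector length a Zeus core would report via vsetvl; this illustrates the strip-mining idea only and is not Bolt's code):

```python
# Conceptual sketch of RVV-style vector-length-agnostic "strip-mining".
# vlmax is a stand-in for the hardware vector length reported by vsetvl;
# real RVV code never hard-codes it, which is what lets the same binary
# run on cores with different vector register widths.
import numpy as np

def fp64_axpy(a, x, y, vlmax=8):
    """y += a * x over FP64 arrays, processed one vector chunk at a time."""
    n = len(x)
    i = 0
    while i < n:
        vl = min(vlmax, n - i)          # vsetvl: elements handled this pass
        y[i:i + vl] += a * x[i:i + vl]  # one vector fused multiply-add
        i += vl
    return y

x = np.arange(16, dtype=np.float64)
y = np.ones(16, dtype=np.float64)
print(fp64_axpy(2.0, x, y))
```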
High-quality rendering, real-time path tracing, and compute are key focus areas for Zeus. As a result, even the entry-level Zeus 1c26-32 offers significantly higher FP64 compute performance than Nvidia's GeForce RTX 5090 — up to 5 TFLOPS vs. 1.6 TFLOPS — and considerably higher path tracing performance: 77 Gigarays vs. 32 Gigarays. Zeus also features a larger on-chip cache than Nvidia's flagship — up to 128MB vs. 96MB — and lower power consumption of 120W vs. 575W, making it more efficient for simulations, path tracing, and offline rendering. However, the RTX 5090 dominates in AI workloads with its 105 FP16 TFLOPS and 1,637 INT8 TFLOPS compared to the 10 FP16 TFLOPS and 614 INT8 TFLOPS offered by a single-chiplet Zeus...
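Taking those vendor-supplied, simulation-only figures at face value, the ratios and the implied efficiency gap work out as follows (a back-of-the-envelope sketch using only the numbers quoted above; the per-watt figures are derived here, not published by Bolt):

```python
# Ratios derived only from the numbers quoted in the summary above
# (entry-level, single-chiplet Zeus 1c26-32 vs. GeForce RTX 5090).
zeus = {"fp64_tflops": 5.0, "gigarays": 77, "fp16_tflops": 10, "int8_tops": 614, "watts": 120}
rtx  = {"fp64_tflops": 1.6, "gigarays": 32, "fp16_tflops": 105, "int8_tops": 1637, "watts": 575}

for key in ("fp64_tflops", "gigarays", "fp16_tflops", "int8_tops"):
    print(f"{key:12s} Zeus/5090 = {zeus[key] / rtx[key]:.2f}x")

# Efficiency on the two metrics Bolt leads on (my derivation, not Bolt's):
fp64_per_watt = (zeus["fp64_tflops"] / zeus["watts"]) / (rtx["fp64_tflops"] / rtx["watts"])
rays_per_watt = (zeus["gigarays"] / zeus["watts"]) / (rtx["gigarays"] / rtx["watts"])
print(f"FP64 per watt: {fp64_per_watt:.1f}x, gigarays per watt: {rays_per_watt:.1f}x")
```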
The article emphasizes that Zeus "is only running in simulation right now... Bolt Graphics says that the first developer kits will be available in late 2025, with full production set for late 2026."
Thanks to long-time Slashdot reader arvn for sharing the news.
About time we can upgrade GPU RAM (Score:5, Interesting)
We have been missing RAM slots on GPU boards for a long time.
With Nvidia setting memory limits that third-party OEMs aren't allowed to exceed, GPU RAM has effectively been fixed at the factory, and there is not enough competition at the high end. AMD is at least looking viable now for enthusiast gamers.
With x86 licenses in short supply, the APU market does not have enough competition either.
Re:About time we can upgrade GPU RAM (Score:4, Interesting)
There will never be RAM slots or sockets on a GPU. It's impossible from a signal integrity standpoint.
Re: (Score:2)
At least AMD has an APU that can share most of its possible 128GB with the CPU part.
Unfortunately, as far as I am aware, only Framework has a product announced in that regard.
Re: (Score:2)
Its memory bandwidth and performance are still a far cry from Apple APUs. It's a step in the right direction, but I think they could have made it better than they did.
The only thing it does better than an Apple Silicon part of equivalent loadout (M4 Max, 128GB) is tasks that can only be done on x86.
I really, really, really want an x86 part that competes.
Re: (Score:1)
Apple's memory solutions have an unusually large number (mostly 2, 3, 4 or 8 for plain, Pro, Max and Ultra chips respectively) of high-speed memory channels to provide that much bandwidth. It's only practical because they use something like a package-on-package design. It would take a huge number of circuit traces to do that with replaceable memory, meaning a lot of power loss and board space and probably lower clock rates because of the additional distance.
I would also love to see a good competitor to that solution, but I don't expect other companies to have that much integration for desktop or workstation designs. Others do it for cell phones and tablets, but the optimization goals there are very different.
Re: (Score:2)
Apple's memory solutions have an unusually large number (mostly 2, 3, 4 or 8 for plain, Pro, Max and Ultra chips respectively) of high-speed memory channels to provide that much bandwidth.
Correct.
It's only practical because they use something like a package-on-package design.
Incorrect.
It would take a huge number of circuit traces to do that with replaceable memory
Strix Halo will not have replaceable memory.
meaning a lot of power loss and board space and probably lower clock rates because of the additional distance.
Power loss isn't really a general problem, it's a problem that's specific to LPDDR. You can have lots of bandwidth with replaceable memory too, but these days we're using LPDDR, which makes replaceable memory difficult at higher speeds due to the low voltage. But still not relevant, as no Strix Halo part will have replaceable DRAM.
I would also love to see a good competitor to that solution, but I don't expect other companies to have that much integration for desktop or workstation designs. Others do it for cell phones and tablets, but the optimization goals there are very different.
Nah. Strix Halo is a direct attempt at competing with Apple Silicon, and it's a step toward it, but they stopped early.
Re: (Score:1)
Power loss isn't really a general problem
It absolutely is a general problem, because physics are real. It takes energy to toggle a bit. It takes more energy to make that transition visible at a longer distance. It takes more energy to do that if there are -- or just can be -- more devices per line. It takes more power to toggle a bit more times in a second. All of those things relate to resistance and capacitance, and the length of a trace is a major driver for both resistance and capacitance.
Strix Halo is nice, but it has less memory bandwidth than Apple's higher end chips and doesn't compete very well on power efficiency. The top end "AMD Ryzen AI Max+ 395" is comparable to the M4 Pro in most specs but uses a lot more power to get slightly less memory bandwidth, in significant part because of the memory layout. There's no way for it to meet even M1 Max levels of bandwidth (400 GB/sec) with off-package RAM because, with current memory data rates, that would require twice as many pins on the SoC and twice as many traces on the motherboard. The whole package would be so much bigger that it wouldn't be cost-competitive.
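For readers who want the "toggling a bit costs energy" argument made concrete: dynamic switching power scales as activity x capacitance x voltage squared x frequency, and trace length is a major driver of the capacitance. A rough sketch with purely illustrative values (real DDR PHYs, termination and signaling losses are more complicated than this):

```python
# Dynamic switching power of a single signal line: P = alpha * C * V^2 * f
# All values below are illustrative placeholders, not measurements.
def line_power(capacitance_pf, voltage, toggle_rate_ghz, activity=0.5):
    c = capacitance_pf * 1e-12           # farads
    f = toggle_rate_ghz * 1e9            # toggles per second
    return activity * c * voltage ** 2 * f  # watts

short_trace = line_power(capacitance_pf=1.0, voltage=0.5, toggle_rate_ghz=4.0)  # on-package-ish
long_trace  = line_power(capacitance_pf=5.0, voltage=1.1, toggle_rate_ghz=4.0)  # socketed-DIMM-ish
print(f"per line: {short_trace*1e3:.2f} mW vs {long_trace*1e3:.2f} mW")
print(f"x a 256-bit bus: {short_trace*256:.2f} W vs {long_trace*256:.2f} W")
```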
Re: (Score:2)
It absolutely is a general problem, because physics are real. It takes energy to toggle a bit. It takes more energy to make that transition visible at a longer distance. It takes more energy to do that if there are -- or just can be -- more devices per line. It takes more power to toggle a bit more times in a second. All of those things relate to resistance and capacitance, and the length of a trace is a major driver for both resistance and capacitance.
The implication was not that it does not take power, lol
The implication is that you are wrong that it's somehow a limiting factor.
Strix Halo is nice, but it has less memory bandwidth than Apple's higher end chips and doesn't compete very well on power efficiency.
No shit. That was the point of my post that you replied to.
The top end "AMD Ryzen AI Max+ 395" is comparable to the M4 Pro in most specs but uses a lot more power to get slightly less memory bandwidth
Yup.
in significant part because of the memory layout.
Negative. It's limited by its 256-bit worth of DRAM bus.
They could have made it more if they had wanted.
There's no way for it to meet even M1 Max levels of bandwidth (400 GB/sec) with off-package RAM because, with current memory data rates, that would require twice as many pins on the SoC and twice as many traces on the motherboard.
It's a mobile part- it's not like it's socketed.
Yes- you add more pins.
AMD makes a 512-bit x86 APU right now. They just don't sell it to the consumer market.
The whole package would be so much bigger that it wouldn't be cost-competitive.
The package isn't the expensive part. The silicon is.
Re: (Score:2)
At least AMD has an APU that can share most of its possible 128GB with the CPU part.
That's not a positive. The performance of such systems is pathetic compared to what we expect from high-end GPUs.
Re: (Score:3)
There will never be RAM slots or sockets on a GPU. It's impossible from a signal integrity standpoint.
The picture of the graphics card on their website clearly shows two SODIMM slots.
According to their presentation the GPU card has soldiered LPDDR5X memory running at 273GB/s and card slots for DDR5 at 90GB/s. This isn't total bandwidth available to the GPU as a whole but rather the dedicated memory/bandwidth available to each chiplet within the GPU /w high speed interconnects between the various chiplets.
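Those two figures are consistent with simple bus-width math. A quick sketch, assuming a 256-bit LPDDR5X-8533 interface and two 64-bit DDR5-5600 SODIMM channels per chiplet (the transfer rates are an inference from the bandwidth numbers, not something Bolt has confirmed):

```python
# Peak bandwidth = (bus width in bytes) * (transfers per second)
def peak_gbps(bus_bits, mt_per_s):
    return bus_bits / 8 * mt_per_s / 1000  # GB/s

lpddr5x = peak_gbps(bus_bits=256, mt_per_s=8533)     # soldered memory
ddr5    = peak_gbps(bus_bits=2 * 64, mt_per_s=5600)  # two SODIMM channels
print(f"LPDDR5X: {lpddr5x:.0f} GB/s, DDR5 slots: {ddr5:.0f} GB/s")
# -> roughly 273 GB/s and 90 GB/s, matching the presentation's numbers
```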
Re: About time we can upgrade GPU RAM (Score:2)
I'm not sure soldiers are the answer here.
Re: (Score:2)
That sounds suboptimal, given that there's a reason GPUs tend to use GDDR (or HBM) rather than normal DDR.
HBM is currently being driven by AI rather than rendering. I don't think pushing the dial on uniform memory is optimal either, from a cost or energy perspective. It certainly makes some things easier, and there are certainly some workloads that benefit, but NUMA schemes are more energy efficient and scalable. The metric that matters is bandwidth available to each core.
For comparison, a 5090 has around 1.8TB/s of memory bandwidth. Maybe better for some tasks, but that makes it a sidegrade to the 5090 at best.
What is the 5090? ... 30% faster than a 4090 @ 1TB/s when actually rendering graphics? There is no shortage of applications for which bandwidth is not
Re: (Score:2)
The number of connector points required to achieve this sort of thing (the H100 memory interface is 5120 bits wide - that's 10K pins) are physically impossible for a pluggable connection that's also going to run at 6.4 gigabaud, NOT consume a thousand watts driving the bus lines and NOT be the size of two premium server CPU sockets.
Re: (Score:2)
Unfortunately this only reinforces GP's claim that achieving the main memory bandwidth required of a "serious" modern GPU (the H100 SXM5 module has 3.5TBps for reference) is not possible outside of soldered HBM.
The high bandwidth shit is primarily for AI. I think the niche here is more likely to be less bandwidth-intensive applications than AI. Having said that, if Bolt cards are cheap and available I wouldn't be surprised to see people pick them up for batch-mode inference across large sparse models. This would be a kick-ass, cost-effective option for a lot of people.
The number of connector points required to achieve this sort of thing (the H100 memory interface is 5120 bits wide - that's 10K pins) are physically impossible for a pluggable connection that's also going to run at 6.4 gigabaud, NOT consume a thousand watts driving the bus lines and NOT be the size of two premium server CPU sockets.
This is a false choice. If you look at DDR4, DDR5, DDR6, and DDR7, pin counts remain more or less constant while bandwidth doubles each time. Where you
Re: (Score:2)
It's an interesting setup. Slow third tier expandable memory. Is it worth it over just using system memory? Maybe for some things, and this thing seems to be pretty focused on specific problems.
Re: (Score:2)
The picture of the graphics card on their website clearly shows two SODIMM slots.
And yet this is *NOT* a typical gaming GPU. The GP's post stands. You're comparing run-of-the-mill LPDDR5X memory at 273GB/s to the current standard GDDR7 at 1,792GB/s. Expandable RAM is great for some compute scenarios, which is what this card is targeted at. But this is objectively not a good thing for a desktop GPU - this is a huge tradeoff in speed for available VRAM. For virtually all desktop / gaming scenarios the amount of VRAM isn't nearly as relevant as the speed of it. ... within reason (the low
Re: (Score:2)
It seems this thing isn't really designed to be a GPU, it's a compute accelerator for tasks that are typically done on a GPU like AI and raytracing.
Really more of a very specialist CPU, with tiered RAM, connectivity like ethernet, and so on.
But is it... (Score:1)
...10x faster at the same price? Anyone can design a better item, but the claim means nothing if it is that much more expensive as well.
Re: (Score:2)
Looking at the specs, a fully kitted-out 1c26-032 with 160GB of RAM would probably cost more than a 5090. Certainly not $20000, but it would be pricey.
Let's see (Score:5, Informative)
This is only in "simulation," so the proof and the TDP numbers will come with the tape-out prototypes.
Re:Let's see (Score:5, Insightful)
Even 10% slower would put them in the running if there is a lower initial capital investment for someone putting together a data center. I don't see a need to exaggerate if your fundamentals are in order.
Re: (Score:2)
If an established company announces a graphics card that is 10x faster, that's news.
Not really. Read the story. It's 10x faster in a very specific scenario. That happens frequently enough. This is not going to replace the GPU in your gaming rig.
Re: (Score:1)
You're off your meds again, I see. I know it's hard, but take them despite how you think they make you "feel." It will help rein in your TDS and get you that much closer to reality. You might want to take an extra dose of the antipsychotics to start, because you aren't a danger to anyone else while you're busy staring at a wall.
Not for gaming (Score:3)
It sounds like they just put a lot of cores on it, enough to handle ten times as many rays in a ray tracer, which can be parallelized down to each individual pixel on screen.
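That per-pixel independence is the whole reason path tracing scales with core count: no pixel's rays touch any other pixel's state. A stripped-down sketch of the parallelization pattern (trace_pixel is a placeholder, not a real renderer):

```python
# Each pixel is an independent work item, so the image parallelizes trivially
# across however many cores (or chiplets) are available.
from multiprocessing import Pool

WIDTH, HEIGHT = 320, 240

def trace_pixel(xy):
    x, y = xy
    # Placeholder: a real path tracer would shoot many rays per pixel
    # and average the radiance they return.
    return (x ^ y) & 0xFF

if __name__ == "__main__":
    pixels = [(x, y) for y in range(HEIGHT) for x in range(WIDTH)]
    with Pool() as pool:
        image = pool.map(trace_pixel, pixels)
    print(len(image), "pixels rendered")
```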
Unverified claims (Score:5, Informative)
I'll believe it when I see it. I hate it when a piece of corporate hype finds its way into the Slashdot feed.
Re: (Score:2)
this "simulation only"-vapor-ware is not worth reporting on at this time
Some of us might be interested in this, even if it's vapor ware, at the moment. The best ideas all start out as vapor ware.
Re: (Score:2)
True, but they're claiming to be ten times faster for some workloads, and ten times slower for others. While the claims are pretty wild, at least they're not claiming they're better at everything.
Re: (Score:2)
And specifically FP64, which is not something the 5090 is especially good at: NVidia claim only a few percent of the FP32 performance, actually somewhat less than a high end threadripper.
They're claiming 3x the FP64 performance of a threadripper, presumably without the general purpose CPU performance of that.
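Working that out with the figures in the summary plus Nvidia's published FP32 number for the RTX 5090 (roughly 105 TFLOPS, an assumption on my part that happens to match the FP16 figure quoted above):

```python
# FP64 as a fraction of FP32 on the RTX 5090, and the implied Threadripper gap.
rtx5090_fp32 = 105.0   # TFLOPS, assumed from Nvidia's published spec
rtx5090_fp64 = 1.6     # TFLOPS, from the article summary
zeus_fp64    = 5.0     # TFLOPS, from the article summary

print(f"5090 FP64/FP32 ratio: {rtx5090_fp64 / rtx5090_fp32:.1%}")          # ~1.5%
print(f"Implied Threadripper FP64 (Zeus / 3): ~{zeus_fp64 / 3:.1f} TFLOPS")
# ~1.7 TFLOPS, slightly above the 5090's 1.6, consistent with the claim.
```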
Re: (Score:2)
And specifically FP64, which is not something the 5090 is especially good at: NVidia claim only a few percent of the FP32 performance, actually somewhat less than a high end threadripper.
Nvidia goes out of its way to make sure none of the xx90 GPUs come anywhere close to the performance and usability of the much more expensive data center GPUs for compute. This market segmentation is very much intentional.
Re: (Score:2)
On the other hand, if they use too large a die, their yield may get funny. But if I can think of that, so can they.
Rookie Numbers! (Score:5, Funny)
My upcoming Custom GPU with integrated AI will be 1,000 times faster than NVidia's puny RTX5090. Better still, it only draws 5 watts!
Right now, I'm a little behind schedule on delivery to market due to some manufacturing delays caused by capital issues. If you're an investor, this is your chance to get in on the ground floor of what is certain to be the market and industry disruptor of the century. Don't miss out on this once-in-a-lifetime opportunity. Send me your life savings via BitCoin or Ethereum TODAY!
Can't go tits up!
Re: (Score:3)
Yesterday's news. I recently published a slide that says MY upcoming GPU will be 10,000 times faster than Nvidia - and draw only 3.5 watts!
Re: Rookie Numbers! (Score:2)
I was going to do that too.
Proprietary RISC-V (Score:2)
Sure would be nice to see a fully open source hardware implementation of a GPU with RISC-V, so that an SBC could be built without any licensing limits on its use.
Re: (Score:2)
The thing with RISC-V is that its vector extension had been designed from the start to be useful as a foundation to design a GPU around.
Then there are many ways to do the actual microarchitecture, of course, and any RVV implementation is not necessarily particularly GPU-like.
I don't want to see just open source hardware, I would like to see open source GPU frameworks based around RVV -- and GPU designers embracing such an open source system. More portability between vendors could lead to more competition.
Plenty of GPU opportunities (Score:2)
It's about FP64 (Score:5, Informative)
This is a contrived comparison. All GeForce cards have terrible FP64 performance. It's rarely used in gaming and they don't bother trying to make it fast. Data center cards like the H100 and B200 have much better FP64 performance. They're designed for compute applications where it's more important.
So they compared their own GPU designed for compute against an NVIDIA GPU designed for gaming, and found theirs is better at compute. I'm shocked, shocked to hear that! If they're going to test against a GPU designed for gaming, they need to test on gaming benchmarks. If they're going to test on compute benchmarks, they need to compare to a GPU designed for compute.
Re: (Score:2)
I've also found that AMD GPUs have much better FP64 performance than comparable Nvidias (using my own OpenCL code, so not a general benchmark). It's just another instance of Nvidia's optimizations [slashdot.org] in their gaming cards.
As for the better FP64 of data center cards, it's interesting that AI uses very low precision such as 4 or 8 bits. So I wouldn't be surprised to see AI-optimized GPUs that lose the higher precision parts.
Finally, if you can make a chip with a ton of RISC-V cores, why not make a massively multicore CPU?
Re: (Score:2)
That was true up through Radeon VII. For any consumer AMD card beyond that, it's no longer true. And Radeon VII was released in 2019.
Proprietary extensions are bad (Score:3)
"Bolt's additional proprietary extensions designed for acceleration of scientific workloads"
When GPGPU in supercomputers was an up and coming thing 20 years ago, the big roadblock to acceptance by the big government funding agencies was the need to rewrite software. The scientific programs were written, debugged, and optimized over many years. Switching to CUDA or any other new thing requires rewriting the code, including debugging and optimizing. Often the time to redevelop the software swamps the actual runtime (e.g., it may take 1-2 years to optimize a complex program to get it to run a few days faster), which wipes out the benefit of the new hardware.
Re: (Score:2)
When GPGPU in supercomputers was an up and coming thing 20 years ago, the big roadblock to acceptance by the big government funding agencies was the need to rewrite software. The scientific programs were written, debugged, and optimized over many years. Switching to CUDA or any other new thing requires rewriting the code, including debugging and optimizing.
Switching to OpenCL would give you a lot more hardware/vendor options than CUDA. OTOH, if this new company can put a lot of RISC-V cores on one die, why don't they just make a massively multicore CPU?
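On the portability point, this is roughly what a vendor-neutral FP64 kernel looks like with pyopencl; it runs unchanged on AMD, Intel, or Nvidia devices that expose the cl_khr_fp64 extension (a minimal sketch, assuming pyopencl and a working OpenCL runtime are installed):

```python
# Minimal vendor-portable FP64 vector add with pyopencl.
import numpy as np
import pyopencl as cl

src = """
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void vadd(__global const double *a,
                   __global const double *b,
                   __global double *out) {
    int i = get_global_id(0);
    out[i] = a[i] + b[i];
}
"""

a = np.random.rand(1024)          # float64 by default
b = np.random.rand(1024)
out = np.empty_like(a)

ctx = cl.create_some_context()    # picks whatever OpenCL device is available
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
b_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

prg = cl.Program(ctx, src).build()
prg.vadd(queue, a.shape, None, a_buf, b_buf, out_buf)
cl.enqueue_copy(queue, out, out_buf)
assert np.allclose(out, a + b)
```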
Optimized for one specific metric? (Score:2)
There is one catch: Zeus can only beat the RTX 5090 GPU in path tracing and FP64 compute workloads. It's not clear how well it will handle traditional rendering techniques, as that was less of a focus.
It's relatively easy to optimize one metric or two. But that's not the same as being "10x faster" generally.
Only Up, Never Down (Score:3)
Oh great. A competitor. This means NVIDIA GPUs are going to double in price, right?
So this GPU can run 32-bit PhysX? (Score:3)
Apple (Score:2)
So I've been wondering how good Apple's neural engine is at computing transformers compared to Nvidia, and whether there's a chance they could improve training by beefing it up, i.e. adding more neural cores? Looking at buying an M4 64GB MBP and having to use the cloud for training if needed. Is there a shred of a possibility that Apple could release coprocessor modules that could be added to the motherboard or even connected via USB cables, even if it is a box with Nvidia chips in it, to improve their training performance?