Nvidia Physics Engine Almost Complete
Nvidia has stated that its port of Ageia's physics engine to CUDA is almost complete. To showcase the capabilities of the new tech, Nvidia ran a particle demonstration similar to Intel's Nehalem demo, at ten times the speed. "While Intel's Nehalem demo had 50,000-60,000 particles and ran at 15-20 fps (without a GPU), the particle demo on a GeForce 9800 card resulted in 300 fps. In the very likely event that Nvidia's next-gen parts (G100: GT100/200) will double their shader units, this number could top 600 fps, meaning that Nehalem at 2.53 GHz is lagging 20-40x behind 2006/2007/2008 high-end GPU hardware. However, you can't ignore the fact that Nehalem in fact can run physics."
Is particle motion a fair test case? (Score:5, Informative)
When I was getting up to speed on IBM Cell programming, IBM had a programmer's tutorial (excellently written, btw). The example problem they used for their chapter(s?) on code tuning was a particle simulator. It was a wonderful example problem, because it showed how to vectorize a program. But when we went to vectorize our own algorithm, it didn't fit the Cell's vector instructions nearly as cleanly, so in the end we didn't see nearly the performance gain from vectorization that the particle simulator did.
So even though CUDA can do a good job with particle motion simulations, we shouldn't assume it will do nearly as well on whatever particular algorithms each of us is responsible for.
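The contrast the commenter describes can be sketched in a few lines. This is a hypothetical illustration (plain Python, invented function names): a particle update applies the identical formula to every element, which is exactly what SIMD hardware wants, while a loop whose work depends on each element's value has no fixed schedule to vectorize.

```python
def step_particles(pos, vel, dt, gravity=-9.8):
    """Same formula for every particle -> trivially vectorizable.
    On SIMD hardware each list element would be one lane."""
    new_vel = [v + gravity * dt for v in vel]
    new_pos = [p + v * dt for p, v in zip(pos, new_vel)]
    return new_pos, new_vel

def step_until_positive(xs):
    """Per-element, data-dependent trip count: SIMD-hostile, since
    each lane would finish at a different time."""
    out = []
    for x in xs:
        n = 0
        while x < 0:
            x += 1.0
            n += 1
        out.append((x, n))
    return out
```

The first function is the particle-simulator shape; the second is the shape of "our own algorithm" that refuses to map onto vector instructions cleanly.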
Re:I couldn't find anything specific - will nVidia (Score:5, Informative)
So physics should work on Linux, since the engine has already been ported to CUDA and CUDA is cross-platform, but the question is whether any Linux games will actually support and/or make use of it.
Re:Experts please explain something (Score:1, Informative)
Re:Experts please explain something (Score:5, Informative)
As a result, the massively parallel hardware on a GPU that exists to render many triangles/pixels/shader programs/etc. at once can be exploited to calculate updates for many particles at once. The CPU is at an inherent disadvantage because it is only leveraging a small subset of its full capabilities.
The CPU is very good at *lots* of different tasks. The GPU is phenomenally good at a very specific set of tasks. Physics just happens to fit within that set.
Re:I couldn't find anything specific - will nVidia (Score:5, Informative)
Anyone ever tell you that a lot of scientific types use Linux on their workstations? They do.
Re:Experts please explain something (Score:5, Informative)
Think of a CPU as a big tool box. There's something in there for every task, including a screwdriver that can be used to put screws into things. Then next to it, you have a big cordless 18 volt drill (The GPU). Way easier to drive screws with that than the manual screwdriver in your toolbox. But when you need to drive a nail instead, yeah you could bash it in with your screw gun, but it's easier to go back to your toolbox and find a hammer.
I guess that would make a physics processor the nail gun. Or something...
Re:Sync between machines with different cards? (Score:5, Informative)
Huh? The local physics computations are only being used for presentation and local extrapolation. The server recomputes the relevant physics anyway, and can re-sync everyone periodically. That's how FPSes work to reduce lag--they do local extrapolation, and the server periodically snaps everyone back in line. It sometimes leads to what John Carmack calls "paradoxes" where locally displayed events get undone, but it works.
So, if some portion of your local physics calculation is purely for local presentation (e.g. game outcome doesn't depend on exactly how dirt particles fly around or boobs bounce, but you want it to look realistic), the server doesn't need to reproduce any of that to model the game correctly. Your screen might look different than someone else's, but in an immaterial way. For the super-soaker example, the server will still compute actual "wetness," possibly with a simplified model that skips computing the goosebumps and most of the dripping water.
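The extrapolate-then-snap-back loop described above can be sketched minimally. This is a hypothetical illustration (class and method names invented, not from any real engine): the client integrates local physics every frame purely for presentation, and when an authoritative server state arrives, it simply wins, possibly "undoing" what was shown locally.

```python
class PredictedEntity:
    """Client-side entity: local prediction plus server correction."""

    def __init__(self, pos, vel):
        self.pos = pos
        self.vel = vel

    def extrapolate(self, dt):
        # Local physics between server updates, for smooth presentation only.
        self.pos += self.vel * dt

    def server_correction(self, server_pos, server_vel):
        # The server's recomputed state is authoritative; local prediction
        # is discarded -- this is the "snap back in line".
        self.pos = server_pos
        self.vel = server_vel

e = PredictedEntity(pos=0.0, vel=10.0)
for _ in range(3):            # three locally predicted frames
    e.extrapolate(dt=0.1)     # local pos drifts to 3.0
e.server_correction(server_pos=2.5, server_vel=9.0)  # the "paradox" moment
```

The cosmetic physics (dirt, goosebumps, dripping water) would live entirely in `extrapolate` and never be sent to, or recomputed by, the server.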
--JoeRe:Experts please explain something (Score:3, Informative)
Applying a complex formula millions of times, each time with a different set of coefficients? -> GPU.
GPUs fail at tasks where lots of synchronization or frequent branching is necessary. Stuff like game logic, for instance, or state machines. Your typical office application is a good example.
So, GPUs are processors that operate in a very limited area. But they excel there. This is a contrast to CPUs, which are a jack of all trades.
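The "complex formula, millions of coefficient sets" case above is just a parallel map. A minimal sketch (hypothetical function, plain Python standing in for GPU threads): every element evaluates the identical expression with its own coefficients, with no branching and no communication between elements.

```python
def eval_quadratics(coeffs, x):
    """Evaluate a*x^2 + b*x + c for many independent (a, b, c) sets.
    On a GPU each tuple would be one thread; the comprehension here
    stands in for the parallel map."""
    return [a * x * x + b * x + c for (a, b, c) in coeffs]

# Each output depends only on its own inputs, so evaluation order
# (or doing them all simultaneously) cannot change the answer.
results = eval_quadratics([(1, 0, 0), (2, 1, -1), (0, 3, 5)], x=2.0)
```

Game logic or a state machine fails this test immediately: each step depends on the previous one, so there is nothing to map in parallel.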
Using how much bandwidth? (Score:3, Informative)
Re:Experts please explain something (Score:3, Informative)
On my Mac Pro, I have eight 3 GHz CPU cores. Consider that the baseline.
I have two 8800GT cards to drive the 30" and the 2x23" displays (rotated, one on each side of the 30").
Now stream processors aren't quite the same as general-purpose processors, but the way they're implemented these days, they can be programmed in high-level languages (see Jon Stokes' article [arstechnica.com]), and if their architecture suits what you want, they can be very, very quick. See here [arstechnica.com] for info on programming them...
Simon.
Re:300 fps ! (Score:3, Informative)
Wow! That's more than 4 times faster than the human brain can detect. Now if I only knew why a frame rate this high is needed. Anybody?
It's not needed. But it's useful for comparison.
Nehalem only got x frames per second, but the nVidia Magic Goodness 9800 Large Numbers GTX got y FPS, where y > x, which shows that the nVidia MG9800LNGTX > CPU.
Also, presumably this won't be used to run 50,000-particle games at 300 fps, but much more complicated simulations (infinitely destructible environments, not linear algebra) at 60 fps.
Re:I couldn't find anything specific - will nVidia (Score:5, Informative)
Re:So (Score:5, Informative)
Re:Experts please explain something (Score:1, Informative)
A GPU has very little control logic, and is jam-packed with arithmetic units. The control logic it does have is also fairly distributed.
The CPU has the advantage that due to instruction windowing, out-of-order execution, branch prediction, and other niceties, it can fairly efficiently execute many types of code using a fairly simple execution model. It provides things we take for granted, like a unified view of main memory.
The GPU has the advantage of massive horsepower - but essentially, the GPU can't walk and chew gum at the same time. GPUs are massively parallel SIMD (or SPMD for the newest generation) machines, so they can execute one code path over a lot of data at once. Many things your OS does do not fall under this category of massively parallelizable problems, so your GPU would execute them far slower than your CPU.
Even in the realm of highly numeric algorithms, there are some that don't translate well to the GPU model; ray tracing is one of them. Ray tracing parallelizes well, but since GPUs are SPMD, they have trouble with code that branches heavily. There are also bandwidth issues: getting data to all the different arithmetic units on the GPU becomes a very serious bottleneck, especially for problems that lack strong data coherency. Ray tracing has both of these problems, so the end result is that GPU-based ray tracers are at best tied with CPU-based ray tracers for the current generation.
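The branching problem has a simple cost model. This is an illustrative sketch (a toy model, not real hardware accounting): on a lockstep SIMD machine, when lanes in a group disagree about a branch, the hardware executes both sides with lanes masked off, so the group pays the sum of the two paths instead of just one.

```python
def simd_branch_cost(lane_conditions, cost_true, cost_false):
    """Toy cost of one branch for a lockstep lane group: every side
    taken by at least one lane must be executed by the whole group."""
    cost = 0
    if any(lane_conditions):        # some lane takes the true side
        cost += cost_true
    if not all(lane_conditions):    # some lane takes the false side
        cost += cost_false
    return cost

# A coherent group pays for one side; a divergent group pays for both.
coherent = simd_branch_cost([True] * 32, cost_true=10, cost_false=40)
divergent = simd_branch_cost([True] * 16 + [False] * 16, cost_true=10, cost_false=40)
```

Ray tracing diverges constantly (different rays hit different materials, recurse to different depths), which is why it maps so poorly despite being embarrassingly parallel on paper.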
Re:Experts please explain something (Score:1, Informative)
Data protection, debugging facilities, virtual memory, etc. are more complex, but again they don't make an incredible difference. If they cost an order-of-magnitude slowdown, you can bet you'd see CPUs sold without them for more uses.
No, the CPU is excellent at processing large and unpredictable control flow (e.g. branches), and more importantly at unpredictable data access. A modern CPU like the Core 2 is incredibly good at extracting ILP and keeping the memory pipeline full. It does this by prefetching, and by speculatively executing loads and stores long before dependent branches, other stores, or any other instructions have completed.
CPU = master of extracting ILP and MLP from serial code. This is the highly trained specialist.
GPU = brute-force approach to executing highly parallel tasks. This is the army of drones that is trained for some specific repetitive tasks but breaks down when asked to handle more complex situations.
Re:It's just particles (Score:3, Informative)
Couldn't agree more (Score:2, Informative)
A GeForce 8800GTX has 16 multiprocessors, each with 8 processors, for a total of 128 processors. It's your basic SIMD (single instruction, multiple data) architecture. So if you have lots of conditions, you go from having 128 processors to having 16, as your code serializes because it has effectively turned into MIMD.
In my particular project, there have been times where I've tried to optimize something with an if statement, but it's not worth it for the reason described above. That said, it's still much faster than doing the computation on the CPU. I think the biggest problem with working in CUDA, at least for me, is that I've never worked with an SIMD architecture like it, so I don't know any really good techniques. I assume they're out there, but I haven't come across anything more than what they say in the CUDA programming guide, which is fairly minimal.
I think CUDA is good for what it is designed to do, but you won't see a real-time ray tracer on it anytime soon, even though the GFLOPS are there.
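One standard SIMD technique for the if-statement problem the commenter describes is to replace a per-element branch with a branch-free arithmetic select, so every lane executes the identical instruction stream. A minimal sketch (plain Python with invented names; on a GPU the masks would be hardware predicates):

```python
def clamp_branchy(xs, lo):
    """Per-element branch: diverges when lanes disagree about x < lo."""
    return [lo if x < lo else x for x in xs]

def clamp_branchless(xs, lo):
    """Branch-free select: compute a 0/1 mask from the comparison and
    blend both candidate values arithmetically. Both sides are always
    evaluated -- which is roughly what a diverged SIMD group does anyway,
    but without the serialization overhead of an actual branch."""
    return [int(x < lo) * lo + int(x >= lo) * x for x in xs]

xs = [-2.0, 0.5, 3.0]
assert clamp_branchy(xs, 0.0) == clamp_branchless(xs, 0.0)
```

Whether the trade pays off depends on how expensive the two sides are; when both are cheap, the select almost always wins on SIMD hardware.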