Nvidia Physics Engine Almost Complete

Nvidia has stated that their translation of Ageia's physics engine to CUDA is almost complete. To showcase the capabilities of the new tech, Nvidia ran a particle demonstration similar to Intel's Nehalem demo, at ten times the speed. "While Intel's Nehalem demo had 50,000-60,000 particles and ran at 15-20 fps (without a GPU), the particle demo on a GeForce 9800 card resulted in 300 fps. In the very likely event that Nvidia's next-gen parts (G100: GT100/200) will double their shader units, this number could top 600 fps, meaning that Nehalem at 2.53 GHz is lagging 20-40x behind 2006/2007/2008 high-end GPU hardware. However, you can't ignore the fact that Nehalem in fact can run physics."
  • by DoofusOfDeath ( 636671 ) on Monday April 14, 2008 @12:24PM (#23065158)
    Is a particle motion simulator an abnormally easy test case?

    When I was getting up to speed on IBM Cell programming, IBM had a programmer's tutorial (excellently written, btw). The example problem they used for their chapter(s?) on code tuning was a particle simulator. It was a wonderful example problem, because it showed how to vectorize a program. But then when we went to vectorize our own algorithm, it didn't fit the Cell's vector programming instructions nearly as cleanly, so in the end we didn't get nearly the performance increase from vector instructions that the particle simulator did.

    So I'm thinking that even though CUDA can do a good job with particle motion simulations, we shouldn't assume it's good for the particular algorithms each of us is responsible for.
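    The distinction the parent draws can be sketched in plain Python (names here are illustrative, not from any tutorial): a particle update is independent per element and maps cleanly onto vector/SIMD hardware, while a loop-carried recurrence forces each step to wait on the previous one.

```python
def particle_step(positions, velocities, dt):
    """Embarrassingly parallel: every particle can be updated in any order."""
    return [p + v * dt for p, v in zip(positions, velocities)]

def recurrence(xs, a):
    """Loop-carried dependency: element i needs the result for element i-1,
    so the lanes cannot all execute the same step at once."""
    out = []
    acc = 0.0
    for x in xs:
        acc = a * acc + x  # depends on the previous iteration's result
        out.append(acc)
    return out

print(particle_step([0.0, 1.0, 2.0], [1.0, 1.0, 1.0], 0.5))  # [0.5, 1.5, 2.5]
```

    An algorithm shaped like `recurrence` is the kind that won't see the particle simulator's speedup, no matter how wide the vector unit is.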
  • by karbonKid ( 902236 ) on Monday April 14, 2008 @12:26PM (#23065196) [] offers: "NVIDIA Driver for Linux with CUDA Support (169.09)"

    So, physics should work on Linux, having been ported to CUDA already, and CUDA being cross-platform, but the question is if any Linux games will actually support and/or make use of it.
  • by Anonymous Coward on Monday April 14, 2008 @12:39PM (#23065446)
    Here's a quick gotcha: a GPU has lots of small computing units/processors inside and can crunch a lot of data in parallel and give you results fast. However, I have yet to see a GPU that can actually do double-precision floating point. The Cell processor, as well as the current GPU crop (with the exception of ATI), can only do single precision, and that is known to reduce the accuracy of your computation.
  • by caerwyn ( 38056 ) on Monday April 14, 2008 @12:46PM (#23065592)
    Particle systems are about as perfectly parallel a problem as it's possible to get--and not only that, the calculations are generally very straightforward, with few (if any) general CPU control structures (branches, loops, etc).

    As a result, the massively parallel hardware on a GPU that exists to render many triangles/pixels/shader programs/etc at once can be exploited to calculate updates for many particles at once. The CPU is at an inherent disadvantage because it is only leveraging a small subset of its full capabilities.

    The CPU is very good at *lots* of different tasks. The GPU is phenomenally good at a very specific set of tasks. Physics just happens to fit within that set.
  • by dreamchaser ( 49529 ) on Monday April 14, 2008 @12:49PM (#23065634) Homepage Journal
    (sarcasm on)Yes, I am sure there are no scientific or engineering applications for accelerated physics calculations (sarcasm off).

    Anyone ever tell you that a lot of scientific types use Linux on their workstations? They do.
  • by cowscows ( 103644 ) on Monday April 14, 2008 @12:54PM (#23065716) Journal
    In its most basic definition, a GPU is just a CPU that's been highly specialized towards graphics. It can't do anything that a CPU can't also do; it's just able to do particular things much faster. Basically, they take a lot of the common tasks that graphics require of a processor, build really efficient ways to do them directly into the hardware, and include a whole lot of them.

    Think of a CPU as a big tool box. There's something in there for every task, including a screwdriver that can be used to put screws into things. Then next to it, you have a big cordless 18 volt drill (The GPU). Way easier to drive screws with that than the manual screwdriver in your toolbox. But when you need to drive a nail instead, yeah you could bash it in with your screw gun, but it's easier to go back to your toolbox and find a hammer.

    I guess that would make a physics processor the nail gun. Or something...
  • by Mr Z ( 6791 ) on Monday April 14, 2008 @01:30PM (#23066348) Homepage Journal

    Huh? The local physics computations are only being used for presentation and local extrapolation. The server recomputes the relevant physics anyway, and can re-sync everyone periodically. That's how FPSes work to reduce lag--they do local extrapolation, and the server periodically snaps everyone back in line. It sometimes leads to what John Carmack calls "paradoxes" where locally displayed events get undone, but it works.

    So, if some portion of your local physics calculation is purely for local presentation (e.g. game outcome doesn't depend on exactly how dirt particles fly around or boobs bounce, but you want it to look realistic), the server doesn't need to reproduce any of that to model the game correctly. Your screen might look different than someone else's, but in an immaterial way. For the super-soaker example, the server will still compute actual "wetness," possibly with a simplified model that skips computing the goosebumps and most of the dripping water.
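    The extrapolate-then-snap scheme described above can be sketched in a few lines (class and method names are made up for illustration): the client runs cheap local physics every frame, and an authoritative server snapshot simply overwrites the predicted state when it arrives.

```python
class PredictedEntity:
    """Client-side view of an entity: predicted between server updates."""

    def __init__(self, pos, vel):
        self.pos = pos
        self.vel = vel

    def extrapolate(self, dt):
        # Local physics: purely presentational, run every frame.
        self.pos += self.vel * dt

    def server_snapshot(self, pos, vel):
        # Authoritative state wins; locally predicted motion is discarded
        # (this is the "paradox" moment where displayed events get undone).
        self.pos = pos
        self.vel = vel

e = PredictedEntity(pos=0.0, vel=10.0)
for _ in range(3):           # three local frames at 50 ms each
    e.extrapolate(0.05)
print(e.pos)                 # locally predicted: 1.5
e.server_snapshot(1.2, 9.0)  # server disagrees; snap back
print(e.pos)                 # 1.2
```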

  • by ardor ( 673957 ) on Monday April 14, 2008 @01:36PM (#23066440)
    A GPU excels at massively parallel tasks. Stuff like rendering, raytracing (yes! a GPU can raytrace!), DCTs, ... essentially, a GPU is a stream processor that maps a function (a shader) on independent entities (pixels, or vertices).
    Applying a complex formula millions of times, each time with a different set of coefficients? -> GPU.
    GPUs fail at tasks where lots of synchronization or frequent branching is necessary. Stuff like game logic, for instance, or state machines. Your typical office application is a good example.
    So, GPUs are processors that operate in a very limited area. But they excel there. This is a contrast to CPUs, which are a jack of all trades.
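    The "map a function over independent entities" idea above is literally a map (a toy sketch, not real shader code): one fixed formula, applied to every entity with its own coefficients, with no cross-entity synchronization or branching.

```python
def shade(entity):
    """A stream-processor-style 'shader': same small function for every
    entity, each carrying its own coefficients (a, b) and input x."""
    a, b, x = entity
    return a * x + b

entities = [(2.0, 1.0, 3.0), (0.5, 0.0, 4.0)]
results = list(map(shade, entities))
print(results)  # [7.0, 2.0]
```

    Because each call is independent, a GPU can run thousands of them at once; the moment `shade` needs to coordinate with its neighbors, that advantage starts to evaporate.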
  • by tepples ( 727027 ) <tepples@gm a i l . c om> on Monday April 14, 2008 @01:42PM (#23066530) Homepage Journal

    The local physics computations are only being used for presentation and local extrapolation. The server recomputes the relevant physics anyway, and can re-sync everyone periodically.
    Using how much bandwidth per particle?

    For the super-soaker example, the server will still compute actual "wetness," possibly with a simplified model that skips computing the goosebumps and most of the dripping water.
    That's kind of what I meant by "use coarse particles for game logic and finer particles for iCandy."
  • by Space cowboy ( 13680 ) * on Monday April 14, 2008 @01:46PM (#23066600) Journal
    Simple reason:

    On my Mac Pro, I have eight 3 GHz CPUs. Consider that the baseline.

    I have two 8800GT cards to drive the 30" and 2x23" (rotated, one each side of the 30" :-) monitors. That gives me 224 1.5 GHz stream processors across the GPUs. Even harmonising GHz, that's 8 compared to 112, and it's a lot easier to get a parallel algorithm running efficiently on a GPU than on multiple CPUs, due to differing hardware designs...

    Now, stream processors aren't quite the same as general-purpose processors, but the way they're implemented these days, they can be programmed in high-level languages (see Jon Stokes' article []), and if their architecture suits what you want, they can be very, very quick. See here [] for info on programming them...
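    The "harmonising GHz" arithmetic in the comment works out as follows (a back-of-envelope sketch using the comment's own numbers; real throughput depends on far more than clock speed):

```python
# Scale the 1.5 GHz stream processors to 3 GHz "CPU-equivalent" units
# so both sides of the comparison are at the same clock.
cpu_cores, cpu_ghz = 8, 3.0
sp_count, sp_ghz = 224, 1.5

cpu_equivalent_sps = sp_count * (sp_ghz / cpu_ghz)
print(cpu_cores, "vs", int(cpu_equivalent_sps))  # 8 vs 112
```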

  • Re:300 fps ! (Score:3, Informative)

    by Z34107 ( 925136 ) on Monday April 14, 2008 @01:54PM (#23066734)

    Wow! That's more than 4 times faster than the human brain can detect. Now if I only knew why a frame rate this high is needed. Anybody?

    It's not needed. But it's useful for comparison.

    Nehalem only got x frames per second, but the nVidia Magic Goodness 9800 Large Numbers GTX got y FPS, where y > x, which shows that the nVidia MG9800LNGTX > CPU.

    Also, presumably this won't be used to run 50,000-particle games at 300 fps, but much more complicated simulations (infinitely destructible environments, not just linear algebra) at 60 fps.

  • by somersault ( 912633 ) on Monday April 14, 2008 @01:54PM (#23066744) Homepage Journal
    id seems to make all their games for Linux, and they also like using random accessories like that USB flak jacket thing that simulates you getting shot (by quickly puffing up pockets of air with pneumatic compressors). No doubt Quake 5 or whatever will support it at least.
  • Re:So (Score:5, Informative)

    by Lord Ender ( 156273 ) on Monday April 14, 2008 @02:01PM (#23066870) Homepage
    NVDA is a chip maker. They make the best GPUs, period. Their chips are used in high-end PCs, but also in Sony's PS3.
  • by Anonymous Coward on Monday April 14, 2008 @02:09PM (#23066992)
    In a nutshell: A CPU dedicates a very large portion of the die to control logic, and a relatively smaller portion to arithmetic units.

    A GPU has very little control logic, and is jam-packed with arithmetic units. The control logic it does have is also fairly distributed.

    The CPU has the advantage that due to instruction windowing, out-of-order execution, branch prediction, and other niceties, it can fairly efficiently execute many types of code using a fairly simple execution model. It provides things we take for granted, like a unified view of main memory.

    The GPU has the advantage of massive horsepower - but essentially, the GPU can't walk and chew gum at the same time. GPUs are massively parallel SIMD (or SPMD for the newest generation) machines, so they can execute one code path over a lot of data at once. Many things your OS does do not fall under this category of massively parallelizable problems, so your GPU would execute them far slower than your CPU.

    Even in the realm of highly numeric algorithms, there are some that don't translate well to the GPU model--ray tracing is one of them. Ray tracing parallelizes well, but since GPUs are SPMD, they have trouble with code that has a high level of branching. There are also bandwidth issues--getting data to all the different arithmetic units on the GPU becomes a very serious bottleneck, especially for problems that lack strong data coherency. Ray tracing has both these problems, so the end result is that GPU-based ray tracers are at best tied with CPU-based ray tracers for the current generation.
  • by Anonymous Coward on Monday April 14, 2008 @02:19PM (#23067142)
    This is wrong. Actually processing the bits, given any sort of operator instruction, is pretty easy. Sure, there are some clever ways to implement div or mul etc., but the actual functions they perform are not too spectacular. Vector operations aren't rocket science either.

    Data protection, debugging facilities, virtual memory etc. are more complex, but again they don't make an incredible difference. If they cost an order-of-magnitude slowdown, you can bet you'd see CPUs sold without them for more uses.

    No, the CPU is excellent at processing large and unpredictable control flow (eg. branches), and more importantly unpredictable data access. A modern CPU like core2's ability to extract ILP and keep the memory pipeline full is quite incredible. It does this by prefetching, speculatively executing loads and stores long before dependent branches, other stores, or any other instructions have completed.

    CPU = master of extracting ILP and MLP from serial code. This is the highly trained specialist.

    GPU = brute force approach to executing highly parallel tasks. This is the herd of retards that is trained for some specific repetitive tasks but breaks down when asked to handle more complex situations.
  • by CastrTroy ( 595695 ) on Monday April 14, 2008 @04:07PM (#23068638) Homepage
    You obviously don't remember what it was like when 3D cards first came out. There were many games that were Glide-only, meaning you had to have a 3dfx graphics card to run them. The same thing happened when sound cards first came out: you had to have a Sound Blaster, or AdLib, or Gravis Ultrasound, and again only certain cards would work with certain games. I imagine physics cards could work in the same way. Require that you use a certain physics card, or you don't get to play the game. Eventually things will get standardized, and you will get to use any physics card with any game, but I don't think that's necessary right from the start.
  • Couldn't agree more (Score:2, Informative)

    by Redbaran ( 918344 ) on Monday April 14, 2008 @04:34PM (#23068974)
    I'm using CUDA for my masters project, and I've had the same problem that you describe. The way CUDA works, having a conditional statement that evaluates differently in each thread will kill your performance. It makes sense as to why:
    A GeForce 8800GTX has 16 multiprocessors, which each have 8 processors, so a total of 128 processors. It's your basic SIMD (single instruction, multiple data) architecture. So if you have lots of conditions, you go from having 128 processors, to having 16 as your code serializes because it has turned into MIMD.

    In my particular project, there have been times where I've tried to optimize something with an if statement, but it's not worth it for the reason described above. With that said, it's still much faster than doing the computation on the CPU. I think the biggest problem with working in CUDA, at least for me, is that I've never worked with a SIMD architecture like it, so I don't know any really good techniques. I assume they're out there, but I haven't come across anything more than what they say in the CUDA programming guide, which is fairly minimal.

    I think CUDA is good for what it is designed to do, but you won't see a real time raytracer on it anytime soon even though the gflops are there.
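    The serialization cost the parent describes can be modeled with a toy lane-group simulation (purely illustrative; real hardware executes divergent branch sides per warp, not per chip): when all lanes agree on a branch, one pass suffices, but when they split, both sides must be issued to every lane in turn.

```python
def simd_steps(lane_conditions):
    """Count how many passes a lockstep lane group needs for one branch.

    Each branch side that at least one lane takes must be issued to all
    lanes, with the non-participating lanes masked off.
    """
    taken = any(lane_conditions)       # some lane takes the if-side
    not_taken = not all(lane_conditions)  # some lane takes the else-side
    return int(taken) + int(not_taken)

print(simd_steps([True] * 8))         # uniform branch: 1 pass
print(simd_steps([True, False] * 4))  # divergent branch: 2 passes
```

    Nest a few data-dependent branches and the passes multiply, which is why per-thread conditionals that look like optimizations can end up slower than just computing both sides.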
