Hardware

AGP Texture Download Problem Revealed

EconolineCrush writes "The latest high-end graphics cards are capable of rendering games at 1600x1200 in 32-bit color at jaw-dropping frame rates, but that might be all they're good for. For all their gaming prowess, all of these cards have horrific AGP download speeds that realize only 1/100th of their theoretical peak. This article lays it all out, testing video cards from ATI, Matrox, and NVIDIA, and clearly illustrates just how bad the problem is. While these cards have no problems rendering images to your screen, you're out of luck if you want to capture those images with any kind of reasonable frame rate via the AGP bus."
  • by Yarn ( 75 ) on Monday August 19, 2002 @08:27AM (#4097018) Homepage
    I'd certainly expect the AGP bus to be used asymmetrically; how often do you want to do high-speed data capture from a card that's primarily for output?

    The only situation I can see where you'd want more than PCI bandwidth coming back would be uncompressed HDTV capture, and there are better ways to do that (grab the raw broadcast stream, for example).
    • Nitpicking: AGP is not a bus, it's the Accelerated Graphics Port - a dedicated point-to-point connection. See the article at anand for more info.
    • by Mike Connell ( 81274 ) on Monday August 19, 2002 @08:42AM (#4097106) Homepage
      There are actually some good reasons to be able to do this apart from just taking screenshots. I did these tests (sad but true) over 4 years ago while finishing grad school, and the results (readback speed is very bad) were much the same.

      Two reasons for wanting to grab the framebuffer (or parts of it) are for

      a) texture imposters (realtime adaptive billboarding) and
      b) split world/image-space occlusion culling.

      With faster readback, both these techniques would probably be used more in "normal" software (i.e. games).
      0.02
      • Yes, but... (Score:5, Informative)

        by Anonymous Coward on Monday August 19, 2002 @09:31AM (#4097419)
        a) texture imposters (realtime adaptive billboarding)

        That's what render-to-texture is for; you don't need to read data back to the CPU.

        b) split world/image-space occlusion culling.

        This wouldn't be too useful for realtime graphics anyway, because of the way the 3D graphics pipeline works. The CPU can already be processing data a few frames ahead of what the GPU is currently working on; if you read data back from the card every frame, you have to wait for the GPU to finish rendering the current frame before you can start work on the next one.
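        A minimal sketch of that stall, assuming a conventional double-buffered OpenGL loop; draw_scene() and swap_buffers() are invented placeholders, not anything from the article:

```c
/* Why a per-frame readback serializes the pipeline: without the
 * glReadPixels call, the driver can queue several frames of commands
 * and let the CPU run ahead of the GPU. */
#include <GL/gl.h>
#include <stdlib.h>

extern void draw_scene(void);    /* placeholder: queues GL commands */
extern void swap_buffers(void);  /* placeholder: platform-specific */

void render_loop(int width, int height)
{
    unsigned char *pixels = malloc((size_t)width * height * 4);

    for (;;) {
        draw_scene();   /* returns long before the GPU is done */

        /* glReadPixels cannot copy the framebuffer until every queued
         * command has executed, so the CPU stalls right here until the
         * GPU has fully rendered the current frame. */
        glReadPixels(0, 0, width, height,
                     GL_RGBA, GL_UNSIGNED_BYTE, pixels);

        swap_buffers();
    }
}
```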

        • Re:Yes, but... (Score:5, Informative)

          by Mike Connell ( 81274 ) on Monday August 19, 2002 @10:50AM (#4097983) Homepage
          That's what render-to-texture is for, you don't need to read data back to the CPU.

          That is true for simple versions, but with methods moving towards image-based rendering you often have to pull the data back anyway. Then you can process the textures to produce better imposters - not necessarily just billboards.

          Re: occlusion culling. People are using these methods today for realtime graphics (for example combinations of Green's HZB, or HOMs) even with the low readback speed. UNC's Gigawalk software is one published example (Google for it). Getting Z or alpha channel information back is the biggest hit, so these methods would be even more efficient, and so more widely applicable, with faster transfers. When you're rendering N million triangles per frame (UNC quotes 82 million) you have to do this stuff to get realtime rendering.

          So it is used for realtime graphics today - although mainly for heavy duty applications not games.

          HTH
        • Re:Yes, but... (Score:3, Informative)

          by Mike Connell ( 81274 )
          Oops, forgot to point out one more thing: HP and NVidia have both implemented OpenGL extensions to address the issue of getting Z occlusion information back (NVidia's is layered on top of the HP extension, IIRC). This isn't useful for reading back the framebuffer fast, but it helps when doing realtime occlusion culling.
    • I ran into something like this on an application I'm writing... when I first made an MPEG recording of my 3D output, there were no textures. About 3 seconds and one forehead-slap later I realized that the video card's memory (where the rendering happens) isn't on the CPU's bus (where the recording happens).

      It seems the lesson here is that proper captures from video RAM are slow. Yeah, it'd be nice to change that. But how many people really care? Given how long it took anyone to notice, I can't help but think that very very few people really care - and with good reason. Unless you're into making rendered movies, it's irrelevant.

  • Software issue? (Score:5, Informative)

    by larien ( 5608 ) on Monday August 19, 2002 @08:30AM (#4097031) Homepage Journal
    From the article, the author reckons this is a software (driver) issue rather than a hardware issue. I also note the test rig ran Windows; how does Linux shape up? Is it better or worse?

    In any event, there's another issue he doesn't really touch upon: while he mentions that a single frame at 1600x1200 in 32-bit colour is 7.5MB, he ignores the fact that a 30fps movie would then require 30*7.5 = 225MB per second uncompressed; you either have to have that much disk bandwidth or enough CPU grunt to compress it on the fly. I guess a dedicated MPEG encoder card could help, but your average box is going to have trouble keeping up with on-screen gibs, rocket trails and blood splatters while also encoding video.
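    Spelling out that arithmetic (decimal megabytes; the parent's 7.5MB is the same number rounded differently):

```c
/* Uncompressed capture bandwidth at 1600x1200, 32-bit colour, 30fps. */
#include <stdio.h>

int main(void)
{
    const double bytes_per_frame = 1600.0 * 1200 * 4;  /* 4 bytes/pixel */
    const double fps = 30.0;

    printf("one frame : %.2f MB\n", bytes_per_frame / 1e6);         /* 7.68 MB   */
    printf("stream    : %.0f MB/s\n", bytes_per_frame * fps / 1e6); /* ~230 MB/s */
    return 0;
}
```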

    • If it were just a driver bug or even a design tradeoff, why is it that GPUs from every manufacturer are uniformly abysmal? Even an SGI 320 with its UMA design still only gives 18.9 MB/s readback speeds, in my tests.

      I asked nVidia at SIGGRAPH why image readback is so slow. They said, no motherboard they know of (not even their own) supports AGP Writes back to the system memory. Without that, you're limited to PCI bandwidth at best, far less than what the AGP spec allows.

      However, we're not even seeing that. Results are showing 1% of what is possible. It's certainly a hardware issue, but there may be a lot of room to improve from the software side, too.

  • nobody asked! (Score:2, Interesting)

    by tanveer1979 ( 530624 )
    "However, no manufacturer has presently made this aspect of driver performance a priority."
    Why should they? Was anybody complaining till now? The well won't come to the horse; the horse has to go to the well to drink.
    So unless a large number of people want it, nobody wants to mess around with a perfectly working driver.
    And it is not a piece of cake. Recording its own renderings the software way would be a bitch; the best way would be to provide an access point on the bus itself, though that would play havoc with board timings and noise.
    In the end it will all come down to: will it justify the cost?
  • Imagine That (Score:5, Insightful)

    by mosch ( 204 ) on Monday August 19, 2002 @08:31AM (#4097038) Homepage
    Wow, what a surprise. Video cards being built on ultra-thin margins are only being designed for the use that 99.99% of the population wants to use them for. You'd think with their huge 4% and 5% profits they'd add in lots of features that only a very few people want, just in case!

    In summary, who the fuck cares?

    • Re:Imagine That (Score:4, Insightful)

      by epine ( 68316 ) on Monday August 19, 2002 @09:17AM (#4097295)

      This is exactly the attitude that creates endless headaches mapping good concepts onto workable implementations, and results in systems becoming so convoluted by the time they work properly they are nearly impossible to maintain.

      The principle of least surprise dictates that random orders of magnitude should not be sacrificed in your fundamental primitives.

      It seems to me that if I spend $300 on my CPU and $600 on my GPU, I might want to be able to fetch back what the GPU creates. What kind of idiot puts their most powerful processor at the end of a one-way street?

      There are endless reasons that could come up why this feature might need to be exploited. Just because you can't come up with them doesn't mean they don't exist. You are talking about the limits of your own creativity, which I assure you is a far sight less than the sum total of the creativity out there looking for cool new things to do.

      It does make sense to consider cost/benefit here. The first observation here is that we are talking about a baseline primitive (texture returned to system memory), and that we are looking to recover a rough factor of ten, not a rough factor of 10 percent.

      In the video card industry, things are designed to hit the 90 percent point. These days the GPU industry rivals the CPU industry in dollar value. I simply can't believe the graphics card companies can't afford to have someone sit down and crank this up to 50% bus utilization. I suspect they could do this without even scratching their head.

      I've had to use many primitives over the years designed by this guy or his second cousin. If he only knew how much of the pain he experiences as a computer user is the result of good people bending over backwards to deal with unsuspected, arbitrary constraints when they could have been polishing the product interface instead. But some people have no imagination for these things.

      • Perhaps... (Score:5, Insightful)

        by ColGraff ( 454761 ) <maron1&mindspring,com> on Monday August 19, 2002 @10:01AM (#4097642) Homepage Journal
        "What kind of idiot puts their most powerful processor at the end of a one way street?"

        Maybe they're the kind of idiots who know most people just want the best possible OUTPUT for gaming, and so don't want to add any overhead in card performance - or even additional design time - that isn't related to gaming performance. You know, the idiots who make cards that get award after award from gaming companies, then write near-perfect drivers, port those drivers to Linux, and let you overclock the card to your heart's content. Those sort of idiots. My, they're idiotic.

        Nobody says, "buy a geforce 4 ti, make the next toy story." No, it's advertised as a gaming card, and that's what its designed to do. If you want to do high-end video rendering things, perhaps a gaming card isn't the best choice.

        • Yeah, but you ever run Half-life on a cluster of boxes with Oxygen Wildcat cards? Damn.....
        • by gspeare ( 470147 ) <geoff@sh[ ]tt.com ['alo' in gap]> on Monday August 19, 2002 @12:04PM (#4098547) Journal
          Hey, I just realized that my high-end printing device has absolutely no hardware provision for reverse-direction printing! If I want to take the high quality document I just printed and put it back into electronic form, I have to spend hundreds of dollars* for a completely separate "scanning" device! What a ripoff!

          Really, as soon as the market for this sort of capture starts to grow, someone will have a hardware solution. The first ones will be cheesy: a connector into a separate PCI capture card, for example; but eventually a more reasonable method will become standard design.

          To me, this is just the free market in action, working (more or less) as it should be.

          * I know how much scanners cost. Think hyperbole. :)

        • If you want to do high-end video rendering, perhaps a gaming card isn't the best choice.

          Why is it that a much more expensive Quadro card gives equally slow results? I've run a very similar test on an SGI 320 (shared-memory design) and it only gives 18.9 MB/s.

          Anyone reading this with a Wildcat 6000-series? What does that bench at?

        • What kind of idiot puts their most powerful processor at the end of a one way street?

          the kind of idiots who know most people just want the best possible OUTPUT for gaming, and so don't want to add any overhead in card performance - or even additional design time - that isn't related to gaming performance. You know, the idiots who make cards that get award after award from gaming companies, then write near-perfect drivers,


          here it comes...

          port those drivers to linux ...

          Bingo!

          The only problem is in the driver. Hardware's up to the job.

          The driver has been ported to Linux.

          So fix it!

          Closed source? Reverse engineer it.

      Very few people use their typical desktop video cards for actual video production or anything related to it, because the hardware up until now was simply unable to handle that sort of load. Now we have cards that are the beginning of a new era of computer-generated visuals. The article is saying that they could do quite a bit more than they do now if someone would just write some better drivers for them.

      Now, streaming real-time rendered images over the internet? Maybe not fullscreen stuff right now, because of a multitude of hampering factors on affordable internet bandwidth which I won't name for clarity's sake, but for the limiting factor to be the internet itself and not the graphics card would still be a significant step.

      This would definitely be very beneficial to low-budget game developers and movie directors. We could very well see the return of the shareware boom (remember the early-to-mid '90s?) because of this.

      Sure, only a small portion of the people who'd buy the cards would use the features the article talks about, but they'd be people who didn't have that capability before. Whenever this happens in any medium/artform/what-have-you, there's a tendency for a lot of experimental stuff to appear. I think we have some very interesting times ahead of us if someone gets these drivers written.
  • It's not the cards (Score:5, Insightful)

    by tmark ( 230091 ) on Monday August 19, 2002 @08:32AM (#4097045)
    all of these cards have horrific AGP download speeds that realize only 1/100th of their theoretical peak...you're out of luck if you want to capture those images with any kind of reasonable frame rate via the AGP bus."

    As the quoted article clearly indicates, the problem lies with the drivers and not, as the original poster intimates, with the cards.

    And the underlying reason is immediately understandable: after years of AGP cards and years of no one really raising this issue (except, now, developers of video-editing software who could benefit), it seems clear that there isn't much demand for this kind of performance. In the (near?) future there might be, but why should these companies spend money working on driver performance in areas like this when customers really only care about how well Quake will run?

    When people are willing to pay for these features is when companies will pay to build the requisite drivers. And that is how it should be.
    • When people are willing to pay for these features is when companies will pay to build the requisite drivers. And that is how it should be.

      Alternatively, they could publish full specs for their cards and provide the drivers as open source, and the few people who need the different features now could write them or have them written. This code could be contributed back to the card manufacturers and integrated into future driver releases, making the feature available for everyone. For example, ATI apparently didn't see enough market demand to provide 3D-accelerated Linux drivers for the Radeon 8500, but The Weather Channel did [linuxhardware.org], and now we'll all benefit.

      Obviously this is a bit idealistic, but hey, we're talking about how it should be here. As I started writing this, no one had given a good answer to the "what about under Linux" question, but honestly (and despite the way that seems like a reflexive Slashdot response), that's the real solution to this "problem".
    • by zenyu ( 248067 ) on Monday August 19, 2002 @08:49AM (#4097152)
      I had to switch an application from a screaming PC to a chunky old SGI we now use for a stool because of this problem. We eventually found an expensive graphics card that could keep up; I think it was called Wildcat something or other. We were getting free Quadro 3's at the time, which we really wanted to use, but they just had horrible memory read rates. The nVidia guy told us it was an unoptimized path, using software with no hardware support or something. Like maybe they were reading back a pixel at a time.
  • But why? (Score:3, Interesting)

    by AAAWalrus ( 586930 ) on Monday August 19, 2002 @08:32AM (#4097046)
    The article points out that once the images are rendered out to the display, they are simply discarded. Sure, for any sort of video capture or whatnot, that sucks. However, the article does not attempt to answer why video card manufacturers do this, or whether there are any cards that do take advantage of the AGP 4x bandwidth. My guess is cost. If all AGP video cards provided video feedback onto the bus, you'd probably be looking at a non-consumer-level product. And you know what? All I do IS use my GeForce to play video games. If dumping the frames after they are rendered keeps the cost of my card down, I'm probably happier for it. Quite simply: does this matter for the average consumer?
  • Huh... (Score:4, Interesting)

    by Viking Coder ( 102287 ) on Monday August 19, 2002 @08:35AM (#4097064)
    If I'm reading this article right, they're claiming that it also hinders normal screen captures.

    That would mean that software like VNC would have much higher performance if the drivers were updated the way these guys are demanding. (Wouldn't it?)

    That'd be fantastic!
    • Um.. No.

      The slowest card reads back at 8.376 MB/s, or 67.008 Mb/s - about 2/3 of the bandwidth available on a 10/100 network.

      Network performance is the primary limitation to streaming frames.

      The best cards would stream at 13.283 MB/s, or 106.264 Mb/s, exceeding the speed of 10/100 and able to push only about nine streams over perfect Gigabit Ethernet. Unfortunately, Gigabit Ethernet is not nearly as fast as advertised, ranging from as low as 280 Mb/s for generics to as high as 860 Mb/s for 3Com's best.
  • by seldolivaw ( 179178 ) <me AT seldo DOT com> on Monday August 19, 2002 @08:42AM (#4097113) Homepage
    I know nothing about anything, obviously, but I can see that game designers might think it nice to be able to send stuff to your screen while keeping you from sending it to storage somewhere.

    This *is meant to be* a dumb question. Mod me down if I'm wrong; it's only Karma.
  • by JackAsh ( 80274 ) on Monday August 19, 2002 @08:46AM (#4097133)
    A couple of salient points come to mind when reading this article:

    1) Recording games/presentations/etc. The reason we don't do it is that, if the system was capable of generating it in real time in the first place, it's far less space-intensive to record the parameters of the animation than the output. I.e., it's cheaper to say "Daemia fires rocket at these coordinates" than to record an MPEG of said rocket shot. AND, as hardware gets better, your recording does too. (A small sketch of this idea follows the comment.)

    Which leads me to point 2:

    2) Since it's cheaper to capture realtime animation by capturing parameters, the only use of the capture function would be NON-realtime applications - i.e. getting your Geforce5TiUltraPro to render an extremely complex scene with incredible realism at 1 fps. That's not a typo. If we have 10MB/s back-into-the-PC bandwidth and each super high resolution shot takes 10MB on average, we have a wonderful solution working at 1 fps. Spend the fill rates on 600 passes for each pixel or something like that. Imagine the quality of the scenes! Capture the damn things and be glad you're not rendering at 1 frame per hour like they were 5 years ago.

    Repeat after me - if you're rendering for posterity you don't need real time... That'll come eventually.

    -JackAsh
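    A toy illustration of point 1, comparing a demo-style event record with a raw frame; the struct and its fields are invented for illustration, not from any real engine:

```c
/* "Record the parameters, not the pixels": a demo event is a few
 * dozen bytes, while one raw 1600x1200 32-bit frame is ~7.7 MB. */
#include <stdio.h>

struct demo_event {        /* "Daemia fires rocket at these coordinates" */
    float time;            /* when it happened  */
    int   actor_id;        /* who did it        */
    int   action_id;       /* what they did     */
    float pos[3], dir[3];  /* where, which way  */
};

int main(void)
{
    const double frame_bytes = 1600.0 * 1200 * 4;

    printf("one raw frame : %.1f MB\n", frame_bytes / 1e6);
    printf("one event     : %zu bytes\n", sizeof(struct demo_event));
    /* Thousands of events still cost less than one frame, and replaying
     * them later uses whatever (better) hardware you have by then. */
    return 0;
}
```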
  • DMCA (Score:2, Troll)

    by Vandilzer ( 122962 )
    You think the **AA would ever allow this? The ability to make a perfect digital copy of whatever is displayed on your screen? Now your monitor will have to be disabled every time a copyrighted work is displayed on it.
  • From the article:
    Right now, even the very latest graphics cards aren't ready to do much more than play games and put pretty pictures onscreen. If graphics companies really want to replace CPUs for professional rendering, they've got a bit more work to do.

    A stunning example of stating the obvious.

    The hardcore 3D gamer market is small enough; I can't see manufacturers busting their humps to serve an even smaller one.

    • Actually they do, but they charge much higher prices. 3DLabs makes the best-known x86 3D rendering cards; NVIDIA and ATI have some offerings as well under the Quadro and Fire brands, and SGI, Sun, HP, and IBM all sell their own proprietary cards for several thousand dollars (for their respective platforms). I think the author of the article wants to buy video game cards for the few hundred bucks they cost, do a driver update, and have something competitive with the much more expensive professional cards.
  • by grahamtriggs ( 572707 ) on Monday August 19, 2002 @08:50AM (#4097156)
    ...that I have ever read. Either that, or I am missing something here...

    The idea that graphics subsystems have 'bandwidth to burn' is kind of ironic, given that every graphics chip is ultimately held back in performance by the amount of bandwidth available to it - especially when using high-quality options like anti-aliasing.

    The main focus of the article is actually a very niche segment... the idea of transferring back rendered images over the AGP bus for TV / film / etc. is a joke... Rendering at high quality takes a huge amount of bandwidth (i.e. textures and geometry)... as someone else pointed out, transferring back high-res images would take over 200MB/s - that's a quarter of your AGP bandwidth! And without taking into account contention and timing issues in uploading/downloading, that would mean you simply couldn't realise the full potential of the bandwidth without a lot of other (expensive?) hardware...

    The simple fact is that for production uses, you would be *far* better off taking a stream of data from the DVI connector and storing that for later use...

    Screen capture for business use is a reasonable point - however, when does that require 3D rendering to be taking place? There should be no contention and no reason why the AGP bus couldn't be utilised fully - although would the graphics companies make enough out of this to justify the effort?

    As for internet streaming - how many people have access to bandwidth fast enough for high-quality, full-screen video streaming? Enough said...
    • "the idea of transferring back rendered images over the AGP bus for TV / film / etc. is a joke..."

      Why? You don't seem to follow up this opinion with any facts to back yourself up. Being able to do things like Interactive Multi-Pass Programmable Shading [nec.com] means that you can achieve near-PRman levels of graphics quality, using standard graphics hardware. But, of course, you need to capture that back to main memory for it to be any use. That hardly seems worthy of your ridicule.

      "as someone else pointed out, transferring back high-res images would take up over 200MB - that's a quarter of your AGP bandwidth!"

      Who are you to decide what's a good use case, and what's a bad one? This sounds to me like a case where several different people have presented reasonable requests for features - and you're shooting them down because you think what they want to do is "a joke". Since this can be fixed with a software update, I think it's a pretty reasonable request.

      "you simple couldn't realise the full potential of the bandwidth without a lot of other (expensive?) hardware..."

      Why on earth do you make that claim? Could you back that up with some facts? The article is claiming that it's a software issue, only. In fact, the test they put together sounds like a very reasonable one - they're not coming anywhere NEAR using the bandwidth in creating the images, and still, they're getting horrible bandwidth, downloading them. That doesn't sound like contention and timing - that simply sounds like bad, bad drivers.

      "you would be *far* better off taking a stream of data from the DVI connector"

      So, now, to solve the bandwidth issue, you're going to add a second card to the motherboard. What magical, ethereal bus bandwidth will this second card use? I think you need to re-examine your argument on this point.

      "However when does that require 3d rendering to be taking place?"

      This isn't just talking about 3d rendering. This is all screen capturing.

      "There should be no contention and no reason why the AGP bus couldn't be utilised fully"

      Wait a minute - now you're switching your argument?

      "would the graphics companies make enough out of this to justify the effort?"

      As everyone keeps saying, this sounds like it can be fixed in software. That's a pretty negligible cost for the vendors to spend.

      "As for internet streaming - how many people have access to bandwidth fast enough for high quality, full screen video streaming?"

      What about intranet? Lots of companies have intranet bandwidth fast enough for what you're talking about.

      Enough said...
    • Just to illustrate the point you made about it taking over 200MB/s to send the images back --

      1600x1200x32bit = 7,680,000 bytes / image

      24fps means 184,320,000 bytes / second back down the AGP bus -- and that's if you only want 24 fps. That's a lot of bytes moving around, especially when you have to be sending data back up to render future frames.

      Maybe you could do some sort of hardware compression, but as other people have mentioned, video cards are already big enough, hot enough, power-hungry enough, and expensive enough that I don't want to add additional complexity and cost for what a few people want to do. If there are people who want this, they should pay for the R&D and production costs of these specialized chips.
  • Do the sums (Score:2, Insightful)

    When you record video it is normally compressed by hardware or a DSP. It is compressed for a damn good reason.

    Uncompressed, even just 1600x1200x24-bit is about 6MB per frame. At, say, 70 frames/sec, that's about 420MB a second to store to disk.

    So what exactly are you going to do with that much data? If you had 512MB of RAM you could hold about one second's worth.
    Forget a hard disk; even a 3-disk RAID doesn't have that sustained I/O rate.

      i agree with you completely. BUT i think about things; using firewire/ieee1394, you can do essentially raid/striping of sorts. current firewire has a theoretical peak of 400 mb/s; next-gen firewire should see 800 mb/s...

      oh wait. that's megabits. we're talking megaBYTES. fuxor. sounds like we've got a decade or so before we have consumer-level storage options at this level. crazy.

      btw, if i had mod points currently, i'd mod you up.
  • by popoutman ( 189497 ) on Monday August 19, 2002 @09:05AM (#4097233) Journal
    Did anyone spot that the ability to capture the framebuffer would provide a very easy path for capturing video, e.g. DVD playback or streamed video?
    No worries about Macrovision, badly controlled overlays, or screwy playback software.
  • How about the obvious for video production... since going out isn't a problem, why not just hook up a recording device (it could be digital media) to the video-out port of the video card?

    Does this really have to be over-engineered?

  • by eeeeaagh ( 591431 ) on Monday August 19, 2002 @09:24AM (#4097358)
    We just ran into this problem when implementing a ray tracer [uiuc.edu] using the GPU that will be presented soon at the upcoming Graphics Hardware Workshop [graphicshardware.org].

    Our ray intersection algorithm implemented on the GPU (an "old" Radeon 8500) was able to intersect 114M rays per second. This was loads faster than the best CPU implementations, which could handle between 20 and 40 million intersections per second.

    But when we tried to implement a ray tracer based on this - an efficient one that didn't intersect every ray with every triangle - the readback rate killed us. Our execution times slowed down to the low end of the fastest CPU implementations.

    And the readback delay seems to be completely due to the drivers, which apparently still use the old PCI-bus code. If the drivers could use the full potential of the AGP bus, our ray tracer could approach twice the speed of the best CPU ray tracers.

  • If the drivers truly are the only issue, and not the hardware, wouldn't this be a great opportunity for the XF86 guys - and whoever writes the particular driver modules like tdfx - to optimize Linux first?

    "No, Mr. Valenti, sir, you don't understand: we have to use Linux. It's the only game out there for our CG budget. Windows can't do RAM write-back at decent FPSes, and commodity GPUs are 20 times cheaper..."

    Wouldn't that suck for them... at least it would be amusing.

  • And I tend to agree that it's a software issue.

    NVIDIA says that if you ask for the contents of the framebuffer in a call to glReadPixels, and you ask for it in the same pixel format it's stored in, you won't really be disappointed. If, however, you ask for that same region of the framebuffer in another format, you're screwed. (So if your framebuffer is 8-8-8-8 RGBA and you ask for luminance or 10-10-10-2 or something else odd, you aren't going to be pleased with the performance.)
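    A minimal sketch of that difference, assuming an 8-8-8-8 RGBA framebuffer; exactly which format hits a given driver's fast path is hardware- and driver-specific:

```c
/* Reading the framebuffer in its native layout vs. forcing a
 * per-pixel software conversion in the driver. */
#include <GL/gl.h>

void grab(int w, int h, unsigned char *rgba, unsigned char *lum)
{
    /* Matches the assumed framebuffer layout: can be a straight copy. */
    glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, rgba);

    /* Luminance doesn't match: the driver has to convert every pixel
     * in software, and throughput falls off a cliff. */
    glReadPixels(0, 0, w, h, GL_LUMINANCE, GL_UNSIGNED_BYTE, lum);
}
```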

    This isn't, by the way, just a render-movies-on-your-PC issue. Lots of scientific computing, visualization, etc. applications render with OpenGL and then grab the framebuffer to store a result. This throughput issue is significant considering that, for many applications, what was an enormous data set 10 years ago is not such a big data set now. Like another poster said, this issue is one of the ones that still ties people to SGI.

    While 99% of your other concerns might be dealt with, there are still lingering problems like this one that keep some people from moving to commodity hardware.
    • It is possible to handle anything-to-anything untiling and format conversion on the CPU at high enough speed for this not to be the bottleneck. I've written code to do it, and I'm sure the guys at nVidia could do it as well if they wanted to (which is not to say that they have).

      My suspicion is that the raw bit-shovelling across the bus is more likely to be the problem.
  • by cyranose ( 522976 ) on Monday August 19, 2002 @10:47AM (#4097960) Homepage
    I've been doing real-time 3D graphics for 10 years, and read-back speeds have been the biggest obstacle to many advanced algorithms. We have asked the companies to improve this many times. The problem as I see it: Quake and other benchmark apps don't rely on readback.
    Here are a few other important but non-Quake techniques that are driven by readback speeds. I'll go into more detail on the first for illustration purposes.
    High-quality real-time occlusion culling -- many techniques render the scene quickly using a unique color tag per object or polygon, and then read back the framebuffer to figure out everything that was visible (and how many pixels each covered) for a final high-quality pass (a minimal sketch of this follows the comment). If HW drivers would even just implement the standard glHistogram functions (which essentially compress the framebuffer before readback), this would become practical. NVidia adds their NV occlusion extension, but it's limited in how many objects you can test at a time, it's very asynchronous, and it requires depth sorting on the CPU to be most useful. The render-color technique does not. Yet HW makers are spending lots of money adding custom HW to do z-occlusion when a simple driver-based software technique may be easier.
    Dynamic Reflection Maps -- for simple, reflective surfaces -- Requires background rendering from multiple POVs (generally six 90 degree views) and caching these. Even if you can cache a small set of maps in AGP memory, you want fast async readback if you have a large fairly static scene and you're roaming around.
    Real-time radiosity -- similar to above, but needs more CPU processing of the returned images and possibly depth maps (reading back the depth buffer is often even more expensive than the color).
    Real-time ray tracing -- the better-quality approaches need fast readback to store intermediate results (due to recursion, etc.). With floating-point framebuffers and good vertex/pixel shaders, ray tracing becomes possible, but not yet practical. I believe /. may even have run a link to one of these techniques a while back.
    So there's a lot more to this issue than just making movies of your games. Faster, better graphics would be possible. So why isn't this a priority?
    ------------ cyranose@realityprime.com
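    A minimal sketch of the colour-tag visibility pass described above, under the usual caveats: draw_object_geometry() is an invented placeholder, and the readback at the end is exactly the operation whose speed this whole thread is about:

```c
/* Render each object in a unique flat colour, read the frame back,
 * and count how many pixels each tag covers. */
#include <GL/gl.h>
#include <stdlib.h>
#include <string.h>

extern void draw_object_geometry(int id);   /* placeholder */

void visibility_pass(int w, int h, int nobjects, unsigned int *pixel_count)
{
    unsigned char *buf = malloc((size_t)w * h * 4);
    int i;

    glEnable(GL_DEPTH_TEST);          /* only visible fragments survive */
    glDisable(GL_LIGHTING);
    glDisable(GL_TEXTURE_2D);
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

    for (i = 0; i < nobjects; i++) {
        int tag = i + 1;              /* tag 0 is the background */
        glColor3ub(tag & 0xff, (tag >> 8) & 0xff, (tag >> 16) & 0xff);
        draw_object_geometry(i);
    }

    /* The expensive step: this is the readback that needs to be fast. */
    glReadPixels(0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, buf);

    memset(pixel_count, 0, nobjects * sizeof pixel_count[0]);
    for (i = 0; i < w * h; i++) {
        int tag = buf[i*4] | (buf[i*4+1] << 8) | (buf[i*4+2] << 16);
        if (tag > 0 && tag <= nobjects)
            pixel_count[tag - 1]++;   /* pixels covered by each object */
    }
    free(buf);
}
```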
  • The article claims that the drivers, not the HW, are causing the performance problem. Based on my conversations with a premier graphics programmer and some x86 experts, I don't believe that it is this simple. In particular, note that XFree86 2D, which uses its own drivers, also has pathetic readback rates.

    I barely understand the technical details, but it seems like there are some serious misfeatures in the way that the AGP bus interacts with CPUs and caches on both Intel and AMD during readback; it is going to be hard for card vendors to fix this problem (even if they decide to care). It may be that a new bus and/or new CPU glue will be needed for high-readback-rate applications.

  • by Animats ( 122034 ) on Monday August 19, 2002 @12:14PM (#4098630) Homepage
    If you want the rendered image back in main memory, render it into an offscreen buffer, or "pbuffer" in the OpenGL world. That's the standard approach, and it's designed to be fast, unlike reading back the screen buffer. Here's an NVidia tutorial for developers [nvidia.com] on how to do it. Not only is it faster, you don't have to worry about what the user is doing with overlapping windows or seeing the cursor in the picture.
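    For reference, a hedged GLX 1.3 sketch of setting up such a pbuffer (error handling omitted; on Windows the equivalent goes through the WGL pbuffer extensions instead):

```c
/* Create an offscreen pbuffer and make it current; render into it,
 * then glReadPixels as usual. */
#include <GL/glx.h>

GLXPbuffer make_pbuffer(Display *dpy, int w, int h, GLXContext *ctx)
{
    int fb_attrs[] = { GLX_RENDER_TYPE,   GLX_RGBA_BIT,
                       GLX_DRAWABLE_TYPE, GLX_PBUFFER_BIT,
                       GLX_RED_SIZE, 8, GLX_GREEN_SIZE, 8,
                       GLX_BLUE_SIZE, 8, GLX_ALPHA_SIZE, 8,
                       GLX_DEPTH_SIZE, 24, None };
    int pb_attrs[] = { GLX_PBUFFER_WIDTH, w, GLX_PBUFFER_HEIGHT, h, None };
    int n;
    GLXFBConfig *cfg = glXChooseFBConfig(dpy, DefaultScreen(dpy),
                                         fb_attrs, &n);
    GLXPbuffer pb   = glXCreatePbuffer(dpy, cfg[0], pb_attrs);

    *ctx = glXCreateNewContext(dpy, cfg[0], GLX_RGBA_TYPE, NULL, True);
    glXMakeContextCurrent(dpy, pb, pb, *ctx);
    XFree(cfg);
    return pb;
}
```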

    OpenGL supports reading back the screen buffer mostly so that the OpenGL validation suite can check the rendering accuracy. For that, it doesn't have to be efficient. And if you read back in some format other than the actual structure of the framebuffer, every pixel gets converted in software and performance will be awful.

    This article reads like it was written by an overclocker, not a graphics developer.

  • The nascent art of machinima [machinima.com], which involves using 3D game engines to make desktop movies, could benefit from a practical way to record game output faster. (It would also be nice to export directly to .AVI format for editing in Premiere or Avid, but that's another wishlist.)

  • by PhilFrisbie ( 560134 ) <phil@hawksoft.com> on Monday August 19, 2002 @01:12PM (#4099053)
    This has been discussed many times on various news groups. Here is my 'Readers Digest' version:

    If you read the AGP spec, which was written by Intel, you will note that it is based on the PCI 2.0 spec. The PCI 2.0 spec describes a 32-bit, 33 MHz symmetric bus, which gives you a max transfer rate of 132 MB per second. The AGP spec describes an asymmetric bus: 33 MHz read and 66+ MHz write. Writes were optimized at the expense of reads, since Intel was pushing video with NO onboard texture memory, and who would want to read back the image in real time anyway, right?!?

    Yes, I am sure that drivers do have some effect, but the AGP spec is the first bottleneck. On an OpenGL newsgroup it was reported last year that someone tested two identical video cards, the only difference being that one was AGP and the other PCI. The read performance of the PCI version was several times faster than that of the AGP version.

    Of course, some video cards are also to blame because of the frame buffer format they use, but that is another story...

  • Follow my reasoning here. I've heard from other articles at /. that Alan Cox (or one of the big-name advocates) couldn't come up with a reason that would justify to NVidia open-sourcing their drivers; there would be no profit in it for them.

    But if they had, the drivers would have been updated to scratch whoever's itch needed scratching - in this case, the bandwidth from card to memory.

    One of the benefits of open source is that even seldom-used features get enhanced, so that when there is suddenly demand for them, the features are already in place.
  • I spent most of the summer working on AGP driver bugs, so let me clarify a few things.

    AGP was designed by Intel as an ad hoc solution to combat the problem of transferring large textures to a graphics card over the PCI bus. It's an extension to PCI, essentially, allowing fast, pipelined, ONE-WAY transfers. That should be repeated. AGP is PCI, with a different connector, and a bunch of extra pins and logic for pipelined transfers from system memory to the card. In fact, without "fast writes" enabled, CPU -> graphics card writes are plain PCI; only transfers requested BY THE CARD are accelerated.

    There is nothing new about this. It's in the spec.

    It is NOT meant to be a two-way bus. It was never designed for offloading cinematic rendering to the card for later recovery. AGP came out around 1997, before NVIDIA or ATI had shaders in hardware; PC rendering was nowhere near photorealistic at the time - that was the domain of software raytracers. Without AGP, video cards would seriously hog the PCI bus with their texture streaming. That is ALL that AGP fixes.

    The real solution is to come up with a new bus. I tend to like unified memory architecture designs, but they have disadvantages as well. The real trouble is getting the PC industry to agree on anything; if ATI came up with a new bus standard, for instance, I doubt NVIDIA or Matrox would adopt it, not wishing to appear to submit to their competitor.

    -John
  • Someone build a bloody box with a DVI input and a gigabit Ethernet port on it. Connect the DVI out of the video card to the DVI input on our magic box, and the gigabit Ethernet on the box to gigabit Ethernet on the PC. As each frame is generated, capture it and spew it back to the PC over the Ethernet, then have the custom software on the PC (prompted by a packet from the magic box) put the next frame out over the DVI.

    Lather, rinse, repeat.

    Won't be cheap, but someone could almost certainly whip one up with a Xilinx FPGA. I know they make one with a built-in TMDS receiver, which is what you'd need to decode the DVI signal.
  • "AGP Texture Download Problem Revealed"

    "AGP Texture Download Problem" implies that there's a problem downloading textures via AGP from main memory. But it's not about texture transfers at all, it's about transfers of rendered frames back to the system (in the opposite direction).

    Hey, 'Taco... You're the high point of the /. editing staff; your readership is depending on you to drag the other editors up the bell curve kicking and screaming by your example. Don't give up now. =)
  • I'm not surprised at this - when you spend your effort optimising for output, dragging that final image back up to the input is kinda like running up a downward-moving escalator... you *can* do it - but you probably shouldn't.

    It seems to me that if you are rendering movies with this technology, you are either a small operation that can probably afford to wait (say) 10x longer than realtime to do it - or you are some big production house that can afford to do better. In those cases, why not simply stick a frame-grabber onto the digital output?

    Heck, you can even get around the 8-bits-per-component problem by using a fragment shader to render the high-order bits to red, the middle bits to green and the low-order bits to blue - then do three passes to render the red component of your image at 24 bits per pixel, then the green, then the blue. (A sketch of reassembling the result follows below.) Using the downstream performance to your advantage is the way to go.

    The title of this article (which talks about "Texture Download") is most confusing, because that's a term usually used to describe the process of taking a texture map out of the CPU and stuffing it into the graphics card's texture memory. This is more like "Screen Dump Upload".
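    The CPU side of that three-pass trick, as a sketch; the pass-buffer names are invented, and the fragment shader that does the splitting is assumed, not shown:

```c
/* Each pass delivers one 24-bit colour component split across the
 * 8-bit R, G and B channels of a grabbed frame: hi bits in R,
 * middle bits in G, low bits in B. */
#include <stdint.h>

struct rgb8 { uint8_t r, g, b; };

/* Reassemble one 24-bit component from one pixel of one pass. */
static uint32_t unpack24(struct rgb8 p)
{
    return ((uint32_t)p.r << 16) | ((uint32_t)p.g << 8) | p.b;
}

/* After three passes (red_pass, green_pass, blue_pass frames),
 * pixel i of the final image is:
 *
 *   red24   = unpack24(red_pass[i]);
 *   green24 = unpack24(green_pass[i]);
 *   blue24  = unpack24(blue_pass[i]);
 */
```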
