Benchmarking the Benchmarks

Posted by CmdrTaco on Mon Feb 11, 2008 10:45 AM
from the blogging-the-bloggers dept.
apoppin writes "HardOCP put video card benchmarking on trial and comes back with some pretty incredible verdicts. They show one video card returning benchmark scores much better than another compared to what you get when you actually play the game. Lies, damn lies, and benchmarks."
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • by Aranykai (1053846) <(moc.liamg) (ta) (resnogls)> on Monday February 11 2008, @10:51AM (#22379664)
    We used to benchmark a computer by *gasp* actually running things on it. If you wanted to find out how well it would perform running a game, you played the damn game and found out. Course, that's not good enough for these ubernoobs who think they are cool with their benchmark scores on their forum signatures...
    • Re: (Score:3, Interesting)

      by Anonymous Coward
      It's not the benchmark-scores that count. Sure, you need a specific minimum to enjoy the game, but it's the actual gameplay that makes the game fun, no matter the hardware.

      I'm pretty sure these benchmarks are invented by men.
      • It's not the benchmark-scores that count. Sure, you need a specific minimum to enjoy the game, but it's the actual gameplay that makes the game fun, no matter the hardware.

        I'm pretty sure these benchmarks are invented by men.

        These benchmark scores are important when trying to determine a balance of cost vs. performance. So yes, these benchmarks were invented by men. This is because the old standard of picking the one whose color matches their shoes also resulted in the invention of the credit card.

      • by donscarletti (569232) on Monday February 11 2008, @12:11PM (#22380564)

        There is indeed a bare minimum hardware performance required to play, but sadly for many new games, especially Crysis, that bare minimum is scarily close to the market's maximum. Benchmarks are supposed to be a way to isolate this and objectively measure it so that a good purchasing decision can be made by the consumer and when the game is played hopefully the subjective experience of enjoyment will follow. A framerate above human perception is needed for fun (as jerky frames lead to nausea and frustration), high detail is needed for the beauty of a game which is probably just as important (it's been the basis for visual art, music and poetry for millennia).

        The reason we've got so far and now can have computers, electricity, aeroplanes, cars, etc. is because of the willingness of scientifically inclined individuals to isolate, experiment and measure. Technology is one of the things in life that can be measured and I think it is a good idea to continue to do it, provided we can do it right. Experimentation and science is what got us out of caves no?

        As for Hardocp, what have they proven? Apparently traditional time demos run a fairly linear amount faster than realtime demos, even though it has been acknowledged that realtime demos render more including weapons, characters and effects that the canned demo does not. This would be interesting if the question was "how fast can Crysis run on different cards" but that's not what people want to know. What I'd want to know is which card should I buy to allow me to continue to play cutting edge games for as long as possible while enjoying their whole beauty but not getting a framerate low enough to make me uncomfortable. It just so happens that the card with the best timedemo benchmark has the best actual playthrough benchmark and by roughly the same factor. The only difference is that the traditional timedemo depends on only the graphics hardware whereas the playthrough benchmark depends on efficiency elsewhere in the engine (AI, physics), where the player spent most time and, if reviewing subjectively, the reviewer's current mindset and biases.

        Somebody please think of the science!

        • by cHiphead (17854) on Monday February 11 2008, @01:11PM (#22381244)
          Some of us make purchasing decisions based on the piece of shit game we are thinking of buying. Crysis is a joke with such high requirements for a playable experience. I base my game purchases on what will run on my old pos single core p4 2.8ghz box. Any game that can't impress with such insanely fast hardware as we have these days even on the 'budget' boxes is not a game worth investing in.

          I must be getting old, I haven't upgraded my box in almost 2 years.

          Cheers.
        • Re:back in my day... (Score:4, Interesting)

          by billcopc (196330) <vrillco@yahoo.com> on Monday February 11 2008, @01:23PM (#22381374) Homepage
          It's funny that you mention Crysis... people are freaking out over Crysis the same way they freaked out over Aero Glass a year ago. The reality is, Crysis runs fine on midrange gaming systems. It won't run in 1920x1200 with DX10 eyecandy on that crusty old Geforce 6200, but it certainly does not require a $2500 powerhouse to be enjoyable.

          In the end, benchmarks can be useful as long as you don't accept their results as the gospel truth. Some benchmarks favor ATI, some favor NVidia, and I'm sure there's gotta be one benchmark that favors Intel Extreme Graphics :P... the important thing is to find parallels that relate to your own needs and wants so you can put those numbers into perspective.
    • by SQLGuru (980662) on Monday February 11 2008, @11:10AM (#22379842)
      And, on top of that, they are on your lawn....

      Layne
    • Re:back in my day... (Score:5, Informative)

      by Sancho (17056) on Monday February 11 2008, @11:40AM (#22380146) Homepage
      The problem is that it's hard to objectively score performance by "running things on it." Benchmarks are nice because they run the exact same tests every time. You can't just turn on FPS display and walk around in the game to measure performance--your actions may not be the same each time, and slight variations could cause drastically different results.

      Benchmarking provides potential customers with a metric to compare potential purchases.
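A minimal sketch of why repeatability matters, assuming you have several frame-rate readings from the same test on the same hardware. The helper name `summarize_runs` and the numbers in the note below are placeholders for illustration, not measurements: the spread between runs is a rough guide to how small a gap between two cards can be trusted.

```python
import statistics

def summarize_runs(fps_runs):
    """Return (mean, sample standard deviation, relative spread) for repeated runs.

    fps_runs: frame rates from repeating the SAME test on the SAME hardware.
    Needs at least two readings, otherwise statistics.stdev raises an error.
    """
    mean = statistics.mean(fps_runs)
    spread = statistics.stdev(fps_runs)
    return mean, spread, spread / mean
```

For placeholder runs of 58, 60 and 62 fps this gives a mean of 60 and a standard deviation of 2, so a 1 fps gap between two cards would sit inside the noise. A scripted benchmark keeps that spread small; free-form play usually does not.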
        • Re: (Score:3, Insightful)

          You're conflating benchmarking games vs. benchmarking graphics cards. If you're looking for raw power for an arbitrary amount of money, you'd want to get the graphics card which has the maximum frame rate at that price. If you're looking to play a specific game, you'd look for a graphics card which most people (quite subjectively, obviously) say plays the game well.

          The point is that you can't use a standard game (plus FPS meter) played by a human player to judge a graphics card's raw capabilities. To red
    • Admit it, you "benchmarked" with Windows Solitaire.
  • by Yath (6378) on Monday February 11 2008, @11:00AM (#22379760) Journal

    Crysis, UT3, and COD4 are the three primary games we are using currently, with Crysis performance certainly being the new watermark in the industry.


    I have no idea what this means, but it certainly sounds like Crysis has left its mark somewhere or other.
    • Read it again: Crysis left a watermark.

      don't ask why the water smells funny and is yellow in color.
  • Is your benchmark of the benchmarks accurate? We might have to benchmark it.
  • by Anonymous Coward on Monday February 11 2008, @11:05AM (#22379798)
    I used to do this benchmark:
    10 PRINT TIME$
    20 FOR I=1 TO 9999
    30 NEXT I
    40 PRINT TIME$

    I then improved it to be:
    10 A$=TIME$
    20 IF A$=TIME$ THEN GOTO 20 !breaks out when the seconds change
    30 I=1:A$=TIME$
    40 I=I+1:IF A$=TIME$ THEN GOTO 40
    50 PRINT I

    Ahhh...the good old days... (1970s, early 1980s)
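For anyone without a BASIC interpreter handy, the improved version above can be sketched in Python. This is a rough port under assumptions, not the poster's code: `whole_second` stands in for TIME$ by truncating the clock to whole seconds, and both function names are made up for illustration.

```python
import time

def whole_second():
    """Stand-in for TIME$: the current time truncated to whole seconds."""
    return int(time.time())

def count_iterations_in_one_second(clock=whole_second):
    # BASIC lines 10-20: wait until the seconds value changes, so that
    # counting starts right at a second boundary.
    start = clock()
    while clock() == start:
        pass
    # BASIC lines 30-40: count loop passes until the seconds value changes again.
    count = 1
    mark = clock()
    while True:
        count += 1
        if clock() != mark:
            break
    # BASIC line 50 prints this value; here it is returned instead.
    return count
```

Calling `count_iterations_in_one_second()` with no arguments takes between one and two seconds of wall-clock time. The `clock` parameter exists only so the loop can be checked with a fake clock.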
    • I used to do this benchmark:
      10 PRINT TIME$
      20 FOR I=1 TO 9999
      30 NEXT I


      I think I've spotted a bug. You'll need a much bigger upper limit on that loop, if you're busy-waiting for basic to be capable of something useful ;)
    • by sempernoctis (1229258) on Monday February 11 2008, @01:04PM (#22381128)
      My favorite benchmark for finding the size of the memory heap:

      void doit(int i) { printf("%i\n", i); doit(i + 1); }

      worked really well until I tried it in an environment where the call stack could get paged...then it turned into a hard drive benchmark
  • by Anonymous Coward on Monday February 11 2008, @11:07AM (#22379810)
    Benchmarking using actual games is, of course, important. But part of the reason a lot of us buy video cards and such isn't JUST about the performance on today's games, but for how they'll play the games coming out in the next few months. Synthetic benchmarks often implement advanced features not currently seen in today's games, but which will be implemented in just-over-the-horizon games. So while clearly one ought not judge a card purely on 3DMark or similar benchmarking suites, they do have their uses.
  • by Thanshin (1188877) on Monday February 11 2008, @11:09AM (#22379826)
    ...And an international benchmarking committee.

    To avoid concentrating all the data management in a single entity, we need a national benchmarking committee for each country and then international elections to get a chief of benchmarking interrelationships or CBI.

    To avoid the possible corruption of the CBI, we would need an independent international supervision committee for the review of benchmarking standards.

    The IISCRBS would review the actions of the CBI yearly and produce a thorough report.

    That report (which would be called the IISCRBS-CBI report) would be the main reference to start any kind of productive debate about who has the leetest rack and who's a lame n00b.
  • I have what was a "hot" card only eighteen months ago (7800) and now it is stuttering on some of the newer content when I'm raiding. The rest of the game is glass smooth. Suppose it could be the PC but it is a pretty good PC too.

    Would love a site that showed "here is the game on the highest settings on these CPU/GFX combos".

    • I have what was a "hot" card only eighteen months ago (7800) and now it is stuttering on some of the newer content when I'm raiding.

      Are you one of those software pirates?

      • hehe.

        Well you probably know what I meant and were making a funny but in case you didn't.

        In EQ, on a raid, you get 54 people close to you (so they can't be clipped based on distance), and 40-70 server side creatures (player pets, monsters, the big "bad") and your machine is trying to keep up and report on and render all that in real time. My frame rate is >60 (>100?) in some content but in the new content on a raid, it can go to 10 to 20 fps unless I turn off a lot of features. Kinda sucks.
    • That would be nice, especially retouching on older ones and also cheaper combos you'd find in generic desktops.

      I'd also like to see a benchmark app you can run from usb or dvd/cdrom booting. Something that gives you a clean slate to compare against running it in your existing install so you can see how much all the various apps and drivers are bogging your performance down.
  • Benchmarks (Score:5, Insightful)

    by Anonymous Coward on Monday February 11 2008, @11:13AM (#22379878)
    Duh, a benchmark is a controlled test performed "on a bench" - meaning, in a controlled environment with specific, well-described procedures.

    You must perform the same exact test on all video cards, disclose any variables, and you must not "pick a subset of completed tests to publish". You must not compare tests performed using different procedures, no matter how slight the deviation of the procedures is.

    One cannot draw conclusions about "real world" performance from a benchmark. The benchmark is merely an indicator. A "real world" test that uses the strong, formalized procedures of a benchmark IS a benchmark - and suddenly, the benchmark is not "real world" - because the "real world" doesn't have formal procedures for gameplay.

    Haphazard "non-blind" gameplay on a random machine is NOT a benchmark, and it can not provide useful, comparable numbers.

    A good benchmark is one where (1) most experts agree that it has validity, and (2) one where the tester cannot change the rules of the game.

    The numbers of a benchmark are meaningless, except in terms of being compared to one another using the same exact procedure.

  • Okay, so benchmarks don't adequately reflect real applications. Not much of a surprise there...

    But does this impact their usefulness in comparing hardware at all?
    =Smidge=
    • Yes.

      RTFA. It clearly shows how the canned timedemo benchmarks most sites use can be horribly misleading and give totally wrong impressions.
      • We've known this for years, which is why a lot of the better review sites moved away from timedemos a long while ago.

        However, they can still (sort of) be used to compare cards against each other. They don't do much to reflect playability of a game at given settings accurately, but in theory all of the numbers you get from a timedemo should be inflated by about the same percent.
        • The article attempts to show that the numbers you get from a timedemo *don't* correlate well to what you get in the real world. Some cards or drivers do better in the "timedemo -> real life" conversion than others.

          This difference is the entire point of the article.
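The disagreement in this sub-thread can be sketched as a simple check: if timedemos inflated every card by the same factor, the timedemo/gameplay ratio would match across cards and the ranking would not change. The card names and frame rates below are made-up placeholders, not figures from the article, and the function names are hypothetical.

```python
def inflation_factors(timedemo_fps, gameplay_fps):
    """Ratio of timedemo fps to real-gameplay fps for each card."""
    return {card: timedemo_fps[card] / gameplay_fps[card] for card in timedemo_fps}

def ranking(fps_by_card):
    """Card names ordered from fastest to slowest."""
    return sorted(fps_by_card, key=fps_by_card.get, reverse=True)

# Placeholder numbers for illustration only; these are NOT measurements.
timedemo = {"card_a": 60.0, "card_b": 50.0}
gameplay = {"card_a": 30.0, "card_b": 40.0}

factors = inflation_factors(timedemo, gameplay)
rank_changed = ranking(timedemo) != ranking(gameplay)
```

With these placeholders card_a is inflated 2.0x and card_b only 1.25x, so the timedemo ranks card_a first while gameplay ranks card_b first. Roughly equal factors would support the "same percent" argument; unequal ones are the case the article claims to have found.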
  • They never use the same game configuration, so trying to figure out how much faster one thing is than another is impossible. Rather than have 1 variable (the hardware being benchmarked), they use 2 variables (the hardware, and the settings of the benchmarked software).
    • by jonnythan (79727) on Monday February 11 2008, @11:33AM (#22380080) Homepage
      Um, they come up with what is probably the most useful data of all:

      The highest playable settings for given hardware.

      They then change the video card and find the highest playable settings for that hardware.

      I'd much rather compare the highest playable settings for two different cards than the timedemo benchmark numbers for two different cards.
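The "highest playable settings" approach described above can be sketched as a search over presets. Everything here is an assumption for illustration: the preset names, the 30 fps threshold (what counts as playable is subjective), and `measure_min_fps`, which stands for whatever measurement the reviewer actually performs.

```python
def highest_playable(presets, measure_min_fps, threshold=30.0):
    """Return the most demanding preset that still meets the threshold.

    presets: names ordered from least to most demanding.
    measure_min_fps: callable returning the lowest fps seen at a preset
        (hypothetical; in practice this is a person playing the game).
    threshold: the fps floor treated as playable (an assumed value).
    Returns None when no preset is playable.
    """
    best = None
    for preset in presets:
        if measure_min_fps(preset) >= threshold:
            best = preset
    return best
```

Running the same search after swapping the video card gives a second preset to compare against the first, which is the comparison the comment prefers over raw timedemo numbers.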
      • Re: (Score:3, Insightful)

        You know that's totally intractable, right?

        For example: 1620x1050 with no AA may be considered unplayable (jaggies) for some, but for others it's perfectly fine...

        Or, maybe you can turn on the AA, but deactivate shadows, changing your whole "playable" demographic again.

        It's like asking someone to benchmark coffee at different restaurants to grade whether it is palatable or not.

        ~D
  • by tayhimself (791184) on Monday February 11 2008, @11:22AM (#22379966)
    Here are a few that I had :
    - is triple-buffering on or vsync off? This will make a huge difference to real time versus sped up timedemos
    - is sound on when playing back both types of timedemos?
    - how does FRAPS affect your benchmark scores?

    Finally, in relation to the Crysis real world gameplay versus the AT benchmark score, I thought it was common knowledge that the game would be slower when actually playing it because you likely have physics, AI, logic, and sound calculations to do that you don't in timedemo mode. What is the big deal here?
    • by DeadChobi (740395) <DeadChobi.gmail@com> on Monday February 11 2008, @11:38AM (#22380122)
      It's misleading because video card manufacturers tweak their drivers to perform better in timedemos versus real world gameplay so that hardware review sites will do reviews touting the game as playable on such-and-such a card at maximum settings even though real world gameplay never comes close to what the time demo is doing to the game. Wow, that was one sentence. Oh, and how can you say that card A outperforms card B without ever comparing them in gameplay? That would be like me going into a hardware store and swinging two different hammers to compare them, then buying one based on that test only to find out that it's total crap at actually hammering.

      The root of the issue is that timedemos give the video card manufacturers something to tweak their drivers around besides gameplay. And there are also some arguments over how representative of your actual experience a timedemo will be. At least HardOCP gives a crap about their methodology, as opposed to other hardware sites which don't use any sort of statistical analysis.
      • Reminds me of how the EPA is changing how fuel efficiency is determined for cars. The old standard was not realistic compared to how most people actually drive. Now they are putting a lot more stop & go driving in their testing and getting lower, but more realistic, numbers.
      • One of the under-appreciated things about the Q3 and D3 engines is that demos are essentially a recording of the network stream. So running timedemo on a demo will be extremely accurate for real world performance.
    • It's misleading because sometimes one card will come out way in front of another during a canned benchmark due to tweaking, shortcuts, whatever.... but that same card will come out way behind the other card during actual, real-life gameplay.

      See the difference?

      HardOCP's testing is only concerned with real-life gameplay. Most of the time, their conclusions are pretty similar to other sites... card A is faster than card B, for instance. However, sometimes, their conclusions are opposite what other sites come u
    • Ideally, the graphics card and all on-CPU calculations are running in parallel, so the influence of this extra work on graphics performance should be minimal. This is what they mean in TFA when they refer to situations that are not CPU-limited.
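A toy model of the parallel case described above, assuming each frame needs some CPU time (physics, AI, logic, sound) and some GPU time, in milliseconds. The function names and the figures in the note are illustrative assumptions, not measurements from any game.

```python
def fps_parallel(cpu_ms, gpu_ms):
    """CPU and GPU fully overlapped: the slower of the two sets the frame time."""
    return 1000.0 / max(cpu_ms, gpu_ms)

def fps_serial(cpu_ms, gpu_ms):
    """No overlap at all: the two costs add up for every frame."""
    return 1000.0 / (cpu_ms + gpu_ms)
```

With 20 ms of GPU work per frame, adding 5 ms of CPU work leaves the parallel model at 50 fps, which is the not-CPU-limited case. Raise the CPU work to 25 ms and the same model drops to 40 fps, because the CPU is now the bottleneck. In the serial model even 5 ms of CPU work costs frames (40 fps). Real engines presumably fall somewhere between the two.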
  • Give you an idea relative to other cards tested using the same benchmark. However, I have always found them misleading and somewhat gratuitous. Declaring a card superior over another just because it gives five more frames a second than another card is dumb. Especially when it is the difference between 110 and 115 frames per second.
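The 110 versus 115 fps point is easier to see in frame times. A small worked calculation, with `frame_time_ms` as a made-up helper name and the 25 versus 30 fps pair added only for contrast:

```python
def frame_time_ms(fps):
    """Milliseconds spent on each frame at a given frame rate."""
    return 1000.0 / fps

# Time saved per frame by the same 5 fps gain at high and at low frame rates.
saved_at_high_fps = frame_time_ms(110) - frame_time_ms(115)
saved_at_low_fps = frame_time_ms(25) - frame_time_ms(30)
```

Going from 110 to 115 fps shaves roughly 0.4 ms off each frame, while going from 25 to 30 fps shaves roughly 6.7 ms, so the same five frames matter far more at the low end.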

    As long as you don't run two 30 inch monitors, any name brand video card for about 200 bucks will give you great playable rates at 1680 x 1050.

    A lot of benchmarks imply you need t
    • Re: (Score:3, Insightful)

      "As long as you don't run two 30 inch monitors, any name brand video card for about 200 bucks will give you great playable rates at 1680 x 1050."

      Not in Crysis, Call of Duty 4, UT3, etc.

      When I go to plunk down $200 - $300 on a video card, and one of them performs comfortably at my LCD's native resolution and the other one doesn't, that matters. Saying all cards in a given price range are roughly equivalent is saying that you are completely, 100% blind to the reality of video cards today.
    • by TheMeuge (645043) on Monday February 11 2008, @11:35AM (#22380100) Homepage

      As long as you don't run two 30 inch monitors, any name brand video card for about 200 bucks will give you great playable rates at 1680 x 1050.
      Evidently, you've never actually PLAYED Crysis. On an AMD64 Dual Core at 2.4GHz, 2GB of RAM, and Nvidia 8800GTS 640MB (>>$200), I needed to reduce my resolution to 1280x1024 and set everything to Medium, to have the framerate not drop into single digits or low teens, and stay at 20-30fps.
      • Seconded. I have an almost identical setup other than my 8800GTS is 320 megs and I had to play with everything set on medium to be playable.
      • actually, I have played (play) crysis... a mix of high and medium settings at 1680 x 1050... I use a HIS Ice 3850, 4 gigs of ram (yeah only 3 are used) and an E8400... I will say that I never said you could use a $200 card to run a game at high settings with great rate (and crysis is a pig for resources), just that you could get great frame rates, and you can by playing with the settings. And the games still look really good.

        The other guy who has trouble playing call of duty 4, that I don't get, I found it
  • Not the same card (Score:3, Insightful)

    by jandrese (485) <kensama@vt.edu> on Monday February 11 2008, @11:47AM (#22380238) Homepage Journal
    One thing that's bothering me is that HardOCP said "Anandtech benchmarked this card vs. an 8800GTS and said it came out faster, then we benchmarked it against an 8800GTX and it came out faster, then people complained that our results didn't match". Isn't that expected? The GTX is a faster card than the GTS last time I looked. Why is it such a shock that the ATI card came in between them in performance?

    It is a bit of a shock that ATI's latest and greatest can't seem to consistently beat nVidia's over a year old GTX cards I guess.
  • FLASH NEWS: [H]ardOCP throws outdated concepts such as "controlled testing environment" and "repeatability" out the window and calls it revolutionary! Yay!
    • Re: (Score:2, Insightful)

      aren't you being just a little bit... oh, I dunno... offtopic?

      Either I misunderstood you, or I don't see how the license can be a metric of performance or accuracy.
      • Re: (Score:3, Funny)

        Either I misunderstood you, or I don't see how the license can be a metric of performance or accuracy.

        Clearly you haven't been drinking enough of your Kool Aid. Please contact the FSF and request more immediately.
    • Re: (Score:3, Informative)

      without using the screen-recording functionality, the overhead should be statistically irrelevant.