Benchmarking the Benchmarks 126
apoppin writes "HardOCP put video card benchmarking on trial and comes back with some pretty incredible verdicts. They show one video card returning benchmark scores much better than another compared to what you get when you actually play the game. Lies, damn lies, and benchmarks."
Erase Futuremark = instant win (Score:1, Insightful)
OSS (Score:1, Funny)
Re: (Score:2, Insightful)
Either I misunderstood you, or I don't see how the license can be a metric of performance or accuracy.
Re: (Score:3, Funny)
Clearly you haven't been drinking enough of your Kool-Aid. Please contact the FSF and request more immediately.
Re:OSS (Score:5, Funny)
Translation: if you mod me down, I will become more insightful than you can possibly imagine.
Re: (Score:1)
Re: (Score:1)
Did you read this?
And even if the operating system or platform is opensource, that doesn't mean that the benchmark will be. He didn't mention any benchmark but actually referred to a platform. So how could you have any idea of how biased the benchmark software is/isn't?
back in my day... (Score:5, Funny)
Re: (Score:3, Interesting)
I'm pretty sure these benchmarks are invented by men.
Re: (Score:3, Insightful)
It's not the benchmark-scores that count. Sure, you need a specific minimum to enjoy the game, but it's the actual gameplay that makes the game fun, no matter the hardware.
I'm pretty sure these benchmarks are invented by men.
These benchmark scores are important when trying to determine a balance of cost vs. performance. So yes, these benchmarks were invented by men. This is because the old standard of picking the one whose color matches their shoes also resulted in the invention of the credit card.
Re:back in my day... (Score:5, Insightful)
There is indeed a bare minimum hardware performance required to play, but sadly, for many new games (especially Crysis), that bare minimum is scarily close to the market's maximum. Benchmarks are supposed to be a way to isolate and objectively measure this, so that the consumer can make a good purchasing decision and, when the game is played, the subjective experience of enjoyment hopefully follows. A framerate above human perception is needed for fun (jerky frames lead to nausea and frustration), and high detail is needed for the beauty of a game, which is probably just as important (beauty has been the basis of visual art, music and poetry for millennia).
The reason we've got so far and now have computers, electricity, aeroplanes, cars, etc. is the willingness of scientifically inclined individuals to isolate, experiment and measure. Technology is one of the things in life that can be measured, and I think it is a good idea to continue doing so, provided we can do it right. Experimentation and science are what got us out of the caves, no?
As for HardOCP, what have they proven? Apparently traditional timedemos run faster than real-time demos by a fairly consistent factor, even though it has been acknowledged that the real-time demos render more, including weapons, characters and effects that the canned demo does not. This would be interesting if the question were "how fast can Crysis run on different cards?", but that's not what people want to know. What I'd want to know is which card I should buy to keep playing cutting-edge games for as long as possible, enjoying their full beauty without the framerate dropping low enough to make me uncomfortable. It just so happens that the card with the best timedemo benchmark also has the best actual playthrough benchmark, and by roughly the same factor. The only difference is that the traditional timedemo depends only on the graphics hardware, whereas the playthrough benchmark also depends on efficiency elsewhere in the engine (AI, physics), on where the player spent most of their time and, if reviewing subjectively, on the reviewer's current mindset and biases.
Somebody please think of the science!
Re:back in my day... (Score:4, Insightful)
I must be getting old, I haven't upgraded my box in almost 2 years.
Cheers.
Re:back in my day... (Score:4, Interesting)
In the end, benchmarks can be useful as long as you don't accept their results as the gospel truth. Some benchmarks favor ATI, some favor NVidia, and I'm sure there's gotta be one benchmark that favors Intel Extreme Graphics.
Re: (Score:1)
Re:back in my day... we didnt make bad analogies (Score:2)
The difference is that Aero Glass required far more system resources than its equivalent under Linux or available for XP (Stardock has something, but I can't remember its name). I have yet to see a game that can match the graphics in Crysis. Crysis runs at a reasonable frame rate on my GeForce 8800 GTS; I average about 20 FPS, which halves around effects like waterfalls. This is on an Athlon 6000, 2 GB RAM, runni
Re: (Score:1)
Re: (Score:2)
What do you mean I need to buy a 1541 single sided floppy drive for my C=64 - I just bought this tape drive six months ago, paid $100 for it and at the time it was the fastest secondary storage known to man - I could type LOAD "*",1,1 and by the time I was done eating lunch my game was ready to run, and now you tell me I have to buy a new piece of hardware just to play a game?
Re:back in my day... (Score:5, Funny)
Layne
Re:back in my day... (Score:5, Informative)
Benchmarking provides potential customers with a metric to compare potential purchases.
Re: (Score:1)
It doesn't matter if you don't follow the same path each time, what counts is the actual feel... some games can get away with lower framerates in the flashy areas (e.g. Crysis), while others would be totally unacceptable.
I believe it's HardOCP that plots graphs of the minimum, maximum and average FPS. That's a step i
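A minimal C sketch of the min/average/max summary mentioned above, assuming the input is a list of per-frame render times in milliseconds. The type and function names are hypothetical and are not HardOCP's or FRAPS's actual tooling. The average is computed as total frames over total elapsed time rather than as the mean of per-frame FPS values, because the latter overweights the fast frames.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of one benchmark run, in frames per second. */
typedef struct {
    double min_fps;
    double avg_fps;
    double max_fps;
} fps_summary;

/* Summarise per-frame render times given in milliseconds.
   The slowest frame sets the minimum FPS and the fastest frame sets the
   maximum. Average FPS is frames divided by total elapsed time, so every
   millisecond of the run carries equal weight. */
fps_summary summarize_frame_times(const double *frame_ms, size_t n)
{
    assert(frame_ms != NULL && n > 0);
    double total_ms = 0.0;
    double slowest = frame_ms[0];
    double fastest = frame_ms[0];
    for (size_t i = 0; i < n; i++) {
        assert(frame_ms[i] > 0.0);
        total_ms += frame_ms[i];
        if (frame_ms[i] > slowest) slowest = frame_ms[i];
        if (frame_ms[i] < fastest) fastest = frame_ms[i];
    }
    fps_summary s;
    s.min_fps = 1000.0 / slowest;
    s.max_fps = 1000.0 / fastest;
    s.avg_fps = 1000.0 * (double)n / total_ms;
    return s;
}
```

For made-up frame times of 10, 20, 50 and 20 ms, this reports a minimum of 20 FPS, a maximum of 100 FPS and an average of 40 FPS. Naively averaging the per-frame rates (100, 50, 20, 50) would claim 55 FPS instead.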
Re: (Score:3, Insightful)
The point is that you can't use a standard game (plus FPS meter) played by a human player to judge a graphics card's raw capabilities. To red
Re: (Score:2)
What you're saying makes sense when you write it down, but after having read the article the OP is talking about, as well as some of the related articles, I think it's fair to say that they are reliably doing just that. Decide on a specific run to do through a specific section, practice it
Re: (Score:2)
I'll give you a great example. In Crysis, my 8800GT system at home can be set up to 1280x1024 HIGH settings and still get 25FPS+ through the whole timedemo. Take those same graphics settings and try to play the last series on the aircraft carrier and it i
Re: (Score:2)
Re: (Score:2)
Re:back in my day... (Score:4, Informative)
That DX chip kicked the arse out of the SX models.
Solitaire on "You just won. Watch the cards leap" was good for checking out the Windows performance, but Wolf told you how fast the PC was.
Re: (Score:3, Funny)
Re: (Score:2)
I use the dir command in DOS to benchmark my new computers, and have been doing so since the 8088.
Re: (Score:1)
Those were rigged, too. (Score:2)
Apparently, it had something to do with trading correctness for speed.
FRAPS Overhead? (Score:1)
Re: (Score:3, Informative)
Re: (Score:1)
whatevermark (Score:3, Funny)
I have no idea what this means, but it certainly sounds like Crysis has left its mark somewhere or other.
Re: (Score:2)
don't ask why the water smells funny and is yellow in color.
Re: (Score:2)
hmm (Score:2, Funny)
My old benchmark (Score:3, Funny)
10 PRINT TIME$
20 FOR I=1 TO 9999
30 NEXT I
40 PRINT TIME$
I then improved it to be:
10 A$=TIME$
20 IF A$=TIME$ THEN GOTO 20 !breaks out when the seconds change
30 I=1:A$=TIME$
40 I=I+1:IF A$=TIME$ THEN GOTO 40
50 PRINT I
Ahhh...the good old days... (1970s, early 1980s)
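The second BASIC program above translates fairly directly to C. The following is a sketch (the function name is hypothetical) that uses the standard `time()` call in place of `TIME$`. Like the original, much of what it measures is the cost of repeatedly reading the clock.

```c
#include <assert.h>
#include <time.h>

/* Count loop iterations within one wall-clock second, mirroring the BASIC
   program: wait for the seconds value to tick over (line 20), then count
   until it ticks over again (lines 30-40). */
unsigned long iterations_per_second(void)
{
    time_t start = time(NULL);
    assert(start != (time_t)-1);      /* time() returns -1 on failure */
    while (time(NULL) == start)
        ;                             /* sync to a second boundary */
    start = time(NULL);
    volatile unsigned long count = 1; /* volatile keeps each increment a real write */
    while (time(NULL) == start)
        count++;                      /* spin until the second changes */
    return count;
}
```

The result depends on the compiler, the C library's `time()` implementation and the CPU, so it only means something when compared against runs of the same binary.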
Re: (Score:2)
I think I've spotted a bug. You'll need a much bigger upper limit on that loop if you're busy-waiting for BASIC to be capable of something useful.
Re:My old benchmark (Score:4, Funny)
void doit(int i) { printf("%i\n", i); doit(i + 1); }
Worked really well until I tried it in an environment where the call stack could get paged... then it turned into a hard drive benchmark.
Re: (Score:2)
void doit(int i) { printf("%i\n", i); doit(i + 1); }
Oh god! What has become of this site? Poor spelling and grammar I can understand. Confusing the stack and the heap is a sign of the times!
Synthetics not entirely useless (Score:4, Informative)
Re: (Score:1)
Re: (Score:1)
More cornbread?
Re: (Score:2)
Crysis sorta breaks your argument though. One thing everybody will agree on is that there is currently NO consumer hardware that will play it smoothly at its highest settings. I've heard that a two (three?) card SLI setup of nVidia's top-of-the-line overclocked monsters can get it to pump out 30fps or so with its settings maxed out, but that's about it. The game of tomorrow--today!
We need international benchmarking standards! (Score:4, Funny)
To avoid concentrating all the data management in a single entity, we need a national benchmarking committee for each country and then international elections to get a chief of benchmarking interrelationships or CBI.
To avoid the possible corruption of the CBI, we would need an independent international supervision committee for the review of benchmarking standards.
The IISCRBS would review the actions of the CBI yearly and produce a thorough report.
That report (which would be called the IISCRBS-CBI report) would be the main reference to start any kind of productive debate about who has the leetest rack and who's a lame n00b.
Would like to see a real world comparison for EQ (Score:2)
Would love a site that showed "here is the game on the highest settings on these CPU/GFX combos".
Re:Would like to see a real world comparison for E (Score:5, Funny)
Are you one of those software pirates?
Re: (Score:2)
Well, you probably know what I meant and were being funny, but in case you didn't:
In EQ, on a raid, you get 54 people close to you (so they can't be clipped based on distance), and 40-70 server side creatures (player pets, monsters, the big "bad") and your machine is trying to keep up and report on and render all that in real time. My frame rate is >60 (>100?) in some content but in the new content on a raid, it can go to 10 to 20 fps unless I turn off a lot of features. Kinda sucks.
Re: (Score:2)
Re:Would like to see a real world comparison for E (Score:2)
I'd also like to see a benchmark app you can run by booting from USB or DVD/CD-ROM. Something that gives you a clean slate to compare against running it in your existing install, so you can see how much all the various apps and drivers are bogging your performance down.
Re:Would like to see a real world comparison for E (Score:1)
Problem with EQ is that performance can vary greatly depending on the card, the drivers, and of course the settings.
There are non-graphical settings within EQ that can slow down your computer in a raid environment that won't mess with it much in a non-raiding environment. Basically anything that logs information to your hard drive will really mess you up in a raid.
But EQ has so many damn bugs in it that benchmarking would be
Re: (Score:2)
It is posted somewhere on "therunes.net" boards. I linked it to my guild boards a couple months ago.
Re: (Score:1)
Benchmarks (Score:5, Insightful)
You must perform the same exact test on all video cards, disclose any variables, and you must not "pick a subset of completed tests to publish". You must not compare tests performed using different procedures, no matter how slight the deviation of the procedures are.
One cannot draw conclusions about "real world" performance from a benchmark. The benchmark is merely an indicator. A "real world" test that uses the strong, formalized procedures of a benchmark IS a benchmark - and suddenly, the benchmark is not "real world" - because the "real world" doesn't have formal procedures for gameplay.
Haphazard "non-blind" gameplay on a random machine is NOT a benchmark, and it cannot provide useful, comparable numbers.
A good benchmark is one where (1) most experts agree that it has validity, and (2) one where the tester cannot change the rules of the game.
The numbers of a benchmark are meaningless, except in terms of being compared to one another using the same exact procedure.
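One way to enforce that rule when tabulating results is to refuse any comparison between runs whose procedures differ. Here is a minimal C sketch. It assumes each result carries a label identifying the exact procedure and settings used. The struct, field and function names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One benchmark result. The procedure label is assumed to identify the
   exact test procedure and settings used (hypothetical convention). */
typedef struct {
    const char *card;
    const char *procedure;
    double score;
} bench_result;

/* Compare two results only if they came from the identical procedure.
   On success, writes a's score relative to b's into *ratio and returns true.
   Returns false and leaves *ratio untouched if the procedures differ. */
bool compare_results(const bench_result *a, const bench_result *b, double *ratio)
{
    assert(a != NULL && b != NULL && ratio != NULL && b->score > 0.0);
    if (strcmp(a->procedure, b->procedure) != 0)
        return false;
    *ratio = a->score / b->score;
    return true;
}
```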
Re: (Score:2)
So they threw benchmarking out, for the most part, and instead tried to make a system for measuring how well a given video card delivers a positive experience. It's not ideal, but at least it's immune to interference from the video card makers. Now you just have to worry about bias from
Re: (Score:2)
Benchmarks != Reality (Score:2)
But does this impact their usefulness in comparing hardware at all?
=Smidge=
Re: (Score:2)
RTFA. It clearly shows how the canned timedemo benchmarks most sites use can be horribly misleading and give totally wrong impressions.
Re: (Score:2)
However, they can still (sort of) be used to compare cards against each other. They don't do much to reflect playability of a game at given settings accurately, but in theory all of the numbers you get from a timedemo should be inflated by about the same percent.
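The "inflated by about the same percent" assumption can be checked directly. If it holds, card A's lead over card B should be the same in the timedemo as in real gameplay. Here is a minimal C sketch using hypothetical names. The tolerance is an arbitrary choice, not an established standard.

```c
#include <assert.h>
#include <stdbool.h>

/* How many times faster card A is than card B under one test method. */
double relative_speed(double fps_a, double fps_b)
{
    assert(fps_a > 0.0 && fps_b > 0.0);
    return fps_a / fps_b;
}

/* True if the timedemo and the gameplay run agree on card A's lead over
   card B, within a relative tolerance (e.g. 0.05 for 5%). A timedemo that
   inflates every card by the same percentage passes this check even though
   its absolute FPS numbers are higher. */
bool ratio_preserved(double demo_a, double demo_b,
                     double play_a, double play_b, double tolerance)
{
    double demo_ratio = relative_speed(demo_a, demo_b);
    double play_ratio = relative_speed(play_a, play_b);
    double diff = demo_ratio - play_ratio;
    if (diff < 0.0)
        diff = -diff;
    return diff <= tolerance * play_ratio;
}
```

Consider some made-up numbers. A timedemo of 60 vs 40 FPS and a playthrough of 30 vs 20 FPS both give card A a 1.5x lead, so the check passes. A playthrough of 30 vs 30 FPS would fail it, and that is the kind of mismatch the replies below are pointing at.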
Re: (Score:2)
This difference is the entire point of the article.
Obligatory Portal Reference (Score:1)
HardOCP benchmarks suck ass (Score:2)
Re:HardOCP benchmarks suck ass (Score:4, Insightful)
The highest playable settings for given hardware.
They then change the video card and find the highest playable settings for that hardware.
I'd much rather compare the highest playable settings for two different cards than the timedemo benchmark numbers for two different cards.
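The "highest playable settings" approach amounts to a search over presets ordered by quality. Here is a minimal C sketch. It assumes you have already measured the minimum FPS for each preset on one card. The threshold is whatever the player considers playable, which varies from person to person. All names and numbers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One tested configuration (e.g. a resolution/AA/shadow combination). */
typedef struct {
    const char *name;  /* label for the preset */
    double min_fps;    /* lowest frame rate seen during the playthrough */
} preset_result;

/* Presets are assumed to be ordered from lowest to highest quality.
   Returns the index of the highest preset whose minimum FPS meets the
   player's threshold, or -1 if no preset is playable. */
int highest_playable(const preset_result *results, size_t n, double min_fps_threshold)
{
    assert(results != NULL);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (results[i].min_fps >= min_fps_threshold)
            best = (int)i;
    }
    return best;
}
```

Running the same search for two cards and comparing the presets they reach is the comparison the comment above prefers over raw timedemo numbers.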
Re: (Score:3, Insightful)
For example: 1680x1050 with no AA may be considered unplayable (jaggies) for some, but for others it's perfectly fine...
Or, maybe you can turn on the AA, but deactivate shadows, changing your whole "playable" demographic again.
It's like asking someone to benchmark coffee at different restaurants to grade whether it is palatable or not.
~D
Re: (Score:2)
I've heard rumors that similar things are done for movies, books, games, tv shows, and even food.
I believe the idea is to work out how closely you agree with the reviewer in question in order to determine if what they say is useful (and of course when you completely disagree they can be useful - if they love it you'll hate it sort of thing)...
But, yes, if the point was meant to be that there is no one comparison function and hence each person's ordering may be different
[H] raises more questions than it answers (Score:3, Informative)
- is triple-buffering on or vsync off? This will make a huge difference to real time versus sped up timedemos
- is sound on when playing back both types of timedemos?
- how does FRAPS affect your benchmark scores?
Finally, regarding Crysis real-world gameplay versus the AT benchmark score, I thought it was common knowledge that the game would be slower when actually played, because you likely have physics, AI, logic and sound calculations to do that you don't have in timedemo mode. What is the big deal here?
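The vsync question matters for a specific reason. Under a simplified double-buffered vsync model, a finished frame waits for the next display refresh, so frame rates snap to the refresh rate divided by a whole number. Here is a minimal C sketch of that model, using hypothetical names. Real drivers behave in more complicated ways. Triple buffering is commonly used to avoid this stall, and it is not modelled here.

```c
#include <assert.h>

/* Effective frame rate under a simplified double-buffered vsync model.
   A frame is only shown on a refresh boundary, so a frame that misses a
   refresh by any amount waits one whole extra refresh interval. */
double vsync_fps(double render_ms, double refresh_hz)
{
    assert(render_ms > 0.0 && refresh_hz > 0.0);
    double interval_ms = 1000.0 / refresh_hz;
    unsigned intervals = 1;
    while (intervals * interval_ms < render_ms)
        intervals++;              /* frame still rendering at this refresh */
    return refresh_hz / intervals;
}
```

At 60 Hz, a card that needs 20 ms per frame (50 FPS uncapped) shows 30 FPS. A card that needs 40 ms (25 FPS uncapped) drops to 20 FPS. So whether vsync is on can change the numbers a real-time run reports compared with a sped-up timedemo.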
Re:[H] raises more questions than it answers (Score:4, Informative)
The root of the issue is that timedemos give the video card manufacturers something to tweak their drivers around besides gameplay. And there are also some arguments over how representative of your actual experience a timedemo will be. At least HardOCP gives a crap about their methodology, as opposed to other hardware sites which don't use any sort of statistical analysis.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
HardOCP didn't really do any sort of statistical analysis. They gave min/avg/max on a few cards. AnandTech and Tom's Hardware have a sample population and a methodology that blows the doors off HardOCP statistically.
HardOCP is just regurgitating age-old arguments that have been around since the dawn of benchmarks. I helped code 3DMark in 1996, we went through the same arguments then. Nothing has changed. Synthetic benchmarks serve a purpose: because playing the game and r
Re: (Score:2)
See the difference?
HardOCP's testing is only concerned with real-life gameplay. Most of the time, their conclusions are pretty similar to other sites... card A is faster than card B, for instance. However, sometimes, their conclusions are opposite what other sites come u
Re: (Score:1)
Re: (Score:2)
Re: (Score:2)
There's no reason you couldn't write a benchmark/demo which actually performs the physics/AI/logic/sound calculations, as opposed to pre-calculating that ahead of time. Even if your AI or physics code contains
Benchmarks are a marketing tool only (Score:2)
As long as you don't run two 30 inch monitors, any name brand video card for about 200 bucks will give you great playable rates at 1680 x 1050.
A lot of benchmarks imply you need t
Re: (Score:3, Insightful)
Not in Crysis, Call of Duty 4, UT3, etc.
When I go to plunk down $200 - $300 on a video card, and one of them performs comfortably at my LCD's native resolution and the other one doesn't, that matters. Saying all cards in a given price range are roughly equivalent is saying that you are completely, 100% blind to the reality of video cards today.
Re: (Score:1)
Re: (Score:2)
I can *just barely* enable AA and AF with the 8800GT. I would not be able to do this with a 30% slower card like the 3870.
This is why reviews matter.
Re: (Score:1)
Re:Benchmarks are a marketing tool only (Score:4, Informative)
Re: (Score:2)
Re: (Score:2)
The other guy who has trouble playing call of duty 4, that I don't get, I found it
Re: (Score:2)
I play using the quake raytracing engine and my benchmarks are sec/frame, not frame/sec.
Re: (Score:2)
So the gameplay sucks.
The difference is that I spent a lot less money on hardware, ergo, I got a lot more sucky gameplay for my money.
Or in other words, my suck per buck ratio is a lot higher.
Yeah.
Benchmarking Benchmarks? (Score:1)
Not the same card (Score:3, Insightful)
It is a bit of a shock, I guess, that ATI's latest and greatest can't seem to consistently beat nVidia's over-a-year-old GTX cards.
It is about the "cheating" in benchmarks (Score:1)
I am not sure what the best method is for eliminating the discrepancies introduced by those best able to code for a given benchmark, but it seems he tries.
Suuure... (Score:2)
We prefer stopwatches (Score:1)
Re: (Score:1)
Why DX10? (Score:1)
Re: (Score:1)
I personally don't know any gamers who use Vista, other than what might have come on a new laptop, and most of those have even removed it from the laptop.
Insufficient sample size (Score:2)
All they've proven is that there is something wrong with the timedemo system in Crysis.
Re: (Score:3, Funny)
Layne
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
But by default, Windows and Linux will boot and just ignore any extra memory they can't address. PAE shouldn't enter the picture for any serious gamers.
Re: (Score:2)
Re: (Score:2)