Proposal For Open-Source Benchmarks
nd writes: "Van Smith from Tom's Hardware has written a proposal that calls for open-source benchmarking. He talks about the need for increasing the objectivity of benchmarking. The proposal is basically to develop a suite of open-source benchmarking tools and new methodologies. It's a rather dramatic column, as he discusses Transmeta, bias towards Intel, among other things." Well, once you get through the initial umpteen pages of preamble, the generically named A Modest Proposal is the actual point. Interesting idea - but I shall weep for the passing of bogo-MIPs as the definitive measure of system performance. *grin*
Generic benchmarks are useless (Score:1)
Guy 2: raw FPU/3DNOW and some INTEGER for Quake
Guy 3: raw SSE for some scientific software
Guy 4: raw MMX to watch German pay-TV illegally
Guy 5: raw [insert CPU unit here]
I don't buy my CPUs based on benchmark numbers. I buy them because they have fast units that I like. (Athlon FPU, etc.)
Always a good idea (Score:1)
But I digress. My benchmark system is built around the theory that packets are stuffed by weight, not by volume. Kinda like potato chips. They always have that little disclaimer on the bottom of the bag. I don't really like potato chips much, but I generally like the green ones. The extra oil is tasty, and they're a little bit chewier. I like chewy foods. I also like green foods. I love broccoli.
But I digress. So any random amount of data passed from one process to another (or one computer to another, or what have you) can be measured by a) the size in bits and b) the usefulness of the data. For example, if a random packet contains information that just alerts another program that it's there, then it wouldn't really be all that important. Like people, I believe that a program should be able to operate independently. That way if the dependent processes die, the program doesn't instantly die. If no one I'm working with holds up their end of the project, then I'm going to be set behind writing extra code or configuring servers I hadn't intended to. If I just planned on that from the get-go, then it wouldn't be so much of an issue. Time is money, I've heard it said, and unfortunately I'm running out. Of both, actually. It's not fun. Yeah, I thought it was pretty cool to be able to buy a house in cash, but now I'm on the ramen noodle diet for a couple months until I can build my savings back up. Which wouldn't be so bad if I could afford some dried squid to toss in the noodles. That's MIGHTY tasty. I'm left to doing the best I can with what spices I'm not out of. And garlic. I don't care HOW broke I am, I can always afford fresh garlic.
But I digress. If you set up a system to compare the relative importance of a given data packet, weighed against its size, and multiplied by the time it takes to complete an operation, you can get a good measure of how efficient your system is. The coding is relatively simple. You need something that will run every possibility against the system in order to see where it performs well, and whether or not it gives priority to more important packets, regardless of size. Size is NOT everything. And no, I'm not as endowed as my ex-girlfriend told you, no one is that big. She's just trying to get you interested in me. Nothing personal, I mean, you're a nice person and all, and you have a really pretty smile, but you're just not my type. I don't think that you'd really like me so much once you got to know me. I don't know why the ex is so quick to push me off on someone. It's not like I'd even DREAM of touching her with a stolen dick. Oh, was that out loud?
But I digress. My benchmarking tool is open sourced and available at . . . . I'm bored. I don't want to do this anymore. Sigh.
Give Transmeta a little more wiggle room?? (Score:1)
Those are the folks who say that "new benchmarks are needed." I imagine when your product doesn't excel with the present benchmark, you're inclined to want to change it.
As Transmeta is spending a lot of time hyping their chips for portable use, maybe they should also scrutinize their plan to champion a server OS for use on their chips. Is the additional load of a timesharing-type OS really warranted, any more than desktop- or server-grade benchmarks are? Why tie up resources with all the claptrap associated with a multi-user OS (e.g., group and user attributes in the filesystem are a real waste on a handheld device)?
Come up with new benchmarks if need be.
But also throw away quaint ideas from the 70's like 'everything ultimately devolves into being a teletype' and 'this is a TimeSharing system, supporting twenty-five users.'
Probably wouldn't be a bad idea to think about scrapping stuff like termcap, either. Handheld portable devices should obviously still run sendmail and the NNTP server of your choice, however. Heck, here where I work we have a Palm-Pilot running that "L-whatever" OS duct taped to every printer, because we're just darned fond of LPD spools and other quaint stuff from the 70's.
Hey, it's a new era of computing.
Methodology... (Score:1)
Open methodology is at LEAST as important as an open-source benchmarking tool. Knowing how the numbers are being generated is definitely very important. It is also EXTREMELY important to know how those numbers are being massaged.
As for the issue of a vendor modifying the benchmark to skew results, two things. First, who's to say they don't already, by influencing how the closed benchmark is written (admittedly less of a problem with benchmarks intended to be cross-platform)? Secondly, I think this can be solved with licensing. Require that the source code, including modifications, be distributed with the benchmark. And require the source to be posted when you post numbers generated with this benchmark (i.e., license the damned thing so people/companies cannot secretly modify it and post whacked numbers without letting us see the modifications).
I think the idea of forcing these things into the open is an EXCELLENT idea. Even if the article (at Tom's, that is) is mostly an arrogant polemic with very little of real substance...
Re:Here are some suggestions... (Score:1)
Re:what about SPEC? (Score:1)
SPEC is the best benchmarking suite I've seen, but it has shortcomings which I've been itching to fix with my own benchmarking suite. But it's a huge project, and I haven't had time to do more than poke at it a little. I'll put up what notes I have on my web site here [flyingcroc.net] when I have the time, but they're not there right now.
A brief summary: I think SPEC has the right idea, in that a benchmark should consist of a suite of real-life applications which are only allowed to be optimized in limited ways (to accurately represent how applications are optimized in the real world), that the components should be like the applications the target audience is interested in running, and that distinctions should be made between applications which stress different parts of the system.
I think the target audience could be broadened considerably by selecting a slightly different set of applications, and I think that in addition to an int and fp sub-suite (which stress only the CPU and memory subsystems, to a large degree), there should be a third sub-suite which uses applications with more holistic demands on the system -- system calls and the filesystem. I think that the purpose of a benchmark should be to enable the user of the system to predict how well the system is going to perform, and filesystem performance often has a large impact on real-life performance. For better or for worse, this will make the choice of OS and disk subsystem a much more important factor in determining the results of the benchmark, explosively increasing the number of reference systems necessary to generate useful results, but if such a level of complexity is necessary to accurately portray reality, then that is the level of complexity that we should have.
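To make that third sub-suite concrete, here's a minimal sketch of the sort of component I have in mind (my own illustration, nothing from SPEC; the iteration counts and file names are arbitrary). It stresses the syscall interface and the filesystem metadata path instead of the CPU core:

    /* Holistic micro-test sketch: syscall overhead + filesystem metadata. */
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/time.h>

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        double t0;
        int i;

        /* Raw kernel-entry cost: getppid() does almost no work,
           so the loop time is dominated by the syscall itself. */
        t0 = now();
        for (i = 0; i < 1000000; i++)
            getppid();
        printf("getppid:       %.0f ns per call\n",
               (now() - t0) * 1e9 / 1000000);

        /* Filesystem metadata path: create and delete many small files. */
        t0 = now();
        for (i = 0; i < 5000; i++) {
            char name[32];
            int fd;
            sprintf(name, "bm.%d", i);
            fd = open(name, O_WRONLY | O_CREAT, 0644);
            if (fd >= 0)
                close(fd);
            unlink(name);
        }
        printf("create+unlink: %.1f us per file\n", (now() - t0) * 1e6 / 5000);
        return 0;
    }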
Making the benchmark open source adds a new level of agenda to the benchmarking effort. It makes it in the best interests of hardware vendors to see better optimizations in the compilers used, and if the level of authenticity assigned to the benchmark report is partially dependent on the openness of the compiler used, then that could mean more corporate contributions to open-source compilers. It would also help avoid the use of bogus compilers which are only really useful for compiling SPEC components, which is a big problem with SPEC right now.
-- Guges
"Open source" ideology (Score:3)
You know I'm getting somewhat sick of the whole open source thing. At first I thought it was a Good Thing, a way to allow people to collaborate on code and to keep it from being stolen. But gradually I am becoming more and more cynical about it - not so much the concept, but more the zealotry that surrounds it.
Just look at the title of the article linked in this story - "A Call to Arms - A Proposal for Open-Source Benchmarks". WTF? Why is this a call to arms? Isn't this just a bit rabid for what is, after all, just an article about benchmarks? Benchmarks may be important, but they're not worth getting worked up over.
And then the first page of the article is a rambling piece of tabloid "cyber"-journalism far worse than even Jon Katz has ever managed. Why is this diatribe necessary? Surely we all know what open-source is, and we all realise that the net has changed a lot of things. No, it's the same thing I see again and again - the zealotry of the open source proponent who feels the need for grand rhetoric and buzzword-filled arguments.
There is an ideology behind open source, and a good one, but it has been taken too far. Richard Stallman is not the best person to represent such a diverse group of people - his radical politics and hatred of commercialism make him quick with the denunciation of anything he disagrees with, like the name Linux - after all, he'd rather it was "GNU/Linux" or even worse "Lignux". This kind of ideological zeal is certainly putting me off the idea - and others too, I'm sure - but there seems to be a never-ending parade of people willing to subscribe to his beliefs and zealotry.
Anyway, what I'd like to see is a return to what open source is about - writing good, free code for the use of all. There's no need for flaming attacks on closed-source software or whatever - that shouldn't be the point of open source, and is just a waste of time better spent coding. Unfortunately /. seems to provoke this kind of hysteria, but even with this I'll still read it :)
If you disagree, feel free to reply. Nicely :)
Funny... (Score:4)
Slashdot Multitudes: Yay! (clapclapclapclap)
Jon Katz: "Open Source Babble Transmeta Crusoe Linux Ramble Internet Cyber-World Paradigm Revolution"
Slashdot Multitudes: Windbag! Parasite! Media Whore!(boooo, hissssss)
Re:"Open source" ideology (Score:1)
Re:"Open source" ideology (Score:1)
Just look at the title of the article linked in this story - "A Call to Arms - A Proposal for Open-Source Benchmarks". WTF? Why is this a call to arms?
Because most of the section names in the article, "A Call To Arms" included, are the names of episodes of Babylon 5.
Re:Uhhh....Yeah, but who will use it? (Score:2)
Besides, any IBM salesdroid worth his commission would mention the RS/6000 line.
__
Read the HOWTO (Score:3)
There are *lots* of open-source benchmarks, and of course we can make new and better ones, and get a test suite together.
For starters, the LBT (Linux Benchmarking Toolkit):
Run the BYTEmarks (and the old UNIX ones too, they're funny), Whetstone, XBench... oh, and compile a stock kernel (and don't fiddle with the options, 2.0.0 was recommended then.)
Personally, I'd also suggest bonnie; it's a good benchmark for disk performance, but you'd have to pick the options carefully. (It tests disk performance and cache, so you'd really want a large file size here, just to be fair. 2*RAM?)
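To make the 2*RAM point concrete, here's a rough sketch of a sequential-write test sized at twice physical memory, so the buffer cache can't soak it all up. (My own illustration, not bonnie's actual code; MEM_MB is an assumption, and a real tool would detect the RAM size itself.)

    /* Sequential-write throughput with a file twice the size of RAM. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/time.h>

    #define MEM_MB 128                    /* assumed physical RAM, in MB */

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        static char buf[1 << 20];         /* 1 MB write chunk */
        long mb = 2L * MEM_MB;            /* file size: twice RAM */
        long i;
        double t0 = now();
        int fd = open("disk.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);

        if (fd < 0) { perror("open"); return 1; }
        memset(buf, 0x55, sizeof buf);
        for (i = 0; i < mb; i++)
            write(fd, buf, sizeof buf);   /* one chunk = one MB */
        fsync(fd);                        /* make the disk do the work */
        close(fd);
        printf("sequential write: %.2f MB/s\n", mb / (now() - t0));
        unlink("disk.tmp");
        return 0;
    }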
Also, when RedHat boots up, it has those RAID checksumming tests; those are good. They test different implementations of the same algorithm, so they say a lot about the individual chip (whether it likes MMX, works well with different optimizations, and whatnot).
---
pb Reply or e-mail; don't vaguely moderate [152.7.41.11].
BogoMIPS (Score:2)
Calibrating delay loop... 897.84 BogoMIPS
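For anyone wondering where that number comes from: the kernel just times an empty delay loop. A userland approximation might look like this sketch (mine, not the actual kernel code, which counts loop iterations against the timer tick; the /500000 matches the kernel's convention of two "bogo-instructions" per loop):

    #include <stdio.h>
    #include <sys/time.h>

    static void delay(long loops)
    {
        volatile long i;                 /* volatile so the loop survives -O2 */
        for (i = loops; i > 0; i--)
            ;                            /* the famous do-nothing loop */
    }

    int main(void)
    {
        struct timeval a, b;
        long loops = 1L << 24;
        double secs;

        gettimeofday(&a, NULL);
        delay(loops);
        gettimeofday(&b, NULL);
        secs = (b.tv_sec - a.tv_sec) + (b.tv_usec - a.tv_usec) / 1e6;

        printf("Calibrating delay loop... %.2f BogoMIPS\n",
               loops / secs / 500000.0);
        return 0;
    }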
-----
If Bill Gates had a nickel for every time Windows crashed...
bogo-MIPS explained (Score:2)
Until I decided to look it up just there, I had no idea what bogo-MIPS was. Enlightenment can be found either here [kinetic.org] or here [enlightenment.org].
-- Michael
Re: Tom's Hardware (Score:2)
Tom and his hardware aside, I think that open benchmarking tools are a good idea. However, we might see a different set of problems, in that if the hardware company knows exactly what code is going to be executed to benchmark their product, they can optimize/cheat for that code.
Re:Open Source Benchmarks? SPEC! (Score:1)
http://www.spec.org/cgi-bin/order/ [spec.org]
those don't look like open-source prices to me...
--Siva
Keyboard not found.
MIPS is not the plural of MIP (Score:1)
Talking about "a MIP" or "bogo-MIPs" is absolutely idiotic.
and a site to compare them! (Score:1)
Isn't there already a standard benchmarking? (Score:1)
"I can only show you Linux... you're the one who has to read the man pages."
Re:Uhhh....Yeah, but who will use it? (Score:1)
--
Re:Uhhh....Yeah, but who will use it? (Score:3)
But, nowhere in their advertising did they mention the size of the engine or the amount of power or anything about "performance". Back in those days everyone just knew Cadillacs had plenty of power. I suspect it's the same with IBM and their mainframes - just too much reputation to even advertise.
--
Re: Tom's Hardware (Score:3)
Another idea - do benchmarks need to be portable? (Score:2)
To explain: define a set of tasks (this could include some of the same tasks as the current synthetic benchmarks), but define them in terms of the algorithm that must be used rather than the implementation. Then write a C/whatever version that implements that algorithm as well as possible to use as a base. _Then_ allow the proponents of particular platforms to modify a version of the code (possibly using #ifdefs or whatever to keep it in one code base) as long as they use the same algorithm.
One possible test (I'm only using it as an example, not suggesting it) would be to calculate a certain portion of the Mandelbrot Set down to a depth of 10000 and put the results in an array of a certain structure, where it must be done using brute force with a precision of at least 40 binary significant digits (i.e. 64-bit longs or doubles).
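A baseline implementation of that test might look like the sketch below (my illustration: the region, grid size, and checksum are arbitrary choices; only the depth of 10000 and the use of doubles come from the description above). Platform advocates would then be free to rewrite the inner loop as long as the algorithm and the checksum stay the same:

    /* Brute-force Mandelbrot escape-time kernel, double precision
       (53 significant bits, comfortably over the 40 required). */
    #include <stdio.h>

    #define DEPTH 10000
    #define GRID  256

    int main(void)
    {
        static int result[GRID][GRID];   /* the verifiable output array */
        unsigned long sum = 0;
        int px, py;

        for (py = 0; py < GRID; py++) {
            for (px = 0; px < GRID; px++) {
                double cr = -2.0 + 2.5 * px / GRID;   /* re in [-2.0, 0.5) */
                double ci = -1.25 + 2.5 * py / GRID;  /* im in [-1.25, 1.25) */
                double zr = 0.0, zi = 0.0;
                int n = 0;
                while (n < DEPTH && zr * zr + zi * zi <= 4.0) {
                    double t = zr * zr - zi * zi + cr;
                    zi = 2.0 * zr * zi + ci;
                    zr = t;
                    n++;
                }
                result[py][px] = n;
                sum += n;
            }
        }
        /* A checksum lets differing implementations be cross-checked. */
        printf("checksum: %lu\n", sum);
        return 0;
    }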
The current distributed.net RC5-64 effort could be considered an example of such a benchmark - processor tweaks are fine as long as you solve the problem.
Open source can be used to prevent cheating, in that it can be seen that everyone is following the correct algorithm (or strict review by a trusted organization, as in the case of RC5-64). It also means that people can look over the tweaks for other platforms and see if any of them are applicable.
The rationale for this approach:
(1) It changes the rules so that what is currently 'cheating' becomes part of the process - it becomes very difficult to cheat.
(2) A lot of 'real world' applications like Photoshop and Quake are presumably using these sorts of tweaks for their inner loops anyway, so this is mirrored by allowing the same tweaks in the tests.
This idea has several downsides:
(1) It can only provide synthetic benchmarks, and on fairly small examples (so optimizing for particular architectures doesn't require huge resources).
(2) It only tests the speed that can be achieved using assembler.
(3) It requires each platform to have some advocates good enough and willing to put time into optimizing code so every platform gets a fair go.
(4) Because the tests are so small, it needs a moderately large number of individual benchmarks - for instance, RC5-64 on its own is useless, since it doesn't test memory speed, and the PowerPC and x86 architectures have the huge advantage of having rotate instructions.
(5) Rather than give a single number (which is what people tend to want), the resulting benchmarks would give a set of results for various aspects of the chip - this would make the results of more interest to technically oriented people.
I'd be willing to put a little work into PowerPC G3 and possibly G4(Altivec) optimization in such a project.
A more extreme version of this idea is to allow algorithm optimization too.
Prior "Modest Proposal" (Score:2)
(...gee, where did all my Karma go?)
New benchmark needed (Score:1)
How about something which measures how fast Linux stocks are becoming worthless? You could maybe plot it against the frequency of ESR articles at Slashdot in which he tells all of you how rich he is.
Notable Linux milestones today:
Short 'em to the floor, that's what I always say! :)
Cheers,
ZicoKnows@hotmail.com
I'd love to contribute codes (Score:2)
Our plasma simulation group has several simulation codes which would be pretty good as part of an open-source floating-point benchmark suite--*provided* this benchmark suite is distributed under the GPL or Berkeley license.
We considered giving our codes to SPEC, but SPEC wants to be able to *sell* their benchmark suite for $500 a copy. This caused us legal headaches, so rather than deal with them we didn't try to participate in SPECfp2000.
We can offer C and C++ codes which exercise the FPU and memory subsystem heavily, though they tend to be cache-friendly.
PeterM
Uhhh....Yeah, but who will use it? (Score:5)
Sure, it's a great thing for the rest of us, because we don't have anything we're trying to sell. Just don't expect anyone on the outside to hop on the bandwagon.
Yours In Science,
Bowie J. Poag
Project Founder, PROPAGANDA For Linux (http://metalab.unc.edu/propaganda [unc.edu])
Quake (Score:1)
Re:Slowest BogoMIPS I've seen... (Score:2)
Re:Benchmarks should not be Open Source (Score:2)
Having the source code for it will only make this trick slightly easier (less reverse engineering needed). Besides, if information leaked out that actual HARDWARE cheated on benchmarks, the vendor would be under a LOT of criticism, and I suspect they'd be caught rather quickly.
Re:Benchmarks should not be Open Source (Score:1)
Wasn't it the original PowerVR chipset that did this? Or was it one of the early Riva128 ones.. I'm thinking back, way back.. But I do recall there being a big stink over how, if you renamed a popular benchmark's executable, the drivers suddenly behaved differently.
It's no secret that companies cheat on benchmarks. Heck, ATI released an entire set of drivers (the Rage Pro TURBO drivers) that made a few benchmarks faster and a few real games SLOWER. Was there criticism and whatnot? Lord, no. For some reason, it was expected in the 3D accelerator market. I'll paraphrase Brian Hook: "2 years ago, if you went into a trade show with vague specs and no real product, you'd be laughed out. Now it's a way of doing business."
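For anyone wondering what the trick looks like, it boils down to something like this sketch (the executable name and the quality knob are invented for illustration; this is not any vendor's actual driver code):

    #include <stdio.h>
    #include <string.h>

    struct context { int reduce_quality; };

    /* A driver peeking at the application's name before deciding
       how much work to actually do. */
    static void tune_for_app(struct context *ctx, const char *exe_name)
    {
        if (strcmp(exe_name, "FINALBENCH.EXE") == 0)
            ctx->reduce_quality = 1;     /* benchmark detected: inflate the score */
        else
            ctx->reduce_quality = 0;     /* real game: do the real work */
    }

    int main(int argc, char **argv)
    {
        struct context ctx;
        tune_for_app(&ctx, argc > 1 ? argv[1] : "QUAKE.EXE");
        printf("reduce_quality = %d\n", ctx.reduce_quality);
        return 0;
    }

Rename the benchmark's executable and the "optimization" quietly disappears, along with a chunk of the score.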
Babylon 5? (Score:1)
Re:Benchmarks by nature are subjective (Score:2)
Benchmarks should be open sourced; the community at large that uses the system(s) should define what the tests (torturous as they should be) actually test. That will determine the difference between fluff and actual fact.
Of course, it's also Standard Operating Procedure to optimize products to perform well on Benchmarks specifically (I hear stories about compilers that seek out "Whetstones" or "Dhrystones" and will substitute hand-optimized machine code for 'em rather than just compile the code).
Bottom line is, you can't trust 3rd-party benchmarks. You need to test a system for your specific application. That, though, is prohibitively expensive for most applications. So you gotta rely on benchmarks.
Therefore, make your benchmark as close to real-world use as possible! Especially if you're open-sourcing it. Then, optimizing for the benchmark is actually optimizing for real-world use.
(The problem with this, of course, is that your real-world use may be dramatically different than mine. If I'm rendering 3D graphics, I have different needs than someone running, say, a web server. So this then requires a family of benchmarks, reflecting real-world usage in different domains of endeavor.)
A benchmark written in Java is a Java benchmark (Score:1)
It may have already been started (Score:1)
Re:"Open source" ideology (Score:1)
--jeff
"We've got a blind date with Destiny...and it looks like she's ordered the lobster."
-The Shoveler, "Mystery Men"
Re:BogoMIPS (Score:1)
The Good, The Bad, and the Ugly... (Score:4)
The Good:
There already is a great benchmark for processors, and it's called SPEC [spec.org]. Yes, it's not open source, but it's really quite reliable for comparing CPUs of any architecture. As slashdot user "cweber [mailto]" pointed out in his post, they have been doing this for 11 years, and they periodically revise their benchmark suite to stress CPUs more uniformly.
The open-source method. This is really good for ensuring that there are no cheaters at the benchmark level.
Tom's interesting ideas [tomshardware.com] on Crusoe. This stems from the fact that SPECmarks don't quite approximate the real usage that Crusoe depends on to use its hotspot optimizations. However, we are interested in the raw sustained speed of the processor (in this case), not the speed of the OS or its task-swap latency. Tough problems to solve.
Open-source means that the benchmark code will be able to take advantage of the best compiler available for the target CPU (see comment at end).
The Bad:
Anyone who has done benchmarks knows that even small variations in system config can have strange or harmful effects on the benchmark results. This open-source effort is going to need a database of hardware configs in order to be useful.
The Ugly:
Vendors are going to oppose this (at least not support it). Why? Because plain and simple they have an interest in promoting the most favorable statistics possible about their products. They want to keep feeding you "polygon fill rates" and "texels per second" because their card may not stand up in a direct test program comparison. Plus, they are just dying to convince you that they have new BogusMarketingAcronym (tm) technology and their competitor does not. Nevermind that SSE and 3Dnow do pretty much the same thing -- companies have an interest in differentiating themselves as much as possible.
If this benchmark actually takes off (and gets widely accepted), we might get cheaters at the firmware or hardware level. This has happened before -- although which company it was and which benchmark they cheated I can't remember. I can't find it on the net or remember to save my life (sigh)...
I also need to say something to the people who think a processor should be judged independently of a compiler. This is just plain dumb. Why? Because a processor and its compiler are a team. You can't use one without the other. When a chip is designed, there is a direct information dependence between the chip architects and the compiler writers. They are designed as a pair (ideally), and they should be tested as such. If a given compiler has great optimizations, then great! That means the compiler understands its target real well. It is a win for both the CPU and the compiler for pulling it off. This compiler is going to do the same kinds of optimizations when vendors use it to write programs, so that helps the comparison between benchmark code and apps.
However, I can see the need to compare not only the best compiler, but GCC as well, because of its broad acceptance. But if you are serious about performance, and want to get every ounce of juice out of your chip, you use the vendor-provided compilers, not GCC. Don't get me wrong, GCC is great for compliance and portability, but it usually doesn't compare well with vendor compilers for generated code speed (with the possible exception of IA-32).
Ars Technica also published [arstechnica.com], a while back, some good information regarding CPU benchmarks. Check it out if you are interested in SPEC or CPU benchmarks in general.
Useful Benchmarks (Score:1)
Nuff said.
Re:Funny... (Score:4)
Re:This is DEFINITELY a good idea.. and here is wh (Score:1)
Re:Cheating (Score:1)
The benchmarking standard should state that code changes are not allowed, and it should detail how the benchmark is to be run, and how and where results are to be reported.
what about SPEC? (Score:2)
As well as a generic benchmark can serve, anyway. There is of course no substitute for checking out a box with your own apps and workload.
Re:This is DEFINITELY a good idea.. and here is wh (Score:2)
That is why open source benchmarks are a good idea -- not only does it allow people to improve on the code directly, but it lets people see exactly what is going on behind the scenes.
That is a very bad idea, indeed. If the code base of the benchmark changes at all, none of the numbers are comparable between releases. This is exactly why tightly controlled benchmarks like SPEC have been successful. SPEC only changes every few years, there are clear rules of what you can do and what not while compiling and running the benchmarks and there are rules about how to report the resulting numbers.
Insofar as one can trust generic benchmarks, SPEC has held up nicely and allowed us to superficially compare systems from Unix vendors with different CPUs, different architectures and different OSes. Even with infrequent updates, the transition from one version of SPEC to the next gets in the way sometimes. I can only imagine how bad a true open-source solution without additional rules would be.
Benchmark Quality (Score:1)
I have produced a couple articles on Java performance (JavaLobby & JavaPro Aug'99) and unfortunately, some popular Java benchmarks exhibit various flaws (Caffeine, jBYTEmark, JMark). Even good benchmarks (Volano) are sometimes hard to analyse because I have no access to the sources and I don't know exactly what the benchmark is doing. Statistics without context are useless in the best case, and dishonest in the worst.
Cheating (Score:2)
-----------
"You can't shake the Devil's hand and say you're only kidding."
Re:QuakeIII? (Score:1)
(=
_______________________________________________
There is no statute of limitation on stupidity.
Re:Benchmarks should not be Open Source (Score:2)
_______________________________________________
There is no statute of limitation on stupidity.
Re:BogoMIPS (Score:1)
Surprisingly, it works pretty well for day-to-day stuff, but takes almost 5 hours to compile a kernel.
Re:Uhhh....Yeah, but who will use it? (Score:1)
Re:I'd love to contribute codes (Score:1)
Re:BogoMIPS (Score:1)
The real problem is.. (Score:1)
The real problem is crappy benchmarks that don't measure real life performance. Take databases. Every database has a unique set of data with a unique DBA who tunes it in a unique way. It may not even be possible to build a truly neutral benchmark to accurately reflect real life performance.
Also consider the fact that manufacturers will build in tuning tweaks to specifically perform better at some benchmark or another.
If you are going to build benchmarks make sure they take all of this into account.
Re:The Good, The Bad, and the Ugly... (Score:1)
Disclaimer: I have only seen the 3dnow and SSE instruction sets - I haven't used them. I have used the MMX instruction set.
They are similar, yes. But 3dNow is slower - it's 64 bit (instead of 128 bit) and it doesn't have some of the nice instructions that SSE has. Of course, SSE was able to learn from 3dnow's mistakes. There is a difference - it's not just marketing.
"But if you are serious about performance, and want to get every once of juice out of your chip, you use the vendor provided compilers, not GCC."
Um - most users don't compile their own software. Most high-performance software is either:
1. A game. Games tend to use assembly language for optimization.
2. A scientific application. Scientific apps are usually expensive, and you can usually convince a company to let you benchmark it on their hardware before you buy the hardware.
HINT Benchmark (Score:1)
http://www.scl.ameslab.gov/HINT/ [ameslab.gov]
This benchmark measures performance along multiple dimensions (i.e., performance with respect to the size of the problem, the number of processors available, and the size of the cache). It's numerically intensive, so it gives a good idea of what scientific programming performance would be like - not, say, graphics performance...
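Reduced to a toy, the multi-size idea looks something like this sketch (mine, not HINT itself -- HINT's real metric is QUIPS, and the vector-sum kernel and sizes here are placeholders): run the same kernel over a range of problem sizes and watch the rate fall as you drop out of each cache level.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    static double now(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        long size;

        for (size = 1L << 10; size <= 1L << 22; size <<= 2) {
            double *v = malloc(size * sizeof *v);
            double sum = 0.0, t0;
            long i, r, reps;

            if (!v) { fprintf(stderr, "out of memory\n"); return 1; }
            for (i = 0; i < size; i++)
                v[i] = 1.0 / (i + 1);

            reps = (1L << 24) / size;    /* keep total work roughly constant */
            t0 = now();
            for (r = 0; r < reps; r++)
                for (i = 0; i < size; i++)
                    sum += v[i];
            printf("%8ld doubles: %6.1f MFLOPS (sum=%g)\n",
                   size, reps * size / (now() - t0) / 1e6, sum);
            free(v);
        }
        return 0;
    }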
Re:Slowest BogoMIPS I've seen... (Score:2)
Re:Give Transmeta a little more wiggle room?? (Score:2)
What if an organization maintains a pool of handhelds, and you grab a different one every day? Or for each task?
Even if you never loan it out and it's yours forever, having different users for administrative tasks is a Good Thing; you don't want to be root all the time.
I have some code to contribute, too (Score:1)
Babylon 5 references. (Score:1)
For goodness' sakes. I can't possibly be the first person to notice this.
Didn't anyone else notice that the titles of all the sections in the article were titles from episodes of Babylon 5? I mean come on, "the Geometry of Shadows". "Babylon Squared"!!!!!!!
Geeze. Am I the only sci-fi geek left around here?
Absimiliard
-----------------------------------------------
All sigs are lame, but I've got the lamest sig of all!
This is DEFINITELY a good idea.. and here is why. (Score:2)
Several times, I have read ads for hardware that proclaim it is faster than the competitor's. They even have pretty bar graphs! And of course, their bar is much longer. However, here is a great place to repeat the long-unbelieved-until-now phrase: "size doesn't matter."
What is stupefyingly bizarre is that I can turn the page, find the competitor's ad, and they proclaim exactly the same thing! It's mind-boggling, to say the least.
That is why open source benchmarks are a good idea -- not only does it allow people to improve on the code directly, but it lets people see exactly what is going on behind the scenes.
Or maybe we shouldn't let companies do benchmarks for their own products. After all, you can make stats say pretty much whatever you want them to. Or maybe we should just ignore ads completely. (I know that I don't trust magazine ads, for sure. Rather, I go and find MULTIPLE reviews of hardware, both online and word-of-mouth, to get an accurate picture.)
,-----.----...---..--..-....-
' CitizenC
' WebMaster, PlanetQ3F [q3f.net]
`-----.----...---..--..-....-
Generically named? Hardly -- quite clever actually (Score:1)
Riiiiight... (Score:1)
Talk about the left hand not knowing what the right is doing... the problem lies not with benchmarks, which are never objective, but with biased review sites (such as ones that bash 3dfx for years while running nVidia's advertisements on their front page) that can't (or won't) put them in the proper perspective.
telnet://bbs.ufies.org
Trade Wars Lives
Slowest BogoMIPS I've seen... (Score:2)
Re:BogoMIPS (Score:1)
Any one hear of this thing called Math... (Score:1)
Good benchmarking should start with a statement of the max and min from the specs published by the manufacturers of the sub-components and protocols.
The next step is to avoid silly reports, such as an L1-cached instruction rep'ed to death.
Generate a constant stream of interrupts, both soft and hard, from all sources... now try your benchmarking process.
If you need proof of what idiotic ideas benchmarking programs are: Norton Utilities (DOS) would give a different SI CPU score based on idle movements of the user's mouse.
A harassed box tells no lies. I can calculate how fast the instruction load times are from the spec sheet; what I need to know is: is it running in spec?
Benchmarks by nature are subjective (Score:3)
...just as long as they keep the BogoMIPS around I'm okay with it
zerodvyd
Re:BogoMIPS (Score:2)
Re:Uhhh....Yeah, but who will use it? (Score:2)
Re:Uhhh....Yeah, but who will use it? (Score:2)
Unlike comparing the Ford 460 to the GM 427, IBM now uses commodity processors for most of its machines; you can directly compare, by virtue of the speed and number of processors (minus the OS and microcode fudge factor), an IBM mini to an SMP PIII, an AltiVec-enabled Mac, or an Alpha. Big Blue's mainframes still use somewhat in-house powerplants, but knowing that the 2001 390 is 1.25 times faster than the last revision isn't going to help you make a purchasing decision between it and a small Alpha cluster.
Re:Here are some suggestions... (Score:1)
What's really needed are some benchmarks which test realistic computer usage. The SPECs try to do this, or did, but it can be argued that a lot of those tasks aren't so common for today's users. How about a benchmark that tests how fast a PC can open Office98, access 3 menu options, type a bit, save, type some more, define a macro, and exit? I haven't seen one of those. But it really would help some folks out.
Re:Here are some suggestions... (Score:1)
-----------------------
Re:Benchmarks (Score:1)
While I don't see this as a justification for trusting closed benchmarking tools, which could be doing far more sinister things (if (manufacturer != Microsoft) { busywait(); } ), it is definitely a problem that the tools themselves will have to address by being comprehensive enough to ensure that it's prohibitively difficult to optimize for that particular benchmarker.
Re:Uhhh....Yeah, but who will use it? (Score:1)
Let's say I'm running AMD, and in the impartial benchmarks, our chips beat equivalent-speed Intel chips by 31%. I'd be a fool _not_ to use those benchmarks in my ads--and crow about how I win using impartial, open benchmarks.
In the same way, if Adobe starts to worry about GIMP and releases a Linux version of Photoshop, and it blows away GIMP on the same machine using impartial, open benchmarks, they'll use those benchmarks in their ads.
It's the same as when the PowerPC G4 did so ridiculously well on Byte's benchmarks: Apple's ads made very clear that the benchmarks came from Byte and were impartial (hell, if anything, Byte was biased _against_ Apple).
In the end, I'm not sure open benchmarks will improve things all that much. In the examples above, Intel can just make sure that the new PIII Argon improves over the PIII Xeon in the specific areas that will matter.
And if there are too many benchmark suites, it just gets confusing. AMD can advertise that they win on benchmarks A, B, and C, and Intel that they win on D, E, and F--and then who do you believe?
Re:We do need it... I suppose.. (Score:1)
Imagine you're comparing, say, the IBM and Motorola next versions of the PowerPC. They both advertise benchmarks where they clobber each other. In Motorola's case, they're using lots of stuff that takes advantage of Altivec; in IBM's, they're using lots of stuff that takes advantage of parallelism with multiple cores. If you go read the benchmarks and what they were designed for, then maybe you have enough information to figure out that for your usage, two chips with Altivec is better for you than a single double-core chip without Altivec, or vice-versa. But you could probably figure that out without even looking at benchmarks....
But what is a non-biased test going to do in a situation like that? How much is the "right" amount of FPU usage for the test? Or 128-bit vector processing? It really depends on what you do.
I don't think you can do better than specific application tests. If I spend most of my time waiting for a certain set of GIMP filters to do their thing, what could possibly be a better test for me than one that exercises those filters, on the types of images I use most often?
So an ideal test would have to contain a huge variety of application subtests, and hopefully everyone could look at the particular subtests they care about.
Re:Worthless benchmarks (Score:1)
However, if this is the kind of thing you do all the time, then it's to your benefit if someone tweaks their hardware to improve it.
Apple, with help from Adobe, has gone out of their way to make often-benchmarked Photoshop tests run faster on their machines. But this same effort also makes my actual work with Photoshop go faster--so I have no complaints.
Unfortunately, the end result of optimizing one task may be that other tasks, like starting up Netscape or scrolling in Word, aren't as fast as they could be. Maybe that doesn't matter to you; maybe it does.
The compilation test is a little more problematic. It's very easy to make a compiler run faster by making the ultimate compiled code slower, or even less accurate. That's obviously something you wouldn't be happy with.
Re:Funny... (Score:1)
---
Benchmarks are a fluke, but necessary... (Score:1)
I'll pick on a popular tool as an example: WebBench. When you talk to folks who do web-related hardware/software, you inevitably hear about WebBench. ("We get XXX connections/sec with WebBench.") For as terrible as WebBench is, it does provide a standard tool for testing. It's free (as in beer), and anyone can download it from ZDLabs. In other words, I can set up a testbed and compare my numbers with the manufacturers' numbers. If mine are substantially lower, I can go back and ask for an explanation. This gives me power as the buyer. As a developer, it gives me a way of figuring out whether or not my product can compare against another.
But there are a lot of little problems with WebBench that really make it suck. For one, it only runs under Win98. This means I have to physically be at the test location to work with it. Another big problem is that when I want to change the parameters of the test, I have to go restart all of my clients -- very tedious. And of course, my favorite -- try running WebBench for an overnight test. It'll crash after a few hours. =( If there were an OSS version of WebBench I could at least see why/where it's crashing. I could also benefit from a lot of other people using it, putting contributions back to make it more efficient, feature-rich, and stable. And the best part -- there would be no doubts about how I tweaked the software to make it work better for my config.
Does this mean all benchmarks will be fair? Hell no -- you still can't cure clever sales folks who know how to graph things just the right way to exaggerate their benefits over another product. But at least there will be less voodoo behind how their benchmarks were derived.
'A Modest Proposal': about the title (Score:1)
Re:Here are some suggestions... (Score:1)
Re:BogoMIPS (Score:1)
The first one I get with a 2.3.99pre3 kernel and the second one with a 2.3.41 kernel. I have a 650 Athlon running on a 105 MHz bus, thus effectively running at 682.5 MHz. Superbypass is also enabled.
So what does this say about the reliability of BogoMIPS - or can someone explain a little more?
Re:BogoMIPS (Score:1)
Benchmarking Methodology (Score:1)
My job is to implement and/or alter open source or off-the-shelf software for various new services my company provides. Choice of software and platform is always *very* difficult, and, as you could imagine, performance and reliability play a large part in that decision.
I would like to see a set of non-proprietary, transparent, applied benchmarks developed for important software packages, that would allow me to say "I will use that package, because..." without having to go on rumour, guesswork, sales literature, or, worse still, having the good projects given over to the M$ development people...
I'm not saying it would be easy to develop well, and it still has huge capacity to be abused by those trying to credit their product (or discredit others'). If someone can get it right, and gain the trust of the commercial world, it could also do the open source movement big favours in the corporate world. We *shall* have Apache on that server, we *will* use Samba, we *can* have OpenLDAP...
Big ifs. I'll be quiet now.
The problem is Van Smith - we need a filter! (Score:1)
The more clued-in Slashdotters have Jon Katz filters installed on the page, but we also need to figure out a way to get a filter on any link to a Van Smith article - he truly is a moron, and one of the least competent "cyber-journalists" at work today.
The REAL problem with Benchmarks (Score:2)
As the old saying goes, "There are lies, damn lies, and statistics", and benchmarks are the most advanced form of statistics. Draw your own conclusions..
umpteen pages of intro (Score:1)
To eliminate hardware bias, write all the benchmark code to the lowest common denominator, perhaps Knuth's MIX/MMIX [stanford.edu] architecture. If you want to know how much your particular hardware is being under-optimized, run the benchmarks under HP's Dynamo [slashdot.org] or the equivalent.
Not worth the web space it occupied (Score:1)
Seriously, there are exactly two benchmarks that really make any difference:
Of these two benchmarks, the second is obviously the most important. (Surprisingly, the Quake FPS freaks aren't far off from the truth here).
Anyone that really wants to know about benchmarking should read the relevant papers or at least read The Art of Computer Systems Performance Analysis [barnesandnoble.com] by Raj Jain (and no, I'm not a shill for Barnes & Noble but that's one spot where you can get the book).
If Van Smith got paid for that article, he should be forced to eat it... byte by byte (sorry, couldn't resist :-). -Brazilian
I'll tell ya who. . . (Score:3)
So to answer your question: Tom's Hardware, and other reputable benchmarking authorities, would use it. TH has rapidly become one of the highest-integrity, best-respected hardware/computing sites around, even (indeed especially) for the Windows crowd. (After all, Win32 is still the dominant gaming platform.) If such a thing as open benchmarking became popular, then commercial entities would be FORCED to use the open benchmarks or be accused of marketing skewed numbers, whether those accusations had merit or not.
Re:Uhhh....Everyone if it is the standard. (Score:1)
"IRCache Polygraph Homepage" [ircache.net]
For web cache benchmarking they are the gold standard, and everyone in the industry (even the much-maligned IBM) shows up to the IRcache events and uses Polygraph results in their ad copy.
If you do it right, open source benchmarks can become the standard in an industry...in fact, I think if you do it right, they will become the standard in an industry. Good companies want a level playing field. Bad companies will be weeded out eventually...
And that's all I have to say about that.
We do need it... I suppose.. (Score:2)
You would think getting a non-biased test would be fairly easy... or perhaps it should be up to the manufacturers to make the benchmarks... that way they could bend the truth as much as possible... but if everyone is bending the truth, then it will all even out.
Yup.
Xscheme Benchmarks (Score:1)
jim
Re:Generic benchmarks are useless (Score:2)
Re:Benchmarks should not be Open Source (Score:2)
I *want* compiler writers and microarchitecture designers to optimize for reasonably well-designed benchmarks, such as Spec. I *want* compilers to recognize critical code fragments, idioms, and kernels in the Spec benchmarks and emit perfectly scheduled code. I don't care if the compiler can't make the optimization in the general case; when I write scientific code, I take care to write common constructs such as dot products in the standard style, so that compilers that target SpecFP (Compaq's ccc and SGI's compilers are excellent in this regard) can pattern-match the code and produce good output. I want microarchitecture designers to include elements that make their chips run Spec fast, since if a benchmark in Spec runs quickly and my computational task is similar, it will most likely benefit from any architecture changes as well. Thus, selecting good benchmarks in a suite is utterly critical if the benchmark number is to have any value at all; moreover, there are many incidental benefits to selecting benchmarks that represent commonly used tasks or programs.
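For illustration, the "standard style" I mean is just the canonical loop (my example, not code lifted from Spec); this is the form such compilers are built to recognize and software-pipeline, and clever hand-rolled rewrites often defeat exactly that pattern matching:

    /* Canonical dot product: the idiom SpecFP-targeting compilers
       pattern-match and turn into perfectly scheduled code. */
    double dot(const double *x, const double *y, int n)
    {
        double sum = 0.0;
        int i;
        for (i = 0; i < n; i++)
            sum += x[i] * y[i];
        return sum;
    }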
Re:Here are some suggestions... (Score:2)
With regard to your comment about realistic computer usage: those tests that you suggest can be done in such a minute duration of time on any modern computer that they are _not worth testing_! It's essentially "fast enough" for any possible user; let's face it, the CEO's secretary does not need a P-III 800 or an Athlon, to say nothing of an Origin 2000, Starfire, RS/6000, or AlphaCluster.
Here are some suggestions... (Score:4)
Re:BogoMIPS (Score:1)
QuakeIII? (Score:2)
QuakeIII is the most 31337 benchmark in existence. Don't you guys realize what Carmack really was doing when he did a Linux port of QIII?
You need nothing other than QIII for reliable benchmarking of a system.. and any other benchmarks just don't matter! Nuff said..
:)
Bogomips do rock, but... (Score:2)
It would be great to have tools like that, and create a repository of the results.
Benchmarks (Score:2)
Benchmarks should not be Open Source (Score:2)
Unrelated note: A Modest Proposal was an essay by Jonathan Swift proposing that poor people sell their babies as food. It was satirical and shocking, but most of all, very entertaining.
"Assume the worst about people, and you'll generally be correct"
Useful benchmarks (Score:2)
Most companies that develop systems requiring some form of benchmark are likely going to have to develop their own, using prototypes of their application; I can't see anything arbitrary being very helpful in predicting how a particular system will perform in comparison with any other system.