
Slashdot Asks: What's Your View On Benchmark Apps?

There's no doubt that benchmark apps help you evaluate different aspects of a product, but do they paint a complete picture? Should we rely entirely on benchmark apps to assess the performance and quality of a product or service? Vlad Savov of The Verge makes an interesting point. He notes that DxOMark (a hugely popular benchmark for camera testing) rates the HTC 10's camera sensor as equal to that of Samsung's Galaxy S7, yet in real-life shooting the Galaxy S7's shooter offers far superior results. "I've used both extensively and I can tell you that's simply not the case -- the S7 is outstanding whereas the 10 is merely good." He offers another example: "If a laptop or a phone does well in a web-browsing battery benchmark, that only gives an indication that it would probably fare decently when handling bigger workloads too. But not always. My good friend Anand Shimpi, formerly of AnandTech, once articulated this very well by pointing out how the MacBook Pro had better battery life than the MacBook Air -- which was hailed as the endurance champ -- when the use changed to consistently heavy workloads. The Pro was more efficient in that scenario, but most battery tests aren't sophisticated or dynamic enough to account for that nuance. It takes a person running multiple tests, analyzing the data, and adding context and understanding to achieve the highest degree of certainty." The problem is that, more often than not, gadget reviewers treat these values as the most important signal when judging a product, which, in turn, influences many readers' opinions. What's your take on this?
  • Case in point: ADSL line speed. I've had several different ADSL providers, and living somewhat far out, the speed is consistently bad, sometimes awful. But if I try one of the many 'ADSL speed test' websites, the results are always in line with the promised speed. I once routed one of those through a proxy, just for the name change, and the speed was one tenth -- same if I accessed it simply by the IP address! Benchmarks are too easy to cheat. Wasn't it Intel that was caught doing that a few years ago?
    • ... same if I accessed it simply by the IP number

      Wait, how would that work? I mean, all the name->IP translation happens locally, and only IP addresses are sent out... unless they do deeper packet inspection, which seems like a high cost.

      I suppose they could parse the HTTP request headers... or listen for the DNS queries?

      • by unrtst ( 777550 )

        ... same if I accessed it simply by the IP number

        Wait, how would that work. I mean, all the name->IP translation happens locally, and only IP addresses are sent out...

        When you go to http://www.google.com/, your browser sends a header saying:
        Host: www.google.com

        When you go to http://206.111.13.26/, that's not sent.

        I suspect the speedtest site was something like HisProvidersName.speedtest.net, and maybe it faked it if it got a connection from an IP within that provider.
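
        A minimal sketch of that trick, assuming a hypothetical speed-test host and IP (both made up): at the HTTP level, the only difference between browsing by name and browsing by bare IP is the Host header, which the server can key its behavior off.

        # Sketch (hypothetical host/IP): the Host header is all that
        # distinguishes "by name" from "by IP" at the HTTP level.
        import socket

        def raw_get(ip, host_header):
            """Send a bare HTTP/1.1 GET and return the raw response bytes."""
            with socket.create_connection((ip, 80), timeout=5) as s:
                request = (
                    "GET / HTTP/1.1\r\n"
                    f"Host: {host_header}\r\n"   # the server keys behavior off this
                    "Connection: close\r\n"
                    "\r\n"
                )
                s.sendall(request.encode("ascii"))
                chunks = []
                while True:
                    data = s.recv(4096)
                    if not data:
                        break
                    chunks.append(data)
            return b"".join(chunks)

        # By name: the server sees "Host: someisp.speedtest.example" and can special-case it.
        # raw_get("203.0.113.7", "someisp.speedtest.example")
        # By bare IP: the server sees only the address -- nothing to special-case.
        # raw_get("203.0.113.7", "203.0.113.7")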

    • Case in point: ADSL line speed. I've had several different ADSL providers, and living somewhat far out, the speed is consistently bad, sometimes awful. But if I try one of the many 'ADSL speed test' websites, the results are always in line with the promised speed.

      Not every place you visit (in fact, likely most places) will fully saturate your downstream link. They might have the bandwidth to be capable of doing so, but they ration it on a per-session (sometimes per-IP) basis so that everybody who happens to access the site can get a reasonable speed. (By the way, this is the principle that so-called "download accelerators" take advantage of -- they split one download across multiple sessions. They won't get around a per-IP cap, though, unless you can do something like multipath TCP. A sketch of the idea follows.)
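
      As a hedged illustration of that workaround (not any particular tool), here is a minimal sketch of a "download accelerator" fetching byte ranges over parallel connections; the URL is hypothetical, and the server must support Range requests.

      # Sketch: split one download across N parallel Range requests, the way
      # "download accelerators" dodge per-session throttling. Hypothetical URL.
      import concurrent.futures
      import urllib.request

      URL = "http://files.example.com/big.bin"   # hypothetical file

      def fetch_range(start, end):
          req = urllib.request.Request(URL, headers={"Range": f"bytes={start}-{end}"})
          with urllib.request.urlopen(req, timeout=30) as resp:
              return start, resp.read()

      def accelerated_download(total_size, parts=4):
          chunk = total_size // parts
          ranges = [(i * chunk,
                     total_size - 1 if i == parts - 1 else (i + 1) * chunk - 1)
                    for i in range(parts)]
          buf = bytearray(total_size)
          with concurrent.futures.ThreadPoolExecutor(max_workers=parts) as pool:
              for start, data in pool.map(lambda r: fetch_range(*r), ranges):
                  buf[start:start + len(data)] = data
          return bytes(buf)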

  • by Prien715 ( 251944 ) <agnosticpope@gmail. c o m> on Friday April 29, 2016 @03:01PM (#52015093) Journal

    A good benchmark -- in cameras, CPUs, GPUs, cars, anything really -- is ideally a set of tests which contains a random sampling of real-world scenarios. In the beginning, the benchmark is good precisely because the vendors are unaware of it and don't spend a bunch of time trying to optimize for it specifically.

    Once a benchmark becomes popular, companies try to make their product better for the benchmark ("See PHB! I increased our PCBench score by 10%!") but CAN ultimately end up doing so in a custom way that doesn't represent real-world performance (e.g. Volkswagen). Because the company is now specifically trying to optimize for a specific use-case, the benchmark is no longer random and thus no longer representative of real-world use.

    Enter a new benchmark that is really good and better mirrors real-world performance, and the cycle begins anew.

    • I have a friend working on... a popular web browser. They test JS performance (theirs and competitors') all the time against benchmarks. In theory, those benchmarks are derived from looking at the 1000 most popular sites (according to some site-ranking algorithm). If that's true, then that seems to be a valid(ish) benchmark. I mean, those 1000 sites probably account for the vast majority of traffic, and other sites probably model themselves after those 1000 sites.

    • by unrtst ( 777550 )

      Once a benchmark becomes popular, companies try to make their product better for the benchmark ("See PHB! I increased our PCBench score by 10%!") ...

      Slight tangent from this: when management of any kind starts running the benchmarks / tests / security scanners / etc., watch out! Suddenly there's a huge red flag that must be fixed immediately, and it's just an internal-only static site with a self-signed cert.

  • Boot-up time and Photoshop filters. Use a BitTorrent client to measure internet speeds; "speed test" web sites are bogged down by traffic.

  • An unpleasant side effect of benchmarking is when manufacturers start building products to do well on the benchmark to the detriment of other, also important, specs. So, while the product may kick ass on the common benchmarks, it may not be so great, because other important stuff gets neglected. The benchmark process starts steering the design "committee".
    • You get what you measure. Unfortunately, my use cases and the majority's are not the same.

    • Companies have been known to take this even further. You can probably find plenty of compilers that have something like, "if(this_looks_like_benchmark_x) emit_special_code_for_benchmark_x". I know for a fact that the old Sun compiler could detect a matrix multiply and would emit hand-tuned, parallelized assembly when it detected it (a toy sketch of the idea appears after this comment).

      Vendors will always play games with benchmarks and customers will always read things into benchmarks that aren't true. That's not to say that benchmarks aren't useful but, if
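
      A toy sketch of that special-casing, with everything invented (no real compiler shown): a dispatcher pattern-matches a workload that "looks like" a known benchmark kernel and swaps in a tuned path, which is exactly why the benchmark stops measuring the generic case.

      # Toy benchmark special-casing: detect the famous kernel, take a
      # hand-tuned fast path; everything else gets the generic slow path.
      import numpy as np

      def looks_like_matmul(a, b):
          # crude structural check, standing in for a compiler's pattern matcher
          return (isinstance(a, np.ndarray) and isinstance(b, np.ndarray)
                  and a.ndim == 2 and b.ndim == 2 and a.shape[1] == b.shape[0])

      def multiply(a, b):
          if looks_like_matmul(a, b):
              return a @ b          # tuned, parallelized path -- benchmark flies
          # generic slow path: naive triple loop over lists of lists
          return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
                  for row in a]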

  • >> What's Your View On Benchmark Tools?

    So... who are the "tools" -- the shysters creating the benchmarks or the rubes consuming them?
  • by Anonymous Coward

    That's the conclusion I've mostly come to, at least for complete consumer products.

    When I look at the latest Dell, Apple, etc. desktop or laptop, I already see the figures available from the maker, and often there are at least a few choices in terms of CPU, RAM, or SSD options. The only way performance from one item to another would be considerably different would be if one OEM made a major error.

    On the other hand there are things that are hard to tell from the spec sheet that make a huge difference for me:
    Is

    • Does the case feel like it will fall apart on the first tweak?

      I've bought dozens of PCs, for myself and others. I have carried a laptop for years on a bicycle, including on snow/ice, and fallen multiple times.
      I've never replaced a desktop, or even a laptop, because of a broken case. Even the so-called "cheap plastic" laptops are more than durable enough for a lifespan of 3-10 years. And even in the unlikely case of a case break, the laptop would most likely continue to work just fine, so the problem would be only cosmetic.
      Too old and too slow, or broken disp

      • by CAIMLAS ( 41445 )

        Not sure which laptops you've bought or how they've dropped, but apparently you've not worked on others' stuff much - people break shit in some really horrible ways. Cracks in the case around the display, particularly near the hinge, are notably problematic, as are those around the keyboard. It doesn't take much of a crack for things to start not working properly.

        • Not sure which laptops you've bought

          Mostly cheap ones.

          or how they've dropped,

          That's my point, they haven't. Or they were in their protective bag when it happened.

  • by QuietLagoon ( 813062 ) on Friday April 29, 2016 @03:14PM (#52015161)
    Benchmark tools do well when they are used for what they are designed to measure. Benchmark tools go off the rails when they are seen and interpreted as some kind of all-purpose suitability tester.
  • by 110010001000 ( 697113 ) on Friday April 29, 2016 @03:21PM (#52015195) Homepage Journal
    "however, in real life shooting, the Galaxy S7's shooter offers a far superior result."
    Says who? The reviewer's "objective" opinion? These are the same guys who say a $10,000 audio cable produces "warmer" sound than a $5 one.
  • by gurps_npc ( 621217 ) on Friday April 29, 2016 @03:22PM (#52015199) Homepage

    You are not looking at God's manual for existence, to check a score, like some kind of video game.

    It's just the results from a test - helpful, but not perfect. Luck, design for the test, and many other factors may affect it.

    If all you do is look at the benchmark, you deserve to be screwed over. Doing so is like looking at new lawyers' law-school grades and making the highest scorer a partner right off the bat.

  • If you want to use a test result, you must first understand what the test is measuring. It isn't ever going to be as simple as "Laptop A got 536 and laptop B got 642, therefore laptop B is better at everything." The same thing applies to medical diagnostic tests, academic tests, or product quality tests. Unfortunately, this is hard. Because statistics is hard. And science is hard.

    Sorry. :-(

  • Benchmark: a standard or point of reference against which things may be compared or assessed.

    Yes, benchmarks do a good job of comparing two pieces of hardware, especially tests which involve the entire system. I use benchmarks all the time for hardware comparison and system optimization/overclock comparison. Without benchmark tools we couldn't effectively compare changes to settings or in hardware speed -- specifically raw CPU, raw GPU, raw RAM, and raw disk I/O speeds.

    Benchmark tools also help determine system stability by pushing the hardware to its limit and taking it to its thermal throttling point.

    So do manufacturers ship custom hardware to reviewers to cheat on benchmarks? Yes.

    Will these cheats show up in the reviews on NewEgg, Amazon, and Tom's Hardware when they can't be replicated? Yes.

    So please, benchmark away. Publish the results. Keep the data in a table for all to view. Benchmarks keep everyone honest in the end.
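
    In that spirit, a minimal sketch of a publishable, repeatable micro-benchmark -- fixed workload, several runs, best and median reported so others can try to replicate (the workload is an arbitrary stand-in):

    # Minimal repeatable micro-benchmark: fixed workload, several runs,
    # report best and median so others can replicate and compare.
    import statistics
    import time

    def workload():
        # arbitrary raw-CPU stand-in
        return sum(i * i for i in range(1_000_000))

    def bench(runs=5):
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            workload()
            times.append(time.perf_counter() - start)
        return min(times), statistics.median(times)

    best, median = bench()
    print(f"best {best * 1e3:.2f} ms, median {median * 1e3:.2f} ms")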

    • Yes, benchmarks do a good job of comparing two pieces of hardware, especially tests which involve the entire system.

      No, they usually don't. Doing a "full system test" is almost certainly not going to give you useful information. How do you weigh individual results into a final result? (A sketch of that problem follows this comment.) How do you know the vendor hasn't included special cheat modes in the hardware/software to skew the benchmark? How do you know the benchmark is even testing what it claims to be testing?

      Without benchmark tools we couldn't effectively compare changes to settings or in hardware speed -- specifically raw CPU, raw GPU, raw RAM, and raw disk I/O speeds.

      Comparing "raw" anything is probably not useful either. Discovering that increasing the CPU speed by 10% increases a benchmark score by 10% is almost

      • So does "while(true);". That doesn't make it a useful benchmark.

        This actually just gets put in the L1/L2 cache of the CPU. [stackoverflow.com]

        In general, if I use a benchmark like Cinebench [maxon.net], it correlates to real-world performance in programs like Final Cut, Adobe Premiere, and After Effects for video rendering.

        In all my years of benchmarking and overclocking, I have not found anything suspicious. Years ago there was the whole Intel vs. AMD benchmark brouhaha, where benchmarks favored Intel due to compiler optimizations favoring Intel hardware, but the CPU wars are long over. AMD lost and n

  • Systematic review is very important; however, in most cases, the system used to review is not complex enough to effectively qualify what's being reviewed.

    It's like any system used to summarize data: fundamentally you're going to get a flawed diagnosis, because it's summarized. Unless you're dealing with a huge amount of data, and the analysis thereof, the answer is almost always "it depends".

    And then there is the 'bias review' introduced in a lot of these benchmark tools. It's why open source benchmark meth

  • Trust (Score:4, Insightful)

    by wickedsteve ( 729684 ) on Friday April 29, 2016 @03:58PM (#52015381) Homepage
    I can't trust benchmarks unless they are actually doing what my device is for. I have a gaming PC, so I trust a benchmark tool that actually renders scenes like the games I play. The benchmark records things that apply to my enjoyment of games, like frames per second under various settings. If a tool just gives me a grade on some arbitrary scale, then it is no use to me.
  • DxOMark is indeed a perfect example of elaborate benchmarking and what can go wrong with it. To make a streamlined and objective test, they only measure the few things that are easiest to measure objectively across various cameras. In the end they seem to just combine these test scores and come up with a number that makes no sense if you look at real-life performance, since not only do they not measure a multitude of things that also affect performance, but in addition, the way they combine the things they

  • Benchmarks are a necessary, but not sufficient, way to test things.

    The reason for benchmarks is simple -- you want a scientifically repeatable test that can be used to compare things with each other. This limits the benchmark's utility as a real-world test because it's inherently limited in what it can test. All it gives is how your thing measures up to all the other things out there. And yes, benchmarks will be gamed, no matter the field (see VW, Mitsubishi, and everyone else with diesel engines). However,

  • Ah, the old days when you could rate a system relative to the IBM PC/XT... and even then people had the same discussion.
  • 3dmark? pretty pictures

    iobench? now that's useful

  • Benchmarks are great tools since they are repeatable and give you a picture of what your hardware, phone, etc. is capable of.

    However I've learned never to rely on the benchmarks alone as they normally don't mimic real world usage scenarios.

    TL;DR: great for reference and stress tests, bad for real-world usage.
