Casting a Jaundiced Eye On AnTuTu Benchmark Claims Favoring Intel 82
MojoKid writes "Recently, industry analysts came forward with the dubious claim that Intel's Clover Trail+ low power processor for mobile devices had somehow seized a massive lead over ARM's products, though there were suspicious discrepancies in the popular AnTuTu benchmark that was utilized to showcase performance. It turns out that the situation is far shadier than initially thought. The version used in testing with the benchmark isn't just tilted to favor Intel — it seems to flat-out cheat to accomplish it. The new 3.3 version of AnTuTu was compiled using Intel's C++ Compiler, while GCC was used for the ARM variants. The Intel code was auto-vectorized, the ARM code wasn't — there are no NEON instructions in the ARM version of the application. Granted, GCC isn't currently very good at auto-vectorization, but NEON is now standard on every Cortex-A9 and Cortex-A15 SoC — and these are the parts people will be benchmarking. But compiler optimizations are just the beginning. Apparently the Intel code deliberately breaks the benchmark's function. At a certain point, it runs a loop that's meant to be performed 32x just once, then reports to the benchmark that the task completed successfully. Now, the optimization in question is part of ICC (the Intel C++ compiler), but was only added recently. It's not the kind of procedure you'd call by accident. AnTuTu has released an updated "new" version of the benchmark in which Intel performance drops back down 20-50%. Systems based on high-end ARM devices again win the benchmark overall, as they did previously."
But still... (Score:3, Insightful)
Re:But still... (Score:3, Insightful)
if you use icc instead of gcc for x86 then you should use the ARMCC compiler or Keil or one of the others for arm.
Yet it does not make this "benchmark" honest. (Score:5, Insightful)
Time for ARM to invest in GCC (Score:3, Insightful)
ARM looks like a sore loser here.
>GCC isn't currently very good at auto-vectorization, but NEON is now standard on every Cortex-A9 and Cortex-A15 SoC
So the conclusion is to remove intel optimizations instead of improving ARM ones?
Re:Benchmarks, trustworthy? (Score:5, Insightful)
To be fair, any use of a benchmark to judge which system to buy is pretty silly. The best benchmark you can make is something that is identical to your intended workload; eg play a game or use an application on several systems, and see which feels better to you.
Taking some code written in a high-level language and compiling it for a platform is a great benchmark - if that's what you're going to be doing with the system. But you'd better be using the compiler you'll be using on the system. If you need free, you should test GCC on both. If you are considering buying Intel's compiler (it's not free, is it?), then add it in as another test to see if it's worth the extra outlay of cash. Intel puts a lot of work into making compilers very good on its systems, so if you're going to use the Intel compilers for Intel systems, it's perfectly valid to compare against using GCC on an ARM platform, if that's what you'd be using on ARM.
But if most of what you're running will be compiled in GCC for either platform, yes, you should absolutely test GCC on both.
That said, much of what's noted isn't necessarily intentional wrongdoing. For the example of breaking functionality, it's quite possible that the compiler made a perfectly valid optimization to get rid of 31 of the 32 loop iterations. One of my professors once told a story about how he wrote a benchmark, and upon compiling it, found that he was getting some unbelievably fast results. As in literally unbelievable - upon investigation, he discovered that the main loop of the benchmark had been completely optimized away, because the loop was producing no externally visible results. (As an example, if the loop were to do "add r3 = r2, r1" 32 times, a good compiler could certainly optimize that down to a single iteration of the loop; as long as r2 and r1 are unchanging, then you only need to do it once. Similarly, even if r1 and r2 are changing on each iteration, you need to use the result in r3 from each iteration of the loop, otherwise you could optimize it to only perform the final iteration, and the compiler could pre-compute the values that would be in r2 and r1 for that final iteration.)
So perhaps it's a bad benchmark - but I wouldn't default to calling it malicious, just that the benchmark isn't measuring what you might want it to measure. And quite frankly, most users aren't going to be doing anything that even vaguely resembles a benchmark anyway, so they really have little justification to make a buying decision based on them.