Intel Dismisses 'x86 Tax', Sees No Future For ARM 406
MrSeb writes "In an interview with ExtremeTech, Mike Bell — Intel's new mobile chief, previously of Apple and Palm — has completely dismissed the decades-old theory that x86 is less power efficient than ARM. 'There is nothing in the instruction set that is more or less energy efficient than any other instruction set,' Bell says. 'I see no data that supports the claims that ARM is more efficient.' The interview also covers Intel's inherent tech advantage over ARM and the foundries ('There are very few companies on Earth who have the capabilities we've talked about, and going forward I don't think anyone will be able to match us,' Bell says), the age-old argument that Intel can't compete on price, and whether Apple will eventually move its iOS products from ARM to x86, just like it moved its Macs from PowerPC to x86 in 2005."
Well... (Score:5, Insightful)
Turn that boat around (Score:5, Insightful)
Now they are putting all their engineering muscle into minimizing power requirements, while maintaining high performance.
I don't see any reason to think they won't succeed, and if they do, then ARM will end up a niche architecture.
He's mostly right (Score:5, Insightful)
Compounding this fact, ARM isn't that great of an architecture. It's got variable length instructions, not enough registers, microcoded instructions, and a horrible, horrible virtual memory architecture.
The big thing that ARM has is the licensing model. ARM will give you just about everything you need for a decent applications SoC: processor, bus, and now even things like GPUs and memory controllers. Sprinkle in your own company's special sauce, and you have a great product. All they ask is a little bit of royalty money for every chip you sell. And since everyone is using pretty much the same ARM core, the tools and "ecosystem" are pretty good.
But there's not much of an advantage to the architecture... the advantage is all in the business model, where everyone can license it on the cheap and make a unique product out of it.
And nowadays, the CPU is becoming less important. It's everything around it -- graphics, video, audio, imaging, telecommunications -- that makes the difference.
Re:Speed versus complexity (Score:5, Insightful)
For example, taking your point about data bandwidth, because the x86 has so few registers, it has to do data IO a lot more compared to something like the PowerPC or SPARC.
To make up for that, Intel built a lot of logic into microcode and pipelining. It was a lot of work, but they did it well, so x86 gets acceptable performance. All that extra logic takes power, though. So Intel has a tradeoff between power consumption and performance that they can make. This guy seems to be saying they will switch to reduce power consumption, and then make up for it by having the best manufacturing process once again.
And they do. For probably as long as chips continue to get smaller, Intel will have the advantage.
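The register-pressure point above can be sketched with a toy model: pretend the register file is an LRU cache of live values, and count the reloads forced when it's too small. The register counts are rough working assumptions (about 6 freely usable GPRs on 32-bit x86, about 13 on ARM), and this is nothing like a real compiler's allocator -- it just shows the shape of the effect.

```python
from collections import OrderedDict

def spill_loads(refs, num_regs):
    """Count reloads forced when only num_regs values fit in
    registers, spilling the least-recently-used value to memory.
    A crude model of register pressure, not a real allocator."""
    regs = OrderedDict()
    loads = 0
    for v in refs:
        if v in regs:
            regs.move_to_end(v)           # already in a register
        else:
            loads += 1                    # must load from memory
            if len(regs) == num_regs:
                regs.popitem(last=False)  # spill the LRU value
            regs[v] = None
    return loads

# Eight live values touched round-robin, four times over:
refs = list(range(8)) * 4
x86_loads = spill_loads(refs, 6)   # ~6 freely usable GPRs (32-bit x86)
arm_loads = spill_loads(refs, 13)  # ~13 usable GPRs (ARM)
```

With a working set one value larger than the register file, LRU thrashes and every access becomes a memory load, while the larger file absorbs everything after the first touch -- which is the extra data I/O the comment above is talking about.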
He's missing the point... (Score:5, Insightful)
Definition of "efficient" (Score:5, Insightful)
From Intel: Work done per watt
From ARM: System power draw small enough for handheld & long battery life
A year or two ago, I read a study that the most ops/watt were still done by high-end Intel processors sucking tons of power each. They did so much work so fast that the per-watt work done was still beyond the tiny-power-sipping ARMs that were relatively slow but still quite capable. Has this changed in the last generation or two of CPUs?
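Those two definitions can rank the same pair of chips in opposite orders, which is why the camps talk past each other. A quick sketch with purely made-up ballpark figures (illustrative only, not measurements of any real part):

```python
def gflops_per_watt(gflops, watts):
    # Intel's yardstick: work done per watt.
    return gflops / watts

# Hypothetical figures for illustration only:
server_gflops, server_watts = 120.0, 95.0  # big desktop/server x86 part
mobile_gflops, mobile_watts = 1.0, 1.2     # phone-class ARM SoC

server_eff = gflops_per_watt(server_gflops, server_watts)
mobile_eff = gflops_per_watt(mobile_gflops, mobile_watts)

# ARM's yardstick: does the absolute draw fit a handheld budget?
HANDHELD_BUDGET_WATTS = 2.0
server_fits = server_watts <= HANDHELD_BUDGET_WATTS
mobile_fits = mobile_watts <= HANDHELD_BUDGET_WATTS
```

With these numbers the big chip wins on work-per-watt while only the small one fits in a phone -- both sides can be "more efficient" at the same time, depending on the definition.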
Re:Speed versus complexity (Score:5, Insightful)
And we know who lost that one. Badly.
We do? The world's fastest supercomputer (K computer [wikipedia.org]) is RISC-based, and ARM is RISC, so RISC seems very much alive. Also, CISC now has pipelining, which was the thing that originally made RISC awesome, and RISC has gotten more complex, so the two have evolved to be closer to each other. I'm sure there are other factors that matter more for energy efficiency (mainly transistor size), and I don't have an opinion on that, but I don't understand where you're coming from.
Re:Speed versus complexity (Score:3, Insightful)
You know, we had the same argument with RISC versus CISC architecture. And we know who lost that one. Badly.
Wait, which one lost?
RISC lost because instructions took too much space and caused cache misses.
CISC lost because it couldn't perform -- practically every CISC processor designed today is a RISC processor + instruction set translation.
As is frequently the case between two pure ideas (that are both legitimate enough to be seriously considered long enough for a decent flame war), the winner is actually a clever but "impure" choice combining the merits of both.
And the reason for that is that the bandwidth outside the processor, the I/O, is so damnably slow compared to what's possible on the die itself. That's why data transfers to and from the CPU run at about 1/30th or less of the speed at which the CPU runs internally. The only logical course of action is to do as much as you can with each byte of data coming off the bus.
Yes, which is why ARM, despite/because of being far from a true RISC, does so well: it speaks multiple instruction sets to fit more code in cache, and does some weird -- but elegantly implemented -- extra stuff to do more with data coming off the bus. (I'm thinking specifically of the inline barrel shifter, but there's a couple other, less drastic bits of cleverness I'm not thinking of ATM.) Plus there are SIMD extensions in wide use (just like x86 -- not an advantage, but not a disadvantage).
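The barrel-shifter trick can be modeled in a few lines. The `ADD r0, r1, r2, LSL #2` syntax in the comment is real ARM; the rest is a toy sketch showing why a shifted second operand lets one instruction do the work of two:

```python
MASK32 = 0xFFFFFFFF  # model 32-bit register wraparound

def add_with_shifted_operand(rn, rm, shift):
    """Model of ARM's 'flexible second operand': the barrel shifter
    applies the shift to rm on the way into the ALU, within the
    same instruction -- e.g. ADD r0, r1, r2, LSL #2."""
    return (rn + ((rm << shift) & MASK32)) & MASK32

# Multiply-by-5 in a single add: x + (x << 2) == 5 * x,
# no separate shift instruction and no multiplier needed.
result = add_with_shifted_operand(123, 123, 2)
```

On x86, the same strength-reduction trick exists (via LEA's scaled addressing), but ARM exposes the shifter to nearly every data-processing instruction.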
No reason a pure CISC instruction set from the 80s can't make the same performance, but only at the cost of added logic for interpreting instructions -- and more logic means more power.
Now this next bit is just silly, and a red herring to boot (since neither ARM nor x86 are typically "massively parallel" at all), but I'll answer it anyway:
Besides, look at Nvidia's GPU cores: They throw hundreds of cores onto the die, but it eats hundreds of watts as well. Massively parallel and simple instruction sets don't appear to translate into energy savings.
OK, now spec an x86 processor (with full modern SIMD instructions, naturally) that can do the operations typically done on such a GPU at the same speed. Or rather, spec a multi-processor machine or cluster with n CPUs, since that's what you'll need... Got it? Now multiply the TDP of that CPU by n, and compare. Oh, it looks like the massively parallel and simple instruction set does translate into energy savings for the same parallel-friendly workload.
Re:Definition of "efficient" (Score:5, Insightful)
Re:Speed versus complexity (Score:5, Insightful)
No one who had never seen x86 would design an instruction set like it. It exists this way not because someone designed it from scratch but because it is the end result of a long series of backward-compatible decisions stretching all the way back to the 4004. Every time Intel tries to start from a clean slate, those CPUs do not take off or get enough time in the marketplace to prove themselves. The customers always demand that the new CPUs be able to run old software.
It's actually a surprise that ARM is taking off more in higher-end systems (higher-end meaning tablets and smartphones). I think this is precisely because backward compatibility is not necessary there.
Re:Speed versus complexity (Score:5, Insightful)
The GP didn't say anything of the sort. He was pointing out that "CISC won" is only true in the sense that x86 is CISC and Intel spent gobs of money to stay at the forefront of CPU manufacturing technology, both in shrinking die size/increasing clock speed and in shoehorning all the negative characteristics of the x86 design into a form that was more RISC-like, so it could allow for super-scalar and deep-pipeline designs. Intel deserves a lot of credit for proving just how far CISC design can go. But it certainly wasn't that CISC won because it had greater strengths.
Sounds like Linux on the x86, actually. Seriously, though, RISC design tends to have a few very strong elements: it tends to have a good many registers, which saves a lot of cache/stack work; it tends to have a fixed opcode size and require aligned memory, which usually improves throughput and allows for a much more streamlined instruction-decoding engine; and precisely because there's a lot less need to support legacy platforms, there's a lot more leeway to segment memory for power considerations.
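The fixed-opcode-size point can be illustrated with a toy decoder sketch. The one-byte-vs-three-byte length rule below is invented for illustration, not any real encoding:

```python
def decode_fixed(blob):
    # Fixed four-byte opcodes (RISC-style): every instruction boundary
    # is known in advance, so a decoder could split -- and decode --
    # many instructions in parallel.
    return [blob[i:i + 4] for i in range(0, len(blob), 4)]

def decode_variable(blob, length_of):
    # Variable-length opcodes (CISC-style): each boundary depends on
    # first decoding the previous instruction, an inherently serial
    # dependency that real x86 decoders spend extra logic to hide.
    insts, i = [], 0
    while i < len(blob):
        n = length_of(blob[i])
        insts.append(blob[i:i + n])
        i += n
    return insts
```

The fixed-width case is a pure slice; the variable-width case carries a loop-carried dependency on `i`, which is exactly the streamlining-vs-extra-logic tradeoff the comment describes.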
Well, you can thank MS's monopolistic actions for that. Seriously, "ugly and hackish"* might well describe nearly everything MS and Intel are known for in their quest to maintain backwards compatibility. And if Intel had started out with an 8-bit RISC design, I'm certain there'd be the same problems, so it's not really an x86/CISC thing. Nevertheless, it's precisely because Intel is unlikely to let the platform fragment that x86 will probably never be low power.
*And please realize, I say this with a great deal of respect for both Intel and MS in maintaining performance, given how many hacks they've put in over the years to compensate for not only their own bugs but the bugs of other developers. So, as pretty and clever as a lot of the hacks may be, it's still ugly overall to have the hacks in the first place, to have so many in so many places, and to be so incapable of removing any without risking significant backlash or simply losing their customer base. I.e., the code may be pretty, but it's put them in an ugly place.
ARM has some advantages (Score:5, Insightful)
They don't have all the legacy instruction set issues to deal with. Intel must remain backward compatible with all previous versions; remember, the 8080 subset is still alive and well in the Intel architecture. This comes with a cost.
It's easier to move up from a lower-power system to a higher-power system. In this context "power" can be thought of as both electrical power consumption and compute power. Moving down means something must be simplified or eliminated, and the backwards-compatibility issues make this much harder.
When it comes to mobile devices, ARM owns the market and has the network effect working for it. This is how Intel kept a stranglehold on the PC market, but it works against them in mobile.
ARM is not monolithic in the same way as Intel. Because of the license-based IP model, there are many more variations of ARM chips than Intel chips. The resources to make variations come from the IP user base, not from ARM. A single company, no matter how dominant, cannot afford to support that many variants. If some of the versions fail, the cost is not borne by ARM. If Intel guesses wrong and makes a dud, they have to absorb the cost.
Intel is no pushover, but I think ARM has the advantage.
Re:Speed versus complexity (Score:4, Insightful)
Unfortunately, every time you add circuitry like that, you also increase power consumption, which is where the difficulty comes in for Intel as it tries to make the tradeoff between power consumption and performance.
Re:Speed versus complexity (Score:2, Insightful)
Special operating modes in which R8-R12 aren't available are limited to certain exceptionally-low-latency-interrupt code, or to some supervisor modes. So unless you're actually in the code bridging to your kernel, or in an interrupt, R8-R12 are available for use. Even in those other modes, R8-R12 can still be used if you preserve them so that you don't trash whatever regular code was running.
By that same uncharitable reasoning, x86 has NO general-purpose registers, since in some operating modes (like handling interrupts) you have to preserve their contents before using them! How awful that on ARM you have to do that for 4 out of 16 registers, because the hardware doesn't switch them for you.
Incidentally, there is no such thing as a stack pointer register on ARM. You can use the stack operations on any R0-15 register. Common C compilers use an ABI such that R13 is reserved for use as a stack pointer, but that's not an architectural requirement. There's nothing different between R13 and the other registers R0-12 - all the same instructions behave in all the same ways.
R14 and R15 are certainly 'special', but you can still use them in virtually any instruction and any addressing mode.
So for points of comparison, in normal operation, I'd claim that x86 has 4 general purpose registers, 4 effectively reserved for addressing magic since you're limited in what instructions can operate on them, and some others. On ARM, I'd claim 13 general purpose registers, with 2 that were special, and some others.
Re:Speed versus complexity (Score:4, Insightful)
Early ARM chips didn't have an integer divide instruction because it took up to 12 cycles[1] to perform integer division and you could get the same performance without complicating the pipeline without it. Integer division is often cited as one of the main reasons why RISC had problems, because newer techniques reduced the number of cycles required to perform integer division, so newer CISC chips just used those in place of the old microcoded loops, while RISC code got no benefit unless the instruction set was extended and the code recompiled. Modern RISC chips - including ARM - do have integer division instructions though, and compilers use them, so this is something of a moot point.
[1] It was a variable number, which made life very difficult for hardware designers. One of the benefits early RISC architectures had was the fact that their instructions took the same length of time to execute, so the pipeline could be very simple.
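The kind of multi-cycle loop being discussed is the textbook restoring shift-subtract algorithm, one quotient bit per iteration; early implementations could exit early or skip iterations, which is where the data-dependent latency came from. This is a sketch of the algorithm, not a model of any specific ARM core:

```python
def udiv32(n, d):
    """Restoring shift-subtract unsigned division: one quotient bit
    per loop iteration, i.e. up to 32 'cycles' for a 32-bit dividend.
    Returns (quotient, remainder)."""
    assert d != 0
    q = r = 0
    for i in reversed(range(32)):
        r = (r << 1) | ((n >> i) & 1)  # shift in the next dividend bit
        q <<= 1
        if r >= d:                     # does the divisor fit?
            r -= d
            q |= 1                     # yes: this quotient bit is 1
    return q, r
```

A single-cycle hardware divider replaces this whole loop, which is why recompiling for a newer instruction set with a real divide instruction pays off where old microcoded CISC binaries got the speedup transparently.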
Re:Speed versus complexity (Score:4, Insightful)
As far as the IPC difference between Intel and ARM, I'm going to side with Intel this time and say that architecture doesn't really matter. The back-end of these chips all run RISC-like. Cache sizes are going to be similar and the Intel core isn't all that sophisticated. There is no reason to believe that, at a given frequency, x86 performance will be significantly better than ARM performance. The argument is whether or not, at a given frequency, the added area required to decode x86 represents a significant additional power draw (or, worse yet, additional pipeline stages, which would have a detrimental impact on x86 performance.)
As far as fabs go, Intel is playing this in an interesting way. Intel seems to be using mobile chips as a way to keep their older fabs busy. This makes the mobile chips very nearly free for them to manufacture. They're just keeping up with ARM, rather than moving to their current process and absolutely blowing them away. So, let's be clear: Intel could be a die shrink ahead of where they are, which probably would make the x86 cores on a newer process better than the ARM ones on an older process. Intel is staying on the old process for cost reasons, not performance ones.
AMD doesn't really have anything that plays in the mobile space, but their closest comparison is Bobcat. Bobcat is a pretty good core for the power envelope it works in. I think AMD could build an x86 core for the mobile space, if they wanted to. The real problem is that they couldn't maintain current performance while using a back-level process to compete with Intel on cost. In some ways Intel might prefer that they could, as it might make x86 in the mobile space seem less like locking yourself into a single vendor, indirectly helping Intel sell Medfield.