Intel Hardware

Intel Dismisses 'x86 Tax', Sees No Future For ARM 406

Posted by samzenpus
from the end-of-the-road dept.
MrSeb writes "In an interview with ExtremeTech, Mike Bell — Intel's new mobile chief, previously of Apple and Palm — has completely dismissed the decades-old theory that x86 is less power efficient than ARM. 'There is nothing in the instruction set that is more or less energy efficient than any other instruction set,' Bell says. 'I see no data that supports the claims that ARM is more efficient.' The interview also covers Intel's inherent tech advantage over ARM and the foundries ('There are very few companies on Earth who have the capabilities we've talked about, and going forward I don't think anyone will be able to match us' Bell says), the age-old argument that Intel can't compete on price, and whether Apple will eventually move its iOS products from ARM to x86, just like it moved its Macs from Power to x86 in 2005."
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • by Man On Pink Corner (1089867) on Thursday June 14, 2012 @10:30PM (#40331275)

    The instruction decoder is such an absurdly tiny part of a modern CPU that it really doesn't matter. CISC often has the ultimate advantage simply because it makes better use of the code cache.

  • by Anonymous Coward on Thursday June 14, 2012 @10:59PM (#40331415)

    The processor architecture is not wildly different between manufacturers. What sets them apart is the System-on-Chip design, in which the CPU is just one element. Should Intel produce a custom x86 SoC, you can expect the same.

  • by Darinbob (1142669) on Thursday June 14, 2012 @11:01PM (#40331427)

    The instruction set decoder should be an absurdly tiny part, but in modern Intel processors it isn't necessarily small: it dynamically converts the archaic x86 instruction set into an internal RISC-like one.

  • by KingMotley (944240) on Thursday June 14, 2012 @11:15PM (#40331489) Journal

    x86-64 has 16 (64-bit) general purpose registers, while 32-bit ARM exposes 16 registers of which only about 8 are truly general purpose, plus a few specialized ones, some of which are only available in certain operating modes. PowerPC and SPARC both have 32 64-bit registers but can only do register-register (load/store) operations, which quickly forces the registers to be cycled, while x86-64 can use register-memory operands, which is much more efficient.

    And yes, Intel does do a lot of microcode, pipelining, and micro-ops. The great part is that because of it, the instruction set appears CISC externally, but internally, through micro-ops, it gets 95% of the benefit you would see from RISC. Today's x86 chips are more a CISC/RISC hybrid than a pure CISC design. And of course a 200MHz SPARC performed worse than the Pentium 2s of its day, and isn't really in the same performance league as today's 4GHz x86-64 processors.

  • Re:He's mostly right (Score:5, Informative)

    by Darinbob (1142669) on Thursday June 14, 2012 @11:22PM (#40331505)

    ARM has fixed-length instructions. Thumb is a separate instruction set from ARM and is also fixed-size; you can't easily interchange ARM and Thumb without making a function call, though Thumb-2 now interchanges them more easily. The instruction decoder from Thumb to ARM is so very simple that it could be a standard project in an undergrad CS class. Thumb really is for people willing to give up some performance to save space anyway. ARM has plenty of registers compared to Thumb; I think 16 is the sweet spot, enough not to feel cramped but not so many that context switches or interrupts get in your way. ARM is not micro-coded in any model as far as I know; it is RISC and there's no reason to do any micro-coding (maybe in an FPU coprocessor?).
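    The parent's point about how simple Thumb decoding is can be illustrated with a toy decoder for a single 16-bit Thumb format, the three-register/immediate ADD/SUB encoding. The field layout below follows the classic ARMv5 "format 2" encoding; treat this as an illustrative sketch, not a reference implementation.

    ```python
    def decode_thumb_add_sub(insn: int) -> str:
        """Decode the 16-bit Thumb add/sub-register format:
        bits 15-11 = 00011, bit 10 = immediate flag, bit 9 = sub flag,
        bits 8-6 = Rm (or imm3), bits 5-3 = Rn, bits 2-0 = Rd."""
        if (insn >> 11) != 0b00011:
            raise ValueError("not an add/sub-format instruction")
        imm = (insn >> 10) & 1
        sub = (insn >> 9) & 1
        rm  = (insn >> 6) & 7
        rn  = (insn >> 3) & 7
        rd  = insn & 7
        op  = "SUBS" if sub else "ADDS"
        src = f"#{rm}" if imm else f"r{rm}"
        return f"{op} r{rd}, r{rn}, {src}"

    # 0x18C8 encodes ADDS r0, r1, r3
    print(decode_thumb_add_sub(0x18C8))
    ```

    The whole decode is a handful of shifts and masks, which is the parent's point: a Thumb front end is trivially small next to the rest of the core.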

    It does have a goofy MMU at times, but the MMU is treated as a separate coprocessor and is not intrinsic to the ARM core (a different ARM system-on-chip will handle memory mapping and VM differently; it is not standardized).

  • by bzipitidoo (647217) <bzipitidoo@yahoo.com> on Friday June 15, 2012 @12:28AM (#40331821) Journal

    x86 is ugly. It's one of the most screwed up, inconsistent, crufty architectures ever created. Motorola's 68000 architecture was a lot cleaner. But Intel, through sheer brute force, has managed to patch up many of its shortcomings and make x86 perform well in spite of itself.

    They went with a load and execute architecture for the x86 instructions. Then they didn't stick to that model for the floating point instructions, going with a stack for that. And remember they split the CPU into 2 parts. If you wanted the floating point instructions, you had to get a very expensive matching x87 chip. I still remember the week when 80387 prices collapsed from $600 to $200, and still no one would buy, not with free emulators and the 486DX nearing release. Another major bit of ugliness was the segment. Rather than a true 32bit architecture, they used this segmented architecture scheme, then buggered it up even more by having different modes. In some modes, the segment and address were simply concatenated for a 32bit address space, and in others 12 bits overlapped to give only a 20bit address space. Then you had all this switching and XMS and EMS to access memory above 1M. Nasty.
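    The real-mode segment arithmetic being complained about is easy to show concretely. A minimal sketch (the 4-bit shift-and-add is how the 8086 formed its 20-bit physical addresses; the wraparound case is the classic A20 behavior):

    ```python
    def real_mode_address(segment: int, offset: int, a20_masked: bool = True) -> int:
        """8086 real mode: physical = (segment << 4) + offset.
        With the A20 line masked (8086 behaviour), the sum wraps to 20 bits."""
        phys = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
        return phys & 0xFFFFF if a20_masked else phys

    # Many segment:offset pairs alias the same byte:
    print(hex(real_mode_address(0xF000, 0x1234)))   # 0xf1234
    print(hex(real_mode_address(0xF123, 0x0004)))   # 0xf1234

    # 0xFFFF:0xFFFF wraps on an 8086, but reaches past 1MB with A20 enabled:
    print(hex(real_mode_address(0xFFFF, 0xFFFF)))                   # 0xffef
    print(hex(real_mode_address(0xFFFF, 0xFFFF, a20_masked=False))) # 0x10ffef
    ```

    The aliasing and the A20 wraparound are exactly the kind of cruft the parent is calling nasty: the same physical byte has thousands of names, and the top 64KB minus 16 bytes only exists when a motherboard gate line is flipped.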

    x86 has been bashed for years for not having enough registers. And for making them special purpose. For instance, only one, AX, can be used for integer multiplication. Ask some compiler designers about the x86 sometime. Bet you'll get an earful.

    Few platform segregation points? Maybe, but one price is lots of legacy garbage. x86 still has to support those ancient segmented modes. Then there's junk like the ASCII adjust and decimal adjust instructions: AAA, AAS, AAD, and AAM, and DAA, and DAS. Nobody uses packed decimal any more! And hardly anyone ever used it. Those instructions were a crappy way to support decimal anyway. If they were going to do it at all, should have just had AA for ASCII Add instead of "adjusting" after a regular ADD instruction. Then there's the string search instructions, REPNE CMPSW and relatives. They're hopelessly obsolete. We have much better algorithms for string search than that. They also screwed up the instructions intended for OS support on the 286. That's one reason why the lowest common denominator is i386 and not i286. 286 is also only 16bit.
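    The "decimal adjust" cruft being mocked is easy to model: DAA fixes up the result of a plain binary ADD so the byte reads as packed BCD again. A rough Python model of the adjust step (flag handling is simplified relative to the real instruction, which also tracks AF and CF across calls):

    ```python
    def add_packed_bcd(a: int, b: int) -> int:
        """Add two packed-BCD bytes the x86 way: a binary ADD followed by a
        DAA-style fixup. Simplified for illustration; carry out is dropped."""
        raw = a + b
        aux_carry = ((a & 0x0F) + (b & 0x0F)) > 0x0F   # AF from the ADD
        if (raw & 0x0F) > 9 or aux_carry:
            raw += 0x06                                # fix the low digit
        if raw > 0x99:
            raw += 0x60                                # fix the high digit
        return raw & 0xFF

    # 19 + 28 = 47 in packed BCD:
    print(hex(add_packed_bcd(0x19, 0x28)))  # 0x47
    ```

    Two digits per instruction, with a fixup opcode after every add: it is easy to see why this style of decimal support never aged well.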

    You might be tempted to think x86 was good for its time. Nope. Even by the standards and principles of the 1970s, x86 stinks.

    Someone mentioned CISC, as if that beat out RISC? It didn't. Under the hood, modern x86 CPUs actually translate each x86 instruction to several RISC instructions. So why not just use the actual RISC instruction set directly? One argument in favor of the x86 instruction set is that it is denser. Takes fewer bytes than the equivalent action in RISC instructions. Perhaps, but that's accidental. If that is such a valuable property, ought to create a new instruction set that is optimized for code density. Then, as if x86 wasn't CISC enough, they rolled out the MMX, SSE, SSE2, SSE3, SSE4 additions.
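    The density argument above can be made concrete: "increment a counter in memory" is a single two-byte instruction on x86 but a three-instruction load/add/store sequence on a fixed-width 32-bit RISC. A sketch using the actual x86 encoding (the RISC side is a generic 4-bytes-per-instruction stand-in, not any specific ISA's encoding):

    ```python
    # x86: inc dword ptr [eax]  ->  opcode 0xFF, ModRM 0x00 (two bytes total)
    x86_inc_mem = bytes([0xFF, 0x00])

    # Fixed-width 32-bit RISC: load, add-immediate, store. Placeholder
    # encodings; what matters is three instructions at 4 bytes each.
    risc_inc_mem = bytes(4) * 3

    print(len(x86_inc_mem), len(risc_inc_mem))  # 2 12
    ```

    A 6x gap is an extreme case, but it shows where x86's code-cache advantage comes from, accidental or not.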

    That makes a powerful argument in favor of open source. Could drop all the older SSE versions if only all programs could be easily recompiled.

  • by dgatwood (11270) on Friday June 15, 2012 @12:44AM (#40331911) Journal

    Three watts isn't even close to usable for a mobile phone. At that level of power consumption, you would either have to charge your phone every half hour (by the time you add in the chipset consumption) or build a phone that looks like one of those old portable phones from the 1980s with the small suitcase attached....

    Intel's latest Atom offerings, however, claim to draw about two orders of magnitude less power than that at idle, and are thus in the ballpark for being usable for phones and similar devices. It remains to be seen who will adopt it.

    BTW, last I read, a 2GHz Cortex A9 CPU based on a 40 nm process drew about 250 mW max, not 2W, though those numbers could easily be wrong.

  • by Dahamma (304068) on Friday June 15, 2012 @01:04AM (#40331987)

    And the most insightful post of the thread is from an AC... if you had posted non-AC I might have modded you up ;)

    It also points out how the GP post's complaint about slow off-die IO is overrated and really not all that relevant to the mobile/embedded space.

    ARM is winning the embedded STB/TV/BD/phone wars because their core is tiny and integrates well in SoCs. Many of these SoCs have graphics, Ethernet, Wifi, USB, SATA, HW crypto, MPEG decoding, etc all on die, on a $10-20 part. Intel may have something a bit faster, but they don't have anything close in overall features for that price.

  • by KingMotley (944240) on Friday June 15, 2012 @01:07AM (#40332001) Journal

    Yeah, right. 37, of which you can only ever use at most 16 at a time. Of those, 5 are already taken, and personally I wouldn't call the flags register a general purpose register, nor the stack pointer, etc., but apparently they do, lol. Also, look at the nice graph right below your quote: during fast interrupt routines you have only 5 registers free to use (R8-R12), in user mode you have only 13 (R0-R12) free, and during an IRQ you have 0 free? lol.

    So you have R0-R7 which are what most would consider general purpose, R8-R12 are special and only available in certain operating modes, and R13,14,15 aren't what most would consider general purpose.

  • by rev0lt (1950662) on Friday June 15, 2012 @01:42AM (#40332163)

    Then they didn't stick to that model for the floating point instructions, going with a stack for that. And remember they split the CPU into 2 parts. If you wanted the floating point instructions, you had to get a very expensive matching x87 chip.

    ... The same as Motorola (http://en.wikipedia.org/wiki/Motorola_68881). They began integrating an FPU at about the same time (68040/486DX).

    Another major bit of ugliness was the segment. Rather than a true 32bit architecture, they used this segmented architecture scheme, then buggered it up even more by having different modes.

    You mean, having a 16-bit CPU support a FULL MEGABYTE instead of the usual 64KB of RAM? In 1979? Pure evil.

    In some modes, the segment and address were simply concatenated for a 32bit address space, and in others 12 bits overlapped to give only a 20bit address space. Then you had all this switching and XMS and EMS to access memory above 1M. Nasty.

    You do know that XMS memory is just linear memory above real-mode memory, right? And that EMS was just a PC-compatible paged memory layout, right? Because you seem to lack a basic understanding of the architecture.

    Few platform segregation points? Maybe, but one price is lots of legacy garbage. x86 still has to support those ancient segmented modes.

    Thank god. I can still run FreeDOS.

    They're hopelessly obsolete. We have much better algorithms for string search than that.

    While the instructions you mention are used for string comparison, that's not their sole purpose. They compare bytes, not strings.

    We have much better algorithms for string search than that.

    Please do tell. Because null detection in a couple of opcodes isn't something easy to come by.
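    For what it's worth, "null detection in a couple of opcodes" does exist without the string instructions: the word-at-a-time trick used by glibc-style strlen implementations finds a zero byte in a 32-bit word with a handful of plain ALU ops. A sketch of the classic SWAR test:

    ```python
    def has_zero_byte(word: int) -> bool:
        """True if any byte of the 32-bit word is zero, using the classic
        SWAR trick: (w - 0x01010101) & ~w & 0x80808080 is nonzero iff
        some byte of w is 0x00."""
        word &= 0xFFFFFFFF
        diff = (word - 0x01010101) & 0xFFFFFFFF
        return (diff & ~word & 0x80808080) != 0

    print(has_zero_byte(0x41424344))  # False: "ABCD", no NUL byte
    print(has_zero_byte(0x41420043))  # True: a zero byte in the middle
    ```

    Three ALU operations per four bytes, no special string opcodes needed, which is roughly how modern C libraries scan for the terminator.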

    They also screwed up the instructions intended for OS support on the 286.

    If you are talking about the MMU, they didn't screw up. Nobody cared about 16-bit support.

    That's one reason why the lowest common denominator is i386 and not i286. 286 is also only 16bit.

    Nobody cared about the i386 MMU either, up to Windows 3.0. That's why early versions of the 386 were buggy as hell (such as skipping the first GDT entry; yup, it's a 386 bug, not a feature).

    Someone mentioned CISC, as if that beat out RISC? It didn't. Under the hood, modern x86 CPUs actually translate each x86 instruction to several RISC instructions. So why not just use the actual RISC instruction set directly? One argument in favor of the x86 instruction set is that it is denser. Takes fewer bytes than the equivalent action in RISC instructions. Perhaps, but that's accidental. If that is such a valuable property, ought to create a new instruction set that is optimized for code density

    That is the first thing in your comment that is right on the spot.

    Then, as if x86 wasn't CISC enough, they rolled out the MMX, SSE, SSE2, SSE3, SSE4 additions.

    ...And then you lose it. Vector instructions were an FPU feature (the 487 ITT had it), and Intel also took a peek with their RISC CPUs, the i860/i960. With the advent of DSPs, this kind of technology became even more common.

    That makes a powerful argument in favor of open source. Could drop all the older SSE versions if only all programs could be easily recompiled.

    Older programs will run faster on new CPUs. In many cases, they won't take advantage of SSE at all if both the algorithm and the compiler aren't optimized for the use of those instructions.

  • by Ginger Unicorn (952287) on Friday June 15, 2012 @06:09AM (#40333189)
    There are two versions [arm.com]: speed-optimised at 2GHz and power-optimised at 800MHz to ~1GHz. The speed-optimised version draws 1.9W and the power-optimised version draws 0.5W.
