Research Shows RISC vs. CISC Doesn't Matter
fsterman writes: The power advantages of the RISC instruction sets used in Power and ARM chips are often pitted against x86's economies of scale. It's difficult to assess how much the difference between instruction sets matters, because teasing out the theoretical efficiency of an ISA from the proficiency of a chip's design team, the technical expertise of its manufacturer, and the quality of architecture-specific optimizations in compilers is nearly impossible. However, new research examining the performance of a variety of ARM, MIPS, and x86 processors gives weight to Intel's conclusion: the effect of a given ISA on a chip's power envelope is minute.
isn't x86 RISC by now? (Score:5, Informative)
I've read that the legacy x86 instructions were virtualized in the CPU a long time ago, and that modern Intel processors are effectively RISC cores that translate x86 instructions internally.
Re:isn't x86 RISC by now? (Score:5, Informative)
After AMD lost the license to manufacture Intel i486 processors, they, like several others, were forced to design their own chip from the ground up. So they basically took one of their 29k RISC processors and put an x86 frontend on it; Cyrix did more or less the same thing at the time, also coming up with their own design. Since the K5 had good performance per clock but could not clock very high and was expensive, AMD was stuck, and to get their next processor they bought a company called NexGen, which had designed the Intel-compatible Nx586 processor. AMD then worked the successor of the Nx586 into a single chip, which became the K6. The K7 Athlon was yet another design, made by a team headed by Dirk Meyer, who used to be a chip designer at Digital Equipment Corporation, i.e. DEC. He was one of the designers of the Alpha series of RISC CPUs, and the Athlon internally resembles an Alpha chip a lot because of that.
This is a myth that is not true (Score:5, Informative)
That is correct. Every time this comes up I like to spark a debate over what I perceive as the uselessness of referring to an "instruction set architecture" because that is a bullshit, meaningless term and has been ever since we started making CPUs whose external instructions are decomposed into RISC micro-ops. You could switch out the decoder, leave the internal core completely unchanged, and have a CPU which speaks a different instruction set. It is not an instruction set architecture. That's why the architectures themselves have names. For example, K5 and up can all run x86 code, but none of them actually have logic for each x86 instruction. All of them are internally RISCy. Are they x86-compatible? Obviously. Are they internally x86? No, nothing is any more.
This same myth keeps being repeated by people who don't really understand the details of how processors work internally.
You cannot just change the decoder; the instruction set affects the internals a lot:
1) Condition handling is totally different on different instruction sets. This affects the backend a lot. x86 has a flags register; many other architectures have predicate registers, some with multiple predicate registers holding different conditions.
2) Architectures have totally different numbers of general-purpose and floating-point registers. The register renamer makes this a smaller difference, but then there is the fact that most RISCs use the same registers for both FPU and integer operations, while x86 has separate registers for each. And this separates them completely: the internal buses between the register files and the functional units in the processor are laid out very differently.
3) Memory addressing modes are very different. x86 still does relatively complex address calculations in a single micro-operation, so it has more complex address calculation units.
4) Whether there are operations with more than 2 inputs, or more than 1 output, has a big impact on what kind of internal buses are needed, and on how many register-file read and write ports are needed.
5) There are a LOT of more complex instructions in the x86 ISA which are not split into micro-ops but handled via microcode. The microcode interpreter is totally missing on pure RISCs (but exists on some not-so-pure RISCs like POWER/PowerPC).
6) The instruction set dictates the memory alignment rules. Architectures with stricter alignment rules can have simpler load-store units.
7) The instruction set dictates the multicore memory ordering rules. This may affect the load-store units, caches, and buses.
8) Some instructions have different widths in different architectures. For example, x86 has N x N -> 2N-wide multiply operations which most RISCs don't have, so x86 needs a bigger/different multiplier than most RISCs.
9) x87 FPU values are 80 bits wide (truncated to 64 bits when storing/loading). Practically all other CPUs have FPU values at most 64 bits wide (though some versions of POWER also support 128-bit FP numbers).
Re:This is a myth that is not true (Score:4, Informative)
Some of what you said is legitimate. Most of it is irrelevant, since it does not speak to the postulate. You're speaking of issues which will affect performance. So what? You'd have a less-performant processor in some cases, and it would be faster in others.
No.
1) If the condition codes work totally differently, they just don't work.
2) The data paths needed for separate versus combined FP and integer registers are so different that it makes absolutely NO sense to have them combined in a chip that runs the x86 ISA, even though it's possible.
3) If you don't have those x86-compatible address calculation units, you have to break most memory ops into more micro-ops OR even run them with microcode. Both are slow. And if you have a RISC chip, you want only the address calculation units you need for your simple base+offset addressing.
4) In the basic RISC pipeline there are two inputs and one output per instruction. There are no data paths for two results, so you cannot execute operations with multiple outputs, such as the x86 multiply which produces 2 values (the low and high parts of the result), unless you do something VERY SLOW.
6) If your RISC instruction set says memory operations are aligned, you design your LSU to handle only those, as it makes the LSUs much smaller, simpler, and faster. But you need unaligned accesses for x86.
9) If your FPU calculates with a different bit width, it calculates wrong results.
And
Re:isn't x86 RISC by now? (Score:4, Informative)
They're not the only ones. The IBM mainframes have long been VMs implemented on top of various microcode platforms.
But the microcode implemented part or all of an interpreter for the machine code; the instructions weren't translated into directly-executed microcode. (And the System/360 Model 75 did it all in hardware, with no microcode).
And the "instruction set" for the microcode was often rather close to the hardware, with extremely little in the way of "instruction decoding" of microinstructions, although I think some lower-end machines might have had microinstructions that didn't look too different from a regular instruction set. (Some might have been IBM 801s [wikipedia.org].)
So that's not exactly the same thing as what the Pentium Pro and successors, the Nx586, and the AMD K5 and successors, do.
Current mainframe processors, however, as far as I know 1) execute most instructions directly in hardware, 2) do so by translating them into micro-ops the same way current x86 processors do, and 3) trap some instructions to "millicode", which is z/Architecture machine code with some processor-dependent special instructions and access to processor-dependent special registers (and, yes, I can hear the word PALcode [wikipedia.org] being shouted in the background...). See, for example, "A high-frequency custom CMOS S/390 microprocessor" [ieee.org] (paywalled, but the abstract is free at that link, and mentions millicode) and "IBM zEnterprise 196 microprocessor and cache subsystem" [christianjacobi.de] (non-paywalled copy; mentions micro-operations). I'm not sure those processors have any of what would normally be thought of as "microcode".
The midrange System/38 and older ("CISC") AS/400 machines also had an S/360-ish instruction set implemented in microcode. The compilers, however, generated code for an extremely CISCy processor [ibm.com] - but that code wasn't interpreted, it was translated into the native instruction set by low-level OS code and executed.
For legal reasons, the people who wrote the low-level OS code (compiled into the native instruction set) worked for a hardware manager and wrote what was called "vertical microcode" (the microcode that implemented the native instruction set was called "horizontal microcode"). That way, IBM wouldn't have to provide that code to competitors, the way they had to make the IBM mainframe OSes available to plug-compatible manufacturers, as it's not software, it's internal microcode. See "Inside the AS/400" [amazon.com] by one of the architects of S/38 and AS/400.
Current ("RISC") AS/400s^WeServer iSeries^W^WSystem i^WIBM Power Systems running IBM i are similar, but the internal machine language is PowerPC^WPower ISA (with some extensions such as tag bits and decimal-arithmetic assists, present, I think, in recent POWER microprocessors but not documented) rather than the old "IMPI" 360-ish instruction set.
The main differences between RISC and CISC, as I recall, were lots of registers and the simplicity of the instruction set. Both the Intel and zSeries CISC instruction sets have lots of registers, though.
Depends on which version of the instruction set and your definition of "lots".
32-bit x86 had 8 registers (many x86 processors used register renaming, but they still had only 8 programmer-visible registers, and not all were as general as one might like), and they only went to 16 registers in x86-64. System/360 had 16 general-purpose registers (much more regular than x86, but that's not setting the bar all that high :-)), and that continues to z/Architecture, althoug