Research Shows RISC vs. CISC Doesn't Matter
fsterman writes The power advantages brought by the RISC instruction sets used in Power and ARM chips are often pitted against x86's efficiencies of scale. It's difficult to assess how much the difference between instruction sets matters, because teasing out the theoretical efficiency of an ISA from the proficiency of a chip's design team, the technical expertise of its manufacturer, and the support for architecture-specific optimizations in compilers is nearly impossible. However, new research examining the performance of a variety of ARM, MIPS, and x86 processors gives weight to Intel's conclusion: the benefits of a given ISA to the power envelope of a chip are minute.
Re:isn't x86 RISC by now? (Score:4, Interesting)
As far as I'm aware, since the Pentium Pro line Intel CPUs have been RISC cores with translation layers, and AMD has been on this boat since the original Athlon.
Re:isn't x86 RISC by now? (Score:2, Interesting)
x86 instructions are, in fact, decoded to micro-ops, so the distinction isn't as useful in this context.
It's a general purpose vs dedicated thing (Score:2, Interesting)
The CPU ISA isn't the important aspect. Reduced power consumption mostly stems from not needing a high-end CPU, because the expensive tasks are handled by dedicated hardware. What counts as top-of-the-line ARM hardware can barely touch the processing power of a desktop CPU, but it doesn't need to be faster, because all the bulk processing is handled by graphics cores and DSPs. Intel has for a long time tried to stave off the barrage of special-purpose hardware. The attempts to find uses for ever more general-purpose CPU power sometimes bordered on sad-clown territory. (Remember Intel's attempt to make raytracing games look like something worth pursuing? Guess why: raytracing is notoriously difficult to implement on graphics hardware due to its almost random data accesses.)
Re:isn't x86 RISC by now? (Score:5, Interesting)
That is correct. Every time this comes up I like to spark a debate over what I perceive as the uselessness of referring to an "instruction set architecture", because that is a bullshit, meaningless term, and has been ever since we started making CPUs whose external instructions are decomposed into RISC micro-ops. You could swap out the decoder, leave the internal core completely unchanged, and have a CPU which speaks a different instruction set. The instruction set is not the architecture; that's why the architectures themselves have names. For example, the K5 and everything after it can all run x86 code, but none of them actually have dedicated logic for each x86 instruction. All of them are internally RISCy. Are they x86-compatible? Obviously. Are they internally x86? No, nothing is any more.
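The decoder-swap idea above can be pictured with a toy model. To be clear, this is a minimal sketch, not anything resembling real decoder hardware: the opcode names and micro-op tuple format are invented for illustration.

```python
# Toy sketch of a front-end decoder: one external CISC-style instruction
# is translated into one or more internal RISC-like micro-ops.
# Swapping this function for a different one would change the "ISA" the
# chip speaks without touching the execution core it feeds.

def decode(insn):
    """Map one external instruction (op, dst, src) to internal micro-ops."""
    op, dst, src = insn
    if op == "add_mem_reg":          # CISC: read-modify-write on memory
        return [
            ("load",  "tmp0", dst),          # tmp0 <- [mem]
            ("add",   "tmp0", "tmp0", src),  # tmp0 <- tmp0 + reg
            ("store", dst,    "tmp0"),       # [mem] <- tmp0
        ]
    if op == "add_reg_reg":          # already RISC-shaped: one micro-op
        return [("add", dst, dst, src)]
    raise ValueError(f"unknown opcode {op}")

print(decode(("add_mem_reg", "[0x1000]", "eax")))   # three micro-ops
```

The point of the sketch: the execution core only ever sees the tuples on the right-hand side, so "x86-compatible" is a property of `decode`, not of the core.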
Re:isn't x86 RISC by now? (Score:5, Interesting)
x86 instructions are, in fact, decoded to micro-ops, so the distinction isn't as useful in this context.
They're not the only ones. The IBM mainframes have long been VMs implemented on top of various microcode platforms. In fact, one of the original uses of the 8-inch floppy disk was to hold the VM that would be loaded up during the Initial Microprogram Load (IMPL), before the IPL (boot) of the actual OS. So in a sense, the Project Hercules mainframe emulator is just repeating history.
Nor were they unusual. In school I worked with a minicomputer which was not only a VM on top of microcode, but one you could extend by programming the microcode yourself.
The main differences between RISC and CISC, as I recall, were lots of registers and the simplicity of the instruction set. Both the Intel and zSeries CISC instruction sets have lots of registers these days, though. So the main difference between RISC and CISC would be that you could, in theory, optimize "between" the CISC instructions if you coded RISC instead.
Presumably somebody tried this, but didn't get benefits worth shouting about.
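One way to picture what optimizing "between" the CISC instructions could mean: once everything is micro-ops, a peephole pass can work across the boundaries of the original instructions. A minimal sketch, with an invented micro-op format (this is an illustration of the general idea, not any real CPU's optimizer):

```python
# Toy peephole pass over a micro-op stream: a store immediately followed
# by a load of the same address can forward the stored value as a register
# move instead of going back through memory. The two micro-ops may have
# come from two *different* original CISC instructions, which is exactly
# the "between the instructions" opportunity.

def peephole(uops):
    out = []
    for uop in uops:
        if (uop[0] == "load" and out and out[-1][0] == "store"
                and out[-1][1] == uop[2]):
            # store addr, src ; load dst, addr  ->  keep store, copy src
            out.append(("move", uop[1], out[-1][2]))
        else:
            out.append(uop)
    return out

uops = [("store", "[0x10]", "r1"), ("load", "r2", "[0x10]")]
print(peephole(uops))   # the load is replaced by a register move
```

Modern x86 cores do this kind of thing internally anyway (store-to-load forwarding and micro-op fusion), which is part of why the theoretical RISC advantage didn't amount to much.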
Incidentally, the CISC instruction set of the more recent IBM z machines implements entire C stdlib functions, such as strcpy, as single machine-language instructions.
Re:It's a question that WAS relevant (Score:4, Interesting)
I think a large part of the confusion is that CISC often means accumulator architectures (x86, Z80, etc.) while RISC means general-purpose-register architectures (PPC, SPARC, ARM, etc.). In between you have variable-width RISC like Thumb-2.
As an occasional assembly programmer (PowerPC currently), I far prefer RISC instructions. With x86 (12+ years ago) I would spend far more instructions juggling values into the appropriate registers, then doing the math, then juggling the results out so that more math could be done. With RISC, especially with 32 GPRs, that juggling is nearly eliminated outside the prologue/epilogue. I hear x86 kept taking on more instructions, and that AMD64 made it a more GPR-like environment.
-Samuel
Final nail in the Itanium coffin (Score:2, Interesting)
20 years ago, RISC vs CISC absolutely mattered. x86 decoding was a major bottleneck and a significant chunk of the transistor budget.
As the years have gone by, the x86 decode overhead has been dwarfed by that of the other structures: functional units, reorder buffers, branch prediction, caches, etc. The years have been kind to x86, making the decode overhead look like noise in performance terms: just an extra stage in an already long pipeline.
All of which paints a bleak picture for Itanium. There is no compelling reason to keep Itanium alive other than existing contractual agreements with HP. SGI was the only other major Itanium holdout, and they basically dumped it long ago. And Itaniums are basically just glorified space heaters in terms of power usage.
CISC - reduced memory access ... (Score:4, Interesting)
x86 instructions are, in fact, decoded to micro-ops, so the distinction isn't as useful in this context.
Actually it is. Modern performance tuning has a lot to do with cache misses and the like, and CISC's denser encoding can pack more work into each cache line. The hybrid strategy of a CISC external architecture on top of a RISC internal architecture definitely has some advantages.
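Back-of-the-envelope arithmetic behind the density argument. The encoding sizes here are hypothetical round numbers for illustration (real x86 instructions range from 1 to 15 bytes; most classic RISC encodings are a fixed 4 bytes):

```python
# Code-density sketch: one CISC read-modify-write instruction vs. the
# three fixed-width RISC instructions (load + add + store) doing the
# same job. Denser code means more useful work per I-cache line fetch.

CACHE_LINE = 64                    # bytes per cache line

cisc_bytes = 6                     # hypothetical: one "add [mem], reg"
risc_bytes = 3 * 4                 # hypothetical: load + add + store

cisc_ops_per_line = CACHE_LINE // cisc_bytes   # RMW tasks per line
risc_ops_per_line = CACHE_LINE // risc_bytes

print(cisc_ops_per_line, risc_ops_per_line)    # 10 vs 5
```

Under these made-up sizes the CISC encoding fits twice as many memory-update operations per cache line, which is the sense in which the decoder cost buys back instruction-fetch bandwidth.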
That said, the point of RISC was not solely execution speed. It was also simplicity of design: a simplicity that allowed organizations with less money and fewer resources than Intel to design very capable CPUs.
Again, what's the problem ? (Score:4, Interesting)
All the reasons you list could be "fixed in software".
The quotes around "software" mean that I'm referring to the firmware/microcode as a piece of software designed to run on top of the actual execution units of a CPU.
No, they cannot. Or the software will be terribly slow, like a 2-10x slowdown.
Slow: yes, indeed. But not impossible to do.
What matters are the differences in the semantics of the instructions.
x86 instructions update flags. This adds dependencies between instructions. Most RISC processors do not have flags at all.
This is the semantics of the instructions, and it differs between ISAs.
Yeah, I'm well aware that RISCs don't (all) have flags.
Now, again, how does that prevent the micro-code swap that dinkypoo refers to (and that was actually done on Transmeta's Crusoe)?
You'll just end up with a bigger, clunkier firmware that, for a given front-end instruction, translates into a big bunch of back-end micro-ops.
Yup. A RISC's ALU won't update flags. But what's preventing the firmware from dispatching *SEVERAL* micro-ops? First one to do the base operation, then additional ones to update a register emulating the flags.
Yes, it's slower. But no, that doesn't make a micro-code-based change of the supported ISA impossible, only less efficient.
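The several-micro-ops idea can be sketched concretely. Everything below is a toy model of the translation, not real firmware: a flag-less back-end performs the base ALU operation, then extra micro-ops compute the flag bits into an ordinary register.

```python
# Toy model of emulating x86-style flag updates on a flag-less RISC
# back-end: one micro-op does the add, and additional micro-ops derive
# the Zero and Carry bits into an emulated flags register.

MASK = 0xFFFFFFFF                  # model 32-bit registers

ZF = 0b01                          # emulated Zero flag bit
CF = 0b10                          # emulated Carry flag bit

def emulated_add(a, b):
    """Return (result, flags) the way a translated micro-op bundle would."""
    result = (a + b) & MASK        # micro-op 1: the base ALU operation
    flags = 0
    if result == 0:                # micro-op 2: compute Zero flag
        flags |= ZF
    if (a + b) > MASK:             # micro-op 3: compute Carry flag
        flags |= CF
    return result, flags

print(emulated_add(0xFFFFFFFF, 1))   # wraps to 0: Zero and Carry both set
```

Three micro-ops instead of one is exactly the "slower but doable" trade-off being argued about: the dependency on the flags register only exists when a later instruction actually consumes it.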
The back-end micro-instructions in x86 CPUs are different from the instructions in RISC CPUs. They differ in the small details I tried to explain.
Yes, and please explain how that makes it *definitely impossible* to run x86 instructions, and not merely *somewhat slower*?
Intel did this: they added an x86 decoder to their first Itanium chips. {...} But the performance was still so terrible that nobody ever used it to run x86 code, so they then created a software translator that translated x86 code into Itanium code, which was faster, though still too slow.
Slow, but still doable and done.
Now, keep in mind that:
- Itanium is a VLIW processor. That's an entirely different beast, with an entirely different approach to optimisation, and back during Itanium development the logic was "the compiler will handle the optimising". But back then no such magical compiler existed, and it wouldn't have had the necessary information at compile time anyway (some types of optimisation require information only available at run time, hence doable in microcode but not in a compiler).
Given the compilers available back then, VLIW sucked for almost anything except highly repetitive tasks. Thus it was somewhat popular for cluster nodes running massively parallel algorithms (and at some point in time VLIW was also popular in Radeon GFX cards). But VLIW sucks for pretty much anything else.
(Remember that GCC, for example, has only recently gained auto-vectorisation and well-performing Profile-Guided Optimisation.)
So "supporting an alternate x86 instruction on Itanium was slow" has as much to do with "supporting an instruction set on a back-end that's not tailored for the front-end is slow" as it has to do with "Itanic sucks for pretty much everything which isn't a highly optimized kernel-function in HPC".
But still, it proves that running a different ISA on a completely alien back-end is doable.
The weirdness of the back-end won't prevent it, only slow it down.
Luckily, by the time the Transmeta Crusoe arrived:
- knowledge of how to handle VLIW had advanced a bit; Crusoe had a back-end better tuned to run a CISC ISA.
Then by the time Radeon arrived:
- compilers had gotten even better; GPUs are used for exactly the (only) class of tasks at which VLIW excels.
The back-end of Crusoe was designed completely with x86 in mind; all the execution units contained the small quirks in a manner which made it easy to emulate x86 with them. The back-end of Crusoe contains things like {...} All these were made to make binary translation from x86 easy.