Research Shows RISC vs. CISC Doesn't Matter
fsterman writes: The power advantages brought by the RISC instruction sets used in Power and ARM chips are often pitted against the x86's efficiencies of scale. It's difficult to assess how much the difference between instruction sets matters, because teasing out the theoretical efficiency of an ISA from the proficiency of a chip's design team, the technical expertise of its manufacturer, and the support for architecture-specific optimizations in compilers is nearly impossible. However, new research examining the performance of a variety of ARM, MIPS, and x86 processors gives weight to Intel's conclusion: the benefits of a given ISA to the power envelope of a chip are minute.
isn't x86 RISC by now? (Score:5, Informative)
I've read that the legacy x86 instructions were virtualized in the CPU a long time ago, and that modern Intel processors are effectively RISC cores that translate x86 instructions inside the CPU.
Re:isn't x86 RISC by now? (Score:4, Interesting)
As far as I'm aware, since the Pentium Pro line Intel CPUs have been RISCs with translation layers, and AMD has been on this boat since the original Athlon.
Re: (Score:3)
Actually, AMD did that way back in the K5 days. The K5 was a 29k RISC processor with an x86 frontend.
Re: (Score:2)
Interesting.
I was thinking it was the Athlon that was the 29k.
Re:isn't x86 RISC by now? (Score:5, Informative)
After AMD lost the license to manufacture Intel i486 processors, they, together with other companies, were forced to design their own chips from the ground up. So they basically took one of the 29k RISC processors and put an x86 frontend on it. Cyrix did more or less the same thing at the time, also coming up with its own design. Since the K5 had good performance per clock but could not clock very high and was expensive, AMD was stuck, and to get its next processor it bought a company called NexGen, which had designed the Intel-compatible Nx586 processor. AMD then worked on a single-chip successor of the Nx586, which became the K6. The K7 Athlon was yet another design, made by a team headed by Dirk Meyer, who used to be a chip designer at Digital Equipment Corporation (DEC). He was one of the designers of the Alpha series of RISC CPUs, and the Athlon internally resembles an Alpha chip a great deal because of that.
Re: isn't x86 RISC by now? (Score:3)
Actually, NexGen was the first to do x86 -> RISC. Then Intel with the Pentium Pro. Then AMD with the K5. As far as I recall, Cyrix never did x86 -> RISC back then, not until they were acquired by VIA (i.e., Cyrix M-series chips executed x86 directly, but the VIA Epia and later designs translate).
Re: (Score:2)
Right, NexGen released their chip first, but it is hard to compare because it was a dual-chip solution where the FPU came in a separate package, while the K5 had its own FPU. NexGen only did a single-chip product later. The K5 development process was highly protracted and difficult, and the release was delayed many times. The Cyrix 5x86 was also similar in a lot of regards to the Pentium Pro. In fact, I remember the Pentium Pro designer himself stating that they had a lot of interesting insights during Pentium Pro c
Re: (Score:3)
When the DEC Alpha was killed, many of the engineers were picked up by AMD.
Re:isn't x86 RISC by now? (Score:4, Insightful)
Yes. As noted by the study (which, by the way, isn't very good): "When every transistor counts, then every instruction, clock cycle, memory access, and cache level must be carefully budgeted, and the simple design tenets of RISC become advantageous once again."
Essentially meaning: "If you want as few transistors as possible, it doesn't help to have the CISC-to-RISC translation layer in x86."
They also claim things like "The report notes that in certain, extremely specific cases where die sizes must be 1-2mm2 or power consumption is specced to sub-milliwatt levels, RISC microcontrollers can still have an advantage over their CISC brethren," which clearly indicates that their idea of "embedded" systems is limited to smartphones.
The case where you have a battery that can't be recharged on a daily basis is hardly an extremely specific one. Not that any CPU they tested is suitable for those applications anyway. They have essentially limited themselves to applications where "not as bad as a P4" is acceptable.
Re: isn't x86 RISC by now? (Score:3)
You're suggesting that the instruction-set interpretation is a separate unit from the out-of-order execution core. In reality, the first stage of any highly optimized modern CPU able to minimize pipeline misses must actually "recompile the code" (an oversimplification) in hardware before the actual execut
Re: (Score:3)
ARM is RISC through and through, though it complicates things somewhat with multiple simple instruction sets. The basic ARM ISA is all 32-bit instructions, very much RISC from every angle you look at it, every bit as pure as MIPS. The ARM Thumb ISA is 16-bit instructions only, and the machine translation from Thumb to ARM is very simple, just a fraction of the chip. Thumb-2 gets slightly more complex, allowing 16- and 32-bit instructions to be intermixed, but again it's not that complicated. It's just RISC like a
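To ground the "translation is very simple" claim, here is a small, hedged C sketch (my own illustration, not ARM's decoder): in Thumb-2 the length of an instruction is decided entirely by the top five bits of its first halfword, so a decoder's first step is a trivial comparison.

```c
#include <stdint.h>

/* Illustrative sketch of Thumb-2 instruction-length decoding: if bits
   [15:11] of the first halfword are 0b11101, 0b11110, or 0b11111, the
   instruction is 32 bits; otherwise it is 16 bits. */
static int thumb2_insn_len(uint16_t first_halfword) {
    unsigned top5 = first_halfword >> 11;
    return (top5 == 0x1D || top5 == 0x1E || top5 == 0x1F) ? 4 : 2;
}
```

Everything past that point is ordinary fixed-field decoding, which is why a Thumb front end costs only a small fraction of the chip.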
Re: (Score:2, Interesting)
x86 instructions are, in fact, decoded to micro-opcodes, so the distinction isn't as useful in this context.
Re:isn't x86 RISC by now? (Score:5, Interesting)
x86 instructions are, in fact, decoded to micro-opcodes, so the distinction isn't as useful in this context.
They're not the only ones. The IBM mainframes have long been VMs implemented on top of various microcode platforms. In fact, one of the original uses of the 8-inch floppy disk was to hold the VM that would be loaded up during the Initial Microprogram Load (IMPL), before the IPL (boot) of the actual OS. So in a sense, the Project Hercules mainframe emulator is just repeating history.
Nor were they unusual. In school I worked with a minicomputer which was not only a VM on top of microcode, but one where you could extend the VM by programming the microcode yourself.
The main differences between RISC and CISC, as I recall, were lots of registers and the simplicity of the instruction set. Both the Intel and zSeries CISC instruction sets have lots of registers, though. So the main difference between RISC and CISC would be that you could, in theory, optimize "between" the CISC instructions if you coded RISC instead.
Presumably somebody tried this, but didn't get benefits worth shouting about.
Incidentally, the CISC instruction set of the more recent IBM z machines includes entire C stdlib functions, such as strcpy, as single machine-language instructions.
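For a sense of scale, here is a plain C strcpy loop; on recent z machines a single string-move instruction (MVST, if I recall the mnemonic correctly; treat that as an assumption) performs essentially this whole loop as one interruptible machine-language instruction:

```c
/* The entire loop below is what a single z-series string-move
   instruction conceptually replaces: copy bytes up to and
   including the terminating NUL. */
char *my_strcpy(char *dst, const char *src) {
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;  /* nothing else to do per byte */
    return dst;
}
```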
Re:isn't x86 RISC by now? (Score:4, Informative)
They're not the only ones. The IBM mainframes have long been VMs implemented on top of various microcode platforms.
But the microcode implemented part or all of an interpreter for the machine code; the instructions weren't translated into directly executed microcode. (And the System/360 Model 75 did it all in hardware, with no microcode.)
And the "instruction set" for the microcode was often rather close to the hardware, with extremely little in the way of "instruction decoding" of microinstructions, although I think some lower-end machines might have had microinstructions that didn't look too different from a regular instruction set. (Some might have been IBM 801s [wikipedia.org].)
So that's not exactly the same thing as what the Pentium Pro and successors, the Nx586, and the AMD K5 and successors, do.
Current mainframe processors, however, as far as I know, 1) execute most instructions directly in hardware, 2) do so by translating them into micro-ops the same way current x86 processors do, and 3) trap some instructions to "millicode", which is z/Architecture machine code with some processor-dependent special instructions and access to processor-dependent special registers (and, yes, I can hear the word PALcode [wikipedia.org] being shouted in the background...). See, for example, "A high-frequency custom CMOS S/390 microprocessor" [ieee.org] (paywalled, but the abstract is free at that link, and mentions millicode) and "IBM zEnterprise 196 microprocessor and cache subsystem" [christianjacobi.de] (non-paywalled copy; mentions micro-operations). I'm not sure those processors have any of what would normally be thought of as "microcode".
The midrange System/38 and older ("CISC") AS/400 machines also had an S/360-ish instruction set implemented in microcode. The compilers, however, generated code for an extremely CISCy processor [ibm.com] - but that code wasn't interpreted, it was translated into the native instruction set by low-level OS code and executed.
For legal reasons, the people who wrote the low-level OS code (compiled into the native instruction set) worked for a hardware manager and wrote what was called "vertical microcode" (the microcode that implemented the native instruction set was called "horizontal microcode"). That way, IBM wouldn't have to provide that code to competitors, the way they had to make the IBM mainframe OSes available to plug-compatible manufacturers, as it's not software, it's internal microcode. See "Inside the AS/400" [amazon.com] by one of the architects of S/38 and AS/400.
Current ("RISC") AS/400s^WeServer iSeries^W^WSystem i^WIBM Power Systems running IBM i are similar, but the internal machine language is PowerPC^WPower ISA (with some extensions such as tag bits and decimal-arithmetic assists, present, I think, in recent POWER microprocessors but not documented) rather than the old "IMPI" 360-ish instruction set.
The main differences between RISC and CISC, as I recall, were lots of registers and the simplicity of the instruction set. Both the Intel and zSeries CISC instruction sets have lots of registers, though.
Depends on which version of the instruction set and your definition of "lots".
32-bit x86 had 8 registers (many x86 processors used register renaming, but they still had only 8 programmer-visible registers, and not all were as general as one might like), and they only went to 16 registers in x86-64. System/360 had 16 general-purpose registers (much more regular than x86, but that's not setting the bar all that high :-)), and that continues to z/Architecture, althoug
Re: (Score:2)
CISC ISAs may have individual "complex" instructions, such as procedure call instructions, string manipulation instructions, decimal arithmetic instructions, and various instructions and instruction set features to "close the semantic gap" between high-level languages and machine code, add extra forms of data protection, etc. - although the original procedure-call instructions in S/360 were pretty simple, BAL/BALR just putting the PC of the next instruction into a register and jumping to the target instruction, just as most RISC procedure-call instructions do. A lot of the really CISCy instruction sets may have been reactions to systems like S/360, viewing its instruction set as being far from CISCy enough, but that trend has largely died out.
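As a hedged illustration of how simple BALR really is, here is a toy C model of its semantics (my own sketch of the RR-format branch-and-link; it ignores the condition-code and addressing-mode bits the real S/360 packs into the saved word):

```c
#include <stdint.h>

/* Toy machine state: a program counter and 16 general registers. */
typedef struct { uint32_t pc; uint32_t r[16]; } Cpu;

/* BALR r1,r2: save the address of the next instruction in r1, then
   branch to the address in r2. By convention r2 == 0 means "no
   branch". BALR is a 2-byte RR-format instruction. */
static void balr(Cpu *c, int r1, int r2) {
    uint32_t next = c->pc + 2;         /* return address */
    uint32_t target = c->r[r2];        /* read before the link is written */
    c->r[r1] = next;
    c->pc = (r2 != 0) ? target : next;
}
```

That is essentially the same link-register discipline most RISC call instructions use.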
I know you say "current", but one of the original ideas behind RISC was also to make each instruction "short", i.e., make each instruction take one cycle, and reduce cycle times as much as possible so that you could have really deep pipelines (MIPS) or increase clock speed. Now, while most "RISCs" today sort of follow this idea, by virtue of the ISA having been made with that in mind in the old days (i.e., load-store, etc.), they're typically not as strict about it (if they in fact ever were). I guess the CIS
Re: (Score:2)
RISC processors had hundreds of registers to store the stack frames. There was some smart overlapping of stack frames so that functions could pass by reference straight through registers. When you look at the depth of the function call stacks in some GUI systems, those are needed.
RISC processors with the letters "S", "P", "A", "R", and "C" in the instruction set name, in that order, did. The ones with the digits "8", "0", "9", "6", and "0" in the processor name also did, I think. The ones with "M", "I", "P", and "S" in the instruction set name, in that order, did not, nor did the ones with "A", "l", "p", "h", and "a" in the instruction set name, in that order, nor the ones with "A", "R", and "M" in the instruction set name, in that order, nor the ones with instruction sets having
Re: (Score:2)
And, given that most processors running GUI systems these days, and even most processors running GUI systems before x86/ARM ended up running most of the UI code people see, didn't have register windows, no, they're not needed. Yeah, SPARC workstations may have been popular, but I don't think register windows magically made GUIs work better on them. (And remember that register windows eventually spill, so once the stack depth gets beyond a certain point, I'm not sure they help; it's shallow call stacks, in which you can go up and down the call stack without spilling register windows, where they might help.)
I remember reading research back in the day that showed that register windows were orthogonal to any RISC/CISC considerations, i.e., they were about as easy/costly to implement in either architecture and gave the same boost (or not) in either case. As you point out, in practice they turned out to not really be worth the trouble, and they died out rather quickly.
CISC - reduced memory access ... (Score:4, Interesting)
x86 instructions are, in fact, decoded to micro-opcodes, so the distinction isn't as useful in this context.
Actually, it is. Modern performance tuning has a lot to do with cache misses and such, and CISC can allow for more instructions per cache line. The hybrid design strategy (a CISC external architecture with a RISC internal architecture) definitely has some advantages.
That said, the point of RISC was not solely execution speed. It was also simplicity of design, a simplicity that allowed organizations with less money and fewer resources than Intel to design very capable CPUs.
Re: (Score:2)
RISC came out when Intel was only doing tiny microchips; the RISC market was not competing with it. One of the advantages of CISC at the time was indeed that it was easy to implement: you just built the microarchitecture, and most of the rest was microcoding the instructions and putting that into ROM. If you needed to add a couple of new instructions for the next release to stay competitive, it could be done very quickly (and you could patch your computers on the fly to get the new instructions too).
Yes,
Re: (Score:2)
RISC came out when Intel was only doing tiny microchips, the RISC market was not competing with it.
The reference CISC platform, the "competition", at the time was the VAX. Same arguments, different target.
Re: (Score:2)
Yes, simplicity of design was important, but the simplicity was to free up chip resources to use elsewhere, not to make it easier for humans to design it.
Well, yes. I think we're forgetting one of the main drivers for RISC, and that was making the hardware more compatible with what the then-current compilers could actually fruitfully use. Compilers couldn't (and typically didn't) actually use all the hideously complex instructions that did "a lot" and were instead hampered by the lack of registers, lack of orthogonality, etc. So there was a concerted effort to develop hardware that fit the compilers, instead of the other way around, which had been the dominating p
Re: (Score:2)
Some of what you say is true. Register allocation was simple in RISC, but most of the competing CISC machines also had very orthogonal registers (the PDP and VAX were the classic CISC machines; x86 isn't even in the picture yet). Also, some CISC machines were adding instructions that compilers had trouble using, often because the compilers had to fit in a small amount of memory.
However, many RISC machines required much more advanced compilers if you wanted optimization. I think for basic compilation RI
Re: (Score:2)
Much of the later complexity didn't exist in the late 70s.
Yes, I should have said that I date RISC as beginning with Hennessy's and Patterson's work, which became MIPS and SPARC respectively. So we're a bit later than that. And of course when I said "compiler" I meant "optimizing compiler". Basic compilation, as you say, was not a problem on CISC, but everybody observed that the instruction set wasn't really used. I remember reading VAX code from the C compiler (on BSD 4.2) when I was an undergrad and noting that the enter/leave instructions weren't used. My betters
Re: (Score:2)
This. The x86 ISA is roughly analogous to ARM Thumb compressed instructions. It is just a front end to a register-rich RISC core.
Re: (Score:2)
I don't see it. The ARM Thumb instruction set is vastly simpler and more regular than even the 286 instruction set. Thumb is already a reduced instruction set. There are no special-purpose string instructions, and it has general-purpose registers that can be used as anything, whereas the 286 has only special-purpose registers (AX is the only accumulator, BX the only base register, CX the only count register, etc.). Yes, SP and PC are special-purpose, but that's true of all the early RISC mach
Re: (Score:3)
I have to assume the wisc.edu folks know this and somebody gummed up the headlines along the way.
Re:isn't x86 RISC by now? (Score:5, Interesting)
That is correct. Every time this comes up I like to spark a debate over what I perceive as the uselessness of referring to an "instruction set architecture" because that is a bullshit, meaningless term and has been ever since we started making CPUs whose external instructions are decomposed into RISC micro-ops. You could switch out the decoder, leave the internal core completely unchanged, and have a CPU which speaks a different instruction set. It is not an instruction set architecture. That's why the architectures themselves have names. For example, K5 and up can all run x86 code, but none of them actually have logic for each x86 instruction. All of them are internally RISCy. Are they x86-compatible? Obviously. Are they internally x86? No, nothing is any more.
This is a myth that is not true (Score:5, Informative)
That is correct. Every time this comes up I like to spark a debate over what I perceive as the uselessness of referring to an "instruction set architecture" because that is a bullshit, meaningless term and has been ever since we started making CPUs whose external instructions are decomposed into RISC micro-ops. You could switch out the decoder, leave the internal core completely unchanged, and have a CPU which speaks a different instruction set. It is not an instruction set architecture. That's why the architectures themselves have names. For example, K5 and up can all run x86 code, but none of them actually have logic for each x86 instruction. All of them are internally RISCy. Are they x86-compatible? Obviously. Are they internally x86? No, nothing is any more.
This same myth keeps being repeated by people who don't really understand the details of how processors work internally.
You cannot just change the decoder; the instruction set affects the internals a lot:
1) Condition handling is totally different across instruction sets. This affects the backend a lot. x86 has a flags register; many other architectures have predicate registers, some of them predicate registers with different conditions.
2) There are totally different numbers of general-purpose and floating-point registers. The register renamer makes this a smaller difference, but then there is the fact that most RISCs use the same registers for both FPU and integer, while x86 has separate registers for each. And this totally separates them; the internal buses between the register files and functional units in the processor are done very differently.
3) Memory addressing modes are very different. x86 still does relatively complex address calculations in a single micro-operation, so it has more complex address-calculation units.
4) Whether there are operations with more than 2 inputs, or more than 1 output, has quite a big impact on what kind of internal buses are needed and how many register read and write ports are needed.
5) There are a LOT of more complex instructions in the x86 ISA which are not split into micro-ops but handled via microcode. The microcode interpreter is totally missing on pure RISCs (but exists on some not-so-pure RISCs like POWER/PowerPC).
6) The instruction set dictates the memory alignment rules. Architectures with stricter alignment rules can have simpler load-store units.
7) The instruction set dictates the multicore memory-ordering rules. This may affect the load-store units, caches, and buses.
8) Some instructions have different bit widths in different architectures. For example, x86 has N x N -> 2N-wide multiply operations which most RISCs don't have, so x86 needs a bigger/different multiplier than most RISCs (see the sketch after this list).
9) x87 FPU values are 80 bits wide (truncated to 64 bits when storing/loading). Practically all other CPUs have a maximum of 64-bit-wide FPU values (though some versions of POWER also support 128-bit FP numbers).
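To make point 8 concrete, a hedged C sketch (portable code, not any vendor's definition): a 32x32 -> 64-bit multiply is a single x86 MUL whose two outputs land in EDX:EAX, while a classic RISC such as MIPS exposes the same thing as separate low/high results (mflo/mfhi).

```c
#include <stdint.h>

/* One widening multiply: x86 does this in a single MUL instruction
   with two result registers (EDX:EAX); many RISCs split it into
   single-output operations (e.g., MIPS mult, then mflo and mfhi). */
uint64_t widening_mul(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}

/* The same result viewed as two separate outputs, RISC-style. */
void mul_lo_hi(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi) {
    uint64_t p = (uint64_t)a * (uint64_t)b;
    *lo = (uint32_t)p;          /* low half:  x86 EAX, MIPS LO */
    *hi = (uint32_t)(p >> 32);  /* high half: x86 EDX, MIPS HI */
}
```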
Microcode switching (Score:3)
This same myth keeps being repeated by people who don't really understand the details of how processors work internally.
Actually, YOU are wrong.
You cannot just change the decoder; the instruction set affects the internals a lot:
All the reasons you list could be "fixed in software". The fact that silicon designed by Intel handles opcodes in a way a little better optimized toward being fed from an x86-compatible frontend is just specific optimisation. Simply doing the same stuff with another RISCy back-end, i.e., interpreting the same ISA fed to the front-end, will simply require each x86 instruction being executed as a different set of micro-instructions (some that are handled as a single ALU opcode on Intel's silicon might require a few more instructions, but that's about the difference).
Re: (Score:2)
This same myth keeps being repeated by people who don't really understand the details of how processors work internally.
Actually, YOU are wrong.
You cannot just change the decoder; the instruction set affects the internals a lot:
All the reasons you list could be "fixed in software".
No, they cannot. Or the software will be terribly slow, like a 2-10x slowdown.
The fact that silicon designed by Intel handles opcodes in a way a little better optimized toward being fed from an x86-compatible frontend is just specific optimisation.
Opcodes are irrelevant. They are easy to translate. What matters are the differences in the semantics of the instructions.
x86 instructions update flags. This adds dependencies between instructions. Most RISC processors do not have flags at all.
This is the semantics of the instructions, and they differ between ISAs.
Simply doing the same stuff with another RISCy back-end, i.e., interpreting the same ISA fed to the front-end, will simply require each x86 instruction being executed as a different set of micro-instructions (some that are handled as a single ALU opcode on Intel's silicon might require a few more instructions, but that's about the difference).
The backend micro-instructions in x86 CPUs are different from the instructions in RISC CPUs. They differ in the small details I tried to explain.
Again, what's the problem ? (Score:4, Interesting)
All the reasons you list could be "fixed in software".
The quotes around "software" mean that I'm referring to the firmware/microcode as a piece of software designed to run on top of the actual execution units of a CPU.
No, they cannot. Or the software will be terribly slow, like a 2-10x slowdown.
Slow: yes, indeed. But not impossible to do.
What matters are the differences in the semantics of the instructions.
x86 instructions update flags. This adds dependencies between instructions. Most RISC processors do not have flags at all.
This is the semantics of the instructions, and they differ between ISAs.
Yeah, I pretty well know that RISCs don't (all) have flags.
Now, again, how does that prevent the microcode swap that dinkypoo refers to (and that was actually done on Transmeta's Crusoe)?
You'll just end up with bigger, clunkier firmware in which a given front-end instruction from the same ISA translates into a big bunch of back-end micro-ops.
Yup. A RISC's ALU won't update flags. But what's preventing the firmware from dispatching *SEVERAL* micro-ops? First one to do the base operation, and then additional instructions to update some register emulating the flags?
Yes, it's slower. But no, that doesn't make a microcode-based change of the supported ISA impossible, only less efficient.
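A hedged sketch of that multi-micro-op idea in C (purely illustrative; no real core's microcode looks like this): one x86-style ADD that updates flags becomes a base operation plus several flag-deriving operations on a flag-less machine.

```c
#include <stdint.h>

/* Hypothetical flag emulation: each field stands in for a flag bit
   kept in an ordinary register on a flag-less RISC backend. */
typedef struct { uint32_t zf, sf, cf, of; } Flags;

uint32_t add_updating_flags(uint32_t a, uint32_t b, Flags *f) {
    uint32_t r = a + b;                      /* op 1: the ADD itself  */
    f->zf = (r == 0);                        /* op 2: zero flag       */
    f->sf = r >> 31;                         /* op 3: sign flag       */
    f->cf = (r < a);                         /* op 4: unsigned carry  */
    f->of = ((~(a ^ b)) & (a ^ r)) >> 31;    /* op 5: signed overflow */
    return r;
}
```

One instruction becoming roughly five operations is exactly the "slower, but doable" trade being argued here.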
The backend micro-instructions in x86 CPUs are different from the instructions in RISC CPUs. They differ in the small details I tried to explain.
Yes, and please explain how that makes it *definitely impossible* to run x86 instructions, and not merely *somewhat slower*.
Intel did this; they added an x86 decoder to their first Itanium chips. {...} But the performance was still so terrible that nobody ever used it to run x86 code, and then they created a software translator that translated x86 code into Itanium code, and that was faster, though still too slow.
Slow, but still doable and done.
Now, keep in mind that:
- Itanium is a VLIW processor. That's an entirely different beast, with an entirely different approach to optimisation, and back during Itanium development the logic was "the compiler will handle the optimising". But back then such a magical compiler didn't exist, and anyway it didn't have the necessary information at compile time (some types of optimisation require information only available at run time, hence doable in microcode but not in a compiler).
Given the compilers available back then, VLIW sucked for almost anything except highly repetitive tasks. Thus it was somewhat popular for cluster nodes running massively parallel algorithms (and at some point in time VLIW was also popular in Radeon GFX cards). But VLIW sucks for pretty much anything else.
(Remember that, for example, GCC has only recently gained auto-vectorisation and well-performing profile-guided optimisation.)
So "supporting an alternate x86 instruction on Itanium was slow" has as much to do with "supporting an instruction set on a back-end that's not tailored for the front-end is slow" as it has to do with "Itanic sucks for pretty much everything which isn't a highly optimized kernel-function in HPC".
But still, it proves that running a different ISA on a completely alien back-end is doable.
The weirdness of the back-end won't prevent it, only slow it down.
Luckily, by the time Transmeta Crusoe arrived:
- knowledge of how to handle VLIW had advanced a bit; Crusoe had a back-end better tuned to run a CISC ISA.
Then by the time Radeon arrived:
- compilers had gotten even better; GPUs are used for exactly the (only) class of task at which VLIW excels.
The backend of Crusoe was designed completely with x86 in mind; all the execution units contained the small quirks in a manner which made it easy to emulate x86 with it. The backend of Crusoe contains things like {...} All these were made to make binary translation from x86 eas
Re: (Score:2)
but then there is the fact that most RISCs use the same registers for both FPU and integer
With the minor exceptions of Alpha [hp.com], PA-RISC 1.x [hp.com] and 2.0 [hp.com], POWER/PowerPC/Power ISA [power.org], MIPS [imgtec.com], and SPARC [sparc.org].
Re:This is a myth that is not true (Score:4, Informative)
Some of what you said is legitimate. Most of it is irrelevant, since it does not speak to the postulate. You're speaking of issues which will affect performance. So what? You'd have a less performant processor in some cases, and it would be faster in others.
No.
1) If the condition codes work totally differently, they don't work.
2) The data paths needed for separate vs. combined FP and integer registers are so different that it makes absolutely NO sense to have them together in a chip that runs the x86 ISA, even though it's possible.
3) If you don't have those x86-compatible address-calculation units, you have to break most memory ops into more micro-ops OR even run them with microcode. Both are slow. And if you have a RISC chip, you want to have only the address-calculation units you need for your simple base+offset addressing.
4) In the basic RISC pipeline there are two inputs and one output per instruction. There are no data paths for two results; you cannot execute operations with multiple outputs, such as the x86 multiply which produces 2 values (the low and high parts of the result), unless you do something VERY SLOW.
6) If your RISC instruction set says you have aligned memory operations, you design your LSUs to have only those, as it makes the LSUs much smaller, simpler, and faster. But you need unaligned accesses for x86.
9) If your FPU calculates at a different bit width, it calculates wrongly.
And
Re:isn't x86 RISC by now? (Score:5, Insightful)
This is why we use the terms "Instruction Set Architecture" to define the interface to the (assembler) programmer, and "microarchitecture" to refer to the actual internal implementation. ISA is not bullshit, unless you confuse it with the internal microarchitecture.
Re: (Score:3)
The very first paragraph of IBM's z/Architecture Principles of Operation:
The architecture of a system defines its attributes as seen by the programmer, that is, the conceptual structure and functional behavior of the machine, as distinct from the organization of the data flow, the logical design, the physical design, and the performance of any particular implementation. Several dissimilar machine implementations may conform to a single architecture. When the execution of a set of programs on different machin
Re: (Score:2)
Does that mean that the Transmeta Crusoe wasn't anything special?
Re: (Score:2)
RISC was not supposed to be a religion, which is what it seems to have turned into. No one should even be arguing the point today, because modern chips are so different from when the term 'RISC' was new. The whole premise behind RISC is being used extensively in modern CISC machines. The problem with trying to keep a RISC vs. CISC debate alive is that it harms the education of the students.
RISC is primarily, at its core, about eliminating complex infrastructure where you can and reusing the resources for thi
It's a question that WAS relevant (Score:4, Insightful)
Back when compilers weren't crazy optimized to their target instruction set, people coding things in assembler wanted CISC, and people using compilers wanted RISC.
But nowadays almost no one still does the former, and the latter uses CISC chips a lot better.
This is now a question for comp sci history, not engineers.
Re: (Score:2)
I actually wonder how relevant CISC even is to people doing assembly programming these days. There is no reason you can't pull an LLVM-like move and target something other than the native instruction set when you are programming.
That is basically all the x86 does anyway - convert CISC instructions into microcode. There is no reason that an assembler couldn't do the same thing, further blurring the lines between assembly and compiled code. If the whole point of writing assembly is to optimize your code, a
Re: (Score:2)
Usually, if you're coding in assembly, it's because you're trying to bring some very basic core functionality into a system, be it an OS component, a driver, or a compiler, and usually that means that you're engaged in enough system-specific behaviors that virtualization does you no good.
Java and .NET benefit from a virtual machine language precisely because they're high level languages, and it's easier to compile assembly to assembly in multiple ways than to compile high level languages to multiple assembl
Re: (Score:2)
easier to compile assembly to assembly in multiple ways than to compile high level languages to multiple assembly languages.
In other words, they don't want to be bothered writing real compilers.
Re: (Score:2)
virtual machine language
Don't mind me; I'm just twitching over here and fighting down the urge to vomit.
Re: (Score:3)
Oh no. A technology exists.
Let me rephrase that. I cannot comprehend your objections.
Re: (Score:2)
"Virtual" anything means there's at least one layer of abstraction between the thing and anything the layperson would consider remotely close to the hardware. "Machine" would imply something that is inversely quite close to the hardware. To my ears, it sounds like saying "pure hybrid"...you can't be both at the same time.
Maybe I'm mixing up (virtual (machine language)) and ((virtual machine) language). From the perspective of the Java/.NET compiler it conceptually resembles machine language but it sure does
Re: (Score:2)
I can see how Java being in a VM to begin with presents a similar model to running assembly on the actual machine but comparing the two in terms of efficiency and overhead is laughable. I was signalling my cognitive dissonance of conflating Java and assembly so directly.
You are aware that there are CPUs capable of executing Java bytecode directly? [wikipedia.org] I.e. that use Java bytecode as (one of) their native assembly instruction set(s)?
Re: (Score:2)
Yeah, and historically there were Lisp machines [wikipedia.org]. The PC-compatible/x86 was implied.
Re: (Score:2, Insightful)
No, the benefit of RISC is that you have many more on-chip registers
Nothing about RISC makes more registers inherent, and nothing about CISC makes fewer registers inherent. Now shut the fuck up and let the real nerds discuss.
Re: (Score:2)
That's absolutely correct, unless of course you count the fact that you can't create a CISC CPU with just as many registers that can be used to store and manipulate data without a cache hit as a RISC CPU, given the same die size.
You can't? You can't trade off, say, transistors used for registers (especially given that the bigger processors do register renaming, so you have more hardware registers than the actual RISC/CISC instruction set provides) for transistors used for some other purpose?
Re: (Score:2)
No. That's correct. You can't add registers, keep the same functionality, and add all the circuitry to support said functionality by reducing functionality and taking away registers. Who would have thought?
That isn't answering the question I asked.
The question I asked was "You can't trade off, say, transistors used for registers (especially given that the bigger processors do register renaming, so you have more hardware registers than the actual RISC/CISC instruction set provides) for transistors used for some other purpose?"
I said nothing about keeping all the same functionality, if by "functionality" you mean, for example, "on-chip caches of the same size" and "same number of hardware registers including
Re: (Score:2)
The problem is that you were assuming you are the only one speaking in this thread. The discussion was about adding more registers in a CISC architecture, and so CISC functionality is the context. Asking what "the same functionality" means is absurd; you can't implement a subset of the functionality and still have the same functionality.
I'll put this in simpler terms. Smart people design CPUs and they do
Re: (Score:2)
ROTFLMAO. That's pretty funny there #1252108 :-)
What you can't seem to grasp is even a layman could figure out how ridiculous your claim is with absolutely no understanding of the differences between RISC and CISC.
Rockgoon the PHB: How comes we cantz just adz us a bunch more registerz on the same die size with the same functionality?
Skilled CPU Designer: But PHBoss, we already have the die saturated with as much functionality as we can!
Re: (Score:2)
That is more or less accurate. The goals of the original RISC were stated as making a Reduced Instruction Set Computer, but what was in fact produced was a Reduced Instruction Set Complexity CPU. By restricting the touching of memory to loads and stores only, all other instructions that could be executed in one clock COULD always be executed in one clock. Whereas some CISC instructions involving arrays could kick off 10+ memory touches as a side effect, RISC instructions could never do that (sa
Re: (Score:2)
To correct myself, based on something I read downstream (thanks, trparky), the P4 was 31 stages, not 19. That is really a number I shouldn't have misremembered.
Re: (Score:2)
Whereas some CISC instructions involving arrays could kick off 10+ memory touches as a side effect ... That CISC operation that made 10 memory touches took roughly 10-18 bytes of instruction storage (68K example)
OK, that's probably using "memory indirect postindexed mode". Addressing modes that complex are something some CISC processors had, but not others; x86 is much less complex (scaling, but no memory-indirect or auto-increment/auto-decrement), and S/3x0 even less complex than that (no scaling, just double-indexing).
How often was that addressing mode used, in practice? Was it used often enough that you saved enough code space that you could make the I cache smaller?
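For reference, a hedged C analogue of what one such operand could do (an illustrative mapping, not the exact 68020 semantics): fetch a pointer from memory at base+displacement, then index into the pointed-to table, all as part of a single instruction's effective-address calculation.

```c
#include <stdint.h>

/* What a memory-indirect postindexed operand folds into one access:
   a RISC would typically need a load, an add, and another load. */
int32_t indirect_postindexed(int32_t *const *base, long disp, long idx) {
    const int32_t *table = base[disp];  /* memory-indirect step */
    return table[idx];                  /* postindexed access   */
}
```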
Re:It's a question that WAS relevant (Score:4, Interesting)
I think a large part of the confusion is that CISC often means accumulator architectures (x86, Z80, etc.) vs. RISC, which means general-purpose-register architectures (PPC, SPARC, ARM, etc.). In between you have variable-width RISC like Thumb-2.
As an occasional assembly programmer (PowerPC currently), I far prefer these RISC instructions. With x86 (12+ years ago) I would spend far more instructions juggling values into the appropriate registers, then doing the math, then juggling the results out so that more math could be done. With RISC, especially with 32 GPRs, that juggling is nearly eliminated, confined to the prologue/epilogue. I hear x86 kept taking on more instructions and that AMD64 made it a more GPR-like environment.
-Samuel
Re: (Score:3)
Even though Itanium is all but dead, I did like the fact that you had 128 GP registers to play with. One could do all the loads in one pass, do the calculations, then toss the results back into RAM. The amd64 architecture is a step in the right direction, and I'd say that even though it was considered a stopgap measure at the time, it seems to have been well thought out.
Re: (Score:2)
The downside of having few registers in the ISA is that the compiler may have to choose instruction ordering based on register availability or, worse still, "spill" registers to memory to fit the code to the available registers.
Re: (Score:2)
The downside of having few registers in the ISA is that the compiler may have to choose instruction ordering based on register availability or, worse still, "spill" registers to memory to fit the code to the available registers.
Yes, but the scoreboarding takes care of those spills as well; the processor won't actually perform them. But, whether they're visible or not, the compiler still has to optimise as if they're there in order to have a chance of wringing out maximum performance. So whether they're visible or not turns out not to mean that much in practice; rather, keeping them invisible isn't that much of a gain, as the compiler will have to assume that they're backed by invisible ones anyway and you'll take a substantial p
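To make "spill" concrete, a hypothetical sketch (the names and register counts in the comment are illustrative): a function that keeps more values live than the architectural register file holds forces the compiler to park some of them in stack slots and reload them later.

```c
/* Nine-plus values live at once: a compiler targeting the 8
   architectural registers of 32-bit x86 must spill some to the stack
   and reload them; with 32 GPRs the same code can stay in registers. */
long many_live_values(long a, long b, long c, long d,
                      long e, long f, long g, long h) {
    long t1 = a * b, t2 = c * d, t3 = e * f, t4 = g * h;
    long t5 = a + c, t6 = b + d, t7 = e + g, t8 = f + h;
    /* all of t1..t8 are still needed here, so all stay live */
    return (t1 ^ t2) + (t3 ^ t4) + (t5 * t6) + (t7 * t8);
}
```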
Re: (Score:2)
An all-the-way CISC architecture would allow both operands of an instruction to be pointers to memory; the CPU would have the circuitry to load values into registers in the background. That would eliminate manual register loading. You would also have 3-operand versions of the instructions as well. x86 is not the most CISC architecture out there and doesn't take the concept as far as it could go. This would be a very programmer-friendly environment. AMD expanded the number of registers; they could add eve
Re: (Score:2)
With Moore's law flattening out, the pendulum might end up swinging back that way.
Right now, for a lot of tasks, we have CPU to burn, so the ISA doesn't really matter as much as it did during the 680x0 era.
But who knows... Rock's law may put the kibosh on Moore's law eventually, so speed improvements might end up coming from either better cooling (so clock speeds can be cranked up) or adding more and more special-purpose cores [1]. At this point, it might be that having code optimized by a c
It's a general purpose vs dedicated thing (Score:2, Interesting)
The CPU ISA isn't the important aspect. Reduced power consumption mostly stems from not needing a high-end CPU, because the expensive tasks are handled by dedicated hardware. What counts as top-of-the-line ARM hardware can barely touch the processing power of a desktop CPU, but it doesn't need to be faster, because all the bulk processing is handled by graphics cores and DSPs. Intel has for a long time tried to stave off the barrage of special-purpose hardware. The attempts to make use of ever more general pu
Re: (Score:2)
And ironically, specialized hardware is better than the CPU at raytracing, and Intel might lose that battle as well after being its lone champion for so long.
http://techreport.com/news/261... [techreport.com]
Re: not now, but it certainly did in the past. (Score:2)
Intel has had several RISC chips (e.g., the i960), but Itanium is VLIW (i.e., neither RISC nor CISC).
Certainly it can be argued that the current dominance of x86 is AMD's fault, but it's also true that none of the other architectures were cheap enough for the general populace to adopt, hence the abundance of ARM nowadays and POWER, SPARC, and currently on life support
efficiency matters (Score:3)
This study looks seriously flawed. They just throw up their hands at doing a direct comparison of architectures, using extremely complicated systems and sort of doing their best to beat down and control all the factors that introduces. One of the basic principles of a scientific study is that independent variables are controlled. It's very hard to say how much the instruction set architecture matters when you can't tell what pipelining, out-of-order execution, branch prediction, speculative execution, caching, shadowing (of registers), and so on are doing to speed things up. An external factor that could influence the outcome is temperature. Maybe one computer was in a hotter corner of the test lab than the other, and had to spend extra power just overcoming the higher resistance that higher temperatures cause.
It might have been better to approach this from an angle of simulation. Simulate a more idealized computer system, one without so many factors to control.
Don't be silly (Score:2)
RISC architecture is going to change everything.
Re: (Score:2)
RISC architecture is going to change everything.
Agreed, as soon as they can do submicron technology. By the way, for some strange reason I feel like I've been sleeping for a decade.
Re: (Score:2)
RISC is good!
Re: (Score:2)
They did... 20 years ago... CISC changed its ways to be more RISCy.
The article is bad - mfg technology dominates (Score:2)
They are seriously comparing parts on some 90nm process with much better Intel 32nm and 45nm processes.
They have just taken some random cores made on random (and incomparable) manufacturing technologies, thrown a couple of benchmarks at them, and tried to declare universal results based on these.
A few facts about the benchmark setup and the cores:
1) They use an ancient version of GCC. ARM suffers from this much more than x86.
2) Bobcat is a relatively balanced core with no bad bottlenecks. The mfg tech is cheap, not high-performance but relat
Final nail in the Itanium coffin (Score:2, Interesting)
20 years ago, RISC vs. CISC absolutely mattered. x86 decoding was a major bottleneck and transistor-budget overhead.
As the years have gone by, the x86 decode overhead has been dwarfed by the overhead of other units like functional units, reorder buffers, branch prediction, caches etc. The years have been kind to x86, making the x86 overhead appear like noise in performance. Just an extra stage in an already long pipeline.
All of which paints a bleak picture for Itanium. There is no compelling reason to ke
Re: (Score:2)
As the years have gone by, the x86 decode overhead has been dwarfed by the overhead of other units like functional units, reorder buffers, branch prediction, caches etc. The years have been kind to x86, making the x86 overhead appear like noise in performance. Just an extra stage in an already long pipeline.
And all that long pipeline takes power to run (recently this argument comes up in discussions of mobile devices more than in the server room, because battery life is hugely important, and ARM in the server room is still a joke). ARM chips sometimes don't even have a cache, let alone reorder buffers and branch prediction. When you remove all that stuff, the ISA becomes more important in terms of power consumption.
Of course, as someone else pointed out, they were comparing 90nm chips to 32nm and 45nm. Why tha
Re: (Score:3)
Itanium was dead on arrival.
It ran existing x86 code much slower. So if you wanted to move up to 64bit (and use Itanium to get there), you had to pay a lot more for your processors, just t
Re: (Score:2)
Itanium was first conceived as a VLIW CPU. As its development progressed, it was found that the real-estate savings from moving everything into the compiler were minimal, while in the meantime the compiler was a bitch to write. Also, under the original VLIW vision, software would need to be recompiled for every new CPU, which could be a dream for the GNU world, which requires the availability of source code, but practically a bitch for the real world.
Today's Itanium, unlike Merced, is now more
Re: (Score:2)
All of which paints a bleak picture for Itanium.
Wow, that is a rather bold prediction to be making in 2014. If Itanium does eventually start to falter in the marketplace, then you, sir, are a visionary.
Intel x86 CISC is converted to RISC via Microcode (Score:2)
As mentioned over many years of Slashdot posts, x86 as a hardware instruction set no longer truly exists, and its decode logic represents only a fraction of the overall die space. The real bread and butter of CPU architecture and trade secrets rests in the microcode, which is unique to every generation or edition of a processor. Today all Intel processors are practically RISC.
Original sources (Score:3)
It is really surprising that neither the linked ExtremeTech article nor the Slashdot summary cites the original source. This research was presented at HPCA'13 [carch.ac.cn] in a paper titled "Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures" by Emily Blem [wisc.edu] et al., from the University of Wisconsin's Vertical Research group [wisc.edu], led by Dr. Karu Sankaralingam. You can find the original conference paper [wisc.edu] on their website.
The ExtremeTech article indicates that there are new results with some additional architectures (MIPS Loongson and AMD processors were not included in the original HPCA paper), so I assume they have published an extended journal version of this work, which is not yet listed on their website. Please add a comment if you have a link to the new work.
I have no relation to them, but I knew the original HPCA work.
CISC instruction sets are now abstractions (Score:2)
And actually so is RISC to a degree on POWER processors.
Back in the '80s, going RISC was a big deal. It simplified decode logic (which was then a more appreciable portion of the circuit area), reduced the number of cycles and logic area necessary to execute an instruction, and was more amenable (by design) to pipelining. But this was back in the days when CISC processors actually directly executed their ISAs.
Today, CISC processors come with translation front-ends that convert their external ISA into a RISC-like
Re: (Score:2)
I think that the movie you paid homage to was more realistic, unfortunately...
Re:so why is intel's 14nm haswell still at 3.5 wat (Score:5, Insightful)
Granted, you can build a tablet to do specific tasks (like decoding video codecs) around a really slow processor and some special-purpose DSPs. But perhaps the companies in that business aren't making enough profit to interest Intel.
Re: (Score:2)
Well, the power consumption of various processor architectures is a *bit* more complicated than RISC vs. CISC, which is the point of this story.
Re: so why is intel's 14nm haswell still at 3.5 wa (Score:3)
No relation to energy used. It's in the article: Haswell will get its work done faster and use about the same energy as the slower chips that take longer. What matters is the architecture, not the ISA (Atom is lower power than Haswell at the same process node).
Re: (Score:2)
Honest question:
What can you do on x86 that you can't do on POWER or MIPS?
Re: (Score:3)
That's easy: maintain compatibility with fucktons of legacy code, arguably more of which exists for x86 than every other architecture combined...
Re: (Score:2)
There's plenty of legacy code for SPARC, MIPS, and POWER. And a lot of code can be recompiled for a different platform without much trouble (and there's plenty that can't).
Re: (Score:2)
I'm waiting for a 'true Scotsman' comment :)