New RISC-V CPU Claims Recordbreaking Performance Per Watt (arstechnica.com) 74
Hmmmmmm shares a report from Ars Technica: Micro Magic Inc. -- a small electronic design firm in Sunnyvale, California -- has produced a prototype CPU that is several times more efficient than world-leading competitors, while retaining reasonable raw performance. EE Times reported on the company's new prototype CPU, which appears to be the fastest RISC-V CPU in the world. Micro Magic adviser Andy Huang claimed the CPU could produce 13,000 CoreMarks (more on that later) at 5GHz and 1.1V while also putting out 11,000 CoreMarks at 4.25GHz -- the latter all while consuming only 200mW. Huang demonstrated the CPU -- running on an Odroid board -- to EE Times at 4.327GHz/0.8V and 5.19GHz/1.1V. Later the same week, Micro Magic announced the same CPU could produce over 8,000 CoreMarks at 3GHz while consuming only 69mW of power.
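Taking the article's claimed figures at face value, the efficiency claim can be sanity-checked with a few lines of arithmetic. The numbers below are only the ones quoted in the summary above; CoreMark/MHz and CoreMark/W are the usual ways these scores get normalized:

```python
# Sanity-check Micro Magic's claimed numbers (figures from the article above).
claims = [
    {"coremarks": 13000, "mhz": 5000, "watts": None},   # 5 GHz @ 1.1 V (no power quoted)
    {"coremarks": 11000, "mhz": 4250, "watts": 0.200},  # 4.25 GHz @ 200 mW
    {"coremarks": 8000,  "mhz": 3000, "watts": 0.069},  # 3 GHz @ 69 mW
]

for c in claims:
    per_mhz = c["coremarks"] / c["mhz"]  # CoreMark/MHz, a common normalized metric
    line = f'{c["mhz"]} MHz: {per_mhz:.2f} CoreMark/MHz'
    if c["watts"] is not None:
        line += f', {c["coremarks"] / c["watts"]:,.0f} CoreMark/W'
    print(line)
```

The interesting part is how flat the CoreMark/MHz number stays (~2.6) across clock rates, while the claimed CoreMark/W more than doubles between the 4.25GHz and 3GHz operating points.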
Part of the difficulty in evaluating Micro Magic's claim for its new CPU lies in figuring out just what a CoreMark is and how many of them are needed to make a fast CPU. It's a deliberately simplified CPU benchmarking tool released by the Embedded Microprocessor Benchmark Consortium, intended to be as platform-neutral and simple to build and use as possible. CoreMark focuses solely on the core pipeline functions of a CPU, including basic read/write, integer, and control operations. This specifically avoids most effects of system differences in memory, I/O, and so forth. [...] With that said, it's worth pointing out that -- if we take Micro Magic's numbers for granted -- they're already beating the performance of some solid mobile phone CPUs. Even at its efficiency-first 3GHz clockrate, the Micro Magic CPU outperformed a Qualcomm Snapdragon 820. The Snapdragon 820 isn't world-class anymore, but it's no slouch, either -- it was the processor in the U.S. version of Samsung's Galaxy S7.
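For a sense of what "core pipeline functions" means in practice: the published CoreMark benchmark (written in C) combines linked-list processing, matrix operations, a state machine, and a CRC over the results. A toy Python sketch of those same operation categories — emphatically not the benchmark itself, which is carefully self-checking C — might look like:

```python
# Toy illustration of the operation categories CoreMark exercises
# (list processing, matrix math, CRC) -- NOT the real benchmark,
# which is written in C and validates its own results.

def crc16(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-16 over a byte string (reflected polynomial 0xA001)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

def matrix_mul(a, b):
    """Naive integer matrix multiply: the kind of tight ALU loop CoreMark times."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def list_find(values, target):
    """Linear search, standing in for CoreMark's linked-list find/sort."""
    for i, v in enumerate(values):
        if v == target:
            return i
    return -1

print(matrix_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
print(list_find([9, 4, 7], 7))                          # 2
print(hex(crc16(b"coremark")))
```

The point of keeping the workload this small and self-contained is exactly what the summary says: it isolates the core pipeline from memory and I/O system effects.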
What are you optimizing for? (Score:4, Interesting)
It's worth comparing systems approaches between this chip and Apple's M-1. How much is this new chip optimized to provide the best CoreMark score? What benchmarks exist for a more reasonable actual applications load? The approach Apple took for the M-1 has been described as a systems approach, with the software and systems people trying to identify specific areas where hardware acceleration/optimization would have the greatest ROI.
It'll be interesting to see how well this new chip performs on 'real world' application loads. But first, I'm sure, they'll need to produce sufficiently good optimizing compilers to get well-performing applications against the chip architecture's choices.
Re: What are you optimizing for? (Score:5, Insightful)
Re: (Score:2)
That benchmarks the storage controller as much as it benchmarks the CPU.
Re: (Score:2)
Re: (Score:2)
it's not the Linux kernel, but it's a reasonable benchmark:
https://build2.org/blog/apple-... [build2.org]
The answer is "surprisingly fast".
Re: (Score:2)
Linux kernel compile is a poor CPU benchmark because it depends on storage speed and RAM speed. It's okay for comparing systems, but not for comparing CPUs where, e.g. on Ryzen, RAM speed makes a huge difference.
Re: (Score:3)
How much is this new chip optimized to provide the best CoreMark score?
If you could cheat CoreMark then Intel would have already done it. No seriously, they would have because they kinda got a thing for cheating benchmarks.
https://www.pcworld.com/articl... [pcworld.com]
https://www.servethehome.com/i... [servethehome.com]
https://wccftech.com/intel-set... [wccftech.com]
https://tech.slashdot.org/stor... [slashdot.org]
Re: (Score:3)
I didn't suggest "cheat", I suggested 'optimize'. This chip is not handicapped by x86 compatibility, and in particular by the need to not break anything in Microsoft's code base.
Re: (Score:2)
I didn't suggest "cheat", I suggested 'optimize'
Bullshit, it's the same goddamn thing and you know it.
This chip is not handicapped by x86 compatibility
It's not an original ISA; therefore it also has to conform, just like x86.
Re: (Score:2)
If cheating is taking steroids, optimizing is hard, targeted training.
Re: (Score:2)
Exciting (Score:3)
Even being in the ballpark of previous-generation ARMs is a good sign for RISC-V. Rather than this being strictly a technical battle between ISAs, it is most likely that the more effort (and time and money) is put behind a CPU architecture, the closer it can approach mainstream performance.
Re: Exciting (Score:3)
I am curious what all this means for CISC CPUs and more specifically the Intel x86-64?
Re: (Score:3)
It's time to short INTC.
Re: Exciting (Score:5, Informative)
The dirty secret is that there aren't really any truly CISC CPUs any more. For processors based on x86, the ISA may be CISC, but the execution of that isn't. The instruction stream is converted to something RISC internally.
Recent-ish x86 CPUs are all basically "hybrid" processors in that regard.
Re: (Score:3)
Re: (Score:3)
Re: (Score:1)
Two ignorant people arguing decades-old fanboy arguments because they don't know any better. Microcode? What microcode? Dumbass thinks the decoder of modern x86 processors runs in microcode? You too stupid to realize how dumb that is?
Re: (Score:3)
Are you claiming that there's no microcode used in modern Intel processors at all, or simply that decoding is not done in microcode? Intel CPUs definitely use microcode, and in fact you can patch it from the OS; most distros apply the latest Intel patches at boot. As for decoding, if I read this right, https://en.wikipedia.org/wiki/... [wikipedia.org] says decoding is done by microcode, even on the latest i3, i5, and i7 processors.
I'm curious to know what you are referring to.
Re: (Score:2)
The decoder? There are many types of decoders, and part of it is microcoded, so you are simply wrong. Simple instructions are decoded in hardwired logic, while complex ones and/or those not important for anything but compatibility are pushed into the slow microcode path.
Re: (Score:2)
So you don't understand the differences and the impact. Sad. Many such cases.
Re: (Score:2)
Re: Exciting (Score:5, Insightful)
It's not a "dirty secret", it's not a secret at all. People don't know it, but that doesn't mean it's a secret.
Also, the internal architecture of x86 processors is not "something RISC", it bears no relation to RISC. Interestingly, modern RISC processors are also designed this way, yet why would they be if they were already "RISC internally"?
RISC means "Reduced Instruction Set Computing". It was a philosophy that said that processors should be designed such that the logic that makes them is committed to the things that most matter. Operations that were rarely done did not get silicon. Operations that were complex did not get silicon. Instruction sets that required a lot of logic to decode got replaced with ones that didn't. Absolutely NONE of that philosophy is reflected in the internal design of modern processors. People who say otherwise do not even know what RISC was. RISC is dead, CISC is dead, RISC vs. CISC is dead. Those are all meaningless distinctions today.
Finally, x86 processors have been made this way for DECADES, it's neither secret nor new.
Real world RISC/CISC/Microcode example from 1970s (Score:2)
My first job as a computer programmer was at a company that made minicomputers back in the 1970s. They were built around AMD 2901 bit slices to create 16-bit minicomputers. (I think they had an 8-bit version before I started there.) Anyway, the digital engineers wrote microcode to get the simple bit slices to perform the instructions that we software folks wrote assembly language for. (I would've liked to learn how to write microcode for those. One engineer even showed me a listing of the microcode but
Re: (Score:2)
Modern x86 CPUs have many features associated with RISC internally. For example large numbers of registers and register renaming. Of course nobody outside of AMD and Intel knows what the core instructions are exactly but it's a good bet that they are basically RISC, a smaller core of highly optimized instructions that are combined via microcode to perform more complex operations.
Re: (Score:2)
Re: (Score:2)
It's not going to be like traditional microcode, because all modern x86 CPUs divide resources like ALUs and FPUs up with the ability to dispatch to them dynamically from two separate threads. And on top of that, they are doing all the usual OOE (out-of-order execution) and related tricks.
Re: (Score:2)
Re: (Score:2)
>If that's RISC-like, I'm a banana.
OK, I'll hang around to see if you split . . . :)
hawk
Re: (Score:2)
A large number of registers isn't a defining characteristic of RISC, and the internal format of instructions is guaranteed not to be RISC-like. This follows from how data is decoded and executed: some things simply aren't possible if the internals aren't partially complex. The internal format is less complex than the x86 ISA, but the hardware being simpler than the instruction set is the case for everything except VLIW, some early or very simple RISC designs, or strange things like TTA.
Exactly the core issue (Score:1)
There are significant issues with "CISC" chips, namely variable instruction size.
Read this article [medium.com] which notes that as a massive advantage for the M1 over what the CISC chips can ever deliver even with RISCish internals...
Scroll down to:
Why can't Intel and AMD add more instruction decoders?
The Apple M1 has eight decoders, whereas, as the article states, four is about as many as an Intel instruction-set chip can ever have.... and Apple can add more.
Re: (Score:2)
For processors based on x86, the ISA may be CISC, but the execution of that isn't.
The instruction stream is converted to something RISC internally.
This "RISC-ness" of x86 might be irrelevant or at least insufficient if an article I read lately is to be believed. What is important is that x86 still has a CISC layer, which is a bottleneck and is dragging the performance of the whole thing down. Basically, it seems that the Achilles heel of x86 is the performance of the instruction decoding module. Due to the varying size of x86 instructions, it might be very difficult to devise a high-performance decoding module, with multiple blocks which decode instru
Re: (Score:3)
I am curious what all this means for CISC CPUs and more specifically the Intel x86-64?
Here's the best metaphor to make your question graspable:
Dark Helmet: I am your father’s brother’s nephew’s cousin’s former roommate.
Lone Star: So what does that make us?
Dark Helmet: Absolutely nothing.
In other words, the ISA and design of x86-like "CISC" processors is nothing like the from-scratch RISC-V design, which was originally made to work on FPGAs (programmable gate logic) and therefore had to be much simpler than what Intel and AMD do with full-custom transistor-level logic design.
Assuming the simplicity required of the RTL designers to make RISC-V work on FPGAs translates well to standard-cell ASIC fabrication (which would be the only option for a
Re: Exciting (Score:4, Informative)
What? No. I don't know where you got that from but FPGA was NOT what RISC-V was invented for!
"RISC-V was started with a goal to make a practical ISA that was open-sourced, usable academically and in any hardware or software design without royalties." (Wikipedia).
The simplicity of the ISA goes back to the original goal of RISC: stop wasting silicon real estate on circuits to support thousands of legacy (what Linus would call "magical") opcodes that you have to support in every future version of your ISA. Swaths of the circuits in an x86 chip are more or less never used because compilers over time ditch the garbage pile that is Intel's instructions-of-the-week. ARM has the same problem, even though they evolved from a RISC design. My favorite example is that ARM *still* has a Java acceleration opcode left over from the Blackberry days. Now, it just feeds back into their main execution pipe, but they can never get rid of the opcode or the circuitry supporting it.
Modern x86 chips are basically RISC on the backend, with a bunch of circuitry on the front end to decode those x86 ops into one or more RISC micro-ops. You're basically wasting silicon to support x86.
Re: (Score:3)
Modern x86 chips are basically RISC on the backend, with a bunch of circuitry on the front end to decode those x86 ops into one or more RISC micro-ops. You're basically wasting silicon to support x86.
This is just plain wrong and is basically the result of someone with no actual hardware knowledge but who knows a tiny bit about instruction sets .. a fucking lame-assed pretend expert above, posing as someone that knows something worthwhile, but fucking doesn't actually.
Those more complex ops describe work that needs to be done.
If the op being decoded had to be simpler "because RISC", the processor would need additional decode ability to emit the same work into the pipeline that the complex processor does.
Re: (Score:2)
He mad! ...okay, then. That was pure nonsense on so many levels that it's not worth the time to dissect it all. But I guess feel free to live in your magic x86 processor fantasy world?
For those that actually want to understand what RISC-V means and its effects on how a processor is designed and works in the real world, without the Intel and AMD fanboys flipping out... "Crash Course Computer Science" on YouTube is a great initial starting point. The episodes are easy to understand, and they walk you through th
Re: (Score:2)
RV32I has 31 registers plus x0, which always has the value 0.
What’s Different? Dedicating a register to zero is a surprisingly large factor in simplifying the RISC-V ISA.
The PC is one of ARM-32’s 16 registers, which means that any instruction that changes a register may also, as a side effect, be a branch instruction. The PC as a register complicates hardware branch prediction, whose accuracy is vital for good pipelined performance, since every instruction might be a branch instead of the 10–20% of instructions executed in programs for typical ISAs. It also means one less general-purpose register.
--The RISC-V Reader
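The quote's point about x0 can be made concrete: because reads of x0 always return zero (and writes to it are discarded), RISC-V gets many common operations "for free" as assembler pseudo-instructions over the base ISA, rather than as dedicated opcodes needing their own decode logic. A small Python table of the standard expansions, taken from the RISC-V assembly conventions:

```python
# Standard RISC-V pseudo-instruction expansions that fall out of a
# hardwired zero register (x0): none of these needs its own opcode.
PSEUDO = {
    "nop":          "addi x0, x0, 0",   # write to x0 is discarded
    "mv rd, rs":    "addi rd, rs, 0",   # register move = add zero
    "li rd, imm":   "addi rd, x0, imm", # load (small) immediate
    "neg rd, rs":   "sub rd, x0, rs",   # negate = subtract from zero
    "beqz rs, off": "beq rs, x0, off",  # branch-if-zero = compare to x0
    "j off":        "jal x0, off",      # plain jump discards link register
}

for pseudo, base in PSEUDO.items():
    print(f"{pseudo:14} -> {base}")
```

Each entry on the left is something many ISAs spend a distinct opcode (and decode circuitry) on; here the assembler just rewrites it.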
Re: (Score:2)
You can actually see this at work on the M1. It has a massive amount of level 1 cache, far more than AMD and Intel CPUs. It needs it because it can't rely on complex instructions encoding large chunks of work that it can break down internally; instead, it has to cache large numbers of instructions.
Re: (Score:2)
Which Java-acceleration opcode are you talking about? No current ARM chip supports the Jazelle Java acceleration mode. The ThumbEE checked array access instructions aren't Java-specific, they're applicable to any language that has bounds checks on array accesses (e.g. JavaScript, or even std::vector::at in C++).
Re: (Score:2)
Sorry, but that is incorrect. The instruction is still there, and thus, as with all of these hardly-if-ever-used instructions, they still have to implement support for it with wires and silicon.
https://en.wikipedia.org/wiki/... [wikipedia.org]
'Jazelle is denoted by a "J" appended to the CPU name, except for post-v5 cores where it is required (albeit only in trivial form) for architecture conformance.'
This is exactly what I'm talking about.
Re: (Score:2)
The "trivial form" doesn't actually accelerate Java though. All it does is fall back to the interpreter. It doesn't require much silicon to implement it as a NOP.
Re: Exciting (Score:2)
You're missing the point. That's just one simple example among thousands of unused instructions. Trivial != Free. You still have to put the traces and gates in to support all those instructions...forever! And supporting these legacy instructions is how x86 and ARM end up with so much cruft in their silicon. Much like Rob Pike talking about Golang vs C++ :
"Did the C++ committee really believe that was wrong with C++ was that it didn't have enough features? Surely,
[...] it would be a greater achievement to si
Re: (Score:2)
Modern x86 CPUs don't directly support all those old instructions, they don't dedicate silicon to them.
Instead they just have an exception handler installed that is triggered on an unknown op-code, checks to see if it is one of the ancient ones, and if so executes a little bit of ROM code to emulate it and carries on. This technique is ancient; for example, it was used to run code that used FPU instructions on non-FPU-equipped CPUs back in the day.
Of course this is much slower than native support but it
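Whether or not modern x86 parts actually work this way for legacy instructions (that claim is disputed elsewhere in this thread), trap-and-emulate itself is a real and old technique — Linux's kernel math emulation for FPU-less 386/486 chips is the classic example the parent alludes to. A toy Python dispatch loop showing the pattern, with every opcode name invented purely for illustration:

```python
# Toy trap-and-emulate dispatch: a fast path for "hardwired" ops, plus an
# illegal-instruction handler that emulates everything else in software.
# All opcode names here are invented for illustration.

FAST_OPS = {
    "add": lambda regs, a, b, d: regs.__setitem__(d, regs[a] + regs[b]),
    "sub": lambda regs, a, b, d: regs.__setitem__(d, regs[a] - regs[b]),
}

def emulate_legacy(regs, op, a, b, d):
    """The 'little bit of ROM code': slow software emulation of a legacy op."""
    if op == "divmod_legacy":  # stand-in for an ancient CISC-style instruction
        regs[d], regs[a] = divmod(regs[a], regs[b])
    else:
        raise ValueError(f"truly unknown op: {op}")

def execute(regs, program):
    for op, a, b, d in program:
        handler = FAST_OPS.get(op)
        if handler is not None:
            handler(regs, a, b, d)             # hardwired fast path
        else:
            emulate_legacy(regs, op, a, b, d)  # "trap": emulate, then carry on

regs = {"r0": 17, "r1": 5, "r2": 0, "r3": 0}
execute(regs, [("add", "r0", "r1", "r2"),
               ("divmod_legacy", "r0", "r1", "r3")])
print(regs)  # {'r0': 2, 'r1': 5, 'r2': 22, 'r3': 3}
```

As the parent notes, the emulated path is far slower than native support — the trade-off is that the rarely used operation costs a table entry rather than dedicated silicon.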
Re: (Score:2)
Re: (Score:2)
An utterly meaningless statement, yet remarkably more informed than the other comments. I'm more interested in hearing about that microcode in those "hasn't been CISC" processors!
Wait until you learn about the micro-operations... (Score:2)
I'm more interested in hearing about that microcode in those "hasn't been CISC" processors!
Well if learning about the microcode excites you, just wait until you learn about the micro-operations. :-)
Re: (Score:2)
What about it? My i5 machine running CentOS 7 installs patched microcode at boot up. And my AMD machine also supports patching of the CPU microcode. You can try it on any machine and see what it says. Here are the boot messages from an AMD machine:
# dmesg | grep microcode
[ 0.738462] microcode: CPU0: patch_level=0x08108109
[ 0.738473] microcode: CPU1: patch_level=0x08108109
[ 0.738490] microcode: CPU2: patch_level=0x08108109
[ 0.738508] microcode: CPU3: patch_level=0x08108109
[ 0.738511] microcode: M
Re: (Score:2)
https://www.pctechguide.com/pentium-cpus/pentium-pro
https://en.wikipedia.org/wiki/P6_(microarchitecture)
Re: Exciting (Score:5, Interesting)
A large fraction of a modern CPU's die is cache and memory hardware. The issue is that it might be possible to build an x86 CPU that consumes 200mW and gets a really good CoreMark score. However, once a huge cache is attached to the x86 CPU, the 200mW power consumption number may go to pieces. The cache is often needed for real-world performance. With modern fabrication processes, every transistor has a certain amount of leakage current. For large caches, the leakage current becomes a limitation on minimum power consumption.
It was many years ago, but Intel once came out with a paper that said something to the effect of: "CISC was better than RISC because the smaller CISC instructions enable better cache utilization." As such, do not write off the x86 architecture when it comes to performance/instruction byte. ARM developed the ARM-Thumb instruction set to let it compete on instructions / memory word (equivalently instructions / cache line) for the applications where this is a consideration.
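The leakage point above is easy to illustrate with back-of-the-envelope static-power arithmetic (P ≈ N · I_leak · V). Every figure below is an invented round number for illustration, not a measurement of any real part or process:

```python
# Back-of-envelope static (leakage) power for a cache: P = N * I_leak * V.
# All numbers are invented round figures for illustration only.

BITS_PER_BYTE = 8
TRANSISTORS_PER_SRAM_BIT = 6  # classic 6T SRAM cell

def cache_leakage_watts(cache_bytes, leak_amps_per_transistor, volts):
    n = cache_bytes * BITS_PER_BYTE * TRANSISTORS_PER_SRAM_BIT
    return n * leak_amps_per_transistor * volts

# A hypothetical 1 MiB cache at 0.8 V with 1 nA of leakage per transistor:
p = cache_leakage_watts(1 << 20, 1e-9, 0.8)
print(f"{p * 1000:.1f} mW of leakage")  # ~40 mW before the core executes anything
```

With made-up-but-plausible-shaped numbers like these, the cache alone would burn a large slice of a 69mW budget just sitting there — which is why the parent's caveat about attaching a huge cache to a low-power core matters.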
Re: (Score:3, Informative)
"As such, do not write off the x86 architecture when it comes to performance/instruction byte."
But you could write them off for arguing that it is a principal concern.
x86 code may have a lower footprint that improves memory utilization, but it requires more gates than more modern instruction set designs to achieve similar capabilities. Gates consume power whether they are in a cache or in the processor. All you're doing here is repeating an Intel talking point; the fact is that compact, variable-width instructio
Re: (Score:2)
Turns out that RISC needs even more cache though.
The Apple M1 has 128k level 1 cache per core, compared to 32k for Zen 2 parts. It needs 4x the cache because simple instructions do less work, meaning it needs to read more of them to do the same amount of work as an AMD64 part.
Re: (Score:2)
I am curious what all this means for CISC CPUs and more specifically the Intel x86-64?
Probably not much when we're talking about chips that have a billion transistors. The memory architecture, the amount of cache, and the super-scalar pipeline all play a bigger role than the instruction set. At least in the examples of RISC and CISC that I see alive and well today.
Obligatory question... (Score:1)
Obligatory question...
Can this run Linux?
Re: (Score:2)
Evidently yes.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/riscv?h=v5.10-rc6 [kernel.org]
Please for the love of God don't tell Apple (Score:5, Funny)
Or else right after we switch over everything to ARM .. they'll be like "Psych!" and switch to RISC V. No thanks.
Re: (Score:2)
Re: (Score:1)
Or else right after we switch over everything to ARM .. they'll be like "Psych!" and switch to RISC V.
I know this was meant as humor but...
Is it even as good as the M1? They talked about comparing it to an oldish Snapdragon chip which the M1 seems significantly better than...
Maybe it's better, just seems like it's hard to judge with that CoreMark thing as the presented benchmark.
Re: (Score:1)
If Apple had bought P.A. Semi 3 years earlier then we could have avoided both the x86 and ARM transitions.
'Apple Silicon' could have been pumping out PWRficient OpenPOWER chips for both iPhone and Mac years ago.
Instead, PowerPC is a footnote in history.
Garbage collection (Score:2)
Until OS and software designers learn how to manage memory efficiently, the hardware will be hampered by the software.
Re: (Score:2)
So operating systems are VERY good at managing memory.... That said, it is like blaming the garbage company for the fact that you have piles of garbage inside your house that you haven't taken out in decades.
In reality, most real software is VERY good at memory management as well. Trust me, garbage collection in the Java sense is NEVER a good idea in a performance system -- there are too many problems that get in the way.
2021: Year of Let's Beat the M1 (Score:4, Interesting)
I predict that this next year will bring any number of would-be "M1 Killers".
Unfortunately for them, Apple is likely already working on the next 3 Generations of "M" SoCs/CPUs and GPUs.
So, just like Qualcomm and Samsung have learned already, they will just never catch up.
BTW, this is not Flamebait. Ask Qualcomm and Samsung (and soon, Intel and AMD).
So, when can I buy a RISC-V board, like a RPi? (Score:2)
The SiFive HiFive Unleashed was way too expensive (close to $2,000 US once you got the PCIe & Ethernet expansion board). The very new (and not yet shipped) SiFive HiFive Unmatched board is more reasonable at $665 US and does have a more complete design. But still, I'd rather have a lower-end board that is maybe less powerful but also a lot cheaper, like $200 US.
Apparently you can't "b
Re: (Score:3)
Re: (Score:3)
There's a lot of bad info in these comments (Score:2)
I've been studying RISC-V in depth for the past year. I had intended to make the simplest implementation I could in Minecraft, but life got in the way.
It comes down to this. Fewer instructions means less circuitry to implement said instructions, so you can optimize your design to (like the *nix design philosophy) do a few things very well, and very fast. x86 and ARM chips spend a good chunk of their real estate implementing support for old opcodes that most compilers don't even use. And that increases compl
Re: (Score:2)
I've been studying RISC-V in depth for the past year.
It comes down to this. Fewer instructions means less circuitry to implement said instructions, so you can optimize your design to (like the *nix design philosophy) do a few things very well, and very fast. x86 and ARM chips spend a good chunk of their real estate implementing support for old opcodes that most compilers don't even use. And that increases complexity and impact on the chip's thermal envelope.
You spent a whole year and didn't notice that
The way I understand the last point is that CISC can pack more computation into one instruction, so it takes less memory bandwidth. This becomes more important with faster CPUs and multiple cores. x86 processors have been internally RISC for a long time, and in some sense the x86 translation layer is
Re: (Score:3)
Re: (Score:2)
RISC-V's base ISA is frozen and there is no intention of ever changing it.
I would argue that every commodity processor ISA is "frozen". Any deviations from the base ISA are extensions.
From that point of view, the original 8086 is the base ISA for latest gen Intel and Ryzen processors. They can still execute original MS-DOS (if you can find a platform that will boot it). Both 32 bit and 64 bit "Intel" ISAs (yes I know 64 bit is actually AMD's intellectual property) are extensions to 8086 instruction set.
Re: (Score:2)
It comes down to this. Fewer instructions means less circuitry to implement said instructions, so you can optimize your design to (like the *nix design philosophy) do a few things very well, and very fast. x86 and ARM chips spend a good chunk of their real estate implementing support for old opcodes that most compilers don't even use
Not really. Each instruction input into the CPU goes through a lookup table and is converted into several micro-operations. These micro-operations control which inputs connect to the ALU, control where the ALU result is stored - we are talking about the control lines for all the basic components of the CPU.
When you add a new instruction to the ISA, you are just adding another entry to the lookup table. The underlying CPU does not change -- at least not for typical instructions. A new table entry does
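The lookup-table decode described above can be sketched in a few lines. All instruction and micro-op names here are invented for illustration; the point is only the shape of the mechanism the parent describes, where a new instruction is a new table row rather than new datapath hardware:

```python
# Toy decode table: each ISA instruction expands to a fixed sequence of
# micro-ops that drive the datapath (ALU inputs, result writeback, etc.).
# Adding an instruction = adding a row; the datapath itself is unchanged.
# All instruction and micro-op names are invented for illustration.

DECODE_TABLE = {
    "add":  ["read_operands", "alu_add", "write_result"],
    "load": ["compute_address", "read_memory", "write_result"],
    # A "complex" instruction is just a longer micro-op sequence
    # reusing the same datapath control steps:
    "add_from_memory": ["compute_address", "read_memory",
                        "read_operands", "alu_add", "write_result"],
}

def decode(instruction):
    """Expand one ISA instruction into its micro-op sequence."""
    try:
        return DECODE_TABLE[instruction]
    except KeyError:
        raise ValueError(f"illegal instruction: {instruction}") from None

print(decode("add_from_memory"))
```

This also makes the thread's recurring point visible: the "complex" instruction costs nothing extra in execution hardware, only a longer entry in the table.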
Snake Oil (Score:1)
Currently this is all based on press releases from Micro Magic. If any third-party gets to verify performance of these CPUs I will be much more interested.