Variable Instruction Computing: What Is Old Is New Again (hackaday.com)
szczys writes: Higher performance, lower power. One of the challenges with hitting both of those benchmarks is the need to adhere to established instruction sets like x86. One interesting development is the use of Variable Instruction Sets at the silicon level. The basic concept of translating established instructions to something more efficient for the specific architecture isn't new; this is what yielded the first low-power x86 processors at the beginning of the century. But those relied on translation at the software level. A company called Soft Machines is paving the way for variable instructions in hardware. Think of it as an emulator for ARM, x86, and other architectures that runs on silicon for fast execution while sipping very little power.
Re: Old is new (Score:4, Funny)
Wow dude... that's like, super meta... I mean, it's basically transmeta...
Get it? Is this thing on?
Flexible, but better than fixed? (Score:3)
Can they really get better performance per watt on general computing using a flexible substrate? It seems like whatever design they set the flexible (FPGA-ish?) circuit up to do could be faster and lower power if it were put into a fixed silicon (or similar) implementation. Maybe if your workload settles into very simple needs for long periods of time, this could take advantage of that.
Re:Flexible, but better than fixed? (Score:5, Informative)
The problem with your analysis is that the words "flexible" and "fixed" don't have technical meaning here. Those aren't differing physical characteristics to choose between; they're just human-level descriptions of how the programming will be organized.
An FPGA intentionally has a whole bunch of extra circuitry supporting each logical unit, and that circuitry is expected to take a lot of extra power because it is additional functionality. An FPGA doesn't use more power per physical transistor; it just has a whole bunch of transistors and other logic devices for each programmable unit. When you then implement the circuit as an ASIC, it uses less power because it uses fewer logic devices, not because there is some other qualitative difference.
With something like this, any extra logic devices would be specifically designed to manage the other logic devices for low-power use. That is a very reasonable thing to try to do. Whether their implementation is successful and useful in the market is a whole different issue, of course.
Transmeta was successful from an engineering perspective; their products used less power than their competitors'. The problem was, they were only a few months ahead and required too many changes in devices. All the other companies had to do was be richer and better able to secure access to new fab technologies.
One big difference here is that this will potentially change thread management for programmers in a way that many people will like. It might very well be able to fragment the industry and corner a significant chunk of interest.
So, its like x86 already is? (Score:5, Interesting)
So instead of the current situation, where we have Intel/AMD processors doing something under the hood, using microcode as the language that translates the x86 environment into whatever is actually on the silicon ... you're going to add ARM to it, and maybe some other ones?
That's cool and all, but it's not really all that useful, and Intel can pretty much already do that on any CPU it wants with a microcode update. ARM may not run as efficiently on the core that Intel uses, but it can be done from a technical point of view.
It's not worth it. That's why no one does it.
You'll effectively do nothing well.
Intel was an ARM licensee (probably still is); they know ARM as well as anyone outside of ARM itself ... and they made entirely new silicon to run it (well, technically they bought it, if I recall correctly) ... and it even had its own microcode ... But what they never did was share a single core between ARM and x86 that could change modes with a microcode update. No reason they couldn't, other than that it's not efficient.
Re: (Score:3)
Too bad both Intel and AMD keep their microcode closed. There's so much fun we could have if it were documented and non-Tivoized.
Just the first use: shuffle opcode numbers, make your compiler emit those and recompile your software per-installation. Any exploits that use machine code are instantly thwarted.
And that's just a start...
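A toy sketch in C of the shuffle idea, for a hypothetical four-instruction ISA (this has nothing to do with real Intel/AMD microcode formats; the opcode names and permutation scheme are made up purely for illustration):

/* Toy illustration of per-install opcode shuffling for a hypothetical
 * 4-instruction ISA -- NOT real x86/AMD microcode, just the concept. */
#include <stdio.h>
#include <stdlib.h>

enum { OP_NOP, OP_ADD, OP_SUB, OP_HALT, NUM_OPS };

/* Per-installation permutation: logical op <-> encoded opcode byte. */
static unsigned char encode_map[NUM_OPS];
static unsigned char decode_map[NUM_OPS];

static void shuffle_opcodes(unsigned seed)
{
    srand(seed);                              /* stand-in for a real RNG */
    for (int i = 0; i < NUM_OPS; i++) encode_map[i] = (unsigned char)i;
    for (int i = NUM_OPS - 1; i > 0; i--) {   /* Fisher-Yates shuffle */
        int j = rand() % (i + 1);
        unsigned char t = encode_map[i];
        encode_map[i] = encode_map[j];
        encode_map[j] = t;
    }
    for (int i = 0; i < NUM_OPS; i++) decode_map[encode_map[i]] = (unsigned char)i;
}

int main(void)
{
    shuffle_opcodes(0xC0FFEE);                /* chosen at install time */

    /* The "compiler" emits shuffled opcodes... */
    unsigned char program[] = { encode_map[OP_ADD], encode_map[OP_HALT] };

    /* ...and the "CPU" (or its microcode) decodes through the same table. */
    for (size_t pc = 0; pc < sizeof(program); pc++)
        printf("encoded 0x%02x -> logical op %d\n",
               program[pc], decode_map[program[pc]]);

    /* Injected shellcode built against the *standard* numbering
     * (e.g. a literal OP_ADD byte) would decode to the wrong op here. */
    return 0;
}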
Re: (Score:2)
That wouldn't stop pure return-oriented programming, or, if anyone knew you were doing something like that, exploits that can read code memory.
Re: (Score:2)
If we're already recompiling per-install, it would be easy to randomize a lot about the code, making return-oriented programming moot (or at least massively harder, if you have too little entropy). Shell/Perl/etc. can be "compiled" into a scrambled form. We can randomize kernel syscall numbers even today. But all of that is worth comparatively little if the biggest risk, machine code, is easily exploited.
If you can read code memory then this technique can be defeated -- but needing to have two separate exp
Re: (Score:2)
Intel was an ARM licensee (probably still is),
Naw, they sold off XScale and the license presumably went with it
they know ARM as good as anyone outside of ARM itself
Naw, they knew how to make the ARM of the day fast, but not power-efficient. Everyone else's ARM sipped power per MIPS compared to XScale under Intel. I haven't followed it under Marvell, so I don't know if that ever changed, but they still make it, so it probably did.
But what they never did was share a single core between both ARM and x86 CPUs that could change modes with a microcode update. No reason they couldn't other than its not efficient.
it's just a waste of time. the demand is not there. why mess around with it?
Re: (Score:2)
Yep - the B1700 did this in the 1970s. Been there, done that... (I was a CPU engineer on one of them.)
Re: (Score:2)
Yep - the B1700 did this in the 1970s. Been there, done that... (I was a CPU engineer on one of them.)
But the B1700 didn't translate programs compiled to the various S-languages directly into microcode and execute that microcode; it interpreted them. From this article about Soft Machines [semiaccurate.com], it sounds as if direct translation is what they're doing:
Re: (Score:2)
translate programs compiled to the various S-languages directly into microcode and execute the microcode.
What could possibly go wrong, in today's connected era. LOL.
Re: (Score:3)
translate programs compiled to the various S-languages directly into microcode and execute the microcode.
What could possibly go wrong, in today's connected era. LOL.
That's pretty much what these guys [ibm.com] do, although they've stopped calling what it gets translated to "microcode" (in the current machines, it's Power Architecture code, possibly with a few extensions such as tag bits).
(They used to call the low-level OS and binary-to-binary translator code "vertical microcode", in the days before it was PowerPC/Power Architecture code, but that was for legal reasons; they didn't want to be forced to make the code available to clone makers. "Vertical microcode" ran out of
Re: (Score:1)
Yep - the B1700 did this in the 1970s. Been there, done that... (I was a CPU engineer on one of them.)
So did the i432 a few years later.
Intel iAPX 432 [wikipedia.org]
The innovative features of the iAPX 432 were individually detrimental to good performance. Combined together, it ran slower than contemporary conventional microprocessor designs such as the Motorola 68010 and Intel 80286. One problem was that the two-chip implementation of the GDP limited it to the speed of the motherboard's electrical wiring. A larger issue was the capability architecture needed large associative caches to run efficiently, but the chips had no room left for that. The instruction set also used bit-aligned variable-length instructions (as opposed to the byte or word-aligned semi-fixed formats used in the majority of computer designs). Instruction decoding was much more complex than in other designs. In addition, the BIU was designed to support fault-tolerant systems, and in doing so up to 40% of the bus time was held up in wait states.
Re: (Score:2)
Yep - the B1700 did this in the 1970s. Been there, done that... (I was a CPU engineer on one of them.)
So from within the dark entrails of Burroughs you were one of the Hardy Boys helping bring to light the Soul of a New Machine? That's awesome, and I'm not Kiddering. What if... those BCD architectures [wikipedia.org] were designed from scratch using today's tools and methods? Is there something the machine could do inherently better or faster, something that benefits from the use of errorless unbounded decimal arithmetic?
Re: (Score:2)
I remember studying about variable length instruction sets when I was studying CS back in the '70s. Scared the hell out of me.
"Variable" or "variable length"? There's nothing particularly exotic about instruction sets where not all instructions are the same length, so presumably you meant the former.
Re: (Score:2)
Well - variable-length instructions, and a variable-length data path too. The S-ops were Huffman encoded, i.e. the most often used were the shortest. The B1700 had about a 3:1 code density advantage over the IBM 360 in COBOL, if my memory serves me. Yes - likely Transmeta's JIT compiler is closer to what this is about. I also worked on the Cydra 5, which was a VLIW machine - its Achilles' heel was solved by the JIT trick.
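For flavor, here is a minimal C sketch of decoding Huffman-style variable-length opcodes from a bit stream; the code lengths, bit patterns, and mnemonics are invented for illustration and are not the actual B1700 S-op encodings:

/* Decode a made-up prefix code where frequent ops get the short codes:
 *   0 -> LOAD,  10 -> STORE,  110 -> ADD,  111 -> BRANCH            */
#include <stdio.h>
#include <stdint.h>

static int get_bit(const uint8_t *buf, size_t bitpos)
{
    return (buf[bitpos / 8] >> (7 - bitpos % 8)) & 1;
}

static const char *decode_op(const uint8_t *buf, size_t *bitpos)
{
    if (get_bit(buf, (*bitpos)++) == 0) return "LOAD";
    if (get_bit(buf, (*bitpos)++) == 0) return "STORE";
    if (get_bit(buf, (*bitpos)++) == 0) return "ADD";
    return "BRANCH";
}

int main(void)
{
    /* Bit stream 0 10 110 111 0 = LOAD STORE ADD BRANCH LOAD,
     * packed MSB-first into bytes (padded with zeros).          */
    const uint8_t code[] = { 0x5B, 0x80 };
    size_t bitpos = 0;

    for (int i = 0; i < 5; i++)
        printf("%s\n", decode_op(code, &bitpos));

    /* 5 ops in 10 bits: the density win over fixed-width encodings. */
    printf("5 ops in %zu bits\n", bitpos);
    return 0;
}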
that quote is wrong (Score:2)
Re: (Score:2)
I scrolled down and saw what you're referring to:
"Gort, klaatu nikto barada." -- The Day the Earth Stood Still
Slashdot, turn in your stash of fake nerd cards. You're not even at poser level anymore.
Re: (Score:2)
nikto is the last word. as written, it's a jelly doughnut.
"Gort, I am a jelly doughnut"?
Re: (Score:2)
"Gort, Klaatu needs a jelly doughnut to be brought back to life. Oh, and please don't destroy the earth."
Re: (Score:3)
Went RISC uOps to Get Perf. Anyone who thinks a PENTIUM was looking for Low Power Is AN IGNORANT SLUT!
(Pentium Pro, but whatever.)
No, but Transmeta did something similar to what Soft Machines appears to be doing, and did so to reduce power consumption; I think that's what "The basic concept of translating established instructions to something more efficient for the specific architecture isn't new; this is what yielded the first low-power x86 processors at the beginning of the century." was referring to.
ReTransMeta (Score:2)
Yes, and Transmeta was not particularly successful, in spite of the promise the technology showed.
Re: (Score:2)
The companies that survived were the ones with money in the bank and not necessarily the ones with the best ideas or products.
It also doesn't help much when your customers and potential customers either go bust or stop spending money.
So.... (Score:3)
Re: (Score:2)
Transmeta with a hardware morphing layer?
Maybe, maybe not. An article about them on SemiAccurate [semiaccurate.com] says "SM can run what it calls personalities in software but they are not implemented in the expected way. Personalities are software and are loaded at boot time, but they are both light and low-level. They don’t emulate code, they just translate it to the native ISA, a 32-bit add is a 32-bit add on both native and emulated hardware, but probably have differing opcodes." and "Personalities are not purely software though, there are hardware hooks
Re: (Score:2)
That breaks down very quickly when you get to any memory operations, as well as all the various flavours of SIMDs...
It really doesn't make much sense that you can be more power efficient in your implementation of the behaviour and ordering of an exclusive store-release transaction using generic ops compared to hardware that was explicitly built and optimized for that instruction.
Yeah, maybe your integer and
Re: (Score:2)
a 32-bit add is a 32-bit add on both native and emulated hardware
Hate to tell you this, but no...
On x86, a 32-bit add also updates a flags register that is commonly leveraged. A full emulation of this register would be quite expensive on architectures that don't automatically track all of the same things.
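To put a rough number on that, here is a simplified C sketch of what an emulator or translation layer has to compute for a single 32-bit ADD when the host doesn't maintain x86-style flags (it omits AF and PF, and is not anyone's actual implementation):

/* The sum is one operation; the flag updates are several more. */
#include <stdint.h>
#include <stdio.h>

struct flags { int cf, zf, sf, of; };

static uint32_t add32_with_flags(uint32_t a, uint32_t b, struct flags *f)
{
    uint32_t r = a + b;
    f->cf = r < a;                                 /* unsigned carry out */
    f->zf = r == 0;                                /* zero               */
    f->sf = (r >> 31) & 1;                         /* sign               */
    f->of = ((~(a ^ b) & (a ^ r)) >> 31) & 1;      /* signed overflow    */
    return r;
}

int main(void)
{
    struct flags f;
    uint32_t r = add32_with_flags(0x7FFFFFFFu, 1u, &f);
    printf("r=%08x cf=%d zf=%d sf=%d of=%d\n", r, f.cf, f.zf, f.sf, f.of);
    return 0;
}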
Re: (Score:2)
http://www.amazon.com/The-Soft-Machine-William-Burroughs/dp/0802133290
Or Soft Machine, the band [wikipedia.org].
What's old is old again... (Score:2)
Translating instructions to micro-ops to run on a VLIW-ish backend? I think every high-performance architecture does that now (ARM and x86).
Share processing resources between cores? AMD tried to share the FP pipeline (Flex FP?) between cores starting with their Bulldozer architecture, but it looks like they are going to abandon that with their Zen architecture after getting beat up about single-thread perf...
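On the first point, a purely illustrative C sketch of cracking a CISC-style read-modify-write add into RISC-like micro-ops; the uop format and register numbering are invented for the example, not any shipping decoder's encoding:

#include <stdio.h>

enum uop_kind { UOP_LOAD, UOP_ADD, UOP_STORE };

struct uop {
    enum uop_kind kind;
    int dst, src1, src2;     /* internal (renamed) register numbers */
    long mem_addr;           /* valid for LOAD/STORE only           */
};

/* Decode "add [addr], rSRC" into three micro-ops written to out[]. */
static int crack_add_mem_reg(long addr, int src_reg, struct uop *out)
{
    int tmp = 64;  /* pretend the rename allocator handed us temp reg 64 */
    out[0] = (struct uop){ UOP_LOAD,  tmp, -1,  -1,      addr };
    out[1] = (struct uop){ UOP_ADD,   tmp, tmp, src_reg, 0    };
    out[2] = (struct uop){ UOP_STORE, -1,  tmp, -1,      addr };
    return 3;
}

int main(void)
{
    struct uop uops[3];
    int n = crack_add_mem_reg(0x1000, 5, uops);
    for (int i = 0; i < n; i++)
        printf("uop %d: kind=%d dst=%d src1=%d src2=%d addr=%#lx\n",
               i, uops[i].kind, uops[i].dst, uops[i].src1, uops[i].src2,
               uops[i].mem_addr);
    return 0;
}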
Re: (Score:2)
Translating instructions to micro-ops to run on a VLIW-ish backend? I think every high-performance architecture does that now (ARM and x86).
x86 - and z/Architecture, with the z13 chip - but do any ARM processors (or other RISC processors) do that?
Re: (Score:2)
Look up NVIDIA Denver; it is a wide CPU that is said to do that sort of thing.
Uh... no (Score:2)
As the 6502 only had a single stack, limited in size to 256 bytes, and hard coded to reside at memory address range 0x0100-0x01ff, I might tend to disagree with that assessment.
Re: (Score:3)
You forget that the 6502 could do indirect addressing through any of the zero-page locations, giving you a potential 128 stack pointers for your stack machine. Also, the zero page had a special addressing mode, so loads, stores, and increments/decrements could be done with two-byte instructions instead of three-byte instructions, reducing the fetch-execute time by one cycle.
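To make that concrete, here is a hypothetical sketch in C of the software-stack idea; the real thing would be a handful of 6502 instructions using (zp),Y indirect indexed addressing, and the zero-page and stack base addresses below are arbitrary:

/* Keep a 16-bit stack pointer in a zero-page byte pair and push/pop
 * through it, the way 6502 code would with (zp),Y addressing.        */
#include <stdint.h>
#include <stdio.h>

static uint8_t mem[65536];           /* the whole 64 KB address space */

/* A "zero-page" stack pointer: low byte at zp, high byte at zp+1. */
static uint16_t get_sp(uint8_t zp)           { return mem[zp] | (mem[zp + 1] << 8); }
static void set_sp(uint8_t zp, uint16_t v)   { mem[zp] = v & 0xFF; mem[zp + 1] = v >> 8; }

static void push(uint8_t zp, uint8_t val)
{
    uint16_t sp = get_sp(zp) - 1;    /* stack grows downward */
    mem[sp] = val;
    set_sp(zp, sp);
}

static uint8_t pop(uint8_t zp)
{
    uint16_t sp = get_sp(zp);
    uint8_t val = mem[sp];
    set_sp(zp, sp + 1);
    return val;
}

int main(void)
{
    /* Two independent stacks, each anchored by its own zero-page pair. */
    set_sp(0x00, 0x6000);            /* data stack   */
    set_sp(0x02, 0x7000);            /* return stack */
    push(0x00, 42);
    push(0x02, 0xEA);
    printf("data=%d return=0x%02X\n", pop(0x00), (unsigned)pop(0x02));
    return 0;
}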
However, I think the main reason stack machines were often implemented on 6502 has more to do with its relative lack of registers. There
Re: (Score:2)
I've got an idea... (Score:1)
Make a CPU with just a few instructions and do complex stuff by repeating simple things many times, fast... oh, hang on...
I can only wonder... (Score:2)
I can only wonder... if the Crusoe and Efficeon patents are being licensed from Intellectual Ventures (who ended up owning them), or if we are going to see another East Texas lawsuit over this.
Not that great (Score:2)
There is a reason the idea fizzled: if you have very specialized code, it may be able to compete speed-wise; otherwise it will be slower. And since compilers optimize better these days, it would be even worse today. The "low power" claim is a red herring: if you want that (at slow speed), compile to ARM code, not to x86.
My guess is somebody is looking for funding from clueless people.
Re: (Score:2)
Or maybe Netflix. It has the makings of a great zombie movie - with Orson Welles as RMS, Vincent Price as Theo, Peter Cushing and Christopher Lee as Kernighan and Ritchie, Lon Chaney as SCO, and a special appearance by Boris Karloff as Bill Gates.
I am willing to write the screen play (in K&R C) for a considerable fee. Someone else will have to do the CGI.
Transmeta? (Score:4, Informative)
This sounds like Transmeta. Remember that, Slashdot old-timers? The company had trouble and was eventually bought by private equity. I'm too lazy to find out if this is a re-emergence by the rights holders, or if they're going to get sued by the guys who bought Transmeta's IP. IIRC, it was an Israeli company that took it off the US exchange. After that I lost track of it.