Ars Technica Gets Into Crusoe 210
redmist writes "Ars Technica has a great, in depth article about the new Crusoe chips. Enjoy." This one will answer most of the questions I've heard about Crusoe's guts, and how it differs from other microprocessors. "Must" reading for all hardware junkies!
Re:hi (Score:1)
damn i'm picky.
Why turn it off (Score:1)
Re:What I'd really like to hear about... (Score:1)
One possible problem is that the chip only has a finite number of registers (64, IIRC). So if you are emulating a chip with 40 registers, simultaneously emulating a second chip that needs more than 24 registers could cause problems. You'd probably store the extra registers in memory, which would slow the performance, but it wouldn't be any worse than a software-based emulator like vmware.
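To make the spill idea concrete, here's a toy Python sketch of an emulated register file where the guest needs more registers than the host has spare. All the names and the 40-on-24 numbers just follow the scenario above; this is an illustration, not how Transmeta actually does it:

```python
HOST_REGS = 24          # host registers left over for the second guest (assumed)

class GuestRegisterFile:
    def __init__(self, n_guest_regs):
        self.fast = {}       # guest regs mapped directly onto host registers
        self.spilled = {}    # guest regs that have to live in memory
        for r in range(n_guest_regs):
            (self.fast if r < HOST_REGS else self.spilled)[r] = 0

    def read(self, r):
        # A spilled read models the extra memory access described above.
        return self.fast[r] if r in self.fast else self.spilled[r]

    def write(self, r, value):
        if r in self.fast:
            self.fast[r] = value
        else:
            self.spilled[r] = value

regs = GuestRegisterFile(40)   # a 40-register guest on 24 spare host regs
regs.write(3, 7)               # lands in a host register
regs.write(30, 9)              # lands in the in-memory spill table
```

Every access to the spill table is a memory round trip, which is exactly the slowdown mentioned above — but no worse than what a pure software emulator pays for every register.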
Give me a damn gcc or I don't want it (Score:1)
So even if Transmeta has no native C compiler, they still have a complete bootable operating system we can read.
And what C compiler does Transmeta use for Mobile Linux? Did they somehow remove the zillion lines of gcc'isms from the kernel code? Or is their compiler a derivative of gcc?
Meanwhile, I say: fuck the compatibility argument. I'm a big boy. If you tell me that my native VLIW binaries will crash and burn on the next model over, I can handle that. I'll recompile the program when I switch machines, but I want a native gcc, or the chip is not worth programming for.
Re:explain "cool" (Score:1)
SMP Transmeta benefits? (Score:1)
just random thoughts...
Re:Sweet...or sour??? (Score:1)
I think the snake oil is pretty obvious if you look at the benchmarks Transmeta has published. They show some "relative" time to complete typical Windows tasks vs. an Intel CPU, and the Crusoe is losing - though not by much. We don't get any "standard" benchmarks like SPEC or Dhrystones or MIPS or MFLOPS because if they ran those, Crusoe's lack of processing power would just be all the more apparent. (Though it might come close if they ran those benchmarks as compiled native code as opposed to emulated x86.)
The reason they can get away with this, of course, is that you don't need a Pentium III 600 to run typical "Office" like apps - most of the CPU power on a chip like the PIII just gets burned up in system idle cycles anyway. Now, certainly the fact that Crusoe is low power is promising - a lot of people need a laptop that can run for 10 hours and they don't necessarily need to run Q3A full bore. It's also pretty cool that they put the "north bridge" and the memory controller on the same chip as the CPU - that's a really good idea, especially for the mobile market they are targeting. But this all doesn't excite me that much - does anybody remember the DEC StrongARM RISC? Another example of a chip that provides reasonably good performance from less than one watt of power - though it did not provide any kind of x86 compatibility.
Now obviously the Ars article points out that these aren't the ONLY CPUs Transmeta will produce. In the future they may build high performance workstation or server class chips. For now, I guess all the performance junkies can go back to drooling over the Alpha.
Just my 0.02
Re:You aren't SOPOSED to code in it's native set (Score:1)
Re:What I'd like to see... (Score:1)
Performance of the MPC8240 is in the range of 375 dhrystone MIPS at 266MHz. Would be nice if we had a similar benchmark for the Crusoe, yes? As another benchmark, the StrongARM SA1100 comes in at about 250 dhrystone MIPS at 220MHz - so similar performance. The StrongARM, of course, consumes less power (under 1 watt) than the MPC8240, but the 1100 does not have the built in PCI bridge.
Of course, then you can get into the "higher power" CPUs like the PowerPC G4 - it sits at 825 dhrystone MIPS at 450 MHz. Or, if you get into the SIMD vector processor, a billion floating point ops/second. That's pretty fast, though the chip consumes about 5 watts. Things like the Intel PIII and AMD Athlon provide about the same compute power as a G4, but consume MUCH more power - something in the range of 30 watts for these beasts. If you're going to consume that much power, you might as well get yourself an Alpha, which will give you double the performance of the Athlon on the same electrical budget. (You can't run x86 code native on an Alpha, but who gives a F* if you can get twice the performance for the same electrical budget?) Clearly a 30 watt CPU is well outside the notebook computer range. Obviously that's what "slow" low power chips like the Crusoe are for.
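Running the performance-per-watt arithmetic on the figures quoted above (these are the post's own numbers, not measurements of mine) makes the point plainly:

```python
# Dhrystone MIPS per watt, using the figures from the post above
def dmips_per_watt(dmips, watts):
    return dmips / watts

g4    = dmips_per_watt(825, 5)        # PowerPC G4: 825 DMIPS at ~5 W
x86   = dmips_per_watt(825, 30)       # PIII/Athlon: ~G4 compute at ~30 W
alpha = dmips_per_watt(2 * 825, 30)   # Alpha: double the Athlon, same budget
```

The G4 ends up around 165 DMIPS/W against roughly 27.5 for the 30 W x86 chips — which is the gap a sub-one-watt part like Crusoe or the StrongARM is playing in.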
Re:x86 only (mostly??) (Score:1)
The only stick Apple has over you is the Mac ROMs - you need these to boot MacOS, and Apple isn't handing out any licenses for these. So, if all you care about is running PPC code, no problem! On the other hand, if it's running MacOS you're after - you'll have to deal with Apple.
I guess that was the long way of saying - Transmeta is free to make Crusoe run PowerPC instructions if they like. Apple has no say in the matter. In fact, Apple might even buy chips from Transmeta if they were to run PPC code and provide a better price/performance than "real" PPC chips.
explain "cool" (Score:1)
I also note that Hotspot was heavily hyped and hasn't quite lived up to being the world-changing technology that it was supposed to be. I guess adaptive recompiling is harder than we thought...
Finally, VLIW *can* be damn fast. But what happens if you encounter a bunch of move instructions in a row, or a bunch of integer instructions, or whatever? Then only one of the four possible slots will be filled per clock cycle, while the other three instruction units sit around twiddling their thumbs, no?
IMO, we already have a portable low level language. It's called C! I also suspect that any reasonable C compiler will out-optimize a JIT/Code Morpher/whatever just about any day of the week.
Hey - if I'm wrong, somebody please educate me! It sucks being ignorant!
Re:Slightly Off Topic (Score:1)
One thing that we may find, however, is that a certain architecture is emulated better than x86 (i.e. the PowerPC, ARM, or Alpha architecture may be easier to translate into native VLIW). Therefore it may be a better idea to run Linux over PPC/ARM/Alpha code-morphing software on a Transmeta chip (or maybe just a specific type of Transmeta chip works better, etc., etc.)
Boy, this gets confusing after a while.
On a somewhat different topic:
I kind of wonder if IBM is actually getting some technology from Transmeta. They moved the AS/400 from 32-bit to 64-bit (CPUs) a few years back and had to make sure the new systems were able to execute old code (actually, I understand that AS/400 machine code is abstracted from the object code of programs, though probably not in quite the same way as how Transmeta did things - if that makes any sense at all..)
--
Ski-U-Mah!
Re:This is so cool... (Score:1)
--
Ski-U-Mah!
Re:Crusoe-VLIW native code (Score:1)
Heck, a Pentium III can run 8080/8086 code (maybe even 8008 code or 4004 code!)
Since the morphing code is running in Flash ROM, it can be upgraded, but if someone tried to load a morpher that doesn't work they're gonna have trouble reverting back to x86.
Heh, the thing I think is cool is that you could start off buying a chip this year, and if a new technology (Like SIMD or 3DNow!) comes out, you can just go to Transmeta's web site or whatever, download the new instructions, and go run a program that uses the new instructions! (Well, presuming that Transmeta will support older chips and whatnot -- that could be a problem with having different instruction sets for each chip. How long do you support an instruction set?)
--
Ski-U-Mah!
Re:Why I'm Disappointed in Crusoe (Score:1)
Also, much the same thing has happened in the Windows world. Many apps have 16-bit code under the hood, complicating Microsoft's transition from Windows 9x (a 16/32-bit OS) to the coming NT derivatives (fully 32-bit). This is also one reason why the WINE project can't run certain programs.
--
Ski-U-Mah!
Re:You aren't SOPOSED to code in it's native set (Score:1)
Real Speed (Score:1)
Re:Slightly Off Topic (Score:1)
And to beat a dead horse, the code morpher also optimizes. This is extremely important to the performance of Crusoe. It can actually run programs faster than if they were compiled natively, due to the run-time information available to the optimizer.
--
Re:Slightly Off Topic (Score:1)
What Transmeta has essentially done is take the Merced core and execute the compiler at run-time. The alias handling structure acts like the ALAT on Merced.
Code executed through the translation layer should perform better than code executing on the bare metal because the translation software is learning and optimizing.
Think of it this way: would you rather manage your stock portfolio as is done today, by guessing what might happen, or would you rather know what the market is going to do and trade your stocks accordingly. I guarantee that I can beat your statically predictive management every time if I have that additional context.
--
Re:Slightly Off Topic (Score:1)
The translation software provides backward compatibility, yes, but it also provides flexibility for Transmeta.
What if Transmeta designs the TM-ISA? It's a virtual machine designed to translate efficiently to the bare hardware. Now compilers can take advantage of the additional registers provided by TM-ISA. If a new core provides more physical registers, TM-ISA v.2 can be released, allowing the use of more registers by the compiler.
That's all well and good, but we get the additional benefit that old programs run on the new hardware just fine, and there's no additional hardware cruft to maintain compatibility.
Ok, that's pretty cool. Backward compatibility is important. But what's really neat is that Crusoe provides forward compatibility. Code written to TM-ISA v.2 will run just fine on processors released with TM-ISA v.1 as long as new firmware is loaded that can understand TM-ISA v.2. So now software houses can release code optimized for the latest and greatest without worrying about users behind the curve not being able to run their stuff.
How often do people moan about RedHat not providing Pentium-optimized packages? With Crusoe, RedHat can silence the critics without impacting us 486 users.
--
Re:You aren't SOPOSED to code in it's native set (Score:1)
Note that there is no reason Crusoe couldn't support a staging compiler. Transmeta could always release a virtual ISA that had support for doing this efficiently. And of course you could always write a dynamic compiler in x86 (ugh). The point is that Transmeta could directly provide support for something akin to DyC in a later processor. And still maintain both backward and forward compatibility.
Pretty neat trick, I'd say.
--
Re:You aren't SOPOSED to code in it's native set (Score:1)
--
Re:Yet another /. rant... (Score:1)
Re: native instruction set (Score:1)
I rather liked the idea that one poster suggested: rather than writing to the native instruction set, invent a new intermediate instruction set that is optimized towards making a better-performing code-morphing layer. It's a very interesting suggestion.
I also wanted to say that I'm surprised that more folks aren't really excited to read the insightful analysis at the end of the article where they gave a convincing argument for future transmeta chips that are not limited to the low-power mobile market. It had me salivating.
Re:You aren't SOPOSED to code in it's native set (Score:1)
Re:Slightly Off Topic (Score:1)
The point behind the Crusoe is not, not not NOT, to just be a better faster chip that optimizes better and consumes less power than those on the market now (though it is.)
The Crusoe's selling point is compatibility. Transmeta can churn out all sorts of chips, some optimized to sip current from batteries at a tenth of the rate of today's monsters, some designed to guzzle power even more and be speed demons. They can make radical changes to the basic design of the chip while doing this, and it won't matter, because though the way things are done internally may go topsy-turvy, the instruction set won't change, and the same programs can be run on each.
This neatly solves the drag placed on development by the need for backwards-compatibility (Want to run DOS 3.3 on your Athlon? You can if you feel like it.) Just like Windows, x86 chips have accumulated baggage - the sediment of silicon long since passed into figurative dust.
Transmeta has designed a beautiful thing - a chip that transcends backwards-compatibility. Writing to the bare metal on the Crusoe bolts it down, turns it into just another fixed-in-place bit-smashing engine. Kills it, in other words, removes what makes it an elegant hack.
Don't do it. Please.
The one true reason... (Score:1)
Re:Transmeta not impressive (Score:1)
OK, Transmeta have proven that they are pretty damn good at keeping secrets, so I would take the info obtained from that Usenet thread with a decent sized grain of salt (as opposed to most other Usenet "wisdom").
Finally, some clueful reporting and analysis. (Score:1)
Direct won't do much. But ... (Score:1)
Here is what needs to be done instead. Design an instruction set specific to the application that you are writing. Our current CPUs handle very broad tasks and try to be good at everything, and when they can't, things like MMX, 3DNow and whatnot start to show up in the CPU.
So, if you know the box you are setting up is going to be a web server, design an instruction set that a web server would fly on. If you play games, design an instruction set that looks like 3DNow on steroids.
Re:Some Question about Crusoe (Score:1)
I suspect that the translation units are based on so called 'basic blocks' which can most easily be described as anything in between a target label and a branch (i.e. entry and exit points in your code). This would allow optimisation of loop bodies.
This can be extended by going to 'super blocks' (multiple basic blocks), allowing sophisticated things like loop unrolling, software pipelining etc.
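The basic-block splitting described above can be sketched in a few lines of Python. The instruction format here is invented purely for illustration — the point is just that a label starts a new block and a branch ends one:

```python
def split_basic_blocks(instructions, branch_ops=frozenset({"jmp", "jz", "jnz"})):
    """Split a linear instruction list into basic blocks: a block begins
    at a branch target (label) and ends at a branch."""
    blocks, current = [], []
    for op, *args in instructions:
        if op == "label" and current:   # a label is an entry point: new block
            blocks.append(current)
            current = []
        current.append((op, *args))
        if op in branch_ops:            # a branch is an exit point: end block
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

# A toy counting loop: one block before the loop, the loop body, the tail
prog = [("mov", "ax", 0), ("label", "loop"), ("add", "ax", 1),
        ("cmp", "ax", 10), ("jnz", "loop"), ("ret",)]
blocks = split_basic_blocks(prog)
```

The loop body (the middle block) is exactly the unit you'd hand to the optimizer, which is why per-block translation makes hot loops cheap after the first pass.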
What I'm actually interested in is how the translation cache is being accessed. In a later post somebody states that the translation cache is maintained in main memory (therefore benefiting from the regular data cache). I'm not sure I understand how it is possible to do efficient cache lookups in this way. I assume they use hashing methods to map x86 memory pages to 'translation cache lines', but this has a much higher overhead than hardware based cache lookups.
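The hashed lookup speculated about above would amount to something like this sketch: a hash table keyed on the guest (x86) code address, translating on a miss and reusing the cached native block on every hit. All the names here are illustrative, not Transmeta's:

```python
class TranslationCache:
    def __init__(self, translate_fn):
        self.cache = {}              # guest address -> translated native block
        self.translate = translate_fn
        self.misses = 0

    def lookup(self, guest_addr):
        block = self.cache.get(guest_addr)   # hashed lookup in software
        if block is None:                    # miss: pay translation cost once
            self.misses += 1
            block = self.translate(guest_addr)
            self.cache[guest_addr] = block
        return block

tc = TranslationCache(lambda addr: f"native-code@{addr:#x}")
tc.lookup(0x1000)            # first hit of a loop body: translate it
for _ in range(100):         # the next 100 iterations reuse the cached copy
    tc.lookup(0x1000)
```

One miss amortized over hundreds of iterations is also why, as the next paragraph argues, losing the cache on power-off is not the disaster some posters fear.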
I'm also a bit surprised by people being worried about losing the cached translations when powering off a system. People, we're talking here about loops that are being executed 100s if not 1000s of times. Having to do the translation again for the first few iterations is not going to be the big performance loss they seem to think it is!
Re:computer architecture by hennesy and whatsisfac (Score:1)
Re:What I'd really like to hear about... (Score:1)
I would guess not, since there is only a single TLB, configured at boot time. Unless you wanted to flush it every time you changed instruction sets (!)
Re:Crusoe-VLIW native code (Score:1)
IMHO, people who aren't comfortable "rooting around inside their computers" probably won't be writing their own code morphers. This isn't script kiddie stuff...
Roadmap: high-end TM chips (Score:1)
I'd love to see these happen in the next five years:
- Code Morpher for Alpha, PPC,
- Code Morpher to recognize the instruction set of a binary
- "optimization practically finalized for this piece of code" bit
- a TM CPU bus for several chips to share the same translation cache
(how necessary is this actually?)
- communication interface for operating systems
- ability to save final VLIW version of code beside the original binaries
Those would in essence offer the ability to turn a system eventually to VLIW binaries without actually putting any effort to it.
Once TM has covered its development investment:
- Open Source the Code Morpher
-> worldwide development of support for
- any chips
- integration with high-level compilers.
"No stop signs! No speed limits!" - AC/DC: Highway to Hell
Re:You aren't SOPOSED to code in it's native set (Score:1)
Yes, but a good compiler will generate fully optimal code to begin with. A compiler that targets the Transmeta core Instruction Set should give you better code than the two level translation scheme.
But that's neither here nor there. Transmeta will not want people to code to the native Instruction Set because it will undermine their flexibility with the underlying hardware. Right now, the major benefit of the two level translation scheme is that the hardware architecture can be updated and improved while presenting the same programming model to application developers. This will allow Transmeta to aggressively experiment with the hardware architecture while maintaining software compatibility. This is very very cool!
Re:Is "mobile linux" GPLed? (Score:1)
Re:Crusoe-VLIW native code (Score:1)
Re:Slightly Off Topic (Score:1)
Pretty neat, but I haven't seen any real mention of emulating any architectures other than x86.
Re:What I'd really like to hear about... (Score:1)
--
Future quaking power? (Score:1)
The telling quote is at the end of the article though:
I'd say that it's only a matter of time before we hear an announcement of another product line from Transmeta. It won't be named Crusoe, because it won't be aimed at the mobile and embedded markets. It'll be a workstation and server class x86 CPU that runs Linux like a fiend, and it'll compete directly with Intel's IA-64. I can't wait.
It does make me wonder, though, whether such a chip (slightly altered) would actually end up being superior for Quake. Given that the translating software can identify which parts of the cache are used most often, it becomes better at branch prediction, and this could translate into faster gaming... I think... Contrary to this thought, though, is the fact that the Celeron is a good gaming processor with 128K cache... We shall see.
Along similar lines, if the x86 instructions are software, how much of the x86 instruction set does Quake use? Would the flexible software end up speeding up Quake by getting the x86 instructions out of the way?
Re:Is "mobile linux" GPLed? (Score:1)
Re:x86 only (mostly??) (Score:1)
Re:Slightly Off Topic (Score:1)
Not really. Transmeta could then just write a code-morphing layer to "morph" the ISA you coded to into the new one. No?
Re:Crusoe core instruction set? (Score:1)
Well, someone will still have to suffer the incompatibility mess: Those who write morphing sw for various cpu's. This will surely be more than just Transmeta, if the concept takes off.
The x86 instruction set isn't necessarily the best for this chip. Someone could make up a different one (perhaps something that uses 32 registers or so), make a compiler for it, and have better performance than x86 code on the same chip.
This would have to be rewritten for another chip, but rewriting the instruction emulator is a lot less effort than recompiling the os and all apps. Still, someone must do it.
Re:Slightly Off Topic (Score:1)
Brilliant! The Meta Morphing Power Processors! Why stop there. Why not have transmeta write code morphing software that emulates their native instruction set and on top of that run code morphing software that emulates their native instruction set and on top of that run code morphing software that emulates their native instruction set and on top of that run...
IT'S TURTLES ALL THE WAY DOWN!!!
--Shoeboy
RE: Beowulfs --> notes from the past (Score:1)
way Way WAY back in micro-processor terms (1984-1985), I developed a white paper that attempted to extrapolate where PC's would develop by Y2K. (I'll put it up on my website if I can find which 5-1/4" floppy I saved it on, and re-hook a 5-1/4" drive to my PC).
Hopefully it doesn't seem self-congratulatory (because a number of my other conclusions stunk) or redundant to this thread to mention that three or four of the paper's conclusions fit the idea of developing a Crusoe type "beowulf in a box" exactly:
Just in time for Y2K. Interesting, eh?
Possibilities (Score:1)
While a lot of people are concentrating on how well this will work in small devices the author of this article is excited about the large-scale applications of the chip. I would have to agree. Think about a busy web server that is continuously generating web pages and doing database transactions. The code morphing software can spot that trend and be ready for it.
Should be interesting.
MBrod
Is "mobile linux" GPLed? (Score:1)
Re: code morph cache.. (Score:1)
The time it takes to relearn the optimization is very, very short when compared to power on/off cycles.
Other Applications of Code Morphing? (Score:1)
I'm not a hardware guru so pardon the speculation...
Obviously, the code morphing is focused on x86 right now and, as the article suggests, may be adapted for PPC, Alpha, etc. in the future. Is it feasible that it could also be adapted for specialized processors such as graphics or sound?
I'm imagining an SMP-type of Transmeta box that, when you load Quake, automagically loads code morphing software onto one of the processors to act as the graphics accelerator or, if you're watching a DVD, can act as an MPEG decoder card
Is what I'm suggesting conceivable or am I way off base?
Overclocking (Score:1)
Any guesses as to how long until someone figures out how to patch the Flash ROM to allow overclocking? I give it about 6 months after Crusoe-based systems hit the shelves.
-NooM
Re:Slightly Off Topic (Score:1)
Re:Some Question about Crusoe (Score:1)
SUB CX,AX
sets flags based on result in CX
If things are case sensitive, Cx would be a valid label.
Actually, both ADD and SUB set flags on x86.
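To make the flags point concrete, here's a toy model of a 16-bit x86 SUB setting the zero, sign, and carry flags. (This is a simplification for illustration — real x86 SUB also sets overflow, parity, and auxiliary-carry.)

```python
MASK16 = 0xFFFF

def sub16(dst, src):
    """Model 'SUB dst, src' on 16-bit registers, returning the result
    and a simplified flags dictionary."""
    result = (dst - src) & MASK16
    flags = {
        "ZF": result == 0,            # zero flag: result was zero
        "SF": bool(result & 0x8000),  # sign flag: top bit of the 16-bit result
        "CF": src > dst,              # unsigned borrow out -> carry flag
    }
    return result, flags

# SUB CX, AX with CX=5, AX=5: result is 0, so ZF gets set
result, flags = sub16(5, 5)
```

A subtraction that borrows, like `sub16(3, 5)`, wraps to 0xFFFE and sets both CF and SF — the kind of flag behavior the translation software has to reproduce faithfully for x86 code to run correctly.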
Re:Possibilities (Score:1)
Shouldn't that be bravo ars technica?
Their article on the K7 [arstechnica.com] was great, btw...
Crusoe core instruction set? (Score:2)
Re:What I'd like to see... (Score:2)
Re:What I'd really like to hear about... (Score:2)
Now I suppose Transmeta could design a full O-O-O core, but I don't see the point. If the software does a good job, the additional flexibility they gain to change the underlying machine is worth it.
As far as branches go, yes, you usually can guess a backward branch is going to be taken. But branches are still a huge problem. It's tough to keep a processor core fed. And don't even get me started on multiple branch prediction. The hit rate goes way down. A study was done here that showed processors today (or in the near future) spend about half the time recovering from branch mispredictions. That's a lot of wasted work. While the code morphing software can't do a perfect job, it is somewhat easier to tune the chip. And then think about per-application tuning. Load a different set of rules depending on the program you're running.
Interesting, no? :)
--
I guess you didn't read the article (Score:2)
"Crusoe's Code Morphing software not only keeps track of which blocks of code execute most often and optimizes them accordingly, but it also keeps track of which branches are most often taken and annotates the code accordingly. That way, Crusoe's branch prediction algorithm knows how likely a branch is to be taken, and which branch it should speculatively execute down. If a branch isn't particularly likely to go one way or the other, then Crusoe can speculatively execute down both branches.
Contrast this with speculative execution done on a normal CPU, where hardware limitations like buffer and table sizes limit the amount of information you can store about a particular branch and its execution history. Since Code Morphing keeps track of the branch histories in software, it can record a more finely grained description of the execution patterns of a wider window of code, and therefore assess more accurately whether or not a specific branch is likely to be taken."
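The software branch bookkeeping the quote describes could look something like this sketch: per-branch taken/not-taken counts kept in an ordinary table (no hardware size limit), with the bias deciding whether to predict one way or speculate down both paths. The thresholds and names here are invented for illustration:

```python
class BranchProfile:
    def __init__(self):
        self.history = {}                 # branch address -> [taken, not_taken]

    def record(self, addr, taken):
        t, n = self.history.setdefault(addr, [0, 0])
        self.history[addr] = [t + int(taken), n + int(not taken)]

    def bias(self, addr):
        t, n = self.history.get(addr, [0, 0])
        total = t + n
        return t / total if total else 0.5    # unseen branch: no bias

    def strategy(self, addr):
        b = self.bias(addr)
        if b > 0.75:
            return "predict-taken"
        if b < 0.25:
            return "predict-not-taken"
        return "speculate-both-paths"         # no strong bias either way

bp = BranchProfile()
for taken in [True] * 9 + [False]:   # a loop-closing branch: taken 9 of 10 times
    bp.record(0x400, taken)
```

The point of doing this in software is simply that the table can be as big and as finely grained as you like, where a hardware branch history table cannot.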
Furthermore, since there's a software layer between the ISA of the binary and the machine's native ISA, Transmeta is free to beef up the execution engine (or any other part of the core) however they like, because the only thing that will require a recompile is the Code Morphing software. A case in point is the two chips in its product line. Each has a slightly different core (the Windows chip has special instructions in it that help speed up Windows), but they both are fully x86 compatible. There's nothing to keep them from stuffing new functions and features (SIMD anyone?) into the silicon, to help scale the product as high up as they want to go with it.
I'd say that it's only a matter of time before we hear an announcement of another product line from Transmeta. It won't be named Crusoe, because it won't be aimed at the mobile and embedded markets. It'll be a workstation and server class x86 CPU that runs Linux like a fiend, and it'll compete directly with Intel's IA-64. I can't wait."
--
My opinions may have changed, but not the fact that I am right.
Re:Why turn it off (Score:2)
Not to be an ass, but my Palmpilot doesn't lose its data when I switch batteries. Nor do (new) VCRs lose their programming on power loss. Devices called super capacitors (5V 1F style things) keep enough energy around to keep very low power components up and running in a sleep mode to ride out such interruptions.
Dynamic clock speed adjustment and BogoMIPS? (Score:2)
I'm curious to know how OSes will handle this. For example, we've already had a thread on the linux-kernel list about timing loops being thrown off by this for existing laptops (because the bogomips on which they're based are calculated at boot time). What was the outcome of that thread? Was a solution reached? Will it apply for Crusoe too?
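The failure mode is easy to sketch: a BogoMIPS-style delay loop remembers how many iterations approximate one millisecond at boot, so if the clock later changes, the same iteration count takes a different amount of wall time. Numbers here are invented for illustration:

```python
def calibrate(loops_per_ms_at_boot):
    # BogoMIPS-style: remember how many loop iterations ~ 1 ms at boot time
    return loops_per_ms_at_boot

def actual_delay_ms(requested_ms, calibrated_loops, current_loops_per_ms):
    # Wall-clock time the loop really takes if the clock changed after boot
    iterations = requested_ms * calibrated_loops
    return iterations / current_loops_per_ms

cal = calibrate(1000)                      # calibrated at full clock speed
ok  = actual_delay_ms(10, cal, 1000)       # clock unchanged: 10 ms as intended
bad = actual_delay_ms(10, cal, 500)        # clock halved: the "10 ms" takes 20 ms
```

That doubling is exactly the problem for drivers that rely on calibrated timing loops, and why dynamic clock adjustment needs either recalibration or a clock-independent time source.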
Re:x86 only (mostly??) (Score:2)
Only if Apple (who didn't invent the PowerPC instruction set; it's a derivative of the IBM POWER instruction set) have some form of intellectual-property rights for the instruction set.
If there are any such rights owned by Somerset, Apple might also have some say in licensing it.
"The specs", in the sense of the instruction set specifications for PowerPC, are publicly available, although if the chip+software is intended to look like a particular PowerPC chip (I think the MMUs may differ, e.g. may have software TLB reload on some processors and hardware TLB reload on others), they'd need that spec as well (I think the specs for various PowerPC chips are also publicly available).
If somebody wanted to clone not only some PowerPC CPU but a support chip set for it, so that they could run OSes such as MacOS unmodified, that stuff might have to be changed...
...but that's just cloning a Mac, which Apple isn't allowing even if you use existing PowerPC chips.
Of course, there is the possibility that Apple would want to use a Transmeta chip in a Powerbook, say, in which case the Apple licensing issues go away.
Apple are unlikely to be the ones to block such a Code Morphing(TM)(R)(LSMFT) layer; they don't, as far as I know, have a problem with people building non-Mac-compatible PowerPC machines, and they already have, as far as I know, the tools to block people from building Mac clones.
Re:The short of it all .. (Score:2)
"Apply them all to hardware" in what sense? The binary-to-binary translators for Crusoe chips are software; they just happen to be running on hardware that offers some assistance, but the translation itself isn't done by hardware (and happens at a layer below even the lowest-level OS code; as far as the OS is concerned, all the way down to the lowest level, the chip looks like an x86).
Re:Slightly Off Topic (Score:2)
Correct. Compilers for the AS/400 (and its System/38 predecessor) for the languages in which applications are written generate code for a virtual machine with a very CISCy instruction set; low-level OS code translates that to the native instruction set. (That long antedates Transmeta; as indicated, it dates back to the System/38, which I think came out in the late '70's; IBM needed no technology from Transmeta to do that - binary-to-binary translation is hardly a Transmeta invention.)
It isn't done in exactly the same fashion, in that, on S/38's and AS/400's, the low-level OS code is written in languages that compile (or, for some code, assemble) into the native machine's instruction set, unlike Crusoe, where the only native code that's run is the translation software and the output of the translation software. Also, I don't think the translation on AS/400 is done as dynamically; I think programs are translated in their entirety the first time they're run, and the executable code for the entire program is kept around.
Re:Is "mobile linux" GPLed? (Score:2)
Where is the source? Read the GPL -- they don't have to release the source until they distribute the code. Mobile Linux hasn't been released yet, so they can sit on the source for now. Linus has promised it will be available RSN.
Steven E. Ehrbar
What I'd like to see... (Score:2)
After all, the Crusoe architecture is not a performance demon aimed at desktops/servers, and it is not aimed at the ultra-low power consumption StrongARM market. But it might be suitable for the sorts of applications that embedded PPCs are currently used in...
Steven E. Ehrbar
Crusoe is like database SQL. Why is there SQL?! (Score:2)
As far as I can tell, the Crusoe processor engine itself is not special. If you are a "talented programmer programming to the bare metal", you might as well program in assembly on another pre-existing chip.
And then, as a chip manufacturer, you'll face 20 years of trying to support the vintage instruction set that those bare metal hackers employed.
You're missing the point.
Take database servers. Oracle, MySQL, Informix, Sybase, Uncle Joes Ultimate Data Thingy... Just about all of them allow access to their data through a standard SQL language.
But... But... but... Wouldn't it just be so insanely cool and fast if I could directly access the ISAM structures and indexes and modify disk sectors myself?!?! I fully expect every dedicated DBA and application designer to go to the bare iron to squeeze performance from their data warehouses!
Has that happened? No. Why? Because MOST, EVERY DAY APPLICATION DESIGNERS DON'T "PROGRAM TO THE BARE METAL". It's too complex, intensive, and fruitless a task. Why is Slashdot written in Perl and not assembly? Why isn't Linux 100% x86 assembly?
There is a BIG difference between just a cool hack and maintainable elegance.
Why do we have high level languages? Why do we have abstraction layers? Why?
The Code Morphing is an abstraction layer. Initially, that layer is the x86 instruction set, an arbitrary set of instructions that just happens to currently be widely used. Using Code Morphing, the Crusoe can leapfrog on that wide base of support, while throwing away the hardware architectural garbage traditionally needed to support it.
Back to SQL: Oracle supports SQL for access to data, but beneath, I'll bet you that a lot of the specific operations upon data that those SQL statements fire off have changed ENORMOUSLY over the years. What would have happened had they allowed programmers straight past the abstraction layer? They still would be trying to support that API today, and I bet they wouldn't be as free to rework their server software.
Furthermore, why do we have the DBI module and DBD modules in Perl? To provide a semi-universal abstraction layer across all databases. When one database's API changes for performance reasons, efficiency, whatever, you just change the morphing-- er DBD-- layer to accommodate it.
What is the point of Crusoe then?
Not to provide assembly hackers with a new opcode set to learn and tweak, which 90% of the application design world will never learn or exploit, and therefore will remain voodoo essentially.
The point is to provide an architecture which supports ABSTRACTION LAYERS of assembly opcodes. So Transmeta is free to vary the underlying hardware in any exotic or esoteric form they see fit, throwing backwards compatibility of their VLIW opcodes to the wind because the Code Morphing allows the SAME ABSTRACTION LAYER API to be exposed to the application designer.
Now, finally, note I keep saying 'application designer'. This is as opposed to 'dedicated hacker'.
Read the definition of a hack [tf.hut.fi]. The first two definitions are not my idea of elegance. Something that's quick and does the job but not well. Or, something that is incredibly good, but took a long time.
Now, read the definition of elegant [tf.hut.fi]. Something that combines simplicity, power, and grace. Something that is understandable, almost obvious in its expression. Something maintainable.
Tell me what's more maintainable: Assembly code for the Mx-650938 processor, or Java code. It's a close call, but I'll have to go with the Java code. It's harder to write a hack in Java than it is to create an elegant design in assembly.
It's not about performance. We haven't even BEGUN to wring the performance from the chips we have-- and why? Because it's not humanly possible for every applications designer to be a brilliant assembly hacker, which is why we have compilers!
So, finally, why spend your time learning the latest opcode set when you can just focus on a higher level language and leave the hand tweaking and performance tweaks to the man behind the curtain of the Code Morphing abstraction layer of OZ?!?!?!
Re:Crusoe core instruction set - Mobile Linux? (Score:2)
If they give you access to the underlying architecture, then they are committed to keeping that architecture in future versions. This way they can make up a new ISA for every chip, and just tweak the code morphing layer to make it work.
This gives them a performance hit now, but as Intel is forced to continue to support the x86 architecture in hardware for every new chip, they will have to make their chips ever bigger and ever hotter. Transmeta's approach will likely prove superior in the long run.
Re:You aren't SOPOSED to code in it's native set (Score:2)
Also, it sounds like they are optimising each chip to specifically support code morphing from a specific architecture. That means that x86 *is* a reasonably efficient instruction set for this particular hardware. Yes, you could probably make a faster one, but the gains would be marginal unless you actually got direct access to the underlying ISA, which defeats the whole purpose of this strategy.
Re:What I'd really like to hear about... (Score:2)
It seems a lot of posters are thinking the same thing. But...
You could say the same about a Celeron/P-III/Athlon/Whatever.
"I wonder how much faster my Athlon would go if I could rip out the silicon that does the intruction decoding / reordering / branch prediction / etc and code directly for the execution units."
It probably wouldn't go much faster (I'd guess that silicon does its job pretty well) but by ripping out all those transistors you could significantly reduce power consumption.
In fact, if you think it through for five years or so you'll probably wake up one day and find you've re-invented Crusoe. Of course it'll be old news by then.
I really... (Score:2)
Re:Games and Crusoe (Score:2)
Re:Slightly Off Topic (Score:2)
IF YOU WANT TO CODE DIRECTLY TO A VLIW CORE BUY A &*$#ING MERCED!!!!!!
Sorry about that. You lot just aren't getting it. If you remove the code morphing layer, then you have to put backwards compatibility into the hardware down the road. That means lots o' transistors and high power consumption 2 or 3 years down the road. That also means that compiler complexity goes up dramatically. So you'll wind up having a crippled architecture and low quality compilers 10 years down the road. That's stupid. Additionally, if the compiler is entirely responsible for the optimization, you lose the nifty on-the-fly code tuning based on actual runtime data -- this is the coolest thing about the Crusoe.
--Shoeboy
Re:Why turn it off (Score:2)
Regardless, it will have to be turned off, even if only for battery changes. Also, people may drain the battery after using it for a while, requiring a power down. These points aside, even low-power devices today are turned off (e.g. laptops, Palm Pilots, etc.) since even standby mode drains too much power. The Crusoe systems will probably drain more power than a Palm Pilot, so you'll probably need to turn them off.
Re:What I'd really like to hear about... (Score:2)
My understanding from the articles I have read is that maybe, eventually, but right now it only emulates x86.
branch prediction (Score:2)
Branch prediction now is very stupid: circuits try to guess, in real time, which branch will be taken. Imagine if the C compiler could explain to the branch "predictor" that "this will loop 27 times, then stop looping."
Furthermore, explicit cache requests could be compiled. "I'll stay in this function for a while, but I'm also going to call these functions."
With profile-based optimizations and careful design you might never have a cache miss or a branch misprediction.
I've gotta get me one of these, and play around with alternative opcode sets. This is just the coolest toy for exploring computer architecture.
Re:What I'd really like to hear about... (Score:2)
Don't write to the VLIW, but... (Score:2)
Of course, one problem with this would be getting support for shadow programs built into the OS. I wonder if Transmeta has anyone that could handle this?
Re:You aren't SOPOSED to code in it's native set (Score:2)
The only way "code morphing" could run faster than native code is by exploiting runtime information to perform optimizations that are not possible at compile time. In other words, self-modifying code that runs faster than static code.
This is plausible, but that doesn't mean there would be no performance benefit in compiling native code. Research on self-modifying code is not unique to Crusoe -- it's a very active area of research, and there are two major kinds: JIT and dynamic compilation. JIT, which you're probably all familiar with from Java, involves translating code (typically from a foreign instruction set) and performing optimizations at runtime; dynamic compilation involves "staging" code at compile time to modify itself in a disciplined manner at runtime. JITs and dynamic compilation are very different in the nature of the optimizations they perform; one of the major differences is that because dynamic compilation performs its analysis at compile time, it can theoretically perform much deeper and more sophisticated optimizations.
Crusoe does no staging (it can't: it executes fully precompiled code), so its optimizations operate under severe time constraints. Therefore, Crusoe's code morphing is likely to produce code optimality akin to that emitted by a JIT compilation system: shallower analysis, shallower optimizations. Which almost certainly makes Crusoe's "code morphing" worse than native staged dynamic compilation would be.
In summary: my point is that self-modifying native code that improves its performance at runtime is entirely possible without "code morphing". On the other hand, binary x86 compatibility is arguably Crusoe's major selling point, so there's not much impetus for them to bother encouraging any kind of native code compilation. Anyway, I get the impression that Crusoe's entire architecture would have to be revamped if they wanted to run native code so it's a moot point.
If you're thoroughly confused by now, try visiting the dynamic compilation project [washington.edu] at the University of Washington for more information on dynamic compilation.
~k.lee
(BTW: this does not mean that Crusoe does not embody any technical innovations. In particular, the hardware support the chip provides for its runtime code translation is very interesting.)
You really, really, still don't get it... (Score:2)
You really, really still don't get it, do you? Firstly, Crusoe is the first chip Transmeta has got out the door. It's the simplest possible silicon, with the hard bits done in software. But there's no hard line between what functions can be done in hardware and what can be done in software. It's just that software is cheaper to tune.
When Transmeta have got code-morphing tuned the way they like it there is nothing to stop them releasing a new chip with the code-morphing engine in hardware.
But even if they don't, the limitation on performance computing design is cooling, as Cray amply showed. Crusoe consumes 1/32 the power of your PIII; so, for a given cooling system, you can stick 32 Crusoes in the same box. If each Crusoe gives you 66% of the compute power of the PIII, you've got a box which is going to deliver you more than 21 times the number of polygons your PIII can push.
One thing I haven't yet seen quoted is the part-price for a Crusoe, but if the silicon is as simple as people are suggesting, the part-price could be very low. Small dies have relatively lower reject rates: with one flaw per square inch, nearly every one-inch-square chip has a flaw, whereas only about one in ten 0.3-inch-square chips does.
By contrast your PIII is inherently an expensive part - it isn't expensive because Intel are profiteering, it's actually expensive to make. If Transmeta start shipping Crusoes at (say) around $10 per part in quantity, there isn't any way Intel can compete anywhere along the line.
I currently run two PII/300s in my desktop box. I bought them because two 300MHz parts and a motherboard to accommodate them were, at the time I bought them, a lot cheaper than one 500MHz part. If I can get, say, 8 400MHz Crusoes for the price of one 700 MHz Intel part, I will be quite happy to run them, and so I expect will a lot of other people.
Assuming, of course, that Linux 2.4 will run 8-way parallel on Crusoes, but I'm kind of prepared to bet it will :-)
Re:You aren't SOPOSED to code in it's native set (Score:2)
Was Quake3 running with a hardware accelerator? (Score:2)
The quake3 performance we saw on the ZDTV webcast was pretty damn impressive. Everyone seems to be assuming that they had 3d accelerators in those TM5400 laptops.
You can run quake 3 in software mode under mesa at about 3 frames per second.
But this is transmeta we're talking about, and that was Dave Taylor, the SAME Dave Taylor that once leaked a document onto usenet ranting about the inferiority of hardware graphics accelerators, saying that what he really wanted was a generic parallel processing chip that could do arbitrary transforms.
GEE, a lot like the crusoe chip can do?
(anyone got the link to that usenet posting on deja that dave taylor tried to cancel?)
Isn't it feasible that they have put hooks into their code morphing software that optimises specially for 3d transforms and mesa/opengl?
Especially in the linux version? Where they have all the source code to linux and mesa?
Hmm, what fancy optimisations could those clever brains come up with?
Maybe those transmeta laptops WON'T need 3d accelerator chips?
And it would completely defeat the purpose of a low power laptop to put a big, hot, power-sucking 3d chip in it. So I'm assuming that demo of quake3 they showed WAS running in software mode with some pretty fancy dynamic optimisations going on.
Maybe the reason they didn't make a big deal about this is that it's still a "work in progress" as Linus said about mobile linux so they don't want to hype it yet.
Someone prove me wrong?
Hint on Crusoe Webpad from 1-3-00 (Score:2)
Quote:
Early in the next century, Dean hopes his new concoction, which he says is "in the idea and invention stage," will be ready for the public: a sleek tablet that is magazine-size, inexpensive, programmable, and voice-activated. He expects his unnamed dream pad, which will run on a 24-hour battery, to provide everything a PC does, including streaming audio and video, word processing, and spreadsheets. It will even have a port for old fogies who can't give up their keyboards. And it will wirelessly put the Internet and other information at your fingertips.
End Quote.
Of course the article never mentions Transmeta, but I bet this web pad would be powered by Crusoe. Here's the link [usnews.com] for the article.
The next step? (Score:2)
Another thought I've had is that things just got harder for a company like Intel. It was no easy task for AMD to get big enough where they could afford to be competitive with Intel. But Crusoe-type processors sound like they would be much easier to design and produce...new companies will have a much lower barrier for entry into the competition. Lucky for Transmeta that they have their patents.
numb
This is so cool... (Score:2)
OK, I'm just an applications geek, and know next to nothing about hardware, so this probably sounds pretty stupid. Live with it.
Re:Slightly Off Topic (Score:2)
They won't give a damn about backward compatibility, or what the "next" chip is going to implement - they're not programming for money, they're programming for fun, and they'll program using the VLIW instruction set because they'll think they can do it better than the Code Morpher can (for a particular chip, and a particular set of instructions). When they start playing with a new chip, they'll learn the VLIW instruction set for THAT chip and do it all over again.
BTW, regarding some of the replies:
1. "Transmeta's chips transcend backwards compatibility."
Bull.
Transmeta have to create versions of the Code Morpher to be "backwards compatible" with all of the various instruction sets that they choose to support from the other chip companies, plus any "improvements" to the instruction set that those chip companies make. They will have to create a Code Morpher version to run on each new chip that they develop. (Can you say, front-end/back-end?)
If they did a good job architecturally and made it easy to upgrade the Code Morpher (presumably in FlashROM or something similar), then given the current processor types, it shouldn't be too difficult for them to create new front-ends and back-ends.
As time goes on, like any project, the Code Morpher code base will get more complicated & difficult to maintain. They'll make mistakes encoding the instruction sets, and then have to issue updates to correct it, etc.
2. "Code executed through the translation layer should perform better than code executing on the bare metal because the translation software is learning and optimizing."
By definition, a "perfect programmer" will always be able to do AT LEAST AS WELL as an optimizing compiler (even at run-time!), because he or she can USE THE SAME TRICKS as the optimizing compiler (write code which collects metrics & recreates itself based on those metrics). And because the programmer has application knowledge which the compiler doesn't, he or she will most likely be able to DO BETTER.
Like I said before: for the most part, programmers will use what Transmeta gives them - and for a very small fraction of programmers, in the tiny bits of their code where they want to squeeze out everything they can from the hardware, they're going to try to bang on the metal.
Based on the strong reaction to my reply, I'd say that at least a few people have been programming for a living so long, they've forgotten how much fun it is to "push the envelope" of any given piece of hardware.
Re:Slightly Off Topic (Score:2)
No matter how good the Code Morpher is, a talented programmer programming "to the bare metal" will be able to do better. A geek screaming for performance on their "baby" doesn't give a damn about whether the next processor will change its instruction set - he (or she) is interested in getting the max. performance out of the CURRENT processor - which DOESN'T mean you let somebody else's software get in the way.
As far as on-the-fly code tuning is concerned, no matter how good the "tuner" is, it can only react to changes & build code AFTER it has accumulated some metrics, whereas a programmer who is intimately familiar with his or her problem-space, can prebuild tuned code for handling most of their expected cases.
I fully expect dedicated hackers to do what every programming freak does - use the provided tools most of the time, and where they want total control & performance, to write the VLIW directly (no matter WHAT the people who made the chip say).
Frankly, ignoring all the hype, this is just a RISCier RISC chip - what the original RISC folks were aiming for in the first place, but which has fallen by the wayside as they tried to compete with Intel.
Re:Slightly Off Topic (Score:2)
But......
I have been thinking about this too and I'm wondering if it would be possible/logical to define some VLIW Instruction Set that could be used on all Transmeta chips, but would be faster and more efficient than translating x86. The CMS would still be translating from the "Transmeta Instruction Set" to the chip's native instruction set, so they could keep all the benefits as before.
Whadyall think?
Re: code morphing *has* been seen before (Score:2)
Re:explain "cooL' (Score:2)
As an instruction set, the x86 is pretty bad, however it's easy to code for and easy to optimize for, which are its biggest strengths. As a mid-layer API for this device, it was probably a good choice -- x86 recompiles well on RISCy machines with lots of registers. PowerPC and others probably wouldn't. They have a lot of registers themselves and are more complex (plus the wide variety of x86 clones means that most people will likely shy away from dangerous instructions, whereas since the PPC is VERY standardized, many software packages could rely on subtle bugs in the silicon of the PPC. Believe me, bugs are the hardest part of any hardware architecture to emulate.)
ARDI has some fairly interesting whitepapers [ardi.com] on their implementation of the 68k instruction set on x86. Keep in mind this is MUCH more difficult to do than the reverse. The 68k CPU has 16 registers, whereas x86 only has 8, etc.
The big problem with C is that the same C source file compiled with the same compiler can often produce many wildly different results, and C doesn't solve the problem of hardware access, which almost always needs to be done in a low-level language. This CPU/software will be beneficial to many companies due to the fact they will be able to reuse existing hardware, drivers and software with this. As long as the prices for the CPU get really low eventually, it could really lower the prices for PDA and hand-held computers. (Imagine playing a 3D accelerated game of Quake III on a hand-held machine)
The customary question... (Score:3)
Seriously though. The biggest problems with Beowulfs is space and heat, and imagine low-heat low-space processors wedged in there. Makes me horny.
From the mind of the most famous poster in all of slashdot
Re:What I'd really like to hear about... (Score:3)
That is missing the point, IMHO. One of the reasons the chip kicks ass is because they can change the hardware and you can't tell. Write native VLIW on this pig and you're fucked if they change, just like all the other processors.
... this is coming from a guy who prefers assembly to high-level languages in 98% of cases. I think they really struck on something here, don't fuck it up by asking to write in the "native tongue" of this beast. Well, unless you're writing your own processor.
Slightly Off Topic (Score:3)
Okay. The Crusoe is fully x86 compatible. Great. But how about developing applications for this processor that skip the translation step, and are already written in the processor's native language? Think about a Distributed.net client written SPECIFICALLY for this processor, with no x86 instructions.....
I'm betting that would speed up apps tremendously. Even Linux....ported directly to Crusoe's native instruction set. The problem I see is, the processor is designed to run x86 out of the box. Code would have to be written to change the Flash ROMs on the processor to bypass translation and hit the core directly, or at least do a straight-through delivery. (Why translate VLIW to VLIW?)
(IF YOU DO THIS AND FRY YOUR CRUSOE, I'M NOT LIABLE.)
-- Give him Head? Be a Beacon?
Beowulf (Score:3)
...damn it, now I'm horny.
--Shoeboy
Some Question about Crusoe (Score:3)
I have some concerns about the performance that the Crusoe processors will actually have. The article mentions that translated instructions will be cached and then reused if the CodeMorph software sees them again. However, it seems like the CodeMorph software's state information will not be maintained between runs. If you power off the computer, the software loses the cached information and has to start from scratch again. In addition, the cache's size and location aren't given. Is it a small cache on die, or is it located in system memory? The cache is probably on die for speed reasons, but this would limit its size. This could be a performance hit since the cache is also used as a data cache and instruction cache.
Another question concerns the way the instructions are being cached. For example suppose the following instructions were given
loop_top:
ADD AX, BX
SUB CX, AX
JNZ loop_top
Would the translation for each instruction be cached, or is the sequence cached? The article implies that the sequence is cached since the CodeMorph software can optimize the speed on subsequent passes. However, this seems to limit the benefit gained from caching to relatively tight loops or common sequences of code depending on the cache size.
On a side note, the article implies that the CodeMorph software is light-years beyond anything else. However, some of its highly touted features have appeared in other software before. For example, DEC's FX!32 would initially just translate code, but would also observe the application's behaviour and then optimize the code based on that after the application finished executing. It could do this optimization several times, optimizing more aggressively on each pass. Apple's 680x0 emulator was also based in ROM and would start up initially so that the MacOS could boot. The CodeMorph software has some new features if it really does out-of-order scheduling and optimization on the fly, but that seems like a pretty big hit on performance.
If future server/desktop oriented processors implement large parts of the CodeMorph software in hardware, how will that be any different from AMD's or Intel's processors? They would all be implementing a hardware instruction translation unit, the only real difference being that the Transmeta core is VLIW. Plus the transistor count and power consumption will skyrocket along with that.
Re:Does anyone know... (Score:3)
All of those companies gave me precious device documentation and many of them gave samples as well. I used all of this in school and later in professional life ("we need a good low-power instrumentation amp." "I got a really cool one from AD which has great documentation, let's try it out and we can use them in volume (millions) later" "ok"). Semiconductor companies know the benefits of such behaviour and tend to act accordingly.
Embedded technologies are a very lucrative market that a lot of young people are jumping directly into (myself included). To deny the flow of information on your products would be like tying your own noose. I'm pretty sure Transmeta realises this.
Ask and ye shall receive.
Crusoe-VLIW native code (Score:3)
The primary reason is that they don't want to have to make these chips backwards compatible. Intel has a lot of problems with this - even the newest Pentium III's must support programs written for 386s. Intel has a hard time because it can't change these opcodes, but instead has to add new ones - hence MMX, SIMD instructions, the Katmai extensions (the P3 stuff), etc (and similarly, AMD has added 3dnow! et al).
Transmeta wants the freedom to be able to drastically change newer models of the CPU to keep it running at optimal speed/efficiency. If they wanted to allow us to write Crusoe-native code, then they'd need morphing software that allows newer models to morph old code to its own (modified) native code. In other words, a real pain in the rear and definitely a problem if Crusoe can't run different "morphers" simultaneously (which I suspect it can't).
As for other morphing software to emulate other processors: I wouldn't be surprised if they allowed it to emulate some other chips - like the PPC, so it can run MacOS stuff - but it won't run nearly as well as x86 emulation will. The chip is meant to be able to morph code from many different platforms, but there are a lot of shortcuts to emphasize x86. I think that topic is addressed in the Ars Technica stuff, but basically Crusoe uses a FPU very similar to the x86 one. I think there are some other things for that in hardware, as well as the fact that we know they're dedicating most of their time to creating the x86 morphing software so it will be the most optimized.
I highly doubt that we'll be able to write our own morphers. I think that it's an extremely difficult thing to do, it would require knowledge of the Crusoe instruction set (which, as I said above, they don't want to release), and the morphing software is probably authenticated somehow. Since the morphing code is running in Flash ROM, it can be upgraded, but if someone tried to load a morpher that doesn't work they're gonna have trouble reverting back to x86.
Linus said that "Mobile Linux" is NOT a code fork - it's just the x86 version with a few modifications to make it run better on embedded platforms. Why reinvent the wheel?
Keep in mind that this is all SPECULATION - if anyone here has other information to the contrary, I'd like to hear it =)
What I'd really like to hear about... (Score:4)
There's all kinds of fun I could have with this chip...
I also wonder whether it can multitask between different instruction sets. I guess the task switching overhead would be pretty brutal if there isn't room onchip for multiple instruction sets.
Transmeta not impressive (Score:4)
Efficiency isn't exactly exciting. Unless I am using a Palm Pilot, I really don't care if my PentiumIII or Alpha is sucking 34W and my Nvidia GeForce is sucking another 30. What I care about is performance. How many transactions can I run? How many frames per second am I getting? How many polygons can I push?
Crusoe may be important for the coming ubiquitous computing revolution (if it ever happens), but Transmeta is not the first to go after low power (remember Rise? Remember IDT's WinChip? Don't forget the StrongARM).
I think Crusoe is a nice chip, but the *HYPE* (and I mean hype) caused by deliberate secrecy and press leaks thoroughly destroyed any chance of it being seen as revolutionary in my eyes.
The Code Morphing technology is not revolutionary. Emulators have been doing dynamic instruction set recompilation for years now, DEC did it with FX32, Sun does it with Java JIT's (including HotSpot which does recompilation based on runtime profiles), SmallTalk VM's have been doing it, hell, even one of the Commodore 64 emulators does it if I recall. John Carmack's Quake3 engine even does it. I'm sure there are hundreds of projects in Academia that have been doing it. The only relevant difference is the hardware assist that the Crusoe has.
Chances are, when you hype something too much, it's going to be disappointing. There's a thread on Usenet that claims Transmeta's *ORIGINAL* goal was not low power, but the best performance, but when they couldn't attain it, they "fell back" to a low power selling point. I think it's in comp.arch.
You aren't SOPOSED to code in it's native set (Score:5)
The "code morphing" layer is what makes Crusoe stand apart from the rest. It optimizes on the fly the instuction set it's running on the fly. This means that your aps will run faster and faster as it runs. This layer is what gives the Crusoe it's speed. Coding nativly would be SLOWER then using the morphing layer. You also don't get the benifit of the optimaztion.
Also, the instruction sets are different for each chip. Each set is further optimized for what it's use is going to be. So if you code for one Crusoe chip natively , it doesn't run on the other. This lets Transmeta change the instruction set as needed to. Like if it's faster to do something one way, they can change it and not break compatability with anything. And they can give you the update with a software patch.
So, it doesn't matter if people don't have the instruction set for the native Crusoe processors. They will change alot, and everytime they change you would have to recode every program again. Why bother? Also you don't get to use what the Crusoe processor is all about, it's code morphing layer.
So, PLEASE, stop complaining that you can't code natively for this chip. The code won't go any faster, and as soon as Transmeta changes the set, your programs wouldn't run anyways. So it's a moot point to code navitly for it.