Hardware

Ars Technica Gets Into Crusoe

redmist writes "Ars Technica has a great, in depth article about the new Crusoe chips. Enjoy." This one will answer most of the questions I've heard about Crusoe's guts, and how it differs from other microprocessors. "Must" reading for all hardware junkies!
  • by Anonymous Coward
    Mr T vs. CmdrTaco would be funnier if you had Mr T saying "hella", like all the other, funnier Mr T vs. "whatever" pages, instead of "helluva".

    Damn, I'm picky.
  • by Anonymous Coward
    Given its target of small mobile devices, like webpads and the like, plus its low power consumption and sleep mode, I don't think it's intended to be "turned off".
  • by Anonymous Coward
    Watch the videos (in RealMedia format, 1 [e-media.com] 2 [e-media.com] 3 [e-media.com]). They indicate that it would be possible to multitask multiple instruction sets simultaneously, and they demoed a CPU running Java bytecode. The instruction sets aren't stored on-chip; they are implemented in software.

    One possible problem is that the chip only has a finite number of registers (64, IIRC). So if you are emulating a chip with 40 registers, simultaneously emulating a second chip that needs more than 24 registers could cause problems. You'd probably store the extra registers in memory, which would slow the performance, but it wouldn't be any worse than a software-based emulator like vmware.
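    The register-spilling idea above can be sketched as a toy model (a hypothetical illustration, not Transmeta's actual scheme; all names are made up):

```python
# Toy model of emulating a guest register file larger than the host's.
# Hot guest registers live in host registers; the overflow spills to memory.
HOST_REGS = 64

class GuestRegFile:
    def __init__(self, host_available):
        self.fast = {}       # guest reg -> value, held "in host registers"
        self.spill = {}      # guest reg -> value, spilled to "memory"
        self.host_available = host_available

    def write(self, reg, value):
        if reg in self.fast or len(self.fast) < self.host_available:
            self.fast[reg] = value       # cheap: stays in a host register
        else:
            self.spill[reg] = value      # slow path: a memory store

    def read(self, reg):
        if reg in self.fast:
            return self.fast[reg]
        return self.spill.get(reg, 0)    # slow path: a memory load

# Emulating a 40-register guest leaves only 24 host registers for a
# second guest, which is the constraint the comment describes.
second_guest_budget = HOST_REGS - 40
```

    Spilled registers still work; they just cost a memory access per touch, much like a software emulator's register file.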

  • by Anonymous Coward
    Mobile Linux is a derivative of Linux, and Linus Torvalds is not the only copyright holder of the Linux kernel. As Linus himself has said, he cleverly tied his own hands: so many people have copyrights in the kernel that nobody, not even Linus, can release a non-GPL version now.

    So even if Transmeta has no native C compiler, they still have a complete bootable operating system we can read.

    And what C compiler does Transmeta use for Mobile Linux? Did they somehow remove the zillion lines of gcc'isms from the kernel code? Or is their compiler a derivative of gcc?

    Meanwhile, I say: fuck the compatibility argument. I'm a big boy. If you tell me that my native VLIW binaries will crash and burn on the next model over, I can handle that. I'll recompile the program when I switch machines, but I want a native gcc, or the chip is not worth programming for.

  • by Anonymous Coward
    Explain why Crusoe is ``cool''? My friend, for those who know, no explanation is necessary. For those who don't know, no explanation is possible.
  • by Anonymous Coward
    I was just thinking about some interesting side-effects of a SMP Transmeta computer. Would it be possible to have a central optimized cache for all the processors? Any problems with contention? Could one processor be dedicated entirely to computing optimizations? Any advantages here? I guess I'm comparing this to current SMP server systems. Would it be silly to consider mobile SMP applications?

    just random thoughts...

  • by Anonymous Coward
    Well, if you really read the Ars article, he makes it pretty clear that the Crusoe is NOT a high-performance chip. It is designed to run "typical" applications (like Office) at a similar pace to a full-blown Intel CPU, but with lower power consumption. (It seems to me we have heard this line before - AMD tried to sell us the K5 - or was it the K6? - as "the fastest engine ever designed for Windows apps." The problem was, if you wanted to run Quake instead of Office, the CPU generally sucked ass performance-wise.)

    I think the snake oil is pretty obvious if you look at the benchmarks Transmeta has published. They are showing some "relative" time to complete typical Windows tasks vs. an Intel CPU, and the Crusoe is losing - though not by much. We don't get any "standard" benchmarks like SPEC or Dhrystones or MIPS or MFLOPS, because if they ran those, Crusoe's lack of processing power would just be all the more apparent. (Though it might come close if they ran those benchmarks as compiled native code as opposed to emulated x86.)

    The reason they can get away with this, of course, is that you don't need a Pentium III 600 to run typical "Office" like apps - most of the CPU power on a chip like the PIII just gets burned up in system idle cycles anyway. Now, certainly the fact that Crusoe is low power is promising - a lot of people need a laptop that can run for 10 hours and they don't necessarily need to run Q3A full bore. It's also pretty cool that they put the "north bridge" and the memory controller on the same chip as the CPU - that's a really good idea, especially for the mobile market they are targeting. But this all doesn't excite me that much - does anybody remember the DEC StrongARM RISC? Another example of a chip that provides reasonably good performance from less than one watt of power - though it did not provide any kind of x86 compatibility.

    Now obviously the Ars article points out that these aren't the ONLY CPUs Transmeta will produce. In the future they may build high performance workstation or server class chips. For now, I guess all the performance junkies can go back to drooling over the Alpha.

    Just my 0.02
  • The reason to do the optimization on the fly is that by doing so you gain extra profiling information that is impossible to get at compile time. Dynamic optimization/recompilation allows the processor to improve the execution speed of blocks it executes frequently, and also do things like adjust caching schemes and do better speculative (or predicated, I guess) execution.
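    The profiling payoff described above can be illustrated with a toy dispatcher that interprets a block until it proves hot, then swaps in a cached translation (a sketch with made-up names and thresholds, not the real morpher):

```python
# Toy dynamic-recompilation loop: interpret a block until it crosses a
# hotness threshold, then switch to a cached "optimized translation".
HOT_THRESHOLD = 10

class Morpher:
    def __init__(self):
        self.counts = {}      # block id -> times seen (profiling info)
        self.translated = {}  # block id -> cached optimized version

    def execute(self, block_id, interpret, optimize):
        if block_id in self.translated:
            return self.translated[block_id]()   # fast path: reuse translation
        self.counts[block_id] = self.counts.get(block_id, 0) + 1
        if self.counts[block_id] >= HOT_THRESHOLD:
            # Recompile once hot, using the profile gathered at run time --
            # information a static compiler never sees.
            self.translated[block_id] = optimize()
        return interpret()
```

    The point is the same as the comment's: the execution counts exist only at run time, so the decision of what to optimize can be driven by real behavior.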
  • by Anonymous Coward
    Interesting that you should ask this question. In fact, Motorola makes an embedded version of the 603e called the MPC8240. It has a built in 66MHz PCI "north bridge" and 100MHz SDRAM controller. Sounds a little like the Transmeta chip except that it consumes more power and only runs "native" PPC code. The 8240 is generally used in applications like routers or other network devices.

    Performance of the MPC8240 is in the range of 375 dhrystone MIPS at 266MHz. Would be nice if we had a similar benchmark for the Crusoe, yes? As another benchmark, the StrongARM SA1100 comes in at about 250 dhrystone MIPS at 220MHz - so similar performance. The StrongARM, of course, consumes less power (under 1 watt) than the MPC8240, but the 1100 does not have the built in PCI bridge.

    Of course, then you can get into the "higher power" CPUs like the PowerPC G4 - it sits at 825 dhrystone MIPS at 450 MHz. Or, if you get into the SIMD vector processor, a billion floating point ops/second. That's pretty fast, though the chip consumes about 5 watts. Things like the Intel PIII and AMD Athlon provide about the same compute power as a G4, but consume MUCH more power - something in the range of 30 watts for these beasts. If you're going to consume that much power, you might as well get yourself an Alpha, which will give you double the performance of the Athlon on the same electrical budget. (You can't run x86 code native on an Alpha, but who gives a F* if you can get twice the performance for the same electrical budget?) Clearly a 30 watt CPU is well outside the notebook computer range. Obviously that's what "slow" low power chips like the Crusoe are for ;)
  • by Anonymous Coward
    Ahh - no. They might have to license something from IBM or Motorola, though. IBM "invented" the Power RISC architecture before Apple had anything to do with it. There have been groups in the past that had dreams of building a faster PowerPC than what IBM or Motorola could do - remember Exponential? They obtained a license to build PowerPC chips easily enough; it's just that by the time they could develop their own PowerPC core, Motorola and IBM were cranking out the 604e, which provided about the same compute power as the Exponential chip, but at like 1/10 the power consumption. (The 604e was something like 7 watts where the Exponential chip was estimated at around 60 to 70 watts.) And so Exponential went under - all the while pointing the finger at Apple, when it was their own flawed product and development timeline that was to blame. I mean, OBVIOUSLY if Apple can buy a 604e from IBM or Motorola that provides the same compute power at 1/10 the heat output, they aren't going to be buying many chips from Exponential. Duh! Of course, this is all ancient history now - PowerPC has since moved on to the G3 and G4 chips. And you are free to plug them into any piece of hardware you like.



    The only stick Apple has over you is the Mac ROMs - you need these to boot MacOS, and Apple isn't handing out any licenses for these. So, if all you care about is running PPC code, no problem! On the other hand, if it's running MacOS you're after, you'll have to deal with Apple.



    I guess that was the long way of saying - Transmeta is free to make Crusoe run PowerPC instructions if they like. Apple has no say in the matter. In fact, Apple might even buy chips from Transmeta if they were to run PPC code and provided a better price/performance than "real" PPC chips.

  • by Anonymous Coward
    The low power aspects of Crusoe *are* cool. But I am curious about what makes code-morphing "cool". VLIW is old hat, and code-morphing sounds suspiciously like a JIT: it recompiles x86 into a native VLIW format. As an example, take Sun's Java HotSpot compiler, which adaptively recompiles one machine language format (Java bytecode) into another (x86 or Sparc or whatever). HotSpot *also* does on-the-fly optimization, and it also analyzes a running program for "hot spots" that need to be aggressively optimized.

    I also note that HotSpot was heavily hyped and hasn't quite lived up to being the world-changing technology it was supposed to be. I guess adaptive recompiling is harder than we thought...

    Finally, VLIW *can* be damn fast. But what happens if you encounter a bunch of move instructions in a row, or a bunch of integer instructions, or whatever? Then only one of the four possible slots will be filled per clock cycle, while the other three instruction units sit around twiddling their thumbs, no?

    IMO, we already have a portable low-level language. It's called C! I also suspect that any reasonable C compiler will out-optimize a JIT/Code Morpher/whatever just about any day of the week.

    Hey - if I'm wrong, somebody please educate me! It sucks being ignorant!
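    The slot-utilization worry above can be made concrete with a toy in-order packer (the bundle shape here is a guess for illustration only, not Crusoe's actual molecule format):

```python
# Toy in-order VLIW packer: each 4-wide bundle has fixed slot types, so a
# run of same-type instructions fills only one slot per bundle while the
# other units idle -- exactly the utilization problem the comment raises.
SLOTS = ("alu", "alu", "mem", "branch")   # hypothetical bundle shape

def pack(instrs):
    """instrs: list of unit types ('alu', 'mem', 'branch'). Returns bundles."""
    bundles = []
    i = 0
    while i < len(instrs):
        bundle = []
        free = list(SLOTS)
        # In-order issue: stop filling this bundle at the first instruction
        # that has no free slot of its required type.
        while i < len(instrs) and instrs[i] in free:
            free.remove(instrs[i])
            bundle.append(instrs[i])
            i += 1
        if not bundle:
            raise ValueError("no slot type for " + instrs[i])
        bundles.append(bundle)
    return bundles
```

    Four memory ops in a row take four bundles (one slot used per cycle); a nicely mixed sequence fits in one bundle. Reordering instructions to mix unit types is what the translator's scheduler would try to do.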
  • Rumor has it that they actually ported Linux to run on bare hardware, and it didn't really help enough to make it worth the trouble. Besides, a new version of Linux would likely have to be made for each different Transmeta chip (as the TM3120 and TM5400 have different instruction sets)

    One thing that we may find, however, is that a certain architecture is emulated better than x86 (i.e. the PowerPC, ARM, or Alpha architecture may be easier to translate into native VLIW). Therefore it may be a better idea to run Linux over PPC/ARM/Alpha code-morphing software on a Transmeta chip (or maybe just a specific type of Transmeta chip works better, etc., etc.)

    Boy, this gets confusing after a while.


    On a somewhat different topic:
    I kind of wonder if IBM is actually getting some technology from Transmeta. They moved the AS/400 from 32-bit to 64-bit (CPUs) a few years back and had to make sure the new systems were able to execute old code (actually, I understand that AS/400 machine code is abstracted from the object code of programs, though probably not in quite the same way as how Transmeta did things - if that makes any sense at all..)
    --

    Ski-U-Mah!
  • I understand there was a proof-of-concept demo at the Crusoe unveiling that would switch from x86 to Java bytecodes. I'm not sure if swapping between x86 and Java required a reboot or anything like that -- I wish I could have been at the unveiling so I could have seen that in person (but then I'm probably a terrible reporter ;-)
    --

    Ski-U-Mah!
  • The primary reason is that they don't want to have to make these chips backwards compatible. Intel has a lot of problems with this - even the newest Pentium III's must support programs written for 386s

    Heck, a Pentium III can run 8080/8086 code (maybe even 8008 code or 4004 code!)

    Since the morphing code is running in Flash ROM, it can be upgraded, but if someone tried to load a morpher that doesn't work they're gonna have trouble reverting back to x86.

    Heh, the thing I think is cool is that you could start off buying a chip this year, and if a new technology (Like SIMD or 3DNow!) comes out, you can just go to Transmeta's web site or whatever, download the new instructions, and go run a program that uses the new instructions! (Well, presuming that Transmeta will support older chips and whatnot -- that could be a problem with having different instruction sets for each chip. How long do you support an instruction set?)
    --

    Ski-U-Mah!
  • Unfortunately, I think the x86 compatibility would end up being used as a crutch, like the Windows 3.x compatibility in OS/2. Software developers thought, "Well, if OS/2 runs Windows applications, there's no reason for me to port my app to OS/2."

    Also, much the same thing has happened in the Windows world. Many apps have 16-bit code under the hood, making Microsoft's transition from Windows 9x (a 16/32-bit OS) to the coming NT derivatives (fully 32-bit) difficult. This is also one reason why the WINE project can't run certain programs.
    --

    Ski-U-Mah!
  • I believe the morphing layer is compiled to native code. If native code is the highest-performance way to run the morpher, then that contradicts your claim. (OK, so some minimal component would have to be native to bootstrap the morpher.) Also, I estimate that 50% of the CPU cycles are spent running the morpher, so native code would get an automatic 2x advantage over x86 code.
  • We've heard about a ~650MHz TM chip being comparable to a 500MHz PIII. But the real question is: what fraction of the CPU cycles are spent running the morpher? That is a very interesting question, especially when comparing different morphers. A first guess could be: a PIII @ 500 is ~700 MIPS; the TM gets about 2 x86 instructions/cycle, so ~350 MHz worth of cycles are spent on application code and ~300 MHz on the code morpher. I'm damn impressed with a near or better than 1:1 ratio!
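    Written out, the back-of-envelope guess works like this (all figures are the poster's rough estimates, not measurements):

```python
# Back-of-envelope split of Crusoe cycles between application code and
# the code morpher, using the comment's guessed numbers.
piii_mips = 700           # guessed throughput of a 500 MHz PIII
tm_clock_mhz = 650        # Crusoe clock in the comparison
x86_ins_per_cycle = 2     # guessed translated-x86 throughput

app_mhz = piii_mips / x86_ins_per_cycle      # cycles/sec needed for app code
morpher_mhz = tm_clock_mhz - app_mhz         # cycles/sec left for the morpher
morpher_fraction = morpher_mhz / tm_clock_mhz
```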
  • This still doesn't solve the problem of third-party vendors. Shipping multiple binaries for different classes of CPU core is not economical. There are just too many variables. The support costs are nightmarish. Code morphing can alleviate this.

    And to beat a dead horse, the code morpher also optimizes. This is extremely important to the performance of Crusoe. It can actually run programs faster than if they were compiled natively, due to the run-time information available to the optimizer.

    --

  • You're ignoring the fact that the translator also optimizes. If you look at this post [slashdot.org], you'll see that the code morpher does some neat tricks to get around the aliasing problem. This is something a static compiler can't easily do. Sure, with profiling and re-compilation it might make some intelligent guesses, but isn't it simpler to let the translation software do it for you?

    What Transmeta has essentially done is take the Merced core and execute the compiler at run-time. The alias handling structure acts like the ALAT on Merced.

    Code executed through the translation layer should perform better than code executing on the bare metal because the translation software is learning and optimizing.

    Think of it this way: would you rather manage your stock portfolio as is done today, by guessing what might happen, or would you rather know what the market is going to do and trade your stocks accordingly. I guarantee that I can beat your statically predictive management every time if I have that additional context.

    --

  • No, it's not a closed source problem. It might be a binaries problem. I don't know about you, but I wouldn't want to statically compile an ActiveX object every time I view a web page.

    The translation software provides backward compatibility, yes, but it also provides flexibility for Transmeta.

    What if Transmeta designs the TM-ISA? It's a virtual machine designed to translate efficiently to the bare hardware. Now compilers can take advantage of the additional registers provided by TM-ISA. If a new core provides more physical registers, TM-ISA v.2 can be released, allowing the use of more registers by the compiler.

    That's all well and good, but we get the additional benefit that old programs run on the new hardware just fine, and there's no additional hardware cruft to maintain compatibility.

    Ok, that's pretty cool. Backward compatibility is important. But what's really neat is that Crusoe provides forward compatibility. Code written to TM-ISA v.2 will run just fine on processors released with TM-ISA v.1 as long as new firmware is loaded that can understand TM-ISA v.2. So now software houses can release code optimized for the latest and greatest without worrying about users behind the curve not being able to run their stuff.

    How often do people moan about RedHat not providing Pentium-optimized packages? With Crusoe, RedHat can silence the critics without impacting us 486 users.

    --

  • Moderators, knock this one up! Great explanation of JIT vs. Dynamic Compilation/Specialization!

    Note that there is no reason Crusoe couldn't support a staging compiler. Transmeta could always release a virtual ISA that had support for doing this efficiently. And of course you could always write a dynamic compiler in x86 (ugh). The point is that Transmeta could directly provide support for something akin to DyC in a later processor. And still maintain both backward and forward compatibility.

    Pretty neat trick, I'd say.

    --

  • No, a static compiler will never produce optimal code. It can't for several reasons:
    • Separate compilation. Without being able to look at the whole program at once (including BTW, system libraries and kernel code), the compiler can't fully know the aliasing conditions present in the program. It can make some guesses, but a function call to an external module will pretty much kill the optimizer (though you can do things with locals and such).
    • Lack of context. A static compiler has no idea what will happen at run-time. Can the compiler eliminate a load after a store? Not if the store potentially writes to the same address from which the load reads. But there may be times when the load is not dependent on the store. Theoretically, the code morpher can take advantage of this. With profiling, a static compiler can gain some run-time context. But you're relying on the assumption that the profile runs are representative of the way the program will be used for all time. And then you get into that hazy, ugly area known as dynamic memory allocation...
    • ISA limitations. The compiler is restricted to the idioms provided by the machine's ISA. On the x86, performance is absolutely killed by the lack of general-purpose registers and the non-orthogonality of the instruction set. With the code morpher, Transmeta could theoretically release a "clean" ISA that is a nice compiler target. And improve it as the experience builds up.
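    The load-after-store case in the second bullet can be sketched like this (a toy model of the speculate-then-check idea, not the actual hardware mechanism):

```python
# Toy run-time alias check: an earlier load's value is kept "in a register"
# across an intervening store. A static compiler must assume the store may
# alias and re-load; a run-time translator can guard the speculation with
# a cheap address compare instead.
def load_store_load(mem, store_addr, store_val, load_addr):
    loaded = mem[load_addr]       # earlier load; value now "in a register"
    mem[store_addr] = store_val   # intervening store
    if store_addr == load_addr:
        loaded = mem[load_addr]   # rare slow path: the addresses collided
    return loaded                 # common path: the reload was eliminated
```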

    --

  • Because not all /. readers read Ars. Also, because posting the article on /. allows /. readers to discuss it.
  • I agree completely, but wanted to offer a reason why folks might be willing to live with the lack of forward-compatibility: perhaps the source for the software that they run is freely available and they don't mind recompiling. Just a thought.

    I rather liked the idea that one poster suggested: rather than writing to the native instruction set, invent a new intermediate instruction set that is optimized towards making a better-performing code-morphing layer. It's a very interesting suggestion.

    I also wanted to say that I'm surprised that more folks aren't really excited to read the insightful analysis at the end of the article where they gave a convincing argument for future transmeta chips that are not limited to the low-power mobile market. It had me salivating.

  • The article contains some good reasons for not doing it ahead of time in the compiler: with the code-morphing layer, you can keep real statistics on which blocks of code are actually used frequently, and whether or not a branch is likely to be taken -- under the actual conditions that the software is running. I know of no compiler that optimizes by running code with real data. Can it really be done? It just sounds like something best done dynamically to me.
  • Do you want to kill the Crusoe? Because that's what your thirty-year-old delusions of assembly-code grandeur will do to it.

    The point behind the Crusoe is not, not not NOT, to just be a better faster chip that optimizes better and consumes less power than those on the market now (though it is.)

    The Crusoe's selling point is compatibility. Transmeta can churn out all sorts of chips, some optimized to sip current from batteries at a tenth of the rate of today's monsters, some designed to guzzle power even more and be speed demons. They can make radical changes to the basic design of the chip while doing this, and it won't matter, because though the way things are done internally may go topsy-turvy, the instruction set won't change, and the same programs can be run on each.

    This neatly solves the drag placed on development by the need for backwards-compatibility (Want to run DOS 3.3 on your Athlon? You can if you feel like it.) Just like Windows, x86 chips have accumulated baggage - the sediment of silicon long since passed into figurative dust.

    Transmeta has designed a beautiful thing - a chip that transcends backwards-compatibility. Writing to the bare metal on the Crusoe bolts it down, turns it into just another fixed-in-place bit-smashing engine. Kills it, in other words, removes what makes it an elegant hack.

    Don't do it. Please.

  • Hrm, perhaps because there are folks such as I, who not only do not regularly read Ars Technica, but also aren't whiny bastards such as you?
  • There's a thread on Usenet that claims Transmeta's *ORIGINAL* goal was not low power, but the best performance, but when they couldn't attain it, they "fell back" to a low power selling point.


    OK, Transmeta have proven that they are pretty damn good at keeping secrets, so I would take the info obtained from that Usenet thread with a decent sized grain of salt (as opposed to most other Usenet "wisdom" :). However, they may have been aiming for peak performance, and discovered the low power aspect by accident, then concentrated on that. I don't know, you don't know, and Transmeta aren't about to tell either of us, are they?



  • After hearing media reports that varied from referring to Linus as a "key executive" within Transmeta (he's not a corporate executive, which should be obvious to anyone who's viewed the web site or bothered to read the press package distributed at the launch), to describing Crusoe as "Internet-powered" and then asserting it draws its electrical power from the Internet itself, it's nice to see that someone with a clue actually sat down, read, analyzed, and reported on the technology that we introduced yesterday.
    First thing is that you won't have many useful instructions to do what you need with the simple native instruction set that Crusoe provides. So you would need to be creative and optimize your code very well to get the speed you are looking for. Remember, your code-optimizing abilities are competing with a very advanced code-morphing technology. And the few extra clock cycles you'd get out of going native are not worth the programming effort.

    Here is what needs to be done instead. Design an instruction set specific to the application that you are writing. Our current CPUs handle very broad tasks and try to be good at everything, and when they can't, things like MMX, 3DNow! and whatnot start to show up in the CPU.

    So, if you know the box you are setting up is going to be a web server, design an instruction set that a web server would fly on. If you play games, design an instruction set that looks like 3DNow! on steroids.

  • Would the translation for each instruction be cached, or is the sequence cached? The article implies that the sequence is cached since the CodeMorph software can optimize the speed on subsequent passes. However, this seems to limit the benefit gained from caching to relatively tight loops or common sequences of code depending on the cache size.

    I suspect that the translation units are based on so called 'basic blocks' which can most easily be described as anything in between a target label and a branch (i.e. entry and exit points in your code). This would allow optimisation of loop bodies.

    This can be extended by going to 'super blocks' (multiple basic blocks), allowing sophisticated things like loop unrolling, software pipelining, etc.

    What I'm actually interested in is how the translation cache is being accessed. In a later post somebody states that the translation cache is maintained in main memory (therefore benefitting from the regular data cache). I'm not sure I understand how it is possible to do efficient cache lookups this way. I assume they use hashing methods to map x86 memory pages to 'translation cache lines', but this has a much higher overhead than hardware-based cache lookups.

    I am also a bit surprised by people being worried about losing the cached translations when powering off a system. People, we're talking here about loops that are executed 100s if not 1000s of times. Having to do the translation again for the first few iterations is not going to be the big performance loss they seem to think it is!
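    The basic-block and translation-cache ideas above can be sketched as follows (a toy illustration; the real software surely differs):

```python
# Toy basic-block splitter: a block ends at a branch and begins at a
# branch target (entry and exit points), as the comment describes.
def split_blocks(instrs, targets):
    """instrs: list of (addr, op, is_branch); targets: set of branch-target addrs."""
    blocks, current = [], []
    for addr, op, is_branch in instrs:
        if addr in targets and current:
            blocks.append(current)       # a branch target starts a new block
            current = []
        current.append((addr, op))
        if is_branch:
            blocks.append(current)       # a branch ends the current block
            current = []
    if current:
        blocks.append(current)
    return blocks

# Translation cache keyed by block start address; a dict stands in for
# the hashed lookup the comment speculates about.
translation_cache = {}

def translate(block):
    key = block[0][0]                    # start address of the block
    if key not in translation_cache:
        # stand-in "translation": uppercase the ops
        translation_cache[key] = [op.upper() for _, op in block]
    return translation_cache[key]
```

    On a second pass through the same block, the cached translation is reused, which is where the win for tight loops comes from.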

  • 'whatshisface' would be David Patterson, who together with David Ditzel authored the 'The Case for Reduced Instruction Set Computing' article which started the whole RISC thingie.
  • I also wonder whether it can multitask between different instruction sets. I guess the task switching overhead would be pretty brutal if there isn't room onchip for multiple instruction sets.


    I would guess not, since there is only a single TLB, configured at boot time. Unless you wanted to flush it every time you changed instruction sets (!)
  • Still, this could be a big pain in the ass for people who aren't comfortable rooting around inside their computers.

    IMHO, people who aren't comfortable "rooting around inside their computers" probably won't be writing their own code morphers. This isn't script kiddie stuff...
  • It would seem to me to take some work on top of what was released to be able to attack the server CPU market.

    I'd love to see these happen in the next five years:

    - Code Morpher for Alpha, PPC, ...
    - Code Morpher to recognize the instruction set of a binary
    - "optimization practically finalized for this piece of code" bit
    - a TM CPU bus for several chips to share the same translation cache
    (how necessary is this actually?)
    - communication interface for operating systems
    - ability to save final VLIW version of code beside the original binaries

    Those would in essence offer the ability to eventually turn a system over to VLIW binaries without actually putting any effort into it.

    Once TM has covered its development investment:

    - Open Source the Code Morpher
    -> worldwide development of support for
    - any chips
    - integration with high-level compilers.

    "No stop signs! No speed limits!" - AC/DC: Highway to Hell
  • Coding natively would be SLOWER than using the morphing layer. You also don't get the benefit of the optimization.

    Yes, but a good compiler will generate fully optimal code to begin with. A compiler that targets the Transmeta core Instruction Set should give you better code than the two level translation scheme.

    But that's neither here nor there. Transmeta will not want people to code to the native Instruction Set because it will undermine their flexibility with the underlying hardware. Right now, the major benefit of the two level translation scheme is that the hardware architecture can be updated and improved while presenting the same programming model to application developers. This will allow Transmeta to aggressively experiment with the hardware architecture while maintaining software compatibility. This is very very cool!

  • If you read the FAQ, it explicitly states that the source will be released.
  • Only time will tell whether Transmeta's making us pay a penalty up front in the form of morphing, so that they don't have to deal with backwards compatibility in the future, will pan out for them from a business point of view. If all this thing does is run x86 code at lower power, they aren't going to have a market lead for long. Two things are happening right now, guaranteed:

    - Somebody is reverse engineering it
    - The big boys are doing the same damn thing as fast as they can

    One of these two items will cut Transmeta's legs out from under them. Unless they get a killer app for the CPU and penetrate the market as quickly as possible, I'm not sure there's enough here to justify the effort they've gone to (read: the VC dollars pumped in). I mean, if you'd just sunk $100 million into a company over 5 years and they came out with a slower x86 clone, what would you think?

    Oh, and I guess I have a question too. Sometime in the past I was under the foolish impression that software was a lot more expensive to develop than hardware. I'm just wondering how that fits with this idea of pushing function that used to be in hardware up into software?
  • Evidently you haven't been reading much of the Crusoe propaganda. They don't want anyone to access the native instruction set so that they can change the chip core without having to worry about legacy apps. Imagine a chip that could go from pure CISC to RISC without having to change the apps. In this way the hardware implementation is decoupled from the instruction set interface.

    Pretty neat, but I haven't seen any real mention of emulating any architectures other than x86.

  • The Ars article also points out that some of the registers are used by the code-morphing software, too... you couldn't count on having 40 + 24.

    --
  • Certainly not the type of chip I want to be playing quake on... For now.

    The telling quote is at the end of the article though:

    I'd say that it's only a matter of time before we hear an announcement of another product line from Transmeta. It won't be named Crusoe, because it won't be aimed at the mobile and embedded markets. It'll be a workstation and server class x86 CPU that runs Linux like a fiend, and it'll compete directly with Intel's IA-64. I can't wait.


    It does make me wonder, though, if such a chip (slightly altered) would actually end up being superior for Quake. Given that the translating software can identify which parts of the code are used most often, it becomes better at branch prediction, and this could translate into faster gaming... I think... Contrary to this thought, though, is the fact that the Celeron is a good gaming processor with only 128K of cache... We shall see.

    Along similar lines, if the x86 instructions are software, how much of the x86 instruction set does Quake use? Would the flexible software end up speeding up Quake by getting the x86 instructions out of the way?
  • It will probably be out, but Linus does not have to release it. Still, you know he will.
  • apple doesn't own the PPC ISA any more than intel owns x86. afaik, the only thing apple has to do w/ ppc is that they use them. i don't think apple had very much to do with the ppc isa development.
  • You lot just aren't getting it. If you remove the code morphing layer, then you have to put backwards compatibility into the hardware down the road.

    Not really. Transmeta could then just write a code-morphing layer to "morph" the ISA you coded to into the new one. No?
  • No, no, please! That would be a disaster!! The whole point of this architecture is to get rid of this compatibility mess.

    Well, someone will still have to suffer the incompatibility mess: those who write morphing software for various CPUs. This will surely be more than just Transmeta, if the concept takes off.

    The x86 instruction set isn't necessarily the best for this chip. Someone could make up a different one (perhaps something that uses 32 registers or so), make a compiler for it, and get better performance than x86 code on the same chip.

    This would have to be rewritten for another chip, but rewriting the instruction emulator is a lot less effort than recompiling the os and all apps. Still, someone must do it.
  • Transmeta could then just write a code-morphing layer to "morph" the ISA you coded to into the new one. No?
    Brilliant! The Meta Morphing Power Processors! Why stop there. Why not have transmeta write code morphing software that emulates their native instruction set and on top of that run code morphing software that emulates their native instruction set and on top of that run code morphing software that emulates their native instruction set and on top of that run...
    IT'S TURTLES ALL THE WAY DOWN!!!
    --Shoeboy
  • Late to the commentary, but I hope this adds something to it.

    way Way WAY back in micro-processor terms (1984-1985), I developed a white paper that attempted to extrapolate where PC's would develop by Y2K. (I'll put it up on my website if I can find which 5-1/4" floppy I saved it on, and re-hook a 5-1/4" drive to my PC).

    Hopefully it doesn't seem self-congratulatory (because a number of my other conclusions stunk) or redundant to this thread to mention that three or four of the paper's conclusions fit the idea of developing a Crusoe type "beowulf in a box" exactly:

    • High speed, low power CPU cores would be required ( 200 Mhz speed). Why? Because even if I had the ability to write programs that could keep all the Crusoe processors running at full tilt 100% of the time, I could conceivably power 50 Crusoe Processors or so on the same power supply that used to supply two Athlons (68 Watts),
    • The CPU units would perform on-chip instruction decoding so that chip and system architectures could be developed more flexibly,
    • Each CPU would have an abundant amount of cache memory in which to put commonly executed code units, and finally that no matter what,
    • for performance, massively parallel execution was more important than raw speed in terms of overall CPU speeds, etc.
    Now then, programming for massively parallel systems is a b----, and I couldn't do a Beowulf cluster if I tried, but these chips and the StrongARM series are the first ones which met all of the specs in a fifteen-year-old paper.

    Just in time for Y2K. Interesting, eh?

  • Bravo /. that is the kind of stuff I wanted to read about the chip.

    While a lot of people are concentrating on how well this will work in small devices the author of this article is excited about the large-scale applications of the chip. I would have to agree. Think about a busy web server that is continuously generating web pages and doing database transactions. The code morphing software can spot that trend and be ready for it.

    Should be interesting.

    MBrod
  • If yes, where is the source?
    The example of code optimization they gave was a DVD player. After the first frame, pretty much all the needed code optimizations were completed and stored for the movie.

    The time it takes to relearn the optimizations is very, very short compared to power on/off cycles.
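
    The DVD example above boils down to a translation cache: pay the translation cost the first time a block runs, then reuse the result. A toy model, with invented names (a real implementation keeps translations in a memory buffer, not a Python dict):

    ```python
    # Toy model: a block of guest code is "translated" only the first
    # time it executes; every later run reuses the cached translation,
    # so the relearning cost is paid once per block.

    class TranslationCache:
        def __init__(self):
            self._cache = {}
            self.translations_done = 0

        def _translate(self, block):
            # stand-in for the expensive x86 -> native translation step
            self.translations_done += 1
            return "native(" + block + ")"

        def execute(self, block):
            if block not in self._cache:
                self._cache[block] = self._translate(block)
            return self._cache[block]
    ```

    Decoding a thousand frames through the same code path triggers exactly one translation; everything after the first frame runs from the cache.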

  • I'm not a hardware guru so pardon the speculation...

    Obviously, the code morphing is focused on x86 right now and, as the article suggests, may be adapted for PPC, Alpha, etc. in the future. Is it feasible that it could also be adapted for specialized processors such as graphics or sound?

    I'm imagining an SMP-type Transmeta box that, when you load Quake, automagically loads code-morphing software onto one of the processors to act as the graphics accelerator or, if you're watching a DVD, can act as an MPEG decoder card.

    Is what I'm suggesting conceivable or am I way off base?

  • So, we know by now that Crusoe only requires around 1 watt of power to operate, and that this results in a maximum temperature of 48C (thus, a fan isn't required to cool the thing). But, if you don't really care about power consumption and you installed a fan over the heatsink, one has to wonder how much faster these things can be clocked before they start showing glitches. The only problem I can see is the LongRun software which will automatically reduce power consumption if it's not necessarily needed -- this might mean that the only way to overclock the chip would be to modify the LongRun code (stored in FlashROM).

    Any guesses as to how long until someone figures out how to patch the FlashROM so to allow overclocking? I give it about 6 months after Crusoe based systems hit the shelves.

    -NooM
  • Even if you think Assembler is a high level language, you probably do not want to code directly to the bare metal. It is not a nice native VLIW machine code. It is the target for the code morphing layer. It's been a long time since I've even looked at microcode (early low-end IBM 370s were microcoded) but it tends to be obscure, twisted, very unfriendly, and I cannot imagine that it's gotten any better with time. Minor mistakes do very bad things. Only one program is written, the program to read and execute the "higher-level" machine code.
  • To nit-pick,
    SUB CX,AX
    sets flags based on result in CX
    If things are case sensitive, Cx would be a valid label.
    Actually, both ADD and SUB set flags on x86.
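
    The nit-pick above, as a software x86 emulator might see it: a 16-bit SUB stores the result in the destination register and sets the flags from that result. A minimal sketch modeling only ZF and SF (a real emulator would also handle CF, OF, AF, and PF):

    ```python
    # Sketch of SUB CX, AX in a software emulator: the result lands in
    # the destination register and the flags are set from that result.
    # Only ZF and SF are modeled here, for brevity.

    def sub16(regs, flags, dst, src):
        result = (regs[dst] - regs[src]) & 0xFFFF  # 16-bit wraparound
        regs[dst] = result
        flags["ZF"] = result == 0                  # zero flag
        flags["SF"] = bool(result & 0x8000)        # sign bit of 16-bit result
    ```
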
  • Bravo /. that is the kind of stuff I wanted to read about the chip.

    Shouldn't that be bravo ars technica?

    Their article on the K7 [arstechnica.com] was great, btw...

  • by Anonymous Coward
    There is a lot of glitzy information now available about Crusoe: VLIW, a core instruction set that is nothing like x86, and the code morphing software. But the actual technical nitty-gritty seems to be lacking. Can a program get access to the core instruction set, thus bypassing the code morphing? Is it possible to detect the Crusoe processor with x86-compatible instructions, so that in performance-critical sections of an application Crusoe-specific/pre-morphed code can be run if the Crusoe is detected, while the application still executes standard x86 code if it isn't? Can a programmer provide their own code-morphing software, thus turning Crusoe into a fast Z80, for example? Does Transmeta have plans to code-morph other instruction sets like PPC? And does "Mobile Linux" contain any Crusoe-specific instructions, or does it depend completely on the software code morph of x86?
  • PPC chips aren't really aimed at the mobile market. I want to see Crusoe vs. StrongARM.
  • Except that the code morphing software has one very important property: it optimizes the code dynamically. You can't do that by statically compiling to the VLIW layer.

    Now I suppose Transmeta could design a full O-O-O core, but I don't see the point. If the software does a good job, the additional flexibility they gain to change the underlying machine is worth it.

    As far as branches go, yes, you usually can guess a backward branch is going to be taken. But branches are still a huge problem. It's tough to keep a processor core fed. And don't even get me started on multiple branch prediction. The hit rate goes way down. A study was done here that showed processors today (or in the near future) spend about half the time recovering from branch mispredictions. That's a lot of wasted work. While the code morphing software can't do a perfect job, it is somewhat easier to tune the chip. And then think about per-application tuning. Load a different set of rules depending on the program you're running.

    Interesting, no? :)

    --

    • This architecture allows for some interesting optimizations not feasible in conventional CPUs.

      "Crusoe's Code Morphing software not only keeps track of which blocks of code execute most often and optimizes them accordingly, but it also keeps track of which branches are most often taken and annotates the code accordingly. That way, Crusoe's branch prediction algorithm knows how likely a branch is to be taken, and which branch it should speculatively execute down. If a branch isn't particularly likely to go one way or the other, then Crusoe can speculatively execute down both branches.

      Contrast this with speculative execution done on a normal CPU, where hardware limitations like buffer and table sizes limit the amount of information you can store about a particular branch and its execution history. Since Code Morphing keeps track of the branch histories in software, it can record a more finely grained description of the execution patterns of a wider window of code, and therefore assess more accurately whether or not a specific branch is likely to be taken."

    • High performance on the desktop is also interesting: "So you see, they made the Code Morphing software extremely modular. They can implement whatever parts of it they like in hardware to get whatever degree of performance gain they want. Crusoe should be viewed more as a proof of concept than as the ultimate outcome of 5 years of work. Crusoe represents one extreme of a spectrum that stretches from "implement the bare minimum in hardware" to "implement everything in hardware." Now that Transmeta has a technology that's proven to work in the most difficult case (where 2/3 of the transistor logic has been moved into software), they can go back in the other (easier) direction and start putting stuff in silicon.

      Furthermore, since there's a software layer between the ISA of the binary and the machine's native ISA, Transmeta is free to beef up the execution engine (or any other part of the core) however they like, because the only thing that will require a recompile is the Code Morphing software. A case in point is the two chips in its product line. Each has a slightly different core (the Windows chip has special instructions in it that help speed up Windows), but they both are fully x86 compatible. There's nothing to keep them from stuffing new functions and features (SIMD anyone?) into the silicon, to help scale the product as high up as they want to go with it.

      I'd say that it's only a matter of time before we hear an announcement of another product line from Transmeta. It won't be named Crusoe, because it won't be aimed at the mobile and embedded markets. It'll be a workstation and server class x86 CPU that runs Linux like a fiend, and it'll compete directly with Intel's IA-64. I can't wait."

    I, for one, am really excited about the possibilities.
    --
    My opinions may have changed, but not the fact that I am right.
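
    The branch-history idea in the quoted passage can be sketched in a few lines: keep taken/not-taken counts per branch in an ordinary (unbounded) software table, predict a direction only when the history is clearly biased, and speculate down both paths otherwise. The threshold and names are invented for illustration.

    ```python
    # Software-maintained branch history, as the quote describes: no
    # fixed-size hardware table, so history can be arbitrarily detailed.

    from collections import defaultdict

    class SoftwareBranchProfile:
        def __init__(self, bias=0.75):
            self.taken = defaultdict(int)
            self.seen = defaultdict(int)
            self.bias = bias

        def record(self, addr, was_taken):
            self.seen[addr] += 1
            if was_taken:
                self.taken[addr] += 1

        def predict(self, addr):
            n = self.seen[addr]
            if n == 0:
                return "both"             # no history yet: hedge
            ratio = self.taken[addr] / n
            if ratio >= self.bias:
                return "taken"
            if ratio <= 1 - self.bias:
                return "not-taken"
            return "both"                 # near 50/50: go down both paths
    ```

    A branch taken 9 times out of 10 gets a firm prediction; a 50/50 branch gets speculated both ways, as the article describes.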
  • Regardless, it will have to be turned off even if it is for battery changes.

    Not to be an ass, but my Palmpilot doesn't lose its data when I switch batteries. Nor do (new) VCRs lose their programming on power loss. Devices called super capacitors (5V 1F style things) keep enough energy around to keep very low power components up and running in a sleep mode to ride out such interruptions.
  • The article talks about the chip dynamically adjusting its clock speed to minimize power usage. While this is done in existing laptops, it only tends to be triggered by specific events (being connected/disconnected from mains power, battery level reaching certain thresholds etc). The article seems to imply Crusoe will adjust its speed dynamically at any time.

    I'm curious to know how OSes will handle this. For example, we've already had a thread on the linux-kernel list about timing loops being thrown off by this for existing laptops (because the bogomips on which they're based are calculated at boot time). What was the outcome of that thread? Was a solution reached? Will it apply for Crusoe too?
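
    The bogomips problem above comes down to this: the kernel measures loops-per-microsecond once at boot, then converts requested delays into iteration counts. If the CPU later drops to half the boot clock, each iteration takes twice as long and every delay doubles. A sketch with purely illustrative numbers:

    ```python
    # Why a boot-time-calibrated delay loop breaks under dynamic
    # clocking: the iteration count is fixed at calibration time, but
    # the time per iteration scales with the (changing) clock.

    def delay_iterations(usec, loops_per_usec_at_boot):
        # how many loop iterations the kernel spins for a requested delay
        return usec * loops_per_usec_at_boot

    def actual_delay_usec(usec, loops_per_usec_at_boot, boot_mhz, current_mhz):
        iters = delay_iterations(usec, loops_per_usec_at_boot)
        # each iteration is slower by the ratio of boot clock to current clock
        return iters / loops_per_usec_at_boot * (boot_mhz / current_mhz)
    ```
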

    wouldn't they have to license the instruction set from apple?

    Only if Apple (who didn't invent the PowerPC instruction set; it's a derivative of the IBM POWER instruction set) have some form of intellectual-property rights for the instruction set.

    If there are any such rights owned by Somerset, Apple might also have some say in licensing it.

    and even if they did give TM the specs so TM could write a code morphing layer to run PPC apps

    "The specs", in the sense of the instruction set specifications for PowerPC, are publicly available, although if the chip+software is intended to look like a particular PowerPC chip (I think the MMUs may differ, e.g. may have software TLB reload on some processors and hardware TLB reload on others), they'd need that spec as well (I think the specs for various PowerPC chips are also publicly available).

    also ars said the northbridge and the SD-DRAM (that right?) modules were all integrated on die, so wouldn't that mean you need to change all of that stuff if you wanted to run a different architecture? after all, PPC doesn't run on x86 core logic sets

    If somebody wanted to clone not only some PowerPC CPU but a support chip set for it, so that they could run OSes such as MacOS unmodified, that stuff might have to be changed...

    ...but that's just cloning a Mac, which Apple isn't allowing even if you use existing PowerPC chips.

    Of course, there is the possibility that Apple would want to use a Transmeta chip in a Powerbook, say, in which case the Apple licensing issues go away.

    Apple are unlikely to be the ones to block such a Code Morphing(TM)(R)(LSMFT) layer; they don't, as far as I know, have a problem with people building non-Mac-compatible PowerPC machines, and they already have, as far as I know, the tools to block people from building Mac clones.

  • It seems that what Transmeta has done is to take the ideas developed for JIT compilers and apply them all to hardware.

    "Apply them all to hardware" in what sense? The binary-to-binary translators for Crusoe chips are software; they just happen to be running on hardware that offers some assistance, but the translation itself isn't done by hardware (and happens at a layer below even the lowest-level OS code; as far as the OS is concerned, all the way down to the lowest level, the chip looks like an x86).

  • (actually, I understand that AS/400 machine code is abstracted from the object code of programs, though probably not in quite the same way as how Transmeta did things

    Correct. Compilers for the AS/400 (and its System/38 predecessor) for the languages in which applications are written generate code for a virtual machine with a very CISCy instruction set; low-level OS code translates that to the native instruction set. (That long antedates Transmeta; as indicated, it dates back to the System/38, which I think came out in the late '70's; IBM needed no technology from Transmeta to do that - binary-to-binary translation is hardly a Transmeta invention.)

    It isn't done in exactly the same fashion, in that, on S/38's and AS/400's, the low-level OS code is written in languages that compile (or, for some code, assemble) into the native machine's instruction set, unlike Crusoe, where the only native code that's run is the translation software and the output of the translation software. Also, I don't think the translation on AS/400 is done as dynamically; I think programs are translated in their entirety the first time they're run, and the executable code for the entire program is kept around.

  • Yes, it's GPLed.

    Where is the source? Read the GPL -- they don't have to release the source until they distribute the code. Mobile Linux hasn't been released yet, so they can sit on the source for now. Linus has promised it will be available RSN.

    Steven E. Ehrbar
  • Comparisons to the PowerPC chips.

    After all, the Crusoe architecture is not a performance demon aimed at desktops/servers, and it is not aimed at the ultra-low-power-consumption StrongARM market. But it might be suitable for the sorts of applications that embedded PPCs are currently used in...

    Steven E. Ehrbar
  • No, no, no, **YOU** STILL DON'T GET **IT**.

    As far as I can tell, the Crusoe processor engine itself is not special. If you are a "talented programmer programming to the bare metal", you might as well program in assembly on another pre-existing chip.

    And then, as a chip manufacturer, you'll face 20 years trying to support the vintage instruction set that those bare-metal hackers employed.

    You're missing the point.

    Take database servers. Oracle, MySQL, Informix, Sybase, Uncle Joes Ultimate Data Thingy... Just about all of them allow access to their data through a standard SQL language.

    But... But... but... Wouldn't it just be so insanely cool and fast if I could just directly access the ISAM structures and indexes and modify disk sectors directly?!?! I fully expect every dedicated DBA and application designer to go to the bare iron to squeeze performance from their data warehouses!

    Has that happened? No. Why? Because MOST EVERYDAY APPLICATION DESIGNERS DON'T "PROGRAM TO THE BARE METAL". It's too complex, intensive, and fruitless a task. Why is Slashdot written in Perl and not assembly? Why isn't Linux 100% x86 assembly?

    There is a BIG difference between just a cool hack and maintainable elegance.

    Why do we have high level languages? Why do we have abstraction layers? Why?

    The Code Morphing is an abstraction layer. Initially, that layer is the x86 instruction set, an arbitrary set of instructions that just happens to currently be widely used. Using Code Morphing, the Crusoe can leapfrog on that wide base of support, while throwing away the hardware architectural garbage traditionally needed to support it.

    Back to SQL: Oracle supports SQL for access to data, but beneath, I'll bet you that a lot of the specific operations upon data that those SQL statements fire off have changed ENORMOUSLY over the years. What would have happened had they allowed programmers straight past the abstraction layer? They still would be trying to support that API today, and I bet they wouldn't be as free to rework their server software.

    Furthermore, why do we have the DBI module and DBD modules in Perl? To provide a semi-universal abstraction layer across all databases. When one database's API changes for performance reasons, efficiency, whatever, you just change the morphing-- er DBD-- layer to accommodate it.

    What is the point of Crusoe then?

    Not to provide assembly hackers with a new opcode set to learn and tweak, which 90% of the application design world will never learn or exploit, and therefore will remain voodoo essentially.

    The point is to provide an architecture which supports ABSTRACTION LAYERS of assembly opcodes. So Transmeta is free to vary the underlying hardware in any exotic or esoteric form they see fit, throwing backwards compatibility of their VLIW opcodes to the wind because the Code Morphing allows the SAME ABSTRACTION LAYER API to be exposed to the application designer.

    Now, finally, note I keep saying 'application designer'. This is as opposed to 'dedicated hacker'.

    Read the definition of a hack [tf.hut.fi]. The first two definitions are not my idea of elegance. Something that's quick and does the job but not well. Or, something that is incredibly good, but took a long time.

    Now, read the definition of elegant [tf.hut.fi]. Something that combines simplicity, power, and grace. Something that is understandable, almost obvious in its expression. Something maintainable.

    Tell me what's more maintainable: Assembly code for the Mx-650938 processor, or Java code. It's a close call, but I'll have to go with the Java code. It's harder to write a hack in Java, than it is to create an elegant design in assembly.

    It's not about performance. We haven't even BEGUN to wring the performance from the chips we have-- and why? because it's not humanly possible for every applications designer to be a brilliant assembly hacker, which is why we have compilers!

    So, finally, why spend your time learning the latest opcode set when you can just focus on a higher level language and leave the hand tweaking and performance tweaks to the man behind the curtain of the Code Morphing abstraction layer of OZ?!?!?!


  • One of the things that Crusoe supposedly does is it caches frequently used code in its "compiled" form. This means that you only take a performance hit the first time you run it, and then it should run pretty much at full speed.

    If they give you access to the underlying architecture, then they are committed to keeping that architecture in future versions. This way they can make up a new ISA for every chip, and just tweak the code morphing layer to make it work.

    This gives them a performance hit now, but as Intel is forced to continue to support the x86 architecture in hardware for every new chip, they will have to make their chips ever bigger and ever hotter. Transmeta's approach will likely prove superior in the long run.
  • Because writing a new code morpher for this architecture would take R&D dollars that would be better spent emulating real architectures like PPC or IA-64 with existing user bases. The small increase in performance you'd get from a "native" ISA would not justify the additional costs of writing and supporting the software for it.

    Also, it sounds like they are optimising each chip to specifically support code morphing from a specific architecture. That means that x86 *is* a reasonably efficient instruction set for this particular hardware. Yes, you could probably make a faster one, but the gains would be marginal unless you actually got direct access to the underlying ISA, which defeats the whole purpose of this strategy.
  • ...is how much faster this thing will run if it's not emulating an x86. It looks pretty hot under the hood, and if, instead of using standard guess-aheads, you can tell it which branch to use as default or even tell it about branches ahead of time (which you often know well before the actual conditional looping operation) so it's not guessing at all.

    It seems a lot of posters are thinking the same thing. But...

    You could say the same about a Celeron/P-III/Athlon/Whatever.

    "I wonder how much faster my Athlon would go if I could rip out the silicon that does the intruction decoding / reordering / branch prediction / etc and code directly for the execution units."

    It probably wouldn't go much faster (I'd guess that silicon does its job pretty well) but by ripping out all those transistors you could significantly reduce power consumption.

    In fact, if you think it through for five years or so you'll probably wake up one day and find you've re-invented Crusoe. Of course it'll be old news by then.

  • like the Ars article, it was well written. I think the Crusoe is impressive because it does what RISC was originally conceived to do. Look at MIPS: it's a RISC architecture, yet it has some of the most complex processing units you'll find. Things like Crusoe and MAJC really rattle the cages of other chip makers because they take an entirely different approach to chip design. Even PPC is getting really complex, especially by adding the AltiVec unit onto the die; while it improves performance in some calculations, it adds significantly to the price and complexity of the chip.

    The human brain can calculate some pretty complex things, yet its processing is done in a massive number of simple processes rather than a small number of complex ones. I think the next generation of supercomputers will be built a little more like Crusoe chips, maybe even using Crusoes. The more times it works a calculation, the faster it does it; this would add phenomenal performance to a lot of the things we use supercomputers for right now. Maybe in the next ten years we'll see desktop teraflop systems.
  • The best way for games to run would be like nVidia is doing with the GeForce's GPU: the GPU handles the graphics and transforms and all the heavy-duty FPU calculations while the system's CPU handles the actual code of the game. The instruction set I would guess would be best for gaming is true RISC; it gets the job done as simply and quickly as possible. Games, as well as any graphical program, pose a challenge to processors and programmers because you have two things going on: the data manipulation and control of the program, and then the manipulation of the graphics. Look at any CLI program: it usually has a single job to do at a time and can work in order; Quake needs to do 30 things at once.
  • Ahem...
    IF YOU WANT TO CODE DIRECTLY TO A VLIW CORE BUY A &*$#ING MERCED!!!!!!
    Sorry about that. You lot just aren't getting it. If you remove the code morphing layer, then you have to put backwards compatibility into the hardware down the road. That means lots o' transistors and high power consumption 2 or 3 years down the road. That also means that compiler complexity goes up dramatically. So you'll wind up having a crippled architecture and low quality compilers 10 years down the road. That's stupid. Additionally, if the compiler is entirely responsible for the optimization, you lose the nifty on-the-fly code tuning based on actual runtime data -- this is the coolest thing about the Crusoe.
    --Shoeboy
  • Given their target of small mobile devices, like webpads and the like, its low power consumption and sleep mode, I don't think it's intended to be "turned off"

    Regardless, it will have to be turned off even if it is for battery changes. Also, people may drain the battery after using it for a while, requiring a power down. These points aside, even low-power devices today are turned off (e.g. laptops, palm pilots, etc.) since even standby mode drains too much power. The Crusoe systems will probably drain more power than a Palm Pilot, so you'll probably need to turn them off.

  • I also wonder whether it can multitask between different instruction sets. I guess the task-switching overhead would be pretty brutal if there isn't room on-chip for multiple instruction sets.

    My understanding from the articles I have read is that maybe, eventually, but right now it only emulates x86.
  • My whole point was that branch prediction can be replaced by explicit pre-branch notification.

    Branch prediction now is very stupid: circuits try to guess, in real time, which branch will be taken. If the C compiler explained to the branch "predictor" that "this will loop 27 times, then stop looping", there would be nothing left to guess.

    Furthermore, explicit cache requests could be compiled. "I'll stay in this function for a while, but I'm also going to call these functions."

    With profile-based optimizations and careful design you might never have a cache miss or a branch misprediction.

    I've gotta get me one of these, and play around with alternative opcode sets. This is just the coolest toy for exploring computer architecture.
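
    The "explicit pre-branch notification" idea above is easy to sketch: a loop that runs exactly 27 times produces a fixed branch pattern of 26 "taken" followed by one "not taken". A predictor that must guess (here, the common "backward branches are taken" heuristic) mispredicts once, at loop exit; a predictor told the trip count in advance never mispredicts. Function names are invented.

    ```python
    # Toy comparison: heuristic branch prediction vs. a compiler-supplied
    # trip count for a counted loop.

    def loop_branch_pattern(trip_count):
        # bottom-of-loop branch: taken until the last iteration
        return [True] * (trip_count - 1) + [False]

    def mispredicts_always_taken(pattern):
        # "backward branches are taken" heuristic: misses only on loop exit
        return sum(1 for taken in pattern if not taken)

    def mispredicts_with_trip_count(pattern, trip_count):
        # explicit hint: predict taken for trip_count-1 iterations, then
        # not-taken -- matches the real pattern exactly
        predictions = [True] * (trip_count - 1) + [False]
        return sum(1 for p, t in zip(predictions, pattern) if p != t)
    ```
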
  • One would not write in the native VLIW -- one would create a new instruction set that hid the VLIW but used its best features, and interacted with the hardware better, i.e. saved the optimizations and the branch predictions for the next time the program is run. One need not write in VLIW to get rid of the x86 instruction set. (I wonder if one could design an instruction set to run one's favorite operating system (Linux, *BSD...).)
  • why not save the cache to permanent storage? The processor optimizes the code and then saves the optimized code to disk as a "shadow" executable. The next time the program is loaded, the OS would indicate that it has already been optimized and pass the shadow to the processor, which could bypass the translator. The translator could attach a signature to the shadow, and if it didn't agree it would reload the program and translate from scratch. In this way, you would get permanently optimized code for all your programs while retaining the flexibility of the current design.

    Of course, one problem with this would be getting support for shadow programs built into the OS. I wonder if Transmeta has anyone that could handle this?
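
    The signature scheme above can be sketched directly: key each saved translation on a hash of the original binary, so an unchanged binary reuses its shadow and a modified one is retranslated from scratch. An in-memory dict stands in for on-disk storage, and the names are invented.

    ```python
    # Sketch of "shadow executables": translations keyed on a hash of
    # the original binary, so a changed binary invalidates its shadow.

    import hashlib

    class ShadowStore:
        def __init__(self):
            self._shadows = {}       # signature -> translated code (disk, really)
            self.retranslations = 0

        def load(self, binary):
            sig = hashlib.sha256(binary).hexdigest()
            if sig not in self._shadows:
                self.retranslations += 1
                self._shadows[sig] = b"native:" + binary  # stand-in for translation
            return self._shadows[sig]
    ```
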

  • The "code morphing" layer is what makes Crusoe stand apart from the rest. It optimizes on the fly the instuction set it's running on the fly. This means that your aps will run faster and faster as it runs. This layer is what gives the Crusoe it's speed.

    The only way "code morphing" could run faster than native code is by exploiting runtime information to perform optimizations that are not possible at compile time. In other words, self-modifying code that runs faster than static code.

    This is plausible, but that doesn't mean there would be no performance benefit in compiling native code. Research on self-modifying code is not unique to Crusoe---it's a very active area of research, and there are two major kinds: JIT and dynamic compilation. JIT, which you're probably all familiar with from Java, involves translating code (typically from a foreign instruction set) and performing optimizations at runtime; dynamic compilation involves "staging" code at compile time to modify itself in a disciplined manner at runtime. JITs and dynamic compilation are very different in the nature of optimizations they perform; one of the major differences is that because dynamic compilation performs its analysis at compile-time, it can theoretically perform much deeper and more sophisticated optimizations.

    Crusoe does no staging (it can't: it executes fully precompiled code), so its optimizations operate under severe time constraints. Therefore, Crusoe's code morphing is likely to produce code optimality akin to that emitted by a JIT compilation system: shallower analysis, shallower optimizations. Which almost certainly makes Crusoe's "code morphing" worse than native staged dynamic compilation would be.

    In summary: my point is that self-modifying native code that improves its performance at runtime is entirely possible without "code morphing". On the other hand, binary x86 compatibility is arguably Crusoe's major selling point, so there's not much impetus for them to bother encouraging any kind of native code compilation. Anyway, I get the impression that Crusoe's entire architecture would have to be revamped if they wanted to run native code so it's a moot point.

    If you're thoroughly confused by now, try visiting the dynamic compilation project [washington.edu] at the University of Washington for more information on dynamic compilation.

    ~k.lee

    (BTW: this does not mean that Crusoe does not embody any technical innovations. In particular, the hardware support the chip provides for its runtime code translation is very interesting.)
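
    A tiny illustration of the "staging" idea mentioned above: specialize a function against a value that only becomes known at runtime, cache the specialized version, and reuse it. This is staged specialization in miniature, not anything Crusoe actually does; names are invented.

    ```python
    # Toy staged specialization: build a version of pow specialized to a
    # runtime-constant exponent, paying the specialization cost once.

    _specialized = {}

    def make_power(n):
        # close over the runtime-constant exponent n
        def power_n(x):
            result = 1
            for _ in range(n):
                result *= x
            return result
        return power_n

    def power(x, n):
        if n not in _specialized:
            _specialized[n] = make_power(n)   # specialize once per exponent
        return _specialized[n](x)
    ```

    Repeated calls with the same exponent reuse the one specialized function; a real staged compiler would go further and emit straight-line code with the loop unrolled away.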
  • Efficiency isn't exactly exciting. Unless I am using a Palm Pilot, I really don't care if my PentiumIII or Alpha is sucking 34W and my Nvidia GeForce is sucking another 30. What I care about is how fast my performance is. How many transactions can I run? How many frames per second am I getting? How many polygons can I push?

    You really, really still don't get it, do you? Firstly, Crusoe is the first chip Transmeta has got out the door. It's the simplest possible silicon, with the hard bits done in software. But there's no hard line between what functions can be done in hardware and what can be done in software. It's just that software is cheaper to tune.

    When Transmeta have got code-morphing tuned the way they like it there is nothing to stop them releasing a new chip with the code-morphing engine in hardware.

    But even if they don't, the limitation on performance computing design is cooling, as Cray amply showed. Crusoe consumes 1/32 the power of your PIII; so, for a given cooling system, you can stick 32 Crusoes in the same box. If each Crusoe gives you 66% of the compute power of the PIII, you've got a box which is going to deliver you more than 21 times the number of polygons your PIII can push.

    One thing I haven't yet seen quoted is the part-price for a Crusoe, but if the silicon is as simple as people are suggesting the part-price could be very low - small dies have relatively lower reject rates because if you have one flaw per square inch, every one-inch-square chip has a flaw, whereas only about one in ten 0.3-inch chips does.

    By contrast your PIII is inherently an expensive part - it isn't expensive because Intel are profiteering, it's actually expensive to make. If Transmeta start shipping Crusoes at (say) around $10 per part in quantity, there isn't any way Intel can compete anywhere along the line.

    I currently run two PII/300s in my desktop box. I bought them because two 300MHz parts and a motherboard to accommodate them were, at the time I bought them, a lot cheaper than one 500MHz part. If I can get, say, 8 400MHz Crusoes for the price of one 700 MHz Intel part, I will be quite happy to run them, and so I expect will a lot of other people.

    Assuming, of course, that Linux 2.4 will run 8-way parallel on Crusoes, but I'm kind of prepared to bet it will :-)

  • I'm sorry, I don't get it. Maybe I'm just dense. Why do all this "morphing" and optimizing at runtime, instead of at compile time? Binary compatibility with existing processors is a nice feature, and I'm sure it will help Crusoe get a foothold in the market, but why can't we at least have the option of bypassing the emulation when native software becomes available? (Or does the Crusoe already allow this? The reports haven't been clear on that.)
  • I originally posted this in a previous crusoe article but no one commented on whether it's actually feasible or not. Any big brain VLIW gurus want to tell me if what I suspect might actually be true?

    The quake3 performance we saw on the ZDTV webcast was pretty damn impressive. Everyone seems to be assuming that they had 3d accelerators in those TM5400 laptops.

    You can run quake 3 in software mode under mesa at about 3 frames per second.

    But this is transmeta we're talking about, and that was Dave Taylor, the SAME Dave Taylor that once leaked a document onto usenet ranting about the inferiority of hardware graphics accelerators, saying that what he really wanted was a generic parallel processing chip that could do arbitrary transforms.

    GEE, a lot like the crusoe chip can do?

    (anyone got the link to that usenet posting on deja that dave taylor tried to cancel?)

    Isn't it feasible that they have put hooks into their code morphing software that optimises specially for 3d transforms and mesa/opengl?

    Especially in the linux version? Where they have all the source code to linux and mesa?

    Hmm, what fancy optimisations could those clever brains come up with?

    Maybe those transmeta laptops WON'T need 3d accelerator chips?

    And it would completely defeat the purpose of a low power laptop to put a big, hot, power-sucking 3d chip in it. So I'm assuming that demo of quake3 they showed WAS running in software mode with some pretty fancy dynamic optimisations going on.

    Maybe the reason they didn't make a big deal about this is that it's still a "work in progress" as Linus said about mobile linux so they don't want to hype it yet.

    Someone prove me wrong?
  • The Transmeta webcast reminded me of something I read in U.S. News & World report a few weeks ago. It was in an article about IBM's Mark Dean.

    Quote:
    Early in the next century, Dean hopes his new concoction, which he says is "in the idea and invention stage," will be ready for the public: a sleek tablet that is magazine-size, inexpensive, programmable, and voice-activated. He expects his unnamed dream pad, which will run on a 24-hour battery, to provide everything a PC does, including streaming audio and video, word processing, and spreadsheets. It will even have a port for old fogies who can't give up their keyboards. And it will wirelessly put the Internet and other information at your fingertips.
    End Quote.

    Of course the article never mentions Transmeta, but I bet this web pad would be powered by Crusoe. Here's the link [usnews.com] for the article.

  • The current instruction sets of most processors are probably designed around certain price:performance ratios, with the cost of producing them in hardware as a major consideration. Transmeta could come up with their own virtual instruction set that would be optimized for their chips. It would be an easy move for software developers, since their old code could still run on the processor anyway until they recompile to the virtual instruction set. I didn't read the whole Ars article because it's past my bedtime (I'll read it tomorrow at work.) But the author made a comment about framerates "(yet)" -- I didn't see what he was alluding to by the "(yet)", but I got the impression he expects Transmeta to compete beyond the mobile arena.

    Another thought I've had is that things just got harder for a company like Intel. It was no easy task for AMD to get big enough where they could afford to be competitive with Intel. But Crusoe-type processors sound like they would be much easier to design and produce...new companies will have a much lower barrier for entry into the competition. Lucky for Transmeta that they have their patents ;)

    numb
  • Imagine one of these things loaded with two or three different code morphing modules. Your boot loader begins by asking which architecture you'd like to emulate. Want to run your games? Boot up as a x86 with Windows. Doing graphics design (or running one of Ambrosia's [ambrosiasw.com] cool games, which they refuse to port to Wintel)? Boot as a PPC with MacOS. Doing some TT&C on your satellite constellation? Zap, you're an Alpha!

    OK, I'm just an applications geek, and know next to nothing about hardware, so this probably sounds pretty stupid. Live with it.

  • I _know_ what you're saying, I _read_ the Transmeta whitepaper & have a pretty good idea of the concepts behind the Code Morpher, I _know_ how the Transmeta people _want_ the chip to be used, and how a lot of people think it _should_ be used - just as I _know_ that there are going to be some people who will ignore all that & will hack on the VLIW instruction set directly. 99.9% of the people programming for the Transmeta chips won't - but there will be a few that will.

    They won't give a damn about backward compatibility, or what the "next" chip is going to implement - they're not programming for money, they're programming for fun, and they'll program using the VLIW instruction set because they'll think they can do it better than the Code Morpher can (for a particular chip, and a particular set of instructions). When they start playing with a new chip, they'll learn the VLIW instruction set for THAT chip and do it all over again.

    BTW, regarding some of the replies:

    1. "Transmeta's chips transcend backwards compatibility."

    Bull.

    Transmeta have to create versions of the Code Morpher to be "backwards compatible" with all of the various instruction sets that they choose to support from the other chip companies, plus any "improvements" to the instruction set that those chip companies make. They will have to create a Code Morpher version to run on each new chip that they develop. (Can you say, front-end/back-end?)

    If they did a good job architecturally, and make it easy to upgrade the Code Morpher (assumedly in FlashROM or something similar), then given the current processor-types, it shouldn't be too difficult for them to create new front-ends and back-ends.

    As time goes on, like any project, the Code Morpher code base will get more complicated & difficult to maintain. They'll make mistakes encoding the instruction sets, and then have to issue updates to correct it, etc.

    2. "Code executed through the translation layer should perform better than code executing on the bare metal because the translation software is learning and optimizing."

    By definition, a "perfect programmer" will always be able to do AT LEAST AS WELL as an optimizing compiler (even at run-time!), because he or she can USE THE SAME TRICKS as the optimizing compiler (write code which collects metrics & recreates itself based on those metrics). And because the programmer has application knowledge which the compiler doesn't, he or she will most likely be able to DO BETTER.

    Like I said before: for the most part, programmers will use what Transmeta gives them - and for a very small fraction of programmers, in the tiny bits of their code where they want to squeeze out everything they can from the hardware, they're going to try to bang on the metal.

    Based on the strong reaction to my reply, I'd say that at least a few people have been programming for a living so long, they've forgotten how much fun it is to "push the envelope" of any given piece of hardware.
  • I'm sorry, YOU aren't getting it.

    No matter how good the Code Morpher is, a talented programmer programming "to the bare metal" will be able to do better. A geek screaming for performance on their "baby" doesn't give a damn about whether the next processor will change its instruction set - he (or she) is interested in getting the max. performance out of the CURRENT processor - which DOESN'T mean you let somebody else's software get in the way.

    As far as on-the-fly code tuning is concerned, no matter how good the "tuner" is, it can only react to changes & build code AFTER it has accumulated some metrics, whereas a programmer who is intimately familiar with his or her problem-space, can prebuild tuned code for handling most of their expected cases.

    I fully expect dedicated hackers to do what every programming freak does - use the provided tools most of the time, and where they want total control & performance, to write the VLIW directly (no matter WHAT the people who made the chip say).

    Frankly, ignoring all the hype, this is just a RISCier RISC chip - what the original RISC folks were aiming for in the first place, but which has fallen by the wayside as they tried to compete with Intel.
  • There are several reasons why Transmeta doesn't want people coding for the native instruction sets. First of all, coding for a native instruction set will just give us the same problem as we have with x86 now -- too many applications to change the architecture, so crappy architecture ends up hanging around way longer than it should. Second, they stated that the instruction sets for the two chips are incompatible, so obviously there is no single "Transmeta Instruction Set". Third, they like the code morphing because it allows them to make fixes that can be downloaded. If people are coding apps to run natively, this can't be done.

    But......
    I have been thinking about this too and I'm wondering if it would be possible/logical to define some VLIW Instruction Set that could be used on all Transmeta chips, but would be faster and more efficient than translating x86. The CMS would still be translating from the "Transmeta Instruction Set" to the chip's native instruction set, so they could keep all the benefits as before.

    Whadyall think?
  • In the article there's this paragraph: Now, let me just stop and say that a number of folks, in their effort to show that they've "seen it all before" and can't be taken in by the hype, have tried to compare Code Morphing to Alpha's FX!32 or to an emulation program like SoftWindows. Such comparisons are like comparing a MinuteMan missile to a bottle rocket. In this case, you should feel free to believe the hype; Code Morphing is cool.

    I'd have to say that code morphing has been around. One only needs to look at Executor from ARDI. It dynamically recompiles 68k code into x86 code using an instruction generator. I think ARDI has a whitepaper on this on their site.

    Besides that, there's not that much difference between FX!32 and code morphing from the software perspective, except that Crusoe has more hardware support for fixups (via the shadow register file and the gated store buffer), FX!32 runs offline instead of dynamically, and the threshold for code generation is much higher (FX!32 translates based on profile info; Crusoe probably only translates when there are enough blocks to make the translation overhead worthwhile).

    In addition, there *has* been work doing dynamic recompilation. That's essentially what a JIT is. Or you can look at a paper in the 1998 ASPLOS proceedings describing such a system (called Shogun, I think); unfortunately, the target arch didn't have all of Crusoe's aforementioned hardware hooks, so the performance isn't quite as high. Even VMware has done this stuff before - well, VMware started off as SimOS, which did have a dynamic translation mode as well as an interpreted mode. It's just that no one has integrated the translator and added the hardware hooks to make it as efficient.
  • JIT is actually very different. A JIT compiler doesn't have to deal with things like a memory map that cannot change, self-modifying code, stack stuffing, hardware interrupts, etc. But this idea is far from new. The Macintosh emulator for the PC, Executor, used similar techniques. It was based on a dynamically recompiling CPU emulation core that would translate and simplify a series of instructions, cache the resulting instructions, and then execute them. This technique is only efficient when large amounts of code can be executed without interruption and without the requirement of cycle-level timing precision. For 98% of the time, x86 software doesn't care or need to care about very precise CPU timings (there are too many different types of x86 CPUs out there for that to be useful).

    As an instruction set, the x86 is pretty bad; however, it's easy to code for and easy to optimize for, which are its biggest strengths. As a mid-layer API for this device, it was probably a good choice -- x86 recompiles well on RISCy machines with lots of registers. PowerPC and others probably wouldn't: they have a lot of registers themselves and are more complex. (Plus, the wide variety of x86 clones means that most people will likely shy away from dangerous instructions, whereas since the PPC is VERY standardized, many software packages could rely on subtle bugs in the silicon of the PPC. Believe me, bugs are the hardest part of any hardware architecture to emulate.)

    ARDI has some fairly interesting whitepapers [ardi.com] on their implementation of the 68k instruction set on x86. Keep in mind this is MUCH more difficult to do than the reverse. The 68k CPU has 16 registers, whereas x86 only has 8, etc.

    The big problem with C is that the same C source file compiled with the same compiler can often produce many wildly different results, and C doesn't solve the problem of hardware accesses, which almost always need to be done in a low-level language. This CPU/software combination will be beneficial to many companies because they will be able to reuse existing hardware, drivers, and software. As long as the prices for the CPU eventually get really low, it could really lower the prices of PDAs and hand-held computers. (Imagine playing a 3D accelerated game of Quake III on a hand-held machine.)

  • by Anonymous Coward on Thursday January 20, 2000 @02:45PM (#1353562)
    can we run a Beowulf cluster with it? =-)

    Seriously though. The biggest problems with Beowulfs are space and heat, so imagine low-heat, low-space processors wedged in there. Makes me horny.

    From the mind of the most famous poster in all of slashdot
  • by tzanger ( 1575 ) on Thursday January 20, 2000 @05:49PM (#1353563) Homepage
    ...is how much faster this thing will run if it's not emulating an x86.

    That is missing the point, IMHO. One of the reasons the chip kicks ass is because they can change the hardware and you can't tell. Write native VLIW on this pig and you're fucked if they change, just like all the other processors.

    ... this is coming from a guy who prefers assembly to high-level languages in 98% of cases. I think they really struck on something here, don't fuck it up by asking to write in the "native tongue" of this beast. Well, unless you're writing your own processor. :-)
  • by Accipiter ( 8228 ) on Thursday January 20, 2000 @03:22PM (#1353564)
    I just came up with a thought...

    Okay. The Crusoe is fully x86 compatible. Great. But how about developing applications for this processor that skip the translation step, and are already written in the processor's native language? Think about a Distributed.net client written SPECIFICALLY for this processor, with no x86 instructions.....

    I'm betting that would speed up apps tremendously. Even Linux....ported directly to Crusoe's native instruction set. The problem I see is, the processor is designed to run x86 out of the box. Code would have to be written to change the Flash ROMs on the processor to bypass translation and hit the core directly, or at least do a straight-through delivery. (Why translate VLIW to VLIW?)

    (IF YOU DO THIS AND FRY YOUR CRUSOE, I'M NOT LIABLE.)

    -- Give him Head? Be a Beacon?

  • by Shoeboy ( 16224 ) on Thursday January 20, 2000 @03:15PM (#1353565) Homepage
    Who cares about Transmeta Beowulfs. With the low transistor count and low temp, this chip could do the same SMP-on-a-chip thing that IBM is planning for the PPC. The only reason to have Beowulf at all is that it's more economical than SMP systems; it's not a better solution than massive SMP IF massive SMP can be made cheaply. Of course, some organizations will have a need for Beowulf clusters of massively SMP systems...
    ...damn it, now I'm horny.
    --Shoeboy
  • by scheme ( 19778 ) on Thursday January 20, 2000 @03:30PM (#1353566)

    I have some concerns about the performance that the Crusoe processors will actually have. The article mentions that translated instructions will be cached and then reused if the CodeMorph software sees them again. However, it seems like the CodeMorph software's state information will not be maintained between runs. If you power off the computer, the software loses the cached information and has to start from scratch again. In addition, the cache's size or location isn't given. Is it a small cache on die, or is it located in system memory? The cache is probably on die for speed reasons, but this would limit its size. This could be a performance hit since the cache is also used as a data cache and instruction cache.

    Another question concerns the way the instructions are being cached. For example suppose the following instructions were given

    ADD AX, BX
    SUB CX, AX
    JNZ loop_top

    Would the translation for each instruction be cached, or is the sequence cached? The article implies that the sequence is cached since the CodeMorph software can optimize the speed on subsequent passes. However, this seems to limit the benefit gained from caching to relatively tight loops or common sequences of code depending on the cache size.

    On a side note, the article implies that the CodeMorph software is light-years beyond anything else. However, some of its highly touted features appeared in other software before. For example, DEC's FX!32 would initially just translate code, but would also observe the application's behaviour and then optimize the code based on that after the application finished executing. It could do this optimization several times, optimizing more aggressively on each pass. Also, Apple's 680x0 emulator was based in ROM and would start up first so that the MacOS could boot. The CodeMorph software has some new features if it really does out-of-order scheduling and optimization on the fly, but that seems like a pretty big hit on performance.

    If future server/desktop-oriented processors implement large parts of the CodeMorph software in hardware, how will that be any different from AMD's or Intel's processors? They would all be implementing a hardware instruction translation unit, with the main distinction being that the Transmeta core is VLIW. Plus, the transistor count and power consumption would skyrocket along with that.

  • by jfunk ( 33224 ) <jfunk@roadrunner.nf.net> on Thursday January 20, 2000 @06:27PM (#1353567) Homepage
    Most, if not all, semiconductor manufacturers are really cool about this. The companies that were the coolest to me were: Analog Devices, Microchip Technologies, Maxim, National Semiconductor, TI, and Motorola among others.

    All of those companies gave me precious device documentation and many of them gave samples as well. I used all of this in school and later in professional life ("we need a good low-power instrumentation amp." "I got a really cool one from AD which has great documentation, let's try it out and we can use them in volume (millions) later" "ok"). Semiconductor companies know the benefits of such behaviour and tend to act accordingly.

    Embedded technologies are a very lucrative market that a lot of young people are jumping directly into (myself included). To deny the flow of information on your products would be like tying your own noose. I'm pretty sure Transmeta realises this.

    Ask and ye shall receive.
  • by SMN ( 33356 ) on Thursday January 20, 2000 @03:44PM (#1353568)
    Transmeta does NOT want us programming directly in Crusoe VLIW-native code. In fact, the opcodes will NOT be the same on the 3400/5400 chips, and will probably change for all future chips (each model/variation would need its own code morphing software).

    The primary reason is that they don't want to have to make these chips backwards compatible. Intel has a lot of problems with this - even the newest Pentium III's must support programs written for 386s. Intel has a hard time because it can't change these opcodes, but instead has to add new ones - hence MMX, SIMD instructions, the Katmai extensions (the P3 stuff), etc (and similarly, AMD has added 3dnow! et al).

    Transmeta wants the freedom to be able to drastically change newer models of the CPU to keep it running at optimal speed/efficiency. If they wanted to allow us to write Crusoe-native code, then they'd need morphing software that allows newer models to morph old code to their own (modified) native code. In other words, a real pain in the rear and definitely a problem if Crusoe can't run different "morphers" simultaneously (which I suspect it can't).

    As for other morphing software to emulate other processors: I wouldn't be surprised if they allowed it to emulate some other chips - like the PPC, so it can run MacOS stuff - but it won't run nearly as well as x86 emulation will. The chip is meant to be able to morph code from many different platforms, but there are a lot of shortcuts to emphasize x86. I think that topic is addressed in the Ars Technica stuff, but basically Crusoe uses a FPU very similar to the x86 one. I think there are some other things for that in hardware, as well as the fact that we know they're dedicating most of their time to creating the x86 morphing software so it will be the most optimized.

    I highly doubt that we'll be able to write our own morphers. I think that it's an extremely difficult thing to do, it would require knowledge of the Crusoe instruction set (which, as I said above, they don't want to release), and the morphing software is probably authenticated somehow. Since the morphing code is running in Flash ROM, it can be upgraded, but if someone tried to load a morpher that doesn't work they're gonna have trouble reverting back to x86.

    Linus said that "Mobile Linux" is NOT a code fork - it's just the x86 version with a few modifications to make it run better on embedded platforms. Why reinvent the wheel?

    Keep in mind that this is all SPECULATION - if anyone here has other information to the contrary, I'd like to hear it =)
  • by TheDullBlade ( 28998 ) on Thursday January 20, 2000 @03:10PM (#1353569)
    ...is how much faster this thing will run if it's not emulating an x86. It looks pretty hot under the hood, and I wonder what it could do if, instead of using standard guess-aheads, you could tell it which branch to use as default, or even tell it about branches ahead of time (which you often know well before the actual conditional looping operation), so it's not guessing at all.

    There's all kinds of fun I could have with this chip...

    I also wonder whether it can multitask between different instruction sets. I guess the task switching overhead would be pretty brutal if there isn't room onchip for multiple instruction sets.
  • by rcromwell2 ( 73488 ) on Thursday January 20, 2000 @07:22PM (#1353570)
    They have essentially built a Japanese Compact Car that is fuel efficient, and not an Italian sports car.

    Efficiency isn't exactly exciting. Unless I am using a Palm Pilot, I really don't care if my PentiumIII or Alpha is sucking 34W and my Nvidia GeForce is sucking another 30. What I care about is how fast my performance is. How many transactions can I run? How many frames per second am I getting? How many polygons can I push?

    Crusoe may be important for the coming ubiquitous computing revolution (if it ever happens), but they are not the first to go after low power (remember Rise? Remember WinChip IDT? Don't forget Strong ARM)

    I think Crusoe is a nice chip, but the *HYPE* (and I mean hype) caused by deliberate secrecy and press leaks thoroughly destroyed any chance of it being seen as revolutionary in my eyes.


    The Code Morphing technology is not revolutionary. Emulators have been doing dynamic instruction set recompilation for years now. DEC did it with FX!32, Sun does it with Java JITs (including HotSpot, which does recompilation based on runtime profiles), Smalltalk VMs have been doing it, hell, even one of the Commodore 64 emulators does it if I recall. John Carmack's Quake3 engine even does it. I'm sure there are hundreds of projects in academia that have been doing it. The only relevant difference is the hardware assist that the Crusoe has.

    Chances are, when you hype something too much, it's going to be disappointing. There's a thread on Usenet that claims Transmeta's *ORIGINAL* goal was not low power, but the best performance, but when they couldn't attain it, they "fell back" to a low power selling point. I think it's in comp.arch.

  • by HomerJ ( 11142 ) on Thursday January 20, 2000 @03:39PM (#1353571)
    That's the whole point of Crusoe: you DON'T code for it directly. It takes other instructions, starting with x86, and runs them faster and better, optimizing on the fly.

    The "code morphing" layer is what makes Crusoe stand apart from the rest. It optimizes the instruction set it's running on the fly. This means that your apps will run faster and faster as they run. This layer is what gives the Crusoe its speed. Coding natively would be SLOWER than using the morphing layer. You also don't get the benefit of the optimization.

    Also, the instruction sets are different for each chip. Each set is further optimized for what its use is going to be. So if you code for one Crusoe chip natively, it doesn't run on the other. This lets Transmeta change the instruction set as needed. Like if it's faster to do something one way, they can change it and not break compatibility with anything. And they can give you the update with a software patch.

    So, it doesn't matter if people don't have the instruction set for the native Crusoe processors. It will change a lot, and every time it changes you would have to recode every program again. Why bother? Also, you don't get to use what the Crusoe processor is all about: its code morphing layer.

    So, PLEASE, stop complaining that you can't code natively for this chip. The code won't go any faster, and as soon as Transmeta changes the set, your programs wouldn't run anyways. So it's a moot point to code natively for it.
