NVIDIA's 64-bit Tegra K1: The Ghost of Transmeta Rides Again, Out of Order

MojoKid (1002251) writes: Ever since Nvidia unveiled its 64-bit Project Denver CPU at CES last year, there's been discussion over what the core might be and what kind of performance it would offer. Visually, the chip is huge, more than 2x the size of the Cortex-A15 that powers the 32-bit version of Tegra K1. Now we know a bit more about the core, and it's like nothing you'd expect. It is, however, somewhat similar to designs we've seen in the past from the vanished CPU manufacturer Transmeta.

When it designed Project Denver, Nvidia chose to step away from the out-of-order execution engine that typifies virtually all high-end ARM and x86 processors. In an OoOE design, the CPU itself is responsible for deciding which code should be executed at any given cycle. OoOE chips tend to be much faster than their in-order counterparts, but the additional silicon burns power and takes up die area. What Nvidia has developed instead is an in-order architecture that relies on a dynamic optimization program (running on one of the two CPU cores) to calculate and optimize the most efficient way to execute code. This data is then stored in a special 128MB buffer of main memory. The advantage of decoding and storing the most optimized execution path is that the chip doesn't have to decode the same code again; it can simply grab that information from memory.

Furthermore, this kind of approach may pay dividends on tablets, where users tend to use a small subset of applications. Once Denver sees you run Facebook or Candy Crush a few times, it's got the code optimized and waiting. There's no need to keep decoding it for execution over and over.
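
The mechanism described above is essentially a software-managed code cache: a hot region of guest code gets optimized once, the result is parked in that 128MB buffer, and later executions reuse it instead of decoding and scheduling the same instructions again. The C++ sketch below is a minimal, purely illustrative model of that reuse pattern, not NVIDIA's actual optimizer; the names (TranslationCache, OptimizedTrace, the fake optimizer) are invented for the example.

```cpp
// Illustrative model of a software-managed optimization cache, loosely in the
// spirit of the "dynamic code optimization" scheme described above.
// Everything here (types, names, the fake "optimizer") is hypothetical.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

using GuestPC = std::uint64_t;                   // address of a guest (ARM) code region
using OptimizedTrace = std::vector<std::string>; // stand-in for scheduled native ops

class TranslationCache {
public:
    explicit TranslationCache(std::size_t capacity_bytes)
        : capacity_bytes_(capacity_bytes) {}

    // Return the optimized trace for a region, translating it on first use.
    const OptimizedTrace& get(GuestPC pc,
                              const std::function<OptimizedTrace(GuestPC)>& optimize) {
        auto it = cache_.find(pc);
        if (it != cache_.end()) {
            ++hits_;
            return it->second;                 // already optimized: reuse, no re-decode
        }
        ++misses_;
        OptimizedTrace trace = optimize(pc);   // expensive: run the optimizer once
        // (A real implementation would evict entries once capacity_bytes_ is
        //  exceeded; omitted to keep the sketch short.)
        return cache_.emplace(pc, std::move(trace)).first->second;
    }

    void report() const {
        std::cout << "hits=" << hits_ << " misses=" << misses_ << '\n';
    }

private:
    std::size_t capacity_bytes_;
    std::unordered_map<GuestPC, OptimizedTrace> cache_;
    std::size_t hits_ = 0, misses_ = 0;
};

int main() {
    // The article describes a 128MB buffer in main memory reserved for this.
    TranslationCache cache(128u * 1024u * 1024u);

    auto fake_optimizer = [](GuestPC pc) {
        // Pretend to reschedule/bundle the region's instructions.
        return OptimizedTrace{"bundle@" + std::to_string(pc)};
    };

    // Running the same "apps" repeatedly only pays the optimization cost once.
    for (int run = 0; run < 3; ++run) {
        cache.get(0x1000, fake_optimizer);     // e.g. a hot loop in some app
        cache.get(0x2000, fake_optimizer);
    }
    cache.report();                            // prints: hits=4 misses=2
}
```

The hit/miss counters show where the win comes from: the expensive optimization step runs once per region, and every later run of the same app is a cache hit.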

  • by taniwha ( 70410 ) on Tuesday August 12, 2014 @07:21AM (#47653699) Homepage Journal

    It's certainly different, but not revolutionary. I worked on a core that did this 15 years ago (not Transmeta); it's a hard problem, we didn't make it to market, and Transmeta floundered. What I think they're doing here is instruction rescheduling in software, something that's usually done by lengthening the pipe in an OoO machine. It means they can do tighter/faster branches, and they can pack instructions in memory, aligned appropriately, to feed the various functional units more easily. My guess from reading the article is that it probably has an LIW mode where they turn off the interlocks when running scheduled code.

    Of course, all of this could be done by a good compiler scheduler (actually, it could be done better by a compiler that knows how many of each functional unit type are present during the code generation phase); the resulting code would likely suck on other CPUs, but it would still be portable. (A toy scheduling sketch along these lines follows this comment.)

    Then again, if they're aiming at the Android market, maybe what's going on is that they've hacked their own JVM and it's doing JIT on the metal.
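
As a rough, invented illustration of the point about a compiler that knows the machine's functional units (a sketch only, not anything NVIDIA or Transmeta shipped): given a machine description with so many ALUs, load/store units, and FPUs per cycle, a code generator can pre-pack instructions into issue bundles, LIW/VLIW style, instead of leaving that decision to out-of-order hardware. The machine model, instruction list, and greedy bundler below are all made up, and data dependencies and latencies are deliberately ignored to keep it short.

```cpp
// Toy bundler: pack instructions into per-cycle issue groups given a known
// count of each functional unit type. Purely illustrative.
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

enum class Unit { ALU, LSU, FPU };

struct Insn {
    std::string text;
    Unit unit;
};

// Greedy bundler: fill the current issue group until some unit type runs out.
// (Real schedulers also honor data dependencies and latencies; omitted here.)
std::vector<std::vector<Insn>> bundle(const std::vector<Insn>& code,
                                      std::map<Unit, int> units_per_cycle) {
    std::vector<std::vector<Insn>> bundles;
    std::vector<Insn> current;
    std::map<Unit, int> used;

    for (const Insn& i : code) {
        if (used[i.unit] >= units_per_cycle[i.unit]) {   // no free unit this cycle
            bundles.push_back(current);
            current.clear();
            used.clear();
        }
        current.push_back(i);
        ++used[i.unit];
    }
    if (!current.empty()) bundles.push_back(current);
    return bundles;
}

int main() {
    // Hypothetical machine: 2 ALUs, 1 load/store unit, 1 FPU per cycle.
    std::map<Unit, int> machine{{Unit::ALU, 2}, {Unit::LSU, 1}, {Unit::FPU, 1}};

    std::vector<Insn> code{
        {"add r1, r2, r3", Unit::ALU}, {"ldr r4, [r5]", Unit::LSU},
        {"add r6, r7, r8", Unit::ALU}, {"add r9, r1, r4", Unit::ALU},
        {"fmul d0, d1, d2", Unit::FPU}, {"str r9, [r5]", Unit::LSU},
    };

    auto bundles = bundle(code, machine);
    for (std::size_t c = 0; c < bundles.size(); ++c) {
        std::cout << "cycle " << c << ":";
        for (const Insn& i : bundles[c]) std::cout << "  [" << i.text << "]";
        std::cout << '\n';
    }
}
```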

  • Re:Is it better? (Score:4, Interesting)

    by Predius ( 560344 ) <josh DOT coombs AT gmail DOT com> on Tuesday August 12, 2014 @08:33AM (#47653979)

    This is an area where post-compile optimization can shine. By watching actual execution with live data, the post-compile optimizer can build branch-choice statistics to tune against, based on actual operation rather than static analysis at compile time. HP's Dynamo project, IIRC, was built around this idea: it would recompile binaries for the same architecture it ran on after observing them run a few times. I believe the claims were an average 10% performance improvement over binaries that were only compiler-optimized. (A small profile-collection sketch follows this comment.)
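
A minimal sketch of the "watch it run, then re-optimize" idea, in the spirit of Dynamo-style or profile-guided reoptimization rather than HP's actual implementation: instrumented (or interpreted) code records how each branch behaves across a few runs, and strongly biased branches get flagged so a recompiler can lay out the hot side as the straight-line path. The branch IDs, the 90% bias threshold, and the simulated workload are arbitrary choices for the example.

```cpp
// Collect branch outcome statistics at runtime, then report which branches
// are biased enough to justify re-laying out the code. Illustrative only.
#include <cstdint>
#include <iostream>
#include <map>

struct BranchStats {
    std::uint64_t taken = 0;
    std::uint64_t not_taken = 0;
};

std::map<int, BranchStats> profile;   // branch id -> observed outcomes

// Called from instrumented (or interpreted) code each time a branch executes.
void record_branch(int branch_id, bool taken) {
    auto& s = profile[branch_id];
    taken ? ++s.taken : ++s.not_taken;
}

// After a few observation runs, pick branches biased enough that the hot side
// should become the fall-through path in the re-optimized layout.
void plan_reoptimization(double bias_threshold = 0.90) {
    for (const auto& [id, s] : profile) {
        double total = double(s.taken + s.not_taken);
        if (total == 0) continue;
        double taken_rate = s.taken / total;
        if (taken_rate >= bias_threshold || taken_rate <= 1.0 - bias_threshold) {
            std::cout << "branch " << id << ": " << 100.0 * taken_rate
                      << "% taken -> make hot side the straight-line path\n";
        }
    }
}

int main() {
    // Simulate observing a loop back-edge (almost always taken) and an error
    // check (almost never taken) across a few runs of the program.
    for (int i = 0; i < 1000; ++i) record_branch(1, i % 100 != 99);  // loop back-edge
    for (int i = 0; i < 1000; ++i) record_branch(2, false);          // error path
    record_branch(3, true); record_branch(3, false);                 // unbiased

    plan_reoptimization();   // flags branches 1 and 2, skips branch 3
}
```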

  • by Rockoon ( 1252108 ) on Tuesday August 12, 2014 @10:56AM (#47654933)

    "So it seems like a good idea to let the compiler do the work once and save on hardware. Except for one major monkey wrench: memory load instructions."

    That's not the only monkey wrench. Compilers simply aren't good enough in general, and there is little evidence that they could be made good enough on a consistent basis, because architectures keep evolving and very few compilers actually model specific architecture pipelines... (A small demonstration of the load-latency point follows this comment.)

    This is why Intel now designs its architectures to execute what compilers produce well, rather than the other way around. Intel would not have five asymmetric execution units with lots of functionality overlap in its latest CPUs if compilers didn't frequently produce code that requires it...

    Which leads to compiler writers spending the majority of their effort on big-picture optimizations, because Intel and the other CPU vendors are dealing with the low-level scheduling issues for them... the circle is complete... it's self-sustaining.
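
On the "memory load instructions" monkey wrench quoted above, here is a small, runnable demonstration (array size and access patterns are arbitrary): how long a load takes depends on runtime cache behavior that a static scheduler cannot see, whereas out-of-order hardware adapts to whichever loads happen to miss. The streaming sum and the dependent pointer-chase below execute the same number of loads but behave very differently.

```cpp
// Same number of loads, very different latencies: streaming accesses are
// prefetch-friendly, while a dependent pointer-chase mostly misses in cache.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t n = 1 << 22;   // ~4M elements: larger than typical caches

    // Case 1: indices in order -- streaming loads the prefetchers can hide.
    std::vector<std::uint32_t> seq(n);
    std::iota(seq.begin(), seq.end(), 0u);

    // Case 2: a random permutation walked as a pointer chase -- each load's
    // address depends on the previous load's result, so latency is exposed.
    std::vector<std::uint32_t> perm(n);
    std::iota(perm.begin(), perm.end(), 0u);
    std::shuffle(perm.begin(), perm.end(), std::mt19937{42});

    auto time_sum = [n](const std::vector<std::uint32_t>& v, bool chase,
                        const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        std::uint64_t acc = 0;
        std::uint32_t idx = 0;
        for (std::size_t i = 0; i < n; ++i) {
            idx = chase ? v[idx] : v[i];   // dependent load vs. streaming load
            acc += idx;
        }
        auto t1 = std::chrono::steady_clock::now();
        std::cout << label
                  << std::chrono::duration<double, std::milli>(t1 - t0).count()
                  << " ms (checksum " << acc << ")\n";
    };

    // Nothing a compile-time scheduler could have predicted distinguishes
    // these two loops; the difference only exists at run time.
    time_sum(seq, false, "sequential loads:        ");
    time_sum(perm, true, "dependent pointer-chase: ");
}
```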

"Little else matters than to write good code." -- Karl Lehenbauer
