NVIDIA's 64-bit Tegra K1: The Ghost of Transmeta Rides Again, Out of Order

MojoKid (1002251) writes: Ever since Nvidia unveiled its 64-bit Project Denver CPU at CES last year, there's been discussion over what the core might be and what kind of performance it would offer. Visually, the chip is huge, more than 2x the size of the Cortex-A15 that powers the 32-bit version of Tegra K1. Now we know a bit more about the core, and it's like nothing you'd expect. It is, however, somewhat similar to designs we've seen in the past from the vanished CPU manufacturer Transmeta.

When it designed Project Denver, Nvidia chose to step away from the out-of-order execution engine that typifies virtually all high-end ARM and x86 processors. In an OoOE design, the CPU itself is responsible for deciding which code should be executed at any given cycle. OoOE chips tend to be much faster than their in-order counterparts, but the additional silicon burns power and takes up die area. What Nvidia has developed instead is an in-order architecture that relies on a dynamic optimization program (running on one of the two CPU cores) to calculate and optimize the most efficient way to execute code. This data is then stored in a special 128MB buffer of main memory. The advantage of decoding and storing the most optimized execution path is that the chip doesn't have to decode the same code again; it can simply grab that information from memory.

Furthermore, this kind of approach may pay dividends on tablets, where users tend to use a small subset of applications. Once Denver sees you run Facebook or Candy Crush a few times, it's got the code optimized and waiting. There's no need to keep decoding it for execution over and over.
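
The mechanism described above is essentially a software-managed code cache: a hot region of guest code gets optimized once, the result is parked in that 128MB buffer, and later executions reuse it instead of decoding and scheduling the same instructions again. The C++ sketch below is a minimal, purely illustrative model of that reuse pattern, not NVIDIA's actual optimizer; the names (TranslationCache, OptimizedTrace, the fake optimizer) are invented for the example.

```cpp
// Illustrative model of a software-managed optimization cache, loosely in the
// spirit of the "dynamic code optimization" scheme described above.
// Everything here (types, names, the fake "optimizer") is hypothetical.
#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

using GuestPC = std::uint64_t;                   // address of a guest (ARM) code region
using OptimizedTrace = std::vector<std::string>; // stand-in for scheduled native ops

class TranslationCache {
public:
    explicit TranslationCache(std::size_t capacity_bytes)
        : capacity_bytes_(capacity_bytes) {}

    // Return the optimized trace for a region, translating it on first use.
    const OptimizedTrace& get(GuestPC pc,
                              const std::function<OptimizedTrace(GuestPC)>& optimize) {
        auto it = cache_.find(pc);
        if (it != cache_.end()) {
            ++hits_;
            return it->second;                 // already optimized: reuse, no re-decode
        }
        ++misses_;
        OptimizedTrace trace = optimize(pc);   // expensive: run the optimizer once
        // (A real implementation would evict entries once capacity_bytes_ is
        //  exceeded; omitted to keep the sketch short.)
        return cache_.emplace(pc, std::move(trace)).first->second;
    }

    void report() const {
        std::cout << "hits=" << hits_ << " misses=" << misses_ << '\n';
    }

private:
    std::size_t capacity_bytes_;
    std::unordered_map<GuestPC, OptimizedTrace> cache_;
    std::size_t hits_ = 0, misses_ = 0;
};

int main() {
    // The article describes a 128MB buffer in main memory reserved for this.
    TranslationCache cache(128u * 1024u * 1024u);

    auto fake_optimizer = [](GuestPC pc) {
        // Pretend to reschedule/bundle the region's instructions.
        return OptimizedTrace{"bundle@" + std::to_string(pc)};
    };

    // Running the same "apps" repeatedly only pays the optimization cost once.
    for (int run = 0; run < 3; ++run) {
        cache.get(0x1000, fake_optimizer);     // e.g. a hot loop in some app
        cache.get(0x2000, fake_optimizer);
    }
    cache.report();                            // prints: hits=4 misses=2
}
```

The hit/miss counters show where the win comes from: the expensive optimization step runs once per region, and every later run of the same app is a cache hit.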

  • by taniwha ( 70410 ) on Tuesday August 12, 2014 @07:21AM (#47653699) Homepage Journal

    It's certainly different, but not revolutionary. I worked on a core that did this 15 years ago (not Transmeta); it's a hard problem, we didn't make it to market, and Transmeta floundered. What I think they're doing here is instruction rescheduling in software, something that's usually done by lengthening the pipe in an OoO machine. It means they can do tighter/faster branches, and they can pack instructions in memory, aligned appropriately, to feed the various functional units more easily. My guess from reading the article is that it probably has an LIW mode where they turn off the interlocks when running scheduled code.

    Of course, all of this could be done by a good compiler scheduler (actually, it could be done better by a compiler that knows how many of each functional unit type are present during the code generation phase); the resulting code would likely suck on other CPUs, but it would still be portable. (A toy scheduling sketch along these lines follows this comment.)

    Then again, if they're aiming at the Android market, maybe what's going on is that they've hacked their own JVM and it's doing JIT on the metal.
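
As a rough, invented illustration of the point about a compiler that knows the machine's functional units (a sketch only, not anything NVIDIA or Transmeta shipped): given a machine description with so many ALUs, load/store units, and FPUs per cycle, a code generator can pre-pack instructions into issue bundles, LIW/VLIW style, instead of leaving that decision to out-of-order hardware. The machine model, instruction list, and greedy bundler below are all made up, and data dependencies and latencies are deliberately ignored to keep it short.

```cpp
// Toy bundler: pack instructions into per-cycle issue groups given a known
// count of each functional unit type. Purely illustrative.
#include <cstddef>
#include <iostream>
#include <map>
#include <string>
#include <vector>

enum class Unit { ALU, LSU, FPU };

struct Insn {
    std::string text;
    Unit unit;
};

// Greedy bundler: fill the current issue group until some unit type runs out.
// (Real schedulers also honor data dependencies and latencies; omitted here.)
std::vector<std::vector<Insn>> bundle(const std::vector<Insn>& code,
                                      std::map<Unit, int> units_per_cycle) {
    std::vector<std::vector<Insn>> bundles;
    std::vector<Insn> current;
    std::map<Unit, int> used;

    for (const Insn& i : code) {
        if (used[i.unit] >= units_per_cycle[i.unit]) {   // no free unit this cycle
            bundles.push_back(current);
            current.clear();
            used.clear();
        }
        current.push_back(i);
        ++used[i.unit];
    }
    if (!current.empty()) bundles.push_back(current);
    return bundles;
}

int main() {
    // Hypothetical machine: 2 ALUs, 1 load/store unit, 1 FPU per cycle.
    std::map<Unit, int> machine{{Unit::ALU, 2}, {Unit::LSU, 1}, {Unit::FPU, 1}};

    std::vector<Insn> code{
        {"add r1, r2, r3", Unit::ALU}, {"ldr r4, [r5]", Unit::LSU},
        {"add r6, r7, r8", Unit::ALU}, {"add r9, r1, r4", Unit::ALU},
        {"fmul d0, d1, d2", Unit::FPU}, {"str r9, [r5]", Unit::LSU},
    };

    auto bundles = bundle(code, machine);
    for (std::size_t c = 0; c < bundles.size(); ++c) {
        std::cout << "cycle " << c << ":";
        for (const Insn& i : bundles[c]) std::cout << "  [" << i.text << "]";
        std::cout << '\n';
    }
}
```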

  • Re:Is it better? (Score:4, Interesting)

    by Predius ( 560344 ) <josh DOT coombs AT gmail DOT com> on Tuesday August 12, 2014 @08:33AM (#47653979)

    This is an area where post-compile optimization can shine. By watching actual execution with live data, the post-compile optimizer can build branch-choice statistics to tune against, based on actual operation rather than static analysis at compile time. HP's Dynamo project, IIRC, was built around this idea: it would recompile binaries for the same architecture it ran on after observing them run a few times. I believe the claims were an average 10% performance improvement over binaries that were only compiler-optimized. (A small profile-collection sketch follows this comment.)
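
A minimal sketch of the "watch it run, then re-optimize" idea, in the spirit of Dynamo-style or profile-guided reoptimization rather than HP's actual implementation: instrumented (or interpreted) code records how each branch behaves across a few runs, and strongly biased branches get flagged so a recompiler can lay out the hot side as the straight-line path. The branch IDs, the 90% bias threshold, and the simulated workload are arbitrary choices for the example.

```cpp
// Collect branch outcome statistics at runtime, then report which branches
// are biased enough to justify re-laying out the code. Illustrative only.
#include <cstdint>
#include <iostream>
#include <map>

struct BranchStats {
    std::uint64_t taken = 0;
    std::uint64_t not_taken = 0;
};

std::map<int, BranchStats> profile;   // branch id -> observed outcomes

// Called from instrumented (or interpreted) code each time a branch executes.
void record_branch(int branch_id, bool taken) {
    auto& s = profile[branch_id];
    taken ? ++s.taken : ++s.not_taken;
}

// After a few observation runs, pick branches biased enough that the hot side
// should become the fall-through path in the re-optimized layout.
void plan_reoptimization(double bias_threshold = 0.90) {
    for (const auto& [id, s] : profile) {
        double total = double(s.taken + s.not_taken);
        if (total == 0) continue;
        double taken_rate = s.taken / total;
        if (taken_rate >= bias_threshold || taken_rate <= 1.0 - bias_threshold) {
            std::cout << "branch " << id << ": " << 100.0 * taken_rate
                      << "% taken -> make hot side the straight-line path\n";
        }
    }
}

int main() {
    // Simulate observing a loop back-edge (almost always taken) and an error
    // check (almost never taken) across a few runs of the program.
    for (int i = 0; i < 1000; ++i) record_branch(1, i % 100 != 99);  // loop back-edge
    for (int i = 0; i < 1000; ++i) record_branch(2, false);          // error path
    record_branch(3, true); record_branch(3, false);                 // unbiased

    plan_reoptimization();   // flags branches 1 and 2, skips branch 3
}
```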

  • by Rockoon ( 1252108 ) on Tuesday August 12, 2014 @10:56AM (#47654933)

    "So it seems like a good idea to let the compiler do the work once and save on hardware. Except for one major monkey wrench: memory load instructions."

    That's not the only monkey wrench. Compilers simply aren't good enough in general, and there is little evidence that they could be made good enough on a consistent basis, because architectures keep evolving and very few compilers actually model specific architecture pipelines... (A small demonstration of the load-latency point follows this comment.)

    This is why Intel now designs its architectures to execute what compilers produce well, rather than the other way around. Intel would not have five asymmetric execution units with lots of functionality overlap in its latest CPUs if compilers didn't frequently produce code that requires it...

    Which leads to compiler writers spending the majority of their effort on big-picture optimizations, because Intel and the other CPU vendors are dealing with the low-level scheduling issues for them... the circle is complete... it's self-sustaining.
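
On the "memory load instructions" monkey wrench quoted above, here is a small, runnable demonstration (array size and access patterns are arbitrary): how long a load takes depends on runtime cache behavior that a static scheduler cannot see, whereas out-of-order hardware adapts to whichever loads happen to miss. The streaming sum and the dependent pointer-chase below execute the same number of loads but behave very differently.

```cpp
// Same number of loads, very different latencies: streaming accesses are
// prefetch-friendly, while a dependent pointer-chase mostly misses in cache.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t n = 1 << 22;   // ~4M elements: larger than typical caches

    // Case 1: indices in order -- streaming loads the prefetchers can hide.
    std::vector<std::uint32_t> seq(n);
    std::iota(seq.begin(), seq.end(), 0u);

    // Case 2: a random permutation walked as a pointer chase -- each load's
    // address depends on the previous load's result, so latency is exposed.
    std::vector<std::uint32_t> perm(n);
    std::iota(perm.begin(), perm.end(), 0u);
    std::shuffle(perm.begin(), perm.end(), std::mt19937{42});

    auto time_sum = [n](const std::vector<std::uint32_t>& v, bool chase,
                        const char* label) {
        auto t0 = std::chrono::steady_clock::now();
        std::uint64_t acc = 0;
        std::uint32_t idx = 0;
        for (std::size_t i = 0; i < n; ++i) {
            idx = chase ? v[idx] : v[i];   // dependent load vs. streaming load
            acc += idx;
        }
        auto t1 = std::chrono::steady_clock::now();
        std::cout << label
                  << std::chrono::duration<double, std::milli>(t1 - t0).count()
                  << " ms (checksum " << acc << ")\n";
    };

    // Nothing a compile-time scheduler could have predicted distinguishes
    // these two loops; the difference only exists at run time.
    time_sum(seq, false, "sequential loads:        ");
    time_sum(perm, true, "dependent pointer-chase: ");
}
```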

"Little else matters than to write good code." -- Karl Lehenbauer
