NVIDIA Jetson TX1 Performance Shines For GPU Computing

NVIDIA Jetson TX1 Performance Shines For GPU Computing (phoronix.com) 22

Posted by samzenpus on Monday November 16, 2015 @03:30PM from the running-the-numbers dept.

An anonymous reader writes: Following last week's announcement of the Jetson TX1 development board, NVIDIA is now allowing independent reports of performance for their $599 USD 64-bit ARM development board. Linux results published by Phoronix show very strong performance for the Jetson TX1 when looking at the Cortex-A57 speed relative to the Tegra K1 and older Tegra SoCs along with other ARM hardware like Calxeda and Raspberry Pi. The Jetson TX1 was generally multiple times faster than ARM hardware a few years old. The graphics performance was twice as fast as the year-old Jetson TK1 thanks to the Maxwell GPU. Compared to x86 hardware, in CPU-bound tasks the performance is comparable to an AMD Sempron/Phenom except when utilizing GPGPU computing where it's then faster than Intel Skylake and Xeon processors. The Jetson TX1 had a peak power consumption of 16 Watts and an average power use of under 10 Watts.

Tegra X1 used to be the fastest ARM/GPU SoC (Score:1, Troll)

by edxwelch ( 600979 ) writes:

Tegra X1 used to be the fastest ARM/GPU SoC, but now the A9X in the new iPad Pro leaves it in the dust.
Meanwhile, the Cortex-A57 is probably one of ARM's worst cores to date. It really needs to be implemented on FinFETs to avoid overheating and throttling, and it performs poorly compared to custom ARM cores.
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  Tegra X1 used to be the fastest ARM/GPU SoC, but now the A9X in the new iPad Pro leaves it in the dust.
  Meanwhile, the Cortex-A57 is probably one of ARM's worst cores to date. It really needs to be implemented on FinFETs to avoid overheating and throttling, and it performs poorly compared to custom ARM cores.
  Yup, the A9X is faster than a chip several years old. Such breathtaking engineering prowess!
  - Re: (Score:1, Troll)
    
    by edxwelch ( 600979 ) writes:
    
    What are you talking about? Tegra X1 came out a few months ago.
Heterogeneous Memory FTW (Score:1, Interesting)

by kaiser423 ( 828989 ) writes:

I'm pretty pumped about playing with the dev kit. It has a heterogeneous memory architecture between the CPU and GPU. For lots of GPGPU applications, the latency of transfer between system RAM and the GPU can be a bottleneck. You're transferring huge chunks of data, and if you need to bounce the problem back and forth between the CPU and GPU (which is pretty common), or if you have any real-time requirements, it can be a big deal. In many applications it can be 40%+ of time spent in just transferring your
- Re: (Score:2)
  
  by Andy Dodd ( 701 ) writes:
  
  It's too bad that despite the chip not being much more expensive (as evidenced by the fact that TX1 consumer products are reasonably priced - the TX1-based Shield ATV is $199 including a controller, the TK1-based Shield Tablet was $299, and $100+ for battery/display makes a lot of sense) or around the same price, the TX1 dev board is 3 times the price of the TK1 dev board. (Jetson TK1 was $192).
  I was really hoping for a successor to the Jetson TK1 that used the X1, but this isn't really a successor - despi
- Re: (Score:2)
  
  by Arkh89 ( 2870391 ) writes:
  
  One of the main problems with this on the desktop is bandwidth. A true heterogeneous memory architecture between CPU and GPU, but with less bandwidth than current on-device memory (around 250GB/s), would hurt a vastly larger pool of applications than it would benefit.
- Re: (Score:1)
  
  by Anonymous Coward writes:
  
  Meanwhile, this has already been done for a few years with AMD APUs...
  - Re: (Score:2)
    
    by Shinobi ( 19308 ) writes:
    
    And even AMD were well over a decade late to the party, compared to the Silicon Graphics O2, which used UMA.
Peculiar omission (Score:5, Interesting)

by serviscope_minor ( 664417 ) writes: on Monday November 16, 2015 @03:55PM (#50942457) Journal

I'm rather surprised there was no AMD APU in the GPGPU comparisons. That is, after all, rather the whole point of the APUs. And due to the super low latency of the AMD ones, they tended to do rather well compared to other chips. The HSA stuff seems to go rather beyond the GPGPU stuff in terms of its range of applications.

- - Re: (Score:2)
    
    by UnknownSoldier ( 67820 ) writes:
    
    Whoa...
    You wouldn't happen to have a link for that by chance please?
    - Re: (Score:2)
      
      by GiganticLyingMouth ( 1691940 ) writes:
      
      Here you are... https://www.phoronix.com/scan.... [phoronix.com]
      - Re: (Score:2)
        
        by UnknownSoldier ( 67820 ) writes:
        
        Sweet. Thanks!
- Architectural Similarities (Score:2)
  
  by gentryx ( 759438 ) writes:
  
  The USP of AMD's APUs used to be having the GPU and the CPU on the same die. This is true for Jetson as well, but it is compatible with the whole CUDA universe, too. So now NVIDIA is eating AMD's lunch.
  - Re:Architectural Similarities (Score:5, Informative)
    
    by serviscope_minor ( 664417 ) writes: on Monday November 16, 2015 @05:21PM (#50943027) Journal
    
    The USP of AMD's APUs used to be having the GPU and the CPU on the same die.
    No, it's much more than that. It's not just on the same die, it's on the same side of the MMU as the CPU and the same side of the cache. This means you can pass data back and forth between the two units with a latency measured in nanoseconds, because you can simply hand over a pointer in the same memory space. I believe HSA also specifies things like atomics which are consistent across the CPU and GPU, as well as synchronisation primitives.
    In other words HSA is much more than just bolting a CPU and a GPU onto the same bus on a die.
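
The "hand over a pointer" point above can be illustrated with a loose analogy. This is only a sketch in plain Python, not HSA or CUDA code: a `bytearray` stands in for memory, `bytes(...)` stands in for a copy across a bus, and `memoryview` stands in for sharing one address space. All variable names are made up for illustration.

```python
# Analogy only: ordinary Python buffers standing in for CPU/GPU memory.
# Nothing here touches a GPU or any HSA/CUDA API.

data = bytearray(8)          # 8 zero bytes owned by the "CPU" side

# Discrete-memory style: the consumer gets its own copy of the bytes.
copied = bytes(data)

# Shared-address-space style: the consumer gets a view of the same bytes,
# which is roughly what "handing over a pointer" means.
shared = memoryview(data)

# The producer updates the buffer after the hand-off.
data[0] = 42

print(copied[0])   # the copy is stale and would need another transfer
print(shared[0])   # the view sees the update with no transfer at all
```

With a copy, every update costs another transfer before the other side sees it; with a shared view, the update is visible immediately, which is the latency argument being made here.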
    
    - - Re: (Score:2)
        
        by serviscope_minor ( 664417 ) writes:
        
        Some HSA features have been available since 2012, but coherent memory has only been available since 2014 with Kaveri.
        Yeah, Kaveri is where it got interesting, and where the LibreOffice benchmark with AMD trouncing everything else happened. Coherent memory turned it from a massive PITA into something much easier to program than normal GPGPU stuff. A lesson I learned from the supercomputer folks: latency is a killer.
        None of the chips with HSA features use less than 12W.
        That's comparable to the one in TFA: aver
Error on Page 7 ? (Score:2)

by UnknownSoldier ( 67820 ) writes:

Looks like all the graphs on Page 7 have an incorrect TK1 label instead of the correct TX1?
http://www.phoronix.com/scan.p... [phoronix.com]
With nVidia getting serious about low-power devices, the next few years are going to be very interesting as AMD, ARM, Broadcom, and Intel all duke it out.
I can't wait till OpenCL is supported as well.
