NVIDIA Announces Tesla K40 GPU Accelerator and IBM Partnership In Supercomputing
MojoKid writes "The supercomputing conference SC13 kicks off this week, and Nvidia is opening its own event with the launch of a new GPU and a strategic partnership with IBM. Just as the GTX 780 Ti was the full consumer implementation of the GK110 GPU, the new K40 Tesla card is the supercomputing/HPC variant of the same core architecture. The K40 picks up additional clock headroom and implements the same variable clock-speed boosting (raising clocks when thermal and power headroom allow) that has characterized Nvidia's consumer cards for the past year, for a significant overall gain in performance. The other major shift between Nvidia's previous-gen K20X and the new K40 is the amount of on-board RAM: the K40 packs a full 12GB and clocks it modestly higher to boot. That's important because datasets are typically limited to on-board GPU memory (at least, if you want to work with any kind of speed). Finally, IBM and Nvidia announced a partnership to combine Tesla GPUs and Power CPUs for OpenPOWER solutions. The goal is to push the new Tesla cards as workload accelerators for specific datacenter tasks. According to Nvidia's release, Tesla GPUs will ship alongside Power8 CPUs, which are currently scheduled for a mid-2014 release. IBM's venerable architecture is expected to target a 4GHz clock speed and offer up to 12 cores with 96MB of shared L3 cache; a 12-core implementation would be capable of handling up to 96 simultaneous threads. The two should make for a potent combination."
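As a concrete illustration of the submitter's point that datasets are effectively limited by on-board GPU memory, here is a minimal CUDA sketch (the 10GB working-set size is just an assumed example, and the device index is hard-coded to 0) that checks whether a buffer would fit on the card before trying to allocate it:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Query the card's on-board memory before trying to stage a dataset on it.
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("Device: %s, on-board memory: %.1f GB\n",
           prop.name, prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));

    // Hypothetical working set: ~10GB fits on a 12GB K40 but not on a 6GB K20X.
    size_t datasetBytes = 10ull * 1024 * 1024 * 1024;
    if (datasetBytes > prop.totalGlobalMem) {
        printf("Dataset won't fit on the GPU; it would have to be streamed in chunks.\n");
        return 1;
    }

    char *d_data = NULL;
    cudaError_t err = cudaMalloc((void **)&d_data, datasetBytes);
    printf("cudaMalloc: %s\n", cudaGetErrorString(err));
    cudaFree(d_data);
    return 0;
}
```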
What about OpenCL 1.2 support? (Score:1)
Nvidia has sidetracked OpenCL for CUDA?
Re: (Score:1)
All the major players are putting aside OpenCL. AMD is betting on Mantle for example.
Re: (Score:2)
But Mantle is an alternative to OpenGL.
Re:What about OpenCL 1.2 support? (Score:5, Informative)
CUDA vs. OpenCL seems to be an example of the ongoing battle between an entrenched, well-supported, but costly proprietary implementation and a somewhat patchy solution that isn't as mature but has basically everybody except Nvidia rooting for it.
"Mantle", like 'Glide' before it, seems to be the eternal story of the cyclical swing between high-performance, low-complexity (but low-compatibility) minimally abstracted approaches and highly complex, highly abstracted, but highly portable and compatible ones. At present, since AMD is doing the GPU silicon for both consoles and a nontrivial percentage of PCs, it makes a fair amount of sense for them to offer a 'Hey, close to the metal!' solution that takes some of the heat off their drivers, makes performance on their hardware better, and so forth. If, five years from now, people are swearing at 'Mantle wrappers' and trying to find the one magic incantation that actually causes them to emit non-broken OpenGL, though, history will say 'I told you so'.
Re: (Score:2, Interesting)
"CUDA vs. OpenCL seems to be an example of the ongoing battle between an entrenched and supported; but costly, proprietary implementation, vs. a somewhat patchy solution that isn't as mature; but has basically everybody except Nvidia rooting for it."
Wishful thinking. Intel doesn't give a crap about OpenCL, they don't even expose their GPU's for OpenCL under Linux, and as I mentioned AMD are betting on Mantle. As for "costly", there's nothing about CUDA that is costly that isn't costly with OpenCL
Re:What about OpenCL 1.2 support? (Score:4, Interesting)
Mantle is less of an open specification than CUDA is. CUDA does have a full x86 implementation available, which is mostly slower because a CPU can't exploit the massive parallelism a GPU offers (not sure how this plays out with Xeon Phi).
Mantle, on the other hand, is a very low-level graphics API that basically exposes software to some low-level interactions with AMD's GPUs. It's more like Glide than OpenCL. From what I've seen so far, it's not clear to me that Mantle will be very portable even across AMD's own GPU generations. It works for the GCN-based cards out now, but who knows if it will be fast on GCN's successors without a major rewrite of the application. NVIDIA could implement Mantle, but they would probably have to translate so much stuff in software to make it work that you'd lose the low software overhead.
From the one or two talks I listened to, Mantle seems to basically expose the same interface the driver developers have access to and lets you go to town. That's great for the latest architecture, but then it's up to your application to evolve as the hardware does. There's a whole lot of work done in the driver to optimize for each architecture release, which is what lets older games the publisher doesn't really want to support anymore keep working and even see performance boosts.
Re:What about OpenCL 1.2 support? (Score:4, Insightful)
Nvidia has sidetracked OpenCL for CUDA?
Nvidia has never much liked OpenCL. And why would they? They currently hold the high ground in GPU computing, with a proprietary API that only they can implement. I'd assume they have some sort of 'OpenCL contingency plan' in case the market shifts, or in case they ever want to sell a GPU to Apple again; but as of right now, supporting OpenCL would just be a "Sure, please, commodify me, I'd love that!" move.
Re:What about OpenCL 1.2 support? (Score:5, Informative)
Re: (Score:2)
Re: (Score:3)
"They currently hold the high ground in GPU computing"
And yet they still can't even get a decent fucking hashrate with CUDA, while OpenCL and AMD stomp the fuck out of them there.
AMD has essentially 'made' everything from Bitcoin to every game console this gen. What the hell is nVidia doing if they're so superior?
Re: (Score:2)
That's because Bitcoin mining is not something critical, and it happens to fall into the limited memory structures and computational capabilities that AMD provides. In real-world, relevant computational tasks, nVidia and CUDA dominate in ease of use, flexibility, and computational throughput. Hence why HPC uses Nvidia and not AMD.
Hashrate is just a gimmick anyway, since if you're serious about it, you go with an FPGA kit.
Re: (Score:2)
"That's because Bitcoin mining is not something critical,"
I guess you don't watch C-SPAN or pay attention to Bitcoin; otherwise you'd understand it's the most valuable currency on the planet right now. When a digital string of essentially randomly generated fucking numbers is worth more than PLATINUM, you'd better pay attention.
AMD makes you money. nVidia makes you broke and doesn't deliver much of use, it seems.
Re: (Score:2)
Not on my nVidia 320m, I'm not!
I should have bought USB ASIC miners when they were still available for cheap after the 75 USD price crash.
So, let me get this straight here... (Score:2, Insightful)
I'm shocked.
More to it than that... (Score:5, Insightful)
IBM has announced a willingness to license the Power8 design in much the same way that ARM licenses its designs to a plethora of companies. IBM has seen what ARM has accomplished at the low end, gaining relevance in a market that might otherwise have gone to Intel given sufficient time, and sees motivation to do the same in the datacenter, where Intel has significantly diminished the POWER footprint over the years. Intel operates at obscene margins thanks to the strength of its ecosystem and technology, and IBM recognizes that it needs to build a more diverse ecosystem of its own if it wants to compete with Intel. On top of that, the runway for such an opportunity may be very short. ARM as-is is not a very useful server platform, but that gap may close quickly, particularly as 64-bit ARM designs become more prevalent, and it could do so before IBM can move.
For nVidia, things are a bit more involved than 'sure, we'll take more money'. nVidia spends a lot of resources on driver development, and without their cooperation a GPU accelerator solution built on their hardware will go nowhere; here, nVidia has agreed to invest the resources to actually support Power. nVidia is also feeling pressure from Intel: Xeon Phi has promised easier development for accelerated workloads as a competitor to nVidia's solutions. So far, Phi hasn't been everything people had hoped for, but the promise of easier development today and improvements later has nVidia rightly concerned about future opportunities in that space. Partnering with a company without such ambitions gives them a way to apply pressure against a platform that clearly has its sights set on closing the opportunity for GPU acceleration in HPC workloads. Besides, IBM has resources in software development tooling that nVidia may lack.
Even more to it than *that*... (Score:3)
According to the Reg [theregister.co.uk] (page 2), Power8 is going to have some sort of memory coherence function for accelerators. Allowing the GPU to be just another first-class processor with regard to memory could be a big win performance-wise, not to mention making it easier to program.
The latest version of CUDA (version 6) has also just added features in the same area (unified memory management). Anandtech [anandtech.com] has some more info about that.
This thing will be beast!
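For a sense of what the CUDA 6 unified memory feature buys you, here's a minimal sketch (the array size and scale factor are arbitrary): a single cudaMallocManaged allocation is visible to both the CPU and the GPU, with the runtime migrating pages instead of the programmer issuing explicit cudaMemcpy calls.

```
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel: scale an array in place on the GPU.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float *data = NULL;

    // One allocation, one pointer, valid on both host and device.
    cudaMallocManaged((void **)&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;     // written by the CPU
    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f); // updated by the GPU
    cudaDeviceSynchronize();                        // wait before the CPU reads again

    printf("data[0] = %f\n", data[0]);              // expect 2.0
    cudaFree(data);
    return 0;
}
```

The hardware coherence described for Power8 would presumably let this kind of sharing happen with less software-managed migration, but that's speculation until the parts ship.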
Anyone remember the Cray? (Score:4, Interesting)
Ah, the good old days.... when CPUs were measured in megahertz, and instructions took multiple clocks. :D
Really, what was the Cray when it first came out? One vector processing unit. How many does this new NVidia board have? How much faster are they than the original Cray?
Re: (Score:3)
People spent fewer CPU cycles getting to the moon than are wasted every day on cat videos and Facebook.
Where's my flying car?
Re: (Score:3)
Re: (Score:2)
Re: (Score:2)
Really, what was the Cray when it first came out? One vector processing unit. How many does this new NVidia board have? How much faster are they than the original Cray?
2,880 "cores", each able to do one single-precision FMA per clock (double-precision takes three clocks for this card, but 24 clocks for most gaming GPUs). These are organized into fifteen "SMX Units", which have 192 ALUs apiece (with four schedulers and eight dispatch units). The exact clock rate is variable, as it will boost the clock speed above "normal" levels, thermal and power conditions permitting, but 1GHz is a good enough approximation. This comes out to about 1.92TFLOPS, 128GFLOPS per SMX, or (inte
Re: (Score:2)
Now imagine a Cray-sized cabinet stuffed with those cards.
Bwahahahahaha! Power!!!!
Speaking of the 21st Century (Score:2)
DRAM bandwidth (Score:2)
Re: (Score:3, Informative)
Right now memory bus width is a die-size tradeoff. NVIDIA can get GK110's memory controller up to 7Gbps (GTX 780 Ti), which on a 384-bit bus makes for 336GB/sec, but relatively speaking it's a big, honking memory controller. AMD's 512-bit memory controller in Hawaii isn't designed to clock nearly as high, topping out at 5Gbps, or 320GB/sec, but it's designed to be particularly small, smaller even than AMD's old 384-bit memory controller.
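For reference, the peak bandwidth numbers in the parent fall straight out of bus width times per-pin data rate; a quick sketch using the parent's figures:

```
#include <cstdio>

// Peak DRAM bandwidth = (bus width in bytes) x (per-pin data rate in Gbps).
static double peakBandwidthGBs(int busWidthBits, double dataRateGbps) {
    return (busWidthBits / 8.0) * dataRateGbps;
}

int main() {
    printf("GK110 / GTX 780 Ti (384-bit @ 7Gbps): %.0f GB/s\n", peakBandwidthGBs(384, 7.0));
    printf("Hawaii             (512-bit @ 5Gbps): %.0f GB/s\n", peakBandwidthGBs(512, 5.0));
    return 0;
}
```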
Re: (Score:3)
They are pursuing both, and in fact the Power8 will support both GPU and FPGA acceleration add-ons.
Re: (Score:2)
What pressures going forward would inspire the entities that influence this sort of thing to start fresh?
A license system that allows hardware vendors/users to port/recompile code to current designs? x86 has a legacy because of all that binary-only Windows software out there.
What in the competition between Intel IBM etc causes this apparently extreme level of backward compatibility?
Their customers want it so they don't have to buy new overpriced binaries every time they upgrade hardware. If they have to upgrade the software as well as the hardware, why not consider a competitor?
What is stopping them from building a niche product that abandons backward compatibility or does that already exist?
Sure, this is done from time to time, like those ARM-based Windows RT tablets, which didn't do well because they couldn't run x86 software.
Tesla GPU? (Score:3)