Microsoft Demos C++ AMP At AMD Fusion Developer Summit
MojoKid writes "The second day of the AMD Fusion Developer Summit began with a keynote from Microsoft's Herb Sutter, Principal Architect, Native Languages and resident C++ guru. The gist of Herb's talk centered on heterogeneous computing and the changes coming in future versions of Visual Studio and C++. One of the main highlights was a demo of a C++ AMP application that seamlessly took advantage of all of the compute resources within various demo systems, from workstations to netbooks. The physics demo switched between CPU, integrated GPU, and discrete GPU resources, showcasing the performance capabilities of each. As additional bodies are added, the workload increases, ramping up to over 600 GFLOPS of compute performance."
AMP? (Score:5, Insightful)
In this context, AMP doesn't stand for amplifier, adenosine monophosphate, or ampere, but for "Accelerated Massive Parallelism". Seems like a Microsoft-ism for the more traditional term "massively parallel processing".
Re:AMP? (Score:5, Insightful)
Microsoft has a history of inventing names and acronyms that collide with established terms in unrelated areas. I suspect they are trying to get potential users to see a new name as something they have heard of but know nothing about, so the term looks "established" in those people's eyes.
For example, ".Net".
Re: (Score:2)
I have to imagine that was also behind XP.
Re: (Score:2)
For example, ".Net".
This time, they weren't the ones who started it. To begin with, the "Microsoft Project Plan" was theirs long before "Massive Parallel Processing" came into the picture.
(did I mention that I hate acronyms? Yes, I did... Oh, well, SNAFU... I'm still FUBAR)
</lame_joke>
Re: (Score:3, Insightful)
Microsoftie: isn't Microsoft great? They have managed code and no one else does.
Me: Isn't Java the same?
Microsoftie: No, that's a virtual machine, that's different!
Me:
Re: (Score:3)
C# is a "scripting language"?
Maybe when you graduate from high school, you'll learn that how cool you are is unrelated to the level of the language you program in.
Re:AMP? (Score:4, Interesting)
To be fair, doing any nontrivial assembly will put some serious hair on your balls. But it's just not very good for (almost*) any real work.
I use C for performance-critical code, C++ for complex performance-significant components (like an OpenGL million-poly renderer), Java or C# (depending on target platform(s)) for large but otherwise-modest programs, and scripting languages (mostly Python) for one-off programs or little tools that don't justify the involvement of a more heavyweight language.
Use the right tool for the job, as always. Can't go too far wrong with Java, and if you're going to hit its performance wall, you should know up front.
* Only large-scale assembly coding I've ever had to do was for a compilers class, but there was obviously no way around it. Fascinating to learn and do, but I sure hope I'm done with it...
Re: (Score:2)
>> * Only large-scale assembly coding I've ever had to do was for a compilers class, but there was obviously no way around it. Fascinating to learn and do, but I sure hope I'm done with it...
That is odd. In my compilers class the approach taken was: 1) lex/yacc and 2) bootstrapping.
Re: (Score:2)
The compiler wasn't written in assembly, but it had to output it and debugging required following large codebases generated from an object-oriented language. That's what I meant... running through 500 lines of assembly to make sure the vtable was properly generated.
Re: (Score:2)
I write some fairly large projects entirely in assembler, and many modules that interface with C code. I work with microcontrollers though, not computers. 2k ROM for everything, software and data.
Re: (Score:2)
Java or C# (depending on target platform(s)) for large but otherwise-modest programs, and scripting languages (mostly Python) for one-off programs or little tools that don't justify the involvement of a more heavyweight language.
I could see why you'd prefer feature-rich languages that compile into bytecode that runs on a VM to a feature-rich language that compiles into bytecode that runs on a VM. That's smart planning.
Re: (Score:2)
Are you being snarky? Java and C# are significantly faster than Python because they're JIT'd (hopefully this is changing with PyPy). They're also significantly easier to maintain than Python because of their strong typing. A rather rigid interface is superior for teamwork, but a more flexible structure is helpful for the sort of little utilities I was talking about.
What's your point?
Re: (Score:2)
Or the casual use of "SQL Server" for "Microsoft SQL Server". (Not to be confused with all the other database products that might handle SQL, and have the phrase "SQL Server" in their full product name.)
Re: (Score:2)
Re: (Score:3)
Re: (Score:3)
Surely that would be WISA? Windows, IIS, SQL Server, ASP.
Re: (Score:2)
Misa dun wanna hear of WISA again.
Re: (Score:2)
Re: (Score:3)
Re: (Score:2)
Haven't you heard of AFT? Acronyms for techies?
Nope. Only "GNU's not Unix" ;)
Re: (Score:2)
Because "CUDA" and "GPGPU" are such obvious bits of terminology ... ?
Re: (Score:2)
Because "CUDA" and "GPGPU" are such obvious bits of terminology ... ?
BOHICA!
(may I hate you too, along with all the acronyms?)
Re: (Score:2)
Well, that's because PCMCIA.
Re: (Score:2)
I think picking on the acronym is a nice way to sidestep talking about Microsoft actually doing something cool.
Re: (Score:2)
I think picking on the acronym is a nice way to sidestep talking about Microsoft actually doing something cool.
Sidestepping? Maybe... but I still hate acronyms.
Re: (Score:3)
Yes, but I think the relevant question is: how, precisely, will they kill this one? They have a history of devising cool technology and then managing to fuck it up.
Re:AMP? (Score:4, Insightful)
How hard is it to write "AMP (Accelerated Massive Parallelism)" in a summary?
Re: (Score:2)
I've just finished editing the final draft of a report. One acronym was "DM", which I assumed meant "document manager" but which now, thanks to find-and-replace, says "dungeon master". Who will notice?
Re: (Score:2)
CUDA C++ and Thrust (Score:3)
Today, CUDA C++ already provides a full C++ implementation on NVIDIA's GPUs:
http://developer.nvidia.com/cuda-downloads [nvidia.com]
And the Thrust template library provides a set of data structures and functions for GPUs (similar in spirit to STL):
http://code.google.com/p/thrust/ [google.com]
- biased NVIDIA employee
Re: (Score:2)
Hey, any word on getting a new OpenCL 1.1 driver released? I know about the one you folks released last year to registered developers, but it's broken and only works with older GPU drivers. Any hope for OpenCL 1.1 in an upcoming CUDA 4.1 SDK?
Re: (Score:2)
Good question.. what happened to DirectCompute? Or is that going to be a layer underneath AMP?
Either way, OpenCL is what you probably want to be looking at. CUDA = NVIDIA lock-in, ATI/AMD Stream/APP = AMD/ATI lock-in, DirectCompute/AMP = Microsoft Windows lock-in. Not sure what Intel is pushing these days.
Re: (Score:2)
I am a CUDA C++ programmer. My biggest complaint about GPU programming tools is that there are no dense linear algebra libraries that work at the SM level. For my application I had to re-implement a big chunk of BLAS and part of LAPACK from scratch so that each SM runs a different problem instance. On the CPU you can just use OpenMP + single-threaded BLAS to achieve the same granularity of parallelism. The Thrust API does not address this granularity of parallelism. I'm eager to see if the AMP API d
This could push new hardware (Score:2)
Re:This could push new hardware (Score:4, Informative)
But, a lot of older computers which don't have DirectX 11 graphics cards have to emulate the DirectX DirectCompute API on the CPU.
They don't really have to emulate anything; most of the kernel (as in "compute kernel") functions and operations in DirectCompute have a one-to-one mapping with most CPUs' SIMD instruction sets, such as x86's SSE/AVX. The primary difference is that the CPU has far fewer cores, while the GPU may have thousands of cores/streaming processors but higher memory latencies and, at best, only L1 and L2 caches.
Yet another attempt at vendor lock-in. (Score:2)
Instead of contributing to open efforts regarding MP, they go off and build their own API. And a few years down the road, when everyone else is using the open API, they will let their developers down by abandoning their own API in favor of the open one, since theirs will no longer be economically viable (like Silverlight/.NET in Windows 8).
Microsoft, when will you learn your lesson? Instead of locking us in, why don't you contribute to the community's efforts to solve the same problem?
Re: (Score:2, Insightful)
I don't think there are open efforts attempting to do this by extending a C++ compiler.
Open efforts (CUDA clearly excluded) usually invent a new language (typically a subset of C) with much more restricted language features, loosely integrated with the host code.
I think they are the first and they are doing the right things here.
It is nice to have the host code tightly integrated with the GPU code, with most of the useful C++ language features available.
Why go Microsoft? (Score:3)
There are already tons of such tools, most of which are not tied to specific architectures, operating systems, or compilers.
Really, why would you go Microsoft on this at all? Clusters and supercomputers usually don't even run Windows at all.
Doesn't look so seamless (Score:2)
Based on the code example, it simply looks like they extended cl to include (a form of) nvcc syntax.
I was hoping they meant a recompile of existing code would leverage these resources.
If I want to port all my stuff to CUDA or OpenCL I will do so, and I don't need to lock myself into MS's platform and syntax.
Next Gen DirectX (Score:2)
I did RTFA and watched the video, and caught something most didn't. This is going to be included in the next version of DirectX and looks to be part of Windows 8. Is it any wonder that MS demo'd their latest version of DX on new hardware? Not to me. This doesn't discount the performance levels possible on the new CPU/APU designs both AMD and Intel are pursuing, and if MS can include the new DX in Win8-ARM, we should be seeing some damn interesting capabilities in the next couple of years.
Re:Where's my C# version? (Score:4, Funny)
Re: (Score:2)
Well, now you understand where their hunger for more speed comes from.
Re: (Score:3)
Ahh, the sinking feeling of having written a serious response to a post that's accruing funny mods...
Re: (Score:2)
Re: (Score:2)
C# has stolen all the love for the past decade; it's high time Microsoft significantly retooled their native development languages and technologies. We native developers are starving for new stuff. Fortunately C++0x is nearly ratified, and heterogeneous HPC with OpenCL and DirectCompute is gaining ground. My guess is that C++ AMP is a merging of their C++ compiler with DirectCompute.
C# and other managed languages aren't exactly the best choice for true HPC.
Re: (Score:2)
C# and other managed languages aren't exactly the best choice for true HPC.
That's okay - neither is Windows.
Re: (Score:3)
Re: (Score:2)
Paint me surprised: MS can C++??? Last time I checked they had no C++ anymore. I think by C++ they actually mean the silly managed(?) thing they try to pass off as C++ instead. Meaning that it is at least halfway implemented in C# already.
Re: (Score:2)
The Windows API is actually all C. All the managed stuff is just an easier to use wrapper.
Managed C++ doesn't even have working syntax highlighting in VS2010 because almost no one uses it. It's currently on the back-burner to eventually get fixed.
Re: (Score:2)
Re: (Score:2)
Re: (Score:3)
Yeah, I know you're trolling, but C# is a good language. I've coded millions of lines in C, C++, and C# and I can tell you which I'd rather code in any day of the week and twice on Sunday. Combined with VS, you simply get. stuff. done. very quickly and very easily.
Re: (Score:3)
Java is equally garbage.
Mm-yeeaah!... But, at least, it has a garbage collector. :)
Re: (Score:2)
Re: (Score:3)
Java is equally garbage.
Mm-yeeaah!... But, at least, it has a garbage collector. :)
If only it could collect itself.
You need something recursive for that: try Prolog and/or the "GNU's not UNIX" toolset :)
Re: (Score:2)
Java is equally garbage.
I think you mean: Java.equals(garbage).
What about Vala?
Re: (Score:2)
Re: (Score:2)
Just curious, retarded compared to what?
Currently my favorite tools for developing enterprise LOB apps (assuming web-based) would probably be something like:
These are in order. Ruby is frequently out due to, well, it being Ruby; lots of people with decision-making power will go "Ruby???". .NET is out when I want to be cross-platform (not
Re: (Score:2)
Re: (Score:2)
So you must have a reading-comprehension problem, I take it. My personal workstation is a Windows XP thing, but if you had been able to read you would have seen that I also have to do cross-platform stuff. In such cases I am on Linux primarily. Sometimes I wander into Solaris territory.
There are a number of reasons that Windows (I would like it to be 7, but that isn't going to happen for another month or so) is my primary workstation. The main one is that it is also my customers' main workstation. Another is tha
Re: (Score:2)
That's a silly statement, as I endorse, am proficient in, and support over six quite different operating systems. But you know perhaps only the two you mention, yet you think I am the one with the narrow mind? If a person like me points out that one particular operating system has severe issues compared to five others in common enterprise use, it actually means something. Sorry about yo
Re: (Score:2)
Sigh. You really didn't get it did you?
I've found it superior to windows for quite a few years
Good for you. Now, can you please find me a good alternative to Photoshop? If you say GIMP you prove that you know nothing at all. How about Sony Vegas, can you do that for me? Hey, let's make it easy: can you find an IDE that comes close to Eclipse on Windows?
Your statement about proficiency is silly to say the least. Which six quite different Operating Systems are we talking about?
OK, I can play that game too. In 1985 I got my first PC, an XT clone, it had DOS and two
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Oh, like Grand Central Dispatch from Apple?
No, not really like that at all.
Re: (Score:2)
Why not? Care to elaborate? We are talking about a shared memory form of parallelism that automatically assesses system resources and allocates threads to appropriate cores, right?
Re:Grand Central Dispatch (Score:4, Informative)
The most relevant difference is that it automatically uses different types of compute resources for the same task, depending on what's available. Core Image can do some of that, but it's limited to graphics workloads.
So it's Grand Central Dispatch + Core Image + a bit.
So it's OpenCL then (Score:4, Informative)
No, it's really a lot more like OpenCL [apple.com].
Which is not Mac-only, BTW... you can use it in OS X or iOS development.
Also, Apple's Accelerate library (a C library) takes advantage of OpenCL for BLAS, LAPACK, and so on...
Re: (Score:3)
Re: (Score:2)
Why not? Care to elaborate? We are talking about a shared memory form of parallelism that automatically assesses system resources and allocates threads to appropriate cores, right?
GCD is simply an implementation of the thread-pool pattern. AMP takes parallel tasks and can seamlessly switch between different computing resources to complete them, rather than requiring you to specify those resources and write resource-specific code for them. So no, it isn't like GCD.
Re: (Score:2)
Re: (Score:2)
They are complementary technologies. The best article was the original Snow Leopard review at Ars Technica, which explains the two technologies. Typing from a phone, so can't provide you with the URL for the moment.
Re: (Score:2)
The best article was the original Snow Leopard review at Ars Technica, which explains the two technologies
That article had multiple technical inaccuracies on every single page. If it was the best article that you read, then I can only assume that it was the only article that you read.
Typing from a phone, so can't provide you with the URL for the moment
That's probably for the best.
Re: (Score:2)
Can't speak for others, but in my case it's
Don't know what AMP is and can't understand TFS/TFA
Neither the summary nor the article seem to explain what AMP is.
For the benefit of everyone else who is trying to figure it out, here is a link: Introducing C++ Accelerated Massive Parallelism (C++ AMP) [msdn.com].
Re: (Score:2)
Re: (Score:3, Insightful)
Re:Who would be the target customers? (Score:5, Insightful)
Well, assuming your code has embarrassingly parallel components. Otherwise, it's pretty useless.
Re: (Score:2)
Re: (Score:2)
I just got a small laptop with one of the AMD Fusion Llano chips and was pleasantly surprised by how well it performs. I wasn't expecting to be able to do any gaming, but I can game as long as I stick to games from several years back. Which is not bad considering it only has a dual-core 1.6GHz processor and an AMD Radeon HD 6310.
The battery life and performance aren't as good as some of the Intel based ones, but it was a couple hundred dollars cheaper.
Normally I would say that too (Score:4, Insightful)
This is a key innovation. It looks like an important new step we've needed for a long time, and it looks like they have done well with it.
Of course it should be inspected for traps. From these folks there are always traps. But this particular time I think this is important enough that we look closely at it to see if there isn't something useful we can safely extract, while being mindful for the traps.
I've been here a long time. I've posted nearly 5,000 comments here over 8 years. Never once before have I said this about a Microsoft technology: This deserves a look.
Re:Normally I would say that too (Score:5, Interesting)
As someone actually at the event, someone who attended both the keynote and the later (and more in-depth) technical session, and someone who is employed as a GPU programmer, I would say that it's being vastly overblown.

It is very easy to look at the examples in the keynote (dense matrix multiplication with very little code modification, and an N-body simulation for which the code is not shared) and believe that this is finally some panacea for the difficulties involved in GPU computing and massively parallel computing in general. But the reality is, much like some approaches before it, C++ AMP simply elides some of the verbosity in the CUDA/OpenCL APIs regarding memory allocation, thread configuration, etc.

The matrix multiplication example appears dead simple because matrix multiplication on a GPU is dead simple. As soon as you start trying to write more advanced applications with this, you find that you need to take advantage of a fast shared memory to get worthwhile performance gains -- to do that, you add "tiles" to your "grid" (in CUDA terms, "blocks" and "grid"; in OpenCL terms, "local workgroups" and "global workgroup"). As soon as your output starts getting more complicated than a nice, deterministic matrix multiplication or N-body simulation, you may find that you have potential race conditions that you have to address yourself. And when you've broken up your problem into a tiled grid, taken fast local shared memory and slow global shared memory into account, and ensured that you have no race conditions, you've basically done all of the work of writing a CUDA or OpenCL kernel. Only now you've done it in a way that is very proprietary, instead of the (comparatively open) CUDA and (way the fuck more open) OpenCL.
It's unfortunate that it is being sold as this amazing world-changing breakthrough, because although it is not by any stretch that, it is in fact quite a nice concept. This is something, like Microsoft's PPL, that can be used to parallelize existing code very easily provided the code is parallelizable and written in a parallel-friendly manner. It is not something, however, that will do the work of parallelization or even the work of optimizing parallel-friendly code for GPU hardware for you.
Re: (Score:3)
There's only one kind of kernel I deal with that actually gets near full utilization without extensive hand-tuning (i.e. that could be written by a computer without human guidance) - the ones that do simple atomic operations on N input arrays and spits out M output arra
Re: (Score:3)
I have no knowledge of or experience with CUDA or OpenCL (other than the general vague idea of what these are for), so let me clarify something. How easy is it to write a program in either of those that parallelizes across all computational devices available to the system (not just GPU, but also CPU cores), and can change the specific devices being used on the fly, all without recompiling or restarting the binary? My impression from the demo, at least, was that this is the main selling point, rather than it
Re: (Score:2)
How easy is it to write a program in either of those that parallelizes across all computational devices available to the system (not just GPU, but also CPU cores), and can change the specific devices being used on the fly, all without recompiling or restarting the binary?
OpenCL on iOS or Mac does that today, it runs the code on the fastest available processor - GPU or CPU.
If you don't mind going through libraries I believe iOS lets you use the Accelerate framework to do some common computational tasks (like
Re: (Score:2)
OpenCL on iOS or Mac does that today, it runs the code on the fastest available processor - GPU or CPU.
Can it switch that while the app is running?
By the way, why not CPU and GPU (or rather all available GPUs) - which would make most sense if you want to squeeze the most from a given hardware.
Re: (Score:2)
Can it switch that while the app is running?
Yes, although I'm not sure how automatic that part is.
By the way, why not CPU and GPU (or rather all available GPUs) - which would make most sense if you want to squeeze the most from a given hardware.
That is distinctly possible, though I'm not sure how you'd set it up to do that. But there's nothing preventing you from running OpenCL code on both at once.
Re: (Score:2)
Re: (Score:3)
An interesting part of AMP is that it is platform-agnostic. Their implementation uses DirectCompute under the hood, but none of that is exposed in the API. This means it could probably be implemented for *nix.
Believe it or not, Microsoft has also done this a couple other times recently -- with real results -- and it all comes from the native C++ team as part of Microsoft's new-found focus on C++ after so many years in .NET mode.
The Parallel Patterns Library integrates extremely well. It knows that it's a
Re: (Score:2)
Re: (Score:3)
I know that 5,000 comments isn't much to you, the one that has posted probably a million comments, but for the rest of us it's quite a bit.
Re: (Score:2)
With 5000 comments you should know not to feed the trolls.
Re: (Score:2)
Is this tech news? Things that matter?
Maybe I became too demanding now that I got old... (cranky?)
Well yeah, I was expecting it to make toast too. ;)
Re: (Score:2)
In response to your comment, Microsoft just announced the release of the Microsoft Pony(tm) Acquisition Suite. "Pony(tm) is designed to provide developers with a solution oriented roadmap for their every need and desire, as quickly as possible; this is the fulfillment of their every dream," announced Steve Ballmer. Critics, however, denounce Pony(tm) as a ripoff of Eliza, with the phrase "We will provide that within 6 months." inserted liberally in the responses.
Re: (Score:2)
Is this tech news? Things that matter?
Maybe I became too demanding now that I got old... (cranky?)
It matters less for old people: they are supposed to have established themselves already (so they have time to whinge on /.: "How is this new?")
For the younger generation, this matters more: they will need to know it when, next year, they submit their resumes to a job ad asking for "3+ years of experience in MS Visual AMP++".
Re: (Score:2)
It makes for a better demo, I guess, but that'd be my first question too.
My second question would be whether it's the application which decides where it runs, or the OS -- it seems like now we'll not only need multicore schedulers, but GPU-aware schedulers also, but it's still something I'd like the OS to have a say in. For instance, "We need the GPU for graphics now, so you get routed to the CPU instead."
I suspect my second question is more naive than my first.
Re:Microsoft C++ (Score:4, Informative)
That was back with MSVC++ 6.0, released in 1998 before the ISO C++ standard was fully ratified. MSVC++ today is one of the more standards-compliant compilers, although their template instantiation mechanism is still somewhat broken so that it can continue to support their legacy MFC crap.
Re: (Score:3)
A "while ago", gcc didn't support C++ namespaces. So?
Re: (Score:2)
That's because of the stupid scheduler in Windows. Try to run it on Linux.
Re: (Score:2)
'Multiple exclamation marks,' he went on, shaking his head, 'are a sure sign of a diseased mind.' -- Terry Pratchett
Re: (Score:2)
Two things you should do to fix that. First, try to migrate off that Pentium 4 with 256MB of RAM. Second, stop buying software from Phantom Software and Bagles on the corner. There is no such thing as Visual Studio 2011, and I doubt there ever will be.
Now, if you want to try a really, really slow IDE that is not a figment of your imagination, try Eclipse.