Matlab Integrates GPU Support For UberMath Computation
An anonymous reader writes "Matlab now comes with native GPU support in the 2010b version. This means loads of Matlab commands can be parallelized onto the GPU without having to re-code things in C++ or Fortran using CUDA. Pretty sweet for the HPC community."
Nice (Score:2, Interesting)
Re: (Score:2)
A quick Google search turned up one or two discussion threads where the focus of the debate was whether certain GPU processing libraries were license-compatible with Octave. After a few pages of discussion about GPL, BSD, linking libraries and system libraries, Brook and CUDA and something else, my eyes glazed over.
So, maybe?
Re: (Score:2)
Is there a particular Octave package or function set? I saw some parallel algorithm and MPI libraries, but is there something I missed relating to GPU parallelization?
How long until R supports this? (Score:2)
Re: (Score:2, Interesting)
http://cran.r-project.org/web/views/HighPerformanceComputing.html
See "Parallel computing: GPUs"
Re:How long until R supports this? (Score:2)
Support exists now:
http://brainarray.mbni.med.umich.edu/brainarray/rgpgpu/ [umich.edu]
Re: (Score:2)
Re: (Score:2)
I think it depends on what you want to do. Matlab is great for reading and working with log files. It's also great if your tasks can be vectorized; your code will be fast and require very few statements.
However, if your project requires iteration, it's going to be slow as hell in Matlab.
The biggest complaints I have about Matlab (besides the cost) are the way it handles memory management, and the way it handles pointers. I can't tell you the amount of times I've had Matlab tell me there wasn't enough memory
Re: (Score:2)
I can't tell you the amount of times I've had Matlab tell me there wasn't enough memory available on my 8GB machine, because I ran out of what it had allocated for me.
Verne, I think you're doing something wrong there. The only time I see that sort of error is when I've done something facepalm-worthy, like trying to pre-allocate a 7-D array with 1000 elements per dimension. Yes, you have to be careful about how many copies of large arrays are made, but that's true of any language.
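For scale, here is the arithmetic on that 7-D example, assuming every element is Matlab's default 8-byte double (the byte size is my assumption; the array shape is from the post):

```latex
\begin{aligned}
1000^{7} &= 10^{21}\ \text{elements} \\
10^{21} \times 8\ \text{bytes} &= 8 \times 10^{21}\ \text{bytes} \approx 8\ \text{ZB} \\
\frac{8 \times 10^{21}\ \text{bytes}}{8 \times 10^{9}\ \text{bytes}} &= 10^{12}
\end{aligned}
```

That is roughly a trillion times the 8 GB machine mentioned above, so an out-of-memory error is the expected outcome.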
Also, with the newer versions of Matlab, iterations aren't that slow, at least compared with the older versions from a decade ago. You do, however, need to be very careful about accur
Re:How long until R supports this? (Score:4, Informative)
To speed up the computation, I first just wrote a Java class to be called from Matlab. This showed a considerable speed improvement compared to the Matlab code. I then decided that I could multithread the application in Java for even more throughput. On this particular machine I have 12 cores, so I used 10 threads and reduced the computation from over 70 minutes to less than a minute by using a Java class plus Java's concurrency libraries.
Now, in general I prefer to code in Matlab, because you can do more with fewer lines of code, but there are certain times when pure Matlab is not fast enough. What is nice about Matlab 2010b (I don't remember how far back this capability goes) is that you can seamlessly use Java
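The poster's Java code isn't shown. As a rough C++ analogue of the same idea (split independent work across a fixed number of threads, then join), here is a minimal sketch; the function name `parallel_map` is hypothetical, and it assumes each element can be computed independently. Compile with `-pthread` on GCC/Clang.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Apply an independent per-element computation f in parallel by giving each
// worker thread one contiguous chunk of the index range. Each thread writes
// only to its own slice of `out`, so no locking is needed. f must be safe to
// call concurrently from several threads.
template <typename F>
std::vector<double> parallel_map(const std::vector<double>& in, unsigned num_threads, F f) {
    assert(num_threads > 0);
    const std::size_t n = in.size();
    std::vector<double> out(n);
    const std::size_t chunk = (n + num_threads - 1) / num_threads;  // ceiling division
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(n, begin + chunk);
        if (begin >= end) break;  // more threads than work items
        workers.emplace_back([&in, &out, &f, begin, end] {
            for (std::size_t i = begin; i < end; ++i) out[i] = f(in[i]);
        });
    }
    for (auto& w : workers) w.join();  // wait for every chunk to finish
    return out;
}
```

Contiguous chunks are the simplest partitioning; if the per-element cost varies a lot, a work queue or thread pool (which Java's concurrency libraries provide) balances the load better.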
Re: (Score:2)
I've had similar luck staying within Matlab by using its profiler. Although I've used C callouts for some high-performance computations (like implementing a fast 2D histogram), I try to stay within Matlab whenever possible because mostly (not always, but mostly) the time spent optimizing a computation would far, far outweigh the time gained from a faster algorithm. If we know from the get-go that a given algorithm will be run many times, or is performance-critical, it might be coded up in Matlab to prove co
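The poster's C callout isn't shown. A fast 2D histogram over uniform bins typically gets its speed from computing each sample's bin index arithmetically in a single pass instead of searching bin edges; a minimal sketch under that assumption (the name `hist2d` and its parameter layout are hypothetical) might look like:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-pass 2D histogram over uniform bins covering [xmin, xmax) x [ymin, ymax).
// Samples outside the range (or NaN) are dropped.
// Result is row-major: counts[iy * nx + ix].
std::vector<std::size_t> hist2d(const std::vector<double>& x, const std::vector<double>& y,
                                std::size_t nx, double xmin, double xmax,
                                std::size_t ny, double ymin, double ymax) {
    assert(x.size() == y.size() && nx > 0 && ny > 0 && xmax > xmin && ymax > ymin);
    std::vector<std::size_t> counts(nx * ny, 0);
    const double sx = nx / (xmax - xmin);  // bins per unit along x
    const double sy = ny / (ymax - ymin);  // bins per unit along y
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Written as a positive test so NaN values are rejected too.
        if (!(x[i] >= xmin && x[i] < xmax && y[i] >= ymin && y[i] < ymax)) continue;
        std::size_t ix = static_cast<std::size_t>((x[i] - xmin) * sx);
        std::size_t iy = static_cast<std::size_t>((y[i] - ymin) * sy);
        // Guard against rounding pushing a value just below the upper edge out of range.
        if (ix >= nx) ix = nx - 1;
        if (iy >= ny) iy = ny - 1;
        ++counts[iy * nx + ix];
    }
    return counts;
}
```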
Re: (Score:2)
Re: (Score:2)
To me, this seems like more of a personal workflow/environment preference than a problem with the language itself. Of course there are a few changes I would like to see in the Matlab environment (and having this option would be a good thing), and the speed and memory issues certainly prevent usi
Re: (Score:2)
Old news (Score:2, Insightful)
Note the "R2010b" version number. That means that this capability has been out since the second half of 2010.
Re:Old news (Score:4, Interesting)
I think they meant 2011b, which is not out yet (it's in beta). GPUmat and NVIDIA have both had toolboxes for MATLAB for a while now (although the CUDA toolboxes require manual code edits and compilation to get them working), but function support is limited (e.g., FFT on large arrays works wonders on CUDA), and even the supported functions have restrictions (only single-precision floating point, for example). There is also the commercial Jacket from Accelereyes.
Re: (Score:2)
Re: (Score:2)
2010b did not include GPU array indexing support (among other things), making it fairly worthless for anything moderately complex.
2011a DOES do indexing on GPU arrays. It works very well in my experience so far.
Jacket (Score:3)
There is competition from Jacket:
http://www.accelereyes.com/products/jacket [accelereyes.com]
This product is more expensive than Matlab, but more effective.
I tried the free trial and found it much more effective.
Alas, the cost is too high to justify Jacket in my case; I would rather
buy more hardware instead.
Re: (Score:1)
Jacket costs as much as a Matlab toolbox (and a toolbox is what you need in order to use Matlab's own GPU support). I'm a Jacket user and am more than pleased with both performance and support. Jacket also supports more functions and is faster.
Pretty sweet for the HPC community.... (Score:2, Insightful)
who actually uses MATLAB for real HPC?
Re: (Score:1)
who actually uses MATLAB for real HPC?
Lots of people. I know for a fact that NASA's Columbia supercomputer has Matlab licenses. Moreover, Matlab is the de facto standard for certain engineering applications (such as control theory and some areas of signal processing). Sure, you could write every solver, optimizer, and toolbox in standard C, C++, and OpenMPI (a lot of control is just numerical optimization), but it would mean a lot of coding from the ground up. Alternatively, you can get a bunch of specialized libraries, but then the administrator shoul
Re: (Score:2)
People that can't code for shit. It's pretty popular in the scientific community simply because it combines the simplicity (and noobicity of the coders) of PHP, Python or Ruby with high-level mathematical constructs.
The thing is that for real coders it's actually harder, because you're missing stuff like decent inline evaluation of variables, loops and if/then/else constructs, easy evaluation of data types, function overloading, regexp, greater-than-or-equal, and even the very basics of text evaluat
Re: (Score:2)
Seconded. I still remember a guy who ported his MATLAB finite element code to C++. The solve time for his problems went from 24 hours to 8 minutes, 6 of which were post-processing/display of the results... That was towards the end of his thesis, so he must have wasted countless months of productive work because of that.
MATLAB is never the right tool -- unless you are really incompetent, so you need the hand-holding, and really obtuse, so you can't handle the small differences with Octa
Re: (Score:1)
Re: (Score:2)
The common process is for one applied mathematician to write the algorithm in Matlab, then 15 people to convert it to optimized C++/CUDA.
Surely you can see the point of making Matlab faster, or of automated code-generation tools.
Support for FPU (Score:1)
Its about time .. (Score:2)
I have not used the software other than a few times for an assignment, but if I didn't already know that it is not a particularly nice piece of software, I would be surprised that it took them this long to do the obvious.
Re: (Score:1)
Python libraries (Score:1)
Re: (Score:1)
3 Years in the Making (Score:4, Informative)
For a full explanation of why I say "fake", read http://www.accelereyes.com/products/compare [accelereyes.com]
For a brief explanation of why I say "fake" GPU support, consider the question: what does supporting GPUs mean? If you can run an FFT, are you content? Or do you want to use INV, SVD, EIG, RAND, and the list goes on and on? Jacket has 10X the functionality of PCT-GPU.
Why else is the PCT-GPU implementation weak? Well, it is so poorly constructed (shoehorned into their legacy Java system) that it is rarely more beneficial to use the GPU than the CPU with the PCT-GPU implementation. It takes 600 cycles to load-then-store global memory on the GPU (required in each kernel call). The main innovation that led us to build Jacket is the ability to generate as few kernels as possible, eliminating as many of those 600-cycle round-trip transfers as possible. For example, Jacket's runtime system may launch only one kernel for every 20 lines of code. PCT-GPU, on the other hand, is limited to launching a GPU kernel for every basic function call.
Jacket also has a GFOR loop, which is the only parallel FOR-loop for GPUs: http://wiki.accelereyes.com/wiki/index.php/GFOR_Usage [accelereyes.com]
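How Jacket or PCT-GPU actually generate kernels isn't shown here. As a CPU-side illustration of the general idea of kernel fusion, here is a minimal sketch: a chain of element-wise operations run as separate passes writes every intermediate back to memory, while the fused version keeps intermediates in a local variable. The function names and the `elements_moved` counter are hypothetical, and the 600-cycle figure is the poster's, not something this code measures.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Result {
    std::vector<float> z;
    std::size_t elements_moved;  // element reads + writes of full-size arrays
};

// Unfused: three separate "kernels", each a full pass over memory,
// computing z = (a*x + b)^2 one operation at a time.
Result unfused(const std::vector<float>& x, float a, float b) {
    const std::size_t n = x.size();
    std::vector<float> t(n), y(n), z(n);
    for (std::size_t i = 0; i < n; ++i) t[i] = a * x[i];     // read x, write t
    for (std::size_t i = 0; i < n; ++i) y[i] = t[i] + b;     // read t, write y
    for (std::size_t i = 0; i < n; ++i) z[i] = y[i] * y[i];  // read y, write z
    return {std::move(z), 6 * n};
}

// Fused: one "kernel"; the intermediate never goes back to memory.
Result fused(const std::vector<float>& x, float a, float b) {
    const std::size_t n = x.size();
    std::vector<float> z(n);
    for (std::size_t i = 0; i < n; ++i) {
        const float y = a * x[i] + b;  // stays in a register
        z[i] = y * y;                  // read x, write z
    }
    return {std::move(z), 2 * n};
}
```

Both versions compute the same values; the fused one touches a third as much array memory, which is the saving the post attributes to generating fewer kernels.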
I'm not aware of any MATLAB programmer that has had a good experience with PCT-GPU.
Finally, because I'm so thrilled at this getting slashdotted (despite it being a link promoting PCT-GPU), I'd be happy to offer free 3-month Jacket subscriptions to anyone who emails me in the next 48 hours with the word "slashdot" in the subject, at john.melonakos@accelereyes.com
Cheers!
PS: Roblimo, if we can get some blurb love in your summary on the main slashdot.org page, it would really mean a ton to all our guys that have worked on this project for the last 4 years!
great! (Score:5, Insightful)
Now we have a vendor of an overpriced add-on battling it out with the vendor of the mother of all overpriced and badly designed pieces of scientific software. As someone who actually uses numerical scripting languages, let me tell you: I'm not impressed.
My guess is that within a year or two, there will be better open-source alternatives to Jacket, just like there are better open-source alternatives to MATLAB already. I'll just wait, thank you very much.
Re: (Score:1)
within a year or two
That's just it. We've been in the world of parallelization for years now, but relatively few open source developers have innovated or even ported for performance. Why? Because such performance gain is a luxury. You pay more for luxuries, that's just a fact of life.
Nevertheless, your timeframe sounds reasonable, but only as more varied and more open tools to support parallel implementations appear.
Re: (Score:2)
A lot of parallel processing that's coming out today commercially was pioneered by open source projects years ago. OMP and distributed computing are widely used.
As for GPU computing, the speedups are barely worth it today unless you really hand-optimize your application for parallelization; you're not going to get a lot of speedup with Jacket on real
Re: (Score:1)
Jacket is meant to be a luxury, as was mentioned elsewhere: it provides a faster, better approach to what you could try to reinvent by hand if you had infinite energy.
The Canny Edge benchmark is a full-blown application (of which Canny Edge detection
Re: (Score:2)
CONV2 is one of the most trivial cases for GPU programming; if you didn't screw up badly, I can't beat your code, but you couldn't beat mine either.
Most people shouldn't be using GPU programming at all because it i
Re: (Score:1)
1) Expert convolutions on the GPU (that work well for both separable/non-separable cases, arbitrary input matrix sizes, and arbitrary kernel sizes) are extremely difficult. I don't think you can beat our implementation. If you can, I will try to entice you away from other pursuits in life.
2) CONV2 (i.e. convolutions) are very useful in many applications and often make more sense than pursuing some sort of other arithmetic expression. I do agree with your statement
Re: (Score:2)
Your error is with the "i.e." part. Convolutions are very useful, but CONV2 is almost never the right function to call. Most convolutions are separable. Those that aren't can usually be made separable. If you're really stuck with a non-separable large 2D convolution, you can use 2D FFT in some cases. And if you have a non-separable small 2D convolution, there's usually some other known trick you can use to speed it up. Anybody who has any b
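To see why separability matters, here is a minimal sketch: if a kernel factors as the outer product of a column vector and a row vector, a row pass followed by a column pass gives the same result as the full 2D convolution, at kh + kw instead of kh * kw multiplies per output pixel. The names `conv2_valid` and `conv2_separable` are hypothetical (not Matlab's or Jacket's API), and only the "valid" region (no padding) is computed to keep the code short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Direct 2D convolution, "valid" region only: kh * kw multiplies per output.
Mat conv2_valid(const Mat& img, const Mat& k) {
    assert(!img.empty() && !k.empty());
    const std::size_t H = img.size(), W = img[0].size();
    const std::size_t kh = k.size(), kw = k[0].size();
    assert(kh <= H && kw <= W);
    Mat out(H - kh + 1, std::vector<double>(W - kw + 1, 0.0));
    for (std::size_t i = 0; i + kh <= H; ++i)
        for (std::size_t j = 0; j + kw <= W; ++j)
            for (std::size_t u = 0; u < kh; ++u)
                for (std::size_t v = 0; v < kw; ++v)
                    out[i][j] += img[i + u][j + v] * k[kh - 1 - u][kw - 1 - v];  // flipped kernel
    return out;
}

// Same result for a separable kernel k[p][q] == col[p] * row[q]:
// kh + kw multiplies per output instead of kh * kw.
Mat conv2_separable(const Mat& img, const std::vector<double>& col, const std::vector<double>& row) {
    assert(!img.empty() && !col.empty() && !row.empty());
    const std::size_t H = img.size(), W = img[0].size();
    const std::size_t kh = col.size(), kw = row.size();
    assert(kh <= H && kw <= W);
    // Pass 1: convolve every image row with `row` (valid columns only).
    Mat tmp(H, std::vector<double>(W - kw + 1, 0.0));
    for (std::size_t i = 0; i < H; ++i)
        for (std::size_t j = 0; j + kw <= W; ++j)
            for (std::size_t v = 0; v < kw; ++v)
                tmp[i][j] += img[i][j + v] * row[kw - 1 - v];
    // Pass 2: convolve every column of the intermediate with `col`.
    Mat out(H - kh + 1, std::vector<double>(W - kw + 1, 0.0));
    for (std::size_t i = 0; i + kh <= H; ++i)
        for (std::size_t j = 0; j + kw <= W; ++j)
            for (std::size_t u = 0; u < kh; ++u)
                out[i][j] += tmp[i + u][j] * col[kh - 1 - u];
    return out;
}
```

With a 3x3 Sobel kernel (col = {1, 2, 1}, row = {1, 0, -1}) that is 6 multiplies per pixel instead of 9; the gap grows quickly with kernel size.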
Re: (Score:1)
Re: (Score:2)
So? For you to present benchmarks of a CONV2 that detects separability against the built-in CONV2 that explicitly does not use separability is dishonest, because much of the speedup you measure has nothing to do with GPU computing.
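Whether Jacket's or Matlab's CONV2 actually checks for separability is exactly what is disputed here, so this is not a claim about either product. One common way to detect it is a rank-1 test: a kernel is separable exactly when it equals the outer product of one of its columns and one of its (scaled) rows. A minimal sketch, with the hypothetical name `separate_kernel` (requires C++17 for `std::optional`):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

using Mat = std::vector<std::vector<double>>;

// Returns (col, row) with k[i][j] == col[i] * row[j] (within a relative tolerance)
// if the kernel has rank 1, i.e. is separable; otherwise std::nullopt.
std::optional<std::pair<std::vector<double>, std::vector<double>>>
separate_kernel(const Mat& k, double tol = 1e-12) {
    assert(!k.empty() && !k[0].empty());
    const std::size_t kh = k.size(), kw = k[0].size();
    // Use the largest-magnitude entry as the pivot for numerical stability.
    std::size_t p = 0, q = 0;
    for (std::size_t i = 0; i < kh; ++i)
        for (std::size_t j = 0; j < kw; ++j)
            if (std::fabs(k[i][j]) > std::fabs(k[p][q])) { p = i; q = j; }
    std::vector<double> col(kh, 0.0), row(kw, 0.0);
    if (k[p][q] == 0.0) return std::make_pair(col, row);  // all-zero kernel
    for (std::size_t i = 0; i < kh; ++i) col[i] = k[i][q];
    for (std::size_t j = 0; j < kw; ++j) row[j] = k[p][j] / k[p][q];
    // Rank-1 check: the outer product must reproduce every entry.
    for (std::size_t i = 0; i < kh; ++i)
        for (std::size_t j = 0; j < kw; ++j)
            if (std::fabs(k[i][j] - col[i] * row[j]) > tol * std::fabs(k[p][q]))
                return std::nullopt;
    return std::make_pair(col, row);
}
```

For kernels that are only approximately separable, an SVD-based check is more robust; the pivot test above is the simplest exact version.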
Re: (Score:2)
It's interesting that you find that Jacket's CONV2 doesn't detect separability, because the guy who runs the company claimed that it did; he was talking about all the "advanced algorithms" that CONV2 contains.
Of course, GPU computing speeds up convolutions; that's what GPUs were designed to do. The questions we have been discussing are the following.
First, is GPU computing cost-effective for most applications right now, and are people going to see the speedups they hope for? In my experience, the answer is "no", bec
Re: (Score:2)
Isn't that always the case? There's a slight demand for something (easy GPU programming at the Matlab/Octave level), and a company starts up to offer that service. Over time, if there is enough general demand, people start contributing code snippets to the free software community, it becomes a project, and over a two-year period it becomes usable. But in the intervening period a company is trying to make money on this recent need, which will become almost commonplace in a few years.
As far as Matlab vs. Octave goes, there are still som
Re: (Score:2)
There is no "real need" for GPU computing yet because, for most people, it's not cost-effective: the speedups are modest at best, and you only get them if you know what you're doing (in which case you wouldn't be using these tools). Get yourself a multicore machine and use OMP, and your code is likely going to run faster with less effort.
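Assuming "OMP" here means OpenMP, this is what the low-effort multicore route looks like in practice: one pragma on an existing loop. A minimal sketch (the `dot` function is only an example workload): compile with `-fopenmp` on GCC/Clang; without that flag the pragma is ignored and the loop simply runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product parallelized with OpenMP: each thread accumulates its share of
// the iterations into a private copy of `sum`, and the copies are added at the end.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    const long long n = static_cast<long long>(a.size());
    double sum = 0.0;
#pragma omp parallel for reduction(+ : sum)
    for (long long i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

Because the threads add their partial sums in a different order than a serial loop would, results on general floating-point data can differ in the last few bits between runs with different thread counts.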
Open source developers will tackle GPU computing in scripting languages when it makes sense to do so. That's not because they need commercial leadership or leaks of "code s
Re: (Score:2)
Yes, you can speed up individual algorithms sometimes by an order of magnitude. But that often doesn't help you much with overall program performance, because once you eliminate one bottleneck, another one takes its place. Buying more cores gives you less speedup for each algorithm, but makes it much more likely that you spe
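The "another bottleneck takes its place" argument is essentially Amdahl's law (naming it is my gloss, not the poster's): if a fraction p of the run time is sped up by a factor s, the whole program speeds up by only 1 / ((1 - p) + p / s). A minimal sketch:

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p (0..1) of the run time
// is accelerated by a factor s (> 0) and the rest is unchanged.
double amdahl(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, if one algorithm takes half the run time and a GPU makes it 10x faster, `amdahl(0.5, 10.0)` is about 1.82; even a 100x speedup of that half gives only about 1.98, because the untouched half caps the total at 2x.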
Re: (Score:1)
People have been saying that open source would swamp Jacket since we launched in 2007. The reality is that it is too stinking hard to build good stuff open source (i.e. where the developers aren't paid), when there isn't an enormous user community to fuel the effort in intangible benefits back to the contributors. Otherwise, we'd open source Jacket and try to live off the service contracts like every other open source project.
So we end up pricing the software inli
Re: (Score:2)
Most open source developers I know are paid and the stuff they produce has wiped away pretty much anything commercial and proprietary in any area where they have developed it.
The reality is that GPU computing barely makes sense today, and it certainly didn't make sense in 2007. And it may just be another fad, taken over again by general purpose CPUs, just like the last few times.
Re: (Score:2)
I don't dispute that there are alternatives to Matlab, but "better" is still premature in my opinion. Over the past year I had an interest in removing my work's dependence on proprietary software, so I have researched the Matlab alternatives, and I have even been using Python for some of my work (instrument
Re: (Score:2)
The first thing you should do is stop thinking of it as "levels". Matlab has a few packages that you can't easily get for Python, and it has Simulink. There are many other areas where Matlab isn't even remotely close to Python's level. The two are, as the technical term goes, "incomparable".
Well, closed source is even more fragmented! There's Matlab, Ma
beware: bad benchmarks (Score:2)
In your benchmarks, you list "1.26 hours" for Canny edge detection on a 4 Mpixel image in Matlab without GPU computing, and you miraculously speed that up to 8 seconds using your GPU tools:
http://www.accelereyes.com/products/benchmarks [accelereyes.com]
On my three-year-old desktop, using just one core of a Core 2 Duo, I can do Canny edge detection on a 4 Mpixel PGM image in about 1.7 seconds with straightforward C code (no pointer tricks), including I/O, parsing the PGM, and god knows what else. It's about the same in Pytho
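Putting the quoted figures side by side (all numbers come from the posts above; the machines and implementations differ, so this is only a rough comparison):

```latex
\begin{aligned}
1.26\ \text{h} &= 1.26 \times 3600\ \text{s} = 4536\ \text{s} \\
\frac{4536\ \text{s}}{8\ \text{s}} &= 567 && \text{(claimed GPU speedup over the Matlab baseline)} \\
\frac{4536\ \text{s}}{1.7\ \text{s}} &\approx 2668 && \text{(single-core C vs. the same baseline)} \\
\frac{8\ \text{s}}{1.7\ \text{s}} &\approx 4.7 && \text{(single-core C vs. the GPU figure)}
\end{aligned}
```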
Re: (Score:1)
Re: (Score:1)
Article seems slightly inaccurate (Score:2)
Mathematica has this feature, too (Score:1)
For those with a bent toward Mathematica, GPU computing is baked into Version 8.
There's more information at http://reference.wolfram.com/mathematica/guide/GPUComputing.html [wolfram.com]
In the spirit of full disclosure, I'm solely a long-time user, not a Wolfram employee.
MAGMA (Score:4, Interesting)
For those interested in an open-source alternative, there's MAGMA [utk.edu], which provides a bunch of linear algebra routines implemented in CUDA. I haven't tried it myself yet, but it looks promising.
cdb read and collate next? (Score:2)
Interesting. I wonder if the GPU could be used to perform functions on large sets in a constant database.
Re: (Score:1)
Re: (Score:2)
I'm not sure if by "quick code" you're joking, but... Below is a snippet of code from the testbed of my app. It doesn't have the output tied to the map of lists in it, but it is small enough that you can see what is going on. A quick description: it reads packets off the interface. It orders information so it can be inserted into a map
source));
dstPort = to_string(ntohs(tcp->dest));
pktWin = ntohs(tcp->window);
int flagArray[] = {ntohs(tcp->ack),ntohs(tcp->fin),ntohs(tcp->psh),ntohs(tcp->res1),nt
Re: (Score:2)
Oops, something went wrong. I don't think it is going to jibe with /. The post part of the reply should have been:
I'm not sure if by "quick code" you're joking, but... Below is a snippet of code from the testbed of my app. It doesn't have the output tied to the map of lists in it, but it is small enough that you can see what is going on. A quick description: it reads packets off the interface. It orders information so it can be inserted into a map of lists called connections. I update the map with the packet in
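The full app isn't shown, so here is a minimal, hypothetical sketch of the "map of lists called connections" idea: one list of packets per connection, keyed by the address/port 4-tuple. `Packet`, `connection_key` and `add_packet` are illustrative names, and the fields are assumed to be already decoded into host byte order. One aside on the snippet above: if it uses glibc's `struct tcphdr`, the flag fields (`ack`, `fin`, `psh`, ...) are 1-bit bitfields that are normally read directly; `ntohs` is meant for multi-byte fields such as `source`, `dest` and `window`.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <map>
#include <string>

// Hypothetical, already-decoded packet fields (host byte order).
struct Packet {
    std::string src_ip, dst_ip;
    std::uint16_t src_port = 0, dst_port = 0;
    std::uint16_t window = 0;
    bool ack = false, fin = false, psh = false;
};

// Map of lists: one list of packets per connection.
using Connections = std::map<std::string, std::list<Packet>>;

// Direction-sensitive key: replies land under a different key. Normalize the
// endpoint order first if both directions should share one list.
std::string connection_key(const Packet& p) {
    return p.src_ip + ":" + std::to_string(p.src_port) + "->" +
           p.dst_ip + ":" + std::to_string(p.dst_port);
}

// Append the packet to its connection's list; std::map::operator[]
// default-constructs an empty list the first time a key is seen.
void add_packet(Connections& conns, const Packet& p) {
    conns[connection_key(p)].push_back(p);
}
```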
Re-code (Score:2)
This means loads of Matlab commands can be parallelized onto the GPU without having to re-code things in C++ or Fortran using CUDA
But you will have to re-code soon anyway, when a new version of Matlab is released and the functions have changed yet again! Yes, I'm talking from personal experience...
Lots of GPU-accelerated numerical packages (Score:2)
There are tons of other CUDA-accelerated numerical packages besides Matlab: Mathematica, LabVIEW, and plugins/wrappers/libraries for Python, R, and IDL. Some of these are linked from NVIDIA's website:
http://www.nvidia.com/object/numerical-packages.html [nvidia.com]
Others are linked from:
http://www.nvidia.com/object/data_mining_analytics_database.html [nvidia.com]
ANSYS (Score:1)