Math Hardware

Matlab Integrates GPU Support For UberMath Computation 89

An anonymous reader writes "Matlab now comes with GPU native support in the 2010b version. This means loads of Matlab commands can be parallelized onto the GPU without having to re-code things in C++ or Fortran using CUDA. Pretty sweet for the HPC community."
This discussion has been archived. No new comments can be posted.

  • Nice (Score:2, Interesting)

    by Anonymous Coward
    But can't Octave do that already?
    • A quick Google search turned up one or two discussion threads where the focus of the debate was whether certain GPU processing libraries were license-compatible with Octave. After a few pages of discussion about GPL, BSD, linking libraries and system libraries, Brook and CUDA and something else, my eyes glazed over.

      so, maybe?

  • Matlab is great, but my god the language is cumbersome. More so than R, though less than SAS. Also, it costs money, and I'm cheap. So, I'm wondering if this could be worked into R somehow. Since R seems to execute code in a single-threaded sort of manner (I say, knowing just enough to be dangerous about these matters), each wee bit of speed is a godsend.
    • Re: (Score:2, Interesting)

      by Anonymous Coward

      http://cran.r-project.org/web/views/HighPerformanceComputing.html

      See "Parallel computing: GPUs"

    • Can you elaborate on how Matlab is cumbersome (or give a link)? I started using Matlab after using C for a few years and "cumbersome" is the last word I would use to describe Matlab. I haven't really used any of the other scientific computing packages, though.
      • I think it depends on what you want to do. Matlab is great for reading and working with log files. It's also great if your tasks can be vectorized; your code will be fast and require very few statements.

        However, if your project requires iteration, it's going to be slow as hell in Matlab.

        The biggest complaints I have about Matlab (besides the cost) are the way it handles memory management, and the way it handles pointers. I can't tell you the amount of times I've had Matlab tell me there wasn't enough memory

        • by pz ( 113803 )

          I can't tell you the amount of times I've had Matlab tell me there wasn't enough memory available on my 8GB machine, because I ran out of what it had allocated for me.

          Verne, I think you're doing something wrong there. The only time I see that sort of error is when I've done something worthy of a palm-in-face like trying to pre-allocate a 7-D array with 1000 elements per dimension. Yes, you have to be careful how many times copies of large arrays are made, but that's true of any language.

          Also, with the newer versions of Matlab, iterations aren't that slow, at least compared with the older versions from a decade ago. You do, however, need to be very careful about accur
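The 7-D example above is easy to check with arithmetic. This is a minimal sketch, assuming 8-byte double elements (Matlab's default numeric type, as noted elsewhere in this thread); the helper name `array_bytes` and the 20000x20000 comparison case are made up for illustration.

```python
from math import prod

def array_bytes(shape, itemsize=8):
    """Bytes needed for a dense array of `shape` with `itemsize`-byte elements."""
    return prod(shape) * itemsize

seven_d = array_bytes((1000,) * 7)       # the 7-D array with 1000 elements per dimension
big_2d = array_bytes((20_000, 20_000))   # a large but plausible 2-D matrix (assumed example)
ram = 8 * 1024**3                        # an 8 GiB machine

print(f"7-D array: {seven_d:.3e} bytes, {seven_d / ram:.3e} times 8 GiB")
print(f"20000x20000 doubles: {big_2d / 1024**3:.2f} GiB")
```

The 2-D case fits in 8 GiB once, but not if a few temporary copies of it are alive at the same time, which is the "be careful how many copies are made" point.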

          • by stewbee ( 1019450 ) on Monday May 23, 2011 @01:07PM (#36219842)
            I can't speak for 10 years ago, but Matlab is still slow when using 'for' loops. Just recently I was updating a grid search algorithm that was originally done in VBA. I originally ported it to Matlab using all of the vectorization tricks that were available, but I still needed for loops. For smaller data sets, it was tolerable but when our input data grew to a certain size, it would take over 70 minutes for the computation to complete (as a side note, this is with Matlab 2010b).

            To speed up the computation, I at first just wrote a Java class to be called from Matlab. This showed considerable speed improvement when compared to the Matlab code. I then decided that I could multithread the application in Java for even more throughput. This particular machine has 12 cores, so I used 10 threads and reduced the computation from over 70 minutes to less than a minute by using a Java class plus Java's concurrent libraries.

            Now, in general I prefer to code in Matlab, because you can do more with fewer lines of code, but there are certain times where Matlab alone is not fast enough. What is nice with Matlab 2010b (I don't remember how far back this capability goes) is that you can seamlessly use Java .jar files and create Java objects in your Matlab code. As an added bonus, Matlab creates 'double' arrays by default for numeric values. These can be passed in directly as an argument to your method without casting types like you might need to when using a .dll file.
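The partition-and-reduce pattern described above (split the grid across about 10 workers, then take the best result) can be sketched in a few lines. This is not the poster's Java code; it is a Python illustration with a made-up objective function `cost` and made-up grid values. Note that CPython threads do not speed up CPU-bound pure-Python work because of the global interpreter lock; `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface when real parallelism is needed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def cost(x, y):
    """Hypothetical objective; stands in for the real grid-search cost."""
    return (x - 1.5) ** 2 + (y + 0.5) ** 2

def best_in_chunk(points):
    """Return the (cost, point) pair with the lowest cost in one chunk."""
    return min((cost(x, y), (x, y)) for x, y in points)

def grid_search(xs, ys, workers=10):
    """Split the grid into strided chunks, search each, and reduce the per-chunk minima."""
    grid = list(product(xs, ys))
    chunks = [grid[i::workers] for i in range(workers)]
    chunks = [c for c in chunks if c]          # drop empty chunks on tiny grids
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return min(pool.map(best_in_chunk, chunks))

xs = [i * 0.5 for i in range(-10, 11)]         # -5.0 .. 5.0 in steps of 0.5
ys = [i * 0.5 for i in range(-10, 11)]
best_cost, best_point = grid_search(xs, ys)
print(best_cost, best_point)
```

Each chunk is independent, so the only shared step is the final `min` over the per-chunk results.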
            • by pz ( 113803 )

              I've had similar luck staying within Matlab by using their profiler. Although I've used C callouts for some high-performance computations (like implementing a fast 2D histogram), I try and stay within Matlab whenever possible as mostly, not always, but mostly, the time spent optimizing a computation would far, far outweigh the time gained from a faster algorithm. If we know from the get-go that a given algorithm will be run many times, or is performance critical, it might be coded up in Matlab to prove co

          • Shame on Matlab for this, though. Self-growing arrays with better amortized overhead have been around for ages (see any decent C++ vector implementation, for instance).
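For anyone who has not seen the amortized-growth trick mentioned above, here is a minimal sketch. The class `GrowableArray` is a toy written for this illustration (it is not how Matlab or any particular C++ `vector` is implemented); it doubles its capacity when full and counts how many elements get copied, next to the count for a container that reallocates on every append.

```python
class GrowableArray:
    """Append-only array that doubles its capacity when full (toy illustration)."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._data = [None] * self._capacity
        self.copies = 0                      # elements moved during reallocations

    def append(self, value):
        if self._size == self._capacity:
            self._grow(2 * self._capacity)
        self._data[self._size] = value
        self._size += 1

    def _grow(self, new_capacity):
        new_data = [None] * new_capacity
        for i in range(self._size):          # copy existing elements into the new buffer
            new_data[i] = self._data[i]
        self.copies += self._size
        self._data = new_data
        self._capacity = new_capacity

    def __len__(self):
        return self._size

def copies_when_growing_by_one(n):
    """Element copies if every append reallocates to exactly size + 1."""
    return sum(range(n))                     # 0 + 1 + ... + (n - 1)

n = 1024
arr = GrowableArray()
for i in range(n):
    arr.append(i)
print(arr.copies, copies_when_growing_by_one(n))
```

With doubling, the total number of copies stays below 2n, so each append costs a constant amount on average; growing by one element at a time costs on the order of n squared copies in total.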
    • by Rufty ( 37223 )
      As to the cost, try octave [gnu.org] - mostly matlab-ish, but GPL.
  • Old news (Score:2, Insightful)

    by Anonymous Coward

    Note the "R2010b" version number. That means that this capability has been out since the second half of 2010.

    • Re:Old news (Score:4, Interesting)

      by guruevi ( 827432 ) on Monday May 23, 2011 @12:04PM (#36219088)

      I think they meant 2011b, which is not out yet (it's in beta). GPUmat as well as nVIDIA have had toolboxes for MATLAB for a while now (although the CUDA toolboxes require manual code edits and compiling to get them to work), but there is only limited function support (e.g. FFT on large arrays works wonders on CUDA), and even those functions had limited support (only single-precision floating point, for example). There is also the commercial Jacket from AccelerEyes.

    • by tomz16 ( 992375 )

      2010b did not include GPU array indexing support (among other things), making it fairly worthless for anything moderately complex.
      2011a DOES do indexing on GPU arrays. It works very well in my experience so far.

  • by Framboise ( 521772 ) on Monday May 23, 2011 @12:07PM (#36219114)

    There is competition from Jacket:
    http://www.accelereyes.com/products/jacket [accelereyes.com]
    This product is more expensive but more effective than Matlab. I tried the free trial and found it much more effective than Matlab. Alas, the cost is too high to justify Jacket in my case; I would rather buy more hardware instead.

    • by Anonymous Coward

      Jacket costs as much as a toolbox from Matlab (which is required to use their GPU stuff). I'm a Jacket user and am more than pleased with both performance and support. Jacket also supports more functions and is faster.

  • by Anonymous Coward

    who actually uses MATLAB for real HPC?

    • by Anonymous Coward

      who actually uses MATLAB for real HPC?

      Lots of people. I know for a fact that NASA's Columbia supercomputer has Matlab licenses. Moreover, for certain engineering applications, Matlab is the de facto standard (like control theory and certain areas in signal processing). Sure, you could write every solver, optimizer, and toolbox in standard C, C++, OpenMPI (a lot of control is just numerical optimization), but it would mean a lot of coding from ground up. Alternatively, you can get a bunch of specialized libraries, but then the administrator shoul

    • by guruevi ( 827432 )

      People that can't code for shit. It's pretty popular in the scientific community simply because it combines the simplicity (and noobicity of the coders) of PHP, Python or Ruby with high-level mathematical constructs.

      The thing is that for real coders it's actually harder because you're missing stuff like decent, inline evaluation of variables, loops and if/then/else constructs, evaluation of data types is hard to do, function overloading, regexp, greater-than-or-equal, and even the very basic of text evaluat

      • Seconded. I still remember some guy that ported his MATLAB finite element code to c++. The solving time for his problems went from 24 hours to 8 minutes -- 6 of which were the post-processing/display of the results... That was towards the end of his thesis, so basically he must have wasted countless months of productive work because of that.

        MATLAB is never the right tool -- unless you are really incompetent, so you need the hand-holding, and really obtuse, so you can't handle the small differences with Octa

    • The common process is for one applied mathematician to write the algorithm in Matlab, then 15 people to convert it to optimized C++/CUDA.

      Surely you can see the point in making Matlab faster, or for automated generation tools.

  • At first glance, I thought the subject said "Matlab integrates FPU support...". I was like, "Damn! I can break out the 486 DX [wikipedia.org] again!".
  • I have not used the software other than a few times for an assignment, but if I did not already know that it is not a particularly nice piece of software, I would be surprised that it took them this long to do the obvious.

  • For those interested in Python libraries, there are PyCuda [tician.de] and gnumpy [toronto.edu]. I have not used either; I'm still learning how to use parallel python [parallelpython.com].
  • by melonakos ( 948203 ) on Monday May 23, 2011 @12:58PM (#36219738)
    I'm CEO of AccelerEyes and have been submitting Slashdot articles referencing updates about using GPUs with MATLAB for several years now. It's great to see it finally getting through, albeit via a reference to the "fake" GPU support which the MathWorks threw into PCT in an attempt to curtail the great success we continue to have with Jacket.

    For a full explanation of why I say "fake", read, http://www.accelereyes.com/products/compare [accelereyes.com]

    For a brief explanation of why I say "fake" GPU support consider the question, what does supporting GPUs mean? If you can run an FFT are you content? Or do you want to use INV, SVD, EIG, RAND, and the list goes on and on. Jacket has 10X the functionality of PCT-GPU.

    Why else is the PCT-GPU implementation weak? Well, it is so poorly constructed (shoehorned into their legacy Java system), that it is rarely more beneficial to use the GPU than the CPU with the PCT-GPU implementation. It takes 600 cycles to load-then-store global memory on the GPU (required in each kernel call). The main innovation that led us to build Jacket is the ability to generate as few kernels as possible to eliminate as many 600 cycle roundtrip transfers as possible. For example, Jacket's runtime system may only launch one kernel for every 20 lines of code. PCT-GPU on the other hand is limited to launching a GPU kernel for every basic function call.

    Jacket also has a GFOR loop which is the only parallel FOR-loop for GPUs, http://wiki.accelereyes.com/wiki/index.php/GFOR_Usage [accelereyes.com]

    I'm not aware of any MATLAB programmer that has had a good experience with PCT-GPU.

    Finally, because I'm so thrilled at this getting slashdotted (despite it being a link promoting PCT-GPU), I'd be happy to offer free 3-month Jacket subscriptions to anyone who emails me in the next 48 hours with the word "slashdot" in the subject, at john.melonakos@accelereyes.com

    Cheers!

    PS: Roblimo, if we can get some blurb love in your summary on the main slashdot.org page, it would really mean a ton to all our guys that have worked on this project for the last 4 years!
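The kernel-count argument in the comment above (one launch per basic function call versus one launch for a whole chain) is the general idea of fusing elementwise operations. The sketch below is plain Python on lists, not GPU code, and it does not model how Jacket or PCT actually work; the function names and the sample operations are invented here. It only shows why a fused chain touches the full array once instead of once per operation, using the 600-cycle roundtrip figure quoted in the comment as a rough per-pass cost.

```python
ROUNDTRIP_CYCLES = 600   # load-then-store cost per kernel, as quoted in the comment

def run_unfused(data, ops):
    """Apply each op as its own pass; every pass reads and writes the whole array."""
    passes = 0
    for op in ops:
        data = [op(x) for x in data]     # materializes a full intermediate array
        passes += 1
    return data, passes

def run_fused(data, ops):
    """Compose all ops into one function and make a single pass over the array."""
    def fused(x):
        for op in ops:
            x = op(x)                    # intermediates stay local to this element
        return x
    return [fused(x) for x in data], 1

ops = [lambda x: x * 2.0, lambda x: x + 1.0, abs]    # assumed sample operations
data = [-3.0, 0.5, 4.0]

unfused_result, unfused_passes = run_unfused(data, ops)
fused_result, fused_passes = run_fused(data, ops)
print(unfused_result, unfused_passes * ROUNDTRIP_CYCLES)
print(fused_result, fused_passes * ROUNDTRIP_CYCLES)
```

Both versions compute the same values; only the number of trips through memory differs. Fusion of this kind only applies directly to elementwise chains, so it says nothing about functions such as FFT or SVD.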
    • great! (Score:5, Insightful)

      by t2t10 ( 1909766 ) on Monday May 23, 2011 @01:48PM (#36220412)

      Now we have a vendor of an overpriced add-on battling it out with the vendor of the mother of all overpriced and badly designed pieces of scientific software. As someone who actually uses numerical scripting languages, let me tell you: I'm not impressed.

      My guess is that within a year or two, there will be better open-source alternatives to Jacket, just like there are better open source alternatives to MATLAB already. I'll just wait, thank you very much.

      • within a year or two

        That's just it. We've been in the world of parallelization for years now, but relatively few open source developers have innovated or even ported for performance. Why? Because such performance gain is a luxury. You pay more for luxuries, that's just a fact of life.

        Nevertheless, your timeframe sounds decent, but only as more varied and more open tools become available to support parallel implementations.

        • by t2t10 ( 1909766 )

          That's just it. We've been in the world of parallelization for years now, but relatively few open source developers have innovated or even ported for performance.

          A lot of parallel processing that's coming out today commercially was pioneered by open source projects years ago. OMP and distributed computing are widely used.

          On GPU computing, the speedups are barely worth it today unless you really hand-optimize your application for parallelization; you're not going to get a lot of speedups with Jacket on real

          • Hand optimization is really tough. Try, for instance, to beat Jacket's CONV2 (yes, I'm talking about straight-up convolutions) by hand. If you can do that, I will expend all my energies to get you to drop whatever else you are doing and join us at AccelerEyes :)

            Jacket is meant to be a luxury as was mentioned elsewhere... providing a faster, better approach to what you could try to reinvent by hand if you had infinite energy.

            The Canny Edge benchmark is a full blown application (of which Canny Edge detection
            • by t2t10 ( 1909766 )

              Hand optimization is really tough. Try for instance to beat Jacket's CONV2 (yes, I'm talking about straightup convolutions) by hand.

              CONV2 is one of the most trivial cases for GPU programming; if you didn't screw up badly, I can't beat your code, but you couldn't beat mine either.

              Jacket is meant to be a luxury as was mentioned elsewhere... providing a faster, better approach to what you could try to reinvent by hand if you had infinite energy.

              Most people shouldn't be using GPU programming at all because it i

              • I don't agree with either statement:

                1) Expert convolutions on the GPU (that work well for both separable/non-separable cases, arbitrary input matrix sizes, and arbitrary kernel sizes) are extremely difficult. I don't think you can beat our implementation. If you can, I will try to entice you away from other pursuits in life.

                2) CONV2 (i.e. convolutions) are very useful in many applications and often make more sense than pursuing some sort of other arithmetic expression. I do agree with your statement
                • by t2t10 ( 1909766 )

                  CONV2 (i.e. convolutions) are very useful in many applications

                  Your error is with the "i.e." part. Convolutions are very useful, but CONV2 is almost never the right function to call. Most convolutions are separable. Those that aren't can usually be made separable. If you're really stuck with a non-separable large 2D convolution, you can use 2D FFT in some cases. And if you have a non-separable small 2D convolution, there's usually some other known trick you can use to speed it up. Anybody who has any b

                  • CONV and FILTER2 both call CONV2 in MATLAB
                    • by t2t10 ( 1909766 )

                      So? For you to present benchmarks of a CONV2 that detects separability against the built-in CONV2 that explicitly does not use separability is dishonest, because much of the speedup you measure has nothing to do with GPU computing.
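The separability point in this exchange can be checked directly. A 2D kernel is separable when it is the outer product of a column vector and a row vector, and in that case two 1D passes give the same result as the full 2D convolution with fewer multiplications per pixel (roughly kh + kw instead of kh * kw). The sketch below is plain Python on nested lists with "full" output size; the function names and the small test image are made up for illustration, and it says nothing about how MATLAB's or Jacket's CONV2 is implemented.

```python
def conv1_full(signal, taps):
    """Full 1D convolution of two sequences."""
    out = [0.0] * (len(signal) + len(taps) - 1)
    for i, s in enumerate(signal):
        for k, t in enumerate(taps):
            out[i + k] += s * t
    return out

def conv2_full(image, kernel):
    """Direct full 2D convolution: kh * kw multiplications per input pixel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw + kw - 1) for _ in range(ih + kh - 1)]
    for i in range(ih):
        for j in range(iw):
            for a in range(kh):
                for b in range(kw):
                    out[i + a][j + b] += image[i][j] * kernel[a][b]
    return out

def conv2_separable(image, col, row):
    """Same result for kernel = outer(col, row), computed as two 1D passes."""
    rows_done = [conv1_full(r, row) for r in image]                  # horizontal pass
    columns = [conv1_full(list(c), col) for c in zip(*rows_done)]    # vertical pass
    return [list(r) for r in zip(*columns)]                          # transpose back

col, row = [1, 2, 1], [1, 0, -1]                  # Sobel kernel as an outer product
kernel = [[c * r for r in row] for c in col]
image = [[1, 2, 0, 3, 1],
         [4, 0, 2, 1, 5],
         [0, 3, 3, 2, 1],
         [2, 1, 0, 4, 2]]

print(conv2_separable(image, col, row) == conv2_full(image, kernel))
```

Because the two routines do different amounts of arithmetic, timing one against the other measures the algorithmic difference as well as any hardware difference, which is the objection raised above.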

      • Isn't that always the case? There is a slight demand for something (easy GPU programming at the Matlab/Octave level), and a company starts up to offer that service. Over time, if there is enough general demand, people start putting code snippets into the FS community, and it becomes a project, and over a 2-year period it becomes usable. But in the intervening period a company is trying to make money on this recent need that will become almost commonplace in a few years.

        As far as matlab vs octave- There are still som

        • by t2t10 ( 1909766 )

          There is no "real need" for GPU computing yet because for most people, it's not cost effective: the speedups are modest at best, and you only get them if you know what you're doing (in which case you wouldn't be using these tools). Get yourself a multicore machine and use OMP and your code is likely going to run faster with less effort.

          Open source developers will tackle GPU computing in scripting languages when it makes sense to do so. That's not because they need commercial leadership or leaks of "code s

      • To address the topic of open source:

        People have been saying that open source would swamp Jacket since we launched in 2007. The reality is that it is too stinking hard to build good stuff open source (i.e. where the developers aren't paid), when there isn't an enormous user community to fuel the effort in intangible benefits back to the contributors. Otherwise, we'd open source Jacket and try to live off the service contracts like every other open source project.

        So we end up pricing the software inli
        • by t2t10 ( 1909766 )

          The reality is that it is too stinking hard to build good stuff open source (i.e. where the developers aren't paid)

          Most open source developers I know are paid and the stuff they produce has wiped away pretty much anything commercial and proprietary in any area where they have developed it.

          The reality is that GPU computing barely makes sense today, and it certainly didn't make sense in 2007. And it may just be another fad, taken over again by general purpose CPUs, just like the last few times.

          While GPU comp

      • My guess is that within a year or two, there will be better open-source alternatives to Jacket, just like there are better open source alternatives to MATLAB already. I'll just wait, thank you very much.

        I don't dispute that there are alternatives to Matlab, but "better" is still premature in my opinion. Over the past year I had an interest in removing my work's dependence on proprietary software, so I have researched the Matlab alternatives, and I have even been using Python for some of my work (instrument

        • by t2t10 ( 1909766 )

          I would say that open source has a way to go before reaching Matlab's level

          The first thing you should do is stop thinking of it as "levels". Matlab has a few packages that you can't easily get for Python, and it has Simulink. There are many other areas where Matlab isn't even remotely close to Python's level. The two are, as the technical term goes, "incomparable".

          One of the usual problems with open source which is true here is fragmentation

          Well, closed source is even more fragmented! There's Matlab, Ma

    • In your benchmarks, you list "1.26 hours" for Canny edge detection on a 4 Mpixel image in Matlab without GPU computing, and you miraculously speed that up to 8 seconds using your GPU tools:

      http://www.accelereyes.com/products/benchmarks [accelereyes.com]

      On my three year old desktop, using just 1 CPU from a Core 2 Duo, I can do Canny edge detection on a 4 Mpixel PGM image in about 1.7 seconds with straightforward C code (no pointer tricks), including I/O, parsing the PGM, and god knows what else. It's about the same in Pytho

      • Please see my explanation to your other comment on this above. Thanks for pointing out that I need to get our marketing guys to post more information to avoid this confusion. Also, we ship a dozen examples with Jacket that you can run to get code and back-to-back comparisons. Hope that helps.
      • Ah, and I should add that for the Python community there is libJacket which will go to v1.0 on Jun 1st. If you want to get early beta access to our Python stuff, email me (email address in my big post above).
  • The article talks about R2010b, which isn't out yet. R2010a (which *is* out) supports parallel processing pretty well (I use it constantly), but not exactly "natively": you have to pay extra for an option called the "Parallel Computing Toolbox", which also gives you sweet stuff like multicore, HPC and so on.
  • For those with a bent toward Mathematica, GPU computing is baked into Version 8.
    There's more information at http://reference.wolfram.com/mathematica/guide/GPUComputing.html [wolfram.com]

    In the spirit of full disclosure, I'm solely a long-time user, not a Wolfram employee.

  • MAGMA (Score:4, Interesting)

    by l00sr ( 266426 ) on Monday May 23, 2011 @02:05PM (#36220614)

    For those interested in an open-source alternative, there's MAGMA [utk.edu], which provides a bunch of linear algebra routines implemented in CUDA. I haven't tried it myself yet, but it looks promising.

  • Interesting. I wonder if the GPU could be used to perform functions on large sets in a constant database.

    • If you can post some quick code you have in mind, I'll let you know how it might perform using GPUs in MATLAB.
      • I'm not sure if you're joking with "quick code", but... Below is a snippet of code from the testbed of my app. It doesn't have the output tied to the map of lists in it but it is small enough you can see what is going on. A quick description: it reads packets off the interface. It orders information so it can be inserted into a mapsource));
        dstPort = to_string(ntohs(tcp->dest));
        pktWin = ntohs(tcp->window);
        int flagArray[] = {ntohs(tcp->ack),ntohs(tcp->fin),ntohs(tcp->psh),ntohs(tcp->res1),nt

      • Oops, something went wrong. I don't think it is going to jive with /. The post part of the reply should have been:
        I'm not sure if you're joking with "quick code", but... Below is a snippet of code from the testbed of my app. It doesn't have the output tied to the map of lists in it but it is small enough you can see what is going on. A quick description: it reads packets off the interface. It orders information so it can be inserted into a map of lists called connections. I update the map with the packet in
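The mangled fragment above appears to pull the destination port, window size, and flag bits out of a TCP header and convert them from network byte order with `ntohs`. The poster's actual code did not survive, so the following is only a guess at the general idea, written in Python with the standard `struct` module rather than the original C++. The function name, the returned dictionary layout, and the sample header are all invented for this sketch.

```python
import struct

# Flag bit positions in the low byte of the offset/flags field of a TCP header.
TCP_FLAGS = {"fin": 0x01, "syn": 0x02, "rst": 0x04, "psh": 0x08, "ack": 0x10, "urg": 0x20}

def parse_tcp_header(raw):
    """Decode the fixed 20-byte TCP header; '!' selects network (big-endian) byte order."""
    (src, dst, seq, ack_no, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    flags = {name: bool(off_flags & bit) for name, bit in TCP_FLAGS.items()}
    return {"src_port": src, "dst_port": dst, "window": window, "flags": flags}

# Hand-built SYN+ACK header used only as demo input (data offset 5, flags 0x12).
sample = struct.pack("!HHIIHHHH", 443, 51234, 1000, 2000, (5 << 12) | 0x12, 65535, 0, 0)
info = parse_tcp_header(sample)
print(info)
```

Per-packet field extraction like this is mostly branching and bookkeeping on small records, which is a different shape of work from the large array operations discussed in the rest of the thread.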

  • This means loads of Matlab commands can be parallelized onto the GPU without having to re-code things in C++ or Fortran using CUDA

    But you will have to re-code soon when a new version of Matlab is released and functions have changed over and over again! Yes, talking from personal experience...

  • There are tons of other CUDA accelerated numerical packages besides Matlab -- Mathematica, LabView, plugins / wrappers / libraries for Python, R, IDL. Some of these are linked from NVIDIA's website
    http://www.nvidia.com/object/numerical-packages.html [nvidia.com]

    Others from
    http://www.nvidia.com/object/data_mining_analytics_database.html [nvidia.com]

  • ANSYS is also going to be able to use GPUs for parallel processing. The crappy part is they charge you 1500 bucks for each HPC license for each processor.
