Researcher Shows How GPUs Make Terrific Network Monitors
alphadogg writes "A network researcher at the U.S. Department of Energy's Fermi National Accelerator Laboratory has found a potential new use for graphics processing units — capturing data about network traffic in real time. GPU-based network monitors could be uniquely qualified to keep pace with all the traffic flowing through networks running at 10Gbps or more, said Fermilab's Wenji Wu. Wenji presented his work as part of a poster series of new research at the SC 2013 supercomputing conference this week in Denver."
That's it? (Score:5, Informative)
So in violation of /. convention, I went ahead and read TFA in hopes that there would actually be something more than "we solved yet another parallel computing problem with GPUs." Nope, nothing. Not even some useless eye candy of a graph showing two columns of before/after processing times.
And the article just *had* to be split into two pages because it would have killed them to include that tiny boilerplate footer on page one. What a fail...at least it wasn't a blatant slashvertisement!
Re:That's it? (Score:5, Funny)
CPU: X
GPU: XXXXXXXXXXXXXXXXX
Re: (Score:1)
I don't think you understand how hyperthreading works.
It doesn't let the CPU do twice as many tasks simultaneously; it just lets it work on another task when it would otherwise be idle waiting for some other hardware to do something.
Re: (Score:2)
That isn't a real world workload. I typically see 10-30% increase from hyperthreading, which isn't nothing, but it's not a 100% speed bump either.
Re: (Score:1)
A speedup of 17x over a single core. Using an 8 core Xeon (16 threads with hyperthreading) would give a similar speedup.
From the two-page report: "When compared to a 6-core CPU (m-cpu), the speedup ratios range from 1.54 to 3.20."
So yeah, the 17x figure is misleading: what network monitoring load would run on a single core?
Re:That's it? (Score:5, Informative)
Yeah, but with this kind of application the real bottleneck is that the discrete GPU has to access data through the high-latency, low-bandwidth PCIe bus. For this kind of workload, an IGP, even with its lower core count, is often a much better solution, unless you manage to fully cover the host-device-host transfers with computation.
I'd be really curious to see this thing done in OpenCL on a recent AMD APU, exploiting all the CPU cores and the IGP cores concurrently.
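For anyone wondering what "covering the transfers" looks like in practice, here's a rough CUDA sketch (purely illustrative; the buffer names, chunk sizes and launch parameters are made up and have nothing to do with the Fermilab code): split the capture buffer into chunks and overlap the async host-to-device copies with the processing kernel across a few streams.

#include <cuda_runtime.h>

// Hypothetical per-chunk analysis kernel; stands in for whatever per-packet
// work the monitor actually does.
__global__ void process_chunk(const unsigned char *pkts, int nbytes) {
    // ... per-packet work would go here ...
}

// Overlap H2D copies with kernel execution using a small pool of streams.
// host_buf must be pinned (cudaMallocHost / cudaHostRegister) or the async
// copies quietly stop overlapping with the kernels.
void monitor(const unsigned char *host_buf, size_t total, size_t chunk) {
    const int NSTREAMS = 4;
    cudaStream_t streams[NSTREAMS];
    unsigned char *dev_buf[NSTREAMS];
    for (int i = 0; i < NSTREAMS; i++) {
        cudaStreamCreate(&streams[i]);
        cudaMalloc((void **)&dev_buf[i], chunk);
    }
    size_t s = 0;
    for (size_t off = 0; off < total; off += chunk, s = (s + 1) % NSTREAMS) {
        size_t n = (total - off < chunk) ? (total - off) : chunk;
        cudaMemcpyAsync(dev_buf[s], host_buf + off, n,
                        cudaMemcpyHostToDevice, streams[s]);
        process_chunk<<<64, 256, 0, streams[s]>>>(dev_buf[s], (int)n);
    }
    cudaDeviceSynchronize();
    for (int i = 0; i < NSTREAMS; i++) {
        cudaFree(dev_buf[i]);
        cudaStreamDestroy(streams[i]);
    }
}

On an APU/IGP with shared memory you'd skip the copies entirely, which is the whole point.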
Re: That's it? (Score:2)
A Xeon also has SIMD, which gives a speed-up of 8.
Re:That's it? (Score:5, Informative)
I saw this poster at the conference and I was not impressed; in fact it was one of the weaker posters I saw there (it was light on details, and some of the information about GPUs in general was not entirely accurate). It is really a poster that should not have been at SC at all. While it is interesting in the networking sense, the amount of data they can process is nowhere close to what actually flows through these large-scale machines (up to 10 GB/sec per node), and there was no information about scaling this data collection (which would be needed at extreme scales) to obtain meaningful information for tuning network performance.
This poster should have been at a networking conference, where the results would have been much more interesting to the crowd attending. Also of note: IIRC the author was using a traditional GPU programming model that is not efficient for this style of computation. The speedup numbers would have been greatly improved by using an RPC-style model of programming the GPU (a persistent kernel with tasking from pinned pages). However, I don't totally fault the author for not using it, since it is a rather obscure programming technique for GPUs at this time.
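For the curious, the persistent-kernel approach looks roughly like the following (a from-memory sketch of the general technique, not the poster's code): keep one long-lived kernel resident on the GPU and hand it work through flags in pinned, host-mapped memory, so you never pay a kernel-launch round trip per packet batch.

#include <cuda_runtime.h>

// One long-running block that polls a flag in host-mapped (zero-copy) memory.
// The host bumps *work_flag for each new batch; the kernel acks via *done_flag
// and exits when it sees -1. A real monitor would use many blocks; one block
// keeps the synchronization in this sketch simple.
__global__ void persistent_worker(volatile int *work_flag,
                                  volatile int *done_flag,
                                  const unsigned char *pkts) {
    __shared__ int batch;
    while (true) {
        if (threadIdx.x == 0) {
            do { batch = *work_flag; } while (batch == *done_flag);
        }
        __syncthreads();
        if (batch < 0) return;                 // host asked us to shut down
        // ... all threads process packet batch `batch` from pkts here ...
        __threadfence_system();                // flush results before acking
        if (threadIdx.x == 0) *done_flag = batch;
        __syncthreads();
    }
}

int main() {
    int *work_flag, *done_flag;   // pinned pages, visible to host and device
    int *d_work, *d_done;         // device-side views of the same pages
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaHostAlloc((void **)&work_flag, sizeof(int), cudaHostAllocMapped);
    cudaHostAlloc((void **)&done_flag, sizeof(int), cudaHostAllocMapped);
    cudaHostGetDevicePointer((void **)&d_work, work_flag, 0);
    cudaHostGetDevicePointer((void **)&d_done, done_flag, 0);
    *work_flag = 0; *done_flag = 0;
    persistent_worker<<<1, 128>>>(d_work, d_done, NULL);
    // Real code would bump *work_flag as each captured batch lands in pinned
    // memory and wait for *done_flag to catch up; here we just shut down.
    *work_flag = -1;
    cudaDeviceSynchronize();
    return 0;
}

The catch is that every poll of *work_flag is a read over PCIe, so you want to batch the work aggressively.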
Re: (Score:2)
However, I don't totally fault the author for not using it, since it is a rather obscure programming technique for GPUs at this time.
Good point. I guess this will change once Kepler GPUs are widely adopted and CUDA 6.0 is published: with Kepler you can spawn kernels from within the GPU, and unified virtual addressing will make it easier to push complex data structures into the GPU (according to the poster, there appears to be some preprocessing happening on the CPU).
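Something like this, for the dynamic-parallelism part (an illustrative sketch only; it needs an sm_35+ card and nvcc -rdc=true, and the kernel names are made up):

#include <cuda_runtime.h>

// Hypothetical follow-up kernel, one thread per flow.
__global__ void analyze_flows(int nflows) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < nflows) {
        // ... per-flow analysis ...
    }
}

// Parent kernel decides on the device how much follow-up work there is and
// launches it directly, with no round trip back through the host.
__global__ void classify_batch(const int *flow_count) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        int nflows = *flow_count;
        if (nflows > 0)
            analyze_flows<<<(nflows + 255) / 256, 256>>>(nflows);
    }
}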
Re: (Score:1)
As a PhD in computer networking, I'll tell you: it would have been easier to publish at SC than at other reputable networking conferences. This article, to me, is non-news.
Re: (Score:2)
If there is one thing you can do on an ASIC, it's parallelisation. Application-specific cores are small, very small; standard multi-project-wafer run technologies have a good number of metal layers, so routing isn't too problematic, etc. So you can actually fit a whole lot of cores into a small silicon area in a modern technology. The main issue is the cost of the hard
The Poster PDF (Score:1)
http://sc13.supercomputing.org/sites/default/files/PostersArchive/post161.html [supercomputing.org]
not new (Score:3)
The NSA already does this; how else do you think they process all that data?
Re: (Score:1)
millions of gnomes!
Re: (Score:1)
Field programmable gnome arrays are going to be the next big thing in computing.
Re: (Score:3)
Fuck that. If I have to go into a field to program the gnome arrays, I'm not doing it.
Re: (Score:2)
Thus the old system was abandoned, and Gnome3 was born.
Re: (Score:2)
Distributed cluster of lawn gnomes.
Re: (Score:2)
With a pinch of salt?
Re: (Score:2)
They don't "process" it.
"Processing" is bad, it's like "collecting".
They don't collect or process, they just "store" it. Nothing to worry about citizen. Move along, now.
wishful thinking (Score:1, Insightful)
"Compared to a single core CPU-based network monitor, the GPU-based system was able to speed performance by as much as 17 times"
Shouldn't "researchers" know better how to execute benchmarks in such a way that a comparison between a CPU and a GPU actually makes sense and is not misleading? Why didn't they compare it to a 12 or 16 core CPU to show that it is only marginally better and requires programming in OpenCL or CUDA? Why didn't they take a 2P system and show that it is actually performing worse? In tha
Re: (Score:1)
Shouldn't "researchers" know better how to execute benchmarks in such a way that a comparison between a CPU and a GPU actually makes sense and is not misleading?
If the goal is hard science, then that would make sense. But when the goal is to wow the press, grab attention, and whore in the media, then no... that would be the opposite of what you'd want.
Re: (Score:2)
Why didn't they compare it to a 12 or 16 core CPU to show that it is only marginally better and requires programming in OpenCL or CUDA?
"Compared to a six core CPU, the speed up from using a GPU was threefold." If the 12-core CPU is twice as fast, that's 1.5x, and for a 16-core, that's 1.12x.
Re: wishful thinking (Score:3)
In practice, most people who publish results of a new algorithm ported to a GPU don't have a version that is well optimized for the CPU, or aren't that good at optimization in the first place. I've had several cases where I could make the CPU version faster than their GPU version, despite their having claimed a 200x speed-up with the GPU.
If you have a fairly normal algorithm in terms of data access and your speed-up is bigger than 4, you're probably doing it wrong.
Re: (Score:2)
Yeah, so you'd be buying a CPU?
Msg me when someone writes an OS for it.
Pyramid3D? Blast from the past?
Talk URL (Score:2)
Here is a URL to a presentation [fnal.gov] on the issue of GPU-Based Network Monitoring.
BTW, with PF_RING and a DMA-enabled NIC driver (PF_RING DNA [ntop.org]), one should have no problem capturing 10 Gbps on a single-CPU modern server. I can capture/play back 4.5 Gbps without trouble using this setup with four 10kRPM HDDs; eight drives should give you capture/playback at the full 10 Gbps rate.
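The capture side is only a handful of calls. Roughly (a from-memory sketch of the libpfring C API, so treat the exact flags and signatures as approximate; the device name is just an example):

#include <pfring.h>
#include <stdio.h>

int main(void) {
    /* Open the interface in promiscuous mode with a 1536-byte snaplen. */
    pfring *ring = pfring_open("eth3", 1536, PF_RING_PROMISC);
    if (ring == NULL) { perror("pfring_open"); return 1; }
    pfring_set_application_name(ring, "capture-demo");
    pfring_enable_ring(ring);

    struct pfring_pkthdr hdr;
    u_char *pkt;
    for (;;) {
        /* buffer_len == 0 asks for a pointer into the ring (no copy). */
        if (pfring_recv(ring, &pkt, 0, &hdr, 1 /* block */) > 0) {
            /* hand (pkt, hdr.caplen) to the analysis/writer stage here */
        }
    }
    pfring_close(ring);
    return 0;
}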
Re: (Score:2)
You can't buy a custom ASIC off the shelf at Fry's, but you can buy a CPU or a GPU. I don't think it's an apples-to-apples comparison if you throw in custom hardware.
Link to the full paper (Score:1)
http://lss.fnal.gov/archive/2013/conf/fermilab-conf-13-035-cd.pdf [fnal.gov]
They're using M2070 (Fermi) GPUs. Kepler would perform even better, the latest one has > 6GB of memory.
Sorry, but ... (wrong tool for the job) (Score:3)
It's like saying that GPUs are "terrific" for Bitcoin mining, until you realize that they need one or more orders of magnitude more power than specialized hardware for the same amount of processing. And network monitoring is probably a common enough task that it's worthwhile to use hardware tailored to this particular job.
In Soviet Russia, your TV watches YOU! (Score:2)
As I understand it, there are at least two purposes for monitoring a network: debugging and spying. I believe the debugging side is already built in. But spying is a concern, especially since the Russian authorities have required ISPs to preserve ALL data traffic on their networks for 12 hours for further investigation. What about the NSA?
Breaking news (Score:2)
A massively parallel system is suited to massively parallel tasks.