
Could New Linux Code Cut Data Center Energy Use By 30%? (datacenterdynamics.com)

Two computer scientists at the University of Waterloo in Canada believe changing 30 lines of code in Linux "could cut energy use at some data centers by up to 30 percent," according to the site Data Centre Dynamics.

It's the code that processes packets of network traffic, and Linux "is the most widely used OS for data center servers," according to the article: The team tested their solution's effectiveness and submitted it to Linux for consideration, and the code was published this month as part of Linux's newest kernel, release version 6.13. "All these big companies — Amazon, Google, Meta — use Linux in some capacity, but they're very picky about how they decide to use it," said Martin Karsten [professor of Computer Science in the University of Waterloo's Math Faculty]. "If they choose to 'switch on' our method in their data centers, it could save gigawatt hours of energy worldwide. Almost every single service request that happens on the Internet could be positively affected by this."

The University of Waterloo is building a green computer server room as part of its new mathematics building, and Karsten believes sustainability research must be a priority for computer scientists. "We all have a part to play in building a greener future," he said. The Linux Foundation, which oversees the development of the Linux OS, is a founding member of the Green Software Foundation, an organization set up to look at ways of developing "green software" — code that reduces energy consumption.

Karsten "teamed up with Joe Damato, distinguished engineer at Fastly" to develop the 30 lines of code, according to an announcement from the university. "The Linux kernel code addition developed by Karsten and Damato was based on research published in ACM SIGMETRICS Performance Evaluation Review" (by Karsten and grad student Peter Cai).

Their paper "reviews the performance characteristics of network stack processing for communication-heavy server applications," devising an "indirect methodology" to "identify and quantify the direct and indirect costs of asynchronous hardware interrupt requests (IRQ) as a major source of overhead...

"Based on these findings, a small modification of a vanilla Linux system is devised that improves the efficiency and performance of traditional kernel-based networking significantly, resulting in up to 45% increased throughput..."
  • by ozduo ( 2043408 ) on Saturday January 25, 2025 @07:51PM (#65118583)
    We will have world peace, eliminate hunger and disease, and all geeks would look handsome
    • More details (Score:5, Informative)

      by Kernel Kurtz ( 182424 ) on Saturday January 25, 2025 @07:56PM (#65118595)
      • by Anonymous Coward

        THANK YOU for some actual fucking technical details, sorely missing from the posting.

    • by 2TecTom ( 311314 )

      We will have world peace, eliminate hunger and disease, and all geeks would look handsome

      If we lived in a Linux-philosophy world, we would have all those and more.

      • by haruchai ( 17472 )

        "and all geeks would look handsome"
        Linus Torvalds is perhaps the most famous legit geek (or nerd?) of the past several decades, and the most he could hope for in Hollywood is to be an understudy for the Penguin, which seems fitting.

    • by RightwingNutjob ( 1302813 ) on Saturday January 25, 2025 @11:12PM (#65118863)

      Also, "sudo make me a sandwich" would actually work.

    • But Linux has certainly made me more attractive$ to the ladies.

        It's not you that has become more attractive, just your pocketbook. Still, that's pretty much the number one criterion for a typical woman anyway, so I'm sure you are doing better than average. I hope you are :)

  • Do we have anyone here who could review their proposed code changes and say whether it's an actual optimisation, that is, a piece of code that increases performance without compromising functionality, or whether they are trying to corrupt the code to make it greener but slower or otherwise inferior to further their green agenda?

    • by SafeMode ( 11547 ) on Saturday January 25, 2025 @08:27PM (#65118659) Homepage

      There are two extremes for dealing with application-level network traffic: the normal interrupt-driven mode, which can interrupt applications during high load and lead to poor performance, and a busy-polling mode, which forces the app to burn 100% CPU to ensure it doesn't get interrupted.

      This patch lets an app do both, based on the traffic load it's handling: idle/low traffic stays in the normal mode, and under high traffic it behaves like the busy-polling mode, switching dynamically (the sketch after this comment shows the rough shape of that loop).

      Sounds like a good thing. The advertised savings depend on how often the existing static configs are the wrong fit for the current load.

      At least that's what I got from the article.
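
      A minimal sketch of that hybrid shape as it might look from plain userspace, using only standard epoll calls (the real patch does the equivalent inside the kernel, so this is an analogy rather than the submitted code):

        #include <sys/epoll.h>

        #define MAX_EVENTS 64

        /* App-specific work; stubbed here so the sketch is self-contained. */
        static void handle_batch(struct epoll_event *ev, int n)
        {
            (void)ev; (void)n;
        }

        /* Block (interrupt-driven) when idle; re-poll with a zero timeout
         * while traffic keeps arriving, which approximates busy polling. */
        void event_loop(int epfd)
        {
            struct epoll_event events[MAX_EVENTS];
            for (;;) {
                /* idle path: sleep until work arrives */
                int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
                while (n > 0) {
                    handle_batch(events, n);
                    /* busy path: timeout 0 returns immediately if empty */
                    n = epoll_wait(epfd, events, MAX_EVENTS, 0);
                }
                /* queue drained: fall back to the blocking wait above */
            }
        }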

      • by LostMyBeaver ( 1226054 ) on Sunday January 26, 2025 @01:20AM (#65119001)
        In a high traffic environment, this change has no effect. The cluster I am on has a pretty stable 60Gb/s plus burst to 200Gb/s per node. Polled mode is the only sensible option in normal circumstances.

        Medium traffic environments which often idle but burst to a few hundred gigabits should use select or poll from the app when idle. Then switch to polled when traffic is bursting.

        Low traffic environments should just select/poll

        In circumstances where skilled engineers are involved, we would use virtual NICs, where each network thread has its own NIC and uses blocking reads with memory-mapped I/O, bypassing the kernel completely.

        Changes like this pull request are always welcome because they help where unskilled workers are operating the data centers. And this is very common, as computer science education is dead. But I would hope this change would be largely ineffective at Google.
        • by ndykman ( 659315 )

          The underlying hypervisor environment for big cloud providers wouldn't be affected, no question, but they are having to do some pretty complex stuff.

          But, for the servers that they host, a lot of them are stock boxes and making a tweak like this is a win there.

          The big win (and the paper this work is based on discusses this) is user-space networking stacks (as you noted, getting the kernel out of the way), but that's a really big change.


    • by martin-boundary ( 547041 ) on Saturday January 25, 2025 @08:29PM (#65118661)
      See the comment by Kernel Kurtz here [slashdot.org].

      A quick squizz through the abstract suggests the optimization occurs by deferring IRQs. In other words, the kernel in that case is prevented from doing timely maintenance. Assuming the implementation is safe (runaway buggy processes are still interrupted eventually), the qualitative behaviour of deferring maintenance interrupts would likely be that the system feels more sluggish and unresponsive, with catchup periods when a lot of kernel stuff needs to be performed at once, and more timeout issues.

    • by ndykman ( 659315 ) on Saturday January 25, 2025 @09:57PM (#65118781)

      It is a simple change and it does work better. Basically, it keeps the underlying IRQ for the network queue masked and advises the NIC driver not to raise further hardware interrupts. It then does kernel polling until there aren't any events of interest, and then unmasks the IRQ and sleeps.

      Now, this is tricky and you can't do this in general, but it does work well in this specific case and seems to be solid enough to be accepted into the mainline kernel.
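
      For reference, that mask/poll/unmask rhythm is the long-standing NAPI pattern; a rough driver-side sketch (the mynic_* helpers are hypothetical, and the 6.13 work changes when the unmask happens rather than inventing this structure):

        #include <linux/netdevice.h>
        #include <linux/interrupt.h>

        struct mynic { struct napi_struct napi; };  /* hypothetical driver state */

        static void mynic_mask_irqs(struct mynic *nic);      /* hypothetical */
        static void mynic_unmask_irqs(struct mynic *nic);    /* hypothetical */
        static int  mynic_rx_process(struct mynic *nic, int budget);

        static irqreturn_t mynic_irq(int irq, void *data)
        {
            struct mynic *nic = data;
            mynic_mask_irqs(nic);       /* silence the NIC... */
            napi_schedule(&nic->napi);  /* ...and defer work to the poll loop */
            return IRQ_HANDLED;
        }

        static int mynic_poll(struct napi_struct *napi, int budget)
        {
            struct mynic *nic = container_of(napi, struct mynic, napi);
            int done = mynic_rx_process(nic, budget); /* drain up to budget */

            /* Ran dry before the budget: stop polling and rearm the IRQ. */
            if (done < budget && napi_complete_done(napi, done))
                mynic_unmask_irqs(nic);
            return done;  /* done == budget tells the kernel to keep polling */
        }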

      • by davecb ( 6526 )
        That's a lot like character device drivers in CP/M, which would fill/clear their queues before returning. The queues were short.
  • Would this help in small, low-power Raspberry Pi systems? If so, it needs to be publicized to that community as well.
    • I don't think so. The efficiency gains they are targeting involve processing network packets in large batches. To do that you need a network card that buffers large numbers of packets, and you don't find those on a Pi.

      When you do have a card that buffers lots of packets you get a trade-off. You gain efficiency by waiting until a lot of packets have arrived and then processing them in one batch. But waiting for a lot of packets can take a long time when there is little traffic, which creates big latencies. Your weapons in this fight are IRQs, polling, packet counts and timeouts; the sketch after this comment shows the per-device knobs that steer them. You use timeouts and packet counts to intelligently choose when to use IRQs, and how often to poll. This patch introduces a new timeout.

      Finally, the headline 30% is under ideal test conditions. Nobody is likely to see anything like that in real-world scenarios; in any application that isn't a network appliance, I doubt the difference will even be noticeable.
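
      The pre-existing per-device knobs this trade-off is steered with are visible in sysfs; a small sketch that sets them (eth0 and the values are assumptions for illustration, and the 6.13 patch layers its new irq_suspend_timeout on top of this machinery):

        #include <stdio.h>

        /* Write one sysfs attribute; returns 0 on success. */
        static int write_knob(const char *path, const char *val)
        {
            FILE *f = fopen(path, "w");
            if (!f) { perror(path); return -1; }
            fputs(val, f);
            return fclose(f);
        }

        int main(void)
        {
            /* Let the kernel leave the NIC IRQ masked for up to 2 idle
             * NAPI polls... */
            write_knob("/sys/class/net/eth0/napi_defer_hard_irqs", "2");
            /* ...as long as this timer (nanoseconds) keeps the queue
             * being re-polled. */
            write_knob("/sys/class/net/eth0/gro_flush_timeout", "200000");
            return 0;
        }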

    • by SirSlud ( 67381 )

    The reason this is touted as good for data centers is that it's a very specific scalpel for the kinds of workloads datacenters and network hubs see: tons of traffic. Also, I'd like to add, saving 30% of the power going to a Raspberry Pi is like .. uh, who cares? This isn't about making things faster, it's about making computers use less electricity. I have yet to fret about the pennies my Pis are costing me per year.

  • by devslash0 ( 4203435 ) on Saturday January 25, 2025 @08:13PM (#65118635)

    If performance stays the same afterwards, those companies will implement the energy saving patches. Then, they will use any energy saved to perform even more computing operations and earn more money. What? Did you think they'd sit on all that unused capacity and do nothing about it? In the corporate world everything needs to be milked to its maximum potential.

    • by godrik ( 1287354 ) on Saturday January 25, 2025 @10:11PM (#65118799)

      Energy-reduction arguments are always a bit weird because, as you indicate, there is some form of induced demand.
      Though if you do decrease cost by 30%, maybe you don't add machines for a while because of the efficiency gains. So the gain comes from not adding capacity in the near future. Which is not quite the same, but probably still worthwhile.

      • So what I'm reading is I can save on my hardware budget for another year and take that savings as a management bonus? Asking for a friend.

      • by tlhIngan ( 30335 )

        Well, reducing energy usage is always good. Just don't tell the right about that, because somehow using more power is a good thing.

        But for the corporate balance sheet, if it saves 30% of energy usage, that's lowered electricity bills and lowered air-conditioning bills on top: cooling overhead is usually close to 1:1, so every watt saved at the server saves roughly another watt of cooling, about doubling the absolute savings. And what corporation doesn't like to save money?

        That's the immediate short term benefit.

        The longer term benefit is it may mean the servers have an incr

    • by Entrope ( 68843 )

      So what is wrong with being able to do 43% more work (1/0.7 ≈ 1.43) at the same energy cost?

    • Yes, someone will do bad things, but a lot of poor people will benefit from it too (like small companies or government networks running on old hardware, as here in Brazil).
    • by TeknoHog ( 164938 ) on Sunday January 26, 2025 @07:03PM (#65120485) Homepage Journal
      "In 1865, the English economist William Stanley Jevons observed that technological improvements that increased the efficiency of coal use led to the increased consumption of coal in a wide range of industries. He argued that, contrary to common intuition, technological progress could not be relied upon to reduce fuel consumption." https://en.wikipedia.org/wiki/... [wikipedia.org]
  • Hopefully not an idiotic question, but aside from software firewalls, how much do most servers use the kernel for networking versus hardware-specific offload? Even most firewalls I have worked with do most of their network operations in silicon.

    • Yeah I'm wondering if it's just a rewrite of some small part of iptables... but TFS appears to basically contain the entire "article", so there isn't any real information there.

    • by kamakazi ( 74641 ) on Saturday January 25, 2025 @08:48PM (#65118681)

      I guess I understood this differently: this is not the layer 1-3 stuff that happens in hardware, this is the application-layer stuff where userland gets the data from the network stack. It sounds at first glance like, when an interrupt occurs and the fetching of data starts, it changes to a mode where it just keeps fetching until the network stack runs out of data, at which point it reverts to interrupt-driven mode.
      Sort of the same philosophy of optimization as keeping HTTP connections alive and reusing them rather than tearing them down and building new ones with every request.
      Of course I may have misunderstood it in my superficial glance at the linked patch description; I am not a kernel hacker.
      It sounds to me like optimization by reducing redundant overhead, which is a great idea as long as the overhead you are reducing isn't necessary to prevent some other issue, like starving other processes of resources, and it sounds like this patch has implemented timeouts to take care of that.

  • by bloodhawk ( 813939 ) on Saturday January 25, 2025 @08:52PM (#65118699)
    30% is not a realistic saving here; it will be a fraction of that. Yes, under ideal conditions and perfect network-traffic scenarios (which almost never occur, and certainly not in the bulk of datacenters) you could save up to 30% in exchange for the latency trade-off.
    • by jsonn ( 792303 )
      I call BS on the claim that it is even that noticeable. Any semi-intelligent network driver already has interrupt moderation.
  • by SeaFox ( 739806 ) on Saturday January 25, 2025 @08:58PM (#65118707)

    This sounds like the same stupid spiel the petroleum industry runs: The answer to [environmental problem] is cutbacks in everything but the thing that really would have the biggest impact.

    • by haruchai ( 17472 )

      Cut the AI if they're so concerned with power

      Haven't you heard AI will DOUBLE our lifespans in a decade or 2?
      Do you really expect us to risk not being able to work another 90 years at whatever McJobs are left when the robots take over?
      All for the purpose of saving a few PWh?

      Wait....that doesn't sound right...never mind

      Anthropic CEO thinks AI could double human lifespan within a decade [pymnts.com]

    • by Anonymous Coward

      If you get 30% more networking performance for free, do you say no just because other applications are still energy hungry? When I looked into the comments, I was SURE somebody would come with "what about ... AI?!" even though the topic here is completely unrelated.

    • The industries kept telling us that copyright infringement is a crime, and that cryptocurrency mining is bad for the environment. Now their entire business is based on scraping other people's creations into data centers that put crypto farms to shame.
  • This seems like a great idea! Make all computing more energy efficient. New code analyzers (based on AI?) could be used to evaluate code. With just an update perhaps a substantial fraction of current energy use could be eliminated. I'm surprised this is not already a feature in code optimization. Compilers have speed optimization, code size optimization, now we need energy optimization! This may already exist in embedded systems compilers.
  • by Gabest ( 852807 )

    But they will be able to run 30% more hardware.

  • The answer is: No.

  • Submit a Pull Request and see if it's accepted or not. Don't grandstand. Thousands of commits per year go into Linux without fanfare.

  • Using real code instead of bloated slow interpreted crap like python could cut data center power usage by 50%
  • There were rumors going around that somebody had invented a special carburetor that would make your car get 200 mpg, but the oil companies bought the guy out and killed the product.

    This thing sounds like a conspiracy theory in the making!

"Being against torture ought to be sort of a multipartisan thing." -- Karl Lehenbauer, as amended by Jeff Daiell, a Libertarian

Working...