
Training a Single AI Model Can Emit As Much Carbon As Five Cars In Their Lifetimes

In a new paper, researchers at the University of Massachusetts, Amherst, performed a life cycle assessment for training several common large AI models. They found that the process can emit more than 626,000 pounds of carbon dioxide equivalent -- nearly five times the lifetime emissions of the average American car (and that includes manufacture of the car itself). MIT Technology Review reports: The researchers looked at four models in the field that have been responsible for the biggest leaps in performance: the Transformer, ELMo, BERT, and GPT-2. They trained each on a single GPU for up to a day to measure its power draw. They then used the number of training hours listed in the model's original papers to calculate the total energy consumed over the complete training process. That number was converted into pounds of carbon dioxide equivalent based on the average energy mix in the US, which closely matches the energy mix used by Amazon's AWS, the largest cloud services provider.

They found that the computational and environmental costs of training grew proportionally to model size and then exploded when additional tuning steps were used to increase the model's final accuracy. In particular, they found that a tuning process known as neural architecture search, which tries to optimize a model by incrementally tweaking a neural network's design through exhaustive trial and error, had extraordinarily high associated costs for little performance benefit. Without it, the most costly model, BERT, had a carbon footprint of roughly 1,400 pounds of carbon dioxide equivalent, close to a round-trip trans-American flight. What's more, the researchers note that the figures should only be considered as baselines.
Using a model they'd produced in a previous paper as a case study, the researchers "found that the process of building and testing a final paper-worthy model required training 4,789 models over a six-month period," the report states. "Converted to CO2 equivalent, it emitted more than 78,000 pounds and is likely representative of typical work in the field."
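As a rough illustration of the estimation method described above, here is a minimal sketch (this is not the study's code; the PUE and grid carbon-intensity constants are assumptions in the spirit of the paper's approach, and the example workload is made up):

```python
# Sketch of the estimate: measure average power draw during a short training run,
# scale by the training duration reported in the model's paper, then convert to
# CO2-equivalent using an average grid carbon intensity.

def co2e_lbs(avg_power_watts, training_hours, num_devices=1,
             pue=1.58, lbs_co2e_per_kwh=0.954):
    """Estimated training emissions in pounds of CO2 equivalent.

    pue: datacenter power usage effectiveness (overhead multiplier, assumed).
    lbs_co2e_per_kwh: average US grid carbon intensity (assumed).
    """
    kwh = avg_power_watts / 1000 * training_hours * num_devices * pue
    return kwh * lbs_co2e_per_kwh

# Hypothetical example: 64 accelerators drawing ~250 W each for 96 hours.
print(f"{co2e_lbs(250, 96, num_devices=64):,.0f} lbs CO2e")
```
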
  • by Anonymous Coward

    If it works for boats, it can work for cars too.

  • by jfdavis668 ( 1414919 ) on Thursday June 06, 2019 @10:10PM (#58722986)
    Designing energy-efficient cars takes a lot of energy, too, but the benefits come in the long run. If these trained AIs can save power through the work they do, that will likewise pay off in the long run.
    • by ShanghaiBill ( 739463 ) on Thursday June 06, 2019 @10:23PM (#58723024)

      Also, the researchers used a GPU and then extrapolated the energy consumption.

      But modern "Big AI" is not done with GPUs. They use dedicated ASICs such as TPUs [wikipedia.org], which are far more efficient.

      If they failed to use the correct hardware, then I suspect they failed to get many other things correct as well. Techniques such as batch normalization [wikipedia.org] can make a ten-fold difference in energy consumption.

      • Or for that matter a negative emissions vehicle. Then you'd have a negative ratio.

        As for TPU versus GPU, well, no. Some people use TPUs, but that's not everyone. Nvidia isn't selling GPUs for no reason at all. Moreover, some AI problems, like ones that aren't just matrix ops, just don't even work on GPUs, let alone TPUs.

        • by ShanghaiBill ( 739463 ) on Friday June 07, 2019 @01:31AM (#58723446)

          As for TPU versus GPU, well, no. Some people use TPUs, but that's not everyone. Nvidia isn't selling GPUs for no reason at all.

          Nearly everyone buys GPUs for graphics.

          Some people buy them for training small NNs, but those are not who TFA is talking about. They are talking about "Big AI" that uses an entire datacenter (thousands of servers) for days of training. Organizations doing that are not using GPUs. They are using ASICs specialized for massive amounts of low-precision matrix ops.

          Moreover, some AI problems, like ones that aren't just matrix ops, just don't even work on GPUs, let alone TPUs.

          An inference engine running Lisp isn't consuming significant power.

          • by Anonymous Coward

            The largest supercomputer in the world, Summit at Oak Ridge National Lab, is powered by over 27k Nvidia TESLA V100 GPUs.
            Those GPUs aren't being used for graphics.....

            https://www.olcf.ornl.gov/olcf-resources/compute-systems/summit/

      • by larryjoe ( 135075 ) on Friday June 07, 2019 @02:48AM (#58723602)

        But modern "Big AI" is not done with GPUs. They use dedicated ASICs such as TPUs [wikipedia.org], which are far more efficient.

        Since Google hasn't yet started selling physical TPUs, the only users are Google employees and Google Cloud users. Based on current Rightscale [forbes.com] numbers, AWS and Azure are each far more popular than Google Cloud. So, it would be very surprising if most training is not currently done with GPUs.

        As for power efficiency, a TPUv3 [nextplatform.com] chip gets around 90 FP16 TOPS for about 200W, while an Nvidia T4 [nvidia.com] gets 65 FP16 TOPS for around 70W. The numbers depend on how you count TOPS and watts and how the raw numbers translate to application-level performance, but it's very clear that TPUs are not obviously "far more efficient."
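
        Taking those spec-sheet numbers at face value (a crude sanity check, not measured application throughput):

        ```python
        # Nameplate efficiency from the figures quoted above (FP16 TOPS per watt).
        tpu_v3 = 90 / 200   # ~0.45 TOPS/W
        t4 = 65 / 70        # ~0.93 TOPS/W
        print(f"TPUv3: {tpu_v3:.2f} TOPS/W, T4: {t4:.2f} TOPS/W")
        ```

        By that crude measure the T4 actually comes out ahead per watt, which is the point: TPUs are not obviously "far more efficient."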

        • by ShanghaiBill ( 739463 ) on Friday June 07, 2019 @03:55AM (#58723734)

          Tesla has a matrix ASIC.

          Facebook is working on one [eetimes.com].

          Nvidia is also building its own TPU [forbes.com].

          Soon GPUs will go back to being for graphics, and every laptop and smartphone will have a dedicated low power matrix processor.

          • Soon GPUs will go back to being for graphics, and every laptop and smartphone will have a dedicated low power matrix processor.

            No, it won't. That processing can be done in the cloud. Most users don't need to train AI models at all. Maybe eventually, but not soon

            • Tensor processing still needs to be done to run applications after they've been trained. That can be done on a CPU, but that's orders of magnitude slower. So whether you need a dedicated TPU depends on the needs of the applications. For example, running the lc0 chess engine at a decent level (by computer chess standards) absolutely requires a GPU/TPU, as lc0 needs to analyse many thousands of positions a second.

            • Most users don't need to train AI models at all

              Most people don't need a computer with multiple cores

              Most users don't need more than 640k RAM

              Most people don't need a computer

              Most people don't need a calculator

              ...

              • Most of those things happened much more slowly for most people than they could have, because they didn't need to happen any quicker.

                Users will eventually have that hardware onboard, but not especially soon. It will have to be common enough to become a cheap commodity part before they will slap it into the SoC.

            • Soon GPUs will go back to being for graphics, and every laptop and smartphone will have a dedicated low power matrix processor.

              No, it won't. That processing can be done in the cloud. Most users don't need to train AI models at all. Maybe eventually, but not soon

              Training personalized models in the cloud means shipping all the data needed to train them to the cloud, which raises privacy concerns. I expect to see lots of cases where model training is moved down to the end-user device so the data can be kept there.

      • by serviscope_minor ( 664417 ) on Friday June 07, 2019 @04:39AM (#58723862) Journal

        Also, the researchers used a GPU and then extrapolated the energy consumption.

        Yep. Big training is done on huge clusters. You can pretty easily multiply the power consumption by the number of machines and the time spent.

        But modern "Big AI" is not done with GPUs.

        No, it really is.

        They use dedicated ASICs such as TPUs, which are far more efficient.

        No, mostly they use GPUs. You've clearly spent far more time reading articles about the tech than you have, say, running PyTorch. ASICs or dedicated logic blocks are currently used mostly on the inference end rather than the training end. On the training end, people use Nvidia 1080Tis or RTX2080s. Google has some TPU stuff for training, but I've not met anyone outside of Google who actually uses it.

        Techniques such as batch normalization can make a ten-fold difference in energy consumption.

        Everyone already uses that between pretty much every layer.

    • Precisely! The impact of conventional automobiles is always negative, but an AI can come up with solutions that save untold amounts of emissions. For example, a cancer treatment discovered by an AI that reduces the number of treatments a patient must undergo could save many millions of miles of driving. It's like having a bar in the neighborhood: there is no way to calculate the good the bar does, but it is easy to see the harm the bar does.
    • by JaredOfEuropa ( 526365 ) on Friday June 07, 2019 @05:05AM (#58723934) Journal
      Not to mention the fact that once the AI is trained, it can be copied an infinite number of times with pretty much zero additional emissions.
    • Yeah, how many paper-worthy models are there? 100? 200 a year? Oh no, we added the CO2 equivalent of a whopping 1,000 cars. Meanwhile those paper-worthy models are providing valuable functionality to 7 billion people.

      This sort of hand-wringing really is one of the worst aspects of modern society. We are paralyzed on innovation because every incremental intermediate step is so heavily criticized. "Electric cars aren't carbon zero!" "Solar panels take 5 years to pay themselves off!" "Self-driving c

  • It doesn't mean what most think it means.

    For starters, synonyms for ARTIFICIAL include: fake, imitation, mock, ersatz, faux.

    Machine learning is a milestone. But intelligent, it is not.

    • These days AI just means, "we didn't have enough money to build an expert system, so we trained a mediocre model, and we didn't have enough money to build an ideal data-set so we used real-world data."

      • by rtb61 ( 674572 )

        AI is a trial-and-error system: change parameters, see whether the result is better or worse than the last run. The problem is false paths, which require restarting the process at an earlier stage with a rejected set of parameters. Lots and lots of trial and error. The problem being, the coders are only good at coding and not good at solving the problem assigned to the AI. If they were, they would establish better parameters and carefully monitor the process to reject long-term bad outcomes. Also it depends upon how far you break down

        • Bad AI reflects bad human intelligence behind the AI.

          Solution is to make AI to help us design AI.

        • AI is a trial-and-error system: change parameters, see whether the result is better or worse than the last run. The problem is false paths, which require restarting the process at an earlier stage with a rejected set of parameters. Lots and lots of trial and error.

          So is human intelligence. As a parent what you have described sounds exactly like raising a human child.

          Bad AI reflects bad human intelligence behind the AI.

          This doubly so. Just substitute child for AI.

    • by lgw ( 121541 )

      AI means "machine learning" as commonly used. I don't think it means what you think it means.

      • AI means "machine learning" as commonly used. I don't think it means what you think it means.

        Well put! At least the first half. But sorry. ML means machine learning, as commonly used. AI refers to the phrase Artificial Intelligence.

        But your 50% accuracy is notable for this field, considering our iteration count. Oh, you were being serious.

        So then what's next SAI (Strong AI)? Then RSAI (Real Strong AI)? IRMITTRSAI (I Really Mean It This Time - Real Strong AI)?

        Thanks for making my point.

        • by lgw ( 121541 )

          I get what you want "AI" to mean, but that battle is as lost as the battle for "hacker" and "cyber". Yes, there will inevitably be a language treadmill for "real AI", though the terms "machine intelligence" and "self-aware machine" have been long used for that. After all, it would be no more artificial than our own intelligence.

  • This is something that could be done on Mars... why are we doing it on Earth?
  • It's the only way :|
  • by Brett Buck ( 811747 ) on Thursday June 06, 2019 @10:31PM (#58723046)

    In an associated study on a topic of similar relevance, it was calculated that 180,011 Angels can dance on the head of a pin.

    • In an associated study on a topic of similar relevance, it was calculated that 180,011 Angels can dance on the head of a pin.

      Let's see. The angel with the thinnest waist is Mina [wikipedia.org] at 17.9". (Source: https://www.youtube.com/watch?... [youtube.com] )


      The waist measurement is a circumference, so the cross-sectional area per angel = 17.9^2 / (4*pi) ≈ 25.497 sq inches.

      Assuming they hug each other closely with no extra room, that's
      Total area = 25.497 * 180,011 ≈ 4,589,816 sq inches.
      So you need a pin with a head that has about 4,589,816 sq inches of surface area.
      Going back the other way (diameter = 2*sqrt(area/pi)), we end up with a pin head diameter of roughly 201 ft 5 in.

      Cheaper than training an AI model, anyways.

      • by lgw ( 121541 )

        Assuming they hug each other closely with no extra room

        I'd watch that music video.

  • by DavenH ( 1065780 ) on Thursday June 06, 2019 @10:32PM (#58723048)
    Imagine any other enterprise that produces a tool that can be used by thousands of businesses and individuals, say a water filtration plant: how many tonnes of carbon does building that emit? Probably orders of magnitude more.

    The great thing about deep learning models is that they only need to be trained on the large corpora once; once made public, that computation saves third parties from having to replicate the process.

    I know specifically that the BERT model is used as the focal point of Natural Language Understanding/Processing for many businesses (I don't have the number but since it's the gold standard pretrained model, it's going to be lots).

    • by Barny ( 103770 )

      It goes further than that. How much carbon did the designer of said tool use during their learning? Days spent on BIG computers learning CAD/CAM software, practicing with it, then their colleagues who also helped make that tool. The training time for the fitters and turners who tool up the factory, the supplier, etc.

      This wasn't "how much energy is wasted training an ML system"; this was "how much energy does it take to train the people who make ML systems".

  • by Anonymous Coward

    Climate models also are run on high performance computing systems, across hundreds or thousands of cores. These models are run for long periods of time in order to simulate climate out to 2100, which is common. If training AI is computationally intensive enough to produce significant carbon pollution, we should also be concerned about the effect of running climate models. Perhaps we should reduce carbon pollution by making climate researchers stop running their alarmist doomsday climate scenarios. If climate

  • like football fields or bees to the hogshead?
  • They're really pulling some guesses out. I mean, if you're specifically using AWS, I guess this is relevant. But some places have greener energy than others.

    which closely matches the energy mix used by Amazon’s AWS

    I'd also like to see the citation for this.

  • by kriston ( 7886 ) on Thursday June 06, 2019 @10:48PM (#58723094) Homepage Journal

    Use AWS us-west-2 and Azure West US 2. These regions are primarily served by the 5.5 GW system of dams along the Columbia River along with a mix of wind and solar.

    Facebook and Apple are to the south in the High Desert, also primarily using hydro (plus wind and solar).

    These machine learning applications don't need to worry about latency, but they do need to worry about pollution. The bandwidth that is continuously being expanded to this region is massive, from 400 to 600 Gb per link.

    • by kriston ( 7886 ) on Thursday June 06, 2019 @10:59PM (#58723120) Homepage Journal

      Sadly, the largest cloud regions are in Northern Virginia (AWS us-east-1, Azure East US 1) and mostly use thermoelectric (most of it coal), some nuclear, wind, and solar.

      Southern Virginia (where Azure East US 2 is) presumably has more nuclear, being closer to North Anna NGS than Northern Virginia is to Peach Bottom NGS in the north, plus somewhat less thermoelectric (mostly coal), wind, and solar.

      AWS us-east-2 is in Ohio; it would be interesting to speculate about its likely power sources, along with those of other regions.

    • These regions are primarily served by the 5.5 GW system of dams along the Columbia River

      Dams are an obscenity that destroys whole ecosystems. I find it pretty amusing that you claim to be for the environment while backing one of the most environmentally destructive machines mankind has come up with.

      The only real solution is nuclear with some solar mixed in; all other forms of alternative energy kill or remove habitat for far more wildlife than global warming ever will, even if you just left everything as it is.

      • by kriston ( 7886 )

        These are not new dams. The damage has already been done. They will not be removed for the indefinite future.

        Exploiting existing dams by adding new turbines, and uprating older ones, are truly sound environmental options.

        In other words, decrying our massive extant hydroelectric power infrastructure is crying over spilled milk.

    • by kriston ( 7886 )

      Adding another response to myself about Oregon's cloud sector:

      Both AWS and Azure regions are aggressively being populated with the most advanced GPU instances. In particular, Azure GPU instance types in West US 2 have the more powerful GPUs with faster RDMA interconnect--all with virtually no carbon footprint and negligible non-renewable power consumption.

      Just watch your Azure quotas. Even AWS is rationing their GPU instances there.

  • by Tablizer ( 95088 ) on Thursday June 06, 2019 @10:51PM (#58723098) Journal

    The trained bot software typically runs on hundreds of thousands of cars. Thus, spending the emissions of just 5 cars on something used by hundreds of thousands seems trivial.

  • by Anonymous Coward

    Better used on modeling than on Bitcoin transactions, which are inefficient by design...

  • by fabioalcor ( 1663783 ) on Thursday June 06, 2019 @11:13PM (#58723138)

    This should be a case against coal and oil.

    • by kriston ( 7886 )

      Seriously, these workloads need to be moved to regions with large renewable capacity. Obvious choices include Oregon (AWS us-west-2, Azure West US 2). I suspect Brazil is largely hydropower thanks to the massive Itaipu Dam, which also happens to be the largest power producer in the world, renewable or non-renewable.

      Harder choices are elsewhere and you can be fooled by regions using those fake "carbon credits" to hide the fact that their energy is thermoelectric (service providers in Northern VA do this, sin

    • Re: (Score:2, Insightful)

      by Anonymous Coward

      This should be a case FOR nuclear (generation 3+ and 4).

  • Today I learned that (Score:2, Informative)

    by Anonymous Coward

    Bitcoin, which consumes far more power than AI model training, is likely an environmental disaster in terms of carbon emissions.

  • by Anonymous Coward

    Literally 100% of electricity here is Hydro.

  • A typical car lasts at least 150,000 miles, gets about 20 mpg on average, and a gallon of gas emits 20 lbs of CO2. So a typical car emits roughly 150,000 lbs of CO2 over its lifetime. Meanwhile a GPU running full bore draws roughly 270 W. Southern California Edison emits an average of 0.3 tons * 2,000 lbs/ton = 600 lbs of CO2 per MWh according to their filing. A MWh is 1,000,000 watt-hours. So, how many hours would that GPU need to run full bore to generate the same CO2 as a typical car does during its lifetime?
    • Correction to the last part: it would take 5,252 (not 750) GPUs running at 100% for a week to be equivalent to a typical car's emissions from fuel (rough numbers sketched below this sub-thread).
      • Correction to the last part: it would take 5,252 (not 750) GPUs running at 100% for a week t

        I think that might be right. Google says that some of their AIs take 200 years' worth of training; check out this one for an example [esportswizard.com]. They're putting a lot of money into this.
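
        For what it's worth, plugging the numbers from the top of this thread into a quick script lands in the same ballpark (a rough sketch; different rounding shifts the result a bit):

        ```python
        # Back-of-envelope check using the figures quoted earlier in this thread.
        car_lifetime_lbs = 150_000 / 20 * 20       # 150k miles, 20 mpg, 20 lbs CO2/gallon = 150,000 lbs
        gpu_kw = 0.270                             # one GPU running full bore
        lbs_per_kwh = 600 / 1000                   # 600 lbs CO2 per MWh (SCE average)

        lbs_per_gpu_hour = gpu_kw * lbs_per_kwh    # ~0.16 lbs CO2 per GPU-hour
        hours_for_one_gpu = car_lifetime_lbs / lbs_per_gpu_hour
        print(f"One GPU: {hours_for_one_gpu:,.0f} hours (~{hours_for_one_gpu / 8760:.0f} years)")

        gpus_for_one_week = hours_for_one_gpu / (7 * 24)
        print(f"GPUs running flat out for one week: ~{gpus_for_one_week:,.0f}")
        # Comes out around 5,500 with these roundings, the same ballpark as the
        # 5,252 figure in the correction above.
        ```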

  • That's the real news. Driving cars to and from work. 8 hours sitting there browsing the web. 5 minutes doing study work. Multiplied by a small team.
  • Yeah. (Score:4, Insightful)

    by Trogre ( 513942 ) on Friday June 07, 2019 @12:26AM (#58723338) Homepage

    Those clowns who lobby against nuclear and hydro power are the real villains here, and should be held accountable for the millions of tons of CO2 dumped into the atmosphere by unnecessarily burning fossil fuels.

  • by Jeremi ( 14640 ) on Friday June 07, 2019 @01:37AM (#58723468) Homepage

    Yes, the AI people are emitting CO2 as a side effect of getting their work done, just like people in almost every other field of human endeavor. That's what happens when you live in a fossil-fuel-based society. The fix would be to move the AI hardware (along with everything else, eventually) over to renewable energy; something that Google and others are already in the process of doing.

    In the meantime, have these researchers looked at Bitcoin mining? If they think training AI is a waste of energy, that's really going to twist their noodle.

    • A real "mountain" would be the equivalent energy consumption of a few medium sized countries to generate the existing supply of crypto currency that benefits mostly a small group of whale hoarders, payment systems for illicit black markets, and an incentive for cyber criminals to steal cpu cycles on a global scale.

  • by stud9920 ( 236753 ) on Friday June 07, 2019 @02:37AM (#58723580)

    Since when is a cost equivalent to 5 cars relevant if it's only a fixed cost prior to extremely cheap multiplication?

  • Yeah, and what fraction is that of the waste generated by Bitcoin and other cryptocurrencies?

    Those AI models at least serve some useful purpose for humanity.

    Did the researcher also calculate how much waste was generated by the computing for weather forecasting?

  • by Anonymous Coward

    Assuming it's true, and it may well be...

    How many cars are there in the US? Ok, now, how many AI models of that magnitude?

    There you go.

  • by Anonymous Coward

    > a carbon footprint of roughly 1,400 pounds of carbon dioxide equivalent...

    I'm trying to work out how much that is. It seems 1000kg CO2 costs about $160, which is about £125 today. So 1400 GBP gets you 11,200kg. Got it.

    Kind of crazy. Since when were quantities of CO2 measured in pounds?

  • Every attempt to create order puts off an equivalent amount of disorder. I wonder what the carbon footprint of the United States Postal Service is, considering it sorts and delivers 152 billion pieces of mail every year (it was, at one time, 213 billion). Oh, and by the way, there is no free lunch, no perpetual motion machine. If you switch to solar energy there must be an equivalent downside. I suggest it is that it changes the albedo of the Earth, causing more heat from the sun to be trapped. hmm, jus
    • could one cool the house significantly in summer merely by extending a reflective white sheet of some kind across the roof and heat it to some degree in the winter by extending a black sheet across the roof?

      Significantly? Not really. Stick built houses with 4" walls are simply high energy consumers. Brick walls all the way around are a substantially more significant building improvement, energy-wise.

      Could the difference between the temperature of the roof in summer and the temperature, say, ten feet under the ground be used to generate useful amounts of power from a house?

      No, but the difference can be used to reduce the power consumption of the house in warmer climates. It's called a ground source heat pump.

  • by thegarbz ( 1787294 ) on Friday June 07, 2019 @06:40AM (#58724104)

    I was recently involved in a major industrial project which, unusually for the oil and gas industry, had carbon footprint as a large factor in its design goals. I thought the notion was ridiculous, since reducing the carbon footprint without CCS is practically impossible, but the results were quite interesting. The location selection favoured certain countries over others due to local energy policy. The idea of using steam-raising equipment and then running critical equipment on turbines (a classical approach) was scrapped in favour of large electrical equipment and the infrastructure to make it reliable. Then the kicker:

    "We have a reduced carbon footprint as we have maximised electricity usage and chosen a country with an energy policy for carbon neutral production by 2030"

    Since being involved in this project, every time I see a story like this saying "X uses so much energy", all I can think of is:
    a) scrap Bitcoin
    b) the country in which X is proposed needs to pull its finger out and fix its generation.

  • You can claim that any given project is "uneconomical" by calculating the carbon emitted in all phases of the production and development cycle. If your numbers don't quite add up yet, add in the carbon produced by the software developers' cars, and what their families use, until you get whatever number you deem to be scary enough.

    All of those carbon usages are the average emissions at one chosen time, in this case when the US grid still has a large percentage of coal. As carbon is wrung out of the economy, all

  • ...based upon the false assumption that computers are powered by fossil fuel.
  • How many pounds ^H^H^H^H^H^H kg (modern world ppl!) of CO2 were emitted to produce the paper?

    Include all computing, travel, etc which supported the paper....

  • 1) Cars are a major pollution problem because we have over a billion of them on the road. Until we have hundreds of millions of AIs being trained, comparing an AI to a car is irrelevant. Nuclear power plants emit more carbon than a single car does (concrete emits carbon dioxide), but no one is stupid enough to think nuclear power plants are a major source of carbon dioxide.

    2) The study used an average mix of power supply. Cars are mobile, so fuelling them is problematic. AI training plants, if they D

  • Outta sight outta mind. Carry on.
  • This kind of study makes no sense at all to me. What exactly are they trying to accomplish? Do they not want AI work to be done at all?

    I'm sure the research they did wasn't very carbon neutral either. The food they consumed isn't carbon neutral. The farts they emitted aren't helping either. The computers they used, the paper they wrote on... I'm sure all that contributes to the CO2 burden as well. Maybe they should stop doing research and go live in a monastery if they care so much about CO2 levels?

  • TPUs are commonplace in sub-$200 demo boards from Google, Nvidia, & others. Please test those, as nobody doing "big" models is paying 100x too much. Therefore this paper's conclusions are absurd.

    Training AI is cheap & getting cheaper. Using the results is often the cheapest alternative (hence the field's existence). I'd imagine AI is one of the most competitive techs in carbon neutrality.

    Just consider "alternatives" of:
    - training & paying people do to the work
    - using

  • Young humans, in their first 18 years of training, consume fuel and void waste in similar quantities to a working productive adult... This process is highly inefficient. Students should eat less.
