Training a Single AI Model Can Emit As Much Carbon As Five Cars In Their Lifetimes 156
In a new paper, researchers at the University of Massachusetts, Amherst, performed a life cycle assessment for training several common large AI models. They found that the process can emit more than 626,000 pounds of carbon dioxide equivalent -- nearly five times the lifetime emissions of the average American car (and that includes manufacture of the car itself). MIT Technology Review reports: The researchers looked at four models in the field that have been responsible for the biggest leaps in performance: the Transformer, ELMo, BERT, and GPT-2. They trained each on a single GPU for up to a day to measure its power draw. They then used the number of training hours listed in the model's original papers to calculate the total energy consumed over the complete training process. That number was converted into pounds of carbon dioxide equivalent based on the average energy mix in the US, which closely matches the energy mix used by Amazon's AWS, the largest cloud services provider.
They found that the computational and environmental costs of training grew proportionally to model size and then exploded when additional tuning steps were used to increase the model's final accuracy. In particular, they found that a tuning process known as neural architecture search, which tries to optimize a model by incrementally tweaking a neural network's design through exhaustive trial and error, had extraordinarily high associated costs for little performance benefit. Without it, the most costly model, BERT, had a carbon footprint of roughly 1,400 pounds of carbon dioxide equivalent, close to a round-trip trans-American flight. What's more, the researchers note that the figures should only be considered as baselines. Using a model they'd produced in a previous paper as a case study, the researchers "found that the process of building and testing a final paper-worthy model required training 4,789 models over a six-month period," the report states. "Converted to CO2 equivalent, it emitted more than 78,000 pounds and is likely representative of typical work in the field."
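For reference, the back-of-envelope version of the methodology described above is easy to reproduce. A minimal sketch in Python, using made-up numbers for GPU power draw, cluster size, and training time (the grid carbon intensity of roughly 0.95 lb CO2e per kWh is a commonly cited rough US average; the paper's measured figures differ):

```python
# Back-of-envelope CO2e estimate for a training run, in the spirit of the
# methodology described in the summary. All numbers below are illustrative
# assumptions, not the paper's measured values.

GPU_POWER_WATTS = 250        # assumed average draw of one GPU under load
NUM_GPUS = 64                # assumed size of the training cluster
TRAINING_HOURS = 120         # assumed wall-clock training time
PUE = 1.5                    # assumed datacenter power usage effectiveness
LB_CO2E_PER_KWH = 0.95       # rough US-average grid carbon intensity

energy_kwh = GPU_POWER_WATTS * NUM_GPUS * TRAINING_HOURS * PUE / 1000
co2e_lbs = energy_kwh * LB_CO2E_PER_KWH

print(f"Energy: {energy_kwh:,.0f} kWh, CO2e: {co2e_lbs:,.0f} lbs")
```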
Nuclear power (Score:1)
If it works for boats, it can work for cars too.
Energy efficient cars (Score:5, Insightful)
Re: (Score:3)
The University of Massachusetts, Amherst create about 100 cars worth of CO2 every time they research, write, and publish one of these stupid, pedantic studies.
Re:Energy efficient cars (Score:5, Informative)
Also, the researchers used a GPU and then extrapolated the energy consumption.
But modern "Big AI" is not done with GPUs. They use dedicated ASICs such as TPUs [wikipedia.org], which are far more efficient.
If they failed to use the correct hardware, then I suspect they failed to get many other things correct as well. Techniques such as batch normalization [wikipedia.org] can make a ten-fold difference in energy consumption.
They should compare it to a zero emissions vehicle (Score:3, Interesting)
Or for that matter a negative emissions vehicle. Then you'd have a negative ratio.
As for TPU versus GPU, well no. Some people use TPU. But that's not everyone. Nvidia isn't selling GPUs for no reason at all. Moreover some AI problems, like ones that aren't just matrix ops, just don't even work on GPUs, let alone TPUs.
Re:They should compare it to a zero emissions vehi (Score:5, Insightful)
As for TPU versus GPU, well no. Some people use TPU. But that's not everyone. Nvidia isn't selling GPUs for no reason at all.
Nearly everyone buys GPUs for graphics.
Some people buy them for training small NNs, but those are not who TFA is talking about. They are talking about "Big AI" that uses an entire datacenter (thousands of servers) for days of training. Organizations doing that are not using GPUs. They are using ASICs specialized for massive amounts of low-precision matrix ops.
Moreover some AI problems, like ones that aren't just matrix ops, just don't even work on GPUs, let alone TPUs.
An inference engine running Lisp isn't consuming significant power.
Re: (Score:1)
The largest supercomputer in the world, Summit at Oak Ridge National Lab, is powered by over 27k Nvidia TESLA V100 GPUs.
Those GPUs aren't being used for graphics.....
https://www.olcf.ornl.gov/olcf-resources/compute-systems/summit/
Re:Energy efficient cars (Score:5, Insightful)
But modern "Big AI" is not done with GPUs. They use dedicated ASICs such as TPUs [wikipedia.org], which are far more efficient.
Since Google hasn't yet started selling physical TPUs, the only users are Google employees and Google Cloud users. Based on current Rightscale [forbes.com] numbers, AWS and Azure are each far more popular than Google Cloud. So, it would be very surprising if most training is not currently done with GPUs.
As for power efficiency, a TPUv3 [nextplatform.com] chip gets around 90 FP16 TOPS for about 200W, while an Nvidia T4 [nvidia.com] gets 65 FP16 TOPS for around 70W. The numbers depend on how you count TOPS and watts and how the raw numbers translate to application-level performance, but it's very clear that TPUs are not obviously "far more efficient."
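For the curious, here is that comparison as a trivial sketch, using only the peak numbers quoted above (how they translate to application-level efficiency is exactly the caveat already noted):

```python
# Crude TOPS-per-watt comparison using the figures quoted above.
# These are peak numbers; sustained application-level throughput differs.

chips = {
    "TPUv3": {"fp16_tops": 90, "watts": 200},
    "Nvidia T4": {"fp16_tops": 65, "watts": 70},
}

for name, spec in chips.items():
    print(f"{name}: {spec['fp16_tops'] / spec['watts']:.2f} TOPS/W")
# TPUv3     -> 0.45 TOPS/W
# Nvidia T4 -> 0.93 TOPS/W (by this crude metric, the T4 looks more efficient)
```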
Re:Energy efficient cars (Score:5, Interesting)
Tesla has a matrix ASIC.
Facebook is working on one [eetimes.com].
Nvidia is also building its own TPU [forbes.com].
Soon GPUs will go back to being for graphics, and every laptop and smartphone will have a dedicated low power matrix processor.
Re: (Score:2)
Soon GPUs will go back to being for graphics, and every laptop and smartphone will have a dedicated low power matrix processor.
No, it won't. That processing can be done in the cloud. Most users don't need to train AI models at all. Maybe eventually, but not soon
Re: (Score:2)
Tensor processing still needs to be done to run applications after they've been trained. That can be done in CPU but that's orders of magnitude slower. So whether you need a dedicated TPU depends on the needs of the applications. For example, running the lc0 chess engine at a decent level (by computer chess standards) absolutely requires a GPU/TPU, as lc0 needs to analyse many thousands of positions a second.
Re: (Score:1)
Most users don't need to train AI models at all
Most people don't need a computer with multiple cores
Most users don't need more than 640k RAM
Most people don't need a computer
Most people don't need a calculator
Re: (Score:2)
Most of those things happened much more slowly for most people than they could have, because they didn't need to happen any quicker.
Users will eventually have that hardware onboard, but not especially soon. It will have to be common enough to become a cheap commodity part before they will slap it into the SoC.
Re: (Score:2)
It's already a (relatively) cheap commodity part: https://www.amazon.com/Intel-N... [amazon.com]
And Apple has already slapped them in their SoCs: https://www.wired.com/story/ho... [wired.com]
And Apple isn't special.
Re: (Score:2)
Soon GPUs will go back to being for graphics, and every laptop and smartphone will have a dedicated low power matrix processor.
No, it won't. That processing can be done in the cloud. Most users don't need to train AI models at all. Maybe eventually, but not soon
Training personalized models in the cloud means shipping all the data needed to train them to the cloud, which raises privacy concerns. I expect to see lots of cases where model training is moved down to the end-user device so the data can be kept there.
Re:Energy efficient cars (Score:4, Informative)
Also, the researchers used a GPU and then extrapolated the energy consumption.
Yep. Big training is done on huge clusters. You can pretty easily multiply the power consumption by the number of machines and the time spent.
But modern "Big AI" is not done with GPUs.
No, it really is.
They use dedicated ASICs such as TPUs, which are far more efficient.
No, mostly they use GPUs. You've clearly spent far more time reading articles about the tech than you have, say, running PyTorch. ASICs or dedicated logic blocks are currently used mostly on the inference end rather than the training end. On the training end, people use Nvidia 1080 Tis or RTX 2080s. Google have some TPU stuff for training, but I've not met anyone outside of Google who actually uses it.
Techniques such as batch normalization can make a ten-fold difference in energy consumption.
Everyone already uses that pretty much between every layer.
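For anyone who hasn't touched PyTorch: "between every layer" looks literally like this. A minimal sketch using the standard torch.nn API (layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

# A typical conv block: BatchNorm sits between the convolution and the
# nonlinearity, and this pattern repeats through the whole network.
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),   # normalizes activations; stabilizes and speeds up training
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

x = torch.randn(8, 3, 32, 32)   # a dummy batch of images
print(block(x).shape)           # torch.Size([8, 64, 32, 32])
```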
Re: (Score:3)
Well, technically blood plasma is sterile; it's the bacteria in it that isn't.
Re: (Score:1)
Re:Energy efficient cars (Score:5, Insightful)
Re: (Score:2)
Yeah, how many paper worthy models are there? 100? 200 a year? Oh no! we added a whopping 1,000 cars to the road in CO2. Meanwhile the paper worthy models are providing 7 billion people valuable paper-worthy functionality.
This sort of hand wringing really is some of the worst aspects of modern society. We are paralyzed on innovation because every incremental intermediary step is so heavily criticized. "Electric Cars aren't carbon zero!" "Solar Panels take 5 years to pay themselves off!" "Self Driving c
Re:GO RENEWABLE - SAYS THE CAPITAL MARKET, SORRY (Score:4, Informative)
The markets (and reality) have spoken. Nuclear power is right now a super-expensive boondoggle. We need solar/wind and other renewables, and the market is investing in that because it's cheaper and easier. We also need capacitance.
Ideally we'd upgrade the grids (or overlay a new grid) with on-site generation that could be used to offset transmission losses, and peak-demand mountains could be refilled during low-demand valley times. All of it 1,000s of times cheaper than nukes.
The markets have spoken. Unless you're a socialist and have a big check to write to fund it, renewables win. That's just capitalism.
Ah, so the land we would need to do that, where is that? And all the extra mining we would need? Scale of power matters [withouthotair.com]. Energy production is an engineering problem and efficiency of energy production matters. Fossil fuels are about 1,000 times more energy dense than renewable sources. Nuclear is 60,000 times more energy dense than that. When you are trying to minimize waste, you minimize the amount of fuel first. That's why nuclear is the best (and only) choice. Also, it doesn't matter how cheap something is if there's only 1/100th as much as the world needs.
Re: (Score:2)
The land required for wind is not large, since windfarms can be interwoven with agricultural farms. The land required for solar in an urban environment is not large, since solar panels are mounted on buildings. That leaves solar farms, which have had no trouble finding land since they're being built in areas which are unu
Re: (Score:1)
There are a lot of roofs that are well suited for solar panels (orientation and inclination) and where the panels would not need extra land. Putting panels on those is already quite popular in Germany.
AI is a marketing term (Score:2)
It doesn't mean what most think it means.
For starters, synonyms for ARTIFICIAL include: fake, imitation, mock, ersatz, faux.
Machine learning is a milestone. But intelligent, it is not.
Re: (Score:2)
These days AI just means, "we didn't have enough money to build an expert system, so we trained a mediocre model, and we didn't have enough money to build an ideal data-set so we used real-world data."
Re: (Score:2)
AI is a trial-and-error system. You change parameters, better or worse than the last run; the problem is false paths, which require restarting the process at an earlier stage with a rejected set of parameters. Lots and lots of trial and error. The problem being, the coders are only good at coding and not good at solving the problem assigned to the AI. If they were, they would establish better parameters and carefully monitor the process to reject long-term bad outcomes. Also it depends upon how far you break down
Re: (Score:2)
Bad AI reflects bad human intelligence behind the AI.
Solution is to make AI to help us design AI.
Training humans (Score:2)
AI is a trial-and-error system. You change parameters, better or worse than the last run; the problem is false paths, which require restarting the process at an earlier stage with a rejected set of parameters. Lots and lots of trial and error.
So is human intelligence. As a parent, I can say that what you have described sounds exactly like raising a human child.
Bad AI reflects bad human intelligence behind the AI.
This doubly so. Just substitute child for AI.
Re: (Score:2)
AI means "machine learning" as commonly used. I don't think it means what you think it means.
Re: (Score:1)
AI means "machine learning" as commonly used. I don't think it means what you think it means.
Well put! At least the first half. But sorry. ML means machine learning, as commonly used. AI refers to the phrase Artificial Intelligence.
But your 50% accuracy is notable for this field, considering our iteration count. Oh, you were being serious.
So then what's next SAI (Strong AI)? Then RSAI (Real Strong AI)? IRMITTRSAI (I Really Mean It This Time - Real Strong AI)?
Thanks for making my point.
Re: (Score:2)
I get what you want "AI" to mean, but that battle is as lost as the battle for "hacker" and "cyber". Yes, there will inevitably be a language treadmill for "real AI", though the terms "machine intelligence" and "self-aware machine" have been long used for that. After all, it would be no more artificial than our own intelligence.
wrong planet (Score:1)
Re: (Score:1)
And if we ever can build large facilities off world, in orbit might be a lot smarter. Right now we can build small ones like the ISS.
Humans, (Score:1)
Re: (Score:3, Funny)
Think of how much carbon is emitted in the ~25-year long process of training a single, non-copyable instance of meatbag intelligence.
Re:Electricity (Score:2)
Re: (Score:2)
And how much extra CO2 if that meatbag spends 2 hours/day watching funny cat movies on YouTube ?
Re:I dunno (Score:2)
In an associated study (Score:4, Insightful)
In an associated study on a topic of similar relevance, it was calculated that 180,011 angels can dance on the head of a pin.
Re: (Score:1)
In an associated study on a topic of similar relevance, it was calculated that 180,011 angels can dance on the head of a pin.
Lets see. The angel with the thinnest waist is Mina [wikipedia.org] at 17.9". (Source: https://www.youtube.com/watch?... [youtube.com] )
Area = 17.9^2 / (4 * pi) = 25.497417658 sq inches
Assuming they hug each other closely with no extra room, that's
Total area = 25.497417658 * 180,011 = 4,589,815.65 sq inches
So you need a pin with a head that has about 4,589,816 sq inches of surface area.
Going back the other way, we end up with a pin head diameter of 201ft 5.5in.
Cheaper than training an AI model, anyways.
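And, for reproducibility, the same joke calculation as a few lines of Python (same assumptions: a 17.9-inch waist circumference and perfect packing):

```python
import math

# Reproducing the calculation above: treat each angel's waist (circumference
# 17.9 in) as a circle and pack 180,011 of them onto the pin head with no gaps.
waist_circumference_in = 17.9
angels = 180_011

angel_area = waist_circumference_in ** 2 / (4 * math.pi)   # ~25.5 sq in each
total_area = angel_area * angels                            # ~4,589,816 sq in

pin_diameter_in = 2 * math.sqrt(total_area / math.pi)
print(f"Pin head diameter: {pin_diameter_in / 12:.1f} ft")  # ~201.5 ft
```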
Re: (Score:2)
Assuming they hug each other closely with no extra room
I'd watch that music video.
It's really not a problem (Score:4, Interesting)
The great thing about deep learning models is that they only need to be trained on the large corpora once; after the weights are made public, third parties don't have to replicate that computation.
I know specifically that the BERT model is used as the focal point of Natural Language Understanding/Processing for many businesses (I don't have the number but since it's the gold standard pretrained model, it's going to be lots).
Re: (Score:2)
It goes further than that. How much carbon did the designer of said tool use during their learning? Days spent on BIG computers learning CAD/CAM software, practicing with it, then their colleagues who also helped make that tool. The training time for the fitters and turners who tool up the factory, the supplier, etc.
This wasn't "how much energy is wasted training an ML system" this was "how much energy does it take to train people who make ML systems".
Climate models (Score:1)
Climate models also are run on high-performance computing systems, across hundreds or thousands of cores. These models are run for long periods of time in order to simulate climate out to 2100, which is common. If training AI is computationally intensive enough to produce significant carbon pollution, we should also be concerned about the effect of running climate models. Perhaps we should reduce carbon pollution by making climate researchers stop running their alarmist doomsday climate scenarios. If climate
Can somebody convert to something more useful (Score:1)
"average energy mix in the US" (Score:2)
They're really pulling some guesses out. I mean, if you're specifically using AWS, I guess this is relevant. But some places have greener energy than others.
which closely matches the energy mix used by Amazon’s AWS
I'd also like to see the citation for this.
Use AWS us-west-2 and Azure West US 2 (Score:5, Interesting)
Use AWS us-west-2 and Azure West US 2. These regions are primarily served by the 5.5 GW system of dams along the Columbia River along with a mix of wind and solar.
Facebook and Apple are to the south in the High Desert, also primarily using hydro (plus wind and solar).
These machine learning applications don't need to worry about latency, but they do need to worry about pollution. The bandwidth that is continuously being expanded to this region is massive, from 400 to 600 Gb/s per link.
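Picking the greener region is literally one parameter on the API call. A minimal sketch with boto3 (the AMI ID and instance type below are placeholders, not a recommendation):

```python
import boto3

# Pin a GPU training instance to the Oregon region (us-west-2), which is
# largely hydro-powered as described above.
ec2 = boto3.client("ec2", region_name="us-west-2")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # hypothetical placeholder AMI; use your own
    InstanceType="p3.2xlarge",        # a V100 GPU instance type
    MinCount=1,
    MaxCount=1,
)
```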
Re:Use AWS us-west-2 and Azure West US 2 (Score:5, Interesting)
Sadly, the largest cloud regions are in Northern Virginia (AWS us-east-1, Azure East US 1) and mostly use thermoelectric (most of it coal), some nuclear, wind, and solar.
Southern Virginia (where Azure East US 2 is) is presumably closer to North Anna NGS than Northern Virginia is to Peach Bottom NGS in the north, so it presumably has more nuclear in its mix, with somewhat less thermoelectric (most of it coal), plus wind and solar.
AWS us-east-2 is in Ohio; it would be interesting to speculate what its presumed power sources are along with other regions.
Do you hate the Earth? (Score:1)
These regions are primarily served by the 5.5 GW system of dams along the Columbia River
Dams are an obscenity that destroy whole ecosystems, I find it pretty amusing you claim to be for the environment while backing one of the most destructive engines against the environment mankind has come up with.
The only real solution is nuclear with some solar mixed in; all other forms of alternative energy kill or remove habitat for far more wildlife than global warming ever will, even if you just left everything as it is.
Re: (Score:3)
These are not new dams. The damage has already been done. They will not be removed for the indefinite future.
Exploiting and uprating existing dams, by adding new turbines and upgrading older ones, are truly sound environmental options.
In other words, decrying our massive extant hydroelectric power infrastructure is crying over spilled milk.
Re: (Score:2)
Cool. Now apply that logic to guns. And pregnancy. And Russian election interference.
You leftists are all fucking retarded.
Is this a troll, a cry for help, or both? All of those things are ongoing problems which will cause more sadness if not addressed. (The guns aren't the root cause of violence, but the gun violence problem still has to be addressed.)
Re: (Score:2)
Adding another response to myself about Oregon's cloud sector:
Both AWS and Azure regions are aggressively being populated with the most advanced GPU instances. In particular, Azure GPU instance types in West US 2 have the more powerful GPUs with faster RDMA interconnect, all with virtually no carbon footprint and virtually negligible non-renewable power consumption.
Just watch your Azure quotas. Even AWS is rationing their GPU instances there.
So, what's the problem? (Score:3, Interesting)
The trained bot software typically runs on hundreds of thousands of cars. Thus, a training cost equivalent to just 5 cars, amortized across hundreds of thousands, seems trivial.
better than... (Score:1)
Better used on modeling than used on Bitcoin transactions which are inefficient by design...
This should not be a case against AI (Score:3)
This should be a case against coal and oil.
Re: (Score:2)
Seriously, these workloads need to be moved to regions with large renewable capacity. Obvious choices include Oregon (AWS us-west-2, Azure West US 2). I suspect Brazil is largely hydropower thanks to the massive Itaipu Dam, which also happens to be the largest power producer in the world, renewable or non-renewable.
Harder choices are elsewhere and you can be fooled by regions using those fake "carbon credits" to hide the fact that their energy is thermoelectric (service providers in Northern VA do this, sin
Re: (Score:2, Insightful)
This should be a case FOR nuclear (generation 3+ and 4).
Today I learned that (Score:2, Informative)
Bitcoin, which consumes far more power than AI model training, is likely an environmental disaster in terms of carbon emissions.
Ok. But from where? (Score:1)
Literally 100% of electricity here is Hydro.
Math seems way off - check my calculations (Score:1)
Re: (Score:1)
Re: (Score:2)
Correction to the last part: it would take 5,252 (not 750) GPUs running at 100% for a week t
I think that might be right. Google says that some of their AIs take 200 years' worth of training; check out this one for an example [esportswizard.com]. They're putting a lot of money into this.
How many car burns did this study cost? (Score:2)
Yeah. (Score:4, Insightful)
Those clowns who lobby against nuclear and hydro power are the real villains here, and should be held accountable for the millions of tons of CO2 dumped into the atmosphere by unnecessarily burning fossil fuels.
Molehills vs mountains (Score:5, Insightful)
Yes, the AI people are emitting CO2 as a side effect of getting their work done, just like people in almost every other field of human endeavor. That's what happens when you live in a fossil-fuel-based society. The fix would be to move the AI hardware (along with everything else, eventually) over to renewable energy; something that Google and others are already in the process of doing.
In the meantime, have these researchers looked at Bitcoin mining? If they think training AI is a waste of energy, that's really going to twist their noodle.
Re: (Score:2)
A real "mountain" would be the equivalent energy consumption of a few medium sized countries to generate the existing supply of crypto currency that benefits mostly a small group of whale hoarders, payment systems for illicit black markets, and an incentive for cyber criminals to steal cpu cycles on a global scale.
fix vs variable cost (Score:4, Insightful)
Since when is a cost equivalent to 5 cars relevant if it's only a fixed cost prior to extremely cheap multiplication?
As compared to Bitcoins (Score:2)
Yeah, and that is how much of a fraction of the waste generated by bitcoins and other cryptocurrencies?
Those AI models at least serve some useful purpose for humanity.
Did the researcher also calculate how much waste was generated by the computing for weather forecasting?
Click bait study (Score:1)
Assuming it's true, and it may well be...
How many cars are there in the US? Ok, now, how many AI models of that magnitude?
There you go.
Pounds? (Score:1)
> a carbon footprint of roughly 1,400 pounds of carbon dioxide equivalent...
I'm trying to work out how much that is. It seems 1000kg CO2 costs about $160, which is about £125 today. So 1400 GBP gets you 11,200kg. Got it.
Kind of crazy. Since when were quantities of CO2 measured in pounds?
Not surprising (Score:2)
Re: (Score:2)
could one cool the house significantly in summer merely by extending a reflective white sheet of some kind across the roof and heat it to some degree in the winter by extending a black sheet across the roof?
Significantly? Not really. Stick built houses with 4" walls are simply high energy consumers. Brick walls all the way around are a substantially more significant building improvement, energy-wise.
Could the difference between the temperature of the roof in summer and the temperature, say, ten feet under the ground be used to generate useful amounts of power from a house?
No but the difference can be used to reduce the power consumption of the house in warmer climates. It's called a ground source heat pump.
So 100% carbon neutral given the right input? (Score:3)
I was recently involved in a major industrial project which, interestingly for the oil and gas industry, had carbon footprint as a large factor in its design goal. I thought the notion was ridiculous since reducing the carbon footprint without CCS was practically impossible, but the results were quite interesting. The location choice favoured certain countries over others due to local energy policy. The idea of using steam-raising equipment and then running critical equipment on turbines (a classical approach) was scrapped in favour of large electrical equipment and infrastructure to make it reliable. Then the kicker:
"We have a reduced carbon footprint as we have maximised electricity usage and chosen a country with an energy policy for carbon neutral production by 2030"
Since being involved in this project, every time I see a story like this saying "X uses so much energy," all I can think of is:
a) scrap bitcoin
b) country in which X is proposed need to pull their finger out and fix their generation.
The silly 'mining argument' again (Score:2)
You can claim that any given project is "uneconomical" by calculating the carbon emitted in all phases of the production and development cycle. If your numbers don't quite add up yet, add in the carbon produced by the software developers' cars, and what their families use, until you get whatever number you deem to be scary enough.
All of those carbon usages are the average emissions at one chosen time, in this case when the US grid still has a large percentage of coal. As carbon is wrung out of the economy, all
Clickbait headline... (Score:2)
IMWTK (Score:1)
How many pounds ^H^H^H^H^H^H kg (modern world ppl!) of CO2 were emitted to produce the paper?
Include all computing, travel, etc which supported the paper....
Major problems with this study (Score:2)
1) Cars are a major pollution problem because we have over a billion of them on the road. Until we have hundreds of millions of AIs being trained, comparing an AI to a car is irrelevant. Nuclear power plants emit more carbon than a single car does (concrete production emits carbon dioxide), but no one is stupid enough to think nuclear power plants are a major source of carbon dioxide.
2) The study used an average mix of power supply. Cars are mobile, so fuelling them is problematic. AI training plants, if they D
See no evil know no evil (Score:2)
How carbon neutral is research? (Score:2)
This kind of study makes no sense at all to me. What exactly are they trying to accomplish? Do they not want AI work to be done at all?
I'm sure the research they did wasn't very carbon neutral either. The food they consumed isn't carbon neutral. The farts they emitted aren't helping either. The computers they used, the paper they wrote on... I'm sure all that contributes to the CO2 burden as well. Maybe they should stop doing research and go live in a monastery if they care so much about CO2 levels?
Tensor TPU Chips (Score:2)
TPUs are commonplace in sub-$200 demo boards from Google, Nvidia, & others. Please test those, as nobody doing "big" models is paying 100x too much. Therefore this paper's conclusions are absurd.
Training AI is cheap & getting cheaper. Using the results is often the cheapest alternative (hence the field's existence). I'd imagine AI is one of the most competitive techs in carbon neutrality.
Just consider "alternatives" of:
- training & paying people to do the work
- using
Young humans, in their first 18 years of training (Score:2)
Young humans, in their first 18 years of training, consume fuel and void waste in similar quantities to a working productive adult... This process is highly inefficient. Students should eat less.
Re: (Score:1)
Don't like objective facts? Then kill yourself. You deserve no right to exist. Anyone who opposes science must be exterminated.
God holds you above the fires of Hell by a thread as thin as gossamer. Struggle against what is righteous and the thread will break, plunging you into eternal damnation.
Don't believe in objective facts? Doesn't matter, Hell believes in you. You deserve no right to Heaven. Anyone who opposes Jesus will burn eternally in the fires of Hell.
Re: (Score:2)
That one poor solar farm has probably been claimed as the power source for so many projects, I'm starting to think that this whole green energy thing has been brought to us by Bialystock and Bloom.