Why Certain AI Models Emit 50 Times More Greenhouse Gases for Identical Queries

Like it or not, large language models have quickly become embedded into our lives. And due to their intense energy and water needs, they might also be causing us to spiral even faster into climate chaos. Some LLMs, though, might be releasing more planet-warming pollution than others, a new study finds.

Queries made to some models generate up to 50 times more carbon emissions than others, according to a new study published in Frontiers in Communication. Unfortunately, and perhaps unsurprisingly, models that are more accurate tend to have the biggest energy costs.

It’s hard to estimate just how bad LLMs are for the environment, but some studies have suggested that training ChatGPT used up to 30 times more energy than the average American consumes in a year. What has been less clear is whether some models carry steeper energy costs than their peers when they’re answering questions.

Researchers from the Hochschule München University of Applied Sciences in Germany evaluated 14 LLMs ranging from 7 to 72 billion parameters—the levers and dials that fine-tune a model’s understanding and language generation—on 1,000 benchmark questions about various subjects.

LLMs convert each word or part of a word in a prompt into a numerical unit called a token. Some LLMs, particularly reasoning LLMs, also insert special “thinking tokens” into the sequence, allowing for additional internal computation and reasoning before an answer is generated. This conversion, and the computations the LLM then performs on the tokens, uses energy and releases CO2.
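To make the token idea concrete, the short Python sketch below counts the tokens in a prompt. It uses OpenAI’s open-source tiktoken library purely for illustration; the models in the study rely on their own tokenizers, so exact counts will differ, but the principle is the same: more tokens means more computation, and more computation means more energy.

```python
# Illustrative only: count the tokens in a prompt with the tiktoken library.
# The models in the study use their own tokenizers, so exact counts differ.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")  # a common GPT-style encoding

prompt = "Explain why the sky is blue in one sentence."
token_ids = encoding.encode(prompt)

print(f"Prompt: {prompt!r}")
print(f"Token count: {len(token_ids)}")
print(f"Token IDs: {token_ids}")
```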

The scientists compared the number of tokens generated by each of the models they tested. Reasoning models created an average of 543.5 thinking tokens per question, whereas concise models required just 37.7 tokens per question, the study found. In the ChatGPT world, for example, GPT-4o is a standard model that answers directly, whereas OpenAI’s o1-style models are reasoning models that generate extended chains of thought before responding.

This reasoning process drives up energy needs, the authors found. “The environmental impact of questioning trained LLMs is strongly determined by their reasoning approach,” study author Maximilian Dauner, a researcher at Hochschule München University of Applied Sciences, said in a statement. “We found that reasoning-enabled models produced up to 50 times more CO2 emissions than concise response models.”
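To see roughly how token counts translate into emissions, the sketch below multiplies tokens per query by an assumed energy cost per token and an assumed grid carbon intensity. Both constants are illustrative placeholders, not figures from the study.

```python
# Rough back-of-the-envelope comparison. The two constants below are
# illustrative assumptions, not values reported in the study.
ENERGY_PER_TOKEN_KWH = 3e-7        # assumed energy per generated token (kWh)
GRID_INTENSITY_G_PER_KWH = 400.0   # assumed grid carbon intensity (g CO2e/kWh)

def co2e_grams(tokens_per_query: float, queries: int = 1000) -> float:
    """Estimate grams of CO2e for a batch of queries from token counts alone."""
    energy_kwh = tokens_per_query * queries * ENERGY_PER_TOKEN_KWH
    return energy_kwh * GRID_INTENSITY_G_PER_KWH

reasoning = co2e_grams(543.5)  # average thinking tokens per question (study)
concise = co2e_grams(37.7)     # average tokens per question (study)

print(f"Reasoning model estimate: {reasoning:.2f} g CO2e per 1,000 questions")
print(f"Concise model estimate:   {concise:.2f} g CO2e per 1,000 questions")
print(f"Ratio: {reasoning / concise:.1f}x")
```

On token counts alone, the gap between the two averages is roughly 14-fold; the study’s up-to-50-fold figure also reflects differences in model size and per-token compute, which this sketch deliberately ignores.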

The more accurate the models were, the more carbon emissions they produced, the study found. The reasoning model Cogito, which has 70 billion parameters, reached up to 84.9% accuracy—but it also produced three times more CO2 emissions than similarly sized models that generate more concise answers.

“Currently, we see a clear accuracy-sustainability trade-off inherent in LLM technologies,” said Dauner. “None of the models that kept emissions below 500 grams of CO2 equivalent achieved higher than 80% accuracy on answering the 1,000 questions correctly.” CO2 equivalent is a unit that expresses the climate impact of different greenhouse gases as the amount of CO2 that would have the same warming effect.
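For readers curious how that unit works, CO2 equivalent is calculated by weighting each gas by its global warming potential (GWP). Below is a minimal sketch using approximate 100-year GWP values from the IPCC’s Fifth Assessment Report (about 28 for methane and 265 for nitrous oxide).

```python
# Convert emissions of different greenhouse gases into grams of CO2 equivalent
# using 100-year global warming potentials (approximate IPCC AR5 values).
GWP_100 = {
    "CO2": 1.0,
    "CH4": 28.0,    # methane
    "N2O": 265.0,   # nitrous oxide
}

def to_co2e(emissions_grams: dict[str, float]) -> float:
    """Sum emissions across gases, weighted by their warming potential."""
    return sum(GWP_100[gas] * grams for gas, grams in emissions_grams.items())

# Example: 400 g of CO2 plus 1 g of methane comes to about 428 g CO2e.
print(f"{to_co2e({'CO2': 400.0, 'CH4': 1.0}):.0f} g CO2e")
```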

Another factor was subject matter. Questions that required lengthy or complex reasoning, such as those in abstract algebra or philosophy, produced up to six times more emissions than questions on more straightforward subjects, according to the study.

There are some caveats, though. Emissions depend heavily on how local energy grids are structured and on which models are examined, so it’s unclear how generalizable these findings are. Still, the study authors said they hope the work will encourage people to be “selective and thoughtful” about their LLM use.

“Users can significantly reduce emissions by prompting AI to generate concise answers or limiting the use of high-capacity models to tasks that genuinely require that power,” Dauner said in a statement.
