Microsoft, Google, and Amazon have all acknowledged the difficulty of building and operating computationally intensive artificial intelligence (AI) infrastructure while simultaneously trying to meet their net-zero and sustainability goals.
The ever-growing energy demands of AI stand in stark contrast with global efforts to reduce carbon emissions and minimize waste. Interest in green AI has surged as researchers and companies seek to raise awareness of, and reduce, the environmental impact of these technologies.
To better understand the environmental impact of AI, we teamed up with the Collaborative Innovation Program at the Wharton School of the University of Pennsylvania. After an in-depth review and interviews with multiple senior executives across the technology, climate/ESG, and business spaces, this report identifies three key areas with the greatest energy implications: data centers, hardware, and algorithmic optimization.
The International Energy Agency (IEA) estimates that data centers and data transmission networks are responsible for approximately 1% of energy-related greenhouse gas emissions globally. As AI demand increases, so too will the need to build out and maintain data center warehouses, which are often powered by “dirty” electricity grids, including in Virginia’s “data center alley”, the site of 70% of the world’s internet traffic in 2019.[1]
The energy usage of AI and data centers is shifting the long-term thinking of many technology companies. The ability of generative AI (GenAI) to produce complex data differs drastically from that of discriminative AI, whose models are designed for classification purposes, making binary decisions such as approving or rejecting loan applications. Because generating outputs is inherently more complex, training GenAI models relies on graphics processing units (GPUs), which consume 10 to 15 times more energy than traditional central processing units (CPUs) owing to their superiority in computationally intensive tasks.[2] These rapidly accelerating energy needs are shifting the calculus of technology companies, which are now exploring previously untenable sources such as nuclear fusion and small modular reactors.
To understand these dynamics better, we offer three insights to help companies approach data center selection. First, as AI use cases soar, so do the energy and water required to run data centers. Data centers currently use 6% of all electricity in the U.S. – a figure expected to double by 2026. This will strain energy, water, and resource capabilities as the world transitions to a low-carbon economy, and critical electrical components such as semiconductors may face shortages similar to those experienced during the COVID-19 pandemic.
Source: Expected carbon emissions due to data center operation (Towards a Systematic Survey for Carbon Neutral Data Centers)
Second, operational emissions represent the bulk of the environmental impact of data centers, and reducing them is becoming a priority for technology companies. Due to such large emissions, Microsoft, for example, has pledged four primary actions to address this issue: becoming carbon negative by 2030, becoming water positive by 2030, achieving zero waste by 2030, and protecting more land than it uses.
Renewable energy and attendant investments will play an important role in creating a green, circular ecosystem, but fossil fuels will largely power the initial advancements of AI.
Source: Operational vs. Non-Operational Emissions (Data Centre Life Cycle Analysis; Towards a Systematic Survey for Carbon Neutral Data Centers)
And third, selecting the right data center location can cut operational emissions by at least 60%. Key considerations for site selection include power purchase agreements (PPAs) and access to carbon-free energy (CFE) sources such as solar, wind, hydroelectric, and geothermal. Google’s most recent sustainability report (2024) shows total greenhouse gas emissions increased by 13% year over year, primarily driven by data center energy consumption and supply chain emissions in “hard-to-decarbonize” regions such as Asia Pacific, where CFE isn’t readily available.[4]
The recent explosion of large language models (LLMs) and the attendant data center build-out has forced a radical rethinking of how tech companies and countries approach the electrical grid. The need to add generation capacity has caused countries to reassess their larger net-zero and decarbonization goals. With the IEA projecting that global data center electricity demand will more than double by 2026 – driven largely by AI workloads such as LLMs – the very nature of electricity consumption must be reevaluated to match supply and demand.[5]
Source: WTW and Wharton
The intermittent nature of renewable energy will require greater coordination between technology companies and electrical utilities in mapping out a grid that can absorb the large electricity demand of hyperscale data centers, which are optimized for networked infrastructure and the large-scale workloads of GenAI models.[6]
The creation and deployment of new technologies and standards – such as a Green AI Code of Conduct; environmental, social, and governance (ESG) protocols; specialized hardware accelerators; water cooling systems; 3D chips; and non-silicon semiconductors – may bring computation to where renewable energy is plentiful, and competitive behavior will be rewarded by consumers through their use of more environmentally friendly chatbots.
Hardware selection will ultimately rest on the functions and roles researchers and developers require when building AI systems. The choice of processors to power an architecture will be determined by the cost, efficiency, scalability, and purpose of the AI project when building and training models. These processing units are the fundamental computing engines of the hardware that powers deep learning and high-performance inferencing, and they can have a material impact on a technology’s sustainability and environmental footprint.
The processing units that perform the complicated tasks involved in AI are central processing units (CPUs), graphics processing units (GPUs), tensor processing units (TPUs), and neural processing units (NPUs). Choosing which processing unit is needed for which operation depends on striking a delicate balance among complexity, cost-efficiency for real-world applications, and environmental impact.
CPUs have multiple cores and are commonly known as the brain of the computer, executing the commands needed by a computer’s operating system. Thanks to their versatility, cost effectiveness, and wide availability, they can handle simple, general-purpose computing tasks; however, CPUs can face bandwidth and memory bottlenecks.[7] A lack of dedicated hardware for demanding, specialized machine learning operations makes CPUs inferior to GPUs and TPUs for such workloads.
In recent years, GPUs (most notably Nvidia’s Ampere, Hopper, Lovelace, and Blackwell lines) have taken over many roles traditionally filled by CPUs thanks to their superior computing power. Designed for parallel processing and originally for accelerating the rendering of 3D graphics,[8] GPUs are now used in high-performance computing (HPC), deep learning, and AI training and inference. Working in conjunction with CPUs, GPU parallel computing accelerates some of the CPUs’ functions, and the two share similar internal components such as cores, memory, and control units.[9]
Google created its TPUs as AI-accelerator application-specific integrated circuits (ASICs) for neural network machine learning based on its own TensorFlow software. TPUs differ from GPUs in that their specialized feature is hardware dedicated to matrix multiplication for AI training and inference, whereas GPUs are well suited to algorithms that process the large datasets found in AI workloads.[10] GPUs remain the primary compute hardware for AI applications, but specialized AI hardware such as Google's TPUs offers greater energy efficiency, being tailor-made for AI tasks.
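As a toy illustration of the workload these accelerators target, the NumPy sketch below shows the dense matrix multiplication at the heart of neural network training and inference (the shapes here are arbitrary and chosen only for illustration):

```python
import numpy as np

# The core operation that TPUs (and GPU tensor cores) accelerate:
# multiplying a batch of activations by a layer's weight matrix.
batch_activations = np.random.rand(64, 512)  # 64 samples x 512 features
layer_weights = np.random.rand(512, 256)     # 512 inputs -> 256 outputs

layer_output = batch_activations @ layer_weights
print(layer_output.shape)  # (64, 256)
```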
NPUs have an architecture that simulates the brain’s neural networks. Unlike general-purpose CPUs and GPUs, NPUs are optimized for AI-related tasks, and they differ from TPUs and other ASICs as well: whereas ASICs are designed for a singular purpose, NPUs offer more flexibility while still being tailored to neural network computations.[11] As demands for processing performance grew, NPUs came to be regarded as a specialized solution for new AI tasks that CPUs and GPUs were not built for.
The AI hardware landscape is rapidly expanding with new entrants such as the Cerebras AI processor, Ampere CPU, and Graphcore IPU, driven by the burgeoning use of AI. With the industry measuring energy efficiency in TOPS/W (trillions of operations per second per watt), specialized hardware options have demonstrated up to 1.5 times the energy efficiency of GPUs. Despite this, Nvidia maintains market dominance thanks to its comprehensive ecosystem of DGX hardware (enterprise AI combining software, infrastructure, and expertise) and CUDA software, the latter being Nvidia’s proprietary parallel computing platform developed around the company’s market-leading GPUs.
| Feature | Nvidia H100 | Google TPUv5 |
|---|---|---|
| Architecture | Hopper | TPUv5 |
| Tensor cores | 80 | 64 |
| Floating-point performance | 180 TFLOPS | 180 TFLOPS |
| Power consumption | 450 W | 300 W |
| Efficiency | 0.4 TFLOPS/W | 0.6 TFLOPS/W |
| Spec | Cerebras CS-3 | B200 | DGX B200 | GB200 NVL72 |
|---|---|---|---|---|
| FP16 PFLOPs | 125 | 4.4 | 36 | 360 |
| Memory (GB) | 1,200,000 | 192 | 1,536 | 13,500 |
| NVLink / fabric bandwidth (TB/s) | 26,750 | 1.8 | 14.4 | 130 |
| Power (watts) | 23,000 | 1,000 | 14,300 | 120,000 |
| PFLOPs/W | 0.005 | 0.004 | 0.003 | 0.003 |
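The efficiency rows in both tables are simply throughput divided by power draw; below is a minimal sketch of that arithmetic using the figures reported in the second table (results match its PFLOPs/W row after rounding):

```python
# Efficiency = throughput / power draw, using the FP16 PFLOPs and power
# figures from the spec table above.
specs = {
    "Cerebras CS-3": (125, 23_000),
    "B200": (4.4, 1_000),
    "DGX B200": (36, 14_300),
    "GB200 NVL72": (360, 120_000),
}

for name, (pflops, watts) in specs.items():
    print(f"{name:>13}: {pflops / watts:.4f} PFLOPs/W")
```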
AI models undergo training (the first phase, in which the model is shown desired inputs and outputs) and inference (the phase that follows training, in which the model processes new data and makes predictions). Initially, training was believed to cost more than inference. However, companies such as Nvidia and Amazon now believe inference can exceed the cost of training – inference may account for up to 90% of machine learning costs for AI systems[12] – while Google estimates that 60% of its AI energy use goes toward inference and 40% toward training.[13]
Cost-effective AI workloads will depend on utilizing CPUs or GPUs (or a combination of the two) in a system architecture with clear goals aimed at accomplishing specific and/or complex tasks across multiple industries and platforms. For instance, it is estimated that OpenAI’s ChatGPT was trained on over 20,000 Nvidia A100 GPUs and that future ChatGPT versions will require over 30,000 H100 GPUs.[14]
The lower purple line is for Evolved Transformer [So19] on TPUv2s and the upper blue line is for Primer [So21] on TPUv4s, both run in Google datacenters.
Source: WTW and Wharton
Given the high cost and large carbon footprint of such computational power, start-ups and alternatives in the LLM and chip space are challenging the established dominance of ChatGPT and Nvidia, respectively.
Given these numbers, new methods have been devised to reduce the carbon footprint of these models. While GPUs remain preferable for training AI models, inference tasks are increasingly shifting to specialized hardware, yielding significant efficiency improvements. Federated learning, neuromorphic computing, and the 4M best practices – what Google refers to as Model, Machine, Mechanization, and Map – can help reduce energy usage by up to 100 times and carbon emissions by up to 1,000 times.[15]
AI's energy issues can be tackled by optimizing hardware, but further miniaturization of microelectronics is not feasible in the long term. Since GenAI’s training process – particularly for LLMs – consumes considerable energy, optimization must also focus on algorithms. Improvements in data collection and processing techniques, the choice of more efficient libraries, and more efficient training algorithms are essential.[16]
There are four valuable insights for guiding companies' developers in writing eco-friendly code. First, using efficient AI models helps decrease energy use and carbon emissions. To gauge a machine learning model's carbon footprint, look at the energy-intensive stages: model training, inference execution, and the production of computing hardware and data center infrastructure. Among these, training costs exceed inference costs in the initial stages of a not-yet-deployed LLM; training a single LLM can emit an estimated 300 tons of CO2.
| Model | GPT-3 | BLOOM | LLaMA | LLaMA-2 | T5 | PaLM |
|---|---|---|---|---|---|---|
| Developer | OpenAI | BigScience | Meta | Meta | Google | Google |
| Model Size (# of parameters) | 175B | 175B | 7B, 13B, 33B, 65B | 7B, 13B, 34B, 70B | 11B | 540B |
| Training Data (# of tokens) | 300B | 350B | 1.4T | 2T | 34B | 795B |
| Training Compute (FLOPS) | 3.2E+23 | 3.7E+23 | 9.9E+23 | 1.5E+24 | 2.2E+21 | 2.6E+24 |
| Processor Hours | 3,552,000 | 1,082,990 | 1,770,394 | 3,311,616 | 245,760 | 8,404,992 |
| Grid Carbon Intensity (kgCO2e/kWh) | 0.429 | 0.057 | 0.385 | 0.423 | 0.545 | 0.079 |
| Data Center Efficiency (PUE) | 1.1 | 1.2 | 1.1 | 1.1 | 1.12 | 1.08 |
| Energy Consumption (MWh) | 1,287 | 520 | 779 | 1,400 | 86 | 3,436 |
| Carbon Emissions (tCO2e) | 552 | 30 | 300 | 593 | 47 | 271 |
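The table's emissions row follows directly from its energy and grid-intensity rows; here is a minimal sketch of that calculation, checked against the GPT-3 column:

```python
# Training emissions (tCO2e) = energy consumed (MWh) x grid carbon
# intensity (kgCO2e/kWh). Because 1 MWh = 1,000 kWh and 1 tonne =
# 1,000 kg, the conversion factors cancel out.
def training_emissions_tco2e(energy_mwh: float, intensity_kg_per_kwh: float) -> float:
    return energy_mwh * intensity_kg_per_kwh

# GPT-3 column from the table above: 1,287 MWh at 0.429 kgCO2e/kWh
print(round(training_emissions_tco2e(1_287, 0.429)))  # 552 tCO2e, matching the table
```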
Second, when evaluating a model, it is critical to assess its generality, as this provides an understanding of its energy consumption. The broader a model’s capabilities, the larger its energy consumption will be. Multi-purpose, generative frameworks consume more energy than those designed for specific tasks. For example, task-specific systems include voice assistants, recommendation algorithms, autonomous vehicles, and image recognition tools, whereas general-purpose generative AI systems include ChatGPT, DALL-E, and Google Bard.
Third, developers must review each task their system performs, as certain operations demand more energy. Factors influencing energy consumption include the task's complexity, the length of generated text, and whether an image is produced. Employing skilled and conscientious programmers will facilitate this aspect of energy efficiency.
When collaborating with developers, integrate sustainability considerations from the start, alongside discussions on model expectations, accuracy, and governance. Rushing the planning process can lead to hasty development and poor outcomes over the long term. Last, effective prompt engineering is crucial for decreasing AI's computational needs and carbon footprint. Prompt engineering optimizes inputs to yield better outputs from a generative AI model.
Higher-quality inputs lead to more accurate and efficient responses, improving model performance and sustainability. Techniques such as using contextual prompts, compressing prompts, and caching, reusing, and optimizing prompts can help achieve more pertinent outputs while cutting energy consumption.
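As one concrete example of caching and reusing prompts, the sketch below memoizes model responses so an identical (or trivially reworded) prompt never triggers a second, energy-hungry inference call; `call_model` is a hypothetical stand-in for whatever inference API is in use:

```python
import functools

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real inference call (an API request
    or a local model.generate()); replace with your provider's client."""
    return f"response to: {prompt}"

@functools.lru_cache(maxsize=1024)
def _cached_completion(normalized_prompt: str) -> str:
    return call_model(normalized_prompt)

def completion(prompt: str) -> str:
    # Normalize whitespace before caching so prompts that differ only in
    # spacing share a cache slot, raising the hit rate.
    return _cached_completion(" ".join(prompt.split()))

completion("What is green AI?")   # triggers one model call
completion("What is  green AI?")  # served from the cache
```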
Generality" comes at a steep cost to the environment, given the amount of energy these systems require. Multi-purpose, generative architectures are more energy expensive than task-specific systems. Explore task-specific Al tools rather than general, generative Al (AGI).
| Technique (in order of increasing energy consumption and carbon emissions) | Description |
|---|---|
| Prompt engineering | Design and craft prompts to guide the model's responses effectively |
| Retrieval-augmented generation | Retrieve data from outside the model and augment the prompts by adding the relevant retrieved data as context |
| Parameter-efficient tuning | Fine-tune the model with a minimal number of parameters |
| Full fine-tuning | Fine-tune the model by updating all the parameters |
| Training from scratch | Build your own model |
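As a sketch of the second, still relatively cheap rung of this ladder, retrieval-augmented generation fetches relevant context from outside the model and prepends it to the prompt. The toy keyword retriever and document list below are purely illustrative (production systems use vector search):

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Toy retriever: rank documents by keyword overlap with the query.
    query_terms = set(query.lower().split())
    return sorted(corpus, key=lambda doc: -len(query_terms & set(doc.lower().split())))[:k]

corpus = [
    "Data centers account for roughly 1% of energy-related emissions.",
    "PUE measures how efficiently a data center uses power.",
    "Prompt engineering guides a model's responses.",
]

# Augment the prompt with the retrieved context before calling the model.
context = "\n".join(retrieve("how efficient are data centers", corpus))
prompt = (
    "Using only the context below, answer the question.\n\n"
    f"Context:\n{context}\n\nQuestion: How efficient are data centers?"
)
print(prompt)
```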
But even with task-specific AI tools, some tasks can be more energy-intensive than others. Factors that affect energy intensity:
| Task | Inference energy (kWh), mean | Inference energy (kWh), std |
|---|---|---|
| text classification | 0.002 | 0.001 |
| extractive QA | 0.003 | 0.001 |
| masked language modeling | 0.003 | 0.001 |
| token classification | 0.004 | 0.002 |
| image classification | 0.007 | 0.001 |
| object detection | 0.038 | 0.02 |
| text generation | 0.047 | 0.03 |
| summarization | 0.049 | 0.01 |
| image captioning | 0.063 | 0.02 |
| image generation | 2.907 | 3.31 |
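Reading the table as ratios makes the spread vivid; here is a minimal sketch using its mean values (ratios are independent of the exact units reported):

```python
# Relative energy intensity of inference tasks, using mean values from
# the table above.
mean_energy = {
    "text classification": 0.002,
    "text generation": 0.047,
    "image generation": 2.907,
}
baseline = mean_energy["text classification"]
for task, kwh in mean_energy.items():
    print(f"{task:>19}: {kwh / baseline:,.0f}x text classification")
```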
Prompt engineering tips:
1. Craft inputs that elicit effective and efficient responses, improving model performance and sustainability.
2. Optimize prompts to achieve more relevant output and reduce energy use.
3. Keep prompts concise, experiment gradually with different prompts, and use reproducible prompts.
When developing a system, consider the following:
Process improvements: Begin with framing problems and scopes with sustainability in mind. Concurrently, monitor utilization metrics and refine configurations for performance with a low carbon footprint.
Model choices: When selecting a model, aim for lightweight base models – also referred to as task-specific models in this article. These models have fewer layers and parameters, which reduces computational overhead and allows for easy deployment across various hardware platforms. They can be adjusted in scale according to the task requirements and available resources. Additionally, employ prompt engineering and parameter-efficient fine-tuning during customization.
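As a minimal sketch of parameter-efficient fine-tuning, the snippet below uses Hugging Face's peft library and LoRA adapters, one common implementation of the technique (the base model here is an arbitrary small example, not a recommendation):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

# Load a small base model, then attach low-rank adapter matrices. Only
# the adapters are trained; the base weights stay frozen, which cuts
# the compute (and energy) of customization versus full fine-tuning.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(base_model, lora_config)

# Reports trainable vs. total parameters -- typically well under 1%.
model.print_trainable_parameters()
```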
Consulting your organization’s technology team can help to evaluate how the computational load from model training will be allocated across specialized hardware such as GPUs and TPUs. As mentioned above in the section on hardware selection, GPUs excel at matrix operations, while TPUs are specifically designed for machine learning tasks. Essentially, inquire about what hardware is being utilized and whether it is optimized based on its capabilities. Addressing and acting on these questions can directly reduce the computational burden in a system, thereby lowering overall energy consumption.
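In PyTorch terms, the "what hardware is being utilized" question can start with a check as simple as the sketch below (real capacity planning goes much further):

```python
import torch

# Confirm the job will run on the intended accelerator instead of
# silently falling back to the CPU, which is slower and less efficient
# for the matrix-heavy workloads described above.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
if device.type == "cuda":
    print(f"Accelerator: {torch.cuda.get_device_name(device)}")
```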
Training and tools: Finally, using tools such as CodeCarbon to obtain real-time metrics on the model’s carbon footprint can help in reducing overall carbon emissions. CodeCarbon, a Python package, helps developers reduce emissions by optimizing their code and utilizing cloud infrastructure in regions that rely on renewable energy. These tools assist in evaluating algorithms from an environmental perspective, allowing developers to actively analyze and validate their code.
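Here is a minimal sketch of CodeCarbon's tracker wrapping a workload; `train_model` is a hypothetical placeholder for the real training routine:

```python
from codecarbon import EmissionsTracker

def train_model() -> None:
    """Hypothetical placeholder for the real training workload."""
    sum(i * i for i in range(10_000_000))

tracker = EmissionsTracker(project_name="green-ai-sketch")
tracker.start()
try:
    train_model()
finally:
    # stop() returns the estimated emissions (kg CO2eq) for the code
    # executed between start() and stop().
    emissions_kg = tracker.stop()
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```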
Think carefully about the business need and the task at hand, and choose an algorithm that meets just that task. Not every business need requires a generative AI solution.
The increase in carbon emissions by the world’s leading technology companies over the past few years has spurred the search for new sources of clean energy to power the AI revolution. The internet’s insatiable appetite for data is prompting companies such as Microsoft, and tech leaders such as Jeff Bezos and Bill Gates, to fund more sustainable and circular energy sources – most notably nuclear energy in the form of small modular reactors – to power the world’s 7,000 data centers.
Electricity demand is no longer easily predictable: national electric loads are growing significantly faster than grid planners forecasted, with the load growth curve soaring due to industrial demand and the construction of new data centers to handle AI’s explosive electricity usage. Scaling up investments in clean energy has become not only a sustainability necessity but an economic one, as companies compete in the green tech space to fuel their technological advancements. While innovation in AI and LLMs was until recently given precedence over sustainability imperatives, companies are now racing to unlock efficient green tech that can match their hyperscale ambitions, hoping to achieve limitless zero-carbon energy.
Green AI is both a technological challenge and an environmental necessity. With AI becoming increasingly embedded in our daily lives, its considerable energy use and carbon emissions must be addressed.
The strategies discussed in this report – ranging from algorithm and hardware optimization to embedding sustainability in development practices – offer a guide for mitigating AI's environmental effects. As societal and regulatory pressure builds, consumers will reward companies not just for their environmental and sustainability pledges but for actionable results in “greenifying” their data architecture systems.