Article | WTW Research Network Newsletter

Solving the AI energy dilemma

By Omar Samhan, Sonal Madhok, Meghana Bhimarao, Sabrina Fathi, Snigdha Rege and Shweta Mokashi | March 11, 2025

The growth of data centers, with their demands met largely by fossil fuel generation in the short to medium term, has set back the sustainability goals of leading technology firms.

Microsoft, Google, and Amazon have all acknowledged the difficulty of building and developing computationally intensive artificial intelligence (AI) infrastructure while simultaneously trying to meet their net-zero and sustainability goals.

The ever-growing energy demands of AI starkly contrast with global efforts to reduce carbon emissions and minimize waste. Interest in green AI has surged as a way to raise awareness and reduce the environmental impact of such technologies.

To better understand the environmental impact of AI, we teamed up with the Collaborative Innovation Program at the Wharton School of the University of Pennsylvania. After an in-depth review and interviews with multiple senior executives across the technology, climate/ESG, and business spaces, we identified three key areas with the greatest energy implications: data centers, hardware, and algorithmic optimization.

Part 1 - Carbon, computation, and the cloud: The ecological footprint of data centers

The International Energy Agency (IEA) estimates that data centers and data transmission networks are responsible for approximately 1% of energy-related greenhouse gas emissions globally. As AI demand increases, so too will the need to build out and maintain data center warehouses, which are often powered by “dirty” electricity grids, including in Virginia’s “data center alley”, the site of 70% of the world’s internet traffic in 2019.[1]

The energy usage of AI and data centers is shifting the long-term thinking of many technology companies. Generative AI (GenAI) produces complex new data, in contrast to discriminative AI models, which are built for classification tasks such as approving or rejecting loan applications. Because generating outputs is inherently complex, training GenAI models relies on graphics processing units (GPUs), which excel at computationally intensive tasks but consume 10 to 15 times more energy than traditional central processing units (CPUs).[2] These rapidly accelerating energy needs are shifting the calculus of technology companies, which are now exploring previously untenable sources such as nuclear fusion and small modular reactors.

To understand these dynamics better, we offer three insights to help companies approach data center selection. First, as use cases of AI soar, so do the energy and water required to run data centers. Data centers currently use 6% of all electricity in the U.S. – a figure expected to double by 2026. This will strain energy, water, and resource capabilities as the world transitions to a low-carbon economy, and critical electrical components such as semiconductors may face shortages similar to those experienced during the COVID-19 pandemic.

Insight 1. As use of AI soars, so does the energy and water it requires

[Figure: bar chart showing the trajectory of energy and water usage as use of AI soars]

Source: Expected carbon emissions due to data center operation (Towards a Systematic Survey for Carbon Neutral Data Centers)

Second, operational emissions represent the bulk of the environmental impact from data centers. This is becoming a priority for technology companies: Microsoft, for example, has pledged four primary actions to address the issue:

  1. reducing direct operational emissions for Scope 1 (direct emissions owned by a company) and Scope 2 (indirect emissions from power sources used by the company);
  2. accelerating its carbon removal efforts;
  3. designing and optimizing for circularity in reusing cloud hardware; and
  4. improving biodiversity and protecting more land than it uses.[3]

Renewable energy and attendant investments will play an important role in creating a green, circular ecosystem, but fossil fuels will largely power the initial advancements of AI.

Insight 2. Operational emissions represent the bulk of environmental impact from data centers

  1. Carbon-intensive electricity sources drive operational emissions
  2. AI companies prioritize innovation over sustainability to beat the competition
  3. Renewable investments decoupled from data centers limit hourly emission reductions

[Figure: breakdown of data center emissions – 97% operational (purchase of electricity, heat, steam, etc.; diesel generators; company vehicles; business travel; use of sold products) versus 3% non-operational (transportation and distribution; processing of sold products; use of sold products; end-of-life treatment of sold products)]

Source: Operational vs. non-operational emissions (data center life cycle analysis), Towards a Systematic Survey for Carbon Neutral Data Centers

And third, selecting the right data center location can cut operational emissions by at least 60%. Key considerations for site selection include power purchase agreements (PPAs) and access to carbon-free energy (CFE) sources such as solar, wind, hydroelectric, and geothermal. According to its 2024 sustainability report, Google's total greenhouse gas emissions increased by 13% year over year, driven primarily by data center energy consumption and supply chain emissions in "hard-to-decarbonize" regions such as Asia Pacific, where CFE isn't readily available.[4]
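To make the arithmetic concrete, the sketch below compares the operational emissions of an identical workload across grid regions. The annual load and the per-region carbon-intensity figures are illustrative assumptions, not measured values.

```python
# Sketch: operational emissions = energy consumed x grid carbon intensity.
# All figures below are illustrative assumptions for comparison only.

ANNUAL_ENERGY_MWH = 50_000  # hypothetical annual data center load

# Assumed grid carbon intensity in tCO2e per MWh by region
GRID_INTENSITY = {
    "fossil_heavy_grid": 0.60,
    "mixed_grid": 0.35,
    "hydro_rich_grid": 0.05,
}

for region, intensity in GRID_INTENSITY.items():
    emissions = ANNUAL_ENERGY_MWH * intensity  # tCO2e per year
    print(f"{region}: {emissions:>9,.0f} tCO2e/year")
```

Under these assumed intensities, moving the same load from the fossil-heavy grid to the hydro-rich one cuts operational emissions by over 90%, comfortably clearing the 60% figure cited above.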

The recent explosion of large language models (LLMs) and the attendant data center expansion has forced a radical rethinking of how tech companies and countries approach their electrical grids. The need to add generation capacity has caused countries to reassess their larger net-zero and decarbonization goals. With the IEA projecting that global data center electricity demand will more than double by 2026 – largely driven by LLMs and data centers – the very nature of electricity consumption must be reevaluated to match supply with demand.[5]

Insight 3. Choice of data center location can reduce operating emissions by at least 60%

[Figure: Google's data center emissions per region, showing how regional choice of location can reduce emissions by at least 60%]

Source: WTW and Wharton

The intermittent nature of renewable energy will require greater coordination between technology companies and electrical utilities in mapping out the grid to absorb the large electricity demand of hyperscale data centers, which are optimized for networked infrastructure and the large-scale workloads of GenAI models.[6]

The creation and deployment of new technologies and standards – such as a Green AI Code of Conduct; environmental, social and governance (ESG) protocols; specialized hardware accelerators; water cooling systems; 3D chips; and non-silicon semiconductors – may bring computation to where renewable energy is plentiful and where competitive behavior will be rewarded by consumers through the use of more environmentally friendly chatbots.

Part 2 - Hardware selection: The landscape – CPUs vs. GPUs vs. TPUs vs. NPUs

Hardware selection ultimately rests on the functions and roles researchers and developers require when building AI applications. Depending on the cost, efficiency, scalability, and purpose of an AI project, the right processors to power its architecture are chosen when building and training models. These processing units are the fundamental computing engines of the hardware that powers deep learning and high-performance inference tasks, and they can have a material impact on the sustainability and environmental footprint of the technology.

The processing units that perform the complicated tasks involved in AI are central processing units (CPUs), graphics processing units (GPUs), tensor processing units (TPUs), and neural processing units (NPUs). Choosing which processing unit is needed for which operation depends on striking a delicate balance between complexity, cost-efficiency in real-world applications, and environmental impact.

Insight 4. Why are Nvidia GPUs preferred?

[Figure: diagram showing why Nvidia GPUs are preferred]

CPUs have multiple cores and are commonly known as the brain of the computer, executing the commands needed for a computer's operating system. Thanks to their versatility, cost-effectiveness, and ready availability, they can handle simple and general-purpose computing tasks; however, CPUs can face bandwidth and memory bottlenecks.[7] Lacking dedicated hardware for powerful, specialized machine learning operations, CPUs are an inferior processing unit compared with GPUs and TPUs.

In recent years, GPUs (more specifically, Nvidia's Ampere, Hopper, Lovelace, and Blackwell GPUs) have taken over roles traditionally filled by CPUs thanks to their superior computing power. Designed for parallel processing and to accelerate the rendering of 3D graphics,[8] GPUs are now used in high performance computing (HPC), deep learning, and training and inference. Working in conjunction with CPUs, GPU parallel computing helps to accelerate some of the CPUs' functions, with both sharing similar internal components such as cores, memory, and control units.[9]

Google created its TPUs as an AI accelerator application-specific integrated circuit (ASIC) for use in neural network machine learning based on its own TensorFlow software. TPUs differ from GPUs in their specialization in matrix multiplication for AI training and inference, whereas GPUs are ideal for algorithms that process the large datasets found in AI workloads.[10] GPUs are the primary compute hardware for AI applications, but specialized AI hardware such as Google's TPUs offers greater energy efficiency, being tailor-made for AI tasks.

NPUs have an architecture that simulates the brain's neural network. Unlike general-purpose CPUs and GPUs, NPUs are optimized for handling AI-related tasks, and they also differ from TPUs and other ASICs: whereas an ASIC is designed for a singular purpose, an NPU offers more flexibility within its tailor-made design for neural network computations.[11] As demands for processing performance increased, NPUs came to be regarded as a specialized solution for new AI tasks that CPUs and GPUs were not built for.

The AI hardware landscape is rapidly expanding with new entrants such as the Cerebras AI processor, Ampere CPU, and Graphcore IPU, driven by the burgeoning use of AI. With the industry measuring energy efficiency in TOPS/W (tera-operations per second per watt), specialized hardware options have demonstrated up to 1.5 times better energy efficiency than GPUs. Despite this, Nvidia maintains market dominance thanks to its comprehensive ecosystem of DGX hardware (enterprise AI combining software, infrastructure and expertise) and CUDA software, the latter being Nvidia's proprietary parallel computing platform, developed around the company's market-leading GPUs.

Insight 5. Architecture comparison

Some architectures, such as Google TPUs, are optimized for tensor operations:

  • Higher performance for large neural network training
  • More energy efficient than GPUs for AI workloads

| Feature | Nvidia H100 | Google TPUv5 |
| Architecture | Hopper | TPUv5 |
| Tensor cores | 80 | 64 |
| Floating point performance | 180 TFLOPS | 180 TFLOPS |
| Power consumption | 450W | 300W |
| Efficiency | 4 TFLOPS/W | 6 TFLOPS/W |

Source: WTW and Wharton

Other emerging options: Cerebras AI processors, Ampere CPUs …

  • Designed for maximum performance/watt at scale
  • Architectural, software, and cooling advantages

| Spec | Cerebras CS-3 | B200 | DGX B200 | GB200 NVL72 |
| FP16 PFLOPs | 125 | 4.4 | 36 | 360 |
| Memory (GB) | 1,200,000 | 192 | 1,536 | 13,500 |
| NVLink / fabric bandwidth (TB/s) | 26,750 | 1.8 | 14.4 | 130 |
| Power (watts) | 23,000 | 1,000 | 14,300 | 120,000 |
| PFLOPs/W | 0.005 | 0.004 | 0.003 | 0.003 |

Source: WTW and Wharton
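As a quick check on the comparison above, performance per watt is simply peak throughput divided by power draw. The short sketch below recomputes the PFLOPs/W row from the FP16 and power figures in the table.

```python
# Recompute the PFLOPs/W row of the table above from its other two rows.
systems = {
    "Cerebras CS-3": {"fp16_pflops": 125, "power_w": 23_000},
    "B200":          {"fp16_pflops": 4.4, "power_w": 1_000},
    "DGX B200":      {"fp16_pflops": 36,  "power_w": 14_300},
    "GB200 NVL72":   {"fp16_pflops": 360, "power_w": 120_000},
}

for name, spec in systems.items():
    pflops_per_watt = spec["fp16_pflops"] / spec["power_w"]
    print(f"{name}: {pflops_per_watt:.3f} PFLOPs/W")
```

The printed values round to the table's 0.005, 0.004, 0.003 and 0.003, confirming the efficiency row is internally consistent.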

AI models undergo training (the first phase, where the model is shown desired inputs and outputs) and inference (the phase that follows training, where the model recognizes new input data and makes predictions). Initially, it was believed that the cost of training AI models exceeded the cost of inference. However, companies such as Nvidia and Amazon now believe inference can exceed the cost of training and may account for up to 90% of the machine learning costs of an AI system,[12] while Google estimates that 60% of its AI energy use goes towards inference and 40% towards training.[13]

Cost-effective AI workloads will depend on utilizing CPUs or GPUs (or a combination of the two) in a system architecture with clear goals aimed at accomplishing specific and/or complex tasks across multiple industries and platforms. For instance, it is estimated that OpenAI's ChatGPT was trained on over 20,000 Nvidia A100 GPUs and that future ChatGPT versions will require over 30,000 H100 GPUs.[14]

Insight 6. Reduction in gross CO2 emissions since 2017

[Figure: graph showing the reduction in gross CO2 emissions since 2017. The lower (purple) line is for the Evolved Transformer [So19] on TPUv2s and the upper (blue) line is for Primer [So21] on TPUv4s, both run in Google data centers.]

Source: WTW and Wharton

Given the high cost and large carbon footprint of such computational power, start-ups and alternatives in the LLM and chip space are challenging the established dominance of ChatGPT and Nvidia, respectively.

Given these numbers, new means and methods have been devised to reduce the carbon footprint of these models. While GPUs remain preferable for training AI models, inference tasks are increasingly shifting to specialized hardware, yielding significant efficiency improvements. Federated learning, neuromorphic computing, and implementing 4M best practices – what Google refers to as Model, Machine, Mechanization, and Map – can help reduce energy usage and carbon emissions by 100 times and 1000 times, respectively.[15]

Insight 7. New Opportunities in Federated Learning

  1. Federated learning decentralizes model training, allowing diverse data insights and preserving privacy
  2. This method supports efficient collaboration, requiring only model updates, not the full datasets, to be transmitted
  3. Smaller AI models collaborate across devices, adapting more dynamically to localized data for tailored solutions
  4. Federated learning catalyzes breakthroughs in AI, leveraging wider data sources for robust, adaptable models
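A minimal sketch of the federated averaging idea behind these points, assuming each client sends back only a weight vector and its local sample count (all names and values here are hypothetical):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Combine locally trained weights into one global update (FedAvg).

    Only the weight vectors travel to the server; the raw training data
    never leaves the clients, which is what preserves privacy.
    """
    coefficients = np.array(client_sizes) / sum(client_sizes)
    return (coefficients[:, None] * np.stack(client_weights)).sum(axis=0)

# Three hypothetical clients with different amounts of local data
weights = [np.array([0.2, 1.0]), np.array([0.4, 0.8]), np.array([0.3, 0.9])]
sizes = [100, 300, 600]
print(federated_average(weights, sizes))  # [0.32 0.88], weighted by data size
```

Because only compact model updates are transmitted, the bandwidth and energy cost per round stays small even as the number of participating devices grows.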

Part 3 - Green AI: Optimizing algorithms for energy efficiency and sustainability

AI's energy issues can be tackled by optimizing hardware, but further miniaturization of microelectronics is not feasible in the long term. Since GenAI's training process – using LLMs – consumes considerable energy, optimization must focus on algorithms. Enhancements in data collection and processing techniques, choosing more efficient libraries, and improving training algorithm efficiency are essential.[16]

There are four valuable insights for guiding companies' developers in writing eco-friendly code. First, using efficient AI models helps decrease energy use and carbon emissions. To gauge a machine learning model's carbon footprint, look at the energy-intensive stages: model training, inference execution, and the production of computing hardware and data center infrastructure. Among these three, training costs exceed inference costs in the initial stages of a non-deployed LLM; training just one LLM can emit an estimated 300 tons of CO2.

Insight 8. Employing efficient AI models can reduce energy needs and carbon emissions

  • Carbon footprint in ML includes training the model, running inference, and the production of computing hardware and data center capabilities
  • More parameters and training data mean more energy consumption and carbon generation
  • Model training is the most energy-intensive component in AI (training a single LLM can emit an estimated 300 tons of CO2)

KWh = (hours to train × number of processors × average power per processor × PUE) ÷ 1,000
tCO2e = (KWh × kgCO2e per KWh) ÷ 1,000

Source: WTW and Wharton

| Model | GPT-3 | Bloom | LLaMa | LLaMa-2 | T5 | PaLM |
| Developer | OpenAI | BigScience | Meta | Meta | Google | Google |
| Model size (# of parameters) | 175B | 175B | 7B, 13B, 33B, 65B | 7B, 13B, 34B, 70B | 11B | 540B |
| Training data (# of tokens) | 300B | 350B | 1.4T | 2T | 34B | 795B |
| Training compute (FLOPS) | 3.2E+23 | 3.7E+23 | 9.9E+23 | 1.5E+24 | 2.2E+21 | 2.6E+24 |
| Processor hours | 3,552,000 | 1,082,990 | 1,770,394 | 3,311,616 | 245,760 | 8,404,992 |
| Grid carbon intensity (kgCO2e/KWh) | 0.429 | 0.057 | 0.385 | 0.423 | 0.545 | 0.079 |
| Data center efficiency (PUE) | 1.1 | 1.2 | 1.1 | 1.1 | 1.12 | 1.08 |
| Energy consumption (MWh) | 1,287 | 520 | 779 | 1,400 | 86 | 3,436 |
| Carbon emissions (tCO2e) | 552 | 30 | 300 | 593 | 47 | 271 |
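Applying the two formulas above to the GPT-3 column reproduces the table's bottom rows. The average power per processor is not given in the table, so the ~330 W used below is an assumed figure chosen to match the published energy total:

```python
# Worked example of the KWh and tCO2e formulas, using the GPT-3 column.
processor_hours = 3_552_000   # hours to train x number of processors
avg_power_w = 330             # assumed average power per processor (not in table)
pue = 1.1                     # data center efficiency (PUE)
grid_intensity = 0.429        # grid carbon intensity, kgCO2e per KWh

kwh = processor_hours * avg_power_w * pue / 1000
tco2e = kwh * grid_intensity / 1000

print(f"Energy: {kwh / 1000:,.0f} MWh")    # ~1,289 MWh (table: 1,287)
print(f"Emissions: {tco2e:,.0f} tCO2e")    # ~553 tCO2e (table: 552)
```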

Second, when evaluating a model, it is critical to assess its generality, as this provides an understanding of its energy consumption: the broader a model's capabilities, the larger its energy consumption will be. Multi-purpose, generative frameworks consume more energy than those designed for specific tasks. Task-specific systems include voice assistants, recommendation algorithms, autonomous vehicles, and image recognition tools, whereas general-purpose generative AI systems include ChatGPT, DALL-E, and Google Bard.

Third, developers must review each task their system performs, as certain operations can demand more energy. Factors influencing energy consumption include the task's complexity, the length of generated text, and whether an image is produced. Employing skilled and considerate programmers will facilitate this aspect of energy efficiency.

When collaborating with developers, it is recommended to integrate sustainability considerations from the start, alongside discussions on model expectations, accuracy, and governance. Rushing the planning process can lead to hasty development and poor outcomes in the long term. Last, effective prompt engineering is crucial for decreasing AI's computational needs and carbon footprint. Prompt engineering optimizes inputs to yield better outputs from a generative AI model.

Higher quality inputs lead to more accurate and efficient responses, improving model performance and sustainability. Techniques such as using contextual prompts, compressing prompts, caching, reusing prompts and optimizing them can help achieve more pertinent outputs while cutting down on energy consumption.
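Of these techniques, caching is the simplest to sketch in code: if an identical prompt has been answered before, return the stored response instead of running inference again. The `call_model` function below is a hypothetical stand-in for a real LLM call:

```python
import functools

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real (energy-hungry) LLM inference call."""
    return f"response to: {prompt}"

@functools.lru_cache(maxsize=1024)
def cached_completion(prompt: str) -> str:
    """Serve repeated prompts from memory, skipping redundant inference."""
    return call_model(prompt)

first = cached_completion("Summarize our Q3 energy report.")
second = cached_completion("Summarize our Q3 energy report.")  # cache hit: no model call
assert first is second
```

For high-traffic applications where many users issue near-identical requests, every cache hit is an inference pass, and its associated energy, avoided entirely.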

Insight 9. General algorithms use more energy than task-specific systems

"Generality" comes at a steep cost to the environment, given the amount of energy these systems require. Multi-purpose, generative architectures are more energy expensive than task-specific systems. Explore task-specific AI tools rather than general-purpose generative AI.

Approaches in order of increasing energy consumption and carbon emissions:

  • Prompt engineering: design and craft prompts to guide the model's responses effectively
  • Retrieval augmented generation: retrieve data from outside the model and augment the prompt by adding the relevant retrieved data as context
  • Parameter-efficient tuning: fine-tune the model with a minimal number of parameters
  • Full fine-tuning: fine-tune the model by updating all the parameters
  • Training from scratch: build your own model

Insight 10. Some tasks are more energy intensive than others

But even with task-specific AI tools, some tasks can be more energy intensive than others. The table below shows the mean and standard deviation of inference energy per 1,000 queries for the 10 tasks examined.

Source: WTW and Wharton

| Task | Mean inference energy (kWh) | Std (kWh) |
| text classification | 0.002 | 0.001 |
| extractive QA | 0.003 | 0.001 |
| masked language modeling | 0.003 | 0.001 |
| token classification | 0.004 | 0.002 |
| image classification | 0.007 | 0.001 |
| object detection | 0.038 | 0.02 |
| text generation | 0.047 | 0.03 |
| summarization | 0.049 | 0.01 |
| image captioning | 0.063 | 0.02 |
| image generation | 2.907 | 3.31 |
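The per-task means translate directly into workload estimates: multiply each task's query volume (in thousands) by its mean energy and sum. The query mix below is hypothetical:

```python
# Mean inference energy per 1,000 queries (kWh), taken from the table above
ENERGY_PER_1K_KWH = {
    "text classification": 0.002,
    "summarization": 0.049,
    "image generation": 2.907,
}

# Hypothetical monthly query volumes per task
monthly_queries = {
    "text classification": 500_000,
    "summarization": 50_000,
    "image generation": 10_000,
}

total_kwh = sum(
    volume / 1000 * ENERGY_PER_1K_KWH[task]
    for task, volume in monthly_queries.items()
)
print(f"Estimated inference energy: {total_kwh:.1f} kWh/month")  # ~32.5 kWh
```

Even at a fiftieth of the query volume, image generation dominates the total in this mix, which is exactly the point of the insight.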

Insight 11. Green prompt engineering is crucial for reducing AI's computational needs and carbon footprint

  1. The art of prompt engineering: craft inputs that elicit effective and efficient responses, improving model performance and sustainability.
  2. Strategies to reduce computational load: contextual prompts, prompt compression, caching, and reusing prompts.
  3. Prompt optimization: optimize prompts to achieve more relevant outputs and reduce energy use.
  4. Efficient prompting guidelines: keep prompts concise, experiment gradually with different prompts, and use reproducible prompts.

When developing a system, consider the following:

  • Process Improvements
  • Model choices
  • Training and tools

Process improvements: Begin by framing problems and model scope with sustainability in mind. Then monitor utilization metrics and refine configurations for performance with a low carbon footprint.

Model choices: When selecting a model, aim for lightweight base models – also referred to as task-specific models in this article. These models have fewer layers and parameters, which reduces computational overhead and allows for easy deployment across various hardware platforms. They can be adjusted in scale according to the task requirements and available resources. Additionally, employ prompt engineering and parameter-efficient fine-tuning during customization.
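As a hedged illustration of parameter-efficient fine-tuning, the sketch below uses the Hugging Face `peft` library to wrap a base model with LoRA adapters; the base model choice and hyperparameters are example values only:

```python
# Sketch: LoRA-style parameter-efficient fine-tuning with Hugging Face peft.
# The base model and hyperparameters are illustrative choices, not prescriptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")

lora = LoraConfig(
    r=8,               # rank of the low-rank adapter matrices
    lora_alpha=16,     # adapter scaling factor
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of all weights
```

Because only the small adapter matrices are updated, the compute (and therefore energy) per fine-tuning run drops sharply compared with full fine-tuning.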

Consulting your organization’s technology team can help to evaluate how the computational load from model training will be allocated across specialized hardware such as GPUs and TPUs. As mentioned above in the section on hardware selection, GPUs excel at matrix operations, while TPUs are specifically designed for machine learning tasks. Essentially, inquire about what hardware is being utilized and whether it is optimized based on its capabilities. Addressing and acting on these questions can directly reduce the computational burden in a system, thereby lowering overall energy consumption.

Training and tools: Finally, using tools such as CodeCarbon to obtain real-time metrics on the model’s carbon footprint can help in reducing overall carbon emissions. CodeCarbon, a Python package, helps developers reduce emissions by optimizing their code and utilizing cloud infrastructure in regions that rely on renewable energy. These tools assist in evaluating algorithms from an environmental perspective, allowing developers to actively analyze and validate their code.
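A minimal example of the CodeCarbon pattern described above; the `train_model` routine is a placeholder for real training code:

```python
from codecarbon import EmissionsTracker

def train_model():
    # Placeholder standing in for an actual training loop
    sum(i * i for i in range(10_000_000))

tracker = EmissionsTracker(project_name="llm-fine-tune")
tracker.start()
try:
    train_model()
finally:
    emissions_kg = tracker.stop()  # estimated kg CO2eq for the tracked code

print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```

Logging this figure for each experiment makes the carbon cost of a model visible alongside its accuracy, so both can be weighed when choosing what to deploy.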

Think carefully about the business need and the task at hand, and choose an algorithm that meets just that task. Not every business need requires a generative AI solution.

Insight 12. What are some best practices when selecting sustainable, energy-efficient algorithms?

Process Improvements

  1. Frame problems and model scope with sustainability in mind.
  2. Monitor utilization metrics.
  3. Refine configurations for performance with a low carbon footprint.

Model Choices

  1. Opt for lightweight base models.
  2. Use prompt engineering and parameter-efficient fine-tuning during customization.

Training and Tools

  1. Distribute training procedures across specialized hardware.
  2. Leverage tools like CodeCarbon to calculate algorithmic carbon footprint in real time.

The increase in carbon emissions over the past few years by the world's leading technology companies has provided an impetus to search for new sources of clean energy to power the AI revolution. The internet's insatiable appetite for data is prompting companies such as Microsoft, and tech leaders such as Jeff Bezos and Bill Gates, to fund more sustainable and circular energy sources – most notably nuclear energy in the form of small modular reactors – to power the world's 7,000 data centers.

Electricity demand is no longer easily predictable: national electric loads are now growing significantly faster than grid planners have forecast, with the load growth curve soaring due to rising industrial demand and the construction of new data centers to handle AI's explosive electricity usage. Scaling up investment in clean energy has become not only a sustainability necessity but an economic one, as companies now compete in the green tech space to fuel their technological advancements. While innovation in AI and LLMs was given precedence over sustainability imperatives in recent years, companies are now trying to outdo each other in unlocking powerfully efficient green tech to meet their hyperscale ambitions, hoping to achieve limitless zero-carbon energy.

Conclusion

Green AI is both a technological challenge and an environmental necessity. With AI becoming increasingly embedded in our daily lives, its considerable energy use and carbon emissions must be addressed.

The strategies discussed in this article – ranging from algorithm and hardware optimization to embedding sustainability in development practices – offer a guide for mitigating AI's environmental effects. As societal and regulatory pressure builds, consumers will reward companies based not just on their environmental and sustainability pledges but on their actionable results in "greenifying" their data architecture systems.

References

  1. The Staggering Ecological Impacts of Computation and the Cloud
  2. How to Make Generative AI Greener
  3. Our 2024 Environmental Sustainability Report
  4. Google 2024 Environmental Report - Google Sustainability
  5. EPRI Home
  6. Hardware Machine Learning
  7. What Is a GPU? Graphics Processing Units Defined
  8. GPU vs CPU - Difference Between Processing Units - AWS
  9. Cloud Tensor Processing Units (TPUs)
  10. What is an NPU: the new AI chips explained
  11. Trends in AI inference energy consumption: Beyond the performance-vs-parameter laws of deep learning
  12. AI's Growing Carbon Footprint
  13. ChatGPT Will Command More Than 30,000 Nvidia GPUs: Report
  14. The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink
  15. Generative AI's Energy Problem Today Is Foundational: Before AI can take over, it will need to find a new approach to energy
  16. US electricity load growth forecast jumps 81% led by data centers, industry: Grid Strategies
