Microsoft, Google, and Amazon have all acknowledged the difficulty of building and operating computationally intensive artificial intelligence (AI) infrastructure while simultaneously trying to meet their net-zero and sustainability goals.
The ever-growing energy demands of AI stand in stark contrast with global efforts to reduce carbon emissions and minimize waste. Interest in green AI has surged as researchers and companies seek to raise awareness of, and reduce, the environmental impact of these technologies.
To better understand the environmental impact of AI, we teamed up with the Collaborative Innovation Program at the Wharton School of the University of Pennsylvania. After an in-depth review and interviews with multiple senior executives across the technology, climate/ESG, and business spaces, this report identifies three key areas with the greatest energy implications: data centers, hardware, and algorithmic optimization.
The International Energy Agency (IEA) estimates that data centers and data transmission networks are responsible for approximately 1% of energy-related greenhouse gas emissions globally. As AI demand increases, so too will the need to build out and maintain data center warehouses, which are often powered by “dirty” electricity grids, including in Virginia’s “data center alley”, the site of 70% of the world’s internet traffic in 2019.[1]
The energy usage of AI and data centers is shifting the long-term thinking of many technology companies. The ability of generative AI (GenAI) to produce complex data differs drastically from that of discriminative AI, whose models are designed for classification purposes, making binary decisions such as approving or rejecting loan applications. Because generating outputs is inherently more complex, training GenAI models relies on graphics processing units (GPUs), which consume 10 to 15 times more energy than traditional central processing units (CPUs) owing to their superiority in computationally intensive tasks.[2] These rapidly accelerating energy needs are shifting the calculus of technology companies, which are now exploring previously untenable sources such as nuclear fusion and small modular reactors.
To understand these dynamics better, we offer three insights to help companies approach data center selection. First, as AI use cases soar, so do the energy and water required to run data centers. Data centers currently use 6% of all electricity in the U.S. – a figure expected to double by 2026. This will strain energy, water, and resource capabilities as the world transitions to a low-carbon economy, and critical electrical components such as semiconductors may face shortages similar to those experienced during the COVID-19 pandemic.
Source: Expected carbon emissions due to data center operation (Towards a Systematic Survey for Carbon Neutral Data Centers)
Second, operational emissions represent the bulk of the environmental impact of data centers, and reducing them is becoming a priority for technology companies. Due to such large emissions, Microsoft, for example, has pledged four primary actions to address this issue: becoming carbon negative by 2030, becoming water positive by 2030, achieving zero waste by 2030, and protecting more land than it uses.
Renewable energy and attendant investments will play an important role in creating a green, circular ecosystem, but fossil fuels will largely power the initial advancements of AI.
Source: Operational vs. Non-Operational Emissions (Data Centre Life Cycle Analysis; Towards a Systematic Survey for Carbon Neutral Data Centers)
And third, selecting the right data center location can cut operational emissions by at least 60%. Key considerations for site selection include power purchase agreements (PPAs) and access to carbon-free energy (CFE) sources such as solar, wind, hydroelectric, and geothermal. Google’s most recent sustainability report (2024) shows total greenhouse gas emissions increased by 13% year over year, primarily driven by data center energy consumption and supply chain emissions in “hard-to-decarbonize” regions such as Asia Pacific, where CFE isn’t readily available.[4]
The recent explosion of large language models (LLMs) and the attendant data center build-out has forced a radical rethinking of how tech companies and countries approach the electrical grid. The need to add generation capacity has caused countries to reassess their larger net-zero and decarbonization goals. With the IEA projecting that global data center electricity demand will more than double by 2026 – driven largely by AI workloads such as LLMs – the very nature of electricity consumption must be reevaluated to match supply and demand.[5]
Source: WTW and Wharton
The intermittent nature of renewable energy will require greater coordination between technology companies and electrical utilities in mapping out a grid that can absorb the large electricity demand of hyperscale data centers, which are optimized for networked infrastructure and the large-scale workloads of GenAI models.[6]
The creation and deployment of new technologies and standards – such as a Green AI Code of Conduct; environmental, social, and governance (ESG) protocols; specialized hardware accelerators; water cooling systems; 3D chips; and non-silicon semiconductors – may bring computation to where renewable energy is plentiful, and competitive behavior will be rewarded by consumers through their use of more environmentally friendly chatbots.
Hardware selection will ultimately rest on the functions and roles researchers and developers require when building AI systems. The choice of processors to power an architecture will be determined by the cost, efficiency, scalability, and purpose of the AI project when building and training models. These processing units are the fundamental computing engines of the hardware that powers deep learning and high-performance inferencing, and they can have a material impact on a technology’s sustainability and environmental footprint.
The processing units that perform the complicated tasks involved in AI are central processing units (CPUs), graphics processing units (GPUs), tensor processing units (TPUs), and neural processing units (NPUs). Choosing which processing unit is needed for which operation depends on striking a delicate balance among complexity, cost-efficiency for real-world applications, and environmental impact.
CPUs have multiple cores and are commonly known as the brain of the computer, executing the commands needed by a computer’s operating system. Thanks to their versatility, cost effectiveness, and wide availability, they can handle simple, general-purpose computing tasks; however, CPUs can face bandwidth and memory bottlenecks.[7] A lack of dedicated hardware for demanding, specialized machine learning operations makes CPUs inferior to GPUs and TPUs for such workloads.
In recent years, GPUs (most notably Nvidia’s Ampere, Hopper, Lovelace, and Blackwell lines) have taken over many roles traditionally filled by CPUs thanks to their superior computing power. Designed for parallel processing and originally for accelerating the rendering of 3D graphics,[8] GPUs are now used in high-performance computing (HPC), deep learning, and AI training and inference. Working in conjunction with CPUs, GPU parallel computing accelerates some of the CPUs’ functions, and the two share similar internal components such as cores, memory, and control units.[9]
Google created its TPUs as AI-accelerator application-specific integrated circuits (ASICs) for neural network machine learning based on its own TensorFlow software. TPUs differ from GPUs in that their specialized feature is hardware dedicated to matrix multiplication for AI training and inference, whereas GPUs are well suited to algorithms that process the large datasets found in AI workloads.[10] GPUs remain the primary compute hardware for AI applications, but specialized AI hardware such as Google's TPUs offers greater energy efficiency, being tailor-made for AI tasks.
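As a toy illustration of the workload these accelerators target, the NumPy sketch below shows the dense matrix multiplication at the heart of neural network training and inference (the shapes here are arbitrary and chosen only for illustration):

```python
import numpy as np

# The core operation that TPUs (and GPU tensor cores) accelerate:
# multiplying a batch of activations by a layer's weight matrix.
batch_activations = np.random.rand(64, 512)  # 64 samples x 512 features
layer_weights = np.random.rand(512, 256)     # 512 inputs -> 256 outputs

layer_output = batch_activations @ layer_weights
print(layer_output.shape)  # (64, 256)
```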
NPUs have an architecture that simulates the brain’s neural networks. Unlike general-purpose CPUs and GPUs, NPUs are optimized for AI-related tasks, and they differ from TPUs and other ASICs as well: whereas ASICs are designed for a singular purpose, NPUs offer more flexibility while still being tailored to neural network computations.[11] As demands for processing performance grew, NPUs came to be regarded as a specialized solution for new AI tasks that CPUs and GPUs were not built for.
The AI hardware landscape is rapidly expanding with new entrants such as the Cerebras AI processor, Ampere CPU, and Graphcore IPU, driven by the burgeoning use of AI. With the industry measuring energy efficiency in TOPS/W (trillions of operations per second per watt), specialized hardware options have demonstrated up to 1.5 times the energy efficiency of GPUs. Despite this, Nvidia maintains market dominance thanks to its comprehensive ecosystem of DGX hardware (enterprise AI combining software, infrastructure, and expertise) and CUDA software, the latter being Nvidia’s proprietary parallel computing platform developed around the company’s market-leading GPUs.
| Feature | Nvidia H100 | Google TPUv5 |
|---|---|---|
| Architecture | Hopper | TPUv5 |
| Tensor cores | 80 | 64 |
| Floating-point performance | 180 TFLOPS | 180 TFLOPS |
| Power consumption | 450 W | 300 W |
| Efficiency | 0.4 TFLOPS/W | 0.6 TFLOPS/W |
| Spec | Cerebras CS-3 | B200 | DGX B200 | GB200 NVL72 |
|---|---|---|---|---|
| FP16 PFLOPs | 125 | 4.4 | 36 | 360 |
| Memory (GB) | 1,200,000 | 192 | 1,536 | 13,500 |
| NVLink / fabric bandwidth (TB/s) | 26,750 | 1.8 | 14.4 | 130 |
| Power (watts) | 23,000 | 1,000 | 14,300 | 120,000 |
| PFLOPs/W | 0.005 | 0.004 | 0.003 | 0.003 |
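The efficiency rows in both tables are simply throughput divided by power draw; below is a minimal sketch of that arithmetic using the figures reported in the second table (results match its PFLOPs/W row after rounding):

```python
# Efficiency = throughput / power draw, using the FP16 PFLOPs and power
# figures from the spec table above.
specs = {
    "Cerebras CS-3": (125, 23_000),
    "B200": (4.4, 1_000),
    "DGX B200": (36, 14_300),
    "GB200 NVL72": (360, 120_000),
}

for name, (pflops, watts) in specs.items():
    print(f"{name:>13}: {pflops / watts:.4f} PFLOPs/W")
```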
AI models undergo training (the first phase, in which the model is shown desired inputs and outputs) and inference (the phase that follows training, in which the model processes new data and makes predictions). Initially, training was believed to cost more than inference. However, companies such as Nvidia and Amazon now believe inference can exceed the cost of training – inference may account for up to 90% of machine learning costs for AI systems[12] – while Google estimates that 60% of its AI energy use goes toward inference and 40% toward training.[13]
Cost-effective AI workloads will depend on utilizing CPUs or GPUs (or a combination of the two) in a system architecture with clear goals aimed at accomplishing specific and/or complex tasks across multiple industries and platforms. For instance, it is estimated that OpenAI’s ChatGPT was trained on over 20,000 Nvidia A100 GPUs and that future ChatGPT versions will require over 30,000 H100 GPUs.[14]
The lower purple line is for Evolved Transformer [So19] on TPUv2s and the upper blue line is for Primer [So21] on TPUv4s, both run in Google datacenters.
Source: WTW and Wharton
Given the high cost and large carbon footprint of such computational power, start-ups and alternatives in the LLM and chip space are challenging the established dominance of ChatGPT and Nvidia, respectively.
Given these numbers, new methods have been devised to reduce the carbon footprint of these models. While GPUs remain preferable for training AI models, inference tasks are increasingly shifting to specialized hardware, yielding significant efficiency improvements. Federated learning, neuromorphic computing, and the 4M best practices – what Google refers to as Model, Machine, Mechanization, and Map – can help reduce energy usage by up to 100 times and carbon emissions by up to 1,000 times.[15]
AI's energy issues can be tackled by optimizing hardware, but further miniaturization of microelectronics is not feasible in the long term. Since GenAI’s training process – particularly for LLMs – consumes considerable energy, optimization must also focus on algorithms. Improvements in data collection and processing techniques, the choice of more efficient libraries, and more efficient training algorithms are essential.[16]
There are four valuable insights for guiding companies' developers in writing eco-friendly code. First, using efficient AI models helps decrease energy use and carbon emissions. To gauge a machine learning model's carbon footprint, look at the energy-intensive stages: model training, inference execution, and the production of computing hardware and data center infrastructure. Among these, training costs exceed inference costs in the initial stages of a not-yet-deployed LLM; training a single LLM can emit an estimated 300 tons of CO2.
| Model | GPT-3 | BLOOM | LLaMA | LLaMA-2 | T5 | PaLM |
|---|---|---|---|---|---|---|
| Developer | OpenAI | BigScience | Meta | Meta | Google | Google |
| Model Size (# of parameters) | 175B | 175B | 7B, 13B, 33B, 65B | 7B, 13B, 34B, 70B | 11B | 540B |
| Training Data (# of tokens) | 300B | 350B | 1.4T | 2T | 34B | 795B |
| Training Compute (FLOPS) | 3.2E+23 | 3.7E+23 | 9.9E+23 | 1.5E+24 | 2.2E+21 | 2.6E+24 |
| Processor Hours | 3,552,000 | 1,082,990 | 1,770,394 | 3,311,616 | 245,760 | 8,404,992 |
| Grid Carbon Intensity (kgCO2e/kWh) | 0.429 | 0.057 | 0.385 | 0.423 | 0.545 | 0.079 |
| Data Center Efficiency (PUE) | 1.1 | 1.2 | 1.1 | 1.1 | 1.12 | 1.08 |
| Energy Consumption (MWh) | 1,287 | 520 | 779 | 1,400 | 86 | 3,436 |
| Carbon Emissions (tCO2e) | 552 | 30 | 300 | 593 | 47 | 271 |
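The table's emissions row follows directly from its energy and grid-intensity rows; here is a minimal sketch of that calculation, checked against the GPT-3 column:

```python
# Training emissions (tCO2e) = energy consumed (MWh) x grid carbon
# intensity (kgCO2e/kWh). Because 1 MWh = 1,000 kWh and 1 tonne =
# 1,000 kg, the conversion factors cancel out.
def training_emissions_tco2e(energy_mwh: float, intensity_kg_per_kwh: float) -> float:
    return energy_mwh * intensity_kg_per_kwh

# GPT-3 column from the table above: 1,287 MWh at 0.429 kgCO2e/kWh
print(round(training_emissions_tco2e(1_287, 0.429)))  # 552 tCO2e, matching the table
```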
Second, when evaluating a model, it is critical to assess its generality, as this provides an understanding of its energy consumption. The broader a model’s capabilities, the larger its energy consumption will be. Multi-purpose, generative frameworks consume more energy than those designed for specific tasks. For example, task-specific systems include voice assistants, recommendation algorithms, autonomous vehicles, and image recognition tools, whereas general-purpose generative AI systems include ChatGPT, DALL-E, and Google Bard.
Third, developers must review each task their system performs, as certain operations demand more energy. Factors influencing energy consumption include the task's complexity, the length of generated text, and whether an image is produced. Employing skilled and conscientious programmers will facilitate this aspect of energy efficiency.
When collaborating with developers, integrate sustainability considerations from the start, alongside discussions on model expectations, accuracy, and governance. Rushing the planning process can lead to hasty development and poor outcomes over the long term. Last, effective prompt engineering is crucial for decreasing AI's computational needs and carbon footprint. Prompt engineering optimizes inputs to yield better outputs from a generative AI model.
Higher-quality inputs lead to more accurate and efficient responses, improving model performance and sustainability. Techniques such as using contextual prompts, compressing prompts, and caching, reusing, and optimizing prompts can help achieve more pertinent outputs while cutting energy consumption.
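As one concrete example of caching and reusing prompts, the sketch below memoizes model responses so an identical (or trivially reworded) prompt never triggers a second, energy-hungry inference call; `call_model` is a hypothetical stand-in for whatever inference API is in use:

```python
import functools

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a real inference call (an API request
    or a local model.generate()); replace with your provider's client."""
    return f"response to: {prompt}"

@functools.lru_cache(maxsize=1024)
def _cached_completion(normalized_prompt: str) -> str:
    return call_model(normalized_prompt)

def completion(prompt: str) -> str:
    # Normalize whitespace before caching so prompts that differ only in
    # spacing share a cache slot, raising the hit rate.
    return _cached_completion(" ".join(prompt.split()))

completion("What is green AI?")   # triggers one model call
completion("What is  green AI?")  # served from the cache
```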
Generality" comes at a steep cost to the environment, given the amount of energy these systems require. Multi-purpose, generative architectures are more energy expensive than task-specific systems. Explore task-specific Al tools rather than general, generative Al (AGI).
| Technique (in order of increasing energy consumption and carbon emissions) | Description |
|---|---|
| Prompt engineering | Design and craft prompts to guide the model's responses effectively |
| Retrieval-augmented generation | Retrieve data from outside the model and augment the prompts by adding the relevant retrieved data as context |
| Parameter-efficient tuning | Fine-tune the model with a minimal number of parameters |
| Full fine-tuning | Fine-tune the model by updating all the parameters |
| Training from scratch | Build your own model |
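As a sketch of the second, still relatively cheap rung of this ladder, retrieval-augmented generation fetches relevant context from outside the model and prepends it to the prompt. The toy keyword retriever and document list below are purely illustrative (production systems use vector search):

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Toy retriever: rank documents by keyword overlap with the query.
    query_terms = set(query.lower().split())
    return sorted(corpus, key=lambda doc: -len(query_terms & set(doc.lower().split())))[:k]

corpus = [
    "Data centers account for roughly 1% of energy-related emissions.",
    "PUE measures how efficiently a data center uses power.",
    "Prompt engineering guides a model's responses.",
]

# Augment the prompt with the retrieved context before calling the model.
context = "\n".join(retrieve("how efficient are data centers", corpus))
prompt = (
    "Using only the context below, answer the question.\n\n"
    f"Context:\n{context}\n\nQuestion: How efficient are data centers?"
)
print(prompt)
```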
But even with task-specific AI tools, some tasks can be more energy-intensive than others. Factors that affect energy intensity:
| Task | Inference energy (kWh), mean | Inference energy (kWh), std |
|---|---|---|
| text classification | 0.002 | 0.001 |
| extractive QA | 0.003 | 0.001 |
| masked language modeling | 0.003 | 0.001 |
| token classification | 0.004 | 0.002 |
| image classification | 0.007 | 0.001 |
| object detection | 0.038 | 0.02 |
| text generation | 0.047 | 0.03 |
| summarization | 0.049 | 0.01 |
| image captioning | 0.063 | 0.02 |
| image generation | 2.907 | 3.31 |
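Reading the table as ratios makes the spread vivid; here is a minimal sketch using its mean values (ratios are independent of the exact units reported):

```python
# Relative energy intensity of inference tasks, using mean values from
# the table above.
mean_energy = {
    "text classification": 0.002,
    "text generation": 0.047,
    "image generation": 2.907,
}
baseline = mean_energy["text classification"]
for task, kwh in mean_energy.items():
    print(f"{task:>19}: {kwh / baseline:,.0f}x text classification")
```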
Prompt engineering tips:
1. Craft inputs that elicit effective and efficient responses, improving model performance and sustainability.
2. Optimize prompts to achieve more relevant output and reduce energy use.
3. Keep prompts concise, experiment gradually with different prompts, and use reproducible prompts.
When developing a system, consider the following:
Process improvements: Begin with framing problems and scopes with sustainability in mind. Concurrently, monitor utilization metrics and refine configurations for performance with a low carbon footprint.
Model choices: When selecting a model, aim for lightweight base models – also referred to as task-specific models in this article. These models have fewer layers and parameters, which reduces computational overhead and allows for easy deployment across various hardware platforms. They can be adjusted in scale according to the task requirements and available resources. Additionally, employ prompt engineering and parameter-efficient fine-tuning during customization.
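As a minimal sketch of parameter-efficient fine-tuning, the snippet below uses Hugging Face's peft library and LoRA adapters, one common implementation of the technique (the base model here is an arbitrary small example, not a recommendation):

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

# Load a small base model, then attach low-rank adapter matrices. Only
# the adapters are trained; the base weights stay frozen, which cuts
# the compute (and energy) of customization versus full fine-tuning.
base_model = AutoModelForCausalLM.from_pretrained("gpt2")
lora_config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05)
model = get_peft_model(base_model, lora_config)

# Reports trainable vs. total parameters -- typically well under 1%.
model.print_trainable_parameters()
```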
Consulting your organization’s technology team can help to evaluate how the computational load from model training will be allocated across specialized hardware such as GPUs and TPUs. As mentioned above in the section on hardware selection, GPUs excel at matrix operations, while TPUs are specifically designed for machine learning tasks. Essentially, inquire about what hardware is being utilized and whether it is optimized based on its capabilities. Addressing and acting on these questions can directly reduce the computational burden in a system, thereby lowering overall energy consumption.
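In PyTorch terms, the "what hardware is being utilized" question can start with a check as simple as the sketch below (real capacity planning goes much further):

```python
import torch

# Confirm the job will run on the intended accelerator instead of
# silently falling back to the CPU, which is slower and less efficient
# for the matrix-heavy workloads described above.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
if device.type == "cuda":
    print(f"Accelerator: {torch.cuda.get_device_name(device)}")
```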
Training and tools: Finally, using tools such as CodeCarbon to obtain real-time metrics on the model’s carbon footprint can help in reducing overall carbon emissions. CodeCarbon, a Python package, helps developers reduce emissions by optimizing their code and utilizing cloud infrastructure in regions that rely on renewable energy. These tools assist in evaluating algorithms from an environmental perspective, allowing developers to actively analyze and validate their code.
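Here is a minimal sketch of CodeCarbon's tracker wrapping a workload; `train_model` is a hypothetical placeholder for the real training routine:

```python
from codecarbon import EmissionsTracker

def train_model() -> None:
    """Hypothetical placeholder for the real training workload."""
    sum(i * i for i in range(10_000_000))

tracker = EmissionsTracker(project_name="green-ai-sketch")
tracker.start()
try:
    train_model()
finally:
    # stop() returns the estimated emissions (kg CO2eq) for the code
    # executed between start() and stop().
    emissions_kg = tracker.stop()
print(f"Estimated emissions: {emissions_kg:.6f} kg CO2eq")
```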
Think carefully about the business need and the task at hand, and choose an algorithm that meets just that task. Not every business need requires a generative AI solution.
The increase in carbon emissions by the world’s leading technology companies over the past few years has spurred the search for new sources of clean energy to power the AI revolution. The internet’s insatiable appetite for data is prompting companies such as Microsoft, and tech leaders such as Jeff Bezos and Bill Gates, to fund more sustainable and circular energy sources – most notably nuclear energy in the form of small modular reactors – to power the world’s 7,000 data centers.
Electricity demand is no longer easily predictable: national electric loads are growing significantly faster than grid planners forecasted, with the load growth curve soaring due to industrial demand and the construction of new data centers to handle AI’s explosive electricity usage. Scaling up investments in clean energy has become not only a sustainability necessity but an economic one, as companies compete in the green tech space to fuel their technological advancements. While innovation in AI and LLMs was until recently given precedence over sustainability imperatives, companies are now racing to unlock efficient green tech that can match their hyperscale ambitions, hoping to achieve limitless zero-carbon energy.
Green AI is both a technological challenge and an environmental necessity. With AI becoming increasingly embedded in our daily lives, its considerable energy use and carbon emissions must be addressed.
The strategies discussed in this report – ranging from algorithm and hardware optimization to embedding sustainability in development practices – offer a guide for mitigating AI's environmental effects. As societal and regulatory pressure builds, consumers will reward companies not just for their environmental and sustainability pledges but for actionable results in “greenifying” their data architecture systems.