
How to reduce the resource consumption of artificial intelligence


By Yury Erofeev



How many resources does AI consume

Modern artificial intelligence systems require colossal computing power. Training and operation of neural networks are accompanied by significant power consumption, the use of large amounts of data, and the use of powerful hardware. The more complex the model, the higher its costs, which is becoming a key challenge for the development of AI.

Electricity and CO₂

One of the most striking examples of the resource intensity of AI is OpenAI's GPT family of models. According to various estimates, training GPT-3 required about 1,287 MWh of electricity, equivalent to the annual consumption of roughly 120 American homes. It is also estimated that ChatGPT now consumes about 39.98 million kWh per day, with the service used by some 400 million active users every week.

A study by the University of Massachusetts showed that creating a single neural network model can lead to the emission of more than 280 thousand kg of CO₂, the equivalent of five years of operating a car with an internal combustion engine. Such figures worry environmentalists and are pushing businesses to look for energy-efficient solutions.

Equipment

AI also requires expensive equipment: graphics processing units (GPUs), tensor processing units (TPUs), and server farms. According to Schneider Electric estimates, in 2023 about 80% of the AI workload in data centres came from inference (generating results) and 20% from training.

Modern AI servers generate a lot of heat, and cooling them becomes another expense item. The largest data centres owned by Google or Microsoft use liquid cooling systems to lower processor temperatures and increase performance. However, such measures do not eliminate the problem of rapid equipment wear: modern GPUs running at full capacity can lose performance after 3–5 years of operation.

Water

A less obvious but significant factor is water consumption. In 2023, it was revealed that OpenAI's data centres consume millions of litres of water annually to cool their servers. Research shows that one long ChatGPT conversation can consume up to 500 ml of water, depending on the climate and the type of cooling system.

In the context of global warming and water shortages, such figures raise questions about the wise use of water in the tech industry.

As AI services such as generative models, autonomous systems, and advanced analytics become more popular, resource consumption will only increase. Companies are already developing energy-efficient chips, using renewable energy, and implementing quantum computing, but these measures do not yet solve the problem.

In the near future, the AI industry will face a key challenge: how to balance the development of technology with the preservation of the environment. Without a comprehensive approach, including model optimisation, the development of “green” data centres, and the implementation of more economical algorithms, the problem of excessive AI resource consumption will remain.

How to reduce resource usage

One of the most effective ways to reduce AI resource consumption is to optimise algorithms and models. Modern neural networks are often overly complex, which leads to unnecessary computational costs. Companies and researchers are working on methods to reduce the size of models without losing quality.

One of the most famous research centres working on this problem is the MIT Lincoln Laboratory Supercomputing Center (LLSC). There, scientists are developing methods to help data centres reduce energy consumption. Most importantly, they have found that these methods have a minimal impact on model performance.

Optimisation of algorithms and models

When developing neural networks, it is important not only to achieve high accuracy but also to optimise the computation, especially if the model must run on limited resources. Several methods are used for this.

Quantisation

Quantisation reduces energy consumption by lowering the precision of the numbers used in computation. For example, instead of 32-bit values (in practice, usually 32-bit floating-point weights), a model can use 8-bit integers with a range from 0 to 255. This reduces energy consumption and speeds up processing, since fewer resources are needed to handle smaller amounts of data. In tasks such as image or text recognition, 32-bit precision is often redundant, and switching to 8 bits significantly speeds up the model.
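As an illustration, here is a minimal sketch of post-training dynamic quantisation using PyTorch; the toy model and layer sizes are assumptions chosen for demonstration, not taken from any specific production system.

```python
# Minimal sketch: post-training dynamic quantisation in PyTorch.
# The toy model below is an illustrative stand-in for a real network.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Replace 32-bit float weights in Linear layers with 8-bit integers;
# activations are quantised dynamically at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized_model(x).shape)  # same interface, lower-precision arithmetic
```

Frameworks such as PyTorch and TensorFlow Lite provide this kind of post-training quantisation out of the box, so the accuracy trade-off can be measured before deployment.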

Binarisation of weights

Binarisation of weights (Binary Neural Networks) restricts model parameters to just two possible values: instead of a continuous range (for example, from -1 to 1), each weight takes only one of two values, typically -1 or +1 (0 and 1 in some formulations). This reduces resource consumption, speeds up model training, and reduces the amount of memory needed to store the model.
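A minimal sketch of the idea, assuming the common ±1 formulation with a per-tensor scaling factor (details vary between published Binary Neural Network schemes):

```python
# Minimal sketch of weight binarisation: each 32-bit weight is replaced
# by its sign, with one scaling factor per tensor to preserve magnitude.
import torch

def binarize(weights: torch.Tensor) -> torch.Tensor:
    scale = weights.abs().mean()          # single full-precision scale
    return torch.sign(weights) * scale    # weights become +scale or -scale

w = torch.randn(256, 512)
w_bin = binarize(w)
print(torch.unique(w_bin))  # only two distinct values (plus 0 for exact zeros)
```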

Pruning

Pruning eliminates unnecessary connections in the neural network, preserving its accuracy. During training, models may contain connections that do not affect the result, and they can be removed without losing quality. For example, in an image recognition model, you can remove connections responsible for unimportant details, such as the background, and thereby save computing resources. Dynamic pruning allows the model to optimise itself during operation, eliminating unimportant connections depending on the complexity of the current task, which reduces energy consumption.
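A minimal sketch using PyTorch's built-in pruning utilities; the single linear layer and the 30% pruning ratio are illustrative assumptions.

```python
# Minimal sketch: magnitude-based (L1) unstructured pruning in PyTorch.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 256)

# Zero out the 30% of weights with the smallest absolute values
prune.l1_unstructured(layer, name="weight", amount=0.3)

sparsity = (layer.weight == 0).float().mean().item()
print(f"Share of pruned weights: {sparsity:.0%}")  # roughly 30%
```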

Efficient architectures and adaptive algorithms

Mixture of Experts (MoE) is an architecture in which only the necessary parts of the model are activated. For example, when translating a text from English, only the part of the neural network that handles that language is used, while the rest remains inactive, which saves resources. Related approaches, such as Sparse Transformers and Switch Transformers, likewise activate only the components needed for the current task, which increases the efficiency of the model.

Adaptive algorithms can dynamically change their complexity depending on the task, for example through early-exit models. These stop computing as soon as the confidence of the prediction is high enough; the network estimates this confidence from internal parameters learned on training examples.
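The early-exit idea can be sketched in a few lines; the two-block network, the intermediate classification head, and the 0.9 confidence threshold below are illustrative assumptions rather than a specific published architecture.

```python
# Minimal sketch of an early-exit network: computation stops as soon as
# an intermediate head is confident enough about its prediction.
import torch
import torch.nn as nn

class EarlyExitNet(nn.Module):
    def __init__(self, threshold: float = 0.9):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(128, 128), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(128, 128), nn.ReLU())
        self.exit1 = nn.Linear(128, 10)   # cheap intermediate head
        self.exit2 = nn.Linear(128, 10)   # full-depth head
        self.threshold = threshold

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.block1(x)
        logits = self.exit1(h)
        confidence = logits.softmax(dim=-1).max()
        if confidence >= self.threshold:
            return logits                  # easy input: skip the second block
        return self.exit2(self.block2(h))  # hard input: run the full network

model = EarlyExitNet()
print(model(torch.randn(1, 128)).shape)    # torch.Size([1, 10])
```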

Using more efficient hardware and alternative energy sources

The second important way is to use specialised hardware. Traditional server processors (CPUs) are inefficient for processing AI tasks, so graphics processing units (GPUs) and tensor processing units (TPUs) are used.

CPUs, GPUs, and TPUs have different architectures: a CPU, as a general-purpose processor, has several cores for performing various tasks; a GPU has a large number of cores optimised for parallel computing, making it ideal for graphics and AI processing; a TPU is specialised for performing matrix and vector calculations, making it significantly more efficient for AI work compared to CPUs and GPUs, especially when processing large data sets and training neural networks.

Google created TPUs to meet the growing demand for efficient computing power in AI services such as search, YouTube, and DeepMind's large language models. Trillium, the latest TPU, delivers 4.7x higher peak compute performance per chip than its predecessor and is 67% more energy efficient, making it better suited to AI workloads.

Schematic of the different types of processors, from left to right: CPU, GPU, and TPU (Photo: Official Google blog)

In addition to hardware solutions, the use of alternative energy sources plays an important role. Data centres that process and store the huge amounts of information behind AI models consume colossal amounts of electricity. Google, Microsoft, and IBM are increasingly turning to renewable energy sources such as solar and wind power plants to power their servers. Big tech companies are even creating strategies for building data centres that run on alternative energy.

Of particular interest is the use of nuclear energy to support AI infrastructure. Microsoft, Google, and Amazon are already considering the integration of small modular reactors (SMRs) to provide a stable power supply for data centres. SMRs are capable of producing up to 300 MW(e), which makes them a convenient solution for such facilities. For comparison, traditional large nuclear reactors exceed 700 MW(e), while microreactors provide up to 10 MW(e), which allows them to be used in remote areas and compact infrastructure projects.

Small modular reactors (SMRs) have a capacity of up to 300 MWe per unit (Photo: International Atomic Energy Agency)

Nuclear energy provides high output with a minimal carbon footprint: according to a study published by the UN in 2021, nuclear power plants emit even less CO₂ over their life cycle than wind farms.

The future of artificial intelligence is closely linked to reducing its resource consumption. In the coming years, we can expect the emergence of new technologies that will make AI more energy efficient, environmentally friendly, and accessible.

Quantum computing for AI

Quantum computers can significantly speed up data processing and complex calculations, potentially reducing the energy costs of training and running models. Quantum machine learning technologies are in their early stages, but today, companies like IBM, Google, and D-Wave are actively developing this sector.

In 2019, Google announced that it had achieved quantum supremacy, performing calculations that a traditional supercomputer could not complete in a reasonable time. And IBM is working on quantum optimisation algorithms that can be applied to AI tasks.

In addition to the computational advantages, quantum technologies can significantly improve energy efficiency. Some studies suggest that in certain cases, quantum computers can be 100 times more energy efficient than classical supercomputers.

An analysis of the energy consumption of the world's 500 largest supercomputers over the past 20 years also supports this hypothesis: a quantum computer with comparable computing power would consume less than 0.05% of the energy used by the most energy-intensive supercomputer on the list.

Self-learning and adaptive models

Future AI models will be able to adapt automatically to the tasks at hand, minimising the need for full-scale training and thereby reducing the consumption of computing resources. Examples of such technologies include sparsity-aware algorithms, which activate only the necessary parts of a model, and Zero-Shot Learning, which allows models to handle tasks they were not explicitly trained for, without additional training on large task-specific data sets.

For example, OpenAI is developing models such as CLIP and DALL-E that can perform complex tasks without traditional training on specialised data. Researchers from the Massachusetts Institute of Technology (MIT) are also working on technologies that allow models to dynamically adapt to the load, reducing energy consumption.
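As a minimal illustration of the zero-shot idea, a pre-trained model can classify text against labels it has never been fine-tuned on. This sketch uses the Hugging Face transformers library; the model choice, example sentence, and candidate labels are illustrative assumptions.

```python
# Minimal sketch: zero-shot text classification with a pre-trained model,
# no task-specific fine-tuning or labelled training set required.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",  # illustrative pre-trained NLI model
)

result = classifier(
    "Data centres are switching to liquid cooling to cut energy use.",
    candidate_labels=["energy efficiency", "sports", "finance"],
)
print(result["labels"][0])  # expected top label: "energy efficiency"
```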

Decentralised computing and edge AI

Edge computing is an approach in which data is processed not in data centres but directly on local devices. This reduces both energy consumption and latency in AI operations. Combined with blockchain technology, decentralised computing could create networks in which devices exchange data directly, minimising the load on global server capacity. Nvidia, in particular, is actively developing Edge AI devices such as the Jetson Nano, which allow complex calculations to be performed on IoT devices with minimal energy consumption.

Despite widespread interest in this technology, precise data on the real reduction in energy consumption in mass use remains limited. Most claims about the benefits of edge computing are based on theoretical models or individual tests, the results of which are rarely published in the public domain.



About the author

Yury Erofeev is a Research and Development Sustainability Manager of SQUAKE, specialising in market analysis, carbon calculation methodologies, and product development within the transport and travel sectors. With a solid foundation in physics, mathematics, and sustainable development, he is passionate about driving impactful change through data-driven insights and strategic innovation.

 
