Artificial intelligence consumes energy, materials, and capital at a scale comparable to manufacturing. The token, the smallest unit of machine reasoning, defines how much power and infrastructure intelligence requires, and its rapid inflation is driving one of the fastest increases in digital energy demand ever recorded¹.
A typical enterprise query in 2020 used fewer than 200 tokens. By 2025, models such as GPT-4 Pro and ChatGPT-5 process around 22,000 tokens in a single exchange that includes planning, tool use, code execution, and self-correction². At the current rate of expansion, tokens per query could rise to between 150,000 and 1,500,000 by 2030, depending on task complexity². The lower bound represents hundreds of pages of text and the upper thousands, transforming language models from software applications into energy-intensive industries.
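A minimal sketch makes those magnitudes concrete. The per-query figures come from the article itself; the conversion of roughly 1,000 tokens per printed page is an illustrative assumption, not a measured constant.

```python
# Token-inflation arithmetic from the figures above. The per-query counts
# (200 in 2020, ~22,000 in 2025, 150,000-1,500,000 by 2030) come from the
# article; ~1,000 tokens per page is an illustrative assumption.

def annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth factor between two token counts."""
    return (end / start) ** (1 / years)

print(f"2020-2025: ~{annual_growth(200, 22_000, 5):.1f}x per year")  # ~2.6x

TOKENS_PER_PAGE = 1_000  # illustrative: ~750 words/page at ~1.3 tokens/word
for tokens in (150_000, 1_500_000):  # the article's 2030 lower and upper bounds
    print(f"{tokens:>9,} tokens ≈ {tokens // TOKENS_PER_PAGE:,} pages")
```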
In the transformer architecture that governs modern models, every token must be compared with every other, so computation scales with roughly N² complexity in context length³. Doubling context length multiplies computation and memory by a factor of four, pushing energy use up faster than analytical benefit. Hardware efficiency improves roughly fourfold and software optimisation threefold with each generation, but token volume expands more than tenfold over the same period. The combined twelvefold efficiency gain is overwhelmed by a fiftyfold rise in tokens, producing more than a fourfold increase in net energy use per query, as shown in Figure 1. Efficiency reduces cost but expands use, reproducing the rebound pattern familiar from earlier industrial energy systems.
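In equation form, the rebound arithmetic above reduces to the article's own per-generation figures:

$$\text{compute} \propto N^{2} \;\Rightarrow\; \text{doubling context } N \text{ quadruples work}$$

$$\text{net energy factor} = \frac{\text{token growth}}{\text{hardware gain} \times \text{software gain}} = \frac{50}{4 \times 3} \approx 4.2$$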

Competition among AI firms to demonstrate agentic behaviour amplifies the problem. Each advance in reasoning depth depends on generating far more tokens to sustain planning chains and memory states. The pursuit of longer context windows has replaced Moore’s Law as the benchmark of progress, accelerating GPU and memory lifecycles and shortening upgrade intervals beyond sustainable limits.
A rack of high-density AI servers draws between 30 and 100 kilowatts, and fully integrated racks exceed 120 kilowatts. A campus that required 100 megawatts in 2022 will need between 700 and 1,000 megawatts by 2030. Global data-centre electricity use is expected to rise from about 400 terawatt-hours in 2024 to nearly 1,000 terawatt-hours by the end of the decade, with AI workloads responsible for roughly one-third of that total.
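A back-of-envelope sketch ties those figures together; the rack count is derived from the article's numbers rather than reported, and is illustrative only.

```python
# Back-of-envelope link between rack density and campus load, using the
# article's figures; the rack count is derived, not reported.
RACK_KW = 100          # high-density AI rack, top of the 30-100 kW range
CAMPUS_MW = 700        # lower bound of the projected 2030 campus load

racks = CAMPUS_MW * 1_000 / RACK_KW
print(f"~{racks:,.0f} racks at {RACK_KW} kW fill a {CAMPUS_MW} MW campus")

GLOBAL_TWH_2030 = 1_000  # projected global data-centre electricity use
AI_SHARE = 1 / 3         # article's estimate of the AI fraction
print(f"AI workloads: ~{GLOBAL_TWH_2030 * AI_SHARE:.0f} TWh by 2030")
```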
Data-centre geography follows power supply rather than user location. The United States recorded about 28 gigawatts of data-centre load in 2025 and is expected to exceed 60 gigawatts by 2030. In Memphis, Tennessee, the Colossus I and II complexes alone secured over one gigawatt of dedicated gas generation to offset grid limits⁴. Gas turbines provide immediate capacity where renewable build-out cannot keep pace, revealing how fossil generation still underwrites digital growth.
Token inflation has become the hidden metric of industrial acceleration. The unit that once measured text now measures energy, and the race to display more complex reasoning risks outpacing both hardware and grids. Sustainability depends on redefining efficiency through tokens per watt and tokens per query rather than the obsolete cost per GPU.
Artificial intelligence expands through tokens rather than processors, and the volume of those tokens determines how infrastructure evolves. A deployment of 100,000 GPUs using NVIDIA’s next-generation Rubin architecture, expected to follow the NVL72 rack-scale design introduced with Blackwell, will draw roughly 500 megawatts, up from about 270 megawatts on previous systems⁴. Efficiency gains at the chip level are outweighed by the acceleration of token generation, which forces total power consumption higher.
Next-generation processors may deliver four times the throughput per watt, and software optimisation may double that again, but the number of tokens rises about fifty-fold. Doubling context length multiplies computational work fourfold, pushing energy and capital expenditure ahead of performance. Metrics based on cost per GPU or training runs no longer describe the economics of intelligence. Cost per token and cost per query define productivity and reveal how much power and capital each act of reasoning consumes.
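A sketch of what those per-token metrics look like in practice follows. Only the 500-megawatt deployment and the 22,000-token query come from the article; the fleet throughput and electricity price are hypothetical placeholders, since the article supplies neither.

```python
# Per-token economics under stated assumptions. Throughput and price are
# hypothetical; the power draw and query size come from the article.
POWER_W = 500e6            # Rubin-class 100,000-GPU deployment (article)
TOKENS_PER_SEC = 2e9       # hypothetical fleet-wide generation rate

tokens_per_watt = TOKENS_PER_SEC / POWER_W   # the proposed efficiency metric
joules_per_token = POWER_W / TOKENS_PER_SEC

QUERY_TOKENS = 22_000      # 2025 per-query figure from the article
PRICE_PER_KWH = 0.08       # hypothetical wholesale electricity price, USD

energy_kwh = QUERY_TOKENS * joules_per_token / 3.6e6  # joules -> kWh
print(f"{tokens_per_watt:.1f} tokens/W; "
      f"energy cost per query ≈ ${energy_kwh * PRICE_PER_KWH:.5f}")
```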
Capability is constrained by electricity rather than silicon. The largest clusters are built where supply is guaranteed, and investment follows generation capacity in the same way heavy industry once followed coal seams. Compute has become a function of grid geography, and proximity to generation now determines competitiveness.
Industrial policy mirrors that geography. The United States links semiconductor incentives with clean-energy programmes through the CHIPS and Science Act and the Inflation Reduction Act. China couples GPU manufacturing with hydro, solar, and nuclear expansion to stabilise supply, and the European Union aligns its Chips Act with the Net-Zero Industry Act to balance competitiveness and emissions⁵. Computation and energy strategy have merged into a single discipline where control of electrons defines control of intelligence.
Hyperscale operators co-locate with hydroelectric and nuclear plants to secure stable load profiles, diverting recovered heat to district networks or hydrogen production. Cooling has shifted from air to liquid, tripling rack density while cutting facility overhead by up to 30 per cent, but these advances cannot offset the growth in token throughput driving total demand.
Rising AI demand competes directly with the electrification of transport and heavy industry. Long-term renewable contracts signed by data-centre operators limit availability for other sectors, forcing governments to balance digital expansion against climate commitments⁶. Hardware and memory vendors remain rewarded for expanding token capacity: larger context windows require faster processors, denser memory, and more cooling, ensuring continual upgrade cycles. Moore’s Law no longer sets the pace of progress; token growth does.
Nations compete for compute as once they did for oil or steel, and the ability to host and train frontier models defines autonomy. Rising token counts shift independence from algorithms to energy, turning the infrastructure of intelligence into a measure of national strength. Concentrating compute within a few hyperscalers grants them influence over industrial strategy and power allocation, while open-weight models remain bound by the availability of chips, water, and electricity.
Inference distributed across vehicles, factories, and domestic systems reduces latency and protects privacy but decentralises energy demand. Millions of processors performing real-time reasoning convert artificial intelligence from a centralised industry into a continuous global load. Without coordination, distributed computing will outpace renewable growth, shifting strain from national campuses to regional substations and turning decentralisation into a new form of dependence.
Semiconductor fabrication and cooling consume more than 50 billion litres of ultrapure water each year, with planned expansions expected to raise that figure by a third⁷. Competition for these resources links AI infrastructure directly with food, water, and environmental security. Energy and industrial policy have converged as advanced economies expand nuclear and renewable generation to sustain growth within climate limits, while emerging economies weigh participation in the AI economy against grid stability.
Research seeks to decouple intelligence from energy. Sparse attention, linear-scaling transformers, and mixture-of-experts architectures activate only the parameters or context fragments needed for each query, cutting computation dramatically⁸. Quantisation, retrieval-augmented inference, and optical or neuromorphic computing promise further gains. These approaches could reduce compute energy per token by more than 80 per cent and shift the industry from an arms race of scale to one of efficiency.
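A minimal sketch shows why sparse activation cuts computation so sharply; the window size and expert counts are illustrative assumptions, not drawn from any cited system.

```python
# Why sparse activation saves compute, with illustrative sizes.
CONTEXT = 100_000                    # tokens in the window

dense_pairs = CONTEXT ** 2           # full attention compares every pair
WINDOW = 512                         # hypothetical sparse/local window
sparse_pairs = CONTEXT * WINDOW      # linear-scaling variants touch a slice
print(f"attention work: ~{dense_pairs / sparse_pairs:.0f}x reduction")

EXPERTS, ACTIVE = 64, 2              # hypothetical mixture-of-experts split
print(f"MoE: {ACTIVE / EXPERTS:.1%} of parameters active per token")
```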
Consequences of such advances would be profound. Hyperscale campuses designed for gigawatt loads could face overcapacity, and investment would pivot toward algorithmic optimisation rather than brute-force expansion. Hardware lifecycles might lengthen, margins compress, and strategic advantage move from the biggest clusters to the most efficient architectures. Grid forecasts premised on continuous doubling of demand could flatten, redirecting capital toward storage and distribution rather than raw generation.
Models already balance grids, optimise renewables, and manage demand with precision that reduces waste. When directed toward efficiency rather than expansion, intelligence can stabilise the systems it strains. Yet projected AI electricity use of 300 to 500 terawatt-hours by 2030 still exceeds most national forecasts⁸. If token growth continues unchecked, total demand could approach one petawatt-hour (1,000 terawatt-hours) by 2035, a level comparable to the combined electricity use of Japan and the United Kingdom.
The rapid expansion of AI-driven electricity demand risks undermining 2050 net-zero commitments by forcing governments to rely on transitional fossil capacity and slowing renewable integration. Without efficiency breakthroughs and coordinated regulation, AI could become the single largest source of new energy-related emissions growth this decade.
Policymakers must treat AI as a strategic energy challenge requiring coordinated investment, regulation, and grid modernisation. Measurement must shift from GPUs and FLOPs to tokens per watt and tokens per query⁹. The growth of tokens measures how intelligence consumes the material world, and without alignment between energy planning, technology design, and capital allocation, expansion will exceed both grids and governance. The limits of computation are no longer mathematical; they are material.
References
¹ International Energy Agency (IEA). Electricity 2024: Analysis and Forecast to 2026. January 2024.
² Anthropic, OpenAI, xAI, and Google DeepMind Technical Disclosures (2024–2025).
³ University of Cambridge Energy Policy Research Group (EPRG). Computational Efficiency Metrics for Artificial Intelligence Systems. 2025.
⁴ BloombergNEF. AI Power Demand Outlook 2025–2030. May 2024.
⁵ European Commission. Chips Act and Net-Zero Industry Act Alignment Brief. February 2025.
⁶ Financial Times. “AI’s Power Problem: Why Data Centres Are Turning to Gas.” FT Technology and Energy, August 2025.
⁷ Semiconductor Industry Association (SIA). 2024 Semiconductor Manufacturing and Environmental Impact Report. April 2024.
⁸ Goldman Sachs Global Research. AI Compute, Energy and Infrastructure: 2030 Scenarios. April 2025.
⁹ University of Cambridge EPRG. Computational Efficiency Metrics for Artificial Intelligence Systems. 2025.