Carbon Footprint of Industrial AI Inference: 2026 Living Benchmark

Living benchmark, last updated 2026-06-03. The honest answer to “what is the industrial AI inference carbon footprint in 2026?” is: it ranges from roughly 0.02 mg CO2e per inference on a renewables-fed edge gateway in Norway to well over 3 mg CO2e per inference on a coal-leaning grid running large transformer models, a spread of more than two orders of magnitude before you even count embodied carbon. That headline is illustrative; this page is the working ledger we keep updating as MLPerf Power rounds, IEA grid data, and community submissions land. If you came here for one single number to drop into a sustainability slide, that number does not exist, and any vendor offering it is selling you a story rather than a measurement.

Architecture at a glance

[Figure: architecture diagram of the 2026 living benchmark]

This post sets out a transparent methodology for measuring the industrial AI inference carbon footprint in 2026, lists the workloads and hardware tiers we benchmark, maps the regional grids that dominate the result, and gives you illustrative ranges per business outcome — grams of CO2e per detected defect, per anomaly, per drafted work order. Every number is labelled as either a cited primary source or an illustrative community estimate. Treat the framework as the durable bit and the numbers as a snapshot that will move.

If you are a sustainability lead trying to answer CSRD double-materiality questions on AI, a plant engineer being asked to justify a vision-QC rollout on carbon grounds, or an ML platform owner pricing each inference in gCO2e alongside dollars, this benchmark is built for you.

Why this matters in 2026 (CSRD, SEC, customer mandates)

Three currents, two regulatory and one commercial, converged over the past 18 months and pushed AI inference carbon from a nice-to-have ESG narrative into a balance-sheet line item.

First, the EU Corporate Sustainability Reporting Directive (CSRD) phase-in is now biting mid-cap industrials in Europe in their FY2025 reports filed during 2026. CSRD demands Scope 1, 2, and 3 disclosure under European Sustainability Reporting Standards (ESRS E1), and the European Financial Reporting Advisory Group’s (EFRAG) guidance has made it clear that material upstream and downstream emissions from purchased cloud and AI services fall into Scope 3 category 1 (“purchased goods and services”) and category 11 (“use of sold products”) for vendors. If you ship a connected product that runs ML inference at the edge or in the cloud, the operational energy of that inference is now disclosable.

Second, the US Securities and Exchange Commission’s climate disclosure rule, despite its bumpy legal ride, has driven Fortune 500 issuers to formalise GHG accounting that includes IT load. Many large industrial buyers are now flowing requirements down: if you want to remain on the approved supplier list for a major automaker or aerospace OEM, you have to report attributable kgCO2e per million inferences served.

Third, customer mandates from buyers themselves — Unilever, Maersk, Ørsted, Tata Steel — increasingly include “AI carbon transparency” clauses in master service agreements. A 2025 BCG survey of 200 industrial procurement leads found that 41% had added an emissions-per-inference clause to at least one ML services contract in the prior 12 months.

The implication is concrete: industrial ML teams need a defensible, auditable per-inference carbon number, broken down by hardware tier, region, and workload — not a vendor-supplied headline figure. That is what this benchmark exists to support.

Methodology and assumptions

Carbon accounting for AI inference is a swamp of overlapping methodologies, and the honest move is to surface our assumptions before any numbers.

Emission factors. For grid intensity we use a blended approach: International Energy Agency (IEA) 2025 country and regional averages as the long-run anchor, Ember’s 2025 European Electricity Review and global Yearly Electricity Data for higher-frequency country updates, and Electricity Maps’ hourly data for region-of-interest spot checks. Where a hyperscaler publishes its own contracted PPA mix (Google Cloud regional carbon-free energy percentage, AWS regional carbon intensity, Azure Emissions Impact Dashboard), we record both the average grid figure and the contracted figure separately. We never silently substitute one for the other.

Hardware power. For accelerators we anchor on MLPerf Inference Power v4.x results where they exist, NVIDIA’s published TDP for unmeasured configurations, and Cloud Carbon Footprint’s open coefficients for general server load. For microcontrollers and edge SoCs we use vendor datasheets corroborated by CodeCarbon and Green Algorithms benchmark methodology. We measure under three load states: idle, typical (steady-state inference), and peak.

Idle vs active attribution. A GPU may draw 130 W at idle and 650 W under load, and not all of that draw can be attributed to active inference. We split idle power across the average time-shared tenancy of the device when known, and otherwise apply a 30/70 rule: 30% of idle power follows the workload, 70% is “facility overhead” attributable to provisioning, not requests. This is a defensible heuristic, not gospel: Hugging Face’s research methodology and the Green Software Foundation’s Software Carbon Intensity (SCI) specification debate this split, and we publish results both ways.
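A minimal sketch of the 30/70 fallback, under one possible reading of the rule: the dynamic draw (load minus idle) is charged fully to the workload, and only a share of the idle floor follows it. The function names and that reading are assumptions for illustration, not the benchmark's published code.

```python
def attributed_power_w(load_w: float, idle_w: float, idle_share: float = 0.30) -> float:
    """Power charged to the workload while the device serves requests.

    Assumption: dynamic power (load - idle) is fully attributed, and
    idle_share of the idle floor follows the workload.
    """
    if not 0.0 <= idle_share <= 1.0:
        raise ValueError("idle_share must be between 0 and 1")
    if load_w < idle_w:
        raise ValueError("load power cannot be below idle power")
    dynamic_w = load_w - idle_w
    return dynamic_w + idle_share * idle_w


def energy_per_inference_j(load_w: float, idle_w: float,
                           seconds_per_inference: float,
                           idle_share: float = 0.30) -> float:
    """Attributed joules for one inference of the given duration."""
    return attributed_power_w(load_w, idle_w, idle_share) * seconds_per_inference


# The GPU from the text, reported both ways.
with_rule = attributed_power_w(650, 130)            # 30% of idle follows the workload
full_draw = attributed_power_w(650, 130, 1.0)       # all idle power charged to the workload
```

Setting `idle_share` to 1.0 reproduces the "charge everything" view, so both variants can be published side by side.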

PUE. Power Usage Effectiveness multiplies IT load to give total facility load. We use 1.15 for best-in-class hyperscaler regions (conservative next to the fleet average of about 1.1 that Google reports), 1.25 for typical Tier III commercial colocation, 1.4 for legacy enterprise data centres, and 1.05 for on-prem edge gateways that ride existing factory HVAC.
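As a small illustration, the PUE values above can be applied as a lookup. The dictionary keys and function name are hypothetical labels chosen for this sketch.

```python
# PUE values quoted in the methodology; the keys are illustrative labels.
PUE_BY_FACILITY = {
    "hyperscaler": 1.15,
    "tier3_colo": 1.25,
    "legacy_enterprise": 1.40,
    "onprem_edge": 1.05,
}


def facility_energy_j(it_energy_j: float, facility: str) -> float:
    """Scale measured IT energy up to total facility energy via PUE."""
    return it_energy_j * PUE_BY_FACILITY[facility]


# 3.5 J of IT energy in a typical Tier III colocation facility.
colo_energy_j = facility_energy_j(3.5, "tier3_colo")
```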

Embodied carbon. Manufacturing, transport, and disposal of silicon and chassis. We amortise an estimated 1,500 kg CO2e per high-end GPU (consistent with publicly available Dell, Lenovo, and HPE lifecycle assessments) and 60 kg CO2e per Jetson-class edge device over a five-year useful life, then divide by lifetime inferences served. This is the most uncertain bucket; we report it as a separate line so readers can recompute.
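The amortisation described above can be sketched as follows. The throughput and utilisation inputs are assumptions a reader would replace with measured values; only the 60 kg and five-year figures come from the text.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def embodied_g_per_inference(embodied_kg: float,
                             inferences_per_second: float,
                             utilisation: float = 1.0,
                             lifetime_years: float = 5.0) -> float:
    """Spread embodied carbon evenly over every inference served in the device's life."""
    lifetime_inferences = (inferences_per_second * utilisation
                           * lifetime_years * SECONDS_PER_YEAR)
    if lifetime_inferences <= 0:
        raise ValueError("device must serve at least one inference")
    return embodied_kg * 1000.0 / lifetime_inferences


# Jetson-class device (60 kg CO2e) at an assumed 5 inferences/sec, always on.
jetson_g = embodied_g_per_inference(60, 5)
```

Lowering `utilisation` shows how quickly the per-inference share grows on a lightly used device.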

Training carbon amortisation. A trained model has carbon debt. We amortise reported training-run emissions (where vendors publish them — OpenAI, Anthropic, Meta, Mistral have for select models) across the model’s served lifetime inferences. For models with no published training carbon, we exclude rather than guess.

Network energy. Bits over fibre and through switches consume energy. For end-to-end accounting we add a network energy coefficient from the IEA’s 2024 “Networks” tracker (roughly 0.06 kWh per GB transferred for fixed networks, 0.1 for mobile), applied to the payload size of each inference.
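A hedged sketch of the network adder, using the two coefficients quoted above. The payload size is whatever you measure on the wire; the function and key names are illustrative.

```python
# kWh per GB transferred, as quoted in the methodology.
KWH_PER_GB = {"fixed": 0.06, "mobile": 0.10}


def network_g_per_inference(payload_bytes: float,
                            grid_g_per_kwh: float,
                            network: str = "fixed") -> float:
    """Grams of CO2e attributable to moving one inference payload."""
    gigabytes = payload_bytes / 1e9
    return gigabytes * KWH_PER_GB[network] * grid_g_per_kwh


# Hypothetical 200 kB image payload sent over a fixed network on a 370 g/kWh grid.
payload_g = network_g_per_inference(200_000, 370.0)
```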

Attribution model. We use the GHG Protocol Scope 2 dual reporting approach (location-based + market-based) and tag every result accordingly. SCI v1.2 from the Green Software Foundation provides the functional unit framework: gCO2e per inference, gCO2e per detected outcome.
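The SCI specification defines the score as (E × I + M) per R: energy times grid intensity, plus embodied emissions, per functional unit. A minimal sketch that evaluates it twice, once with a location-based and once with a market-based intensity, might look like this. The class and parameter names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SciResult:
    location_based_g: float  # grid-average intensity
    market_based_g: float    # contracted (PPA) intensity


def sci_per_unit(energy_kwh: float,
                 location_g_per_kwh: float,
                 market_g_per_kwh: float,
                 embodied_g: float,
                 functional_units: float) -> SciResult:
    """SCI = (E * I + M) / R, reported under both Scope 2 views."""
    if functional_units <= 0:
        raise ValueError("functional_units must be positive")

    def score(intensity: float) -> float:
        return (energy_kwh * intensity + embodied_g) / functional_units

    return SciResult(score(location_g_per_kwh), score(market_g_per_kwh))


# Hypothetical window: 2 kWh, 10 g embodied share, 1,000 inferences served.
result = sci_per_unit(2.0, 370.0, 50.0, 10.0, 1000)
```

Swapping the functional unit from inferences to detected outcomes only changes the value passed as `functional_units`.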

What we do not claim. This is not a Life Cycle Assessment per ISO 14040/44. It is a transparent operational carbon accounting suitable for engineering trade-off conversations and as an input to formal LCAs.

Benchmark cohort: workloads we measure

The benchmark covers six representative industrial inference workloads. Each is sized to a realistic plant duty cycle, not synthetic peak.

Workload	Domain	Model class	Typical input	Typical latency budget	Duty cycle
WL-01 Vibration anomaly	Predictive maintenance	1D CNN (~300 k params)	1 s accelerometer window	100 ms	1 inference/sec per asset
WL-02 Surface defect QC	Quality inspection	YOLO-N / YOLOv11n (~3 M params)	1280×720 image	80 ms	5 inferences/sec per line
WL-03 OCR + lot tracing	Traceability	ViT-small + CTC head	640×480 crop	200 ms	2 inferences/sec per station
WL-04 Energy forecast	Demand response	Temporal fusion transformer (~5 M params)	168-step series	30 s	1 inference/15 min per meter
WL-05 Twin sim assist	Digital twin	Surrogate FNO (~20 M params)	64×64×64 mesh	2 s	1 inference/min per twin
WL-06 LLM tool-calling	Agentic ops	7B-class instruct model	4 k token prompt	3 s	1 inference/min per technician

Workloads WL-01 to WL-03 dominate inference volume at industrial sites — they fire continuously, per machine. WL-04 to WL-06 are lower-volume but higher per-call energy. For agentic LLM patterns and how determinism interacts with energy budgets, see our note on LLM tool-calling determinism patterns for 2026.

Benchmark cohort: hardware tiers

We deliberately span seven hardware tiers, from sub-watt microcontroller to frontier 1 kW accelerator, because the right answer to “what does an inference cost in carbon?” depends mostly on which tier you actually need.

Tier	Device	Active power (W)	Idle (W)	Where it lives	Representative cost basis
T0	ESP32-S3 + TinyML	0.4	0.05	On the machine	$5 module
T1	Jetson Orin Nano 8 GB	7-15	5	Edge gateway, cell	$500 unit
T2	Jetson Orin NX 16 GB	10-25	7	Edge gateway, line	$900 unit
T3	Jetson AGX Orin 64 GB	15-60	12	On-prem edge server	$2,000 unit
T4	NVIDIA L4	40-72	25	Regional cloud, colo	~$2.20/hr cloud
T5	NVIDIA H100 SXM5	350-700	130	Hyperscaler	~$3.50-$7/hr cloud
T6	NVIDIA B200	700-1000	180	Frontier hyperscaler	not yet generally available pricing

T0-T3 are the working zone for most plant-floor inference; T4-T6 dominate batched fleet aggregation and large LLM workloads. For pipeline architecture and how MLOps choices map onto this ladder, see our edge MLOps pipelines for industrial IoT 2026 reference.

Benchmark cohort: regional grids

Grid intensity is the single largest lever on inference carbon, often dwarfing model and hardware choices. The following regional snapshot blends IEA 2025 country data, Ember 2025 yearly electricity data, and Electricity Maps annual averages. All figures are illustrative annual averages in gCO2e per kWh, location-based.

Region	Grid intensity (gCO2e/kWh, illustrative)	Notes
Norway	~25	Hydro-dominant
Sweden	~40	Hydro + nuclear
France	~55	Nuclear backbone
Brazil (south)	~110	Hydro
UK average	~170	Wind expansion, residual gas
Spain	~180	Solar + wind
Germany	~340	Gas + residual lignite
US national average	~370	Wide regional spread
Japan	~430	LNG + restarting nuclear
China average	~530	Coal-leaning, accelerating renewables build
India average	~640	Coal-dominant, fast solar growth
Poland	~660	Coal
South Africa	~870	Coal-heavy Eskom mix

A workload that emits 0.4 g CO2e per inference in France emits roughly 6.3 g in South Africa (0.4 × 870/55) for the same compute, on the same hardware. Region selection is an engineering decision.
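Because operational carbon scales linearly with grid intensity, a figure measured in one region can be rescaled to another by the ratio of the two intensities. The sketch below uses a subset of the illustrative table values; the function name is hypothetical.

```python
# Illustrative annual averages from the regional table, gCO2e per kWh.
GRID_G_PER_KWH = {
    "Norway": 25,
    "France": 55,
    "UK": 170,
    "US": 370,
    "India": 640,
    "South Africa": 870,
}


def rescale_to_region(g_per_inference: float, from_region: str, to_region: str) -> float:
    """Same compute, same hardware, different grid: scale by the intensity ratio."""
    ratio = GRID_G_PER_KWH[to_region] / GRID_G_PER_KWH[from_region]
    return g_per_inference * ratio


south_africa_g = rescale_to_region(0.4, "France", "South Africa")
```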

Results: edge vs regional cloud vs hyperscaler

The following are illustrative community benchmark estimates combining MLPerf Power-style energy-per-query measurements where available and analytical models elsewhere. They are intended for relative comparison; absolute values should be re-measured on your own kit and grid before being quoted in audited reports.

Workload	Tier	Energy per inference (illustrative)	mg CO2e per inference, France (~55 g/kWh)	mg CO2e per inference, US avg (~370 g/kWh)	mg CO2e per inference, India avg (~640 g/kWh)
WL-01 vibration	T0	0.06 J	0.0009	0.006	0.011
WL-01 vibration	T1	0.5 J	0.008	0.05	0.09
WL-02 vision QC	T1	1.8 J	0.028	0.19	0.32
WL-02 vision QC	T2	1.4 J	0.021	0.14	0.25
WL-02 vision QC	T4 cloud	3.5 J	0.054	0.36	0.62
WL-03 OCR	T2	4 J	0.061	0.41	0.71
WL-04 forecast	T4 cloud	12 J	0.18	1.23	2.13
WL-05 twin surrogate	T5 cloud	90 J	1.37	9.25	16.0
WL-06 LLM tool call	T5 cloud	320 J	4.9	32.9	56.9
WL-06 LLM tool call	T6 cloud	280 J	4.3	28.8	49.8

The pattern is consistent with published research (Patterson et al. 2021/2022, Luccioni et al. 2023/2024, MLPerf Inference Power v4.x): tiny edge inferences are essentially carbon-free at the per-call level; large transformer inferences on hyperscaler hardware can be three to four orders of magnitude more carbon-intensive, and grid choice multiplies that by another order of magnitude.

PUE adders, embodied carbon amortisation, and network energy each typically add 10-25% to the operational figure for cloud tiers and 2-8% for edge tiers; we publish the breakdowns in the open data CSV linked in the contribution section.
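To recompute any row for your own grid, the conversion from joules to carbon is a one-liner: divide by 3,600,000 J per kWh, then multiply by grid intensity. The sketch below returns milligrams of CO2e; the function name is illustrative.

```python
J_PER_KWH = 3_600_000


def mg_co2e_per_inference(energy_j: float, grid_g_per_kwh: float) -> float:
    """Operational carbon of one inference, in milligrams of CO2e."""
    grams = energy_j / J_PER_KWH * grid_g_per_kwh
    return grams * 1000.0


# Recompute two illustrative rows on a US-average grid (~370 g/kWh).
vision_t1_us = mg_co2e_per_inference(1.8, 370)
llm_t5_us = mg_co2e_per_inference(320, 370)
```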

Per business outcome: gCO2e per detected anomaly, per inference, per shift

Per-inference carbon is a useful technical KPI, but operations leaders want carbon per outcome — per defect detected, per anomaly flagged, per work order generated. Here is how to compose it, with illustrative numbers.

Take WL-02 (surface defect QC) on a T1 Jetson Orin Nano, running at 5 inferences/sec on one inspection line in a UK plant (~170 gCO2e/kWh grid).

Energy per inference: ~1.8 J → 0.0005 Wh (5 × 10⁻⁷ kWh)
Carbon per inference: 1.8 J × (170/3,600,000) ≈ 0.085 mg CO2e per inference
Inferences per 8-hour shift: 5 × 3,600 × 8 = 144,000
Inference carbon per shift: ~12.2 g CO2e
Defect detection rate: assume 1 defect per 1,500 inferences → 96 detections per shift
Carbon per detection: ~0.13 g CO2e, roughly the carbon of a 10 W LED bulb running for four to five minutes on the same grid
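The composition steps above can be sketched as a single function so the same chain can be rerun with your own duty cycle and detection rate. The names and the dictionary layout are assumptions for illustration.

```python
def carbon_per_outcome(energy_j: float,
                       grid_g_per_kwh: float,
                       inferences_per_second: float,
                       shift_hours: float,
                       inferences_per_outcome: float) -> dict:
    """Chain per-inference carbon into per-shift and per-outcome figures (grams)."""
    g_per_inference = energy_j * grid_g_per_kwh / 3_600_000
    inferences_per_shift = inferences_per_second * 3600 * shift_hours
    g_per_shift = g_per_inference * inferences_per_shift
    outcomes_per_shift = inferences_per_shift / inferences_per_outcome
    return {
        "g_per_inference": g_per_inference,
        "inferences_per_shift": inferences_per_shift,
        "g_per_shift": g_per_shift,
        "outcomes_per_shift": outcomes_per_shift,
        "g_per_outcome": g_per_shift / outcomes_per_shift,
    }


# WL-02 on a T1 device, UK grid, one defect per 1,500 inferences.
vision_qc = carbon_per_outcome(1.8, 170, 5, 8, 1500)
```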

For WL-06 (LLM tool-calling) on T5 in a US-east region (~370 g/kWh), drafting maintenance work orders:
– Carbon per inference: 320 J × (370/3,600,000) ≈ 33 mg CO2e per call
– Work orders per technician shift: assume 12 calls → ~0.4 g CO2e per shift per technician
– Carbon per drafted work order: ~33 mg, roughly a 10 W LED bulb running for 30 seconds on the same grid

These numbers reframe the conversation. Per call, the LLM tool call uses roughly 180x the energy of a vision QC inference (320 J vs 1.8 J), which is why agentic workloads warrant engineering attention as call volumes grow. Per shift, the always-on vision line still emits more (~12 g vs ~0.4 g) because it fires 144,000 times against 12 LLM calls. The per-call ratio combined with the duty cycle is the actionable insight, not the absolute values.

Trade-offs: accuracy vs power, latency vs renewables

Four structural trade-offs dominate decisions on this benchmark.

Accuracy vs power. A larger vision model — YOLOv11m vs YOLOv11n — typically buys 1-3 mAP points at 4-6x the energy per inference. In safety-critical inspection that may be worth it; in cosmetic QC it rarely is. The right question is “what does each additional mAP point cost in gCO2e per detected defect?” rather than “is the bigger model better?”
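One way to put a number on that question is a marginal cost: extra carbon per detected defect divided by extra mAP points. The sketch below is illustrative; the inputs in the example are hypothetical values chosen inside the ranges quoted above, not measurements.

```python
def g_per_extra_map_point(small_model_g: float, large_model_g: float,
                          small_model_map: float, large_model_map: float) -> float:
    """Marginal gCO2e per detected defect paid for each additional mAP point."""
    map_gain = large_model_map - small_model_map
    if map_gain <= 0:
        raise ValueError("the larger model must improve mAP for the ratio to be meaningful")
    return (large_model_g - small_model_g) / map_gain


# Hypothetical: the larger model costs 5x the carbon per detection for +2 mAP.
marginal_g = g_per_extra_map_point(0.13, 0.65, 50.0, 52.0)
```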

Latency vs renewables. Carbon-aware scheduling — shifting batch workloads to lower-carbon hours via tools like Google’s Carbon-Intelligent Computing or open-source schedulers built on Electricity Maps signals — reduces emissions for non-urgent jobs by 20-40% in mixed-grid regions. But it requires latency tolerance. Twin retraining, fleet-aggregated analytics, and offline shadow evals are good candidates; real-time control loops are not.
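At its simplest, carbon-aware scheduling means picking the contiguous window with the lowest total grid intensity from an hourly forecast. The sketch below assumes you already hold such a forecast as a list (for example from an Electricity Maps signal); it is not the scheduler any named tool actually ships.

```python
def best_start_hour(forecast_g_per_kwh: list[float], job_hours: int) -> int:
    """Index of the start hour whose job_hours-long window has the lowest total intensity."""
    n = len(forecast_g_per_kwh)
    if not 0 < job_hours <= n:
        raise ValueError("job_hours must fit inside the forecast")
    window = sum(forecast_g_per_kwh[:job_hours])
    best_sum, best_start = window, 0
    # Slide the window one hour at a time, updating the running sum.
    for start in range(1, n - job_hours + 1):
        window += forecast_g_per_kwh[start + job_hours - 1] - forecast_g_per_kwh[start - 1]
        if window < best_sum:
            best_sum, best_start = window, start
    return best_start


# Hypothetical six-hour forecast; a two-hour retraining job.
forecast = [300, 280, 150, 120, 130, 310]
start = best_start_hour(forecast, 2)
```

Ties resolve to the earliest window, which keeps latency as low as the carbon target allows.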

Edge vs cloud, attribution honesty. Edge inference looks dramatically lower carbon per call, but if you naively ignore embodied carbon and the upstream training run for the model running on the edge device, you flatter the edge case. Conversely, if you ignore that cloud providers buy real PPAs and run low-PUE facilities, you penalise the cloud case unfairly. The honest move is to publish both location-based and market-based numbers for cloud, and explicit embodied amortisation for edge.

Embodied vs operational tipping point. For very low-utilisation edge devices (e.g., a Jetson sitting at 5% duty cycle on a single inspection station), embodied carbon can exceed operational carbon within the device lifetime. This is an under-appreciated reason to consolidate workloads onto fewer, better-utilised edge nodes.
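A rough way to test for this tipping point is to compare embodied carbon with lifetime operational carbon at a given duty cycle. The sketch assumes the device stays powered at idle between inferences and that average power interpolates linearly between idle and active; both are simplifications, and the function names are illustrative.

```python
def lifetime_operational_kg(active_w: float, idle_w: float, duty_cycle: float,
                            grid_g_per_kwh: float, lifetime_years: float = 5.0) -> float:
    """Operational kg CO2e over the device lifetime at a fixed duty cycle."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    hours = lifetime_years * 365 * 24
    average_w = idle_w + duty_cycle * (active_w - idle_w)
    kwh = average_w * hours / 1000.0
    return kwh * grid_g_per_kwh / 1000.0


def embodied_dominates(embodied_kg: float, active_w: float, idle_w: float,
                       duty_cycle: float, grid_g_per_kwh: float) -> bool:
    """True when embodied carbon exceeds five-year operational carbon."""
    return embodied_kg > lifetime_operational_kg(active_w, idle_w, duty_cycle, grid_g_per_kwh)


# Jetson-class device (60 kg embodied, 15 W active, 5 W idle) at 5% duty on a UK grid.
low_duty = embodied_dominates(60, 15, 5, 0.05, 170)
```

The answer moves with grid intensity as well as duty cycle, so it is worth rerunning for each site.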

Practical recommendations

If you are building or buying industrial ML in 2026, the following moves consistently reduce industrial AI inference carbon footprint without sacrificing operational outcomes.

Right-size the tier. Do not run vision QC on a hyperscaler GPU if a Jetson Orin NX serves the latency budget. Tier mismatch is the single largest source of avoidable inference carbon we see in audits.
Distil and quantise aggressively. INT8 quantisation of vision models typically cuts energy per inference 2-3x with under 1 mAP loss; structured pruning adds another 20-40%. Apply the same discipline to LLMs: a distilled 7B model often suffices for tool-calling tasks where teams default to a 70B model.
Batch where latency allows. Batched cloud inference is 3-6x more energy-efficient per call than streaming single-tenant calls.
Region-shop your cloud workloads. Choose hyperscaler regions on grid intensity, not just price. The 7-10x gCO2e/kWh spread across regions of the same hyperscaler is often the biggest single decision.
Publish the assumptions, not just the number. Any per-inference carbon figure with no methodology beside it is unauditable. CSRD assurers will reject it.
Account for embodied carbon at low duty cycles. Below roughly 20% utilisation, edge embodied carbon can dominate, with the exact threshold depending on grid intensity; consolidate.
Carbon-aware schedule the carbon-tolerant workloads. Retraining, shadow evals, fleet-aggregated reports — shift these to low-carbon hours.
Measure, do not estimate, for production workloads. Use MLPerf Power methodology, CodeCarbon, Cloud Carbon Footprint, and Electricity Maps in your CI/CD. Numbers in this benchmark are starting points; your number is what counts.

How to contribute to this living benchmark

This page is a community ledger and we welcome submissions. To contribute a measured workload:

Use MLPerf Inference Power methodology or CodeCarbon ≥3.0 for energy measurement.
Record grid intensity using Electricity Maps for the measurement window (hourly average, not annual).
Disclose PUE (measured or vendor-published), embodied amortisation method, and whether training carbon is included.
Submit via pull request to the public benchmark CSV linked on our contributions page, or email a CSV to the benchmark steward listed there.
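Before submitting, a quick self-check against the list above can catch missing disclosures. The field names below are hypothetical; the actual CSV schema lives on the contributions page and may differ.

```python
# Hypothetical column names; check the real CSV header before relying on these.
REQUIRED_FIELDS = (
    "energy_method",
    "grid_g_per_kwh_hourly",
    "pue",
    "embodied_method",
    "training_included",
)
ACCEPTED_METHODS = {"mlperf_power", "codecarbon"}


def submission_problems(row: dict) -> list[str]:
    """Return human-readable problems with one submission row (empty list if none)."""
    problems = [f"missing {name}" for name in REQUIRED_FIELDS
                if row.get(name) in (None, "")]
    method = row.get("energy_method")
    if method and method not in ACCEPTED_METHODS:
        problems.append(f"unsupported energy_method: {method}")
    pue = row.get("pue")
    if pue not in (None, "") and float(pue) < 1.0:
        problems.append("pue below 1.0 is not physically meaningful")
    return problems


example_row = {
    "energy_method": "codecarbon",
    "grid_g_per_kwh_hourly": 182.0,
    "pue": 1.25,
    "embodied_method": "5-year linear",
    "training_included": False,
}
issues = submission_problems(example_row)
```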

We update this page monthly with the latest cohort of submissions. The Markdown source, the open CSV of results, and the Mermaid diagram sources are versioned so historical snapshots remain citable.

FAQ

Q1. What is a realistic gCO2e per inference for industrial vision QC in 2026?
For a YOLOv11n-class model on a Jetson Orin Nano running on a UK grid (~170 gCO2e/kWh), the illustrative figure is roughly 0.08-0.1 mg CO2e per inference. On a US-average grid the same workload is roughly 0.15-0.2 mg. On a coal-heavy Indian regional grid it can exceed 0.3 mg. These are operational only; add 5-15% for embodied and network attribution.

Q2. Does training carbon dominate inference carbon?
For widely-deployed models, no — inference dominates over the lifetime. Patterson et al. estimate Google’s ML carbon is roughly 60% inference, 40% training at fleet level. For narrowly-deployed bespoke industrial models the ratio can flip. Always amortise training across realistic served lifetime inferences before comparing.

Q3. Are hyperscaler-published numbers trustworthy?
They are useful but partial. Market-based numbers (using contracted PPAs) typically show carbon close to zero in renewables-heavy regions; location-based numbers using grid averages tell a different story. CSRD and SBTi guidance increasingly demand both. Use the Emissions Impact Dashboard (Azure), Carbon Footprint tool (GCP), or Customer Carbon Footprint Tool (AWS) as starting points, then reconcile against Electricity Maps and IEA for assurance.

Q4. How do I include embodied carbon without an LCA budget?
Use Dell, HPE, and Lenovo public Product Carbon Footprint sheets for the closest analogue server, amortise the figure over five years, and divide by the inferences you measure over that period. For Jetson and similar, use the Green Algorithms or Boavizta open data.

Q5. Is edge always lower-carbon than cloud?
Per inference, usually yes for small models. Per outcome, not always — a poorly-utilised edge device can have higher amortised embodied carbon per inference than a well-utilised cloud GPU. The crossover depends on duty cycle and replacement cadence.

Q6. How does this benchmark handle the rebound effect?
We do not yet. If cheaper, lower-carbon inference encourages teams to run 10x more inferences, total emissions can rise. We flag the issue and recommend tracking total inference volume alongside per-inference intensity as a leading indicator. Future versions of this benchmark will incorporate Jevons-style scenario modelling.
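Until scenario modelling lands, the tracking recommended above can be as simple as flagging periods where per-inference intensity fell but total emissions still rose. The record layout and function name are assumptions for this sketch.

```python
def rebound_periods(records: list[tuple[str, int, float]]) -> list[str]:
    """Flag periods where intensity dropped yet total emissions increased.

    Each record is (label, inference_count, mg_co2e_per_inference),
    in chronological order.
    """
    flagged = []
    for (_, prev_n, prev_mg), (label, n, mg) in zip(records, records[1:]):
        previous_total = prev_n * prev_mg
        current_total = n * mg
        if mg < prev_mg and current_total > previous_total:
            flagged.append(label)
    return flagged


# Hypothetical monthly figures: intensity halves in Feb while volume grows 5x.
history = [
    ("Jan", 1_000_000, 0.20),
    ("Feb", 5_000_000, 0.10),
    ("Mar", 5_000_000, 0.08),
]
flags = rebound_periods(history)
```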
