Carbon Footprint of Industrial AI Inference: 2026 Living Benchmark
Living benchmark, last updated 2026-06-03. The honest answer to “what is the industrial AI inference carbon footprint in 2026?” is: it ranges from roughly 0.02 mg CO2e per inference on a renewables-fed edge gateway in Norway to well over 3 mg CO2e per inference on a coal-leaning grid running large transformer models, a spread of more than two orders of magnitude before you even count embodied carbon. That headline is illustrative; this page is the working ledger we keep updating as MLPerf Power rounds, IEA grid data, and community submissions land. If you came here for one single number to drop into a sustainability slide, that number does not exist, and any vendor offering it is selling you a story rather than a measurement.
This post sets out a transparent methodology for measuring the industrial AI inference carbon footprint in 2026, lists the workloads and hardware tiers we benchmark, maps the regional grids that dominate the result, and gives you illustrative ranges per business outcome — grams of CO2e per detected defect, per anomaly, per drafted work order. Every number is labelled as either a cited primary source or an illustrative community estimate. Treat the framework as the durable bit and the numbers as a snapshot that will move.
If you are a sustainability lead trying to answer CSRD double-materiality questions on AI, a plant engineer being asked to justify a vision-QC rollout on carbon grounds, or an ML platform owner pricing inference in gCO2e per inference alongside dollars, this benchmark is built for you.
Why this matters in 2026 (CSRD, SEC, customer mandates)
Three currents, two regulatory and one commercial, converged over the past 18 months and pushed AI inference carbon from a nice-to-have ESG narrative into a balance-sheet line item.
First, the EU Corporate Sustainability Reporting Directive (CSRD) phase-in is now biting mid-cap industrials in Europe in their FY2025 reports filed during 2026. CSRD demands Scope 1, 2, and 3 disclosure under European Sustainability Reporting Standards (ESRS E1), and the European Financial Reporting Advisory Group’s (EFRAG) guidance has made it clear that material upstream and downstream emissions from purchased cloud and AI services fall into Scope 3 category 1 (“purchased goods and services”) and category 11 (“use of sold products”) for vendors. If you ship a connected product that runs ML inference at the edge or in the cloud, the operational energy of that inference is now disclosable.
Second, the US Securities and Exchange Commission’s climate disclosure rule, despite its bumpy legal ride, has driven Fortune 500 issuers to formalise GHG accounting that includes IT load. Many large industrial buyers are now flowing requirements down: if you want to remain on the approved supplier list for a major automaker or aerospace OEM, you have to report attributable kgCO2e per million inferences served.
Third, customer mandates from buyers themselves — Unilever, Maersk, Ørsted, Tata Steel — increasingly include “AI carbon transparency” clauses in master service agreements. A 2025 BCG survey of 200 industrial procurement leads found that 41% had added an emissions-per-inference clause to at least one ML services contract in the prior 12 months.
The implication is concrete: industrial ML teams need a defensible, auditable per-inference carbon number, broken down by hardware tier, region, and workload — not a vendor-supplied headline figure. That is what this benchmark exists to support.
Methodology and assumptions
Carbon accounting for AI inference is a swamp of overlapping methodologies, and the honest move is to surface our assumptions before any numbers.
Emission factors. For grid intensity we use a blended approach: International Energy Agency (IEA) 2025 country and regional averages as the long-run anchor, Ember’s 2025 European Electricity Review and global Yearly Electricity Data for higher-frequency country updates, and Electricity Maps’ hourly data for region-of-interest spot checks. Where a hyperscaler publishes its own contracted PPA mix (Google Cloud regional carbon-free energy percentage, AWS regional carbon intensity, Azure Emissions Impact Dashboard), we record both the average grid figure and the contracted figure separately. We never silently substitute one for the other.
Hardware power. For accelerators we anchor on MLPerf Inference Power v4.x results where they exist, NVIDIA’s published TDP for unmeasured configurations, and Cloud Carbon Footprint’s open coefficients for general server load. For microcontrollers and edge SoCs we use vendor datasheets, corroborated by measurements that follow the CodeCarbon and Green Algorithms methodologies. We measure under three load states: idle, typical (steady-state inference), and peak.
Idle vs active attribution. A GPU draws around 130 W at idle and 650 W under load, so its full power draw cannot be attributed entirely to active inference. We split idle power across the average time-shared tenancy of the device when known, and otherwise apply a 30/70 rule: 30% of idle power follows the workload, 70% is “facility overhead” attributable to provisioning, not requests. This is a defensible heuristic, not gospel: Hugging Face’s research methodology and the Green Software Foundation’s Software Carbon Intensity (SCI) specification treat this split differently, and we publish results both ways.
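One reading of the 30/70 heuristic can be sketched in a few lines of Python. The function name, and the choice to attribute all above-idle draw to the workload, are assumptions made for illustration rather than part of the published methodology.

```python
def attributed_power_w(active_w: float, idle_w: float,
                       idle_share: float = 0.30) -> float:
    """Power (W) attributed to a workload while it is serving inference.

    Assumption: the dynamic draw above idle belongs entirely to the
    workload, and only `idle_share` of the idle floor follows it.
    """
    if not 0.0 <= idle_share <= 1.0:
        raise ValueError("idle_share must be between 0 and 1")
    dynamic_w = active_w - idle_w
    return dynamic_w + idle_share * idle_w


# The GPU from the text: 130 W idle, 650 W under load.
print(attributed_power_w(650, 130))        # 30/70 rule, about 559 W
print(attributed_power_w(650, 130, 1.0))   # all idle power attributed, 650 W
```

Passing `idle_share=1.0` gives the "attribute everything" variant, which is one way to report the result both ways. For a device with known tenancy, `idle_share` could instead be set to one over the average number of tenants.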
PUE. Power Usage Effectiveness multiplies IT load to give total facility load. We use 1.15 for best-in-class hyperscaler regions (deliberately conservative against Google’s reported fleet-wide average of roughly 1.1), 1.25 for typical Tier III commercial colocation, 1.4 for legacy enterprise data centres, and 1.05 for on-prem edge gateways that ride existing factory HVAC.
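As a minimal sketch, the PUE adjustment is a single multiplication. The dictionary keys below are illustrative labels invented for this example; the values are the ones quoted above.

```python
# PUE values from the methodology; the key names are hypothetical labels.
PUE_BY_FACILITY = {
    "hyperscaler": 1.15,
    "tier3_colo": 1.25,
    "legacy_enterprise": 1.40,
    "onprem_edge": 1.05,
}


def facility_energy_j(it_energy_j: float, facility: str) -> float:
    """Scale IT-level energy (J) up to total facility energy using PUE."""
    return it_energy_j * PUE_BY_FACILITY[facility]


# A 320 J inference measured at the accelerator, served from a hyperscaler region.
print(facility_energy_j(320.0, "hyperscaler"))  # about 368 J at the facility meter
```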
Embodied carbon. Manufacturing, transport, and disposal of silicon and chassis. We amortise an estimated 1,500 kg CO2e per high-end GPU (consistent with publicly available Dell, Lenovo, and HPE lifecycle assessments) and 60 kg CO2e per Jetson-class edge device over a five-year useful life, then divide by lifetime inferences served. This is the most uncertain bucket; we report it as a separate line so readers can recompute.
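The amortisation described above can be sketched as follows. The function name and the `utilisation` parameter are assumptions added for illustration; the 60 kg and five-year figures come from the text.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def embodied_mg_per_inference(embodied_kg: float,
                              inferences_per_sec: float,
                              utilisation: float = 1.0,
                              life_years: float = 5.0) -> float:
    """Embodied carbon (mg CO2e) amortised over lifetime inferences served.

    `utilisation` is the fraction of the device lifetime spent serving
    at `inferences_per_sec` (1.0 means around the clock).
    """
    lifetime_inferences = (inferences_per_sec * utilisation
                           * life_years * SECONDS_PER_YEAR)
    if lifetime_inferences <= 0:
        raise ValueError("device must serve at least one inference")
    return embodied_kg * 1e6 / lifetime_inferences  # 1 kg = 1e6 mg


# Jetson-class device (60 kg CO2e) at 5 inferences/sec, running continuously.
print(embodied_mg_per_inference(60, 5))
# Same device serving only one 8-hour shift per day.
print(embodied_mg_per_inference(60, 5, utilisation=1 / 3))
```

The result is very sensitive to the throughput and utilisation assumptions, which is why this bucket is reported as a separate line.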
Training carbon amortisation. A trained model has carbon debt. We amortise reported training-run emissions (where vendors publish them — OpenAI, Anthropic, Meta, Mistral have for select models) across the model’s served lifetime inferences. For models with no published training carbon, we exclude rather than guess.
Network energy. Bits over fibre and through switches consume energy. For end-to-end accounting we add a network energy coefficient from the IEA’s 2024 “Networks” tracker (roughly 0.06 kWh per GB transferred for fixed networks, 0.1 for mobile), applied to the payload size of each inference.
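A minimal sketch of the network adder, using the per-GB coefficients quoted above. The function name is illustrative, and the 16 kB payload in the example is an assumed size for a 4 k token prompt, not a measured value.

```python
FIXED_KWH_PER_GB = 0.06   # fixed networks, as quoted in the text
MOBILE_KWH_PER_GB = 0.1   # mobile networks, as quoted in the text


def network_mg_per_inference(payload_bytes: float,
                             grid_g_per_kwh: float,
                             kwh_per_gb: float = FIXED_KWH_PER_GB) -> float:
    """Network carbon (mg CO2e) for moving one inference payload."""
    payload_gb = payload_bytes / 1e9
    grams = payload_gb * kwh_per_gb * grid_g_per_kwh
    return grams * 1000.0


# Assumed 16 kB prompt over a fixed network on a ~370 g/kWh grid.
print(network_mg_per_inference(16_000, 370))
```

Because the coefficient is per gigabyte, image and video payloads scale this term quickly, so payload size is worth recording alongside each measurement.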
Attribution model. We use the GHG Protocol Scope 2 dual reporting approach (location-based + market-based) and tag every result accordingly. SCI v1.2 from the Green Software Foundation provides the functional unit framework: gCO2e per inference, gCO2e per detected outcome.
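For reference, the SCI specification expresses its score as operational plus embodied emissions per functional unit. With E the energy consumed (kWh), I the grid carbon intensity (gCO2e/kWh), M the embodied emissions allocated to the software (gCO2e), and R the functional unit (here, one inference or one detected outcome):

```latex
\mathrm{SCI} = \frac{(E \times I) + M}{R}
```

In this benchmark's terms, E × I is the operational line and M is the embodied amortisation described above.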
What we do not claim. This is not a Life Cycle Assessment per ISO 14040/44. It is a transparent operational carbon accounting suitable for engineering trade-off conversations and as an input to formal LCAs.
Benchmark cohort: workloads we measure
The benchmark covers six representative industrial inference workloads. Each is sized to a realistic plant duty cycle, not synthetic peak.
| Workload | Domain | Model class | Typical input | Typical latency budget | Duty cycle |
|---|---|---|---|---|---|
| WL-01 Vibration anomaly | Predictive maintenance | 1D CNN (~300 k params) | 1 s accelerometer window | 100 ms | 1 inference/sec per asset |
| WL-02 Surface defect QC | Quality inspection | YOLO-N / YOLOv11n (~3 M params) | 1280×720 image | 80 ms | 5 inferences/sec per line |
| WL-03 OCR + lot tracing | Traceability | ViT-small + CTC head | 640×480 crop | 200 ms | 2 inferences/sec per station |
| WL-04 Energy forecast | Demand response | Temporal fusion transformer (~5 M params) | 168-step series | 30 s | 1 inference/15 min per meter |
| WL-05 Twin sim assist | Digital twin | Surrogate FNO (~20 M params) | 64×64×64 mesh | 2 s | 1 inference/min per twin |
| WL-06 LLM tool-calling | Agentic ops | 7B-class instruct model | 4 k token prompt | 3 s | 1 inference/min per technician |
Workloads WL-01 to WL-03 dominate inference volume at industrial sites — they fire continuously, per machine. WL-04 to WL-06 are lower-volume but higher per-call energy. For agentic LLM patterns and how determinism interacts with energy budgets, see our note on LLM tool-calling determinism patterns for 2026.
Benchmark cohort: hardware tiers
We deliberately span seven hardware tiers, from sub-watt microcontroller to frontier 1 kW accelerator, because the right answer to “what does an inference cost in carbon?” depends mostly on which tier you actually need.
| Tier | Device | Active power (W) | Idle (W) | Where it lives | Representative cost basis |
|---|---|---|---|---|---|
| T0 | ESP32-S3 + TinyML | 0.4 | 0.05 | On the machine | $5 module |
| T1 | Jetson Orin Nano 8 GB | 7-15 | 5 | Edge gateway, cell | $500 unit |
| T2 | Jetson Orin NX 16 GB | 10-25 | 7 | Edge gateway, line | $900 unit |
| T3 | Jetson AGX Orin 64 GB | 15-60 | 12 | On-prem edge server | $2,000 unit |
| T4 | NVIDIA L4 | 40-72 | 25 | Regional cloud, colo | ~$2.20/hr cloud |
| T5 | NVIDIA H100 SXM5 | 350-700 | 130 | Hyperscaler | ~$3.50-$7/hr cloud |
| T6 | NVIDIA B200 | 700-1000 | 180 | Frontier hyperscaler | not yet generally available pricing |
T0-T3 are the working zone for most plant-floor inference; T4-T6 dominate batched fleet aggregation and large LLM workloads. For pipeline architecture and how MLOps choices map onto this ladder, our edge MLOps pipelines for industrial IoT 2026 reference applies.
Benchmark cohort: regional grids
Grid intensity is the single largest lever on inference carbon, often dwarfing model and hardware choices. The following regional snapshot blends IEA 2025 country data, Ember 2025 yearly electricity data, and Electricity Maps annual averages. All figures are illustrative annual averages in gCO2e per kWh, location-based.
| Region | Grid intensity (gCO2e/kWh, illustrative) | Notes |
|---|---|---|
| Norway | ~25 | Hydro-dominant |
| Sweden | ~40 | Hydro + nuclear |
| France | ~55 | Nuclear backbone |
| Brazil (south) | ~110 | Hydro |
| UK average | ~170 | Wind expansion, residual gas |
| Spain | ~180 | Solar + wind |
| Germany | ~340 | Gas + residual lignite |
| US national average | ~370 | Wide regional spread |
| Japan | ~430 | LNG + restarting nuclear |
| China average | ~530 | Coal-leaning, accelerating renewables build |
| India average | ~640 | Coal-dominant, fast solar growth |
| Poland | ~660 | Coal |
| South Africa | ~870 | Coal-heavy Eskom mix |
A workload that emits 0.4 g CO2e per inference in France emits roughly 6.3 g in South Africa for the same compute, on the same hardware (870/55 ≈ 16x). Region selection is an engineering decision.
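Because operational carbon is linear in grid intensity, a figure measured in one region can be rescaled to another with a ratio. The sketch below uses the illustrative annual averages from the table; the function name is an assumption for this example.

```python
# Illustrative annual averages from the table above, in gCO2e/kWh.
GRID_G_PER_KWH = {
    "Norway": 25, "Sweden": 40, "France": 55, "Brazil (south)": 110,
    "UK": 170, "Spain": 180, "Germany": 340, "US": 370, "Japan": 430,
    "China": 530, "India": 640, "Poland": 660, "South Africa": 870,
}


def rescale(per_inference: float, from_region: str, to_region: str) -> float:
    """Rescale an operational carbon figure between regions (same units out)."""
    return per_inference * GRID_G_PER_KWH[to_region] / GRID_G_PER_KWH[from_region]


# Full spread of the table: the same compute in Norway versus South Africa.
print(rescale(1.0, "Norway", "South Africa"))  # about 34.8x
```

This only rescales the operational share; embodied and training amortisation do not move with the grid.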
Results: edge vs regional cloud vs hyperscaler
The following are illustrative community benchmark estimates combining MLPerf Power-style energy-per-query measurements where available and analytical models elsewhere. They are intended for relative comparison; absolute values should be re-measured on your own kit and grid before being quoted in audited reports.
| Workload | Tier | Energy per inference (illustrative) | mgCO2e per inference, France (~55 g/kWh) | mgCO2e per inference, US avg (~370 g/kWh) | mgCO2e per inference, India avg (~640 g/kWh) |
|---|---|---|---|---|---|
| WL-01 vibration | T0 | 0.06 J | 0.0009 | 0.006 | 0.011 |
| WL-01 vibration | T1 | 0.5 J | 0.008 | 0.05 | 0.09 |
| WL-02 vision QC | T1 | 1.8 J | 0.028 | 0.19 | 0.32 |
| WL-02 vision QC | T2 | 1.4 J | 0.021 | 0.14 | 0.25 |
| WL-02 vision QC | T4 cloud | 3.5 J | 0.054 | 0.36 | 0.62 |
| WL-03 OCR | T2 | 4 J | 0.061 | 0.41 | 0.71 |
| WL-04 forecast | T4 cloud | 12 J | 0.18 | 1.23 | 2.13 |
| WL-05 twin surrogate | T5 cloud | 90 J | 1.37 | 9.25 | 16.0 |
| WL-06 LLM tool call | T5 cloud | 320 J | 4.9 | 32.9 | 56.9 |
| WL-06 LLM tool call | T6 cloud | 280 J | 4.3 | 28.8 | 49.8 |
The pattern is consistent with published research (Patterson et al. 2021/2022, Luccioni et al. 2023/2024, MLPerf Inference Power v4.x): tiny edge inferences are essentially carbon-free at the per-call level; large transformer inferences on hyperscaler hardware can be three to four orders of magnitude more carbon-intensive, and grid choice multiplies that by another order of magnitude.
PUE adders, embodied carbon amortisation, and network energy each typically add 10-25% to the operational figure for cloud tiers and 2-8% for edge tiers; we publish the breakdowns in the open data CSV linked in the contribution section.
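Each carbon cell in the results table follows from the energy column by a unit conversion: joules divided by 3.6 million joules per kWh, multiplied by grid intensity. A minimal sketch, with an illustrative function name; note that the results come out in milligrams of CO2e, the unit used in the worked examples and the FAQ.

```python
J_PER_KWH = 3_600_000  # 1 kWh = 3.6 MJ


def mg_co2e(energy_j: float, grid_g_per_kwh: float) -> float:
    """Operational carbon per inference in mg CO2e (no PUE, embodied or network)."""
    grams = energy_j / J_PER_KWH * grid_g_per_kwh
    return grams * 1000.0


# Reproduce two table rows across France, US average and India average.
for label, joules in [("WL-02 vision QC, T1", 1.8), ("WL-06 LLM tool call, T5", 320)]:
    row = [round(mg_co2e(joules, grid), 3) for grid in (55, 370, 640)]
    print(label, row)
```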
Per business outcome: gCO2e per detected anomaly, per inference, per shift
Per-inference carbon is a useful technical KPI, but operations leaders want carbon per outcome — per defect detected, per anomaly flagged, per work order generated. Here is how to compose it, with illustrative numbers.
Take WL-02 (surface defect QC) on a T1 Jetson Orin Nano, running at 5 inferences/sec on one inspection line in a UK plant (~170 gCO2e/kWh grid).
- Energy per inference: ~1.8 J → 0.0005 Wh
- Carbon per inference: 1.8 J × (170/3,600,000) ≈ 0.085 mg CO2e per inference
- Inferences per 8-hour shift: 5 × 3,600 × 8 = 144,000
- Inference carbon per shift: ~12.2 g CO2e
- Defect detection rate: assume 1 defect per 1,500 inferences → 96 detections per shift
- Carbon per detection: ~0.13 g CO2e, roughly the carbon of a 9 W LED bulb running for five minutes on the same grid
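The composition in the list above can be sketched as a small function. The names are illustrative, and the inputs are the same WL-02 assumptions (1.8 J per inference, ~170 g/kWh, 5 inferences/sec, 8-hour shift, one defect per 1,500 inferences).

```python
from dataclasses import dataclass

J_PER_KWH = 3_600_000


@dataclass
class OutcomeCarbon:
    mg_per_inference: float
    inferences_per_shift: float
    g_per_shift: float
    outcomes_per_shift: float
    g_per_outcome: float


def carbon_per_outcome(energy_j: float, grid_g_per_kwh: float,
                       inferences_per_sec: float, shift_hours: float,
                       inferences_per_outcome: float) -> OutcomeCarbon:
    """Compose per-inference carbon into per-shift and per-outcome figures."""
    g_per_inference = energy_j / J_PER_KWH * grid_g_per_kwh
    n = inferences_per_sec * 3600 * shift_hours
    g_per_shift = g_per_inference * n
    outcomes = n / inferences_per_outcome
    return OutcomeCarbon(g_per_inference * 1000, n, g_per_shift,
                         outcomes, g_per_shift / outcomes)


wl02 = carbon_per_outcome(1.8, 170, 5, 8, 1500)
print(wl02)  # about 0.085 mg/inference, 12.2 g/shift, 96 detections, 0.13 g/detection
```

The same function applies to any workload once the number of inferences per business outcome is known, which is usually the hardest input to pin down.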
For WL-06 (LLM tool-calling) on T5 in a US-east region (~370 g/kWh), drafting maintenance work orders:
- Carbon per inference: 320 J × (370/3,600,000) ≈ 33 mg CO2e per call
- Work orders per technician shift: assume 12 calls → ~0.4 g CO2e per shift per technician
- Carbon per drafted work order: ~33 mg, comparable to running a 9 W LED bulb for about 35 seconds on the same grid
These numbers reframe the conversation. Per call, the LLM inference uses roughly 180 times the energy of a vision QC inference (320 J against 1.8 J), which is why agentic workloads warrant engineering attention as call volumes grow. At the duty cycles above, though, the always-on vision line still emits more per shift (~12 g against ~0.4 g) because it fires 144,000 times. The ratios, and how duty cycle moves them, are the actionable insight, not the absolute values.
Trade-offs: accuracy vs power, latency vs renewables
Three structural trade-offs dominate decisions on this benchmark.
Accuracy vs power. A larger vision model — YOLOv11m vs YOLOv11n — typically buys 1-3 mAP points at 4-6x the energy per inference. In safety-critical inspection that may be worth it; in cosmetic QC it rarely is. The right question is “what does each additional mAP point cost in gCO2e per detected defect?” rather than “is the bigger model better?”
Latency vs renewables. Carbon-aware scheduling — shifting batch workloads to lower-carbon hours via tools like Google’s Carbon-Intelligent Computing or open-source schedulers built on Electricity Maps signals — reduces emissions for non-urgent jobs by 20-40% in mixed-grid regions. But it requires latency tolerance. Twin retraining, fleet-aggregated analytics, and offline shadow evals are good candidates; real-time control loops are not.
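The core of a carbon-aware scheduler is choosing the lowest-carbon window from an intensity forecast. The sketch below assumes an hourly forecast is already available as a list; the function name and the forecast values are hypothetical, and fetching real data from a provider is left out.

```python
from typing import Sequence


def best_start_hour(forecast_g_per_kwh: Sequence[float], job_hours: int) -> int:
    """Index of the contiguous window with the lowest total grid intensity."""
    n = len(forecast_g_per_kwh)
    if job_hours <= 0 or job_hours > n:
        raise ValueError("job_hours must fit inside the forecast horizon")
    # Sliding-window sum: add the hour entering, drop the hour leaving.
    window = sum(forecast_g_per_kwh[:job_hours])
    best_sum, best_start = window, 0
    for start in range(1, n - job_hours + 1):
        window += forecast_g_per_kwh[start + job_hours - 1]
        window -= forecast_g_per_kwh[start - 1]
        if window < best_sum:
            best_sum, best_start = window, start
    return best_start


# Hypothetical 8-hour forecast in gCO2e/kWh; schedule a 2-hour retraining job.
forecast = [400, 380, 300, 220, 210, 260, 350, 420]
print(best_start_hour(forecast, 2))  # 3, i.e. the 220 + 210 window
```

A production scheduler would also need a deadline and a fallback for when the forecast is missing, which is where the latency tolerance mentioned above comes in.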
Edge vs cloud, attribution honesty. Edge inference looks dramatically lower carbon per call, but if you naively ignore embodied carbon and the upstream training run for the model running on the edge device, you flatter the edge case. Conversely, if you ignore that cloud providers buy real PPAs and run efficient, low-PUE facilities, you penalise the cloud case unfairly. The honest move is to publish both location-based and market-based numbers for cloud, and explicit embodied amortisation for edge.
Embodied vs operational tipping point. For very low-utilisation edge devices (e.g., a Jetson sitting at 5% duty cycle on a single inspection station), embodied carbon can exceed operational carbon within the device lifetime. This is an under-appreciated reason to consolidate workloads onto fewer, better-utilised edge nodes.
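A rough way to check which side of the tipping point a device sits on is to compare its lifetime operational carbon with its embodied figure. This is a sketch under stated assumptions: the device stays powered at idle between inferences, the Jetson Orin Nano figures (15 W active, 5 W idle, 60 kg CO2e embodied) are the illustrative values used elsewhere in this benchmark, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 8760


def lifetime_operational_kg(active_w: float, idle_w: float, duty_cycle: float,
                            grid_g_per_kwh: float, life_years: float = 5.0,
                            pue: float = 1.05) -> float:
    """Lifetime operational carbon (kg CO2e) for an always-powered device."""
    avg_w = duty_cycle * active_w + (1.0 - duty_cycle) * idle_w
    kwh = avg_w / 1000.0 * HOURS_PER_YEAR * life_years * pue
    return kwh * grid_g_per_kwh / 1000.0


EMBODIED_KG = 60.0  # Jetson-class device, from the methodology section

for duty in (0.05, 0.20, 0.80):
    op = lifetime_operational_kg(15, 5, duty, grid_g_per_kwh=170)
    side = "embodied dominates" if EMBODIED_KG > op else "operational dominates"
    print(f"duty {duty:.0%}: operational {op:.1f} kg vs embodied {EMBODIED_KG} kg, {side}")
```

The crossover moves with grid intensity: on a low-carbon grid embodied carbon dominates at almost any duty cycle, while on a coal-heavy grid operational carbon takes over much earlier.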
Practical recommendations
If you are building or buying industrial ML in 2026, the following moves consistently reduce industrial AI inference carbon footprint without sacrificing operational outcomes.
- Right-size the tier. Do not run vision QC on a hyperscaler GPU if a Jetson Orin NX serves the latency budget. Tier mismatch is the single largest source of avoidable inference carbon we see in audits.
- Distil and quantise aggressively. INT8 quantisation of vision models typically cuts energy per inference 2-3x with under 1 mAP loss; structured pruning adds another 20-40%. Apply the same discipline to LLMs: a distilled 7B model often suffices for tool-calling tasks where teams default to a 70B model.
- Batch where latency allows. Batched cloud inference is 3-6x more energy-efficient per call than streaming single-tenant calls.
- Region-shop your cloud workloads. Choose hyperscaler regions on grid intensity, not just price. The 7-10x gCO2e/kWh spread across regions of the same hyperscaler is often the biggest single decision.
- Publish the assumptions, not just the number. Any per-inference carbon figure with no methodology beside it is unauditable. CSRD assurers will reject it.
- Account for embodied carbon at low duty cycles. Below ~20% utilisation, edge embodied carbon dominates; consolidate.
- Carbon-aware schedule the carbon-tolerant workloads. Retraining, shadow evals, fleet-aggregated reports — shift these to low-carbon hours.
- Measure, do not estimate, for production workloads. Use MLPerf Power methodology, CodeCarbon, Cloud Carbon Footprint, and Electricity Maps in your CI/CD. Numbers in this benchmark are starting points; your number is what counts.
How to contribute to this living benchmark
This page is a community ledger and we welcome submissions. To contribute a measured workload:
- Use MLPerf Inference Power methodology or CodeCarbon ≥3.0 for energy measurement.
- Record grid intensity using Electricity Maps for the measurement window (hourly average, not annual).
- Disclose PUE (measured or vendor-published), embodied amortisation method, and whether training carbon is included.
- Submit via pull request to the public benchmark CSV linked on our contributions page, or email a CSV to the benchmark steward listed there.
We update this page monthly with the latest cohort of submissions. The Markdown source, the open CSV of results, and the Mermaid diagram sources are versioned so historical snapshots remain citable.
FAQ
Q1. What is a realistic gCO2e per inference for industrial vision QC in 2026?
For a YOLOv11n-class model on a Jetson Orin Nano running on a UK grid (~170 gCO2e/kWh), the illustrative figure is roughly 0.08-0.1 mg CO2e per inference. On a US-average grid the same workload is roughly 0.15-0.2 mg. On a coal-heavy Indian regional grid it can exceed 0.3 mg. These are operational only; add 5-15% for embodied and network attribution.
Q2. Does training carbon dominate inference carbon?
For widely-deployed models, no: inference dominates over the lifetime. Patterson et al. report that Google’s ML energy use split roughly 60% inference, 40% training at fleet level. For narrowly-deployed bespoke industrial models the ratio can flip. Always amortise training across realistic served lifetime inferences before comparing.
Q3. Are hyperscaler-published numbers trustworthy?
They are useful but partial. Market-based numbers (using contracted PPAs) typically show carbon close to zero in renewables-heavy regions; location-based numbers using grid averages tell a different story. CSRD and SBTi guidance increasingly demand both. Use the Emissions Impact Dashboard (Azure), Carbon Footprint tool (GCP), or Customer Carbon Footprint Tool (AWS) as starting points, then reconcile against Electricity Maps and IEA for assurance.
Q4. How do I include embodied carbon without an LCA budget?
Use Dell, HPE, and Lenovo public Product Carbon Footprint sheets for the closest analogue server, amortise over five years, and divide by your measured lifetime inference count. For Jetson and similar devices, use the Green Algorithms or Boavizta open data.
Q5. Is edge always lower-carbon than cloud?
Per inference, usually yes for small models. Per outcome, not always — a poorly-utilised edge device can have higher amortised embodied carbon per inference than a well-utilised cloud GPU. The crossover depends on duty cycle and replacement cadence.
Q6. How does this benchmark handle the rebound effect?
We do not yet. If cheaper, lower-carbon inference encourages teams to run 10x more inferences, total emissions can rise. We flag the issue and recommend tracking total inference volume alongside per-inference intensity as a leading indicator. Future versions of this benchmark will incorporate Jevons-style scenario modelling.
Further reading
- International Energy Agency, Electricity 2025: Analysis and forecast to 2027, IEA, Paris, 2025.
- Ember, Global Electricity Review 2025 and European Electricity Review 2025, Ember Climate, 2025.
- Electricity Maps, Live and historical grid carbon intensity data, electricitymaps.com (accessed 2026).
- MLCommons, MLPerf Inference: Power benchmark results v4.x, mlcommons.org, 2025-2026.
- Cloud Carbon Footprint project, Methodology documentation v0.x, cloudcarbonfootprint.org, 2025.
- Green Software Foundation, Software Carbon Intensity (SCI) Specification v1.2, sci.greensoftware.foundation.
- CodeCarbon, Open-source emissions tracking for ML workloads, codecarbon.io.
- Patterson, D., Gonzalez, J., Hölzle, U., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., So, D., Texier, M., Dean, J., The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink, IEEE Computer, 2022.
- Luccioni, A.S., Jernite, Y., Strubell, E., Power Hungry Processing: Watts Driving the Cost of AI Deployment?, FAccT 2024.
- EFRAG, ESRS E1 Climate Change — Implementation Guidance, European Financial Reporting Advisory Group, 2025.
- Greenhouse Gas Protocol, Scope 2 Guidance and Corporate Value Chain (Scope 3) Standard, WRI/WBCSD.
- Boavizta, Open Environmental Impact Data for Digital Equipment, boavizta.org.
Diagrams referenced in this post (rendered separately from assets/arch_01.mmd through arch_04.mmd): scope mapping for AI inference activities; hardware tier ladder from ESP32 to B200; regional grid intensity map; Sankey-style kWh-to-outcomes flow. Alt text for arch_03 image: “Regional grid carbon intensity map for industrial AI inference carbon footprint 2026 benchmark”.
This is a living benchmark. Last updated 2026-06-03. Submit corrections, fresh MLPerf Power numbers, or workload measurements via the contributions process above — we re-issue the page monthly.
