Azure IoT Akri: Kubernetes Resource Interface Deep Dive (Updated 2026)
Last Updated: 2026-05-16
Architecture at a glance



Azure IoT Akri Kubernetes integration is the cleanest answer to a question that haunted every edge architect for years: how do you make a USB camera, an OPC UA server, or a Modbus PLC behave like a first-class citizen inside a Kubernetes cluster? Akri is a CNCF Sandbox project, originally created by Microsoft and open-sourced in 2020, that exposes leaf IoT devices as Kubernetes resources. Non-traditional endpoints — USB cameras, OPC UA servers, ONVIF cameras, Modbus PLCs — get the same scheduling, lifecycle, and observability primitives that Kubernetes already gives you for nodes and pods. You write a Configuration CRD describing how to find a device, Akri’s Discovery Handlers go and find them on the network or the local node, an Instance is created per discovered device, and a Broker pod is scheduled to talk to it.
This 2026 update reflects how the project has actually aged. The TL;DR: Akri’s CNCF Sandbox status has stalled, the last upstream release sits at 0.12.x from late 2022, and a lot of the engineering momentum has migrated into Azure IoT Operations (GA November 2024), which embeds Akri-derived discovery internally. The patterns Akri pioneered — device-as-Kubernetes-resource, broker pods, declarative discovery — are still highly relevant, still production-deployed in retail, manufacturing, and energy environments, and still the cleanest mental model we have. This guide walks through the architecture, the CRDs, current deployment targets, a working OPC UA walkthrough, production hardening, trade-offs, and a People-Also-Ask-style FAQ.

What Akri Solves (and What It Doesn’t)
Akri solves device-discovery and broker-scheduling for leaf IoT endpoints that cannot run a kubelet themselves. A surveillance camera, a Modbus-over-TCP PLC, a barcode scanner on USB — none of these are Kubernetes nodes, but they all need workloads that consume their data. The pre-Akri pattern was a hand-rolled gateway pod with a static config map listing every device, no auto-recovery on device change, no scheduling awareness, and a fresh integration headache per protocol. Akri replaces that with a declarative model: tell the cluster “find every OPC UA server on this subnet matching these criteria, then run this broker image against each one,” and the cluster does the rest.
What Akri does well:
- Discovery as a first-class verb. A
ConfigurationCRD describes the protocol, search scope, filter rules, and a broker pod spec. Discovery Handlers (pluggable, gRPC-based) actually scan the network or query udev, then report findings back to the per-node Akri Agent. - Per-device pod scheduling. Each discovered device becomes an
Instanceresource. The Akri Controller schedules a Broker pod per instance, with the device exposed via the Kubernetes device-plugin interface — so kubelet sees it as an allocatable resource, just like an NVIDIA GPU or an SR-IOV NIC. - Lifecycle on real events. Device disappears? Instance gets cleaned up, broker pod evicted. Device reappears? New instance, new broker. No reconciliation cron jobs you wrote yourself.
- Multi-node awareness. If two edge nodes can both see the same ONVIF camera, Akri tracks shared visibility and schedules brokers without duplicating work.
What Akri does not solve:
- Application-layer protocol translation. Akri schedules a pod next to your OPC UA server; it does not turn OPC UA into MQTT for you. That’s the broker’s job, and you write the broker.
- Cloud connectivity. Akri is a Kubernetes-local concern. Pushing data to Azure IoT Hub, Event Grid, or anywhere else is a separate stack — typically MQTT broker + Edge runtime.
- Replacing OPC UA gateway products. Commercial gateways (HMS, Kepware, Matrikon) come with vendor support, certified protocol stacks, and engineering tooling. Akri is the connective tissue, not the application.
- Doing magic with unstable upstream. With releases stalled since late 2022, you either pin to 0.12.x, fork, or move to Azure IoT Operations. We come back to this in the deployment-targets section.
The honest framing: Akri is most useful as a pattern and as the embedded engine inside Azure IoT Operations. Standalone Akri on vanilla Kubernetes still works, but the support burden has shifted to you.
Akri Architecture: Agent, Controller, Discovery Handlers, Brokers
Akri’s runtime is four moving parts plus two CRDs. Understanding which piece does what is the difference between debugging in minutes vs. days.

The Akri Agent (per-node DaemonSet)
The Agent is a Rust binary that runs as a DaemonSet — one Pod per node that you want to participate in device discovery. It does three things:
- Watches
ConfigurationCRDs via the Kubernetes API. When a Configuration appears, it figures out which Discovery Handler to talk to (the protocol name maps to a registered DH endpoint). - Talks gRPC to Discovery Handlers running on the same node. The DH does the actual protocol scan and streams back a list of discovered devices.
- Bridges to kubelet’s device plugin API. Each discovered device becomes a registered resource (e.g.,
akri.sh/onvif-camera-abc), so Pods that request that resource can be scheduled and given access to a device socket or environment variables describing the endpoint.
The Agent is also responsible for creating Instance resources — one per discovered device — and updating them as device visibility changes. Devices have stable unique IDs (MAC, serial number, OPC UA application URI, udev sysfs path), which is how Akri avoids spawning duplicate instances when a device flickers.
The Akri Controller (cluster-singleton Deployment)
The Controller is a separate Deployment, typically a single replica with a leader election lock for HA. It watches Instance resources and reconciles broker workloads against them. Each broker Pod that the Controller schedules is annotated with the instance it serves; if the instance disappears, the Pod is cleaned up.
The Controller also handles shared instances: when two nodes both see the same ONVIF camera, you get one Instance resource (not two), with both node names in its nodes field. The Controller decides which node’s Agent gets to host the broker (deterministic, based on instance state), and the other node stands by.
Discovery Handlers (pluggable, gRPC)
Discovery Handlers are the protocol-specific scanners. Akri ships several built-in:
- OPC UA DH — scans a list of
DiscoveryURLsor uses LDS-ME (Local Discovery Server with Multicast Extension) to find servers, then filters on application URI patterns. - ONVIF DH — uses WS-Discovery multicast to find cameras, filters on IP ranges, MAC ranges, or scope strings.
- udev DH — enumerates local devices via udev rules (USB, video4linux, etc.). Useful for USB cameras, barcode scanners, serial adapters.
- debugEcho DH — synthetic for testing; reports a fixed list of fake devices. Lifesaver in CI pipelines.
You can register custom Discovery Handlers by writing a Rust or Go gRPC service that implements the DH protobuf API and packaging it as a Deployment that the Agent can reach. Modbus, BACnet, MQTT-topic-as-device, REST-endpoint-as-device — all have been done as custom DHs.
Brokers (your workload)
A broker is just a Pod spec you put inside the Configuration — Akri schedules one per instance. The Pod gets environment variables describing the device (URL, port, protocol-specific blob) and, where relevant, a device socket mounted in. From there, your broker code does whatever protocol translation, edge analytics, or buffering you need. Most broker images are 50-200 lines of Python or Go around a protocol SDK.
This four-piece architecture — Agent, Controller, Discovery Handlers, Brokers — is the entire mental model. Once it clicks, the YAML reads naturally.
Configuration CRD: Discovery + Broker + Instance Lifecycle
The Configuration CRD is where you declare what to discover and what to run against it. Here’s a working OPC UA Configuration:
apiVersion: akri.sh/v0
kind: Configuration
metadata:
name: akri-opcua-monitoring
spec:
discoveryHandler:
name: opcua
discoveryDetails: |
opcuaDiscoveryMethod:
standard:
discoveryUrls:
- opc.tcp://plc-cell-01.factory.lan:4840
- opc.tcp://plc-cell-02.factory.lan:4840
applicationNames:
action: Include
items:
- "UA Sample Server"
- "MicroLogix1400"
brokerSpec:
brokerPodSpec:
containers:
- name: opcua-broker
image: ghcr.io/your-org/opcua-broker:1.4.0
env:
- name: OPCUA_NAMESPACE
value: "2"
- name: TARGET_MQTT_BROKER
value: "tcp://mqtt-broker.iot.svc.cluster.local:1883"
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 256Mi
brokerProperties:
OPCUA_PUBLISHING_INTERVAL_MS: "1000"
capacity: 1
instanceServiceSpec:
type: ClusterIP
ports:
- name: metrics
port: 9090
targetPort: 9090
configurationServiceSpec:
type: ClusterIP
ports:
- name: metrics-aggregated
port: 9090
targetPort: 9090
A few things worth unpacking:
discoveryHandler.name: opcua— the Agent looks for a registered DH endpoint namedopcua. If it doesn’t exist on the node, the Configuration sits idle (and shows up asNo discovery handler registeredinkubectl describe).discoveryDetails— a free-form YAML/JSON blob that the DH parses. OPC UA expectsopcuaDiscoveryMethod+applicationNames. ONVIF expectsipAddressesormacAddressesfilters. udev expectsudevRules. The shape is per-DH, which is the right call for extensibility but means you do read the per-DH docs.brokerSpec.brokerPodSpec.containers— standard Pod spec. Akri injects per-instance env vars at scheduling time (you’ll seeOPCUA_DISCOVERY_URL,OPCUA_APPLICATION_URI, etc.).capacity: 1— how many concurrent brokers can mount one instance. Set to 1 for exclusive access (most industrial protocols), higher for multi-consumer scenarios.instanceServiceSpec+configurationServiceSpec— Akri can auto-create Services per instance and an aggregated Service across all instances. Wire these into ServiceMonitors for Prometheus.
And an ONVIF variant:
apiVersion: akri.sh/v0
kind: Configuration
metadata:
name: akri-onvif-cameras
spec:
discoveryHandler:
name: onvif
discoveryDetails: |
ipAddresses:
action: Include
items: []
macAddresses:
action: Include
items: []
scopes:
action: Include
items:
- "onvif://www.onvif.org/type/video_encoder"
discoveryTimeoutSeconds: 5
brokerSpec:
brokerPodSpec:
containers:
- name: onvif-broker
image: ghcr.io/your-org/onvif-broker:0.9.0
resources:
requests: { cpu: 200m, memory: 256Mi }
capacity: 2
Once applied, the Agent on each node fires up the relevant DH, scans, and creates Instance resources. kubectl get akrii -o wide lists them with their discovered details. Each Instance triggers the Controller to schedule a broker Pod.
The full lifecycle: Configuration applied → Agent reads → DH scans → Instance created → Controller schedules broker → Pod runs → device disappears → Instance deleted → Pod cleaned up. Every step is observable via kubectl get, which is the part that makes Akri operationally pleasant compared to bespoke gateway gluecode.
2026 Deployment Targets: AKS Edge Essentials, Azure IoT Operations, K3s
You have three viable paths to run Akri-style discovery in 2026. The choice depends on whether you’re Azure-aligned, multi-cloud, or strictly on-prem.

Path 1: AKS Edge Essentials + upstream Akri Helm chart. AKS Edge Essentials 1.7+ is Microsoft’s Kubernetes-on-Windows-host edge distribution, designed for retail stores, branch offices, and small factory cells. It runs a single-node or three-node K3s-derived cluster inside a Linux VM on Windows Server or Windows 10/11 IoT Enterprise. You helm install Akri from the upstream chart, configure your Configuration resources, and you’re off. Pro: Microsoft-supported edge OS, Arc-connectable for management. Con: the Windows host coupling annoys teams that don’t already have Windows in the field. Best for retail and branch deployments where Windows is already standard.
Path 2: Azure IoT Operations (GA November 2024). This is where the real momentum is. AIO is Microsoft’s industrial-IoT stack, built on Arc-enabled Kubernetes, that bundles an OPC UA broker, an MQTT broker (Mosquitto-based with extensions), a data-processor pipeline (DataFlows), and Akri-derived discovery built into the OPC UA Connector. You don’t install standalone Akri — the discovery logic is wired in. If you’re building a greenfield industrial deployment on Azure and your protocols are OPC UA, MQTT, or ONVIF, this is almost certainly the right call in 2026. Pro: GA, supported, integrated with Azure Monitor and Azure Arc, gets continuous updates. Con: requires Azure Arc-enabled Kubernetes (which means an Azure subscription tied to every site), licensing costs apply, and you’re locked into Microsoft’s opinionated topology.
Path 3: K3s, RKE2, or vanilla Kubernetes + upstream Akri 0.12.x. This is the “I want the pattern but not the Azure tax” path. Install K3s, deploy the upstream Akri Helm chart, write your own custom Discovery Handlers if needed, and own the support yourself. The reality check: upstream Akri hasn’t shipped a release since November 2022. Issues take weeks to get a maintainer response. If you go this route, plan on either (a) freezing at 0.12.x and accepting that you’re running unmaintained code, or (b) forking and carrying patches yourself. Several teams have done this successfully — Akri is small and well-written Rust, so it’s a tractable fork — but you’re now a maintainer. Best for air-gapped, multi-cloud, or sovereign-cloud deployments where Azure isn’t on the table.
For Kubernetes version compatibility: upstream Akri 0.12.x is tested against Kubernetes 1.21-1.27. Newer Kubernetes versions (1.28+) generally work but require minor CRD adjustments (the v1beta1 CRD shape was deprecated). AKS Edge Essentials 1.7+ ships Kubernetes 1.29 and Azure IoT Operations targets 1.27+. If you’re standing up new clusters today, K3s 1.30 + a pinned Akri fork is what most pragmatic teams run.
A quick decision rule: Are you already on Azure and doing industrial IoT? Go Azure IoT Operations. Are you Windows-edge and small-footprint? AKS Edge Essentials. Anything else? K3s + frozen-fork Akri, with eyes open about the maintenance posture.
Hands-On: Discover and Use an OPC UA Device
A working end-to-end walkthrough on a single-node K3s cluster, using the upstream Akri Helm chart. Total time: 15-20 minutes.
Prerequisites: A Linux box (Ubuntu 22.04 LTS used here), K3s installed, an OPC UA test server reachable on the network. For testing, the OPC Foundation’s UA Reference Server or open62541 examples work; if you don’t have hardware, run prosys-opc-ua-simulation-server in a container.
Step 1: Install K3s.
curl -sfL https://get.k3s.io | sh -
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
kubectl get nodes
Step 2: Add the Akri Helm repo and install with the OPC UA Discovery Handler enabled.
helm repo add akri-helm-charts https://project-akri.github.io/akri/
helm repo update
helm install akri akri-helm-charts/akri \
--version 0.12.20 \
--set kubernetesDistro=k3s \
--set opcua.discovery.enabled=true \
--set agent.allowDebugEcho=true \
--namespace akri-system --create-namespace
After 30-60 seconds, you should see:
kubectl get pods -n akri-system
# NAME READY STATUS RESTARTS AGE
# akri-agent-daemonset-xxx 1/1 Running 0 45s
# akri-controller-deployment-yyy 1/1 Running 0 45s
# akri-opcua-discovery-daemonset-zzz 1/1 Running 0 45s
Step 3: Apply an OPC UA Configuration pointing at your server.
cat <<EOF | kubectl apply -f -
apiVersion: akri.sh/v0
kind: Configuration
metadata:
name: opcua-test
namespace: akri-system
spec:
discoveryHandler:
name: opcua
discoveryDetails: |
opcuaDiscoveryMethod:
standard:
discoveryUrls:
- opc.tcp://192.168.1.50:4840
brokerSpec:
brokerPodSpec:
containers:
- name: opcua-broker
image: ghcr.io/project-akri/akri/opcua-monitoring-broker:latest
resources:
requests: { memory: 64Mi, cpu: 100m }
limits: { memory: 256Mi, cpu: 500m }
capacity: 1
EOF
Step 4: Watch instances appear.
kubectl get akrii -n akri-system -o wide
# NAME CONFIG SHARED NODES AGE
# opcua-test-12abcd34 opcua-test false ["k3s-edge"] 20s
Step 5: Confirm the broker Pod is running and talking to the device.
kubectl get pods -n akri-system -l akri.sh/configuration=opcua-test
kubectl logs -n akri-system <broker-pod-name>
# expected output: connection logs, browsed namespace, value reads
Step 6: Tear it down to see lifecycle. Stop the OPC UA server. Within 30-90 seconds, the Instance disappears and the broker Pod is evicted:
kubectl get akrii -n akri-system
# No resources found in akri-system namespace.
That’s the entire cycle. Production deployments layer in TLS for OPC UA, named credentials via secrets, multi-node DaemonSet placement, and Prometheus scrape configs — covered next.
Production Patterns: HA, Security, Monitoring
Running Akri in production at, say, 50 retail stores or 12 factory cells means hardening four areas:
Controller HA. The Controller is a Deployment, default replicas = 1. For HA, run 3 replicas with leader election (built-in). The lease is held in coordination.k8s.io/v1 Lease objects; failover takes 10-15 seconds. In practice, Controller restarts are graceful — Instances persist in etcd, brokers keep running, and the new leader reconciles state on startup.
Discovery Handler resilience. DHs are DaemonSets that scan on a schedule (default every 10 seconds for ONVIF, every 60 seconds for OPC UA). They should be lightweight (set CPU limits to 100m). A misbehaving custom DH can degrade discovery, so always set readiness probes that exercise the gRPC endpoint and use PodDisruptionBudgets on the DaemonSet.
TLS and credentials for industrial protocols. Out of the box, the OPC UA DH supports unencrypted (None security policy) — fine for lab, not for production. For real deployments, mount certificates via secrets and reference them in discoveryDetails:
discoveryDetails: |
opcuaDiscoveryMethod:
standard:
discoveryUrls:
- opc.tcp://plc.factory.lan:4840
mountCertificates: true
The Akri OPC UA DH expects certificates at /etc/opcua-certs/, mounted via a Secret named opcua-broker-credentials. Generate per-deployment client certs from your industrial-PKI (most factories run an internal CA for OT). Rotate every 90-180 days; automate with cert-manager + a sidecar that signals broker reload.
Network policies. Lock down which Pods can talk to which devices. Discovery Handlers need broad subnet access for WS-Discovery and OPC UA multicast; broker pods only need to reach their specific device. Use NetworkPolicies (Calico or Cilium) that allow DH egress to device subnets and broker egress to specific instance IPs.
Observability. Akri exposes Prometheus metrics on every Agent, Controller, and (if you wire it) every broker:
akri_agent_discovery_count— how many devices each DH has reported.akri_agent_instance_count— current instance count per Configuration.akri_controller_broker_pod_count— current broker workload count.
Wire these into a Prometheus + Grafana stack at the edge or aggregate via remote-write to a central Prometheus. Alert on akri_agent_instance_count == 0 for a Configuration that should always have devices — that catches the “discovery silently broke” failure mode that bites teams two months in.
For logging, structured JSON logs from the Agent and Controller go to stdout — Fluent Bit DaemonSet to Loki or Azure Monitor (if Arc-enabled) is the standard pattern.
Trade-offs and Anti-patterns
Akri is not the right answer for every device-on-Kubernetes problem.
Anti-pattern: using Akri as a generic protocol gateway. Akri schedules a pod per device. If your problem is “I have 50,000 MQTT clients pushing to one topic,” Akri is wildly wrong — you want a single MQTT broker Pod, not 50,000 Akri-managed pods. Akri shines when each device needs a dedicated workload (per-camera ML inference, per-PLC OPC UA polling).
Trade-off: discovery latency vs. polling cost. Cranking ONVIF discovery to every 5 seconds finds new cameras fast and burns CPU and multicast traffic. The defaults (10s ONVIF, 60s OPC UA) are conservative and right for most. Tune only when you have evidence — and never below 2 seconds.
Trade-off: upstream Akri’s maintenance posture. Already covered — be honest with yourself about whether you can carry 0.12.x indefinitely, fork it, or move to Azure IoT Operations.
Anti-pattern: Akri for cloud devices. Akri is for edge devices. If your “device” is a SaaS endpoint or a cloud API, you don’t need device-as-resource semantics; you need a regular Deployment with credentials.
Anti-pattern: skipping the broker, exposing devices via Service. Some teams try to skip the broker layer and let workloads talk directly to discovered devices using only the device-plugin env vars. Works at small scale, falls apart at multi-tenant: now every workload needs device-protocol code, and you’ve lost the abstraction Akri was offering in the first place.
FAQ
What is Azure IoT Akri?
Azure IoT Akri is a CNCF Sandbox project — originally created by Microsoft and open-sourced in 2020 — that lets Kubernetes treat leaf IoT devices (OPC UA servers, ONVIF cameras, USB devices, Modbus PLCs) as first-class cluster resources. You declare a Configuration CRD describing how to discover devices, Akri’s Discovery Handlers find them, and the Akri Controller schedules a broker Pod per device. It ships with built-in handlers for OPC UA, ONVIF, and udev, plus a gRPC plugin model for custom protocols.
Is Akri still maintained in 2026?
Upstream Akri (project-akri/akri on GitHub) has been effectively stalled since November 2022, with 0.12.x as the last release. The CNCF Sandbox status remains, but engineering momentum has moved into Azure IoT Operations, which embeds Akri-derived discovery internally and shipped GA in November 2024. Teams running standalone Akri in 2026 typically pin to 0.12.x, fork the repo, or migrate to Azure IoT Operations. The patterns Akri pioneered remain industry-standard for device-as-Kubernetes-resource.
What’s the difference between Akri and Azure IoT Operations?
Akri is a standalone Kubernetes project for device discovery and broker scheduling. Azure IoT Operations (AIO) is Microsoft’s full industrial-IoT stack — built on Arc-enabled Kubernetes — that bundles an MQTT broker, an OPC UA broker, data-processing pipelines (DataFlows), and Akri-derived discovery logic. AIO is a supported, commercially-licensed product; Akri standalone is community-maintained. If you’re greenfield on Azure and doing OPC UA or MQTT, choose AIO. If you’re multi-cloud or air-gapped, you can still use upstream Akri.
Does Akri work outside Azure?
Yes. Akri runs on any conformant Kubernetes distribution — K3s, RKE2, vanilla kubeadm, OpenShift, AKS, EKS, GKE. The “Azure IoT” branding reflects its origin (Microsoft Azure IoT engineering team) but the project has no Azure-specific dependencies. Many production deployments run Akri on K3s at the edge with no Azure connection at all. The trade-off, as of 2026, is the upstream-maintenance situation: if you go non-Azure, you own the support.
How does Akri discovery actually work?
The Akri Agent (a DaemonSet on each node) watches Configuration CRDs via the Kubernetes API. When a Configuration appears, the Agent talks gRPC to the matching Discovery Handler running on the same node. The DH does the protocol-specific scan — WS-Discovery multicast for ONVIF, an OPC UA FindServers call against listed discovery URLs, udev enumeration for local devices. Each discovered device with a unique ID becomes an Instance resource. The Akri Controller watches Instances and schedules a broker Pod per instance, with the device exposed via Kubernetes’ device-plugin interface.
Can Akri replace OPC UA Gateway products?
For straightforward use cases — discovering OPC UA servers and forwarding values to MQTT or Azure IoT Hub — yes, Akri plus a custom broker Pod can replace a commercial gateway at much lower licensing cost. For complex industrial environments with certified OPC UA stacks, redundant connections, alarms-and-conditions, historical access, and vendor SLAs — no, products like HMS Ewon, Kepware, or Matrikon UA remain better fits. The decision usually comes down to whether you have the platform team to own custom broker code vs. buying a supported product.
Further Reading
- CNI Comparison: Calico, Cilium, Flannel, Multus on Kubernetes (2026) — networking foundations Akri sits on top of.
- Cilium eBPF: Service Mesh, Networking, Observability, Security — pair with Akri for edge observability.
- ArgoCD and Flux for GitOps on Industrial Fleets (Tutorial 2026) — deploying Akri Configurations across many edge sites declaratively.
- OPC UA Server in Python with asyncua: Production-Ready Tutorial — build the OPC UA endpoint Akri discovers.
- OPC UA PubSub: MQTT vs UADP Architecture Comparison — for the MQTT-vs-UADP decision when wiring Akri brokers to upstream messaging.
References
- Akri GitHub repository — github.com/project-akri/akri. Source, releases, examples, Discovery Handler protobuf, Helm chart.
- CNCF Sandbox project page for Akri — cncf.io/projects/akri/. Project status, maintainer list, governance.
- Azure IoT Operations documentation — learn.microsoft.com/azure/iot-operations. Architecture, OPC UA connector, MQTT broker, DataFlows, AIO-Akri integration details.
- AKS Edge Essentials docs — learn.microsoft.com/azure/aks/hybrid/aks-edge-overview.
- Kubernetes Device Plugin API — kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/device-plugins/. The interface Akri uses to register discovered devices with kubelet.
