IIoT Edge Gateway Architecture: 2026 Patterns for Industrial Networks

The IIoT edge gateway is the boundary device that decides whether your plant data ever reaches a useful destination. In a healthy 2026 IIoT edge gateway architecture, the gateway does five jobs at once: it bridges OT protocols (OPC UA, Modbus, EtherNet/IP, PROFINET) into IT-friendly transports (MQTT, Kafka, HTTPS); it normalises tags and timestamps into a unified namespace; it absorbs hours or days of cloud or WAN outage with disk-backed store-and-forward; it hosts edge compute slots for rules, edge ML, and SRE agents; and it sits inside a Purdue / IEC 62443 zoning model with mTLS conduits, hardware-rooted identity, and short-lived secrets. Get any one of those wrong and the plant ends up blind, brittle, or breached.

This reference covers the 2026 patterns we actually see surviving in brownfield steel mills, greenfield EV plants, water utilities, and pharma sites: layered architecture, protocol fan-in to a Sparkplug B unified namespace, container runtimes on Jetson Orin / IPC-class hosts, IEC 62443 zoning at L3.5, dual-gateway HA with state sync, and the trade-offs nobody likes to put in vendor decks.

Layered IIoT edge gateway architecture showing physical I/O, driver, normalisation, buffer, compute, and northbound layers

The role of the IIoT edge gateway in 2026

For a decade the edge gateway was treated as a protocol converter with a network port. That framing is no longer useful. In 2026 the gateway is the policy enforcement point between OT and IT, and a small but real compute platform in its own right. Three forces have driven the change.

First, the unified namespace (UNS) pattern, popularised by Sparkplug B and pushed by the digital twin and MES vendors, only works if there is a device close to the PLC that publishes well-formed birth and death certificates, owns a stable topic prefix, and replays buffered messages cleanly after a broker reconnect. That device is the gateway. See our MQTT protocol complete technical guide for why Sparkplug B lives or dies on this discipline.

Second, edge ML went from a slide-deck idea to a line item. Vibration analytics, computer-vision inspection, energy disaggregation and anomaly scoring now run on the same gateway hardware that owns the protocol drivers — partly because the network back to the data centre is too slow or too expensive, and partly because the inference loop has to close in tens of milliseconds. Jetson Orin NX 16 GB modules and AGX Orin 64 GB developer kits are now standard reference points, with NVIDIA’s Jetson Thor positioning itself for the heavier robotics-class workloads expected from 2025-2026 onward (vendor-reported figures; verify against your own benchmarks).

Third, regulation caught up. IEC 62443-3-3 is now a procurement requirement in most large industrial RFPs, the EU’s NIS2 Directive widened the scope of “essential entities”, and the US TSA Security Directives for pipelines explicitly call out network segmentation. A device that straddles L2 and L4 traffic without proving zone membership, identity, and patchability is increasingly a non-starter in audits.

So when we talk about industrial edge gateway design in 2026, we mean a device that is simultaneously a deterministic protocol stack, a buffered publisher, a tiny Kubernetes-like compute fabric, and a hardened network appliance. The rest of this post unpacks each of those.

Reference architecture: layers and responsibilities

A serviceable mental model splits the gateway into six layers, each with one job. Diagram arch_01 maps them top to bottom.

Six-layer reference architecture diagram for an industrial edge gateway

1. Physical I/O layer. Serial (RS-232, RS-485 isolated), industrial Ethernet (often dual-port for ring topologies), legacy fieldbus headers (PROFIBUS, CC-Link, where present), discrete I/O (DI/DO/AI/AO). On a Moxa AIG-301 or Siemens IPC227G, this is the boundary that decides whether you need a separate I/O block or not.

2. Protocol driver layer. One container or process per protocol family: Modbus RTU/TCP, OPC UA client (with PubSub for the newer use cases), EtherNet/IP CIP scanner, PROFINET RT, plus BACnet, IEC 61850, and S7 where the site demands it. The discipline here is to keep each driver isolated: a misbehaving Modbus poll should never starve the OPC UA stack.

3. Normalisation and tag layer. Raw register values become tagged measurements: site.line5.mixer3.barrel_temp rather than 40001. Engineering-unit scaling, quality codes (good/uncertain/bad), and per-tag timestamps live here. This is also where asset metadata enrichment happens (which PLC, which work centre, which ISA-95 equipment level). Without this layer the unified namespace is just a flat broker with cryptic topic names.

4. Store-and-forward buffer. A write-ahead log or embedded store (RocksDB, SQLite, and BadgerDB are all common) that survives reboots, holds 24 to 72 hours of plant data depending on disk and rate, and replays in timestamp order when the northbound link comes back. This layer is non-negotiable for sites with flaky WANs.

5. Edge compute slot. A container runtime (containerd, Podman, or k3s) where rules engines, edge ML model servers, log/metric agents, and customer code run as separately deployed and updated workloads. The gateway should never be one monolithic firmware blob.

6. Northbound publishers. MQTT 5 / Sparkplug B for UNS use cases, Kafka or Redpanda producers where downstream consumers need replay semantics, plain HTTPS for cloud REST APIs, and vendor-specific cloud IoT bridges (Azure IoT Edge, AWS IoT Greengrass) where the enterprise has committed to one cloud. Google Cloud IoT Core was retired in August 2023, so Google Cloud targets are normally reached over plain MQTT or HTTPS instead.
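To make layer 3 concrete, here is a minimal Python sketch of what normalisation does to a single raw register read. The `TagDef` and `Measurement` types, the scaling values, and the "uncertain when stale" rule are illustrative assumptions, not any vendor's data model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class TagDef:
    """Hypothetical tag dictionary entry; all values are illustrative."""
    name: str       # e.g. "site.line5.mixer3.barrel_temp"
    register: int   # e.g. Modbus holding register 40001
    scale: float    # engineering units per raw count
    offset: float
    unit: str


@dataclass(frozen=True)
class Measurement:
    name: str
    value: Optional[float]
    unit: str
    quality: str    # "good", "uncertain", or "bad"
    timestamp: datetime


def normalise(tag: TagDef, raw: Optional[int], read_at: datetime,
              stale: bool = False) -> Measurement:
    """Turn a raw register read into a tagged measurement.

    raw is None when the driver timed out. In that case the value is
    withheld and the quality is "bad", rather than repeating the last value.
    """
    if raw is None:
        return Measurement(tag.name, None, tag.unit, "bad", read_at)
    value = raw * tag.scale + tag.offset
    quality = "uncertain" if stale else "good"
    return Measurement(tag.name, value, tag.unit, quality, read_at)


barrel_temp = TagDef("site.line5.mixer3.barrel_temp", 40001, 0.1, 0.0, "degC")
now = datetime.now(timezone.utc)
print(normalise(barrel_temp, 1853, now))   # scaled reading, quality "good"
print(normalise(barrel_temp, None, now))   # timed-out read, quality "bad"
```

Keeping the tag definition separate from the driver is what lets the edge ML container and the northbound publisher consume the same normalised stream.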

The interesting design questions live in the seams. How do you push backpressure from the buffer back to the drivers when the WAN is down for two days? How do you let the edge ML container subscribe to the same normalised tag stream the northbound publisher uses, without re-implementing protocol decode? How do you version the tag dictionary across all six layers when an engineer adds a new sensor at 0300 on a Saturday? Answer those well and the rest of the architecture writes itself.

Protocol bridging patterns (OPC UA, Modbus, MQTT/Sparkplug, EtherNet/IP)

Protocol bridging is the historic core of the gateway, and the place where vendor differentiation is thinnest. The interesting work in 2026 is not “can you read Modbus” — every product can — but how you fan in, normalise, and publish without losing fidelity. Diagram arch_02 shows the canonical pattern: a heterogeneous southbound (Rockwell ControlLogix over EtherNet/IP, Siemens S7-1500 over PROFINET / S7 / OPC UA, Modbus TCP drives and meters, Mitsubishi MELSEC over SLMP, plus OPC UA smart sensors and IO-Link masters) collapses into a single Sparkplug B Edge Node publishing to an MQTT 5 broker that downstream historians, SCADA, digital twins, and analytics share.

Protocol bridging fan-in from PLCs and sensors into a Sparkplug B unified namespace

A few opinions that have hardened in 2026 deployments:

Prefer OPC UA where you can pick. When a sensor or controller offers OPC UA natively (most Siemens S7-1500, many Rockwell CompactLogix variants, almost all serious analyzers), use it. You get a proper information model, browseable address space, subscription-based change-of-value, and standard security profiles. The full background sits in our OPC UA protocol complete technical guide. OPC UA PubSub over UDP or MQTT is now a credible alternative for high-volume telemetry where client/server sessions add too much overhead.

Use Modbus as a fallback, not a target. Modbus TCP is the universal donor — every drive, meter, and HVAC controller has it — but it gives you no metadata, no quality codes, and no events. Drivers should add scan-group hygiene (batch reads, sensible polling intervals matched to PV time constants), and the normaliser must inject quality codes when a register times out.

Sparkplug B is the default UNS skin. Sparkplug B is the layer that turns raw MQTT into something a serious OT engineer will accept: birth certificates (NBIRTH, DBIRTH) define what a node and its devices look like, death certificates (NDEATH) propagate via MQTT Last Will and Testament so consumers know when an edge node disappears, and the state machine guarantees consumers never see ghost tags. The gateway is the natural Sparkplug B Edge Node host. The discipline is that the gateway must own a stable Group ID, Edge Node ID, and Device ID per asset and never duplicate them — re-using an Edge Node ID across two physical gateways will produce duplicate NDEATH/NBIRTH cycles that break every consumer.

EtherNet/IP and PROFINET need a careful licensing review. Both stacks have non-trivial licensing and conformance considerations. EtherNet/IP CIP scanners are widely available from Real-Time Automation, HMS, and the gateway vendors themselves; PROFINET RT is more constrained. Buy, don’t build, unless you have a very specific reason.

Sparkplug B is not a security boundary. Sparkplug only solves the state-management problem on top of MQTT. mTLS, client certificates, broker ACLs, and topic-level authorisation policies still need to be configured explicitly — Sparkplug doesn’t grant those for free.
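The ID discipline described above can be checked mechanically before a gateway ever connects. The sketch below builds topics in the Sparkplug B `spBv1.0/group/message_type/edge_node[/device]` layout and flags Edge Node IDs that are reused within a group. The fleet inventory format (serial, group, edge node) and the example IDs are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

NAMESPACE = "spBv1.0"
RESERVED = "/+#"   # characters that may not appear in Sparkplug IDs
NODE_LEVEL = {"NBIRTH", "NDEATH", "NDATA", "NCMD"}
DEVICE_LEVEL = {"DBIRTH", "DDEATH", "DDATA", "DCMD"}


def _check_id(label: str, value: str) -> None:
    if not value or any(ch in value for ch in RESERVED):
        raise ValueError(f"{label} {value!r} is empty or contains one of {RESERVED!r}")


def sparkplug_topic(group_id: str, message_type: str, edge_node_id: str,
                    device_id: Optional[str] = None) -> str:
    """Build a Sparkplug B topic for a node-level or device-level message."""
    _check_id("group_id", group_id)
    _check_id("edge_node_id", edge_node_id)
    base = f"{NAMESPACE}/{group_id}/{message_type}/{edge_node_id}"
    if message_type in NODE_LEVEL:
        if device_id is not None:
            raise ValueError(f"{message_type} is node-level and takes no device_id")
        return base
    if message_type in DEVICE_LEVEL:
        if device_id is None:
            raise ValueError(f"{message_type} is device-level and needs a device_id")
        _check_id("device_id", device_id)
        return f"{base}/{device_id}"
    raise ValueError(f"unsupported message type {message_type!r}")


def duplicate_edge_nodes(fleet: Iterable[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Return (group_id, edge_node_id) pairs claimed by more than one gateway.

    Each fleet entry is a hypothetical (gateway_serial, group_id, edge_node_id).
    """
    counts = Counter((group, node) for _serial, group, node in fleet)
    return sorted(key for key, n in counts.items() if n > 1)


fleet = [("SN001", "PlantA", "gw-01"),
         ("SN002", "PlantA", "gw-01"),   # reused Edge Node ID: a provisioning bug
         ("SN003", "PlantA", "gw-02")]
print(sparkplug_topic("PlantA", "DDATA", "gw-01", "mixer3"))
print(duplicate_edge_nodes(fleet))
```

Running a check like this in the provisioning pipeline is cheaper than diagnosing flapping NBIRTH/NDEATH cycles on a live broker.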

Edge compute: container runtimes, edge ML, and time budgets

The 2026 gateway is a small compute host with hard real-time obligations on one side and best-effort analytics on the other. The architecture that holds up under stress puts protocol drivers and rules engines in one priority class, edge ML inference in another, and log/metric shipping in a third. Diagram arch_03 shows the layout.

Edge compute slot with container runtime, rules engine, model server, and log/metric agents

Operating system and runtime. Yocto-based custom Linux is still the dominant base for purpose-built appliances (Moxa, Advantech, Phoenix Contact). Ubuntu Core is gaining ground for x86-class IPCs because of snap-based atomic updates. Wind River Linux and Red Hat Device Edge cover the regulated and ruggedised end. On Arm, NVIDIA’s JetPack on Jetson Orin / Thor modules is its own ecosystem, with containerd and the NVIDIA container runtime tightly integrated. A PREEMPT_RT kernel helps any latency-sensitive driver work, and is effectively required if you are running soft real-time loops below 10 ms.

Container runtime. containerd and Podman are both reasonable; k3s is the consensus choice when you want declarative deployment, multi-gateway fleet management, and Kubernetes-style RBAC and secrets. Avoid full vanilla Kubernetes on a gateway — the control-plane footprint is too large and the failure modes are not what an OT team wants to debug at 0200.

Edge ML. The realistic 2026 toolchain has three layers: a training environment in the cloud or data centre; a model conversion step (ONNX, TensorRT, OpenVINO); and an on-gateway model server (NVIDIA Triton Inference Server, ONNX Runtime, TensorFlow Lite, or OpenVINO Model Server) loaded into the compute slot. For vision and vibration workloads on Jetson Orin NX 16 GB and AGX Orin 32/64 GB hardware, vendor-published benchmarks are useful as ceilings — treat them as illustrative and run your own representative pipeline (preprocess, inference, postprocess) before you size capacity.

Time budgets. A simple rules-engine path from “PLC publishes tag” to “edge function emits decision” runs comfortably within a few milliseconds on a modern IPC; an edge ML path that includes feature extraction, batched inference, and a writeback to the PLC sits in the tens of milliseconds with a non-trivial tail. The honest 2026 advice: measure the tail latency, not the mean, and budget for the 99th percentile when you decide what work belongs at the edge versus the cell.
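The gap between mean and tail is easy to see with a few lines of Python. This sketch uses the nearest-rank percentile definition and synthetic latencies (a tight cluster around 12 ms plus a 2% slow tail); the numbers are made up for illustration and are not measurements from any gateway.

```python
import math
import random
import statistics
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic end-to-end latencies in milliseconds (illustrative only).
rng = random.Random(42)
latencies_ms = ([rng.gauss(12.0, 2.0) for _ in range(980)]
                + [rng.uniform(40.0, 80.0) for _ in range(20)])

mean_ms = statistics.fmean(latencies_ms)
p99_ms = percentile(latencies_ms, 99)
print(f"mean={mean_ms:.1f} ms  p99={p99_ms:.1f} ms")
```

With this distribution the mean stays close to the fast cluster while the 99th percentile lands in the slow tail, which is the number a control-adjacent budget has to absorb.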

Co-tenancy hygiene. When the same gateway hosts a critical rules engine and a noisy log shipper, isolate them with Linux cgroups, set CPU pinning for the real-time workloads, give them dedicated NIC queues if PTP/TSN is in play, and reserve memory. A heavy bursty log flush has no business pre-empting a Sparkplug B publisher.

Where deterministic delivery on the OT side matters, the gateway and the upstream switching fabric have to agree on traffic class. Our TSN industrial reference architecture 2026 walks through how TSN profiles such as 802.1Qbv and 802.1AS interact with gateway scheduling.

Security zoning: Purdue, IEC 62443, identity, secrets

Security is where naive gateway deployments go wrong fastest. The 2026 baseline: the gateway lives at Purdue Level 3.5, the Industrial DMZ, with a firewall above and below, and acts as the only allowed crossing for OT data. It never has a direct route from L4 enterprise to L1 control. Diagram arch_04 captures the layout.

Purdue and IEC 62443 zoning with the gateway at L3.5 between L4 enterprise and L3 site operations

Zones and conduits. IEC 62443 splits a plant into zones (groups of assets with shared security requirements) and conduits (controlled paths between zones). The gateway typically sits in its own conduit-rich zone. Each conduit is documented (source zone, destination zone, allowed protocols, ports, encryption, authentication), and traffic that doesn’t match the documentation is dropped at the firewall.
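Conduit documentation is easier to audit when it is also machine-readable. A minimal sketch, assuming a hypothetical `Conduit` record with the fields listed above and made-up zone names; it checks observed flows against the documented allow-list and is a review aid, not a firewall.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Conduit:
    """One documented conduit. Field names mirror the documentation items."""
    source_zone: str
    dest_zone: str
    protocol: str
    ports: FrozenSet[int]
    encrypted: bool
    authentication: str


@dataclass(frozen=True)
class Flow:
    source_zone: str
    dest_zone: str
    protocol: str
    port: int


def is_documented(flow: Flow, conduits: Iterable[Conduit]) -> bool:
    """True if the flow matches some documented conduit exactly."""
    return any(
        c.source_zone == flow.source_zone
        and c.dest_zone == flow.dest_zone
        and c.protocol == flow.protocol
        and flow.port in c.ports
        for c in conduits
    )


# Hypothetical zone names; 4840 and 8883 are the usual OPC UA and MQTT/TLS ports.
conduits = [
    Conduit("L2-cell-5", "L3.5-gateway", "opc-ua", frozenset({4840}), True, "x509"),
    Conduit("L3.5-gateway", "L4-broker", "mqtt", frozenset({8883}), True, "mtls"),
]

print(is_documented(Flow("L3.5-gateway", "L4-broker", "mqtt", 8883), conduits))
print(is_documented(Flow("L4-broker", "L2-cell-5", "opc-ua", 4840), conduits))
```

The second flow is rejected because no conduit is documented from the enterprise side straight into the cell, which matches the rule that L4 never routes directly to control.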

Identity. Each gateway gets a unique cryptographic identity rooted in a hardware secure element — TPM 2.0 on x86 IPCs, dedicated secure elements on Arm modules (NXP EdgeLock, NVIDIA’s Fuse Block on Jetson), and PUF-based identity on more recent designs. SPIFFE/SPIRE is a credible identity layer for the container workloads on top, issuing short-lived SVIDs (X.509 or JWT) that the workloads use to mTLS to brokers and APIs.

Secrets. Long-lived API keys on flash are an audit failure. The 2026 pattern: a secrets agent on the gateway (Vault Agent, SPIRE Agent, or vendor equivalent) pulls short-lived secrets at boot, refreshes them on a cadence, and never persists them unencrypted. Rotation is automated. Manual secret rotation on 200 gateways is the operation no team finishes.

Certificate lifecycle. mTLS everywhere — between drivers and PLCs where the protocol supports it (OPC UA, MQTT), between the gateway and brokers, and between the gateway and any management plane. Certificates are issued by a plant or enterprise CA with short lifetimes (90 days is now common, 24 hours is achievable with SPIRE) and an automated renewal job. A cert that expires silently at 0300 on a public holiday is one of the most common causes of “the plant went dark” tickets.

Patch and update. The gateway must support staged rollouts, A/B partitions or atomic snapshots, rollback on health-check failure, and signed images. Yocto’s RAUC, Mender, balena, NVIDIA OTA, and the cloud-IoT-Edge update mechanisms all do this; pick one and treat ad-hoc SSH-and-update as a security incident.

Network policy. Egress allow-lists by destination and port. No outbound on 22, 23, or 3389. Inbound only from the documented southbound conduit. DNS pinned to plant resolvers. NTP pinned to a stratum-2 source inside the OT zone — never sync time directly from the public internet on an OT gateway.

High availability and store-and-forward design

Two failure modes dominate gateway outages: the gateway itself dies (PSU, SSD, kernel panic) and the northbound network partitions (carrier outage, broker upgrade, firewall change). The architecture in diagram arch_05 addresses both.

Dual gateway HA topology with NIC bonding, persistent disk, and state sync between active and standby

Dual-node HA. Two gateways, one active and one standby (or both active in a load-shared configuration), connected to a ring of OT switches via NIC bonding (LACP for the network-side, active-backup for OT rings with PRP/HSR or MRP). The active node owns a floating VIP and virtual MAC; the standby watches via heartbeat and takes over on failure. State synchronisation is the hard part. Three viable approaches:

  • Stateful block replication (DRBD, or vendor equivalents on appliances like Stratus ztC Edge). The standby’s disk is a byte-for-byte copy of the active. Failover is fast but the synchronisation window during long WAN partitions is a careful tuning exercise.
  • Application-level replication with Raft or similar consensus (e.g., NATS JetStream or some MQTT broker clusters that support cluster replication). Cleaner semantics but requires the application to be cluster-aware.
  • Shared-nothing with overlapping subscriptions. Both gateways subscribe to the same southbound sources and publish to the same broker with a coordination layer that dedupes. Easier to operate, but only works if your southbound protocols support multiple readers without state corruption (true for most OPC UA, careful with EtherNet/IP class 1 connections).

Store-and-forward sizing. Budget for the worst credible WAN outage. For a site that publishes 50,000 tags at 1 Hz with ~200 bytes per tag sample in the Sparkplug B payload after compression, that is roughly 10 MB/s sustained. A 256 GB SSD will hold around seven hours of continuous data at that rate, less once you account for indexing overhead and replication. For 72-hour buffering at the same rate you need about 2.6 TB for the raw payload alone, so plan for a 4 TB-class persistent disk once overhead is included, or reduce the buffered data rate. Treat these as planning estimates and validate against your own payloads.
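The sizing arithmetic is worth keeping as a small script so it can be rerun with your own tag counts. This sketch counts raw payload bytes only, in decimal units; the 20% overhead figure in the last line is an assumption for illustration, not a measured value.

```python
def ingest_rate_bytes_per_s(tags: int, hz: float, bytes_per_sample: float) -> float:
    """Sustained write rate into the buffer."""
    return tags * hz * bytes_per_sample


def hours_of_buffer(disk_bytes: float, rate_bytes_per_s: float) -> float:
    """How long a disk of the given size lasts at the given rate."""
    return disk_bytes / rate_bytes_per_s / 3600


def disk_bytes_for(hours: float, rate_bytes_per_s: float, overhead: float = 0.0) -> float:
    """Disk needed for the given duration; overhead is a fraction such as 0.2."""
    return hours * 3600 * rate_bytes_per_s * (1 + overhead)


rate = ingest_rate_bytes_per_s(tags=50_000, hz=1.0, bytes_per_sample=200)
print(f"rate: {rate / 1e6:.1f} MB/s")
print(f"256 GB SSD lasts: {hours_of_buffer(256e9, rate):.1f} h")
print(f"72 h raw: {disk_bytes_for(72, rate) / 1e12:.2f} TB")
print(f"72 h with 20% overhead: {disk_bytes_for(72, rate, overhead=0.2) / 1e12:.2f} TB")
```

At 10 MB/s the 72-hour figure comes out near 2.6 TB before any indexing or replication overhead, which is why the raw number should be treated as a floor.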

Backpressure. When the buffer fills, the gateway must degrade gracefully: reduce scan rates on low-priority tags, drop sub-second points first, alert the operator, and never drop birth or death certificates. Hard-coded “drop oldest on full” is a footgun on the day the buffer is your only source of truth.
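One way to express that degradation ladder is as a small policy keyed on buffer fill ratio. The thresholds, the `Point` type, and the tenfold slowdown are hypothetical choices for this sketch; the only fixed rule is that birth and death certificates are always admitted.

```python
from dataclasses import dataclass

# Hypothetical thresholds, as a fraction of buffer capacity in use.
ALERT_AT = 0.70
SLOW_LOW_PRIORITY_AT = 0.80
DROP_SUBSECOND_AT = 0.90


@dataclass(frozen=True)
class Point:
    tag: str
    priority: str               # "high" or "low"
    period_s: float             # nominal sampling period of the tag
    is_lifecycle: bool = False  # True for birth and death certificates


def should_alert(fill_ratio: float) -> bool:
    """Tell the operator well before anything is dropped."""
    return fill_ratio >= ALERT_AT


def scan_period(point: Point, fill_ratio: float, slowdown: float = 10.0) -> float:
    """Stretch the scan period of low-priority tags once the buffer is filling."""
    if point.priority == "low" and fill_ratio >= SLOW_LOW_PRIORITY_AT:
        return point.period_s * slowdown
    return point.period_s


def admit(point: Point, fill_ratio: float) -> bool:
    """Decide whether a new point may be written to the buffer."""
    if point.is_lifecycle:
        return True   # never drop birth or death certificates
    if fill_ratio >= DROP_SUBSECOND_AT and point.period_s < 1.0:
        return False  # sub-second points are shed first
    return True


vibration = Point("site.line5.mixer3.vibration", "low", 0.1)
nbirth = Point("NBIRTH", "high", 0.0, is_lifecycle=True)
print(admit(vibration, 0.95), admit(nbirth, 0.95))
```

The policy is pure functions over a fill ratio, so it can be unit-tested without filling a real disk.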

Replay correctness. When the broker comes back, the gateway must replay buffered messages in timestamp order, mark them as historical (Sparkplug B has the is_historical flag), and not re-trigger downstream alarm rules that already fired locally. Consumers must be tolerant of replays — this is a contract the UNS architect owns.
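A minimal sketch of the replay step follows. It uses plain dictionaries as a stand-in for Sparkplug B protobuf metrics, and the `timestamp_ms` and `seq` key names are assumptions; `is_historical` is the flag named above. The local sequence number breaks ties between messages with equal timestamps.

```python
from typing import Dict, Iterable, List


def prepare_replay(buffered: Iterable[Dict]) -> List[Dict]:
    """Order buffered messages for replay and mark them as historical.

    Messages are sorted by source timestamp, then by local sequence number.
    The originals are not modified; flagged copies are returned.
    """
    ordered = sorted(buffered, key=lambda m: (m["timestamp_ms"], m["seq"]))
    return [{**m, "is_historical": True} for m in ordered]


buffer = [
    {"seq": 3, "timestamp_ms": 3000, "name": "barrel_temp", "value": 186.0},
    {"seq": 1, "timestamp_ms": 1000, "name": "barrel_temp", "value": 185.3},
    {"seq": 2, "timestamp_ms": 1000, "name": "motor_amps", "value": 12.4},
]
for msg in prepare_replay(buffer):
    print(msg["timestamp_ms"], msg["name"], msg["is_historical"])
```

Consumers can then route anything flagged historical into the historian while keeping it away from live alarm evaluation.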

Power. Industrial-grade DC or PoE+ input, with an internal supercapacitor or small battery sized for graceful shutdown (flush the buffer, close TCP sessions cleanly). PSU redundancy on IPC-class hosts is a small extra cost that pays for itself the first time a 24 V rail glitches.

Vendor landscape

A pragmatic 2026 view of the boxes you will actually see on the shortlist. Specs are vendor-reported and generation-current as of mid-2026 — verify against the latest datasheets before any procurement decision.

Siemens SIMATIC IPC227G / IPC427G. The IPC227G is a fanless Atom-class box favoured for control cabinets where you want a Siemens part number all the way through. The IPC427G is the heavier sibling with Core-class CPUs and more I/O. Both ship with Industrial OS (Linux) options and are commonly deployed with Siemens Industrial Edge for managed application deployment. Good when the enterprise standard is Siemens; expect to pay for the ecosystem alignment.

Stratus ztC Edge 250i. A two-node, fully redundant appliance designed for “always-on” deployments with built-in virtualisation, automated failover, and self-protecting workloads. Aimed at sites where loss of the gateway means immediate production impact. More expensive than a discrete dual-IPC setup but operationally simpler and an easier audit story.

Dell NativeEdge / Edge Gateway 5200. Dell’s edge line includes the Edge Gateway 5200, intended as a converged platform for industrial gateways and small edge compute clusters, with NativeEdge providing the management plane. Useful when the enterprise has standardised on Dell for the data centre and wants the same operational model at the edge.

Moxa AIG-301 / AIG-501. The AIG-301 is a compact Arm-based gateway with broad protocol support (Modbus, OPC UA, MQTT, AWS / Azure connectors) targeted at brownfield retrofit. The AIG-501 adds CPU headroom for edge analytics. Strong on protocol breadth and ruggedisation; less interesting if you need heavy GPU.

Phoenix Contact PLCnext. Phoenix Contact’s PLCnext line blurs PLC and edge gateway with a Linux runtime that hosts both IEC 61131-3 control logic and containerised user code. Compelling where the site wants control and gateway in one box with a single supplier.

AAEON / Advantech. AAEON’s UP Squared, BOXER, and FWS lines, and Advantech’s UNO and EPC families, dominate the “ruggedised Intel/AMD IPC” segment. Wide CPU range, broad I/O options, third-party Linux distributions. The pragmatic choice when you want generic x86 hardware and to bring your own software stack.

NVIDIA Jetson Orin and Thor. The Jetson Orin NX 16 GB module is the current workhorse for vision and ML at the edge, with the AGX Orin 32 GB and 64 GB developer kits covering heavier multi-stream workloads. Jetson Thor is positioned for robotics-class workloads from 2025-2026; treat the published TOPS numbers as ceilings and benchmark your own pipeline. Pair with a carrier board from Connect Tech, Aetina, or Forecr where industrial I/O matters.

Honourable mentions. HMS Anybus and Red Lion as serious protocol-bridging specialists; Litmus, Crosser, and Inductive Automation Ignition Edge as software stacks that sit on top of the above hardware; Eurotech and Kontron at the regulated and telco-grade end.

The 2026 procurement question is rarely “which gateway is best” and almost always “which one matches our supplier policy, our protocol mix, our update story, and our long-term spares plan”. Optimise for the boring answer.

Trade-offs and gotchas

The deck-friendly version of an edge gateway architecture is clean. The version that survives a five-year deployment is full of trade-offs. The ones we see most often:

Container drift. If you allow operators to docker pull or helm install directly on a gateway, your fleet’s image inventory will diverge within months. The fix is GitOps-style declarative deployment (Flux, Argo, or vendor equivalents on top of k3s) and a hard rule that nothing runs on a gateway that did not come from the registry.

Certificate expiry. Short-lived certs are safer than long-lived ones, but only if rotation actually works on every node. The day a gateway is offline for a hardware swap and rejoins after its cert has expired, the rotation job needs to handle that gracefully. Build a “cert expiring in N days” alert that pages at 7, 3, and 1 days before expiry, plus a “cert expired” alert that pages immediately.
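The alert tiers reduce to a small function over the certificate's not-after time. This is a sketch assuming day-scale certificate lifetimes; the alert names are made up, and for SVIDs that live only hours you would express the thresholds as a fraction of lifetime instead.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Sequence


def cert_alert(not_after: datetime, now: datetime,
               thresholds_days: Sequence[int] = (7, 3, 1)) -> Optional[str]:
    """Return the most urgent alert tier for a certificate, or None.

    Both datetimes must be timezone-aware. Alert names are illustrative.
    """
    remaining = not_after - now
    if remaining <= timedelta(0):
        return "expired"
    for days in sorted(thresholds_days):       # check the tightest window first
        if remaining <= timedelta(days=days):
            return f"expiring_within_{days}d"
    return None


now = datetime(2026, 6, 1, 3, 0, tzinfo=timezone.utc)
print(cert_alert(now + timedelta(days=10), now))
print(cert_alert(now + timedelta(days=2), now))
print(cert_alert(now - timedelta(minutes=1), now))
```

Feeding this from the fleet's certificate inventory, rather than from each gateway, also catches nodes that are offline and cannot report their own expiry.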

Time sync. A gateway whose clock drifts by 30 seconds produces tags that historians will silently reorder. Run PTP where the network supports it, NTP from the plant time source otherwise, and alert on any drift over 100 ms. Never let a gateway use the host’s default pool.ntp.org.

Network partition behaviour. Confirm what your gateway does when the OT network is up but the IT-side broker is unreachable. The expected behaviour is: keep polling PLCs, fill the buffer, publish locally if there is a local broker, and stop publishing northbound. The wrong behaviour, which we still see on naive deployments, is to spin tight connect-retry loops that effectively DoS the upstream firewall or broker when the link comes back.
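The usual remedy for retry storms is capped exponential backoff with random jitter. The sketch below generates the wait before each reconnect attempt using the "full jitter" variant (a uniform draw between zero and the capped exponential ceiling); the base, cap, and attempt count are illustrative defaults, not recommendations for a specific broker.

```python
import random
from typing import Iterator, Optional


def backoff_delays(base_s: float = 1.0, cap_s: float = 300.0, attempts: int = 10,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Yield the wait in seconds before each reconnect attempt.

    The ceiling doubles per attempt up to cap_s, and the actual delay is a
    uniform draw in [0, ceiling] so a fleet does not retry in lockstep.
    """
    rng = rng or random.Random()
    for attempt in range(attempts):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        yield rng.uniform(0.0, ceiling)


for i, delay in enumerate(backoff_delays(rng=random.Random(7))):
    print(f"attempt {i}: wait {delay:.1f} s")
```

Many MQTT client libraries expose minimum and maximum reconnect delays; the point of the sketch is the shape of the schedule, whichever layer implements it.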

Driver back-compat. Updating the OPC UA stack from 1.04 to 1.05 has, in real deployments, broken sessions with older Siemens or Rockwell firmwares. Stage driver upgrades by zone and keep a per-driver rollback story; never globally push a driver image in one window.

Buffer-replay storms. When a 4-hour WAN outage ends and 50 gateways simultaneously replay their buffers, the broker cluster can buckle. Stagger replays (random jitter, plus a rate limit on is_historical payloads) and load-test broker behaviour on this scenario before you ever live with it.
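Both mitigations fit in a few lines. In this sketch the start delay is derived from a hash of the gateway ID, so it is stable across restarts (a random draw works too), and the rate limit is a plain token bucket with the clock passed in so it can be tested without sleeping. The ten-minute window and the rates are assumptions.

```python
import hashlib


def replay_start_delay(gateway_id: str, window_s: float = 600.0) -> float:
    """Spread replay start times across window_s, deterministically per gateway."""
    digest = hashlib.sha256(gateway_id.encode("utf-8")).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2 ** 64
    return fraction * window_s


class TokenBucket:
    """Rate limiter for historical payloads; time is supplied by the caller."""

    def __init__(self, rate_per_s: float, burst: float, now: float = 0.0) -> None:
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = burst
        self.last = now

    def try_send(self, now: float) -> bool:
        """Consume one token if available; return whether sending is allowed."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


print(f"gw-01 starts replay after {replay_start_delay('gw-01'):.0f} s")
bucket = TokenBucket(rate_per_s=2.0, burst=2.0)
print([bucket.try_send(0.0) for _ in range(3)])
```

The replay loop would call `try_send` with a monotonic clock before each historical publish and wait briefly whenever it returns False, leaving live data on a separate, unthrottled path.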

Edge ML model rot. A model trained on Q4 2025 data will degrade on Q3 2026 data. Plan for periodic retraining, A/B deployment of new model versions, and a kill switch that falls back to a rules-only path when inference quality drops below a threshold. Don’t pretend the model is forever.
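The kill switch can be as simple as a rolling average over a quality score. What that score is (agreement with labelled spot checks, a drift statistic, model confidence) is site-specific; this sketch assumes a hypothetical score between 0 and 1 where higher is better, and the threshold and window are placeholders.

```python
from collections import deque
from typing import Deque


class InferenceKillSwitch:
    """Fall back to the rules-only path when recent quality scores sag."""

    def __init__(self, threshold: float, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        self.threshold = threshold
        self.scores: Deque[float] = deque(maxlen=window)

    def record(self, quality_score: float) -> None:
        """Add the latest quality score; the oldest one falls out of the window."""
        self.scores.append(quality_score)

    def use_model(self) -> bool:
        """True while the model should stay in the decision path."""
        if len(self.scores) < (self.scores.maxlen or 0):
            return True   # not enough evidence yet to switch off
        return sum(self.scores) / len(self.scores) >= self.threshold


switch = InferenceKillSwitch(threshold=0.8, window=3)
for score in (0.95, 0.90, 0.50, 0.40):
    switch.record(score)
    print(score, "model" if switch.use_model() else "rules-only")
```

In practice the switch state should also be published as a tag, so operators can see when a line has dropped back to rules-only operation.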

Secret rotation on air-gapped sites. SPIFFE/SPIRE assumes the gateway can reach the SPIRE server. On true air-gapped sites you need a local SPIRE server or an equivalent. Plan that into the topology, not as an afterthought.

Vendor lock at the management layer. The hardware is rarely the lock-in. The management plane is. Be deliberate about whether you commit to Siemens Industrial Edge, Dell NativeEdge, Azure IoT Edge, AWS IoT Greengrass, or an open layer like k3s + Flux + Prometheus. The exit cost of the management plane is the real long-term cost.

Observability fatigue. Shipping every PLC tag and every container log to a central SIEM will bankrupt the log budget. Use the gateway to filter and aggregate: per-driver scan health, per-tag stale counts, per-broker reconnect counts, per-container CPU/RSS, per-disk free space. In our experience that handful of metrics catches the large majority of incidents.

Compliance proof. The auditor will ask for a signed list of every package version on every gateway. If you cannot produce it in five minutes, you have an inventory problem, not a gateway problem. Solve it before the audit, not during.

Practical recommendations

A pragmatic shortlist for a 2026 IIoT edge gateway deployment, drawn from sites that are working today:

  1. Decide your unified namespace first. Pick MQTT 5 + Sparkplug B (most common), Kafka topics (when you already run Kafka and need replay), or OPC UA PubSub (where vendors push back on MQTT). The choice constrains everything else.
  2. Standardise on two gateway SKUs, not ten. A small box for cabinets (Moxa AIG-301 class) and a larger one for edge compute (Jetson AGX Orin or Advantech IPC class) covers most needs. Spares, training, and patching all benefit from a small SKU list.
  3. Insist on a hardware secure element. TPM 2.0 or equivalent. Non-negotiable.
  4. Run a single container runtime fleet-wide. k3s is the safe 2026 default. Manage it declaratively.
  5. Define the conduit before you ship the gateway. Document allowed protocols, ports, encryption, authentication, source and destination zones. Get sign-off from the OT security lead.
  6. Size the buffer for 72 hours. Cheaper than the day you wish you had.
  7. Automate certificate rotation. Test the renewal path in staging. Test the expired-cert recovery path too.
  8. Set up the alerts on day one. Buffer depth, scan-loop health, reconnect counts, cert expiry, disk free, CPU steal, container restarts. Page the right team.
  9. Run a partition drill twice a year. Disconnect the WAN intentionally. Watch the buffer fill, the replays drain, the alarms behave. Fix what breaks.
  10. Plan model lifecycle and rollback. Edge ML in production needs a deployment story and a fallback story.

FAQ

Q1. What sits in the IDMZ versus on the gateway itself?
The IDMZ typically hosts shared services (reverse proxies, jump hosts, the broker cluster if it is plant-wide, identity providers, a SIEM forwarder). The gateway is the device that originates OT data and enforces the southbound conduit policy. In smaller plants the gateway and the IDMZ broker can co-locate; in larger plants they are separated for blast-radius and patching reasons.

Q2. Do I still need OPC UA if I am running Sparkplug B end-to-end?
Yes, in most plants. OPC UA is the southbound protocol of choice for talking to modern PLCs and smart sensors; Sparkplug B is the northbound layer that publishes the normalised tags to the unified namespace. They solve different problems. A gateway often runs both: OPC UA client to the PLC, Sparkplug B Edge Node to the broker.

Q3. Can I run edge ML on the same gateway as the protocol drivers?
Yes, with discipline. Pin the protocol drivers to dedicated CPUs and a high priority class, run the model server on the GPU or NPU, and use cgroup limits on memory. Validate the worst-case CPU and memory profile under simultaneous load before deploying. If the inference workload is heavy and shared with vision streams, split into two gateways (one for protocol, one for ML) and connect them over the local broker.

Q4. How do I do zero-downtime gateway updates?
A/B partition firmware (Yocto + RAUC, Mender, or vendor equivalents); deploy applications as containers managed by k3s with rolling updates; for HA pairs, drain one node, update it, fail back, then update the partner. The discipline is to make every update reversible within minutes and to validate health after each step before proceeding.

Q5. What happens when the gateway itself runs out of disk?
Three things should happen in order: (1) operator alert at 80% full; (2) automatic tail-drop of the lowest-priority tags, never of birth or death certificates; (3) hard pause of northbound publishing only after store-and-forward integrity is at risk, with a loud audible alarm at the SCADA. Never silently drop data. Disk full is an SRE event, not a routine condition.

Q6. How do I prove IEC 62443 compliance for a gateway?
Three artefacts get you most of the way: a network architecture document showing zones, conduits, and gateway placement; an asset register with every gateway’s hardware ID, firmware version, certificate fingerprint, and last patch date; and an audit log proving every change to firmware, container images, network policy, and secrets is tied to an authenticated user and a change ticket. The product certifications (62443-4-1 for the supplier, 4-2 for the device) help, but operations are what the auditor actually examines.

Further reading

  • IEC 62443-3-3, “System security requirements and security levels” — the operational reference for OT segmentation, identity, and audit.
  • ISA-95 / IEC 62264, “Enterprise-control system integration” — the Purdue and equipment-hierarchy vocabulary.
  • Eclipse Sparkplug Specification v3 (Eclipse Foundation, 2023) — authoritative Sparkplug B reference.
  • OPC Foundation, “OPC Unified Architecture, Part 14: PubSub” — for the publish/subscribe extensions discussed above. See also our OPC UA protocol complete technical guide.
  • OASIS, “MQTT Version 5.0” — the wire-level reference; pair with our MQTT protocol complete technical guide.
  • IEEE 802.1Q-2022 (Time-Sensitive Networking amendments) — for the determinism story when TSN matters. Our TSN industrial reference architecture 2026 walks through the profiles.
  • NIST SP 800-82r3, “Guide to Operational Technology (OT) Security” — useful as a complement to IEC 62443.
  • ENISA, “Good Practices for Security of IoT in the Industrial Sector” — practical EU-flavoured guidance that pairs well with NIS2.
  • SPIFFE/SPIRE documentation (spiffe.io) — the workload identity reference if you go down the short-lived SVID path.
  • NVIDIA Jetson Orin and Thor datasheets and developer guides — for the edge ML hardware sizing exercise.
