Sparkplug B 3.0 Edge of Network Architecture: Production Reference
Last Updated: 2026-05-16
Architecture at a glance

A Sparkplug B 3.0 edge architecture is not a protocol diagram you copy from the spec. It is the shape your plant takes when MQTT 5, store-and-forward edge nodes, a primary host application, and an ISA-95 namespace are deployed under real failure conditions. The 3.0.0 release published by the Eclipse Sparkplug Working Group in late 2023 is the first version blessed as an Eclipse specification and the first to formally reference MQTT 5 features, so the patterns that worked under 2.2 need a careful audit before you ship them to a brownfield site.
This post is a working reference for that audit. We start with the wire and the namespace, then move outward to primary host failover, MQTT 5 mapping, and the security topology. Every section ends with the trade-offs that bite in production — the failure modes Sparkplug papers rarely cover, the gotchas that show up on the night of cutover, and the numbers worth measuring before you sign off the system. By the end you should have a defensible reference architecture for greenfield deployments and a checklist for retrofits.
Context: How Sparkplug B Got to 3.0
Sparkplug began at Cirrus Link around 2016 as a thin specification on top of MQTT 3.1.1 that defined topic structure, payload encoding, and birth/death semantics for industrial telemetry. The 2.2 revision, donated to the Eclipse Foundation in 2019, codified the protobuf payload (org.eclipse.tahu.protobuf.Payload) and the nine message types — NBIRTH, DBIRTH, NDATA, DDATA, NCMD, DCMD, NDEATH, DDEATH, STATE — that every implementation still ships [1].
The 3.0.0 release, approved as an Eclipse Foundation Specification on 19 November 2023, was less about new features than about three structural changes [1]. First, the specification was rewritten as a normative Eclipse spec rather than a vendor whitepaper, with conformance assertions that compliance tests can target. Second, MQTT 5 became a first-class citizen — properties such as Content Type, Payload Format Indicator, User Properties, and Session Expiry Interval are now documented mappings rather than implementation folklore. Third, ambiguities around bdSeq, sequence-number wrap, and host application STATE retention were closed, which matters when you are tuning a cluster for failover.
What did not change is just as important. The wire protocol is still MQTT (3.1.1 or 5.0), the payload is still the Tahu protobuf, and the topic namespace is still spBv1.0/Group/MessageType/EdgeNode/Device. That backward compatibility is why brownfield migrations from 2.2 to 3.0 are usually a host-and-broker upgrade rather than a fleet rewrite. The edge of network nodes (EoNs) that worked under 2.2 will speak to a 3.0 host if their bdSeq/sequence handling was correct, and most of the production grief in 2026 comes not from spec changes but from the assumptions teams baked in when the spec was looser. For the protocol-level context, the Sparkplug B 3.0 unified namespace guide on this site is the companion piece.
The Edge of Network: Reference Architecture

The Sparkplug B 3.0 edge architecture has four logical tiers: field devices, the Edge of Network Node (EoN), the MQTT broker, and the primary host application that consumes the namespace. Everything else — historians, SCADA, MES, cloud lakehouses — subscribes downstream of the primary host or directly to the broker under tighter ACLs. Drawing the line between EoN and broker correctly is the single decision that drives most of the cost and most of the reliability of the system.
Field-side: protocol translation lives at the EoN, not the broker
The EoN is the first MQTT speaker in the chain. Field devices typically talk Modbus TCP/RTU, EtherNet/IP, Profinet, OPC UA, or 4-20 mA via I/O modules, and the EoN translates those into Sparkplug metrics. This is not negotiable in a clean design — putting Modbus polling logic inside the broker or the primary host conflates concerns and destroys the store-and-forward guarantee. Commercial EoN runtimes (Ignition Edge, HiveMQ Edge, Litmus Edge, Cirrus Link Modules, Flow Software) and the open-source Tahu library all assume the EoN owns protocol translation, buffering, and the bdSeq lifecycle.
The EoN also owns the local cache. When the broker is unreachable, every spec-compliant EoN buffers NDATA/DDATA to disk, then replays in order on reconnect. The buffer policy — bytes vs. messages, drop-oldest vs. drop-newest, durable on power-loss — is implementation-defined and worth verifying. A 24-hour outage at a remote pad can produce hundreds of megabytes of buffered metrics; if your EoN’s buffer policy is “ring of 10,000 messages”, you will silently lose data and never notice until a regulator asks.
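A back-of-envelope sizing check is worth doing before trusting any buffer default. The sketch below (Python; the rates, payload sizes, and overhead factor are illustrative assumptions you should replace with numbers measured on your own fleet, not values from the spec) estimates the durable storage an EoN needs for a given outage window:

```python
# Rough sizing for an EoN's disk-backed store-and-forward buffer.
# msgs_per_second and avg_payload_bytes are measured on your fleet;
# overhead_factor is an assumed fudge for MQTT framing and filesystem slack.

def required_buffer_bytes(msgs_per_second: float,
                          avg_payload_bytes: int,
                          outage_hours: float,
                          overhead_factor: float = 1.3) -> int:
    """Bytes of durable storage needed to ride out an outage at peak rate."""
    seconds = outage_hours * 3600
    return int(round(msgs_per_second * avg_payload_bytes
                     * seconds * overhead_factor))

# A remote pad publishing 10 msg/s of ~150-byte payloads, sized for 24 hours,
# needs roughly 168 MB: consistent with "hundreds of megabytes" at scale.
size = required_buffer_bytes(10, 150, 24)
```

If the answer exceeds what the EoN's "ring of 10,000 messages" default can hold, you have found the silent-loss problem before the regulator does.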
The broker tier: HA, not a single box
A production Sparkplug deployment runs the broker as a cluster, not a single VM. HiveMQ, EMQX, Bevywise, NanoMQ, and Cirrus Link's MQTT Distributor all support clustering with shared state, and any of them will satisfy the spec if configured correctly. The non-obvious requirement is that retained messages, STATE messages, and Will messages must survive a broker node failover; otherwise the primary host's retained STATE message (online: true) can disappear mid-flight and EoNs across the fleet incorrectly conclude the host is down. HiveMQ's documentation calls this out explicitly under their Sparkplug extension [2].
The broker is also where you draw the network boundary. The OT-side listener (mTLS on TCP/8883) faces the EoNs; the IT-side listener (mTLS or OAuth-bearer over TCP/8883 or WebSockets) faces consumers. Bridging two brokers — a plant-local broker and a regional/cloud broker — is a common pattern and is supported by every major broker, but each bridge adds a hop that can drop sequence guarantees if it isn’t configured to preserve QoS 1 and order. For the broker-selection decision and MQTT 5 features that change the calculus, see the MQTT 5 features deep dive.
Consumer tier: subscribe through the primary host, not the broker
A common antipattern is to wire SCADA, historian, and cloud directly to the broker with overlapping subscriptions on spBv1.0/+/+/+/+. It works in a lab. It breaks at scale because every consumer independently tracks bdSeq/seq state, every consumer has to handle rebirths, and the broker fan-out load grows linearly with consumers. The cleaner pattern is a single primary host application that owns the canonical model and republishes resolved state (e.g., to a Kafka topic, an OPC UA server, or an internal pub/sub) for downstream consumers. SCADA and historians become consumers of the resolved model, not of raw Sparkplug.
This pattern is also what makes Sparkplug compatible with a true unified namespace architecture for industrial IIoT — the broker is the transport, but the canonical contract lives one layer up.
Topic Namespace and ISA-95 Alignment

The Sparkplug topic namespace is fixed at five levels: spBv1.0/Group/MessageType/EdgeNode/Device. Sparkplug B 3.0 keeps this exactly as 2.2 defined it [1]. What changes between deployments is how you map those five levels to your physical and process hierarchy — and getting that mapping wrong is the most common reason a Sparkplug rollout looks fine for six months and then becomes unmanageable in year two.
The natural alignment is ISA-95. ISA-95 (IEC 62264) defines a five-level equipment hierarchy: Enterprise, Site, Area, Production Line (Work Center), and Work Unit. Sparkplug only gives you three identifier slots besides the namespace and the message type — Group ID, Edge Node ID, Device ID — so something has to collapse. The defensible mapping is to put Site or Area into Group ID, Line or Cell into Edge Node ID, and the specific Work Unit (or actual physical device) into Device ID. Enterprise rarely belongs in the topic because most plants only ever have one enterprise from their own perspective, and burning a level on a constant is wasteful. If you genuinely have multi-enterprise traffic on the same broker, encode it as a topic prefix above spBv1.0/ via broker-level namespace isolation, not inside Sparkplug's reserved levels.
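A minimal sketch of that mapping, assuming Area goes to Group ID, Line to Edge Node ID, and Work Unit to Device ID (the function name and validation rules are ours for illustration, not from the spec):

```python
from typing import Optional

NAMESPACE = "spBv1.0"
MESSAGE_TYPES = {"NBIRTH", "NDEATH", "NDATA", "NCMD",
                 "DBIRTH", "DDEATH", "DDATA", "DCMD"}

def sparkplug_topic(message_type: str, area: str, line: str,
                    work_unit: Optional[str] = None) -> str:
    """Build a Sparkplug topic from ISA-95 names: Area -> Group ID,
    Line -> Edge Node ID, Work Unit -> Device ID."""
    if message_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {message_type}")
    # Device-level (D*) messages carry a Device ID; node-level (N*) do not.
    if message_type.startswith("D") and work_unit is None:
        raise ValueError("device-level message requires a Device ID")
    parts = [NAMESPACE, area, message_type, line]
    if work_unit is not None:
        parts.append(work_unit)
    return "/".join(parts)

# sparkplug_topic("DDATA", "Packaging", "Line4", "Mixer04")
# -> "spBv1.0/Packaging/DDATA/Line4/Mixer04"
```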
The other axis is message type. Sparkplug reserves the topic level between Group ID and Edge Node ID for one of NBIRTH, DBIRTH, NDATA, DDATA, NCMD, DCMD, NDEATH, or DDEATH; STATE is the exception, living at spBv1.0/STATE/<host_id> in 3.0. Subscribers filter by message type to separate lifecycle traffic from telemetry: a historian wants spBv1.0/+/DDATA/+/+, a tag-discovery service wants spBv1.0/+/DBIRTH/+/+, and an HA monitor watches spBv1.0/STATE/+. MQTT 5 shared subscriptions on the $share/group/spBv1.0/+/DDATA/+/+ pattern let you parallelize consumers across a cluster without doubling broker fan-out; a 3.0 deployment that ignores shared subs is leaving substantial cost on the table.
Where teams go wrong is putting semantic data in identifiers. Encoding asset tags like MIXER-04-TEMPERATURE in the Device ID feels natural until you have to rename equipment, swap a unit, or merge two sites. Sparkplug metrics carry their own name field inside the protobuf payload, so the topic should identify the physical edge — the IP-addressable box doing the publishing — and the metric names inside DBIRTH/DDATA should carry the semantic structure (folder paths via / in metric names are explicitly allowed). This keeps the topic stable while letting the model evolve. The Eclipse Sparkplug specification document is explicit that metric names may be hierarchical [1, §6.4].
A final note on STATE. In 3.0 the host application’s STATE topic was clarified to spBv1.0/STATE/<host_id> rather than the looser scheme some 2.2 implementations used. If you are migrating, audit every host’s STATE topic literally — a stale 2.2-style path is one of the most common reasons EoNs in a 3.0 fleet think the host is offline when it isn’t.
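A quick audit check for that migration, assuming the pre-3.0 hosts used a bare STATE/<host_id> path (2.2-era implementations varied, so treat the "legacy" pattern here as an assumption to adapt):

```python
import re

# Matches only the 3.0 form: spBv1.0/STATE/<host_id> with a single,
# wildcard-free host_id segment.
STATE_30 = re.compile(r"^spBv1\.0/STATE/[^/+#]+$")

def is_30_state_topic(topic: str) -> bool:
    """True if a host's configured STATE topic is 3.0-compliant."""
    return bool(STATE_30.match(topic))

# is_30_state_topic("spBv1.0/STATE/scada-primary") -> True
# is_30_state_topic("STATE/scada-primary")         -> False (stale 2.2 style)
```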
Primary Host, Secondary Hosts, and Store-and-Forward

The Sparkplug B primary host is the single application designated as the consumer-of-record for a Sparkplug namespace. Only one primary host owns a host_id at a time, and EoNs use that host’s STATE topic to decide whether to send data or hold it in store-and-forward. Secondary hosts subscribe in parallel for observability or warm standby but do not assert STATE. Getting this asymmetry right is what makes Sparkplug usable for SCADA-class control loops rather than just telemetry.
The 3.0 specification defines the primary host’s STATE message as a retained MQTT message on spBv1.0/STATE/<host_id> with a payload of {"online": true, "timestamp": <epoch_ms>} (JSON, not protobuf) [1, §6.5]. The host’s Will message is the same topic with {"online": false, ...} so that an ungraceful disconnect flips the flag automatically through the broker’s Will mechanism. Every EoN configured with that host_id subscribes to the topic, and the moment they see online: false they enter store-and-forward: telemetry continues to be timestamped and queued locally, but no NDATA/DDATA is published until the host returns online: true.
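A minimal sketch of that contract: the JSON payload and the Will registered on the same topic. The host_id value is a placeholder and the MQTT connect/publish calls themselves are omitted; only the payload construction is shown.

```python
import json
import time
from typing import Optional

def state_payload(online: bool, timestamp_ms: Optional[int] = None) -> bytes:
    """Build the 3.0 STATE payload: JSON, not protobuf."""
    ts = timestamp_ms if timestamp_ms is not None else int(time.time() * 1000)
    return json.dumps({"online": online, "timestamp": ts}).encode()

HOST_ID = "scada-primary"                     # placeholder host_id
STATE_TOPIC = f"spBv1.0/STATE/{HOST_ID}"

# Register the Will first (retained {"online": false}), then publish the
# retained {"online": true} on the same topic once connected.
will_message = (STATE_TOPIC, state_payload(False))
birth_message = (STATE_TOPIC, state_payload(True))
```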
Store-and-forward is where most production deployments either pay off or fail badly. The spec mandates that an EoN MUST buffer outgoing messages when the primary host is offline and replay them in order on recovery, but it deliberately leaves buffer size, persistence (in-memory vs. on-disk), and overflow policy to implementations [1]. The questions worth answering at design time are: how many hours of outage can each EoN tolerate at peak data rate; does the buffer survive a hard power cycle of the EoN itself; and does the EoN deduplicate on replay or trust the host to handle out-of-order timestamps. For control-loop use the answer to all three usually needs to be “disk, survives power-cycle, replays in strict timestamp order.”
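The overflow policies above can be modeled in a few lines. This is an illustrative in-memory sketch (a production EoN backs this with disk, as the text argues), showing drop-oldest vs. drop-newest and replay in strict timestamp order; the class and method names are ours:

```python
from collections import deque

class StoreAndForward:
    """Toy store-and-forward buffer with a configurable overflow policy."""

    def __init__(self, max_messages: int, drop_oldest: bool = True):
        self.buf = deque()
        self.max = max_messages
        self.drop_oldest = drop_oldest
        self.dropped = 0          # count losses so overflow is never silent

    def enqueue(self, timestamp_ms: int, payload: bytes) -> None:
        if len(self.buf) >= self.max:
            if not self.drop_oldest:
                self.dropped += 1  # refuse the newest sample
                return
            self.buf.popleft()     # evict the oldest sample
            self.dropped += 1
        self.buf.append((timestamp_ms, payload))

    def replay(self):
        """Return buffered messages in strict timestamp order and clear."""
        items = sorted(self.buf)
        self.buf.clear()
        return items
```

Whatever the real implementation, the `dropped` counter is the important design point: buffer overflow must be observable, not silent.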
Failover between a primary and a warm-standby host is not a Sparkplug protocol feature — Sparkplug only knows about one primary at a time. The pattern that works is two host applications writing to the same host_id but only one with online: true retained at any moment, coordinated through an external leader-election (etcd, Consul, Kubernetes leases). When the leader changes, the new leader publishes online: false for the prior leader’s STATE (via the broker’s session takeover) and then online: true for itself. EoNs see no gap, and the sequence-number contract is preserved because all NBIRTHs flow through the same host_id consumer state machine.
The subtle failure mode here is split-brain: two hosts both believing they are leader, both publishing STATE/online: true, EoNs flapping between them. Defend against it by giving every host application a unique MQTT client ID (so the broker enforces session uniqueness if you also enforce it via ACL) and by making the leader-election fence operation gate the STATE publish, not just the local “I am leader” flag. The comparison of OPC UA versus MQTT Sparkplug B on this site walks through the equivalent failure modes in the OPC UA Pub/Sub world for teams choosing between the two stacks.
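A toy model of that fencing rule, with a lease store standing in for etcd/Consul/Kubernetes (all names here are illustrative): the STATE publish is refused unless the caller holds the current fence token, even when its local leader flag is stale.

```python
class LeaseStore:
    """Stand-in for an external leader-election store; issues monotonic
    fence tokens and tracks the current holder."""

    def __init__(self):
        self.token = 0
        self.holder = None

    def acquire(self, host: str) -> int:
        self.token += 1
        self.holder = host
        return self.token

class HostApp:
    def __init__(self, name: str, store: LeaseStore):
        self.name, self.store, self.fence = name, store, None
        self.published = []

    def become_leader(self):
        self.fence = self.store.acquire(self.name)

    def publish_state(self, online: bool) -> bool:
        # The fence gates the publish itself: a deposed leader whose local
        # flag still says "leader" fails this check and publishes nothing.
        if self.store.holder != self.name or self.fence != self.store.token:
            return False
        self.published.append(online)
        return True
```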
Mapping Sparkplug B 3.0 to MQTT 5 Features

MQTT Sparkplug B 3.0 is the first version that explicitly cross-references MQTT 5. MQTT 5 was published by OASIS in March 2019 [3] and adds properties, reason codes, shared subscriptions, topic aliases, session expiry, message expiry, payload format indicators, and request/response correlation. Sparkplug 3.0 does not require MQTT 5 — implementations may continue to use MQTT 3.1.1 — but it documents which MQTT 5 features map to Sparkplug semantics and which do not [1, §4.2]. Knowing the mapping is what lets you tune a fleet for cost and resilience rather than just compliance.
The most useful MQTT 5 features for a Sparkplug fleet are:
- Will Delay Interval. Lets an EoN’s
NDEATHWill fire only after a configurable grace period, suppressing false-positive offline events on transient network blips. Useful for cellular/satellite links where 15-30 seconds of dropout is normal. - Session Expiry Interval. Replaces the binary
CleanSessionflag with a numeric expiry. Set it to several hours for EoNs so a reconnecting node resumes its session and the broker can deliver any queuedNCMD/DCMDmessages it missed. - Topic Aliases. The broker can replace long topic strings with a 2-byte alias after first use, which materially cuts bandwidth on cellular EoNs that publish many short metrics. Eclipse Tahu and HiveMQ both support this transparently.
- Shared Subscriptions (
$share/group/topic). DistributeDDATAconsumption across a pool of consumers — for example, a horizontally-scaled stream processor — without the broker fanning out to every instance. - Payload Format Indicator and Content Type.
Content Type = application/protobufandPayload Format Indicator = 0(binary) make Sparkplug payloads explicitly typed to broker-side tooling and inspection proxies. - User Properties. Free-form key-value pairs on every PUBLISH. Useful for adding a tracing ID, a tenant tag, or an integrity hash without polluting the protobuf payload.
Reason codes on CONNACK, PUBACK, and DISCONNECT are the other large win. Under 3.1.1 a refused connection just disconnects; under MQTT 5 you get a reason code like 0x87 Not Authorized or 0x99 Payload Format Invalid and can route an alert correctly. For EoN observability this is the difference between “broker silently rejected our publish” and “we know exactly what failed.”
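A sketch of that routing, using two reason codes taken from the MQTT 5 spec (the alert category names are hypothetical):

```python
# Map MQTT 5 reason codes to alert categories for EoN observability.
ALERT_ROUTES = {
    0x87: "authz",      # Not authorized -> credential/ACL alert
    0x99: "payload",    # Payload format invalid -> encoder bug alert
}

def route_alert(reason_code: int) -> str:
    """Pick an alert category; unknown codes go to a generic queue."""
    return ALERT_ROUTES.get(reason_code, "generic")
```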
Features Sparkplug deliberately does not use are equally informative. The spec pins QoS rather than leaving it open: NBIRTH/DBIRTH and NDATA/DDATA are published at QoS 0 with retain false (ordering and loss detection come from the payload's own bdSeq/seq contract, not from MQTT delivery guarantees), while the NDEATH Will and the host's STATE message use QoS 1 [1, §6.3]. QoS 2 appears nowhere in Sparkplug; its extra handshake adds latency without semantic gain on top of the sequence contract. Retained messages are only used for STATE, never for NBIRTH or NDATA: retaining lifecycle messages would break the rebirth contract because new subscribers would receive stale NBIRTHs. If you see a fleet retaining DDATA, somebody misread the spec.
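The seq side of that contract (increment by one per message, wrap 255 to 0) makes host-side gap detection a few lines. A hedged sketch, not the Tahu implementation:

```python
def next_seq(seq: int) -> int:
    """Sparkplug seq numbers increment by one and wrap 255 -> 0."""
    return (seq + 1) % 256

def find_gaps(seqs):
    """Return indices in a received seq stream where the contract is
    violated; each gap is a trigger for an NCMD/Rebirth request."""
    gaps = []
    for i in range(1, len(seqs)):
        if seqs[i] != next_seq(seqs[i - 1]):
            gaps.append(i)
    return gaps

# find_gaps([254, 255, 0, 1]) -> []   (legal wrap)
# find_gaps([0, 1, 3])        -> [2]  (message with seq 2 was lost)
```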
Production Patterns: Security, Scale, Failure Modes

Sparkplug B production deployment rests on three pillars: identity, transport security, and the operational handling of failure modes. The protocol itself gives you almost no security — Sparkplug is a payload and topic convention on top of MQTT, and MQTT is just plain TCP unless you configure it otherwise. Everything below comes from the deployment, not the spec.
Identity at the edge is X.509 client certificates per EoN. The mTLS handshake on TCP/8883 authenticates the broker to the EoN and the EoN to the broker; the broker then maps the client certificate’s subject to an authorisation principal and applies topic ACLs. The non-obvious operational requirement is certificate rotation. Industrial EoNs run for years and most operators set cert lifetimes at 1-2 years to balance security against renewal pain. Plan a rotation runbook now — the worst time to discover you have no out-of-band cert delivery channel is the day before 4,000 EoN certs expire. KMS or HSM custody of the broker’s CA is non-negotiable for any deployment that handles regulated process data.
For the consumer tier — historians, SCADA, cloud — OAuth 2.0/OIDC with JWT bearer tokens is now the dominant pattern. HiveMQ, EMQX, and Cirrus Link Distributor all support OAuth via extensions or built-in plugins. The mTLS-everywhere alternative still works but is administratively harder when consumer headcount is dynamic (think: a Databricks job that comes up, subscribes, and tears down). TLS 1.3 in transit everywhere, no exceptions; TLS 1.2 with PFS cipher suites is acceptable for legacy EoNs that genuinely cannot upgrade but should be tracked as deprecation debt.
ACLs at the broker should be topic-scoped. An EoN’s client certificate authorises publishing only on spBv1.0/<its_group>/+/<its_node_id>/+; a primary host authorises subscribing on spBv1.0/+/+/+/+ and publishing on spBv1.0/STATE/<its_host_id>. Avoid wildcard publishing privileges for any client — a compromised EoN with spBv1.0/+/# publish rights can blast spoofed NBIRTH messages across the fleet and trigger cascading rebirths.
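To make the ACL shape concrete, here is a minimal MQTT topic-filter matcher. Real brokers implement this natively and with more edge cases ($-prefixed topics, empty levels); this sketch only illustrates why the scoped filter blocks the spoofed-NBIRTH scenario above.

```python
def topic_matches(filter_: str, topic: str) -> bool:
    """MQTT wildcard matching: '+' matches one level, '#' the remainder."""
    f, t = filter_.split("/"), topic.split("/")
    for i, part in enumerate(f):
        if part == "#":
            return True                 # multi-level wildcard: match all
        if i >= len(t):
            return False                # topic ran out of levels
        if part != "+" and part != t[i]:
            return False                # literal level mismatch
    return len(f) == len(t)

# Per-EoN publish ACL (illustrative group/node names):
EON_PUBLISH_ACL = "spBv1.0/PlantA/+/line4-plc/+"
# Allowed:  spBv1.0/PlantA/DDATA/line4-plc/mixer04
# Blocked:  spBv1.0/PlantA/NBIRTH/other-node/x   (spoofed node identity)
```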
Scale-wise, the numbers worth knowing: HiveMQ has published benchmarks showing single-cluster deployments at 100,000+ concurrent Sparkplug-style clients, and EMQX has demonstrated millions of MQTT connections on appropriately sized clusters [2,4]. The bottleneck in practice is rarely raw broker throughput — it is the primary host’s ability to handle rebirth storms. A 10,000-EoN fleet that all rebirth within a 60-second window after a broker upgrade can saturate a single primary host’s tag-resolution pipeline. Test for this. Stagger broker maintenance, throttle NCMD/Rebirth requests, and architect the host with a queue between MQTT ingestion and downstream republishing.
The dominant failure modes to design against: broker partition (a multi-AZ broker cluster that splits brain), EoN clock drift (metric timestamps disagreeing with host-side wall clock), bdSeq wrap (bdSeq is carried as a uint64 metric in the protobuf but the spec wraps it from 255 back to 0, so a naive host-side greater-than comparison breaks after 256 reconnects), and silent buffer overflow at the EoN. For a deeper comparison with adjacent transports, see DDS vs. MQTT vs. OPC UA industrial messaging protocols 2026.
Trade-offs and Gotchas
Sparkplug B 3.0 is not free. Three trade-offs are worth flagging before you commit.
First, the protobuf payload is opaque to anyone without the schema. That is great for bandwidth (Tahu payloads are typically 30-60% smaller than equivalent JSON) and bad for debugging — mosquitto_sub shows you bytes, not metrics. Plan for a Sparkplug-aware inspection tool (Eclipse Tahu’s org.eclipse.tahu.SparkplugBPayloadDecoder or a HiveMQ Sparkplug extension) and budget the team-time to use it.
Second, the single-primary-host model is a hard constraint. You can have secondary hosts for read-only observability, but only one application can own the host_id at a time. Teams that want “two SCADA systems writing to the same edge fleet” cannot do that with one Sparkplug namespace — they need two namespaces (different host_ids, possibly different brokers) or they need to put a queue/transformation layer above Sparkplug. Mistaking this is the single most common architectural rework on Sparkplug projects.
Third, rebirth storms. Anything that causes a fleet to reconnect simultaneously — broker restart, host restart, network partition recovery — produces a synchronized burst of NBIRTH + DBIRTH traffic. A site with 500 EoNs each publishing 200 metrics on rebirth produces 100,000 metric definitions in a 30-60 second window. Brokers handle this; primary hosts often don’t. The mitigation is rebirth throttling at the host (queueing NCMD/Rebirth requests over a 5-10 minute window during planned events) and back-pressure between the MQTT consumer thread and the model-update pipeline.
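The throttling mitigation reduces to scheduling: spread rebirth requests across a window instead of issuing them at once. A sketch (the function name, node IDs, and 5-minute default are illustrative):

```python
def stagger_rebirths(node_ids, window_seconds: float = 300.0):
    """Return (delay_seconds, node_id) pairs spread evenly over the window,
    so NCMD/Rebirth requests are issued gradually instead of in one burst."""
    n = max(len(node_ids), 1)
    step = window_seconds / n
    return [(round(i * step, 3), node) for i, node in enumerate(node_ids)]

# stagger_rebirths(["eon-1", "eon-2", "eon-3"], 30)
# -> [(0.0, "eon-1"), (10.0, "eon-2"), (20.0, "eon-3")]
```

The host then sleeps until each delay elapses (or schedules a timer) before sending that node's rebirth command, keeping the DBIRTH replay inside the tag-resolution pipeline's capacity.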
A subtler gotcha: Sparkplug timestamps are millisecond UTC and most EoNs trust their local clock. NTP drift on industrial gateways can be hundreds of milliseconds. If you are doing tight cross-asset correlation, sync EoNs to a hardened time source (PTP where you can, NTP with multiple sources elsewhere) or accept that the timeline you are reconstructing has noise.
Practical Recommendations
For a greenfield site in 2026, start with these defaults. Use Sparkplug B 3.0 over MQTT 5, not 3.1.1. Pick a broker that ships first-class Sparkplug tooling (HiveMQ Enterprise with the Sparkplug Extension, EMQX with its Sparkplug plugin, or Cirrus Link MQTT Distributor) — the months of effort saved on debugging are worth the license. Cluster the broker across three availability zones with a minimum three-node quorum; never run a single-broker production deployment.
At the edge, choose an EoN runtime with explicit disk-backed store-and-forward (Ignition Edge, HiveMQ Edge, Litmus Edge, Cirrus Link Modules). Configure each EoN with a buffer that survives a hard power cycle and is sized for at least 24 hours of outage at peak rate. Use one X.509 client cert per EoN, issued from a KMS-custodied CA, with a documented rotation runbook.
For the primary host, write or buy something with three properties: rebirth throttling, a bounded queue between MQTT ingest and downstream republish, and a separate STATE/host_id topic per environment (prod, staging, dr) so a failover test doesn’t accidentally take down production EoNs. Coordinate primary/secondary failover through an external leader election, not through MQTT alone.
For consumers (SCADA, historian, cloud), subscribe to the primary host’s resolved model — Kafka, OPC UA, internal pub/sub — rather than direct to Sparkplug. The one exception is a Sparkplug-native analytics consumer that uses MQTT 5 shared subscriptions to scale horizontally.
FAQ
What is the difference between Sparkplug B 3.0 and 2.2?
Sparkplug B 3.0, approved by Eclipse on 19 November 2023, is the first version published as a normative Eclipse Foundation Specification rather than a vendor-driven whitepaper. The wire protocol, protobuf payload, topic namespace, and nine message types are unchanged from 2.2. What changed is the formal MQTT 5 mapping, clarified STATE topic format (spBv1.0/STATE/<host_id>), tightened conformance assertions, and resolved ambiguities around bdSeq and sequence-number behaviour. Most 2.2 EoNs work against a 3.0 host without modification.
Can Sparkplug B 3.0 use MQTT 3.1.1 or does it require MQTT 5?
Sparkplug B 3.0 supports both MQTT 3.1.1 and MQTT 5; the spec does not mandate MQTT 5. However, MQTT 5 unlocks features Sparkplug benefits from — Session Expiry Interval, Will Delay, Topic Aliases, Shared Subscriptions, reason codes, and User Properties — that are unavailable on 3.1.1. New deployments should default to MQTT 5 unless they have a brownfield broker constraint, and migration plans should treat 3.1.1 as a transitional state with a defined sunset.
How does the Sparkplug primary host failover work?
Sparkplug’s primary host model permits exactly one host owning a given host_id at a time. That host publishes a retained STATE message with {"online": true} and uses an MQTT Will of {"online": false} so abrupt disconnect flips the flag automatically. Failover to a warm standby is coordinated outside Sparkplug — typically via etcd, Consul, or a Kubernetes lease — with the new leader publishing online: true after the prior leader publishes online: false. Split-brain prevention requires fenced leader election, not just MQTT semantics.
What is store-and-forward in Sparkplug, and what does the spec actually require?
Store-and-forward is the EoN’s local buffer for telemetry published while the primary host is offline or the broker is unreachable. Sparkplug B 3.0 requires that EoNs buffer outgoing messages during outage and replay them in order on recovery, but deliberately leaves buffer size, persistence model (memory vs. disk), and overflow policy to the implementation. Production deployments should require disk-backed buffering that survives power-cycle, sized for at least the worst-case outage duration at peak data rate.
Does Sparkplug B 3.0 work with the unified namespace concept?
Yes — Sparkplug is one of the dominant transports for unified namespace (UNS) implementations. Sparkplug supplies the topic structure, lifecycle messages, and contract; UNS is the broader architectural pattern where a single broker (or federated broker mesh) is the canonical source of truth for plant data. The mapping is straightforward: Sparkplug’s Group/EdgeNode/Device identifiers align with ISA-95 levels, and Sparkplug’s metric-name hierarchy carries the semantic model. A UNS does not require Sparkplug, but Sparkplug is the cleanest path to one in 2026.
Further Reading
- Pillar: Sparkplug B 3.0 protocol and unified namespace guide
- MQTT 5 features deep dive: shared subscriptions and topic aliases for 2026
- DDS vs. MQTT vs. OPC UA: industrial messaging protocols 2026
- Unified namespace architecture for industrial IIoT
- OPC UA vs. MQTT Sparkplug B comparison
- External: Eclipse Sparkplug Specification 3.0.0
- External: OASIS MQTT Version 5.0 specification
- External: Cirrus Link Sparkplug documentation
References
1. Eclipse Sparkplug Working Group. Sparkplug B Specification, Version 3.0.0. Eclipse Foundation Specification, approved 19 November 2023. https://sparkplug.eclipse.org/specification/version/3.0/documents/sparkplug-specification-3.0.0.pdf
2. HiveMQ. HiveMQ Enterprise Extension for Sparkplug — documentation. https://docs.hivemq.com/hivemq-sparkplug-extension/
3. OASIS. MQTT Version 5.0 — OASIS Standard. Published 7 March 2019. https://docs.oasis-open.org/mqtt/mqtt/v5.0/mqtt-v5.0.html
4. EMQX / EMQ Technologies. EMQX 5 MQTT broker scalability benchmarks. https://www.emqx.com/en/blog/reaching-100m-mqtt-connections-with-emqx-5-0
5. Cirrus Link Solutions. Chariot MQTT Server and Sparkplug documentation. https://docs.chariot.io/display/CLD/Sparkplug
6. ISA / IEC. ANSI/ISA-95 (IEC 62264) — Enterprise-Control System Integration. International Society of Automation. https://www.isa.org/standards-and-publications/isa-standards/isa-standards-committees/isa95
