Apache Pulsar Geo-Replication for Industrial IoT Telemetry

A global manufacturing operation with 50 plants across 5 continents faces a brutal reality: millions of IoT sensors fire telemetry at 10,000+ messages per second, distributed latency demands are non-negotiable, and regulatory sovereignty requirements (GDPR in the EU, the Cybersecurity Law in China, the DPDP Act in India) mean data cannot be treated as a uniform stream. Apache Pulsar’s geo-replication was built for exactly this problem. Where Kafka MirrorMaker bolts replication onto a single-region architecture after the fact, Pulsar treats multi-region as a first-class design primitive. This post walks through the architectural decisions, BookKeeper’s segment storage model, replication trade-offs, and a production-grade 5-region backbone for Apache Pulsar geo-replication in industrial IoT.

What this post covers:
– why standard messaging stacks break under global IIoT telemetry;
– how Pulsar’s stateless broker + BookKeeper ledger design enables elastic geo-replication;
– topology patterns and selective replication for sovereignty;
– a reference architecture for multi-plant manufacturers;
– edge pre-processing with Pulsar Functions;
– operational practices for running Pulsar clusters across continents.

Why IIoT Pushes Messaging Stacks to Their Limits

Industrial IoT is not a horizontal slice of web telemetry. Plants produce structured streams with legal compliance requirements, devices demand exactly-once semantics for billing and quality-score audits, and the geographic distribution is not a feature—it is an operational mandate.

Volume and velocity. A single automotive assembly line with vision systems, pressure sensors, and actuator feedback generates 500–1,000 messages per second. Scale that to 50 plants and you are at the 25,000–50,000 msg/sec floor. Kafka handles this, but the broker’s partition-leader design couples replication tightly to the primary cluster topology. Moving data out of region means running MirrorMaker clusters in every region as a separate logical system—operationally expensive and prone to lag and inconsistency.

Regulatory sovereignty. The EU’s GDPR mandates that personal data (e.g., facility location, shift worker identity linked to asset assignments) remain in-region. China’s Cybersecurity Law requires sensitive industrial data to stay within Chinese infrastructure. India’s DPDP Act imposes similar localization rules. A single global Kafka cluster cannot satisfy these constraints; you must shard by region and enforce replication boundaries. Pulsar’s namespace-level replication control makes this declarative—a single cluster can host both “global-analytics” (replicated to all regions) and “local-control” (in-region only) topics.
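
As a concrete sketch of that declarative control, the snippet below sets per-namespace replication targets through the admin REST API. The broker URL, tenant, namespace, and cluster names (na, eu, apac) are illustrative; the namespaces and clusters must already exist, and the tenant must list the target clusters as allowed.

```python
import requests

ADMIN = "http://localhost:8080/admin/v2"  # assumed broker admin endpoint

# "global-analytics" replicates to every region; "local-control" stays in the EU.
requests.post(
    f"{ADMIN}/namespaces/manufacturing/global-analytics/replication",
    json=["na", "eu", "apac"],
).raise_for_status()

requests.post(
    f"{ADMIN}/namespaces/manufacturing/local-control/replication",
    json=["eu"],  # in-region only: sovereignty enforced by configuration
).raise_for_status()
```

The equivalent CLI is pulsar-admin namespaces set-clusters manufacturing/global-analytics --clusters na,eu,apac.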

Latency budgets. A closed-loop control system feeding sensor data to edge ML models expects <500 ms end-to-end latency. Sending raw sensor data from a São Paulo plant to a US East analytics cluster and back via Kafka MirrorMaker introduces 200–400 ms of replication lag, plus serialization overhead. Pulsar Functions running co-located with the regional broker can pre-process, aggregate, and enrich data before cross-region replication, reducing bandwidth and latency by 10–50x.

Exactly-once for billing. Manufacturing OEE (Overall Equipment Effectiveness) and predictive maintenance depend on never double-counting a machine failure or asset utilization event. Kafka’s partition-leader design guarantees exactly-once within a cluster, but cross-region exactly-once requires external coordination. Pulsar’s replication mechanism is built on durable subscription offsets managed per-cluster, making it easier (though not automatic) to achieve end-to-end exactly-once semantics.
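
As a hedged sketch of the in-cluster building block: enable broker-side deduplication on the namespace, then give the producer a stable name so retransmitted sends are discarded. Tenant, namespace, topic, and endpoint names here are illustrative.

```python
import pulsar
import requests

# Enable message deduplication for the namespace (illustrative names).
requests.post(
    "http://localhost:8080/admin/v2/namespaces/manufacturing/oee/deduplication",
    json=True,
).raise_for_status()

client = pulsar.Client("pulsar://localhost:6650")

# A stable producer name lets the broker recognize and drop retried sends,
# so a machine-failure event is never double-counted within this cluster.
producer = client.create_producer(
    "persistent://manufacturing/oee/machine-events",
    producer_name="plant-42-oee-collector",
)
producer.send(b'{"asset": "press-7", "event": "failure"}')
client.close()
```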

Data replay and ML training. Historical telemetry is gold for training time-series models. Pulsar’s tiered storage integration (to S3 / GCS / Azure Blob) makes it efficient to materialize months of history for ML researchers without burning disk on every broker.

Pulsar Architecture: Brokers + BookKeeper + ZooKeeper/Oxia

To understand why Pulsar’s geo-replication is elegant, you must first understand its separation of concerns.

Brokers are stateless. Unlike Kafka, where a broker holds partition leaders and replicas, a Pulsar broker is purely a compute plane. It accepts produce/consume requests, routes them to the ledger system, and coordinates subscriptions. A broker can be killed and replaced without data loss or rebalancing. This is the key architectural insight: scaling a cluster means adding brokers without reshuffling data.

BookKeeper is the storage plane. Every Pulsar topic is a sequence of ledgers, each a distributed write-ahead log. When a producer sends a message, the broker writes it to the current ledger via the BookKeeper client library. BookKeeper distributes the write across a quorum (typically 2 out of 3 bookies) for durability. Once a ledger reaches its configured size or age threshold, it is sealed and a new ledger begins. This segment-based design is critical for geo-replication: sealed ledgers are immutable and can be shipped across regions without coordination.

BookKeeper bookies run two types of disks:
Journal disk (NVMe): write-optimized, stores in-flight entries before they are batched to the ledger disk.
Ledger disk (SSD): read-heavy, stores sealed ledger segments in entry-log files with an index for lookups.

Tiered storage integration allows cold (sealed) ledger segments to be offloaded to S3 / GCS / Azure Blob automatically, freeing bookie disk for hot segments.

Metadata: ZooKeeper (or Oxia in Pulsar 3.x+). The broker cluster, bookie cluster, and consumer subscriptions are coordinated by a metadata service. Pulsar 2.x uses Apache ZooKeeper; Pulsar 3.x introduced Oxia, a ZooKeeper replacement built for lower latency and better distributed consensus. For new deployments, Oxia is preferred. For legacy ZK deployments, migration tools exist. The metadata layer is per cluster—each region runs its own ZK/Oxia ensemble.

Topology. A Pulsar cluster consists of:
– 3+ Brokers (stateless, horizontal scale).
– 3+ Bookies with dedicated journal + ledger disks (the actual storage).
– 3 ZooKeeper nodes (or Oxia replicas) for metadata.

A producer connects to any broker, which fans the write across 2+ bookies. Consumers subscribe to a topic and read from sealed ledgers (which can be on any bookie, or even cold storage if tiered storage is enabled).
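
For concreteness, a minimal produce/consume round trip with the Python client; the broker URL, topic, and subscription name are placeholders.

```python
import pulsar

client = pulsar.Client("pulsar://broker.na.example.com:6650")  # any broker works

# Produce: the broker fans this write out across the bookie quorum.
producer = client.create_producer("persistent://telemetry/device-data/press-7")
producer.send(b'{"sensor": "temp-01", "value": 73.2}')

# Consume: the subscription tracks its own position in the ledger sequence.
consumer = client.subscribe(
    "persistent://telemetry/device-data/press-7",
    subscription_name="edge-analytics",
    consumer_type=pulsar.ConsumerType.Shared,
)
msg = consumer.receive(timeout_millis=5000)
consumer.acknowledge(msg)
client.close()
```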

Geo-Replication Models: Full-Mesh, Hub-and-Spoke, Active-Standby

Pulsar replication is asynchronous by design and runs at the namespace level. A namespace can specify which clusters should receive replicated messages. The replication mechanism is built on a special “replicator” consumer subscription within each broker, which reads from the source cluster and produces to target clusters.
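
Before a namespace can name a remote cluster as a replication target, each cluster must be registered with the others. A sketch against the clusters admin API; all URLs are placeholders.

```python
import requests

ADMIN = "http://na-broker.example.com:8080/admin/v2"  # assumed NA admin endpoint

# Register the EU cluster with NA so namespaces here can list "eu" as a target.
requests.put(
    f"{ADMIN}/clusters/eu",
    json={
        "serviceUrl": "http://eu-broker.example.com:8080",
        "brokerServiceUrl": "pulsar://eu-broker.example.com:6650",
    },
).raise_for_status()
```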

Full-Mesh Replication. Every cluster replicates to every other cluster. A producer in NA writes to the NA cluster; the replicator immediately subscribes to that topic and produces the same message to EU, APAC, and BR clusters. Consumers in any region read from the local cluster. Conflicts are impossible because each region is the producer authority for its own input stream—there is no global leader.

Pros: low latency for local reads; natural failover (if NA goes down, EU still has the data).
Cons: network cost (message fans out to all clusters); operational complexity (more replicator instances).

Hub-and-Spoke Replication. A single “hub” cluster (e.g., NA) receives all data. Regional spoke clusters replicate from NA, and local producers send to the spoke. Analytic pipelines read from NA; edge applications read from spokes.

Pros: simpler routing; single source of truth for compliance audits.
Cons: NA becomes a bottleneck; latency for spoke-to-spoke communication.

Active-Standby. One cluster is active; the other is a hot standby. If the active cluster fails, clients switch to standby. Replication is one-way (active → standby).

Pros: simple; used for disaster recovery.
Cons: standby is idle until failover.

For global IIoT, a selective hybrid is common: a full-mesh topology for low-latency local reads, but selective replication of namespaces. Regional telemetry stays in-region; global analytics replicate to a hub.

Reference Architecture: 5-Region Plant Telemetry Backbone

Imagine a manufacturer with plants in North America, Europe, India, China, and Brazil. Each region operates independently for sovereignty but must feed a global analytics platform. Here is the architecture:

Regional Clusters. Each region runs a Pulsar cluster:
– 5 Brokers (HA, horizontal scale).
– 5 Bookies, each with a dedicated NVMe journal disk and SSD ledger disks (the storage plane).
– 3 Oxia metadata nodes (Pulsar 3.x).
– Local MQTT bridge (e.g., Mosquitto or cloud-native HiveMQ) for sensor ingestion.
– OPC UA gateways for legacy PLC communication.

Namespace Segmentation for Sovereignty.

  1. telemetry/device-data — sensor readings (voltage, temperature, pressure). Produced locally, replicated to NA hub cluster only for analytics. EU data stays in EU; China data stays in China.
  2. control/local-commands — setpoints, alarm acknowledgments, mode switches. Produced and consumed in-region only. Zero replication.
  3. events/plant-state-changes — equipment down, maintenance started, shift change. Replicated full-mesh for operational awareness across all regions.

Analytics Hub in NA Cluster. The NA cluster ingests all global telemetry/device-data messages (replicated from other regions) and feeds a Lakehouse (Apache Iceberg on S3 via Flink SQL). ML engineers train global models on this unified view.

Regional ML Training Sets. Each region runs a separate Pulsar Function that filters and enriches local telemetry, populates a per-region Iceberg table on local S3, and trains region-specific models (e.g., temperature models tuned to Brazil’s tropical climate vs. Canada’s seasonal swings).

Security. TLS + mTLS between regions; broker ACLs restrict namespace access per region.

Pulsar Functions for Edge Pre-Processing

Raw sensor data is inefficient to replicate. A vibration sensor firing at 100 Hz produces 100 messages/sec; replicating every sample across 5 regions costs bandwidth and adds latency. Pulsar Functions, a lightweight stream-processing runtime embedded in the broker, can aggregate and filter before replication.

Pipeline. Sensors → raw topic (in-region). Pulsar Functions:
1. Filter Function: Drop heartbeat messages; keep anomalies.
2. Aggregate Function: Sum vibration samples into 1-second RMS values.
3. Enrich Function: Join sensor ID with asset registry (plant name, equipment type, SLA).

Output topic: plant-summary/site-X — 1/100th the message rate, ready for global replication.
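
As an illustration of step 1, here is a filter function written against the Python Functions SDK. The topic wiring lives in the deployment command, and the anomaly rule (an RMS threshold) is an assumption for the sketch.

```python
import json

from pulsar import Function


class AnomalyFilter(Function):
    """Drop routine heartbeats; pass anomalous readings downstream."""

    def process(self, input, context):
        reading = json.loads(input)
        if reading.get("type") == "heartbeat":
            return None  # returning None publishes nothing
        if abs(reading.get("vibration_rms", 0.0)) < 2.5:  # assumed threshold
            return None
        # Returned values go to the function's configured output topic,
        # e.g. plant-summary/site-X.
        return json.dumps(reading)
```

Deployment binds the topics, e.g. pulsar-admin functions create --py anomaly_filter.py --classname anomaly_filter.AnomalyFilter --inputs <raw-topic> --output <summary-topic>.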

Benefits:
– 10–50x bandwidth reduction across regions.
– Sub-millisecond latency (Functions run co-located with the broker).
– Declarative topology (Python or Go functions packaged as Docker images).

Drawbacks:
– Functions are not ideal for complex stateful logic (use Flink or Spark for that).
– Scaling is tied to the broker; if you need independent Function scaling, run Flink instead.

Trade-offs vs Kafka MirrorMaker, Confluent Cluster Linking, and RedPanda

Kafka + MirrorMaker. The traditional approach: run a separate MirrorMaker cluster in each region, each pulling from a primary Kafka cluster and replicating topics. Works, but:
– MirrorMaker is a separate deployment; more operational surface.
– Partition leader assignment in the source cluster is decoupled from replication targets, making it harder to guarantee low-latency local reads.
– Exactly-once replication requires external coordination (checkpoint management).

Confluent Cluster Linking. A proprietary Kafka extension that treats cluster-to-cluster replication as a first-class feature. Operationally simpler than MirrorMaker; built into the broker. Drawbacks:
– Requires Confluent Cloud; not open-source.
– Pricing scales with replication volume.
– Vendor lock-in.

RedPanda. A high-performance Kafka-compatible broker with impressive single-cluster throughput and latency. As of 2026, RedPanda has not released a native geo-replication primitive comparable to Pulsar’s; most deployments rely on external tools or bespoke replication logic.

Apache Pulsar. Geo-replication is baked into the architecture. Stateless brokers + BookKeeper + namespace-level replication mean:
– Scaling is independent of replication topology.
– Segment-based durability makes sealed ledgers cheap to ship across regions.
– No external MirrorMaker process needed.
– Subscription offsets are per-cluster, simplifying exactly-once semantics.
– Tiered storage integration is native.

Drawback: a smaller ecosystem than Kafka’s (fewer third-party integrations, and a smaller community and vendor base centered on StreamNative).

For greenfield IIoT deployments prioritizing multi-region and regulatory sovereignty, Pulsar wins. For orgs heavily invested in Kafka, Cluster Linking or MirrorMaker may be pragmatic.

Practical Recommendations and Operational Gotchas

Bookie Disk Layout. Separate NVMe and SSD. The journal disk holds in-flight writes; the ledger disk is read-optimized. Mixing them onto a single disk tanks write throughput.

Minimum Cluster Size. Run 3+ bookies per region. With an ensemble of 3, a write quorum of 2, and an ack quorum of 2, every entry lives on two bookies, so one bookie can fail without data loss. With 5 bookies and a write quorum of 3, you can tolerate two simultaneous failures.
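
Quorum settings are per-namespace persistence policies. A hedged sketch of raising the write quorum for a 5-bookie region; the tenant, namespace, and endpoint are illustrative, and the field names follow the admin API’s PersistencePolicies.

```python
import requests

# Ensemble of 5 bookies; each entry is written to 3 and acked by 2.
requests.post(
    "http://localhost:8080/admin/v2/namespaces/telemetry/device-data/persistence",
    json={
        "bookkeeperEnsemble": 5,
        "bookkeeperWriteQuorum": 3,
        "bookkeeperAckQuorum": 2,
        "managedLedgerMaxMarkDeleteRate": 0.0,
    },
).raise_for_status()
```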

ZooKeeper to Oxia Migration. If upgrading to Pulsar 3.x, Oxia is lower-latency and handles cluster-to-cluster replication metadata more efficiently. Use the provided migration tools; test in staging first.

Replication Backlog Monitoring. Watch the replicator subscription’s backlog (messages produced locally but not yet replicated out). If the backlog grows, replication is falling behind; check network latency and broker CPU. Tools: Prometheus metrics (pulsar_replication_rate_in, pulsar_replication_rate_out, pulsar_replication_backlog).
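
A small monitoring sketch that pulls per-cluster replication stats from the topic stats endpoint; the broker URL, topic, and alert threshold are assumptions, and the field names follow the admin API’s replicator stats.

```python
import requests

STATS_URL = ("http://localhost:8080/admin/v2/persistent/"
             "telemetry/device-data/press-7/stats")

stats = requests.get(STATS_URL).json()

# "replication" maps each remote cluster name to replicator stats for this topic.
for cluster, repl in stats.get("replication", {}).items():
    backlog = repl.get("replicationBacklog", 0)
    if backlog > 10_000:  # assumed alert threshold
        print(f"replication to {cluster} lagging: backlog={backlog}, "
              f"in={repl.get('msgRateIn', 0.0):.0f}/s "
              f"out={repl.get('msgRateOut', 0.0):.0f}/s")
```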

Tiered Storage Scheduling. Offload segments to S3 / GCS after 7–30 days (tunable). Cold storage is cheap; bookie disk is expensive. Balance retention window against S3 API costs.
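
A sketch of a size-based offload policy on a namespace. Treat the endpoint and semantics as assumptions to verify against your Pulsar version; -1 is commonly used to disable automatic offload and 0 to offload as soon as segments seal.

```python
import requests

# Offload sealed segments once a topic's on-bookie storage exceeds ~50 GiB.
requests.put(
    "http://localhost:8080/admin/v2/namespaces/telemetry/device-data/offloadThreshold",
    json=50 * 1024**3,
).raise_for_status()
```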

Namespace-Level ACLs and Replication. Double-check your namespace replication config. A common mistake: configuring a topic for replication but forgetting to enable the replication cluster in the namespace’s replicationClusters list. The topic will be created, but messages will not replicate.

Testing Failover. Regularly (quarterly) test failover of one region. Kill a broker and verify consumers reconnect. Fail a bookie and verify automatic ledger recovery.

FAQ

Q: Does Pulsar guarantee exactly-once end-to-end across regions?
A: No, not automatically. Pulsar guarantees exactly-once within a cluster (via deduplication keys and subscription offsets). For end-to-end exactly-once across regions, you must either (1) use idempotent consumers (deduplicate on message ID), or (2) design the sink (e.g., Iceberg, DynamoDB) to handle duplicate inserts idempotently. Pulsar’s replication is efficient, but you still need application-level idempotency.
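
A hedged sketch of option (1): deduplicating on message ID at the consumer. In production the seen-set would be a durable, TTL-bounded store (Redis, RocksDB); an in-memory set is used here for brevity, and the topic and handler are illustrative.

```python
import pulsar


def process_event(payload: bytes) -> None:
    """Hypothetical application handler, e.g. a write to the billing store."""
    print(payload)


client = pulsar.Client("pulsar://localhost:6650")
consumer = client.subscribe(
    "persistent://telemetry/device-data/press-7",
    subscription_name="billing",
)

seen = set()  # in production: durable and TTL-bounded

while True:
    msg = consumer.receive()
    key = str(msg.message_id())  # stable across redeliveries within a cluster
    if key not in seen:
        seen.add(key)
        process_event(msg.data())
    consumer.acknowledge(msg)
```

Note that message IDs are reassigned when the replicator re-produces into a remote cluster, so for cross-region deduplication carry an application-level event ID in the payload and deduplicate on that instead.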

Q: Can I replicate some topics and not others in the same cluster?
A: Yes. Replication is configured per namespace, not per cluster. In the same cluster, you can have namespace A (replicated full-mesh) and namespace B (local-only). This is how you enforce data sovereignty.

Q: What happens if a bookie in region A goes down during replication?
A: For in-flight writes, the broker swaps the failed bookie out of the ledger’s ensemble and keeps writing; for sealed ledgers, BookKeeper’s autorecovery re-replicates the affected fragments to another bookie in the same region. Data is safe. Replication to other regions continues uninterrupted.

Q: How do I monitor if replication is lagging?
A: Prometheus metrics: pulsar_replication_rate_in and pulsar_replication_rate_out (messages/sec). If the outbound replication rate stays consistently below the local produce rate, the replicator is falling behind. Also monitor the replicator subscription’s backlog via topic stats in the admin API (e.g., curl http://broker:8080/admin/v2/persistent/<tenant>/<namespace>/<topic>/stats and check the replication section).

Q: Should I use MirrorMaker or Pulsar replication?
A: If you are already on Kafka, MirrorMaker is a lower-migration burden. If you are building a new multi-region system and value operational simplicity and native geo-replication, Pulsar is worth the learning curve. Kafka’s larger ecosystem is a real advantage if you rely on many integrations.

About the Author

Riju is an infrastructure architect specializing in real-time data platforms for industrial IoT. Over the past 5 years, he has designed and operated Kafka, Flink, and Pulsar clusters for global manufacturing and logistics networks. He is a maintainer of open-source Pulsar tooling and an advisor to StreamNative on edge-processing patterns.
