ROS 2 DDS vs Zenoh: Robotics Middleware Compared (2026)

The default ROS 2 middleware is DDS — a mature, well-specified standard that works brilliantly on a single LAN and falls apart predictably when you push it beyond one. Zenoh, integrated via rmw_zenoh, takes a fundamentally different architectural bet: route discovery through a broker-like daemon instead of flooding the network with multicast announcements. The difference is not cosmetic. In a 10-robot warehouse it barely matters. In a 50-robot multi-site fleet with a cloud dashboard, it is the difference between a system that ships and one stuck in network-debug hell.

This post compares ROS 2 DDS vs Zenoh across the dimensions that actually matter in 2026 production deployments: discovery mechanics, latency and throughput characteristics, WAN and multi-subnet behavior, configuration complexity, and ecosystem maturity. We close with a weighted decision matrix and an honest “when to choose which” guide.

What this post covers: the RMW abstraction, how DDS and Zenoh discovery differ at a protocol level, fleet scaling behavior, WAN deployment patterns, trade-offs and gotchas, and a practical decision framework.

Context: Why ROS 2 Middleware Matters Now

ROS 2’s middleware layer is more consequential than most robotics developers realize until something breaks in production. The design choice — made during the transition from ROS 1’s custom TCPROS — was to mandate an abstract RMW (ROS Middleware) interface that any compliant transport could implement, then ship DDS as the default reference implementation.

That decision made sense in 2017. DDS (Data Distribution Service) was an OMG-standardized publish-subscribe middleware with decades of use in defense, aviation, and simulation systems. It offered QoS policies, type-safe serialization, and peer-to-peer communication without a central broker — all attractive for a robotics framework aiming at production-grade reliability.

By 2024–2026, however, the robotics industry pushed well beyond DDS’s original design envelope. Fleets grew from 5 robots to 50 and 500. Deployments stretched from single warehouses to multi-site manufacturing networks. Cloud-connected digital twins required bridging ROS 2 graphs to MQTT brokers, HTTP APIs, and time-series databases. DDS’s discovery mechanism — designed for closed, reliable LANs — showed sharp limits in all three scenarios.

Enter Zenoh. Originally developed at ADLINK Technology (now stewarded by ZettaScale Technology), Zenoh is a zero-overhead network protocol designed for the IoT-to-cloud continuum. The rmw_zenoh plugin — now maintained as an official ROS 2 RMW and part of the Jazzy / Rolling distributions — replaces the RTPS wire protocol entirely while keeping the ROS 2 pub/sub, service, and action APIs intact.

For a practical look at how Zenoh fits into a broader industrial stack, see our Zenoh industrial IoT reference architecture guide. For DDS-agnostic protocol tradeoffs, the IoT protocol comparison overview provides useful framing.

The RMW Abstraction: What It Buys You (and What It Costs)

The RMW abstraction is the reason this comparison is possible at all. It is also the reason neither DDS nor Zenoh is a perfect drop-in replacement for the other.

Figure 1: ROS 2 node code calls through rcl into the rmw interface. The RMW implementation (Fast DDS, Cyclone DDS, or rmw_zenoh) handles discovery, serialization, and transport. Swapping implementations requires only a run-time environment variable change, with no rebuild of your nodes, provided the chosen RMW package is installed.

The RMW Interface in One Paragraph

Every ROS 2 RMW implementation must satisfy a C API defined in the rmw package. That API covers publisher/subscription creation, service/client creation, waiting for events, taking and publishing messages, and QoS negotiation. The node does not know — and should not care — whether it is talking to an RTPS peer or a Zenoh router. The RCL (ROS Client Library) layer above the RMW handles node lifecycle, parameter servers, and graph introspection; the RMW layer handles only wire-level communication.

Switching middleware is, in theory, as simple as:

export RMW_IMPLEMENTATION=rmw_zenoh_cpp
ros2 run my_package my_node

In practice, QoS policy mappings differ between implementations, some DDS-specific features (like content-filtered topics) have no Zenoh equivalent, and ros2 topic list requires the Zenoh router to be running to enumerate graph participants.
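The same switch can be scripted when a launcher or test harness needs to start a node under different middleware. The following is a minimal sketch, not an official API: `rmw_env` is a hypothetical helper name, and the allowed set is limited to the three RMW packages discussed in this post.

```python
import os

# RMW identifiers for the three implementations discussed in this post.
KNOWN_RMW = {"rmw_fastrtps_cpp", "rmw_cyclonedds_cpp", "rmw_zenoh_cpp"}


def rmw_env(rmw_name, base_env=None):
    """Return a copy of the environment with RMW_IMPLEMENTATION set.

    Hypothetical helper: rejects names outside KNOWN_RMW so a typo fails
    early instead of surfacing later as a middleware load error.
    """
    if rmw_name not in KNOWN_RMW:
        raise ValueError(f"unknown RMW implementation: {rmw_name!r}")
    env = dict(os.environ if base_env is None else base_env)
    env["RMW_IMPLEMENTATION"] = rmw_name
    return env


if __name__ == "__main__":
    env = rmw_env("rmw_zenoh_cpp")
    print(env["RMW_IMPLEMENTATION"])
    # To launch the node from the command above (requires a ROS 2 install):
    # import subprocess
    # subprocess.run(["ros2", "run", "my_package", "my_node"], env=env, check=True)
```

Passing the environment explicitly to the child process, rather than mutating `os.environ`, keeps one test run from leaking its middleware choice into the next.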

DDS Implementations: Fast DDS and Cyclone DDS

ROS 2 ships with two production-grade DDS implementations. Fast DDS (eProsima) is the default for most distributions and offers the most complete OMG DDS feature set — including content-filtered topics, DDS Security, and a broad QoS surface. Cyclone DDS (Eclipse / ADLINK) is often praised for lower latency on single-machine shared-memory paths and a simpler XML configuration format.

Both use the same wire protocol: RTPS (Real-Time Publish-Subscribe protocol), which is the OMG standard transport layer under DDS. This means a Fast DDS publisher can interoperate with a Cyclone DDS subscriber — something that matters in heterogeneous fleets.

rmw_zenoh: A Different Protocol Bet

rmw_zenoh implements the RMW interface on top of the Zenoh protocol instead of RTPS. Zenoh’s wire format is not compatible with DDS at the network level — a Zenoh node cannot directly exchange messages with a Fast DDS node without a bridge daemon. The tradeoff for losing DDS interoperability is a dramatically simpler and more flexible transport layer.

The Zenoh protocol is documented openly at zenoh.io, and the RMW implementation source lives at ros2/rmw_zenoh on GitHub with active maintenance as of 2026.

Discovery: The Core Architectural Difference

Discovery is where ROS 2 DDS vs Zenoh diverges most dramatically, and understanding the mechanism explains most of the observed scaling and WAN behavior differences.

Figure 2: DDS uses SPDP multicast so every participant announces itself to every other participant. Zenoh routes discovery through a central zenohd daemon — participants only need to reach the router, not each other.

How DDS Discovery Works: SPDP and SEDP

DDS discovery has two phases. SPDP (Simple Participant Discovery Protocol) uses periodic multicast UDP announcements — each DDS participant broadcasts a “hello” message to a well-known multicast group on the local subnet. Every other participant that receives it notes the sender’s locators and starts a unicast handshake. SEDP (Simple Endpoint Discovery Protocol) then exchanges per-endpoint information (which topics each participant publishes or subscribes to, with QoS settings) over unicast connections.
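When debugging why SPDP traffic is not arriving (a firewall rule, a mismatched domain ID), it helps to know which ports to look at. The sketch below applies the default port mapping from the RTPS specification; the function names are hypothetical, and DDS vendors allow these defaults to be overridden, so treat the results as a starting point rather than a guarantee for your configuration.

```python
# Default port parameters from the RTPS specification.
PB = 7400   # port base
DG = 250    # domain ID gain
PG = 2      # participant ID gain
D0 = 0      # offset for discovery (SPDP) multicast traffic
D1 = 10     # offset for discovery unicast traffic

# Default SPDP multicast group for UDPv4.
SPDP_MULTICAST_GROUP = "239.255.0.1"


def spdp_multicast_port(domain_id):
    """Port on which participants in a domain send and receive SPDP beacons."""
    return PB + DG * domain_id + D0


def discovery_unicast_port(domain_id, participant_id):
    """Per-participant unicast port used for the follow-up discovery exchange."""
    return PB + DG * domain_id + D1 + PG * participant_id


if __name__ == "__main__":
    for domain in (0, 1, 42):
        print(
            f"domain {domain}: multicast {SPDP_MULTICAST_GROUP}:"
            f"{spdp_multicast_port(domain)}, "
            f"first participant unicast {discovery_unicast_port(domain, 0)}"
        )
```

Because each participant on a host takes its own unicast port, a firewall rule that opens only a single port per domain will let the first node through and block the rest.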

The consequence is O(n²) unicast connections for n participants. In a 5-node graph that is 10 connections — imperceptible. In a 50-node graph that is 1,225 potential pairwise handshakes at startup. Not all pairs need to talk to each other, but the discovery traffic still scales with the square of the participant count because every node must at minimum exchange SPDP beacons with every other.

The “discovery storm” problem manifests when many robots boot simultaneously, which is common in warehouse deployments after a power cycle. Each robot starts multicasting SPDP beacons, every other robot responds, and the resulting burst of UDP traffic can saturate WiFi networks or trigger switch broadcast-storm protection. The fix in DDS is to tune the participant announcement settings (in Fast DDS, the initial announcement count and period and the lease duration), or to configure the Fast DDS Discovery Server, which centralizes participant registration. Either route requires additional operational knowledge and configuration files per robot.

How Zenoh Discovery Works: Routed Registration

Zenoh’s architecture is built around three entity roles: clients, peers, and routers. In the routed deployment this post considers, each ROS 2 node opens a Zenoh session to a local zenohd router (one per machine, or one per network segment). Note that rmw_zenoh's default session configuration uses peer mode, where the router mainly handles discovery and nodes then exchange data directly; in client mode, all traffic flows through the router. In both cases the router maintains the topology state. When a new subscriber appears, the router propagates that interest through the router mesh, and matching publishers are notified.

This changes the discovery complexity from O(n²) to O(n): each new node adds exactly one connection (to its local router) and one registration message (sent through the router to interested peers). The router aggregates interest and routes messages; nodes do not need to know about each other directly.
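The two growth rates can be put side by side with a few lines of arithmetic. This is a deliberately simplified model, not a measurement: it counts one potential handshake per participant pair for DDS simple discovery, and one session per node plus a full mesh between routers for the routed case. It ignores any direct node-to-node links that peer-mode sessions may open after discovery. Both function names are hypothetical.

```python
def dds_pairwise_links(participants):
    """Potential pairwise discovery handshakes: n * (n - 1) / 2."""
    return participants * (participants - 1) // 2


def routed_links(nodes, routers=1):
    """One session per node, plus a full mesh between routers (assumption)."""
    return nodes + routers * (routers - 1) // 2


if __name__ == "__main__":
    print(f"{'nodes':>6} {'DDS pairwise':>14} {'routed (1 router)':>18}")
    for n in (5, 10, 50, 500):
        print(f"{n:>6} {dds_pairwise_links(n):>14} {routed_links(n):>18}")
```

The gap is negligible at 5 nodes and dominant at 500, which matches the intuition that the middleware choice matters more as the fleet grows.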

The tradeoff is operational: you must run and keep alive the zenohd process. If the router crashes, graph discovery stops working until it restarts. DDS has no single point of failure in its default configuration — a property that matters for some safety-critical deployments.

Multi-Robot Fleet Scaling Under Each Model

Figure 3: A 10-robot fleet under DDS requires O(n²) pairwise discovery handshakes. The same fleet under Zenoh routes all discovery and data through a central router, reducing connection count to O(n) and containing WiFi broadcast traffic.

In practice, the DDS discovery storm problem is most acute on WiFi networks — the medium most common in mobile robot fleets. Ethernet-connected fixed robots in a factory see far less discovery churn. This asymmetry means DDS remains the practical choice for many industrial fixed-arm deployments while Zenoh offers more value in mobile fleet scenarios.

Latency, Throughput, and WAN Behavior

Latency Characteristics

On a single machine, both DDS and Zenoh benefit from shared-memory transport. Fast DDS and Cyclone DDS have mature shared-memory implementations; rmw_zenoh also supports Zenoh shared memory for communication between processes on the same host. In this scenario latency differences between implementations are small and workload-dependent, so benchmark your specific message types and rates rather than rely on published figures that may not match your hardware or message sizes.

Over a LAN, DDS RTPS UDP delivers very low latency for small messages. The RTPS wire protocol is compact and well-optimized in both Fast DDS and Cyclone DDS. Zenoh’s UDP transport is also efficient, and for small messages the difference between a well-tuned DDS setup and Zenoh on the same LAN is unlikely to dominate your system budget.

Where DDS latency degrades noticeably is when QoS reliability kicks in on lossy links. DDS uses its own reliability layer (RTPS HEARTBEAT/ACKNACK) that was designed for reliable LANs — not for WiFi with 5–10% packet loss or WAN links. Retransmission storms on lossy wireless links can push latency from sub-millisecond into the tens of milliseconds or higher. Zenoh’s transport layer was designed from the outset for constrained and unreliable networks, and its flow control behaves more gracefully when packet loss is non-trivial.

WAN and Multi-Subnet Behavior

This is the single most decisive dimension in the ROS 2 DDS vs Zenoh comparison for modern deployments.

DDS does not cross subnet boundaries without explicit intervention. SPDP multicast does not propagate across routers. The solutions — RTPS Discovery Server, ros2_router (DDS Router), or zenoh-bridge-dds — all add complexity and a new daemon to operate. Each has different QoS mapping behavior and failure modes. Getting DDS to reliably bridge two office subnets is a half-day project; getting it to bridge a warehouse LAN to AWS requires a VPN or a carefully configured DDS Router with TLS.

Zenoh was designed for exactly this. A Zenoh router connects to other Zenoh routers over TCP or TLS, and the router mesh handles routing of interest and data across the WAN link transparently. The ROS 2 nodes on each side see no difference — they publish and subscribe normally; the routers handle the cross-site transport. This is shown directly in the deployment diagram below.

Figure 5: Two factory sites each run a local zenohd router. A cloud-hosted router acts as a relay. ROS 2 nodes on any site can publish and subscribe to topics on any other site with no application-layer changes.

For teams building ROS 2 Jazzy deployments on Jetson Orin for warehouse robotics, the multi-site Zenoh topology shown here becomes relevant as soon as you add a cloud-based fleet management layer.

Throughput and High-Frequency Topics

For high-rate sensor data — point clouds, camera streams, lidar scan topics — throughput matters more than per-message latency. Here DDS with shared-memory on a single machine is competitive. Over the network, both DDS and Zenoh can saturate Gigabit Ethernet for large messages; the bottleneck typically becomes the application or serialization layer (CDR for DDS, CDR also for rmw_zenoh, since ROS 2 message types use CDR regardless of RMW). Neither middleware choice dramatically changes your peak throughput on a well-provisioned wired network.

Configuration Complexity and Operational Overhead

Configuration complexity is an underrated decision factor. Both stacks require tuning for production use; the tuning surfaces are very different.

DDS configuration lives in XML for both implementations: XML profiles for Fast DDS, and an XML file referenced by the CYCLONEDDS_URI environment variable for Cyclone DDS. QoS settings (reliability, durability, history depth, deadline, liveliness) are per-entity and must be compatible between publishers and subscribers, or the endpoints are never matched and no data flows. Debugging a mismatch requires inspecting QoS on both sides, often with ros2 topic info -v. Discovery settings (multicast group, participant count limits, lease durations) require tuning per deployment environment. Security (DDS Security / SROS2) adds a PKI layer that is functional but involves generating per-node certificates and governance files.
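The matching rule behind QoS mismatches is a request-versus-offered check: a subscription may ask for less than the publisher offers, never more. The sketch below models only reliability and durability; real DDS also checks deadline, liveliness, and other policies. The class and function names are illustrative and are not the rclpy API.

```python
from dataclasses import dataclass
from enum import IntEnum


class Reliability(IntEnum):
    BEST_EFFORT = 0
    RELIABLE = 1


class Durability(IntEnum):
    VOLATILE = 0
    TRANSIENT_LOCAL = 1


@dataclass(frozen=True)
class QoS:
    reliability: Reliability
    durability: Durability


def incompatible_policies(offered, requested):
    """Return the names of policies where the subscriber requests more
    than the publisher offers. An empty list means the pair can match."""
    problems = []
    if offered.reliability < requested.reliability:
        problems.append("reliability")
    if offered.durability < requested.durability:
        problems.append("durability")
    return problems


if __name__ == "__main__":
    sensor_pub = QoS(Reliability.BEST_EFFORT, Durability.VOLATILE)
    strict_sub = QoS(Reliability.RELIABLE, Durability.VOLATILE)
    print(incompatible_policies(sensor_pub, strict_sub))
```

Running a check like this over a fleet's declared QoS profiles in CI is one way to catch a mismatch before it shows up as a topic that never delivers data.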

Zenoh configuration is a single JSON5 config file per zenohd instance. You define the router’s listen endpoints, connect endpoints (for peering with other routers), and optionally enable TLS with a certificate path. Settings for the ROS 2 nodes themselves go in a separate session config file. rmw_zenoh locates the two files through environment variables: ZENOH_ROUTER_CONFIG_URI for the router and ZENOH_SESSION_CONFIG_URI for node sessions. The operational surface is smaller, but the failure modes are different: if the zenohd process dies quietly, your ROS 2 graph silently loses discovery without a clear error on the affected nodes.
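As a rough illustration of how small the router file can be, the sketch below generates one from Python. The `mode`, `listen.endpoints`, and `connect.endpoints` keys and the `tcp/host:port` locator form follow Zenoh's configuration schema; the hostname is a placeholder, `router_config` is a hypothetical helper, and TLS certificate settings are left out because their key names vary between Zenoh versions. In practice it is safer to start from the default router config that ships with rmw_zenoh and change only the endpoints.

```python
import json


def router_config(listen_endpoints, connect_endpoints=()):
    """Build a minimal Zenoh router configuration as a plain dict."""
    return {
        "mode": "router",
        # Where this router accepts sessions from local nodes and other routers.
        "listen": {"endpoints": list(listen_endpoints)},
        # Remote routers to peer with, e.g. a cloud relay (placeholder host below).
        "connect": {"endpoints": list(connect_endpoints)},
    }


if __name__ == "__main__":
    cfg = router_config(
        listen_endpoints=["tcp/0.0.0.0:7447"],
        connect_endpoints=["tcp/cloud-router.example.com:7447"],
    )
    # Plain JSON is also valid JSON5, so the output can be saved as a .json5 file.
    print(json.dumps(cfg, indent=2))
```

Generating the file from one template per site is also a simple guard against the per-robot configuration drift that tends to accumulate in larger fleets.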

A practical comparison on a 10-robot warehouse:

Concern	DDS (Fast DDS)	Zenoh (rmw_zenoh)
Discovery config	Tune multicast TTL, backoff timers	Deploy and configure `zenohd`, point nodes at it
Cross-subnet comms	DDS Router + extra config	Router-to-router connect endpoint
Security	SROS2 (DDS-Security, per-node certificates)	TLS between routers; node-level auth in roadmap
QoS tuning	Per-topic XML profile	Priority mapping in router config
Debugging tools	`ros2 topic info`, Fast DDS Monitor	Zenoh admin space and REST plugin, `ros2 topic info` (limited)
Ecosystem maturity	Production-proven since ROS 2 Dashing	Stable in Jazzy; actively developed in 2026

Weighted Decision Matrix

The table below scores each middleware on six dimensions using a 1–5 scale (5 = best). Weights reflect priorities typical of a mobile fleet deployment; adjust for your context.

Dimension	Weight	Fast DDS	Cyclone DDS	rmw_zenoh
Discovery scaling (large fleets)	25%	2	2	5
WAN / multi-subnet support	20%	2	2	5
Single-LAN latency	20%	5	5	4
Configuration simplicity	15%	2	3	4
Ecosystem maturity	15%	5	4	3
DDS feature completeness	5%	5	4	2
Weighted score		3.20	3.15	4.20

Scores are qualitative judgments based on documented behavior and community experience, not a controlled benchmark. Weights are illustrative — recalibrate for your deployment context.
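To recalibrate the matrix for your own context, it is easier to recompute the weighted row than to adjust it by hand. The sketch below takes the weights and scores from the table above, with weights held as integer percentages so the arithmetic stays exact. Change the `WEIGHTS` values (keeping their sum at 100) to see how the ranking shifts.

```python
# Weights in percent, taken from the decision matrix; they must sum to 100.
WEIGHTS = {
    "discovery_scaling": 25,
    "wan_multi_subnet": 20,
    "single_lan_latency": 20,
    "config_simplicity": 15,
    "ecosystem_maturity": 15,
    "dds_feature_completeness": 5,
}

# Scores on the 1-5 scale, in the same dimension order as WEIGHTS.
SCORES = {
    "Fast DDS": [2, 2, 5, 2, 5, 5],
    "Cyclone DDS": [2, 2, 5, 3, 4, 4],
    "rmw_zenoh": [5, 5, 4, 4, 3, 2],
}


def weighted_score(scores, weights=WEIGHTS):
    """Weighted average of per-dimension scores, on the same 1-5 scale."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100 percent")
    if len(scores) != len(weights):
        raise ValueError("one score per dimension is required")
    return sum(s * w for s, w in zip(scores, weights.values())) / 100


if __name__ == "__main__":
    for name, scores in SCORES.items():
        print(f"{name:<12} {weighted_score(scores):.2f}")
```

For a fixed-arm factory cell on wired Ethernet, for example, shifting weight from discovery scaling and WAN support toward ecosystem maturity narrows or reverses the gap.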

The matrix captures the central thesis: for single-LAN, small-fleet deployments where DDS features (content-filtered topics, fine-grained QoS) are in active use, DDS remains competitive or preferable. For everything that crosses a subnet boundary or involves more than a couple of dozen robots, Zenoh’s architectural advantages compound.

Decision Flowchart: Choosing Your ROS 2 Middleware

Figure 4: Start at the top and follow the branches. Most mobile fleet and cloud-connected deployments end at Zenoh; most single-LAN, small-fleet applications with DDS-specific feature needs end at DDS.

When to choose DDS (Fast DDS or Cyclone DDS):
– Fleet is fewer than 15–20 robots on a single reliable LAN.
– You require content-filtered topics, DDS Security policies, or interoperability with non-ROS DDS participants.
– Your team has existing DDS expertise and XML QoS profiles already tuned.
– You need the broadest support across ROS 2 distributions and third-party tools.

When to choose Zenoh (rmw_zenoh):
– Fleet exceeds 20 robots, especially on WiFi.
– Your deployment spans multiple subnets, sites, or cloud endpoints.
– You want a simpler path to IoT protocol bridging (Zenoh natively bridges to MQTT, REST, and other transports via zenoh-plugin ecosystem).
– You are targeting ROS 2 Jazzy or Rolling — rmw_zenoh is officially supported and actively maintained.
– Your robots traverse unreliable WiFi links where DDS reliability retransmissions create latency spikes.

Trade-offs, Gotchas, and What Goes Wrong

Every architecture looks good on paper. Here is where each breaks in practice.

DDS gotchas:

Discovery storms on WiFi. Large fleet simultaneous restarts generate multicast bursts that can saturate WiFi channel capacity or trigger access-point storm-control throttling. The symptom is nodes that fail to discover each other at boot, working only after manual restart of individual nodes. Mitigation: spread robot boot times, or switch to RTPS Discovery Server mode.

QoS mismatch silent failure. When publisher and subscriber QoS policies are incompatible (e.g., publisher offers BEST_EFFORT, subscriber requests RELIABLE), DDS simply does not connect them. Recent ROS 2 client libraries log an incompatible-QoS warning when the RMW reports the event, but it is easy to miss in a busy log and no error is raised in application code. Diagnosing this requires ros2 topic info -v or enabling DDS verbose logging. This is a well-known paper-cut that catches even experienced ROS 2 developers.

Multicast on cloud VMs. Cloud virtual networks on the major providers (AWS, GCP, Azure) generally do not forward UDP multicast between instances by default. This means DDS discovery is broken by default in any cloud-hosted ROS 2 node. You must configure a Discovery Server or disable multicast and enumerate peers manually, a non-trivial change for teams deploying cloud digital twins.

XML configuration sprawl. Large fleets accumulate per-robot XML files that diverge over time. There is no native configuration management in the DDS layer — you must build your own templating and deployment tooling.

Zenoh gotchas:

Single point of failure. The zenohd router is operationally critical. If it crashes, ROS 2 nodes can still communicate peer-to-peer in “peer” mode, but graph discovery and topic routing stop working. You need a watchdog process (systemd unit, Kubernetes liveness probe, or similar) with fast restart to maintain availability guarantees.

Tooling gaps. ros2 topic echo and ros2 topic list work correctly with rmw_zenoh, but some introspection tools that query DDS internals directly (certain Fast DDS Monitor features, some RViz2 diagnostic panels) do not work or give incomplete data. The ecosystem is closing this gap actively in 2026, but DDS tooling remains more mature.

QoS mapping is approximate. Zenoh maps ROS 2 QoS policies to Zenoh priority levels and reliability settings, but the mapping is not one-to-one. Some DDS QoS policies (liveliness lease duration, content filters) have no direct Zenoh equivalent. Applications that rely on fine-grained QoS behavior may see different delivery semantics.

Ecosystem immaturity for some features. ROS 2 Actions over rmw_zenoh work correctly in Jazzy and Rolling. DDS-Security-equivalent node authentication is not yet fully implemented (as of early 2026); security-conscious deployments may need to rely on network-layer TLS between routers rather than per-node identity.

Practical Recommendations

The right middleware depends on your deployment context, not on abstract benchmarks. Here is a prioritized checklist:

1. Start with DDS if you are new to ROS 2. The tutorials, documentation, and community answers are written for DDS. Do not fight the defaults until you have a concrete problem DDS cannot solve.
2. Switch to Zenoh when you hit a subnet boundary or fleet size wall. The migration is an RMW_IMPLEMENTATION environment variable and a zenohd deployment, not a rewrite. Plan for it from the start of your fleet architecture but do not pay the operational complexity until you need it.
3. Benchmark your actual workload before claiming latency advantage for either. Shared-memory paths, message sizes, and QoS reliability settings dominate latency far more than the choice of DDS vs Zenoh on a LAN.
4. For cloud-connected ROS 2 graphs, Zenoh is the pragmatic default in 2026. DDS on cloud VMs requires workarounds for the multicast absence; Zenoh routes over TCP/TLS natively.
5. Run zenohd under a process supervisor. A bare zenohd & in a startup script is a production incident waiting to happen. Use systemd, supervisord, or a Kubernetes sidecar.
6. If you use both stacks in a fleet (legacy robots on DDS, new robots on rmw_zenoh), zenoh-bridge-dds bridges the two worlds. Expect some QoS information loss at the bridge.
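A process supervisor restarts zenohd when it exits, but it will not notice a router that is running and no longer accepting connections. A small reachability probe, usable from a watchdog or a Kubernetes liveness check, covers that gap. This is a minimal sketch: it assumes the router listens on TCP port 7447 (Zenoh's default), `router_reachable` is a hypothetical helper, and a successful TCP connect shows only that the port accepts connections, not that routing is healthy.

```python
import socket


def router_reachable(host="127.0.0.1", port=7447, timeout=1.0):
    """Return True if a TCP connection to the router port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    print("router reachable:", router_reachable())
```

In a real probe script, map the boolean to the process exit code (0 for reachable, non-zero otherwise) so the supervisor can act on it.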

Quick decision checklist:
– [ ] Fleet > 20 robots? → Zenoh
– [ ] Multi-site or cloud endpoint? → Zenoh
– [ ] Content-filtered topics required? → DDS
– [ ] DDS Security / SROS2 compliance required? → DDS (Zenoh security still maturing)
– [ ] Team new to both? → DDS (better documented defaults)
– [ ] WiFi-heavy mobile fleet? → Zenoh
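The checklist can pull in both directions at once, for example a 50-robot fleet that also needs SROS2. The sketch below encodes the six questions and makes the tie-break explicit. The precedence is an assumption of this example, not something the checklist itself dictates: hard DDS feature requirements win, then any Zenoh trigger, with DDS as the default. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    fleet_size: int
    multi_site_or_cloud: bool = False
    needs_content_filtered_topics: bool = False
    needs_dds_security: bool = False
    wifi_heavy: bool = False
    team_new_to_both: bool = False


def recommend(d):
    """Return (choice, reasons) following the checklist above."""
    hard_dds = []
    if d.needs_content_filtered_topics:
        hard_dds.append("content-filtered topics required")
    if d.needs_dds_security:
        hard_dds.append("DDS Security / SROS2 required")

    zenoh = []
    if d.fleet_size > 20:
        zenoh.append("fleet larger than 20 robots")
    if d.multi_site_or_cloud:
        zenoh.append("multi-site or cloud endpoint")
    if d.wifi_heavy:
        zenoh.append("WiFi-heavy mobile fleet")

    # Assumed precedence: feature requirements that Zenoh cannot yet meet
    # outrank scaling concerns, which can be mitigated with extra DDS tooling.
    if hard_dds:
        return "DDS", hard_dds
    if zenoh:
        return "Zenoh", zenoh
    reason = "team new to both" if d.team_new_to_both else "no Zenoh trigger"
    return "DDS", [reason]


if __name__ == "__main__":
    print(recommend(Deployment(fleet_size=50, wifi_heavy=True)))
    print(recommend(Deployment(fleet_size=50, needs_dds_security=True)))
```

If your priorities differ, for instance a multi-site requirement that outranks everything else, reorder the final three branches accordingly.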

Frequently Asked Questions

Can I run DDS and Zenoh nodes in the same ROS 2 fleet?

Not directly — DDS RTPS and Zenoh are wire-protocol-incompatible, so a DDS node and a rmw_zenoh node on the same ROS 2 domain ID will not discover each other natively. The bridge solution is zenoh-bridge-dds, a standalone daemon that translates between RTPS and Zenoh. It introduces a hop of latency and some QoS translation loss, but it works well for fleet migration scenarios where you are rolling out rmw_zenoh robots alongside existing DDS robots.

Does rmw_zenoh support ROS 2 Actions and Services, not just topics?

Yes. rmw_zenoh implements the full RMW interface, which covers pub/sub and services (request/reply). Actions are built on top of those primitives (goal, cancel, and result services plus feedback and status topics), so they work as well. Actions work correctly in ROS 2 Jazzy and Rolling as of 2026. The Zenoh protocol maps services to a queryable/query model internally, which is semantically equivalent. Throughput on high-rate service calls may differ from DDS, so benchmark service-heavy workloads specifically.

What is the “discovery storm” problem and how bad is it really?

A discovery storm occurs when many ROS 2 nodes start simultaneously and each one broadcasts SPDP multicast beacons, triggering a cascade of unicast handshake responses. On a switched Gigabit Ethernet network the effect is usually imperceptible because multicast is hardware-handled and the network has ample bandwidth. On a shared 802.11 WiFi network the multicast traffic consumes airtime, the handshake responses add to congestion, and the result can be discovery failures or many-second delays before the graph stabilizes. The severity scales with robot count and the WiFi network quality — it is a real problem in large warehouse fleets, not a theoretical concern.

Is Zenoh production-ready for ROS 2 in 2026?

Zenoh’s rmw implementation is officially integrated into ROS 2 Jazzy (the current LTS as of 2026) and Rolling. ZettaScale Technology (the primary Zenoh maintainer) and several large integrators have deployed it in production robotics and industrial IoT systems. It is not as battle-tested as Fast DDS across the full ROS 2 feature surface, and some advanced DDS features are not yet mapped. For new greenfield deployments, particularly multi-site or large-fleet ones, it is a reasonable production choice. For safety-certified deployments requiring DDS Security compliance, DDS remains the safer option pending Zenoh’s security feature maturity.

How does Zenoh compare to DDS for sensor data like lidar point clouds?

For high-bandwidth sensor streams on a LAN, both DDS and Zenoh can deliver point cloud data at typical lidar frequencies (10–20 Hz per sensor, with large messages). The shared-memory path in both stacks keeps large intra-machine messages off the network stack and cuts copy overhead. Over WiFi, Zenoh’s more graceful handling of packet loss may reduce the “dropped lidar frame” problem that DDS reliability retransmissions can introduce. For WAN transport of point cloud data, Zenoh is clearly preferable: DDS was not designed for WAN use, and attempting it with RTPS over a WAN link typically requires significant rate limiting and QoS tuning.

What does it take to migrate an existing ROS 2 project from DDS to Zenoh?

For simple pub/sub workloads, migration is largely an environment variable change: set RMW_IMPLEMENTATION=rmw_zenoh_cpp, install the rmw_zenoh_cpp package, and deploy a zenohd router. Applications that rely on DDS-specific features — content-filtered topics, DDS Security governance files, direct RTPS discovery configuration, or QoS policies with no Zenoh equivalent — will require application-level changes. The migration is best done incrementally: stand up Zenoh in a development environment, verify all topics/services/actions, then roll out to one robot type at a time with zenoh-bridge-dds bridging during the transition window.
