Zenoh vs MQTT vs DDS for Robotics Middleware (2026)

If you are building or scaling a robot fleet this year, the Zenoh vs MQTT vs DDS question is no longer academic. It decides whether your robots discover each other reliably, whether your control loops stay deterministic, and whether your telemetry survives a flaky cellular link to the cloud. Each of these three middlewares solves a different slice of the robotics communication problem, and picking the wrong one shows up months later as silent discovery storms, dropped commands, or a brittle WAN bridge that nobody wants to touch.

This guide is written for engineers making a real decision in 2026. It walks through how the three stacks actually move data, gives you a side-by-side decision matrix across transport, discovery, QoS, latency, and operational cost, and ends with a decision tree and a deployment topology you can copy. No vendor spin, just the trade-offs as they look from the trenches today.

Background: the robotics middleware problem

A modern robot is a distributed system on wheels. Perception, planning, control, and safety all run as separate processes that must exchange data with microsecond-to-millisecond timing, often across multiple compute boards on the same chassis, and then again across a network to a fleet manager or a digital twin. The middleware is the layer that handles all of that messaging so application code does not have to.

Consider a single autonomous mobile robot moving through a warehouse. Its lidar publishes point clouds at tens of hertz, its cameras stream frames, a localization node fuses them into a pose estimate, a planner turns that pose into a velocity command, and a safety monitor watches all of it to slam the brakes if anything looks wrong. Every one of those arrows is a middleware message with its own timing, reliability, and ordering requirements. Now multiply that robot by fifty, scatter them across a facility on patchy Wi-Fi and cellular, and add a fleet manager and a cloud twin that need to see all of them at once. The middleware is no longer a detail — it is the substrate that determines whether the whole system is reliable or quietly broken.

For ROS 2 — the dominant robotics framework — this layer is abstracted behind the RMW (ROS Middleware) interface, which lets you swap the underlying transport without rewriting nodes. Historically that transport has been DDS. The ROS 2 documentation lists Fast DDS as the default and Cyclone DDS as a long-standing alternative, both built on the RTPS wire protocol. DDS gives ROS 2 its rich Quality of Service model and peer-to-peer discovery, but that same design has well-documented pain on real networks.

That pain is exactly why Eclipse Zenoh entered the picture. As the rmw_zenoh project and ROS 2 documentation describe, Zenoh is now a qualified Tier-1 RMW shipped with recent ROS 2 distributions, and it is the first easy-to-use RMW that does not rely on DDS. Meanwhile MQTT comes from the IoT world entirely — a broker-based publish/subscribe protocol that has quietly become the default for fleet telemetry and remote monitoring, even when DDS or Zenoh runs onboard. Understanding where each one fits is the whole game.

It helps to remember that these three were born to solve different problems. DDS came out of the defense and aerospace world, where the requirement was deterministic, low-jitter data sharing among many processes on a shared bus — so it optimized for rich data-centric semantics and tight timing on a controlled network. MQTT came out of constrained telemetry, originally for monitoring oil pipelines over satellite links, so it optimized for a tiny footprint, tolerance of intermittent connectivity, and a simple broker that any device could reach. Zenoh is the newest of the three and was designed from the start for the messy middle ground that robotics now lives in: data that has to flow from a microcontroller to the edge to the cloud, across links that are sometimes fast and trusted and sometimes lossy and routed. When you weigh Zenoh vs MQTT vs DDS, you are really weighing three different opinions about what the network looks like. The rest of this article makes those opinions concrete.

The core comparison: Zenoh vs MQTT vs DDS

Let us start with the architecture, because almost every downstream difference flows from it. The three stacks make fundamentally different bets about where the intelligence lives.

DDS pushes everything to the endpoints: every participant is a peer that discovers and talks to every other peer directly. Zenoh keeps brokerless peer paths but adds an optional router that handles discovery and bridges across network segments. MQTT centralizes everything through a broker that every client connects to. Hold that picture in mind as we go through the matrix.

The consequences of those three pictures cascade. A fully peer-to-peer system like DDS has no single point of failure and no extra hop in the data path, which is wonderful for latency and resilience on a clean LAN. It also means every node must find every other node, and that discovery problem grows roughly with the square of the participant count. A broker-centric system like MQTT has the opposite trade: discovery is trivial because there is nothing to discover, but every message takes an extra hop through the broker, and the broker itself becomes the thing you must scale and protect. Zenoh’s bet is that you can keep most of the peer-to-peer data path while delegating only the hard parts, discovery and cross-segment routing, to a lightweight router that sits out of the fast path once connections are established. Whether that bet pays off for you depends entirely on your network, which is why the matrix below is organized around network-facing properties.
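To make the scaling argument concrete, here is a back-of-the-envelope sketch in Python. It only counts relationships: the number of peer pairs in a full mesh versus the number of sessions when every participant talks to a single router. It is a counting model under that simplifying assumption, not a measurement of real DDS or Zenoh traffic, and the function names are illustrative.

```python
def pairwise_discovery_links(participants: int) -> int:
    """Peer pairs that must learn about each other in a full peer-to-peer mesh."""
    if participants < 0:
        raise ValueError("participants must be non-negative")
    return participants * (participants - 1) // 2


def router_sessions(participants: int) -> int:
    """Sessions needed when each participant only has to reach one router."""
    if participants < 0:
        raise ValueError("participants must be non-negative")
    return participants


if __name__ == "__main__":
    # 10 -> 45 pairs, 50 -> 1225 pairs, 200 -> 19900 pairs
    for n in (10, 50, 200):
        print(n, pairwise_discovery_links(n), router_sessions(n))
```

The gap between the two columns is the reason node count deserves its own line in any evaluation, even before payload traffic is considered.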

The decision matrix

Here is the head-to-head across the dimensions that actually drive a middleware choice for robotics in 2026. Latency figures are presented as qualitative ranges drawn from published comparisons rather than absolute guarantees, because real numbers depend heavily on payload size, network, and tuning.

Dimension	DDS (Fast/Cyclone)	Zenoh	MQTT 5
Transport	RTPS over UDP, multicast-first	Zenoh protocol over TCP/UDP, QUIC-friendly	TCP (TLS), WebSockets
Topology	Brokerless peer-to-peer	Brokerless peers + optional router	Central broker (hub-and-spoke)
Discovery	Distributed RTPS, multicast by default	Router-assisted gossip scouting; multicast off by default	None needed — clients connect to broker
QoS model	Richest: reliability, durability, deadline, liveliness, history	Reliability + durability; few incompatible settings	Three levels: QoS 0, 1, 2
Latency profile	Lowest, near-deterministic on a clean LAN	Comparable to DDS, lower discovery overhead	Higher; broker adds a hop
Edge / WAN fit	Weak — multicast and chatty discovery break over WAN	Strong — designed for lossy, constrained, routed links	Strong — built for unreliable WAN telemetry
ROS 2 support	Default RMW, deeply integrated	Tier-1 RMW (`rmw_zenoh`), shipped with recent distros	Not an RMW; used via bridges
Operational cost	Low on LAN, high to tame over WAN	Low–moderate; one router to run	Low–moderate; broker to run and scale

A few rows deserve unpacking, but first a word on how to read the matrix: no single row crowns a winner. It is a set of trade-offs, and the right choice is whichever middleware’s strengths line up with the constraints you cannot change: your network, your timing budget, and your team’s operational appetite. A skunkworks lab robot on a wired bench and a hundred-unit cellular fleet can reach opposite conclusions from the very same table, and both can be right.

Discovery: where DDS hurts and Zenoh helps

DDS uses distributed RTPS discovery that, by default, leans on multicast UDP. On a controlled lab LAN this is elegant — plug in a node and it finds the graph automatically. In the field it is the single most common source of grief. As the ROS 2 community and middleware vendors note, many real networks block or rate-limit multicast, segment subnets, or carry too many participants for the all-to-all discovery traffic to scale. The result is silent discovery failures and, in large graphs, “discovery storms” that saturate the network before any payload moves.

Zenoh attacks this directly. With the default rmw_zenoh configuration, nodes connect to a Zenoh router that handles discovery and forwards that information to peers via gossip scouting, with multicast disabled by default. Published studies cited in the arXiv comparison of DDS, MQTT, and Zenoh report that Zenoh dramatically reduces discovery overhead versus DDS while delivering latency in the same class. MQTT sidesteps discovery entirely: there is no peer graph to discover, only a broker address every client already knows.

Why does this matter so much in practice? Because discovery is where robotics middleware tends to fail invisibly rather than loudly. A dropped data packet shows up as a missed message you can measure and alarm on. A failed discovery shows up as two nodes that simply never see each other — no error, no log, just a topic that is permanently empty. On a single bench that is easy to catch; across a fleet spread over a warehouse, a campus, or a cellular network it can hide for weeks. DDS deployments routinely work around this by abandoning multicast in favor of a static discovery server or a hand-maintained peer list, which restores reliability at the cost of configuration you must keep in sync. Zenoh’s router-assisted model is essentially that workaround promoted to a first-class, default design, which is a large part of why teams operating over real networks gravitate to it. The trade is that you now depend on the router being up — a topic we return to in the gotchas.

Latency and QoS: matching the model to the loop

For a hard control loop on a trusted LAN, DDS is still the benchmark. Its near-deterministic, low-latency delivery and its deep QoS vocabulary — deadline, liveliness, history depth, durability — let you express timing contracts that perception and safety stacks rely on. A deadline policy lets a subscriber declare that it must receive an update at least every N milliseconds and be notified the instant that promise is broken; a liveliness policy lets a safety node detect a silent publisher before a stale command does damage. These are not conveniences, they are the primitives a functional-safety argument is built on, and DDS expresses them natively.

Zenoh covers the QoS basics, reliability and durability, with very few incompatible-setting traps, and its measured latency sits close to DDS while using fewer resources. The trade-off is that its QoS vocabulary is intentionally smaller, so if your design leans on deadline or liveliness semantics today, you must check that the equivalent behavior is available or that you can implement it in the application layer. MQTT’s three QoS levels are coarser still: QoS 0 is fire-and-forget, QoS 1 guarantees at-least-once delivery with possible duplicates, and QoS 2 guarantees exactly-once delivery at the cost of a four-packet handshake (PUBLISH, PUBREC, PUBREL, PUBCOMP). That is a fine model for telemetry where the occasional duplicate or dropped reading is harmless, but it is not where you put a 1 kHz servo loop, and the broker hop alone disqualifies it from the tightest timing budgets. The honest summary: DDS for the tightest loops and the strongest QoS contracts, Zenoh for the routed and large-scale cases, MQTT for the cloud edge.
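If a deadline contract has to live in the application layer, one minimal shape for it is a polled watchdog. The sketch below is an assumption about how you might approximate the idea in plain Python; the class name and the injectable clock are illustrative, and it is not the DDS deadline policy or any Zenoh API. Unlike a native deadline QoS, it only notices a miss when something calls `deadline_missed()`.

```python
import time
from typing import Callable


class DeadlineWatchdog:
    """Application-level stand-in for a deadline contract (hypothetical helper)."""

    def __init__(self, period_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if period_s <= 0:
            raise ValueError("period_s must be positive")
        self._period_s = period_s
        self._clock = clock
        # Start counting at construction so a publisher that never
        # shows up is also reported as a missed deadline.
        self._last_rx = clock()

    def on_message(self) -> None:
        """Call from the subscriber callback each time a sample arrives."""
        self._last_rx = self._clock()

    def deadline_missed(self) -> bool:
        """True when more than period_s has passed since the last sample."""
        return (self._clock() - self._last_rx) > self._period_s
```

A safety node would call `on_message()` from its subscription callback and poll `deadline_missed()` from a timer that runs faster than the deadline itself, so the detection delay stays bounded.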

Transport and wire efficiency

Underneath the topology differences sit different wire protocols, and they shape behavior more than most teams expect. DDS rides RTPS, typically over UDP, which keeps the data path lean and lets the middleware implement its own reliability and fragmentation on top — excellent on a LAN, but it is also why DDS leans on multicast and why it can be unfriendly to firewalls and NAT that expect well-behaved TCP. Zenoh speaks its own compact protocol that runs over TCP and UDP and works comfortably over QUIC, which means it can multiplex many logical streams over a single connection and traverse routed, NAT’d networks without the contortions DDS requires. That flexibility is a quiet but real advantage when your robots live behind cellular routers or cloud VPC boundaries. MQTT runs over plain TCP, optionally with TLS or WebSockets, which makes it trivially routable and firewall-friendly — part of why it spread so far in IoT — at the cost of TCP’s head-of-line blocking when a link degrades.

Payload efficiency matters too. Zenoh was explicitly designed for minimal wire overhead so it can serve constrained devices and bandwidth-limited links, which is one reason it performs well where DDS’s chattier protocol struggles. DDS carries richer per-message metadata to support its QoS machinery, which is a fair trade on a fast LAN but pure overhead on a thin WAN pipe. MQTT’s per-message framing is famously tiny, which is exactly what you want when a robot is dribbling telemetry over an expensive cellular plan. None of this decides the matter alone, but it explains why the same three contenders keep landing in the same three roles.
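To put a rough number on "tiny" for MQTT, the sketch below estimates the on-the-wire size of a single PUBLISH packet from the packet layout in the MQTT specification: a one-byte fixed header, a variable-length "remaining length" field, a length-prefixed topic name, a packet identifier when QoS is above 0, and for MQTT 5 a property-length byte. It assumes no properties and no topic alias, ignores TCP/TLS overhead, and the topic name used in the example is hypothetical.

```python
def remaining_length_bytes(n: int) -> int:
    """Bytes needed to encode n as an MQTT variable byte integer (1 to 4)."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("outside the MQTT variable byte integer range")
    size = 1
    while n > 127:  # each encoded byte carries 7 bits of the value
        n //= 128
        size += 1
    return size


def mqtt_publish_size(topic: str, payload_len: int,
                      qos: int = 0, mqtt5: bool = True) -> int:
    """Estimated size in bytes of one PUBLISH packet, excluding TCP/TLS."""
    if qos not in (0, 1, 2):
        raise ValueError("qos must be 0, 1, or 2")
    variable_header = 2 + len(topic.encode("utf-8"))  # length prefix + topic
    if qos > 0:
        variable_header += 2  # packet identifier
    if mqtt5:
        variable_header += 1  # property length byte, assuming no properties
    remaining = variable_header + payload_len
    return 1 + remaining_length_bytes(remaining) + remaining


if __name__ == "__main__":
    # Hypothetical battery reading: 4-byte payload on a 10-character topic.
    size = mqtt_publish_size("amr/7/batt", payload_len=4)
    print(f"{size} bytes on the wire, {size - 4} of them framing")
```

Notice that the topic string dominates the framing for small payloads, which is why short topic names matter on metered links.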

Edge and WAN fit: the dimension that breaks ties

In a controlled lab, all three can be made to work and the choice feels like splitting hairs. The moment your data leaves a single trusted LAN, the field separates sharply. DDS is the weakest here: its multicast-first discovery and chatty peer traffic assume a network it rarely gets in the field, and taming it for a WAN means layering on discovery servers, transport plugins, and static configuration that quickly becomes its own maintenance burden. MQTT is strong over the WAN by birthright — it was designed for exactly the intermittent, high-latency satellite and cellular links that early telemetry ran on, with features like persistent sessions and last-will messages that assume the connection will drop. Zenoh is purpose-built for the routed, lossy, multi-segment world and behaves well over QUIC and TCP across links that would make raw DDS stumble, while still preserving a real-time-capable data path. For robotics programs that span more than one network segment — which in 2026 is most of them — this row is frequently the one that decides the whole question.

When to choose which

The matrix tells you the properties; this tree tells you the decision. Most teams do not pick one middleware for everything — they pick one for the onboard real-time domain and another for the fleet-to-cloud domain, then bridge.

Walk it top-down:

Not a ROS 2 system, telemetry only? Reach for MQTT. If you are shipping sensor readings, GPS, battery state, and health metrics to a dashboard, the broker model is the path of least resistance, and the broader IoT tooling ecosystem is enormous.
A ROS 2 system that crosses a WAN or rides lossy links? Choose Zenoh. Multi-robot mesh deployments, cellular-connected AMRs, and anything routed across subnets are exactly what Zenoh’s router-assisted, multicast-free design was built for.
Hard real-time on a single trusted LAN? Stay with DDS. If everything lives on one wired segment and you need the richest QoS and lowest jitter, the default ROS 2 stack is hard to beat.
Large fleet or very high node count? Lean Zenoh even on a LAN, because DDS discovery cost grows quickly with participant count, while a Zenoh router keeps that overhead bounded.
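The four questions above can be written down as a small function, which is handy for recording why a given fleet segment ended up on a given middleware. This is a sketch of one reading of the tree: the field names are illustrative, and letting the large-fleet question override the LAN default is an interpretation of "lean Zenoh even on a LAN", not a rule stated elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    """Answers to the decision-tree questions for one communication domain."""
    uses_ros2: bool
    crosses_wan_or_lossy_links: bool
    large_fleet_or_high_node_count: bool = False


def choose_middleware(d: Deployment) -> str:
    """Walk the tree top-down and return 'MQTT', 'Zenoh', or 'DDS'."""
    if not d.uses_ros2:
        return "MQTT"   # non-ROS 2, telemetry-only branch
    if d.crosses_wan_or_lossy_links:
        return "Zenoh"  # routed, cellular, or multi-subnet ROS 2
    if d.large_fleet_or_high_node_count:
        return "Zenoh"  # discovery cost dominates even on a LAN
    return "DDS"        # single trusted LAN, tightest loops
```

In a mixed fleet you would call this once per domain (onboard, site, cloud) rather than once per program.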

A worked example makes the tree concrete. Picture a fleet of warehouse AMRs. Onboard, each robot’s perception-control loop is hard real-time and lives entirely on the robot’s internal network — that domain wants DDS or a Zenoh peer for its determinism. Between robots and the site server, the link is ordinary facility Wi-Fi that drops and roams as robots move between access points — that domain wants Zenoh, whose router-assisted, multicast-free model tolerates the churn. From the site to the cloud dashboard and digital twin, the data is aggregated telemetry over the public internet — that domain wants MQTT. One fleet, three answers, each correct for its segment. The mistake is trying to force a single middleware to serve all three, which is how you end up either with DDS discovery storms over Wi-Fi or with an MQTT broker wedged into a control loop where it does not belong.

For an applied example of the onboard side of this decision, our walkthrough of Nav2 warehouse navigation for autonomous mobile robots shows where deterministic local messaging actually matters in a perception-to-planning pipeline.

Trade-offs and gotchas

No middleware is free. Here is the honest list of where each one will bite you.

DDS gotchas. The strength — fully distributed discovery — is also the liability. Multicast assumptions break on enterprise, cloud, and cellular networks. Large ROS 2 graphs can generate discovery traffic that scales superlinearly with node count, and tuning DDS to behave over a WAN (static peer lists, discovery servers, transport overrides) is doable but turns into real operational work. Fast DDS and Cyclone DDS also differ in defaults and behavior, so “DDS” is not one thing — your XML profiles matter.

Zenoh gotchas. The default rmw_zenoh topology requires a Zenoh router to be running for discovery, which is a new operational component to deploy, monitor, and make highly available. Zenoh is younger than DDS in the robotics context, so the body of field-hardened deployment patterns, third-party tooling, and Stack Overflow answers is still growing. Its QoS model is intentionally simpler than DDS, so if you depend on advanced policies like deadline or liveliness, verify they map to what you need before committing.

MQTT gotchas. The broker is a single logical point that every message traverses — great for fan-out telemetry, wrong for low-latency peer-to-peer control. MQTT has no native concept of the typed, discoverable topic graph that ROS 2 expects, so using it inside a robot means bridging, and bridges add latency, translation logic, and a failure surface. QoS 2 buys exactly-once delivery but, as fleet operators have measured, applying it to everything can multiply network overhead and cut throughput — match the QoS level to the data, not the other way round.

A cross-cutting trap: mixing them carelessly. A DDS-to-MQTT or Zenoh-to-MQTT bridge is the standard pattern, but every bridge is a place where QoS semantics, message rates, and back-pressure must be reconciled. A topic published at 200 Hz on a DDS side does not belong on an MQTT topic at the same rate unless the cloud truly needs it; the bridge is where you down-sample, batch, and decide which QoS level each forwarded stream deserves. Treat the bridge as a first-class component with its own monitoring, not as glue.
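Down-sampling at the bridge can be as simple as a per-topic rate limiter. The sketch below is one hypothetical way to do it, not taken from any particular bridge implementation: it uses integer nanoseconds to avoid floating-point drift, drops everything that arrives before the next allowed slot, and takes an injectable clock so it can be exercised without real time passing.

```python
import time
from typing import Callable


class RateLimiter:
    """Forward at most max_hz samples per second; drop the rest."""

    def __init__(self, max_hz: float,
                 clock_ns: Callable[[], int] = time.monotonic_ns) -> None:
        if max_hz <= 0:
            raise ValueError("max_hz must be positive")
        self._min_interval_ns = round(1_000_000_000 / max_hz)
        self._clock_ns = clock_ns
        self._next_allowed_ns = 0

    def should_forward(self) -> bool:
        """True if the sample arriving now may cross the bridge."""
        now = self._clock_ns()
        if now < self._next_allowed_ns:
            return False
        self._next_allowed_ns = now + self._min_interval_ns
        return True


if __name__ == "__main__":
    # Simulate one second of a 200 Hz topic forwarded at 10 Hz.
    fake_now = [0]
    limiter = RateLimiter(max_hz=10, clock_ns=lambda: fake_now[0])
    forwarded = 0
    for i in range(200):
        fake_now[0] = i * 5_000_000  # one sample every 5 ms
        forwarded += limiter.should_forward()
    print(forwarded)  # 10 of the 200 samples are forwarded
```

Dropping is only one policy. For streams where every sample matters, the same hook is where you would batch or aggregate instead.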

One more gotcha that catches teams late: version and profile drift. Because Fast DDS and Cyclone DDS ship different defaults, a system that behaves perfectly with one can misbehave with the other even though both claim DDS compliance. Zenoh, being younger, moves faster between releases, so a configuration validated on one rmw_zenoh version may need revisiting after an upgrade. And MQTT brokers vary in how they implement MQTT 5 features like shared subscriptions and message expiry. The practical defense is the same in all three cases: pin your versions, document the exact combination of ROS 2 distro, RMW, and broker or DDS vendor you validated against, and re-test deliberately when any of them changes rather than assuming compatibility.
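One lightweight way to make that pinning checkable is to keep the validated combination as data and compare it against what a robot reports at startup. The sketch below is an assumption about how such a record might look; the field names and every version string in it are placeholders, not real release numbers.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ValidatedStack:
    """The combination a release was tested against (hypothetical schema)."""
    ros_distro: str
    rmw: str
    rmw_version: str
    broker: str
    broker_version: str


def drift(validated: ValidatedStack,
          running: ValidatedStack) -> dict[str, tuple[str, str]]:
    """Return {field: (validated, running)} for every field that differs."""
    v, r = asdict(validated), asdict(running)
    return {key: (v[key], r[key]) for key in v if v[key] != r[key]}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    pinned = ValidatedStack("distro-a", "rmw_zenoh_cpp", "0.0.1", "broker-x", "0.0.1")
    actual = ValidatedStack("distro-a", "rmw_zenoh_cpp", "0.0.2", "broker-x", "0.0.1")
    for field, (want, got) in drift(pinned, actual).items():
        print(f"re-test needed: {field} validated={want} running={got}")
```

An empty result means the running stack matches what was validated. Anything else is a prompt to re-run the QoS-sensitive tests before rollout.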

Practical recommendations and checklist

For most 2026 robotics programs, the pragmatic architecture is layered: a real-time middleware onboard and at the site edge, MQTT for the long-haul telemetry to the cloud, and a single well-instrumented bridge between them.

In that topology, robots run DDS or Zenoh peers locally for control and perception, a site-level Zenoh router handles cross-link discovery and store-and-forward over the WAN, and an MQTT broker in the cloud feeds the fleet dashboard and the digital twin. Each layer plays to its strengths and nothing is asked to do a job it was not designed for. The onboard domain keeps its deterministic, low-jitter messaging because it never has to traverse the WAN; the site router absorbs the lossy long-haul link and shields the robots from its instability; and the cloud broker handles fan-out to as many dashboards, analytics jobs, and twin instances as you care to connect, none of which has any business reaching into a robot’s real-time control path.

A reasonable variant for ROS 2-native fleets is to run Zenoh end to end — rmw_zenoh onboard and Zenoh routers at the edge — which collapses one translation boundary and keeps a single QoS model from robot to cloud, bridging to MQTT only at the final cloud hop where the IoT tooling ecosystem lives. If your robots are not ROS 2 at all and you are purely streaming telemetry, you can skip the real-time layer entirely and let MQTT carry everything. The point of the layered model is not that every program needs all three middlewares — it is that you should consciously match each communication domain to the middleware designed for it, rather than stretching one choice across jobs it was never meant to do.

Use this checklist before you commit:

Map your network honestly. Single trusted LAN, or routed/cellular/multi-site? This one answer eliminates options fast.
Inventory your QoS needs. List the policies you truly depend on (deadline, liveliness, durability) and confirm the candidate supports them.
Count your nodes. Estimate participant count at scale and pressure-test discovery cost, not just steady-state throughput.
Decide your real-time boundary. Draw the line between hard real-time control and best-effort telemetry — they probably want different middleware.
Plan the bridge as a component. If you mix protocols, give the bridge an owner, monitoring, and back-pressure handling.
Benchmark on your own hardware and network. Treat all published latency numbers as directional; measure with your own payloads on your own links. See our IoT protocol latency benchmarks for MQTT, CoAP, AMQP, and HTTP/3 for how to structure a fair test.
Pin your versions. ROS 2 distro, RMW version, and DDS/Zenoh release all interact — lock them and document the combination.
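For the benchmarking item, the part that most often goes wrong is reporting a mean instead of tail latency. The sketch below shows a transport-agnostic harness under stated assumptions: `send_and_wait` is a hypothetical callable you supply that publishes one message and blocks until the echo returns, and percentiles use the simple nearest-rank method.

```python
import math
import time
from typing import Callable, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of samples, for 0 < pct <= 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def measure_round_trips(send_and_wait: Callable[[], None],
                        count: int) -> list[float]:
    """Time count blocking round trips and return the latencies in seconds."""
    latencies = []
    for _ in range(count):
        start = time.perf_counter()
        send_and_wait()  # user-supplied: publish, then block on the echo
        latencies.append(time.perf_counter() - start)
    return latencies
```

Run the same harness against each candidate with identical payload sizes and on the real link, then compare `percentile(latencies, 50)` against `percentile(latencies, 99)`. A large gap between the two is usually more informative than either number alone.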

Operational cost and the migration path

Choosing a middleware is also choosing what you will operate at 2 a.m. when something breaks, so it is worth being concrete about the total cost of ownership beyond raw performance.

DDS is cheapest to operate when your world is a single trusted LAN: there is no broker and no router to run, robots simply find each other, and the only ongoing cost is keeping XML QoS profiles consistent. Its cost curve bends upward the moment the network gets complicated — every WAN segment, every multicast-hostile switch, and every jump in node count adds tuning work, and that work tends to live in the heads of a few specialists. Budget for that expertise if DDS is your long-haul backbone.

Zenoh introduces one new operational object: the router. That is a real cost: you must deploy it, monitor it, and ideally run it redundantly so a single router restart does not blind your fleet’s discovery. In exchange you get a system that behaves predictably across messy networks without per-deployment multicast archaeology, which for distributed fleets is usually a net reduction in operational pain rather than an increase. The smaller ecosystem is the less visible cost: fewer years of accumulated field wisdom and fewer ready-made answers, so expect to do more first-principles debugging.

MQTT is operationally familiar to anyone who has run IoT infrastructure: the broker is the system, and scaling, securing, and making it highly available are well-trodden problems with mature managed offerings. The cost is that the broker is also the bottleneck and the blast radius — every client depends on it, so its capacity planning and failover design are not optional.

On migration: the good news for ROS 2 teams is that the RMW abstraction makes switching between DDS and Zenoh largely a matter of changing one environment variable (RMW_IMPLEMENTATION) and standing up the new infrastructure, not rewriting nodes. That makes a staged migration realistic. A common path is to keep DDS on existing LAN-bound robots, pilot Zenoh on the next cohort that has to cross a WAN, and run both in parallel during the transition. Because both expose the same ROS 2 topic graph to application code, the nodes neither know nor care which transport carries their data, which is exactly the portability the RMW layer was designed to give you. The one thing not to do is assume your carefully tuned DDS QoS profiles transfer unchanged; re-validate the QoS-sensitive paths on the new transport before you trust them in production.
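In ROS 2 the variable in question is `RMW_IMPLEMENTATION`. A launcher script might build a per-cohort environment along the lines of the sketch below. The short keys in `KNOWN_RMWS` and the helper name are illustrative; the values are the RMW package names as commonly documented, and you should confirm them against the distro you actually run.

```python
import os
from typing import Mapping, Optional

# Short, made-up aliases mapped to RMW implementation identifiers.
KNOWN_RMWS = {
    "fastdds": "rmw_fastrtps_cpp",
    "cyclonedds": "rmw_cyclonedds_cpp",
    "zenoh": "rmw_zenoh_cpp",
}


def env_for_rmw(choice: str,
                base: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Return a copy of the environment with RMW_IMPLEMENTATION set."""
    if choice not in KNOWN_RMWS:
        raise ValueError(f"unknown RMW alias: {choice!r}")
    env = dict(os.environ if base is None else base)
    env["RMW_IMPLEMENTATION"] = KNOWN_RMWS[choice]
    return env
```

The returned dictionary can be passed as `env=` to `subprocess.run` when starting a node. Keep in mind that for the Zenoh cohort the variable is only half the switch: with the default configuration a Zenoh router also has to be running before nodes can discover each other.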

FAQ

Is Zenoh replacing DDS as the default ROS 2 middleware?
Not wholesale, at least not yet. Fast DDS remains the default RMW, but Zenoh has reached Tier-1 RMW status via rmw_zenoh and ships with recent ROS 2 distributions. Many teams now choose Zenoh deliberately for routed, multi-robot, or WAN-heavy deployments where DDS discovery struggles, while keeping DDS for tightly-coupled LAN systems.

Can I use MQTT directly inside ROS 2?
Not as a drop-in RMW. MQTT is a broker-based IoT protocol, not a ROS middleware, so it lacks ROS 2’s typed, discoverable topic graph. Teams use it via a bridge — running DDS or Zenoh onboard and forwarding selected topics to an MQTT broker for cloud telemetry and fleet monitoring.

Which has the lowest latency for robot control?
On a clean, trusted LAN, DDS generally delivers the lowest and most deterministic latency, which is why it remains the benchmark for hard control loops. Zenoh measures in the same class with markedly lower discovery overhead. MQTT adds a broker hop and is better suited to telemetry than to tight control loops.

Why does DDS discovery fail on some networks?
DDS discovery defaults to multicast UDP, which many enterprise, cloud, and cellular networks block, rate-limit, or segment. It also scales poorly as participant count grows. Zenoh avoids this by using router-assisted gossip scouting with multicast disabled by default.

Do I have to run a Zenoh router?
With the default rmw_zenoh configuration, yes — nodes rely on a Zenoh router for discovery and cross-segment forwarding. It is a lightweight component, but it is an operational element you must deploy, monitor, and make resilient, which is a real difference from DDS’s router-free model.

Can I run more than one of these in the same robot fleet?
Yes, and most production fleets do. The common pattern is a real-time middleware (DDS or Zenoh) for onboard and edge messaging plus MQTT for cloud telemetry, joined by a bridge. The key discipline is treating each protocol boundary as a deliberate component — deciding what crosses it, at what rate, and with what QoS — rather than letting data leak between domains by accident.
