Eclipse Ditto Tutorial: Build a Digital Twin Backend (2026)

Most digital twin projects die in the same place: a device publishes telemetry to a broker, and then someone has to invent a database schema, a REST API, a change-notification fan-out, and an access-control model from scratch — for every twin. This Eclipse Ditto tutorial shows you the alternative. Ditto is an open-source framework, governed by the Eclipse Foundation, that gives you the twin backend as a running service: a state store, a REST and WebSocket API, MQTT and Kafka connectors, fine-grained authorization, and a search index, all keyed off one JSON model of your device. You bring devices and a consumer app; Ditto holds the authoritative twin in between.

By the end you will have stood up Ditto with Docker Compose, modelled a pump as a thing with features, ingested live MQTT telemetry through a connection with a payload mapper, queried the twin over HTTP, and streamed change events into a browser.

What this covers: the things-and-features model, the Ditto Protocol, the HTTP API, MQTT connectivity and payload mapping, WebSocket and SSE change events, and policy-based authorization — with runnable config at each step.

Context and Background

A digital twin backend has one core job: hold the current, authoritative state of a physical asset and make that state queryable, modifiable, and observable by other systems. The hard part is never storing a number — it is doing so with structure, access control, and real-time fan-out at the scale of thousands or millions of devices.

It helps to be precise about what “digital twin” means here, because the term is overloaded. In some communities a twin is a physics-based simulation; in others it is a 3D visualization; in the data-plane sense Ditto serves, a twin is the live, addressable, access-controlled software representation of a device’s state and capabilities. These layers compose rather than compete: a simulation or a 3D scene needs a source of current state to be useful, and that source is exactly what Ditto provides. Keeping the layers distinct is what lets each evolve independently — you can swap your visualization engine without touching your state backend, and vice versa. This tutorial is squarely about the state-and-API layer.

The market splits into two camps. Cloud-native twin platforms — Azure Digital Twins, AWS IoT TwinMaker — bind you to one provider’s identity, billing, and data plane. On the other side sit roll-your-own stacks built on a time-series database plus a hand-written API, which start fast and rot under the weight of bespoke authorization and notification code. Standards bodies tackle the modelling layer differently again: the Asset Administration Shell architecture defines an interoperable Industry 4.0 metamodel, while Azure’s stack leans on the Digital Twins Definition Language.

Eclipse Ditto occupies a deliberate middle ground. It is vendor-neutral, self-hostable, and opinionated about exactly one thing: the twin’s state and API, not its 3D visualization or simulation. Ditto reached version 3.0 in 2022; the current line is 3.9, with 3.9.0 announced in May 2026 and built on Java 25, per the official release notes. It speaks Web of Things (WoT) for model definitions and integrates cleanly with a broker-centric ingestion layer such as a unified namespace. That focus is why it slots neatly behind an MQTT broker rather than trying to replace it.

The framing that helps most is this: Ditto is a system of record for current state, not a message bus and not a historian. A message bus (MQTT, Kafka) moves events but forgets them; a historian (InfluxDB, TimescaleDB) remembers everything but answers “what was the value at 14:03” rather than “what is the value now, and who is allowed to see it.” Ditto answers the second question with a queryable, access-controlled, real-time document per asset. When teams try to make Ditto play all three roles they fight the design; when they let it own current state and delegate transport and history elsewhere, it fits cleanly into an existing stack. That separation of concerns is the through-line for every decision in this tutorial.

It is worth dwelling on why that “current state” niche is the one worth owning, because it is the piece every twin project needs and almost nobody enjoys building. Transport you can buy off the shelf — a broker is a solved problem. History you can buy off the shelf — a time-series database is a solved problem. What sits between them, and what gets reinvented badly on every project, is the authoritative, addressable, access-controlled now: a representation that answers a query in milliseconds, that enforces who may see which field, that notifies interested parties the instant it changes, and that survives the device going to sleep. That layer is genuinely hard to build well because it must be simultaneously fast to read, safe to expose, and consistent under concurrent writes — and it is exactly the layer Ditto productizes. Recognizing that this is the scarce, expensive piece is what makes the “let the broker move it, let the historian remember it, let Ditto own the now” division of labor feel obvious rather than arbitrary.

Eclipse Ditto Architecture and Core Concepts

Eclipse Ditto is a set of microservices that together expose your devices as JSON digital twins. The Things service holds twin state in MongoDB; Policies enforces access; Things-Search builds a queryable index; Connectivity bridges external protocols like MQTT and Kafka; and the Gateway terminates the HTTP, WebSocket, and SSE APIs. You interact with twins, never with the services directly.

Figure 1: Ditto’s service topology. Devices publish to a broker; the Connectivity service maps payloads into Ditto Protocol commands; the Things service persists state to MongoDB and feeds Things-Search; the Gateway serves REST, WebSocket, and SSE to consumer apps, with Policies enforcing authorization across all paths.

The diagram shows the request flow you will exercise throughout this tutorial. Telemetry enters from the left through the Connectivity service, lands in the Things service, and is persisted to MongoDB while simultaneously indexed by Things-Search. On the right, consumer applications read and subscribe through the Gateway. Every read and every write is checked against the relevant policy. Nothing reaches MongoDB without passing authorization first.

The separation of services is not cosmetic; it is what lets Ditto scale each concern independently. The Things service is the write-heavy core and is sharded by thing ID, so a fleet of millions of twins spreads across instances. Things-Search maintains a denormalized projection in its own MongoDB collections, updated asynchronously from twin events, which is why search is eventually consistent: a twin you just wrote may take a beat to appear in query results. Connectivity is horizontally scalable per connection and is the only service that holds a socket open to your broker. The Gateway is stateless and sits behind your load balancer. Because each tier scales on its own axis, a read-heavy dashboard workload and a write-heavy ingestion workload do not contend for the same resources — a property you lose the moment you collapse all of this into a single hand-rolled API process.

Under the hood, the reason Ditto can shard the Things service cleanly is that it is built on an actor model: each thing is, in effect, a small stateful actor addressed by its ID, and the cluster routes every command for org.acme:pump-01 to the one actor instance that owns that thing. Two consequences follow that matter in practice. First, writes to a single thing are serialized through its owning actor, which is what gives you a coherent per-thing revision sequence and makes the optimistic-locking story in Step 4 work without a distributed lock service. Second, hot things — a single twin receiving an enormous write rate — are a real bottleneck, because all of that traffic funnels through one actor; the cure is to model state so that load spreads across many things rather than piling onto one. Understanding that “a thing is an actor” demystifies both why Ditto scales horizontally with thing count and why it does not magically scale a single overloaded twin.

Things, features, attributes, and definitions

A thing is the unit of twinning in Ditto — one physical asset, one JSON document. Its identifier is a two-part namespace:name string, for example org.acme:pump-01. The namespace groups twins logically and is the unit that policies and search can scope to.
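To make the namespace:name convention concrete, here is a minimal sketch that splits an ID at the first colon and groups a fleet by namespace. The helper names parseThingId and groupByNamespace are illustrative only, and the check is deliberately looser than Ditto's real ID validation.

```javascript
// Split a thing ID of the form "namespace:name" at the first colon.
// Simplified on purpose: Ditto's own ID validation is stricter than this check.
function parseThingId(thingId) {
  const idx = thingId.indexOf(":");
  if (idx < 0 || idx === thingId.length - 1) {
    throw new Error(`not a namespace:name thing ID: ${thingId}`);
  }
  return { namespace: thingId.slice(0, idx), name: thingId.slice(idx + 1) };
}

// Group IDs by namespace, for example to see which namespaces a policy
// or a search query would have to cover.
function groupByNamespace(thingIds) {
  const groups = {};
  for (const id of thingIds) {
    const { namespace, name } = parseThingId(id);
    if (!groups[namespace]) groups[namespace] = [];
    groups[namespace].push(name);
  }
  return groups;
}
```

Calling groupByNamespace on a list of IDs gives you one array of names per namespace, which is a convenient shape when you decide how to scope policies or searches.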

Inside a thing, Ditto separates two kinds of data. Attributes hold static or slowly changing metadata: a serial number, an install location, a manufacturer. Features hold functional, dynamic data, grouped into named blocks. A pump might have a temperature feature, a vibration feature, and a flow feature. Each feature carries properties (the reported, current values) and optionally desiredProperties (the target values an operator wants the device to converge on — the basis for command-and-control). A thing can also carry a definition, a URL pointing at a WoT Thing Model that describes its features formally.
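The split between properties and desiredProperties is easier to see with data. The sketch below uses a thing shaped like the tutorial's pump and lists every desired value the device has not yet reported back. The definition URL, the flow feature's setpoint property, and the pendingDesired helper are assumptions made for illustration.

```javascript
// A thing shaped like the pump in this tutorial, extended with desiredProperties.
// The definition URL and the setpoint property are hypothetical.
const pump = {
  thingId: "org.acme:pump-01",
  definition: "https://example.com/models/pump-1.0.0.tm.jsonld",
  attributes: { manufacturer: "AcmePumps", serialNo: "AP-44910" },
  features: {
    temperature: { properties: { value: 41.2, unit: "Celsius" } },
    flow: {
      properties: { setpoint: 10 },        // what the device last reported
      desiredProperties: { setpoint: 12 }  // what an operator wants
    }
  }
};

// Return, per feature, the desired properties that differ from the reported ones.
function pendingDesired(thing) {
  const pending = {};
  for (const [featureId, feature] of Object.entries(thing.features || {})) {
    const reported = feature.properties || {};
    for (const [key, want] of Object.entries(feature.desiredProperties || {})) {
      if (JSON.stringify(reported[key]) !== JSON.stringify(want)) {
        if (!pending[featureId]) pending[featureId] = {};
        pending[featureId][key] = { reported: reported[key], desired: want };
      }
    }
  }
  return pending;
}
```

An empty result means the device has converged on everything the operator asked for; anything else is outstanding intent.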

That definition is more than documentation. When a thing references a Web of Things model, Ditto in the 3.x line can validate writes against it and skeleton-generate the feature structure, so a malformed property type is rejected at the API boundary rather than silently corrupting the twin. Each feature can additionally carry its own feature-level definition, a list of model identifiers that says “this block conforms to these interfaces” — the mechanism by which two pumps from different vendors can both expose a standard temperature interface and be queried uniformly. Ditto also attaches metadata to any node (last-modified timestamps, for instance) without polluting your value schema. The practical takeaway: model the asset’s capabilities as features with definitions, not as a flat bag of attributes, and your fleet stays queryable and interoperable as it grows.

Policies: authorization as data

Access control in Ditto is not bolted on; it is a first-class entity. A policy is a separate JSON document, referenced by a thing’s policyId, that lists subjects (authenticated identities) and grants or revokes them READ and WRITE permission on resources (paths within the thing, like thing:/features/temperature). Because authorization is data, you version it, audit it, and reuse one policy across many things.

The twin channel versus the live channel

Ditto distinguishes the twin from the device. The twin channel reads and writes Ditto’s persisted representation — fast, always available, no round-trip to a sleeping sensor. The live channel routes a command through to the actual device and is used when you genuinely need the physical value right now or want to issue an actuation command. Most reads hit the twin; most control actions use live. This split is the single most important mental model in Ditto.

Concretely, a twin read returns in single-digit milliseconds because it is a MongoDB lookup; a live read can take as long as the device takes to wake, respond, and round-trip the network, and it can time out if the device is offline. The two channels are also reflected directly in the Ditto Protocol topic path, where the fourth segment is literally twin or live. Choosing the wrong channel is a common early mistake: querying the live channel for a dashboard that refreshes every second will both hammer your devices and stall on any that are asleep. The rule of thumb is to read the twin for observation, write the twin’s desiredProperties to express intent, and use the live channel only when you must touch the physical device in the moment.

Hands-On Walk-Through: From Compose to Change Events

This is the runnable core of the tutorial. We will start Ditto, create a policy and a thing, push telemetry both directly and through MQTT, then watch changes stream out.

Step 1 — Run Ditto with Docker Compose

The fastest path is the official Compose stack. The snippet below is a trimmed version of it, sufficient for local development; it still expects an nginx.conf next to the Compose file, which is not shown here. In production you would split the services and externalize MongoDB.

# docker-compose.yml
version: "3.7"
services:
  mongodb:
    image: docker.io/mongo:7.0
    command: mongod --storageEngine wiredTiger
    ports:
      - "27017:27017"

  policies:
    image: docker.io/eclipse/ditto-policies:3.9.2
    environment:
      - MONGO_DB_HOSTNAME=mongodb
    depends_on: [mongodb]

  things:
    image: docker.io/eclipse/ditto-things:3.9.2
    environment:
      - MONGO_DB_HOSTNAME=mongodb
    depends_on: [mongodb]

  things-search:
    image: docker.io/eclipse/ditto-things-search:3.9.2
    environment:
      - MONGO_DB_HOSTNAME=mongodb
    depends_on: [mongodb]

  connectivity:
    image: docker.io/eclipse/ditto-connectivity:3.9.2
    environment:
      - MONGO_DB_HOSTNAME=mongodb
    depends_on: [mongodb]

  gateway:
    image: docker.io/eclipse/ditto-gateway:3.9.2
    environment:
      - ENABLE_PRE_AUTHENTICATION=true
    ports:
      - "8080:8080"
    depends_on: [policies, things, things-search, connectivity]

  nginx:
    image: docker.io/nginx:1.27-alpine
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "8081:80"
    depends_on: [gateway]

Setting ENABLE_PRE_AUTHENTICATION=true (the flag that replaced ENABLE_DUMMY_AUTH from Ditto 1.x) lets a caller authenticate by simply asserting a subject in the x-ditto-pre-authenticated header. That is convenient for a tutorial, and something you must never ship. We will use the pre-authenticated subject nginx:ditto below. Bring the stack up:

docker compose up -d
# wait for the gateway to report healthy
curl -s http://localhost:8080/health | jq .status

For real deployments, prefer the maintained Ditto Helm chart on Kubernetes and wire OAuth 2.0 / OpenID Connect for authentication instead of dummy auth.

Step 2 — Create a policy

Because every thing must reference a policy, and a policy is only valid if at least one subject holds WRITE on policy:/, we create the policy first. The HTTP API is versioned in the path as /api/2; version 1 was removed back in Ditto 2.0, so API 2 is the only option today.

curl -X PUT \
  http://localhost:8080/api/2/policies/org.acme:pump-policy \
  -u ditto:ditto \
  -H 'Content-Type: application/json' \
  -d '{
    "entries": {
      "owner": {
        "subjects": {
          "nginx:ditto": { "type": "dummy auth user" }
        },
        "resources": {
          "thing:/":   { "grant": ["READ", "WRITE"], "revoke": [] },
          "policy:/":  { "grant": ["READ", "WRITE"], "revoke": [] },
          "message:/": { "grant": ["READ", "WRITE"], "revoke": [] }
        }
      },
      "observer": {
        "subjects": {
          "nginx:dashboard": { "type": "read-only dashboard" }
        },
        "resources": {
          "thing:/features/temperature": { "grant": ["READ"], "revoke": [] }
        }
      }
    }
  }'

Note the observer entry: it grants the dashboard subject read access to only the temperature feature. That scoping is exactly the policy mechanism Figure 4 walks through later.

Step 3 — Model the thing

Now create the twin. The JSON structure of a thing in API 2 is fixed: thingId, policyId, an optional definition, then attributes and features.

curl -X PUT \
  http://localhost:8080/api/2/things/org.acme:pump-01 \
  -u ditto:ditto \
  -H 'Content-Type: application/json' \
  -d '{
    "policyId": "org.acme:pump-policy",
    "attributes": {
      "manufacturer": "AcmePumps",
      "serialNo": "AP-44910",
      "location": "Plant-3 / Line-B"
    },
    "features": {
      "temperature": {
        "properties": { "value": 41.2, "unit": "Celsius" }
      },
      "vibration": {
        "properties": { "value": 0.8, "unit": "mm/s" }
      }
    }
  }'

Figure 2: The anatomy of a Ditto thing. Static metadata lives under attributes; functional blocks live under features, each with reported properties and optional desiredProperties. The policyId links access control and the definition links a Web of Things model.

The model in Figure 2 is what makes Ditto’s API ergonomic. Because the thing is a JSON tree, every node has an address. You can read or write the whole thing, a single feature, or one property — the URL path mirrors the JSON path exactly.

Step 4 — Read and partially update twin state

Read the whole twin, then drill into one property:

# whole thing
curl -u ditto:ditto http://localhost:8080/api/2/things/org.acme:pump-01

# just the temperature value
curl -u ditto:ditto \
  http://localhost:8080/api/2/things/org.acme:pump-01/features/temperature/properties/value

To update only the temperature without resending the entire document, use a PATCH with application/merge-patch+json. Ditto applies JSON Merge Patch (RFC 7396) semantics, and supports ETag-based conditional requests for optimistic locking:

curl -X PATCH \
  http://localhost:8080/api/2/things/org.acme:pump-01/features/temperature/properties \
  -u ditto:ditto \
  -H 'Content-Type: application/merge-patch+json' \
  -d '{ "value": 47.9 }'

That single PATCH triggers a merged event internally — the same event your consumers will subscribe to in Step 7.
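The merge semantics behind that PATCH can be sketched in a few lines. This is a plain implementation of the JSON Merge Patch rules from RFC 7396, written for illustration; it is not Ditto's code, and the variable names are arbitrary.

```javascript
// JSON Merge Patch as described in RFC 7396: objects merge recursively,
// null deletes a key, and any non-object patch replaces the target outright.
function mergePatch(target, patch) {
  const isObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);
  if (!isObject(patch)) return patch;
  const result = isObject(target) ? { ...target } : {};
  for (const [key, value] of Object.entries(patch)) {
    if (value === null) {
      delete result[key];                           // null means "remove this key"
    } else {
      result[key] = mergePatch(result[key], value); // recurse into nested objects
    }
  }
  return result;
}

const before = { value: 41.2, unit: "Celsius" };
const merged = mergePatch(before, { value: 47.9 }); // partial update keeps "unit"
const replaced = { value: 47.9 };                   // a full PUT of the same body would drop "unit"
```

The difference between merged and replaced is the whole argument for PATCH on partial updates: the merge keeps the unit, while a replacement with the same body loses it.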

Two HTTP features make this write path safe under concurrency. Every twin response carries an ETag header reflecting the current revision; pass it back as If-Match and Ditto rejects the write with 412 Precondition Failed if another writer changed the twin in between. That gives you optimistic locking without a separate lock service. Ditto also supports a condition query parameter holding an RQL expression, so you can express “only apply this write if the recorded temperature is still below 50” directly on the request — useful for idempotent edge logic that must not overwrite a newer reading. Both mechanisms let many writers share one twin without a coordinator, which matters when a device, an operator UI, and an automation rule all touch the same asset.
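Both guards can travel on the same request. The following sketch only assembles the URL and the fetch() options for such a write and does not send anything; buildGuardedPatch, the base URL, and the example ETag value are assumptions for illustration, and authentication headers are left out.

```javascript
// Build the URL and fetch() options for a guarded partial update.
// The helper name, baseUrl, and example values are hypothetical.
function buildGuardedPatch(baseUrl, thingId, path, body, guards = {}) {
  const url = new URL(`${baseUrl}/api/2/things/${thingId}${path}`);
  if (guards.condition) {
    // RQL expression that must hold for the write to be applied
    url.searchParams.set("condition", guards.condition);
  }
  const headers = { "Content-Type": "application/merge-patch+json" };
  if (guards.etag) {
    // optimistic lock: only apply if the twin is still at this revision
    headers["If-Match"] = guards.etag;
  }
  return {
    url: url.toString(),
    options: { method: "PATCH", headers, body: JSON.stringify(body) }
  };
}

const request = buildGuardedPatch(
  "http://localhost:8080",
  "org.acme:pump-01",
  "/features/temperature/properties",
  { value: 47.9 },
  { etag: '"rev:7"', condition: "lt(features/temperature/properties/value,50)" }
);
```

You would pass request.url and request.options to fetch() and treat a 412 response as "someone else got there first": re-read the twin, then decide whether the write still makes sense.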

Step 5 — Ingest telemetry over MQTT

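Direct HTTP writes are fine for control planes, but real devices speak MQTT. Ditto’s Connectivity service consumes from a broker via a connection with sources (inbound topics) and optionally targets (outbound topics). You create a connection through the connections API, authenticated as the devops user, or by piggybacking a DevOps command. A POST lets Ditto generate the connection ID and return it in the response body; note it down, because the status and metrics endpoints address a connection by that ID rather than by its name. The example uses the public test.mosquitto.org broker, so anything published there is world-readable; use it for throwaway data only. Here is a connection that subscribes to a Mosquitto broker: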

curl -X POST \
  http://localhost:8080/api/2/connections \
  -u devops:foobar \
  -H 'Content-Type: application/json' \
  -d '{
    "name": "mqtt-pump-ingest",
    "connectionType": "mqtt",
    "connectionStatus": "open",
    "failoverEnabled": true,
    "uri": "tcp://test.mosquitto.org:1883",
    "sources": [{
      "addresses": ["acme/+/telemetry"],
      "authorizationContext": ["nginx:ditto"],
      "qos": 1,
      "payloadMapping": ["pumpMapper"]
    }],
    "mappingDefinitions": {
      "pumpMapper": {
        "mappingEngine": "JavaScript",
        "options": {
          "incomingScript": "function mapToDittoProtocolMsg(headers, textPayload) { var d = JSON.parse(textPayload); var topic = \"org.acme/\" + d.id + \"/things/twin/commands/merge\"; var value = { temperature: { properties: { value: d.tempC } }, vibration: { properties: { value: d.vibMm } } }; return Ditto.buildDittoProtocolMsg(\"org.acme\", d.id, \"things\", \"twin\", \"commands\", \"merge\", \"/features\", headers, value); }"
        }
      }
    }
  }'

Two things deserve attention. First, the source address acme/+/telemetry uses an MQTT single-level wildcard so one connection serves every pump. Second, the device does not have to speak Ditto Protocol. The JavaScript payload mapper converts an arbitrary JSON telemetry message into a Ditto merge command targeting /features, deriving the thing ID from the payload. This is the standard pattern: keep devices dumb, normalize at the edge of Ditto.

The mapper is the part most worth understanding deeply, because it is where the real-world mess of device firmware meets Ditto’s clean model. The incomingScript runs once per received message inside Ditto’s Connectivity service. Its job is to return one or more Ditto Protocol messages built by the Ditto.buildDittoProtocolMsg(...) helper, whose arguments mirror the topic path: namespace, name, group, channel, criterion, action, then the JSON pointer path, the headers, and the value. Choosing merge rather than modify matters: merge applies a partial update and leaves untouched properties alone, so a telemetry message that carries only temperature does not wipe out the vibration value, whereas a modify on /features would replace the entire features object. For binary or non-JSON payloads (CBOR, Protobuf, a packed sensor frame) the same hook receives a bytePayload argument that you decode in JavaScript before building the message. You can also return null or an empty array to deliberately drop a message, which is the clean way to filter heartbeats or malformed frames without raising an error. Keep mappers small and pure; they run on the hot path for every inbound message, and a slow mapper throttles your entire ingestion rate.

There is a symmetric outgoingScript for targets, which converts Ditto events back into whatever wire format a downstream system expects before publishing them to an MQTT target topic. That is how you fan twin changes back out to other brokers, gateways, or legacy systems without coupling them to the Ditto Protocol.
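As a rough sketch of the outgoing direction, the mapper below turns a feature-level twin event back into the flat device-style JSON used in this tutorial. The Ditto object defined at the top is only a local stand-in so the function can run outside Ditto; inside the Connectivity service the real helper is provided for you. The output field names id, tempC, and vibMm simply mirror the inbound payload and are an assumption about what a downstream system wants.

```javascript
// Local stand-in for the helper available inside Ditto's mapping sandbox.
// It only echoes its arguments so the mapper can be exercised offline.
var Ditto = {
  buildExternalMsg: function (headers, textPayload, bytePayload, contentType) {
    return { headers: headers, textPayload: textPayload, bytePayload: bytePayload, contentType: contentType };
  }
};

// Outgoing mapper: forward feature-level twin events as flat JSON, drop everything else.
function mapFromDittoProtocolMsg(namespace, name, group, channel, criterion, action,
                                 path, dittoHeaders, value, status, extra) {
  if (criterion !== "events" || path !== "/features" || !value) {
    return null; // nothing to publish for this message
  }
  var out = { id: name };
  if (value.temperature && value.temperature.properties) {
    out.tempC = value.temperature.properties.value;
  }
  if (value.vibration && value.vibration.properties) {
    out.vibMm = value.vibration.properties.value;
  }
  return Ditto.buildExternalMsg(dittoHeaders, JSON.stringify(out), null, "application/json");
}
```

The function is written in conservative ES5 style on purpose, since mapping scripts run in an embedded JavaScript engine rather than in a current Node or browser runtime.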

Now publish a telemetry message as a device would. The QoS-1 publish carries plain device JSON, not Ditto JSON:

mosquitto_pub -h test.mosquitto.org -t 'acme/pump-01/telemetry' -q 1 \
  -m '{ "id": "pump-01", "tempC": 52.4, "vibMm": 1.3 }'

Within a moment, a GET on the twin shows the merged values — the mapper translated the device payload into a twin update without any code on the device side. If the mapper throws, Ditto logs a mapping error and drops the message, so test scripts against the connectivity mapping docs before going live.
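One way to test a mapper before it goes live is to run it locally against representative payloads. The sketch below is an unminified variant of the Step 5 script with two guards added, plus a stand-in for Ditto.buildDittoProtocolMsg that only mirrors the argument order described above. The stand-in and the guard rules (what counts as a heartbeat) are assumptions for illustration, not Ditto behavior.

```javascript
// Local stand-in for the helper Ditto provides inside the mapping sandbox.
// It joins the six topic segments and echoes the rest, nothing more.
var Ditto = {
  buildDittoProtocolMsg: function (namespace, name, group, channel, criterion, action,
                                   path, dittoHeaders, value) {
    return {
      topic: [namespace, name, group, channel, criterion, action].join("/"),
      path: path,
      headers: dittoHeaders,
      value: value
    };
  }
};

// Variant of the Step 5 mapper that drops frames it cannot map instead of throwing.
function mapToDittoProtocolMsg(headers, textPayload) {
  var d;
  try {
    d = JSON.parse(textPayload);
  } catch (e) {
    return null; // malformed JSON: drop the frame
  }
  if (!d || typeof d.id !== "string" || typeof d.tempC !== "number") {
    return null; // heartbeat or incomplete frame
  }
  var value = { temperature: { properties: { value: d.tempC } } };
  if (typeof d.vibMm === "number") {
    value.vibration = { properties: { value: d.vibMm } };
  }
  return Ditto.buildDittoProtocolMsg("org.acme", d.id, "things", "twin",
                                     "commands", "merge", "/features", headers, value);
}
```

Feeding it the mosquitto_pub payload from above, a heartbeat, and a truncated frame covers the three cases that matter most: a good reading, a message to ignore, and garbage.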

Step 6 — Understand the Ditto Protocol

Everything above produced the same internal currency: a Ditto Protocol message. Its routing is carried in a six-segment topic path, {namespace}/{name}/{group}/{channel}/{criterion}/{action}. The merge you sent maps to:

org.acme/pump-01/things/twin/commands/merge

things is the group, twin the channel, commands the criterion, merge the action. When the command succeeds, Ditto emits a corresponding event on org.acme/pump-01/things/twin/events/merged. Understanding this grammar is what lets you subscribe to exactly the changes you care about.

The grammar repays a moment of study because every API surface in Ditto is a projection of it. The criterion segment is the one that surprises people: it takes not only commands and events but also messages (for the live-channel device messaging you reach through the WebSocket) and errors (so a failed command produces an addressable error topic you can subscribe to, rather than a silent drop). The action for a command and the action for its resulting event differ on purpose — you send modify or merge, and you receive modified or merged — which lets a consumer distinguish “someone asked to change this” from “this actually changed.” When you later filter an event stream, you are filtering against this exact topic structure, so a subscription like “every merged event in the org.acme namespace” is just a pattern over the six segments. Learn the grammar once and the HTTP paths, the WebSocket envelopes, and the connection mappers all stop looking like separate APIs and start looking like one addressing scheme seen from different angles.
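The six-segment grammar is simple enough to sketch directly. The helpers below parse a topic, derive the event topic a successful command produces, and match topics against a pattern. They cover only the six-segment thing topics discussed here, and the "*" wildcard is a convention invented for this example, not Ditto's own filter syntax.

```javascript
const SEGMENTS = ["namespace", "name", "group", "channel", "criterion", "action"];

// Split a six-segment topic into named parts.
function parseTopic(topic) {
  const parts = topic.split("/");
  if (parts.length !== SEGMENTS.length) {
    throw new Error(`expected ${SEGMENTS.length} segments, got ${parts.length}: ${topic}`);
  }
  return Object.fromEntries(SEGMENTS.map((key, i) => [key, parts[i]]));
}

// Command action -> action of the event emitted when the command succeeds.
const EVENT_ACTION = { create: "created", modify: "modified", merge: "merged", delete: "deleted" };

function eventTopicFor(commandTopic) {
  const t = parseTopic(commandTopic);
  const known = Object.prototype.hasOwnProperty.call(EVENT_ACTION, t.action);
  if (t.criterion !== "commands" || !known) {
    throw new Error(`no event defined for: ${commandTopic}`);
  }
  return [t.namespace, t.name, t.group, t.channel, "events", EVENT_ACTION[t.action]].join("/");
}

// Match a topic against a pattern in which "*" stands for any single segment.
function topicMatches(pattern, topic) {
  const want = pattern.split("/");
  const got = topic.split("/");
  return want.length === got.length && want.every((seg, i) => seg === "*" || seg === got[i]);
}
```

With these, "every merged event in the org.acme namespace" becomes the pattern org.acme/*/things/twin/events/merged, which is a handy client-side filter on a WebSocket stream.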

Figure 3: End-to-end message flow. A consumer opens an SSE stream first; the device publishes telemetry; Connectivity runs the JavaScript mapper and issues a twin merge command; the Things service persists to MongoDB and emits a merged event, which streams back to the consumer in near real time.

Figure 3 traces the full path you have now built. The ordering matters: a consumer subscribes before the change so it observes the merged event. Ditto does not replay missed events to a late subscriber, so for guaranteed delivery you would add acknowledgements or read current state on connect.

Step 7 — Stream change events (WebSocket and SSE)

Two real-time interfaces fan out twin changes. The WebSocket is duplex: you can both send commands and receive events. After connecting to /ws/2, send the literal control frame START-SEND-EVENTS, after which the socket delivers every event the authenticated subject is authorized to see, as Ditto Protocol JSON.

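// Node consumer using the "ws" package (npm install ws). The browser WebSocket
// constructor cannot set request headers, so a browser client must authenticate another way.
const WebSocket = require("ws");

const ws = new WebSocket("ws://localhost:8080/ws/2", {
  // dev-only pre-authentication header (x-ditto-dummy-auth on Ditto 1.x)
  headers: { "x-ditto-pre-authenticated": "nginx:ditto" }
});
ws.onopen = () => ws.send("START-SEND-EVENTS");
ws.onmessage = (e) => {
  const text = String(e.data);
  if (text.startsWith("START-SEND-EVENTS:ACK")) return; // subscription acknowledgement
  const msg = JSON.parse(text);
  console.log(msg.topic, "->", JSON.stringify(msg.value));
};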

For one-way consumption, Server-Sent Events (SSE) is simpler and works with the browser’s native EventSource. Open an SSE stream against the things resource with Accept: text/event-stream; unlike the WebSocket, SSE delivers the changed entity in normal Thing JSON, not Protocol envelopes. You can narrow the stream to specific things with the ids parameter or an RQL filter, and trim each event with a field projection:

curl -N -u ditto:ditto \
  -H 'Accept: text/event-stream' \
  'http://localhost:8080/api/2/things?ids=org.acme:pump-01&fields=features/temperature'

// Browser EventSource (one-way, native)
const es = new EventSource(
  "/api/2/things?fields=thingId,features/temperature/properties/value"
);
es.onmessage = (ev) => {
  if (!ev.data) return;            // keep-alive comments are empty
  const thing = JSON.parse(ev.data);
  render(thing.thingId, thing.features.temperature.properties.value);
};

Push another mosquitto_pub from Step 5 and both consumers print the new value almost immediately; with a public broker in the path, the broker round trip dominates the delay. That is the complete loop: device to broker to Connectivity to twin to consumer, with no polling anywhere.

Which interface should you reach for? Use SSE when the consumer is a browser dashboard that only needs to observe: it rides ordinary HTTP, survives proxies and corporate firewalls that block raw WebSockets, reconnects automatically via the native EventSource, and hands you ready-to-render Thing JSON. Use the WebSocket when the consumer also needs to send — issue live commands, call device messages, or subscribe to several signal types on one socket — because it is full-duplex and carries the raw Ditto Protocol envelope with its topic and headers intact. A common production shape is SSE for the many read-only screens and a small number of WebSocket connections for the services that actually control devices. Both honor the same policies, so a consumer never receives an event for a twin or feature it cannot read — the authorization check in Figure 4 runs on the event stream exactly as it does on a plain GET.

Step 8 — Search across twins

The Things-Search service indexes every twin and exposes an RQL query language. To find every pump running hot:

curl -u ditto:ditto -G http://localhost:8080/api/2/search/things \
  --data-urlencode 'filter=gt(features/temperature/properties/value,50)' \
  --data-urlencode 'option=sort(-features/temperature/properties/value),size(25)'

Search respects the same policies as reads, so a subject only ever sees twins it is authorized to read.
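Hand-writing RQL strings gets error-prone once filters nest, so a few small builders help. This is a minimal sketch covering only gt, eq, and and; the helper names and the searchUrl function are illustrative, not part of any Ditto client library.

```javascript
// Tiny RQL helpers: strings are double-quoted, numbers and booleans are emitted bare.
const rqlValue = (v) => (typeof v === "string" ? JSON.stringify(v) : String(v));
const gt = (path, v) => `gt(${path},${rqlValue(v)})`;
const eq = (path, v) => `eq(${path},${rqlValue(v)})`;
const and = (...terms) => `and(${terms.join(",")})`;

// Assemble a search URL; URLSearchParams takes care of the percent-encoding.
function searchUrl(baseUrl, { filter, sort, size }) {
  const url = new URL(`${baseUrl}/api/2/search/things`);
  url.searchParams.set("filter", filter);
  const options = [];
  if (sort) options.push(`sort(${sort})`);
  if (size) options.push(`size(${size})`);
  if (options.length > 0) url.searchParams.set("option", options.join(","));
  return url.toString();
}

// Hot pumps from one manufacturer, hottest first.
const hotAcmePumps = and(
  gt("features/temperature/properties/value", 50),
  eq("attributes/manufacturer", "AcmePumps")
);
const hotAcmeUrl = searchUrl("http://localhost:8080", {
  filter: hotAcmePumps,
  sort: "-features/temperature/properties/value",
  size: 25
});
```

The resulting URL is equivalent to the curl call above with one extra eq term, and it can be handed to fetch() along with whatever authentication your deployment uses.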

Step 9 — Verify the pipeline and debug what breaks

Before trusting the loop, prove each hop. The discipline here saves hours, because a twin that “isn’t updating” can fail at the broker, the connection, the mapper, or authorization — and the symptom is identical at the twin: nothing changes.

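First, confirm the connection is live. Ditto exposes per-connection status and metrics through the connections API, and a connection can report open while still failing to consume, for example if the broker rejected the subscription QoS. Both endpoints address the connection by its ID, which is the id field Ditto returned when you created the connection in Step 5; the name mqtt-pump-ingest is only a label. If you did not keep the ID, list your connections with a GET on /api/2/connections. Retrieve the live status and the counters:

# the connection ID returned by the POST in Step 5
CONNECTION_ID='<id-from-step-5>'

# overall status, including client state and last error
curl -u devops:foobar \
  "http://localhost:8080/api/2/connections/${CONNECTION_ID}/status"

# inbound/outbound consumed, mapped, and dropped counts
curl -u devops:foobar \
  "http://localhost:8080/api/2/connections/${CONNECTION_ID}/metrics"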

Read those metrics like a funnel. If consumed is climbing but mapped is not, the JavaScript mapper is throwing — fix the script. If consumed is flat while you are publishing, the source address or broker subscription is wrong — check the topic and QoS. If mapped climbs but the twin still does not change, the authorizationContext subject lacks WRITE on the path — a policy problem, which Figure 4 explains.
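The funnel reading can be written down as a small decision function. The sketch assumes you have sampled the inbound counters twice while publishing and flattened them into objects with consumed and mapped fields; that flattened shape, the function name, and the returned strings are all assumptions for illustration and do not match the raw metrics response.

```javascript
// Compare two samples of the inbound counters, taken while a device is publishing.
// before/after are hypothetical flattened views: { consumed: number, mapped: number }.
function diagnoseIngest(before, after, twinChanged) {
  const consumed = after.consumed - before.consumed;
  const mapped = after.mapped - before.mapped;
  if (consumed <= 0) {
    return "source: check the topic filter, the broker subscription, and QoS";
  }
  if (mapped <= 0) {
    return "mapper: the JavaScript mapper is throwing or dropping messages";
  }
  if (!twinChanged) {
    return "policy: the authorizationContext subject lacks WRITE on the target path";
  }
  return "healthy";
}
```

Run it with two samples taken a few seconds apart and a flag saying whether a GET on the twin showed a new value; the first hop that fails is the one to fix.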

Second, watch the twin directly. Open the SSE stream from Step 7 in one terminal and publish from another; if the event arrives on SSE, the write path is healthy end to end. If the HTTP read shows the new value but SSE is silent, the consumer subscribed too late or lacks read permission. This bisection — connection metrics, then twin read, then change stream — localizes almost every failure to a single hop in under a minute.

Step 10 — How authorization is actually decided

It is worth tracing what Ditto does on every single request, because the policy model in Step 2 only makes sense once you see it run. When a request arrives, the gateway resolves the authenticated subject (from OAuth/OIDC in production, or the dummy header here). Ditto loads the thing’s policy by policyId, matches the policy entries whose subjects include that subject, and evaluates the resources those entries reference against the path being accessed.

Figure 4: How a policy decision is made. The authenticated subject is matched against policy entries; the requested resource path is evaluated against granted and revoked permissions; the effective decision is grant minus revoke, with the most specific path winning. A denial returns 403 Forbidden.

Two rules from Figure 4 trip people up. First, permissions are evaluated as grant minus revoke: a revoke overrides a grant made at the same path or at any coarser (parent) path, which lets you grant broadly and then carve out exceptions. Second, the most specific path wins in the other direction too, so a grant on thing:/features/temperature can coexist with a revoke on thing:/ to express “deny everything except the temperature feature.” In our Step 2 policy, the observer subject can read only thing:/features/temperature and nothing else: try reading the vibration feature as that subject and Ditto refuses the request. Note that a missing READ permission typically surfaces as 404 Not Found, so the response does not reveal that the resource exists, while a missing WRITE on a resource the subject can read yields 403 Forbidden. This is the same machinery that filters search results, so authorization is consistent across every API surface.
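The decision rule can be made concrete with a small evaluator. This is a simplified sketch of the rule as described in this section (deepest matching path decides, revoke beats grant at equal depth), applied to the Step 2 policy. It is not Ditto's enforcer: it handles thing: resources only and ignores partial access to parent nodes. The function names are invented for the example.

```javascript
// The Step 2 policy, reduced to the parts the evaluator needs.
const policy = {
  entries: {
    owner: {
      subjects: { "nginx:ditto": { type: "dummy auth user" } },
      resources: { "thing:/": { grant: ["READ", "WRITE"], revoke: [] } }
    },
    observer: {
      subjects: { "nginx:dashboard": { type: "read-only dashboard" } },
      resources: { "thing:/features/temperature": { grant: ["READ"], revoke: [] } }
    }
  }
};

// True when resourcePath equals requestPath or is one of its ancestors.
function covers(resourcePath, requestPath) {
  if (resourcePath === "/") return true;
  return requestPath === resourcePath || requestPath.startsWith(resourcePath + "/");
}

// Deepest matching resource decides; at equal depth a revoke beats a grant.
function isAllowed(policy, subject, permission, requestPath) {
  let bestDepth = -1;
  let granted = false;
  let revoked = false;
  for (const entry of Object.values(policy.entries)) {
    if (!(subject in entry.subjects)) continue;
    for (const [resource, perms] of Object.entries(entry.resources)) {
      if (!resource.startsWith("thing:")) continue;
      const path = resource.slice("thing:".length);
      if (!covers(path, requestPath)) continue;
      const grants = perms.grant.includes(permission);
      const revokes = perms.revoke.includes(permission);
      if (!grants && !revokes) continue;
      const depth = path === "/" ? 0 : path.split("/").length - 1;
      if (depth > bestDepth) {
        bestDepth = depth; granted = grants; revoked = revokes;
      } else if (depth === bestDepth) {
        granted = granted || grants; revoked = revoked || revokes;
      }
    }
  }
  return granted && !revoked;
}
```

For example, isAllowed(policy, "nginx:dashboard", "READ", "/features/vibration") is false because no observer resource covers that path, while the same call for /features/temperature/properties/value is true.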

Trade-offs, Gotchas, and What Goes Wrong

Ditto is powerful, but several sharp edges catch first-time teams. Knowing them ahead of time saves days.

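Dummy auth leaks into pilots. The pre-authentication flag (ENABLE_PRE_AUTHENTICATION, called ENABLE_DUMMY_AUTH in Ditto 1.x) is the fastest way to a working demo and the fastest way to an open backend. Anyone who can reach the gateway directly can assert any subject. The moment you move past localhost, disable it or make the gateway reachable only through an authenticating reverse proxy such as nginx, and use OAuth 2.0 / OIDC for real users.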

Payload mappers fail silently. A JavaScript mapper that throws, returns the wrong topic, or emits malformed Protocol JSON does not crash anything visible — the message is logged as a mapping error and discarded. Telemetry simply stops arriving. Always test mappers with representative payloads and watch the connection’s logs and metrics during rollout.

Late SSE/WebSocket subscribers miss events. Change streams are live, not a log. A consumer that connects after a change never sees it, and there is no built-in backfill. For at-least-once semantics, combine acknowledgement requests with a read of current state on (re)connect, or push events into a durable broker via a target.
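A common way to combine a state read with a live stream is to subscribe first, then read the snapshot, and discard any buffered event that the snapshot already includes. The sketch below shows only the discard step. It assumes the snapshot was fetched with its revision (for instance by selecting the _revision field) and that each WebSocket event envelope carries a numeric revision; createRevisionGate is a name invented here.

```javascript
// Accept only events that are newer than the snapshot (and newer than the last accepted event).
function createRevisionGate(snapshotRevision) {
  let last = snapshotRevision;
  return function accept(event) {
    if (typeof event.revision !== "number" || event.revision <= last) {
      return false; // already reflected in the snapshot, or a duplicate
    }
    last = event.revision;
    return true;
  };
}
```

On every (re)connect you would start buffering events, read the twin, create a gate from the snapshot's revision, and then replay the buffer through the gate before processing live events.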

MQTT QoS is a contract with the broker, not a guarantee. Ditto requests the QoS you configure, but the external broker decides what it honors: AWS IoT Core, for instance, does not support QoS 2. A qos: 0 source silently loses anything published while the connection is down or reconnecting, because the broker never queues those messages for redelivery. Match QoS to the broker’s real capabilities.

MongoDB is the scaling pressure point. Twin state, the search index, and connection metadata all live in MongoDB. High-churn telemetry written straight to the twin can hammer it. Aggregate or throttle at the edge, and do not treat the twin as a time-series store — pair it with a proper historian for long-term trends.
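One simple way to throttle at the edge is a deadband filter: forward a reading only when it has moved by a meaningful amount, or when too much time has passed since the last forwarded one. This is a generic sketch, not a Ditto feature; the function name and the threshold values are assumptions you would tune per signal.

```javascript
// Decide whether a new reading is worth writing to the twin.
// minDelta: smallest change that counts; maxSilenceMs: longest gap between forwarded values.
function createDeadbandFilter({ minDelta, maxSilenceMs }) {
  let lastValue;
  let lastSentAt;
  return function shouldForward(value, nowMs) {
    const first = lastSentAt === undefined;
    const changed = !first && Math.abs(value - lastValue) >= minDelta;
    const stale = !first && nowMs - lastSentAt >= maxSilenceMs;
    if (first || changed || stale) {
      lastValue = value;
      lastSentAt = nowMs;
      return true;
    }
    return false;
  };
}
```

The comparison is always against the last forwarded value rather than the last seen value, so slow drift still gets through once it adds up to minDelta.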

Policy sprawl. One policy per thing is simple but becomes thousands of near-identical documents. Use namespace-scoped policy entries and policy imports (expanded in the 3.9 line) to centralize common rules instead of copy-pasting. The 3.9 release adds namespace root policies that are transparently merged into every policy in a namespace, and a resolved-policy view that returns the effective merged policy after imports — both of which exist precisely because sprawl is the failure mode teams hit at scale.

Eventual consistency in search surprises people. Because Things-Search updates its index asynchronously from twin events, a search issued immediately after a write can miss the just-written twin. This is correct behavior, not a bug, but code that creates a thing and then queries for it in the same breath will flake intermittently. Read the thing directly by ID when you need read-your-writes consistency, and reserve search for analytical or fleet-wide queries where a short indexing lag is irrelevant.

Modify versus merge on the write path. Sending a full PUT/modify when you mean to update one property silently deletes everything you omitted. This is the most common data-loss bug in early Ditto integrations. Default to PATCH/merge for partial updates and reserve full replacement for genuine resets, exactly as the MQTT mapper above does.
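The difference is easiest to see on plain JSON. The sketch below implements JSON Merge Patch (RFC 7396) locally and compares it with full replacement. It illustrates the semantics only and does not call Ditto; the feature names and values are made up.

```python
import copy


def merge_patch(target, patch):
    """Apply an RFC 7396 JSON Merge Patch and return the result as a new value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target outright.
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null means "delete this member"
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


features = {
    "temperature": {"properties": {"value": 20.0, "unit": "C"}},
    "vibration": {"properties": {"rms": 0.4}},
}
update = {"temperature": {"properties": {"value": 21.5}}}

replaced = copy.deepcopy(update)        # what a full replace leaves behind
merged = merge_patch(features, update)  # what a merge leaves behind
```

After the replace, both the vibration feature and the temperature unit are gone. After the merge, only the temperature value has changed. Note that in a merge patch an explicit null deletes a member, so a mapper that forwards null readings will remove properties rather than blank them.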

The thundering-herd reconnect. When a broker or the Connectivity service restarts, every device and every consumer can reconnect at once, producing a spike that looks like a load problem but is really a coordination problem. On the ingestion side, failoverEnabled with backoff prevents a connection from hammering a recovering broker; on the consumer side, jittered reconnection and a state read on reconnect (rather than a replay request Ditto cannot serve) keep the herd from stampeding the gateway. This is invisible in a single-device demo and very visible the first time a real fleet reconnects, so it belongs on the pre-production checklist rather than in the post-incident review.
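On the consumer side, jittered reconnection can be as small as the function below. It is a minimal sketch of exponential backoff with full jitter. The base delay, the cap, and the function name are assumptions to tune for your fleet, not Ditto settings.

```python
import random


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0, rng=random) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    Full jitter: pick uniformly between 0 and the exponential ceiling, so
    clients that dropped at the same moment do not retry at the same moment.
    """
    exponent = min(attempt, 32)  # bound the exponent to avoid huge numbers
    ceiling = min(cap, base * (2 ** exponent))
    return rng.uniform(0.0, ceiling)


# A seeded generator makes the schedule reproducible in tests.
rng = random.Random(0)
delays = [reconnect_delay(n, rng=rng) for n in range(10)]
```

Each delay stays at or below min(cap, base * 2**attempt), so the first retries are quick and later ones spread out across up to a minute. After a successful reconnect, read the current state before trusting the event stream again.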

Practical Recommendations

Treat Ditto as the authoritative state and API layer, not as your ingestion buffer or your historian. Put a broker in front for ingestion and a time-series database alongside for history; let Ditto own the current twin and its access surface. Model deliberately: attributes for what rarely changes, features for what does, and a WoT definition so the model is self-describing. Normalize device payloads in connection mappers so devices stay simple and the twin schema stays clean.

Operationally, lock down authentication before anything leaves your laptop, and design your consumers to tolerate missed events by reading state on connect. Decide your write path early — direct HTTP for control planes, MQTT-with-mapper for device fleets — and keep the two consistent.

For capacity planning, start from your write rate, not your twin count. A million mostly-idle twins is cheap; ten thousand twins each merging a reading every second is a sustained ten thousand writes per second hitting the Things service and cascading into search index updates and event fan-out. If that rate is real, aggregate at the edge — average or downsample readings before they reach the twin — so the twin reflects meaningful state rather than raw sensor noise. Use desiredProperties to model control intent cleanly: write the target to the twin, let a downstream connection translate it into a live command, and let the device’s next telemetry confirm convergence. That pattern keeps the request/response control loop decoupled from the device’s availability. Finally, attach a WoT definition to every thing from day one; retrofitting models onto a populated fleet is far more painful than starting with them, and the 3.9 WoT discovery endpoint only pays off if your things are actually described.
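The write-rate arithmetic and the effect of edge aggregation can be sketched in a few lines. The 60-reading window below is an illustrative assumption, not a recommendation from Ditto, and the function names are hypothetical.

```python
from statistics import fmean


def writes_per_second(twins: int, readings_per_twin_per_second: float) -> float:
    """Sustained write rate that reaches the twin backend."""
    return twins * readings_per_twin_per_second


def downsample(readings: list[float], window: int) -> list[float]:
    """Average each consecutive group of `window` readings.

    The last group may be shorter than `window` and is averaged as-is.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [fmean(readings[i:i + window]) for i in range(0, len(readings), window)]


raw_rate = writes_per_second(10_000, 1.0)              # one reading per second per twin
aggregated_rate = writes_per_second(10_000, 1.0 / 60)  # one averaged write per minute
averaged = downsample([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], window=3)
```

Ten thousand twins at one reading per second give 10,000 writes per second. Averaging over 60 readings at the edge brings that to roughly 167 writes per second, at the cost of up to a minute of staleness in the twin.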

Checklist before you call a Ditto backend production-ready:

[ ] Real authentication (OAuth 2.0 / OIDC), ENABLE_DUMMY_AUTH removed.
[ ] Policies scoped to least privilege; common rules factored into imports or namespace entries.
[ ] Payload mappers tested against malformed and edge-case device messages.
[ ] MQTT source QoS matched to broker capability; redelivery behavior understood.
[ ] Consumers reconcile state on connect; critical paths use acknowledgements.
[ ] MongoDB sized and monitored; a separate historian handles long-term telemetry.
[ ] WoT definition attached to things so the model is discoverable.

Frequently Asked Questions

What is Eclipse Ditto used for?

Eclipse Ditto is an open-source digital twin backend. It stores the authoritative JSON state of physical devices as things, exposes that state through REST, WebSocket, and SSE APIs, ingests telemetry from brokers like MQTT and Kafka, enforces fine-grained authorization with policies, and indexes twins for search. Teams use it to avoid hand-building the state store, API, and notification layer that every twin project otherwise needs from scratch.

What is the difference between a thing and a feature in Ditto?

A thing is the whole digital twin — one physical asset represented as a JSON document with a namespace:name identifier. A feature is a named functional block inside that thing, holding dynamic data under properties and optional desiredProperties. Static metadata such as serial number or location lives in attributes instead. In short: the thing is the asset; features are its capabilities or sensor groups.

How does Eclipse Ditto handle MQTT?

Ditto’s Connectivity service opens an MQTT connection with sources (topics it subscribes to) and targets (topics it publishes to). Inbound device payloads that are not already Ditto Protocol are converted by a JavaScript payload mapper into Ditto commands, typically a merge on a thing’s features. Source addresses support MQTT wildcards, and you set the requested QoS per source and target, subject to what the external broker honors.

How do I get real-time updates from a Ditto twin?

Two interfaces stream change events. The WebSocket at /ws/2 is bidirectional: after sending the START-SEND-EVENTS frame, it delivers every authorized event as Ditto Protocol JSON, and lets you send commands back. Server-Sent Events on /api/2/things with Accept: text/event-stream is one-way and simpler, delivering the changed entity as Thing JSON and working with the browser’s native EventSource. Both can be filtered with RQL.
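For a non-browser consumer, the SSE wire format is simple enough to parse by hand. The sketch below turns raw text/event-stream content into JSON objects. It assumes each event's data is one JSON document and skips blank payloads, which some servers send as keep-alives. The thing IDs in the sample are hypothetical.

```python
import json


def parse_sse(stream_text: str) -> list[dict]:
    """Parse text/event-stream content into a list of decoded JSON events.

    Only `data:` fields are handled; an event ends at a blank line.
    """
    events = []
    data_lines = []
    for line in stream_text.splitlines():
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # the format allows one optional space
            data_lines.append(value)
        elif line == "" and data_lines:
            payload = "\n".join(data_lines)
            data_lines = []
            if payload.strip():  # skip empty keep-alive payloads
                events.append(json.loads(payload))
    return events


sample = (
    'data: {"thingId": "org.example:pump-1", '
    '"features": {"temperature": {"properties": {"value": 21.5}}}}\n'
    "\n"
    "data:\n"
    "\n"
    'data: {"thingId": "org.example:pump-2"}\n'
    "\n"
)
changes = parse_sse(sample)
```

The sample yields two events; the empty data line in the middle is ignored. In a browser none of this is needed, because EventSource does the parsing and hands you event.data as a string.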

Is Eclipse Ditto free and production-ready?

Yes. Ditto is licensed under the Eclipse Public License 2.0 and is free to self-host. It is a mature, actively maintained project under the Eclipse Foundation, currently on the 3.9 line. Production readiness depends on you: deploy it on Kubernetes with a managed MongoDB, replace dummy auth with OAuth 2.0 / OIDC, scope policies tightly, and pair it with a broker for ingestion and a historian for long-term data.

What database does Eclipse Ditto use?

Ditto persists twin state, the search index, and connection metadata in MongoDB. Each service uses its own collections, and MongoDB’s document model maps naturally onto Ditto’s JSON things. Because all state funnels into MongoDB, it is the primary scaling consideration: size it for your twin count and write rate, and avoid writing high-frequency telemetry directly into the twin when a time-series store is the better home.
