Building a Digital Twin with DTDL v3 and Azure Digital Twins: End-to-End Tutorial
You need a working digital twin deployed this quarter. This DTDL digital twin tutorial walks you through shipping a real Azure Digital Twins instance—from designing your schema in DTDL v3, to running telemetry through the graph, to querying live state in a production topology. We explain the first-principles trade-offs that shape every layer, show you exact code and CLI commands that work, and prepare you for what breaks when it matters.
TL;DR
- What you build: A three-tier factory topology (Factory → ProductionLine → Machine → Sensor) modeled in DTDL v3, deployed on Azure Digital Twins, with IoT Hub → Azure Function → twin-property patching in near-real-time.
- Core insight: DTDL extends JSON-LD (semantic reusability), ADT separates schema from instances (flexible scaling), relationships are first-class objects (queryable edges, not foreign keys).
- Cost: ~$50–200/month for 10k twins + 100k telemetry events/hour; queries and relationships are metered separately.
- Time: 60–90 minutes to deploy; another 15 minutes to see your first telemetry flow through the graph.
- Prerequisites: Azure subscription, Azure CLI, Node/Python, JSON/HTTP fluency, basic IoT understanding.
Terminology Primer
Before diving into code, ground five foundational terms with plain-language definitions and analogies.
DTDL (Digital Twins Definition Language)
A machine-readable ontology (schema) that describes what a digital twin is and what it can do. Think of it like a class definition in object-oriented programming—it specifies properties (state), telemetry (time-series data), commands (actions you can invoke), and relationships (edges to other twins). DTDL is built on JSON-LD, which standardizes how machines interpret linked data; this lets DTDL schemas interoperate with other semantic platforms.
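To make the class-definition analogy concrete, here is a minimal sketch of a DTDL v3 interface built as a plain Python dict, with lightweight checks for two rules the spec imposes. The `Thermostat` interface and its DTMI are made up for illustration:

```python
import json
import re

# A minimal DTDL v3 interface expressed as a Python dict (illustrative;
# "dtmi:com:example:Thermostat;1" is a made-up identifier).
thermostat = {
    "@context": "dtmi:dtdl:context;3",
    "@id": "dtmi:com:example:Thermostat;1",
    "@type": "Interface",
    "displayName": "Thermostat",
    "contents": [
        {"@type": "Property", "name": "setPoint", "schema": "double", "writable": True},
        {"@type": "Telemetry", "name": "temperature", "schema": "double"},
    ],
}

# Lightweight sanity checks mirroring two DTDL rules: every interface needs an
# @id in DTMI form ("dtmi:<path>;<version>") and the v3 context.
DTMI_RE = re.compile(
    r"^dtmi:[A-Za-z](?:[A-Za-z0-9_]*[A-Za-z0-9])?"
    r"(?::[A-Za-z](?:[A-Za-z0-9_]*[A-Za-z0-9])?)*;[1-9][0-9]*$"
)
assert DTMI_RE.match(thermostat["@id"])
assert thermostat["@context"] == "dtmi:dtdl:context;3"
print(json.dumps(thermostat, indent=2)[:60])
```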
Digital Twin
A virtual replica of a physical asset (a factory, machine, sensor) whose state is kept synchronized with the physical asset in near-real-time. The twin is not the asset itself, nor is it just a database record—it’s a runtime graph node with properties, relationships, and event subscriptions that reflect live activity. If a physical temperature sensor reads 65°C, the twin’s lastReading property updates within milliseconds.
Interface
A DTDL interface is a reusable schema template. It defines the contract: “any twin instantiated from this interface will have these properties, telemetry, commands, and relationships.” Interfaces compose hierarchically (a Factory interface can relate to ProductionLine interfaces, which relate to Machine interfaces). Each interface is versioned and immutable once deployed.
Relationship
An edge between two twins that represents a semantic connection. Unlike a foreign key in a relational database, a relationship is a first-class object: it can have properties (e.g., `installationDate` on a `mounts` relationship), can be queried directly, and can fire events when created or deleted. This is why ADT's query language supports `JOIN Machine RELATED Line.hasMachine` (one hop per JOIN clause); the relationship itself is queryable.
Ontology
The emergent structure formed by instances of interfaces and their relationships. Your ontology answers questions like “which machines are in production line A?” and “which sensors exceed the temperature threshold?” without expensive joins or denormalization.
Architecture Layer 1: End-to-End Pipeline
Before modeling, visualize the data flow from device through cloud to consumer. This diagram shows how each component hands off work.
Setup: Below is the complete journey of a telemetry message: a physical sensor emits a reading, it travels to Azure IoT Hub, gets captured by an Azure Function, is patched into the digital twin, and triggers downstream events.

Walkthrough:
- Physical Sensors → IoT Hub: Devices connect via MQTT or HTTPS and send JSON payloads. IoT Hub is the single ingress point; it handles authentication, deduplication, and routing.
- Event Hub Endpoint: IoT Hub has a built-in Event Hub endpoint that all messages flow to automatically. This decouples ingestion from processing.
- Azure Function: Listens on the Event Hub, deserializes each message, and constructs a JSON Patch (RFC 6902) targeting a specific twin. Functions execute in milliseconds once warm; cold starts (several seconds on the Consumption plan) are a gotcha in serverless designs.
- Twin Update: The patch is sent to ADT via REST (`PATCH /digitaltwins/{id}`). ADT applies the patch and updates the twin's properties atomically.
- Event Grid: ADT publishes events (twin created, property changed, relationship added) to Event Grid. Downstream systems subscribe to these events for real-time reactions (alerting, analytics, dashboard refresh).
Why this layering? IoT Hub handles connection state and scale; Event Hub decouples message production from processing; Functions keep the data-plane stateless; Event Grid ensures asynchronous fanout without tight coupling. Each layer can be scaled independently.
Architecture Layer 2: DTDL v3 Interface Composition
DTDL interfaces compose like building blocks. Understanding composition is key to schema reuse and avoiding redundancy.
Setup: A DTDL interface declares what a twin has and what it does. Its contents come in four main types: properties (state), telemetry (streaming data), commands (RPC calls), and relationships (edges to other twins). A fifth type, components, appears in the wind-turbine deep dive below. Here is how they fit together:

Walkthrough:
- Properties store state: `operationalStatus` (enum), `factoryId` (string), `cycleTime` (double). Properties are patched when the physical asset changes state. They are writable and queryable.
- Telemetry streams time-series data: `vibration` (Hz), `temperature` (°C), `pressure` (Pa). Telemetry is read-only and event-driven. Each telemetry message is a new event, not an overwrite.
- Commands are invocations: `startMachine()`, `setTemperatureTarget(newTarget)`. A command handler executes logic (or relays to the physical device) and returns a response; commands incur round-trip latency. Note that Azure Digital Twins does not execute DTDL commands itself; command handling is implemented with IoT Hub direct methods or custom functions.
- Relationships connect twins: `Factory hasLine ProductionLine`, `Machine hasSensor Sensor`. Relationships are first-class queryable edges, not denormalized foreign keys.
- Semantic annotations (v3): DTDL v3's QuantitativeTypes extension lets you co-type telemetry and properties with standard quantity kinds and units (e.g., `Temperature` with `degreeCelsius`). This enables cross-domain interop—a tool designed for any `Temperature` telemetry can consume yours without schema negotiation.
Why this design? Properties are mutable state (queries reference them). Telemetry is immutable history (logged to time-series storage). Commands enable bidirectional control. Relationships keep the graph navigable without denormalization. Semantic annotations let heterogeneous systems plug together.
Architecture Layer 3: Factory Ontology Graph
Now instantiate the schema. Below is a concrete three-level hierarchy: one factory, multiple lines, machines per line, sensors per machine.
Setup: The ontology graph shows how instances relate. Each node is a twin; each edge is a relationship. Queries traverse this graph.

Walkthrough:
- Factory node: Top of the hierarchy; aggregates all production lines. A real enterprise might have multiple factories, each a separate top-level twin.
- ProductionLine nodes: Aggregate machines. Each line has a throughput property (units per hour). Lines are the unit of production scheduling.
- Machine nodes: The workhorses. Each has a `make` (manufacturer), `model`, and `cycleTime` (seconds per unit). Machines emit telemetry and are targets for commands.
- Sensor nodes: Leaf nodes. Each sensor is typed (temperature, pressure, vibration) and has a `lastReading` property that gets patched every time a telemetry message arrives.
Why this structure? The hierarchy mirrors physical topology. Queries like “all sensors in factory FAC-001” become traversals: Factory → hasLine → hasMachine → hasSensor. No denormalization, no redundant data; the graph is the source of truth. If you add a new line, you create one LineA twin and one hasLine relationship; you don’t duplicate factory metadata.
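To make the "add one twin, add one relationship" point concrete, this sketch assembles the factory topology as plain data and, only if an `ADT_HOST` environment variable is set, pushes it with the `azure-digitaltwins-core` SDK. Twin IDs and model DTMIs follow this tutorial; the push step assumes that package and a signed-in identity:

```python
# Sketch: build twin and relationship payloads for the factory hierarchy in
# plain Python, then optionally push them with the ADT SDK.
import os

def twin(model, **props):
    """Assemble an ADT twin body: $metadata.$model plus initial properties."""
    return {"$metadata": {"$model": model}, **props}

twins = {
    "factory-shanghai-01": twin("dtmi:com:example:factory:Factory;1",
                                factoryId="FAC-001", location="Shanghai Plant A"),
    "line-a": twin("dtmi:com:example:factory:ProductionLine;1",
                   lineId="LINE-A", throughput=120.0),
    "machine-a01": twin("dtmi:com:example:factory:Machine;1",
                        machineId="MACH-A01", cycleTime=8.5),
    "sensor-temp-a01": twin("dtmi:com:example:factory:Sensor;1",
                            sensorId="TEMP-A01", sensorType="temperature",
                            lastReading=0.0),
}

# Each edge: (source twin, relationship name, target twin).
edges = [
    ("factory-shanghai-01", "hasLine", "line-a"),
    ("line-a", "hasMachine", "machine-a01"),
    ("machine-a01", "hasSensor", "sensor-temp-a01"),
]

if os.environ.get("ADT_HOST"):  # only push when an instance is configured
    from azure.identity import DefaultAzureCredential
    from azure.digitaltwins.core import DigitalTwinsClient
    client = DigitalTwinsClient("https://" + os.environ["ADT_HOST"],
                                DefaultAzureCredential())
    for twin_id, body in twins.items():
        client.upsert_digital_twin(twin_id, body)
    for src, rel, dst in edges:
        client.upsert_relationship(src, f"{src}-{rel}-{dst}",
                                   {"$relationshipName": rel, "$targetId": dst})
```

Adding a new line is then one entry in `twins` and one tuple in `edges`; no factory metadata is duplicated.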
Architecture Layer 4: Telemetry Ingestion Pipeline
Telemetry arrives at IoT Hub and must be mapped to twins. Below is the per-message transformation flow.
Setup: Each telemetry message from a physical sensor becomes a JSON Patch that ADT applies atomically.

Walkthrough:
- IoT Message: Arrives with a device ID (e.g., `TEMP-A01`), a sensor reading (72.3), and a timestamp. The device ID is the correlation key that maps the physical sensor to its digital twin.
- Parse & Route: The Function deserializes the JSON and extracts the device ID. A lookup (database, config, or hardcoded mapping) resolves the device ID to a twin ID.
- JSON Patch: RFC 6902 defines a standardized patch format. For a single sensor reading, one `replace` operation updates `lastReading`. For complex payloads (a machine emitting multiple metrics), the patch can have many operations.
- REST PATCH: Sent to `https://{adt-host}/digitaltwins/{twin-id}?api-version=2023-06-30` with the patch in the body. ADT applies it atomically.
- Twin Updated: The twin's property is patched in ADT's store. The update is near-instantaneous (typically < 1 second).
- Event Grid: ADT publishes a twin-update event (`Microsoft.DigitalTwins.Twin.Update`) to Event Grid. Subscribers receive it within milliseconds, enabling downstream reactions (dashboards refresh, alerts fire, analytics record the event).
Why this design? Decoupling parsing from patching (via the Function) lets you handle different message formats (MQTT JSON, OPC-UA, Modbus) without changing ADT. The JSON Patch format is standard and atomic. Event Grid fanout avoids tight coupling between ADT and consumers.
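The parse-and-patch transform described above can be sketched as a pure function. The `DEVICE_TO_TWIN` lookup table is an illustrative stand-in for a real device registry:

```python
import json

DEVICE_TO_TWIN = {"TEMP-A01": "sensor-temp-a01"}  # illustrative mapping

def to_patch(raw):
    """Return (twin_id, json_patch) for one telemetry message, or None to skip."""
    msg = json.loads(raw)
    twin_id = DEVICE_TO_TWIN.get(msg.get("device_id"))
    if twin_id is None:
        return None  # unknown device: drop (or dead-letter) the message
    return twin_id, [
        {"op": "replace", "path": "/lastReading", "value": msg["temperature"]},
        {"op": "replace", "path": "/lastReadingTimestamp", "value": msg["timestamp"]},
    ]

twin_id, patch = to_patch(
    b'{"device_id": "TEMP-A01", "temperature": 72.3, "timestamp": "2025-01-01T00:00:00Z"}')
print(twin_id, patch[0]["value"])  # → sensor-temp-a01 72.3
```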
Terminology Grounding: First-Principles Design Decisions
Before code, understand why DTDL and ADT are shaped the way they are.
Why does DTDL extend JSON-LD?
JSON-LD is a W3C standard for embedding linked data in JSON. It uses @context to define URIs for terms, @id for globally unique identifiers, and @type for semantic type. DTDL extends JSON-LD because (1) semantics are standardized and reusable across domains, (2) tools built for JSON-LD can parse DTDL, and (3) you can reference shared ontologies (e.g., industry ontologies such as RealEstateCore, or DTDL's quantitative-types vocabulary). The alternative—inventing a proprietary schema format—would lock you in; JSON-LD keeps you plugged into the open ecosystem.
Why does ADT separate schema (interface) from instances (twins)?
In a relational database, the schema and rows are coupled: you define a table, then insert rows. Changing the schema (adding a column) requires a migration; all rows must conform. ADT decouples them: you define an interface once, then create unlimited instances. Updating the interface does not auto-migrate existing twins; old twins stay with the old schema. This trades consistency for agility—you can evolve the schema without breaking running twins. Drawback: you must manage versioning (Interface;1, Interface;2). Benefit: zero downtime schema changes.
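A small sketch of the versioning bookkeeping this implies: because a DTMI ends in `;<version>`, deciding which twins need migration is a string-parsing exercise. The helper names below are mine, not part of any SDK:

```python
def split_dtmi(dtmi):
    """Split 'dtmi:...:Name;3' into ('dtmi:...:Name', 3)."""
    path, _, version = dtmi.rpartition(";")
    return path, int(version)

def needs_migration(twin_model, current_model):
    """True when a twin's model shares the current model's path but is older."""
    twin_path, twin_ver = split_dtmi(twin_model)
    cur_path, cur_ver = split_dtmi(current_model)
    return twin_path == cur_path and twin_ver < cur_ver

# Sensor;1 twins need migration to Sensor;2; Machine twins are unaffected.
assert needs_migration("dtmi:com:example:factory:Sensor;1",
                       "dtmi:com:example:factory:Sensor;2")
assert not needs_migration("dtmi:com:example:factory:Machine;1",
                           "dtmi:com:example:factory:Sensor;2")
```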
Why are relationships first-class objects?
In SQL, relationships are encoded as foreign keys: a column in the child table points to a row in the parent table. This is efficient for ACID transactions but poor for graph traversal. ADT treats relationships as first-class objects with their own identities and properties. A relationship can have a creationDate or installationLocation. Queries can traverse relationships directly, one JOIN clause per hop (`JOIN Line RELATED Factory.hasLine JOIN Machine RELATED Line.hasMachine`). This is slower than an indexed foreign key for single lookups but vastly faster for graph traversals and enables relationship-aware logic (e.g., "find all machines installed in 2024").
Why is Event Grid the output, not direct API calls?
If ADT published twin updates directly to dashboards, analytics pipelines, and alert systems, you would couple ADT to every consumer. Event Grid decouples: ADT publishes one event; any number of subscribers can react asynchronously. A new consumer doesn’t require ADT changes. Drawback: eventual consistency (1–5 second lag between twin update and consumer reaction). Benefit: scalability and agility.
Deep Dive: Modeling a Wind Turbine with DTDL v3
Let’s model a real asset: a wind turbine with mechanical, electrical, and control subsystems. This introduces components (optional but powerful), semantic annotations, and telemetry with complex schemas.
Use case: A utility operates a fleet of 1000 turbines. Each turbine needs to report rotor speed, power output, and gearbox temperature. The DTDL must capture the hierarchy: Turbine contains Rotor, Generator, Gearbox, and ControlSystem. Each subsystem streams telemetry.
Rotor Interface (leaf component):
{
  "@context": [
    "dtmi:dtdl:context;3",
    "dtmi:dtdl:extension:quantitativeTypes;1"
  ],
  "@id": "dtmi:com:example:wind:Rotor;1",
  "@type": "Interface",
  "displayName": "Rotor",
  "description": "Wind turbine rotor, captures blade RPM and pitch angle.",
  "contents": [
    {
      "@type": "Property",
      "@id": "dtmi:com:example:wind:Rotor:numberOfBlades;1",
      "name": "numberOfBlades",
      "schema": "integer",
      "writable": false
    },
    {
      "@type": ["Telemetry", "AngularVelocity"],
      "@id": "dtmi:com:example:wind:Rotor:rotorSpeed;1",
      "name": "rotorSpeed",
      "schema": "double",
      "unit": "revolutionPerMinute",
      "description": "Rotor speed in RPM."
    },
    {
      "@type": ["Telemetry", "Angle"],
      "@id": "dtmi:com:example:wind:Rotor:bladePitch;1",
      "name": "bladePitch",
      "schema": "double",
      "unit": "degreeOfArc",
      "description": "Blade pitch angle in degrees."
    }
  ]
}
Generator Interface:
{
  "@context": [
    "dtmi:dtdl:context;3",
    "dtmi:dtdl:extension:quantitativeTypes;1"
  ],
  "@id": "dtmi:com:example:wind:Generator;1",
  "@type": "Interface",
  "displayName": "Generator",
  "description": "Electrical generator subsystem.",
  "contents": [
    {
      "@type": "Property",
      "@id": "dtmi:com:example:wind:Generator:ratedCapacity;1",
      "name": "ratedCapacity",
      "schema": "double",
      "description": "Rated power output in MW."
    },
    {
      "@type": ["Telemetry", "Power"],
      "@id": "dtmi:com:example:wind:Generator:activePower;1",
      "name": "activePower",
      "schema": "double",
      "unit": "megawatt",
      "description": "Active power in MW."
    },
    {
      "@type": ["Telemetry", "Frequency"],
      "@id": "dtmi:com:example:wind:Generator:frequency;1",
      "name": "frequency",
      "schema": "double",
      "unit": "hertz",
      "description": "Grid frequency in Hz."
    }
  ]
}
Gearbox Interface:
{
  "@context": [
    "dtmi:dtdl:context;3",
    "dtmi:dtdl:extension:quantitativeTypes;1"
  ],
  "@id": "dtmi:com:example:wind:Gearbox;1",
  "@type": "Interface",
  "displayName": "Gearbox",
  "description": "Gearbox subsystem with thermal monitoring.",
  "contents": [
    {
      "@type": "Property",
      "@id": "dtmi:com:example:wind:Gearbox:gearRatio;1",
      "name": "gearRatio",
      "schema": "double",
      "writable": false
    },
    {
      "@type": ["Telemetry", "Temperature"],
      "@id": "dtmi:com:example:wind:Gearbox:oilTemperature;1",
      "name": "oilTemperature",
      "schema": "double",
      "unit": "degreeCelsius",
      "description": "Gearbox oil temperature in Celsius."
    },
    {
      "@type": "Telemetry",
      "@id": "dtmi:com:example:wind:Gearbox:vibration;1",
      "name": "vibration",
      "schema": "double",
      "description": "Vibration level in mm/s (ISO 20816)."
    }
  ]
}
Turbine Interface (composite, contains components):
{
  "@context": "dtmi:dtdl:context;3",
  "@id": "dtmi:com:example:wind:Turbine;1",
  "@type": "Interface",
  "displayName": "Wind Turbine",
  "description": "Complete wind turbine with subsystems.",
  "contents": [
    {
      "@type": "Property",
      "@id": "dtmi:com:example:wind:Turbine:turbineId;1",
      "name": "turbineId",
      "schema": "string"
    },
    {
      "@type": "Property",
      "@id": "dtmi:com:example:wind:Turbine:location;1",
      "name": "location",
      "schema": "string",
      "description": "GPS coordinates or site name."
    },
    {
      "@type": "Property",
      "@id": "dtmi:com:example:wind:Turbine:operationalStatus;1",
      "name": "operationalStatus",
      "schema": {
        "@type": "Enum",
        "valueSchema": "string",
        "enumValues": [
          { "name": "generating", "enumValue": "generating", "displayName": "Generating" },
          { "name": "stopped", "enumValue": "stopped", "displayName": "Stopped" },
          { "name": "maintenance", "enumValue": "maintenance", "displayName": "Maintenance" },
          { "name": "fault", "enumValue": "fault", "displayName": "Fault" }
        ]
      }
    },
    {
      "@type": "Component",
      "@id": "dtmi:com:example:wind:Turbine:rotor;1",
      "name": "rotor",
      "schema": "dtmi:com:example:wind:Rotor;1"
    },
    {
      "@type": "Component",
      "@id": "dtmi:com:example:wind:Turbine:generator;1",
      "name": "generator",
      "schema": "dtmi:com:example:wind:Generator;1"
    },
    {
      "@type": "Component",
      "@id": "dtmi:com:example:wind:Turbine:gearbox;1",
      "name": "gearbox",
      "schema": "dtmi:com:example:wind:Gearbox;1"
    },
    {
      "@type": "Relationship",
      "@id": "dtmi:com:example:wind:Turbine:partOf;1",
      "name": "partOf",
      "target": "dtmi:com:example:wind:WindFarm;1",
      "description": "This turbine is part of a wind farm."
    }
  ]
}
Why components? Components let you embed sub-interfaces without creating separate twins. A Turbine twin can have a nested rotor component with its own properties and telemetry. When you query the turbine and request rotor.*, you get the rotor’s telemetry. Components avoid twin explosion (creating a separate twin for every subsystem) while preserving structure. Downside: you can’t query components independently; they’re always accessed through their parent twin.
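Because component state lives inside the parent twin, patching it goes through ADT's component-level API, where the JSON Pointer path is relative to the component rather than the twin. A hedged sketch: the `lastServiceDate` property and `turbine-001` twin ID are hypothetical, and the SDK calls only run when `ADT_HOST` is configured:

```python
import os

# JSON Pointer path is relative to the component when using the component API.
component_patch = [
    {"op": "add", "path": "/lastServiceDate", "value": "2025-01-15"},  # hypothetical property
]

if os.environ.get("ADT_HOST"):  # requires a live instance and data-plane access
    from azure.identity import DefaultAzureCredential
    from azure.digitaltwins.core import DigitalTwinsClient
    client = DigitalTwinsClient("https://" + os.environ["ADT_HOST"],
                                DefaultAzureCredential())
    # Patch the "gearbox" component on the assumed twin "turbine-001",
    # then read the component back.
    client.update_component("turbine-001", "gearbox", component_patch)
    print(client.get_component("turbine-001", "gearbox"))
```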
Why semantic types? Co-typing a telemetry definition with a quantity kind and unit (e.g., `["Telemetry", "Temperature"]` with `"unit": "degreeCelsius"`) tells downstream tools what the value means, not just its wire type. A dashboard built to consume any `Temperature` telemetry can auto-render `gearbox.oilTemperature` without schema negotiation, and explicit units prevent silent Celsius/Fahrenheit mix-ups. This enables tool reuse across projects.
Deployment: From Schema to Running Twins
Now deploy the factory schema to Azure and create live instances.
Step 1: Create the Azure Digital Twins instance
# Set variables
RESOURCE_GROUP="rg-factory-twins"
ADT_INSTANCE="factory-twins-$(date +%s)"
LOCATION="eastus"
# Create resource group
az group create --name $RESOURCE_GROUP --location $LOCATION
# Create ADT instance
az dt create \
--dt-name $ADT_INSTANCE \
--resource-group $RESOURCE_GROUP \
--location $LOCATION
# Verify creation and capture host
ADT_HOST=$(az dt show \
--dt-name $ADT_INSTANCE \
--resource-group $RESOURCE_GROUP \
--query "hostName" -o tsv)
echo "ADT Instance: $ADT_INSTANCE"
echo "ADT Host: $ADT_HOST"
Step 2: Upload DTDL models
Save each interface into a separate file in a `models/` folder: factory.json, productionline.json, machine.json, sensor.json. Uploading them in one call lets the service resolve cross-model references regardless of order:
# Upload models
az dt model create \
--dt-name $ADT_INSTANCE \
--resource-group $RESOURCE_GROUP \
--from-directory ./models
# List uploaded models
az dt model list \
--dt-name $ADT_INSTANCE \
--resource-group $RESOURCE_GROUP \
--output table
Step 3: Create twin instances via REST API
Use az rest to call the ADT API directly:
# Create the factory twin
az rest \
--method PUT \
--url "https://${ADT_HOST}/digitaltwins/factory-shanghai-01?api-version=2023-06-30" \
--resource "https://digitaltwins.azure.net" \
--body '{
"$metadata": { "$model": "dtmi:com:example:factory:Factory;1" },
"factoryId": "FAC-001",
"location": "Shanghai Plant A",
"operationalStatus": "running"
}'
# Create production line twin
az rest \
--method PUT \
--url "https://${ADT_HOST}/digitaltwins/line-a?api-version=2023-06-30" \
--resource "https://digitaltwins.azure.net" \
--body '{
"$metadata": { "$model": "dtmi:com:example:factory:ProductionLine;1" },
"lineId": "LINE-A",
"throughput": 120.0
}'
# Create machine twin
az rest \
--method PUT \
--url "https://${ADT_HOST}/digitaltwins/machine-a01?api-version=2023-06-30" \
--resource "https://digitaltwins.azure.net" \
--body '{
"$metadata": { "$model": "dtmi:com:example:factory:Machine;1" },
"machineId": "MACH-A01",
"make": "Siemens",
"model": "WM500",
"cycleTime": 8.5
}'
# Create sensor twin
az rest \
--method PUT \
--url "https://${ADT_HOST}/digitaltwins/sensor-temp-a01?api-version=2023-06-30" \
--resource "https://digitaltwins.azure.net" \
--body '{
"$metadata": { "$model": "dtmi:com:example:factory:Sensor;1" },
"sensorId": "TEMP-A01",
"sensorType": "temperature",
"lastReading": 0.0
}'
# Create relationships (edges)
az rest \
--method PUT \
--url "https://${ADT_HOST}/digitaltwins/factory-shanghai-01/relationships/factory-to-line-a?api-version=2023-06-30" \
--resource "https://digitaltwins.azure.net" \
--body '{
"$relationshipName": "hasLine",
"$targetId": "line-a"
}'
az rest \
--method PUT \
--url "https://${ADT_HOST}/digitaltwins/line-a/relationships/line-to-machine-a01?api-version=2023-06-30" \
--resource "https://digitaltwins.azure.net" \
--body '{
"$relationshipName": "hasMachine",
"$targetId": "machine-a01"
}'
az rest \
--method PUT \
--url "https://${ADT_HOST}/digitaltwins/machine-a01/relationships/machine-to-sensor-temp-a01?api-version=2023-06-30" \
--resource "https://digitaltwins.azure.net" \
--body '{
"$relationshipName": "hasSensor",
"$targetId": "sensor-temp-a01"
}'
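As a sanity check on the graph just created, this sketch walks relationships depth-first. The traversal only needs a callable that returns a twin's outgoing relationships, so it runs against the live SDK client when `ADT_HOST` is set, or against a stub of the tutorial's three edges otherwise:

```python
import os

def walk(list_relationships, twin_id, depth=0, out=None):
    """Depth-first collect of twin IDs reachable from twin_id."""
    out = [] if out is None else out
    out.append("  " * depth + twin_id)
    for rel in list_relationships(twin_id):
        walk(list_relationships, rel["$targetId"], depth + 1, out)
    return out

if os.environ.get("ADT_HOST"):  # live traversal via the ADT SDK
    from azure.identity import AzureCliCredential
    from azure.digitaltwins.core import DigitalTwinsClient
    client = DigitalTwinsClient("https://" + os.environ["ADT_HOST"],
                                AzureCliCredential())
    tree = walk(client.list_relationships, "factory-shanghai-01")
else:  # offline demo with the tutorial's three edges stubbed in
    stub_edges = {"factory-shanghai-01": [{"$targetId": "line-a"}],
                  "line-a": [{"$targetId": "machine-a01"}],
                  "machine-a01": [{"$targetId": "sensor-temp-a01"}]}
    tree = walk(lambda t: stub_edges.get(t, []), "factory-shanghai-01")

print("\n".join(tree))
```

Against the live instance, the output should show the indented chain factory-shanghai-01 → line-a → machine-a01 → sensor-temp-a01.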
Step 4: Wire telemetry ingestion with Azure Function
Create a Python Azure Function that listens to IoT Hub and patches twins:
# function_app.py
import json
import logging
import os
from datetime import datetime, timezone

import azure.functions as func
from azure.identity import DefaultAzureCredential
from azure.digitaltwins.core import DigitalTwinsClient

# Initialize the ADT client once per worker. ADT_SERVICE_URL is set in the
# Function App settings, e.g. "https://<instance>.api.<region>.digitaltwins.azure.net".
credential = DefaultAzureCredential()
adt_client = DigitalTwinsClient(os.environ["ADT_SERVICE_URL"], credential)

app = func.FunctionApp()

@app.event_hub_message_trigger(
    arg_name="event",
    # Event Hub-compatible name of the IoT Hub built-in endpoint,
    # resolved from the IOT_HUB_NAME app setting.
    event_hub_name="%IOT_HUB_NAME%",
    connection="EventHubConnection",
    consumer_group="$Default"
)
def ingest_telemetry(event: func.EventHubEvent):
    """
    Ingests one IoT telemetry message and patches the matching digital twin.
    Expects payload: {"device_id": "sensor-temp-a01", "temperature": 72.3, "timestamp": "..."}
    """
    try:
        # Parse message
        body = json.loads(event.get_body().decode("utf-8"))
        device_id = body.get("device_id")
        temperature = body.get("temperature")
        timestamp = body.get("timestamp", datetime.now(timezone.utc).isoformat())
        if not device_id:
            logging.warning("Message missing device_id")
            return
        # In this tutorial, device IDs match twin IDs 1:1; in production,
        # resolve the twin ID here via a lookup table or device registry.
        # Build JSON Patch (RFC 6902). "add" also overwrites an existing
        # value, and unlike "replace" it succeeds when the property was
        # never set on the twin.
        patch = [
            {"op": "add", "path": "/lastReading", "value": temperature},
            {"op": "add", "path": "/lastReadingTimestamp", "value": timestamp},
        ]
        # Patch the twin
        adt_client.update_digital_twin(device_id, patch)
        logging.info(f"Patched {device_id}: temperature={temperature} at {timestamp}")
    except json.JSONDecodeError as e:
        logging.error(f"JSON decode error: {e}")
    except Exception as e:
        logging.error(f"Error processing telemetry: {e}")
Deploy the function:
# Create a storage account (required by the Function App), then the app itself
FUNC_APP_NAME="factory-telemetry-ingress"
STORAGE_ACCOUNT="factorytwinsfn$RANDOM"
az storage account create \
--name $STORAGE_ACCOUNT \
--resource-group $RESOURCE_GROUP \
--location $LOCATION \
--sku Standard_LRS
az functionapp create \
--resource-group $RESOURCE_GROUP \
--consumption-plan-location $LOCATION \
--runtime python \
--runtime-version 3.11 \
--functions-version 4 \
--os-type Linux \
--storage-account $STORAGE_ACCOUNT \
--name $FUNC_APP_NAME
# Grant the Function's managed identity data-plane access to ADT
PRINCIPAL_ID=$(az functionapp identity assign \
--name $FUNC_APP_NAME \
--resource-group $RESOURCE_GROUP \
--query "principalId" -o tsv)
az dt role-assignment create \
--dt-name $ADT_INSTANCE \
--assignee $PRINCIPAL_ID \
--role "Azure Digital Twins Data Owner"
# Configure app settings. The Event Hub-compatible connection string comes
# from IoT Hub's built-in endpoint ($IoT_HUB is created in Step 5).
EVENT_HUB_CONNECTION=$(az iot hub connection-string show \
--hub-name $IoT_HUB \
--default-eventhub \
--query "connectionString" -o tsv)
az functionapp config appsettings set \
--name $FUNC_APP_NAME \
--resource-group $RESOURCE_GROUP \
--settings "EventHubConnection=$EVENT_HUB_CONNECTION" \
"ADT_SERVICE_URL=https://$ADT_HOST" \
"IOT_HUB_NAME=$IoT_HUB"
Step 5: Set up IoT Hub
IoT_HUB="iot-factory-hub"
DEVICE_ID="sensor-temp-a01"
# Create IoT Hub
az iot hub create \
--resource-group $RESOURCE_GROUP \
--name $IoT_HUB \
--sku S1 --location $LOCATION
# Register a test device
az iot hub device-identity create \
--hub-name $IoT_HUB \
--device-id $DEVICE_ID
# Get device connection string
DEVICE_CONN=$(az iot hub device-identity connection-string show \
--hub-name $IoT_HUB \
--device-id $DEVICE_ID \
--query "connectionString" -o tsv)
echo "Device connection string: $DEVICE_CONN"
Test telemetry flow:
# Simulate a sensor sending a device-to-cloud telemetry message
az iot device send-d2c-message \
--hub-name $IoT_HUB \
--device-id $DEVICE_ID \
--data '{"device_id": "sensor-temp-a01", "temperature": 72.3, "timestamp": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"}'
# Give the pipeline a few seconds (longer on a cold start), then query the twin
sleep 10
az rest \
--method GET \
--url "https://${ADT_HOST}/digitaltwins/sensor-temp-a01?api-version=2023-06-30" \
--resource "https://digitaltwins.azure.net" \
--output json
The response should show "lastReading": 72.3.
Querying the Twin Graph
Azure Digital Twins exposes a SQL-like query language. Below are essential patterns and their performance characteristics.
Architecture Layer 5: Query & Traversal Patterns

Simple property lookup (response time: 50–200ms):
SELECT T FROM DIGITALTWINS T
WHERE T.factoryId = 'FAC-001'
Traverse one level of relationships (response time: 100–300ms):
SELECT Line FROM DIGITALTWINS Factory
JOIN Line RELATED Factory.hasLine
WHERE Factory.factoryId = 'FAC-001'
Deep traversal: find all sensors in a factory (response time: 200–500ms):
SELECT Sensor, Sensor.lastReading, Sensor.sensorType
FROM DIGITALTWINS Factory
JOIN Line RELATED Factory.hasLine
JOIN Machine RELATED Line.hasMachine
JOIN Sensor RELATED Machine.hasSensor
WHERE Factory.factoryId = 'FAC-001'
Filter on component properties (note: telemetry such as `rotorSpeed` is not stored on the twin and cannot be queried—only properties can):
SELECT Turbine
FROM DIGITALTWINS Turbine
WHERE Turbine.rotor.numberOfBlades = 3
AND Turbine.operationalStatus = 'generating'
Count twins by type:
SELECT COUNT()
FROM DIGITALTWINS T
WHERE IS_OF_MODEL(T, 'dtmi:com:example:factory:Sensor;1')
Limit result size (ADT's query language has no ORDER BY or OFFSET/LIMIT; use TOP, and page larger sets with the continuation tokens the API returns):
SELECT TOP(100) Sensor
FROM DIGITALTWINS Factory
JOIN Line RELATED Factory.hasLine
JOIN Machine RELATED Line.hasMachine
JOIN Sensor RELATED Machine.hasSensor
WHERE Factory.factoryId = 'FAC-001'
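Server-side paging aside, a small client-side helper can bound memory when consuming any large result iterable, including the iterator returned by the Python SDK's `query_twins` (the helper name is mine):

```python
from itertools import islice

def chunks(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

# Offline demo: 250 fake query rows consumed 100 at a time.
rows = ({"$dtId": f"sensor-{i}"} for i in range(250))
sizes = [len(b) for b in chunks(rows, 100)]
print(sizes)  # → [100, 100, 50]
```

Against a live instance this becomes `for batch in chunks(client.query_twins(query), 100): ...`, keeping at most 100 rows in memory at a time.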
Execute a query programmatically (Python):
import json
import os

from azure.identity import DefaultAzureCredential
from azure.digitaltwins.core import DigitalTwinsClient

credential = DefaultAzureCredential()
# e.g. "https://<instance>.api.<region>.digitaltwins.azure.net"
client = DigitalTwinsClient(os.environ["ADT_SERVICE_URL"], credential)

query = """
SELECT Sensor, Sensor.lastReading, Sensor.sensorType
FROM DIGITALTWINS Factory
JOIN Line RELATED Factory.hasLine
JOIN Machine RELATED Line.hasMachine
JOIN Sensor RELATED Machine.hasSensor
WHERE Factory.factoryId = 'FAC-001'
AND Sensor.sensorType = 'temperature'
"""

try:
    # ADT's query language has no ORDER BY; sort client-side instead.
    rows = sorted(client.query_twins(query),
                  key=lambda r: r["lastReading"], reverse=True)
    for row in rows:
        print(json.dumps(row, indent=2))
except Exception as e:
    print(f"Query error: {e}")
Performance characteristics:
| Query Type | Response Time | Typical Result Size | Notes |
|---|---|---|---|
| Simple WHERE (indexed) | 50–100ms | 1 | Scales to millions of twins |
| Single JOIN | 100–200ms | ~100 | Relationship traversal overhead |
| Multi-level JOIN (3+) | 200–500ms | ~1k | Query planner optimizes traversal order |
| Deep traversal + filter | 500–2000ms | ~10k | Avoid on every read; cache results |
| COUNT aggregation | 100–1000ms | N/A | Full scan; avoid in hot loops |
Bottlenecks and coping strategies:
- Query timeout (>60s): Break complex queries into batches. Query “all machines in line A” separately, then “all sensors per machine.”
- Memory explosion: Queries that return millions of twins can exhaust memory. Cap result size with `TOP`, page with continuation tokens, and stream results in batches.
- Stale results: Queries lag 1–5 seconds behind real-time patches. Don't use query results for immediate safety decisions; subscribe to Event Grid for real-time reactions.
Edge Cases, Failure Modes, and Production Readiness
Rate limits and throttling:
ADT enforces per-instance throttling limits, including caps on API requests per second and on relationships per twin (check the current service-limits documentation for exact numbers). Don't under-estimate your telemetry volume; throttling retries are expensive and slow.
- Mitigation: Batch telemetry. If 100 sensors emit every second, that's 100 requests/second. Coalesce messages in the Function so each invocation issues one patch per twin rather than one request per message.
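The batching mitigation can be sketched as a coalescing step: keep only the latest reading per twin within a burst, then emit one patch per twin. The payload field names are illustrative:

```python
def coalesce(messages):
    """Keep only the latest reading per twin; return {twin_id: json_patch}."""
    latest = {}
    for msg in messages:  # later messages overwrite earlier ones
        latest[msg["twin_id"]] = msg
    return {
        twin_id: [
            {"op": "add", "path": "/lastReading", "value": m["value"]},
            {"op": "add", "path": "/lastReadingTimestamp", "value": m["ts"]},
        ]
        for twin_id, m in latest.items()
    }

# Three readings from one sensor collapse into a single patch.
burst = [{"twin_id": "sensor-temp-a01", "value": v, "ts": f"t{v}"}
         for v in (70.1, 71.0, 72.3)]
patches = coalesce(burst)
print(len(patches), patches["sensor-temp-a01"][0]["value"])  # → 1 72.3
```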
Telemetry lag and eventual consistency:
Patches are applied in < 1 second, but queries may lag 1–5 seconds under load. A sensor emits 72.3°C, the twin is patched, but a query 100ms later may still show the old value.
- Mitigation: For real-time alerts, subscribe to Event Grid twin-update (`Microsoft.DigitalTwins.Twin.Update`) events, not queries. Queries are eventually consistent; events are near-real-time.
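A hedged sketch of that subscription path: an Event Grid-triggered Function inspects the JSON Patch carried in the twin-change notification and alerts on threshold crossings. The 80.0 threshold is illustrative, and the pure helper is separated out so it runs without the Functions runtime:

```python
def exceeds_threshold(patch_ops, path="/lastReading", limit=80.0):
    """True when the applied JSON Patch pushed `path` above `limit`."""
    return any(op.get("path") == path and float(op.get("value", 0)) > limit
               for op in patch_ops)

try:
    import logging
    import azure.functions as func  # present inside the Functions runtime

    app = func.FunctionApp()

    @app.event_grid_trigger(arg_name="event")
    def on_twin_update(event: func.EventGridEvent):
        data = event.get_json()  # Digital Twins change-notification payload
        if exceeds_threshold(data.get("patch", [])):
            logging.warning("High reading on %s", event.subject)
except ImportError:
    pass  # running outside an Azure Functions environment
```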
Twin model version mismatch:
If you deploy a new model version (Sensor;2 with new properties), existing twins are still Sensor;1. Queries that filter on Sensor;2 properties fail silently.
- Mitigation: Version model IDs. Keep the old model available for 30+ days. Migrate twins gradually: create new `Sensor;2` twins, decommission `Sensor;1` twins on a schedule. Use deployment pipelines to manage this.
DTDL schema validation failures:
If a telemetry payload doesn’t match the schema (e.g., sending a string instead of a double), the Function silently fails (or fails with a vague HTTP 400).
- Mitigation: Add strict validation in the Function before calling ADT. Log schema mismatches to Application Insights. Set up alerts on Function errors.
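A minimal validation sketch along those lines, with the required fields mirroring this tutorial's sensor payload (the validator shape is my own, not a library API):

```python
EXPECTED = {"device_id": str, "temperature": (int, float), "timestamp": str}

def validate(payload):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for field, typ in EXPECTED.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], typ):
            problems.append(f"bad type for {field}: {type(payload[field]).__name__}")
    return problems

# A good payload passes; a string temperature and missing timestamp are caught.
print(validate({"device_id": "TEMP-A01", "temperature": 72.3, "timestamp": "t"}))  # → []
print(validate({"device_id": "TEMP-A01", "temperature": "hot"}))
```

In the ingestion Function, log the returned problems to Application Insights and skip the ADT call when the list is non-empty.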
Cold starts in serverless functions:
Azure Functions on the Consumption plan can take several seconds (occasionally tens of seconds) to initialize after idle time. The first telemetry message after a quiet period is slow.
- Mitigation: For latency-sensitive workloads, use a Premium or Dedicated plan (at materially higher cost), or schedule a "keep-warm" invocation every few minutes.
Graph bloat and query performance:
A fleet of 1 million sensors creates 1 million twins. A deep traversal (factory → line → machine → sensor) touches millions of edges. Query time balloons.
- Mitigation: Keep the graph shallow (3–4 levels). Consider aggregate twins: instead of a twin per sensor, create a twin per machine with an embedded sensor array in a property. Queries stay sub-second and twin count (and storage cost) drops by an order of magnitude, at the price of losing per-sensor relationships and queries.
Cost surprises:
ADT metering is per operation: queries cost, relationship traversals cost, ingestion costs. A 100-twin graph with 10k events/hour and 10 queries/hour costs ~$5/month. A 100k-twin graph with 1M events/hour and 1k queries/hour costs ~$300/month.
- Mitigation: Estimate cost before production. Monitor consumption in the Azure portal. Set up budget alerts.
Real-World Implications: Beyond the Toy Example
Digital twins are not just reporting dashboards. A twin is a runtime entity that can:
- Receive commands: A twin model can declare a command, e.g., `setTemperatureTarget(newTarget)`. ADT does not execute commands itself; a handler (e.g., a Function) relays the request to the physical device via IoT Hub direct methods and returns the result.
- Emit analytics: Downstream systems (Stream Analytics, Databricks) subscribe to Event Grid and compute aggregations (average temperature per machine per hour) and anomalies (vibration above a threshold triggers an alert).
- Drive control loops: A feedback system reads twin state, computes a correction, and sends it back to the physical device. E.g., "gearbox temp is 95°C; reduce turbine output 10% for the next 5 minutes."
- Federate across domains: Multiple systems (MES, ERP, SCADA) can reference the same twin graph as a shared source of truth. No more data silos.
Integration with AI/LLM systems:
LLMs can query ADT twins to understand asset topology before making recommendations. Example: “The gearbox temperature is 105°C. Retrieve the gearbox maintenance history and predict remaining useful life.” An LLM plugin calls the ADT query API, retrieves the gearbox twin and related maintenance logs, and grounds the analysis in real data.
Data residency and compliance:
ADT instances are regional. If you operate globally, you deploy multiple ADT instances (US, EU, APAC) and sync critical twins via API. Each region has its own Event Grid and storage. This satisfies data residency requirements (GDPR, etc.) but requires app logic to handle eventual consistency across regions.
FAQ
Q: DTDL v3 vs. v2—should I migrate?
Migrate if you need semantic annotations or component support. v3 is backward-compatible for most schemas. Start new projects in v3. Migration is low-risk; old v2 twins continue to work alongside v3 twins.
Q: Can I use Cosmos DB instead of ADT?
You can store the twin graph in Cosmos DB as JSON documents and build relationship queries in code. Downside: no native relationship traversal, no Event Grid integration, higher latency for graph queries. ADT is purpose-built for this. Use Cosmos if you need ACID transactions or complex joins across unrelated domains.
Q: How do I sync ADT with a physical PLC or SCADA system?
Add an Azure Function that reads from OPC-UA (industrial protocol) on an interval (every 5 seconds) and pushes patches to ADT. Another Function subscribes to ADT command events and writes command outputs back to the PLC via OPC-UA. Latency is ~100–500ms round-trip depending on network and PLC response time.
Q: What if I need sub-second real-time analytics?
Use Event Grid to fan telemetry to Azure Stream Analytics or Apache Kafka alongside ADT. Stream Analytics computes windowed aggregations (average, max) in sub-second latency; ADT maintains the entity graph and metadata. They complement each other.
Q: How do I version twin models in production?
Create versioned model IDs: dtmi:com:example:factory:Machine;1, Machine;2, etc. When you release a new schema, you deploy the new model but leave the old one active. New twins use the new model. Old twins continue on the old model. On a schedule (e.g., 90 days), decommission old twins. Keep model definitions in version control for audit trails.
Q: How do I handle stale twins (assets decommissioned)?
Mark twins with a decommissionedDate property. Queries filter them out: WHERE Sensor.decommissionedDate IS NULL. After 30 days, delete them from ADT. Event Grid publishes a twin.deleted event; downstream systems can react (remove from dashboards, etc.).
Further Reading and Related Topics
- Azure Digital Twins Documentation — Official reference; covers all APIs, query language, and deployment patterns.
- DTDL Specification (GitHub) — DTDL language reference; includes v3 additions and semantic annotations.
- Digital Twins Query Language Reference — SQL-like query syntax, performance tuning, pagination.
- W3C SOSA/SSN Ontology — Standard semantic vocabulary for sensors, observations, and actuations; useful when aligning DTDL models with external semantics.
- See also: Unified Namespace Architecture for Industrial IoT — Complements ADT with MQTT pub/sub for brownfield PLC networks.
- See also: OPC-UA vs. MQTT Sparkplug B: Protocol Trade-Offs — Protocol-level integration options for PLC data.
- See also: Event Grid Patterns for IoT — Real-time event fanout and processing; pairs with ADT for downstream reactions.
Conclusion
You now have the conceptual foundation (layers 1–5), the first-principles reasoning (why DTDL, why ADT’s design), the code to deploy and query twins, and the gotchas to avoid. A digital twin is not a static model—it’s a runtime entity that stays synchronized with physical state, can be queried and commanded, and feeds analytics, control loops, and AI systems.
Start with a small pilot (one factory, 10–100 twins) to validate your telemetry ingestion and query patterns. Monitor costs and performance under realistic load. Scale to thousands of twins only after you have a repeatable deployment and operations playbook.
Ship it this quarter.
