Azure Time Series Databases: Data Explorer & Cosmos DB Architecture (2026)
Azure Time Series Insights (TSI) is gone. For good. After a deprecation notice in 2023, Microsoft officially retired Azure TSI on March 31, 2025—and with it went one of Azure’s most opinionated time-series platforms. If your IoT pipeline depended on Gen2 environments, the forced migration is now complete, and your options have splintered across three very different Microsoft products: Azure Data Explorer (ADX), Cosmos DB with intelligent sharding, and Real-Time Intelligence in Microsoft Fabric.
This is not a seamless story. Each platform optimizes for different workload signatures, pricing models, and operational complexity tiers. This post decodes the post-TSI landscape and provides a working reference architecture to help you pick the right Azure time-series database for 2026 IoT analytics—whether that’s ingesting 10,000 events per second from industrial sensors, querying multi-region device state with sub-millisecond latency, or running unified analytics over months of compressed telemetry.
We’ll focus on Azure time series databases in production, covering the exact KQL queries you’ll use in ADX, the sharding patterns that make Cosmos work for time-series, and how Fabric’s new Eventhouse pricing and topology differ from the standalone clusters you may have run.
What this post covers:
- The three-platform successor strategy to Azure TSI and why Microsoft split the market.
- A reference architecture for IoT telemetry ingestion → storage → query.
- Deep dives into ADX (Kusto) column stores, materialized views, and time-series–specific operators.
- Cosmos DB patterns for low-latency multi-region time-series reads.
- Fabric Real-Time Intelligence Eventhouse pricing and migration paths.
- Trade-offs and failure modes in production.
- Sizing guidance and a decision tree to pick your platform.
The Post-Time-Series-Insights Landscape
Azure Time Series Insights launched in 2017 as a fully managed, schema-less time-series query engine. Gen2 environments could ingest millions of events per second, and the JavaScript SDK hid complex ingestion details behind a simple REST API. It was opinionated: the service owned your schema, your retention, your query language. It was simple to use—until the bill arrived or you needed a JOIN to Cosmos data.
Microsoft deprecated TSI in March 2023 and shut down all Gen2 environments on March 31, 2025. The announcement marked a strategic pivot: instead of owning a single time-series silo, Microsoft folded time-series workloads into three existing platforms, each with its own strengths and pricing model.
The Three-Pronged Successor Strategy
1. Azure Data Explorer (ADX / Kusto)
ADX is the heavyweight: a distributed, column-oriented database optimized for analytics workloads. Born as an internal Microsoft tool (Project Kusto), it went GA in 2019 and underpins much of Azure’s own observability (Application Insights, Log Analytics, Sentinel). For time-series, ADX shines because:
- It ingests at massive scale (100k–1M+ events/sec per cluster) at <$0.50 per GB stored.
- Kusto Query Language (KQL) includes native time-series operators: make-series, series_decompose, series_outliers, and statistical functions built for windowed aggregations.
- Materialized views pre-compute rolling aggregations, eliminating the need to scan raw data for common queries.
- Hot/cold cache policies let you keep recent days in-memory (SSD) and archive older data cheaply.
- Multi-region failover is built in; a single Kusto cluster can replicate to standby clusters in other regions.
The trade-off: query latency for cold data can reach 1–2 seconds; there’s a KQL learning curve (it’s not SQL); and cluster management isn’t zero-touch.
2. Cosmos DB with Time-Series Optimizations
Cosmos DB is Azure’s multi-model NoSQL database, and it’s not designed for time-series per se. But Microsoft documents time-series modeling and partitioning patterns for Cosmos, and some teams find it irresistible because:
- You get <10 ms point reads and writes globally, with automatic multi-region replication.
- The change feed is a built-in event stream; you can attach an Azure Function to roll up summaries in real time.
- If your queries are primarily “get the last 5 minutes of Device X” rather than “scan 6 months of all devices,” Cosmos costs far less than ADX.
The catch: Cosmos charges per request unit (RU), and a time-window scan (e.g., SELECT * WHERE deviceId = 'X' AND timestamp BETWEEN yesterday AND today) can burn thousands of RUs. You need to shard carefully by device + time bucket. Cold-tier reads are effectively impossible; Cosmos is a hot-data store.
3. Real-Time Intelligence in Microsoft Fabric
Fabric (generally available since November 2023) is Microsoft’s unified analytics platform, and Real-Time Intelligence is the time-series pillar. It bundles:
- Eventhouse: a KQL database (same engine as ADX, different hosting).
- Eventstream: managed streaming with connectors to Event Hub, IoT Hub, and custom apps.
- KQL Querysets and Real-Time Dashboards for ad-hoc exploration.
- Tight integration with Power BI and Copilot.
Fabric pricing is capacity-based (F2–F2048 SKUs) and can be cheaper than ADX clusters if you’re willing to share compute with BI workloads. Migration from standalone ADX to Fabric Eventhouse is one-way (you move the cluster; you don’t rebuild), and Fabric manages scaling and patching for you.
Why the TSI Shutdown Forced a Choice
The short answer: TSI was a feature, not a platform. It couldn’t grow beyond its original scope (pure time-series with no joins, no long-term storage, no BI integration). ADX, Cosmos, and Fabric are all platforms—they can do time-series as one workload among many. Microsoft’s bet is that you’ll use time-series as part of a larger cloud analytics footprint, so bundling it into general-purpose platforms saves R&D and licenses.
Reference Architecture for Azure IoT Time Series
Here’s the canonical flow for an IoT telemetry pipeline on Azure:

Data flow:
- IoT Devices (sensors, gateways, edge nodes) emit telemetry: temperature, pressure, vibration, error codes.
- Azure IoT Hub (or Event Hubs) buffers and routes messages at scale (millions/sec).
- Stream Analytics or Event Grid enriches, filters, or correlates events in flight. (Stream Analytics is heavier but supports windowed aggregations; Event Grid is lighter and integrates with serverless.)
- Azure Data Explorer cluster (or Eventhouse in Fabric) ingests enriched events. Data lands in a raw table (e.g., telemetry_raw) and flows through update policies to roll up into materialized aggregates (telemetry_1min_summary, telemetry_hourly).
- Query layer: Power BI connected to ADX for dashboards, Grafana via the KQL connector, or custom apps calling the KQL REST API.
- Cold tier: Older data (>30–90 days) moves to Azure Data Lake Storage Gen2 via an external table, accessed on-demand for forensic queries or model training.
Why this architecture:
- Separation of concerns: ingestion (IoT Hub) → processing (Stream Analytics) → storage (ADX) → consumption (Power BI/Grafana).
- Scalability: each layer scales independently. IoT Hub can handle 1M msgs/sec; ADX can ingest them in parallel batches; Power BI can query down to the summary layer.
- Cost: raw event data is stored once in ADX at ~$0.40/GB/month; summaries use materialized views so you never re-scan raw data. Cold data in ADLS costs ~$0.02/GB/month—a 20x savings.
- Compliance: ADLS external tables are audit-logged separately; ADX access controls map to Azure RBAC.
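The 20x hot/cold cost gap above can be sanity-checked with a back-of-envelope model. A minimal Python sketch, using the approximate $/GB rates quoted above (illustrative figures, not official Azure prices):

```python
# Back-of-envelope storage cost model for the hot (ADX) vs cold (ADLS) split.
ADX_HOT_PER_GB = 0.40    # ~$0.40/GB/month stored in ADX
ADLS_COLD_PER_GB = 0.02  # ~$0.02/GB/month archived in ADLS Gen2

def monthly_storage_cost(total_gb: float, hot_fraction: float) -> float:
    """Cost of keeping `hot_fraction` of the data in ADX and the rest in ADLS."""
    hot_gb = total_gb * hot_fraction
    cold_gb = total_gb - hot_gb
    return hot_gb * ADX_HOT_PER_GB + cold_gb * ADLS_COLD_PER_GB

# 10 TB of telemetry, with the last 30 of 365 days kept hot:
all_hot = monthly_storage_cost(10_000, 1.0)      # everything in ADX
tiered = monthly_storage_cost(10_000, 30 / 365)  # ~8% hot, rest in ADLS
print(f"all hot: ${all_hot:,.0f}/mo, tiered: ${tiered:,.0f}/mo")
```

With these rates, tiering 10 TB drops monthly storage from about $4,000 to roughly $500.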
Azure Data Explorer (Kusto) Deep Dive
Azure Data Explorer is the Swiss Army knife for time-series on Azure. Its column-store architecture (in the same family as analytical engines like Vertica) makes it extraordinarily fast for range scans.
Cluster Topology and Hot/Cold Cache
An ADX cluster is divided into ingestion nodes and query nodes. Ingestion nodes accept data from Event Hub, IoT Hub, or direct REST pushes; they batch incoming events (by default up to several minutes or until a size threshold is hit) and write to the distributed storage layer. Query nodes serve read requests and keep a hot cache of recent data on local SSD, while older data sits in persistent storage on Azure Storage blobs.

By default, the last 30 days of data live in hot cache; queries on recent dates complete in 10–100 ms. Queries that touch persistent storage (data >30 days old) may take 1–2 seconds because the query engine must fetch data from blobs, decompress, and stream it.
Kusto Query Language (KQL) for Time Series
KQL is SQL-adjacent but not SQL. Time-series queries are where KQL shines. Here are the essentials:
Basic aggregation by time window:
telemetry_raw
| where deviceId == "sensor-42" and Timestamp >= ago(7d)
| summarize AvgTemp = avg(temperature), MaxPressure = max(pressure)
by bin(Timestamp, 1h)
| render timechart
This groups all events for sensor-42 in the last 7 days into 1-hour buckets and computes rolling averages.
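The bin() grouping is easy to reason about if you have ever bucketed timestamps by hand. A rough Python equivalent of bin(Timestamp, 1h) plus avg(), for intuition only (not how the Kusto engine executes it):

```python
from datetime import datetime, timezone
from collections import defaultdict

def bin_hourly(ts: datetime) -> datetime:
    """Floor a timestamp to its 1-hour bucket, like KQL's bin(Timestamp, 1h)."""
    return ts.replace(minute=0, second=0, microsecond=0)

def hourly_avg(events):
    """events: iterable of (timestamp, temperature) pairs -> {bucket: average}."""
    sums = defaultdict(lambda: [0.0, 0])  # bucket -> [running sum, count]
    for ts, temp in events:
        bucket = bin_hourly(ts)
        sums[bucket][0] += temp
        sums[bucket][1] += 1
    return {b: total / n for b, (total, n) in sums.items()}

readings = [
    (datetime(2026, 4, 24, 12, 5, tzinfo=timezone.utc), 23.0),
    (datetime(2026, 4, 24, 12, 50, tzinfo=timezone.utc), 25.0),
    (datetime(2026, 4, 24, 13, 10, tzinfo=timezone.utc), 24.0),
]
print(hourly_avg(readings))
```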
Time-series decomposition and anomaly detection:
telemetry_raw
| where deviceId == "sensor-42"
| where Timestamp >= ago(30d)
| make-series AvgTemp = avg(temperature) on Timestamp from ago(30d) to now() step 1h
| extend (Baseline, Seasonal, Trend, Residual) = series_decompose(AvgTemp)
| extend AnomalyScore = series_outliers(Residual)
| render timechart with (title="30-day temperature anomalies for sensor-42")
make-series fills missing time buckets with a default value (zero unless you override it), so you get a regular grid—essential for decomposition. series_decompose splits the signal into baseline, seasonal pattern, long-term trend, and residual noise. series_outliers scores each residual by its distance from the interquartile range (Tukey’s fence); scores beyond ±1.5 are conventionally treated as anomalies, and beyond ±3 as strong anomalies.
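The fence logic behind series_outliers can be sketched in a few lines of Python to build intuition for what gets flagged. This is an illustration of Tukey's fence on the residual series, not Kusto's exact implementation:

```python
from statistics import quantiles

def flag_outliers(residuals, k=1.5):
    """Flag residuals outside Tukey's fence: [Q1 - k*IQR, Q3 + k*IQR].
    Same idea as Kusto's series_outliers, which scores points by their
    distance from the interquartile range."""
    q1, _, q3 = quantiles(residuals, n=4)  # quartiles of the residual series
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [r < lo or r > hi for r in residuals]

residuals = [0.1, -0.2, 0.0, 0.3, -0.1, 9.5, 0.2]  # one obvious spike
print(flag_outliers(residuals))
```

Only the 9.5 spike lands outside the fence; the small residuals around zero pass.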
Materialized Views for Continuous Aggregation:
Instead of re-scanning raw data on every query, you can define a materialized view that pre-aggregates as data arrives:
.create-or-alter materialized-view telemetry_1h_summary on table telemetry_raw
{
telemetry_raw
| summarize
AvgTemp = avg(temperature),
MaxTemp = max(temperature),
MinTemp = min(temperature),
Count = count()
by deviceId, bin(Timestamp, 1h)
}
When a new batch of events lands in telemetry_raw, the view automatically updates. A query against telemetry_1h_summary runs in <100 ms instead of scanning GB of raw data.
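The view's behavior is easy to model: aggregates are folded in incrementally as each batch arrives, never by re-scanning history. A rough in-process Python analogue, where the state dict plays the role of telemetry_1h_summary:

```python
from collections import defaultdict

# Running per-(device, hour) aggregates, updated batch by batch.
state = defaultdict(lambda: {"sum": 0.0, "min": float("inf"),
                             "max": float("-inf"), "count": 0})

def ingest_batch(batch):
    """batch: iterable of (device_id, hour_bucket, temperature)."""
    for device, hour, temp in batch:
        agg = state[(device, hour)]
        agg["sum"] += temp
        agg["min"] = min(agg["min"], temp)
        agg["max"] = max(agg["max"], temp)
        agg["count"] += 1

def summary(device, hour):
    """Read the pre-aggregated row; no raw-data scan needed."""
    agg = state[(device, hour)]
    return {"avg": agg["sum"] / agg["count"], "min": agg["min"],
            "max": agg["max"], "count": agg["count"]}

ingest_batch([("sensor-42", "12:00", 23.0), ("sensor-42", "12:00", 25.0)])
ingest_batch([("sensor-42", "12:00", 24.0)])
print(summary("sensor-42", "12:00"))
```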
Retention and Caching Policies
You control how long data stays hot and when it gets archived:
.alter table telemetry_raw policy caching hot = 30d
This keeps 30 days in hot cache. Older data stays in persistent storage but can be queried (slower). You can also set a soft retention limit:
.alter table telemetry_raw policy retention softdelete = 730d
This tells ADX to delete data older than 730 days (2 years).
Update Policies for ETL in Place
An update policy is a trigger that fires when new data lands in a source table and automatically updates another table (usually a summary):
.alter table telemetry_1h_summary policy update
@'[{"Source": "telemetry_raw", "Query": "telemetry_raw | summarize AvgTemp = avg(temperature), Count = count() by deviceId, bin(Timestamp, 1h)", "IsEnabled": true, "IsTransactional": false}]'
This is how you build a real-time aggregation pipeline inside ADX without external orchestration (Synapse, Databricks, etc.). One caveat: the update-policy query runs only over each newly ingested batch, so an hour bucket that spans several batches produces several partial rows; either merge them at query time or use a materialized view, which handles that deduplication for you.
Ingestion Performance Tuning
ADX batches incoming events to avoid tiny writes. You control batch timing:
.alter table telemetry_raw policy ingestionbatching @'{"MaximumBatchingTimeSpan": "00:00:30", "MaximumNumberOfItems": 500000, "MaximumRawDataSizeMB": 1024}'
Larger batch sizes reduce fragmentation and improve query performance but increase latency. For real-time dashboards, you may lower the timeout to 5–10 seconds; for batch reporting, you can extend it to 60 seconds.
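The policy's two main knobs, item count and timeout, behave like any flush-on-threshold buffer. A simplified Python sketch of that logic (IngestBuffer is a hypothetical stand-in for intuition, not the ADX ingestion client):

```python
import time

class IngestBuffer:
    """Flush a batch when it reaches max_items or max_seconds, mirroring the
    ingestionbatching policy's item-count and timeout knobs (sketch only)."""
    def __init__(self, max_items=500_000, max_seconds=30.0, clock=time.monotonic):
        self.max_items = max_items
        self.max_seconds = max_seconds
        self.clock = clock
        self.items = []
        self.opened_at = None
        self.flushed = []  # stand-in for batches written to storage

    def add(self, event):
        if not self.items:
            self.opened_at = self.clock()  # batch timer starts on first event
        self.items.append(event)
        if (len(self.items) >= self.max_items
                or self.clock() - self.opened_at >= self.max_seconds):
            self.flush()

    def flush(self):
        if self.items:
            self.flushed.append(self.items)
            self.items = []

buf = IngestBuffer(max_items=3, max_seconds=30.0)
for e in range(7):
    buf.add(e)
buf.flush()  # final partial batch
print([len(b) for b in buf.flushed])
```

Smaller thresholds mean fresher data but more, smaller batches—exactly the fragmentation trade-off described above.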
Cosmos DB Time-Series Patterns
Cosmos DB is a multi-model NoSQL store: it speaks SQL (Core API), MongoDB, Cassandra, and Gremlin. None of those APIs are optimized for time-series out of the box, but you can make Cosmos work for time-series if you shard intelligently and accept its constraints.
Sharding Strategy: Device + Time Bucket
The key insight: partition by deviceId + time_bucket(hour) instead of just deviceId. A single hour of data for a single device is small (usually <10 MB); a single device’s entire year is massive, can blow past Cosmos’s 20 GB logical-partition limit, and will fragment across multiple physical partitions.
Example document:
{
"id": "sensor-42#2026-04-24T00:00:00Z",
"pk": "sensor-42#2026-04-24T00",
"deviceId": "sensor-42",
"hour": "2026-04-24T00:00:00Z",
"events": [
{"timestamp": "2026-04-24T00:00:15Z", "temperature": 23.4, "pressure": 1013.2},
{"timestamp": "2026-04-24T00:00:30Z", "temperature": 23.5, "pressure": 1013.1},
...
],
"summary": {
"avgTemp": 23.45,
"maxTemp": 24.1,
"minTemp": 22.8,
"eventCount": 3600
}
}
Partition key is pk = "sensor-42#2026-04-24T00". A query for the last hour of sensor-42 targets a single partition:
SELECT * FROM c
WHERE c.pk = "sensor-42#2026-04-24T00"
ORDER BY c.hour DESC
Cost: ~10 RUs. Without time-bucketing (just deviceId), the same query would require scanning 365 days of data and cost thousands of RUs.
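The key construction and fan-out logic live in your application code. A minimal Python sketch, assuming the deviceId#YYYY-MM-DDTHH key format shown above (partition_key and window_keys are hypothetical helpers, not Cosmos SDK functions):

```python
from datetime import datetime, timedelta, timezone

def partition_key(device_id: str, ts: datetime) -> str:
    """Build the device#hour key, e.g. 'sensor-42#2026-04-24T00'."""
    return f"{device_id}#{ts.astimezone(timezone.utc):%Y-%m-%dT%H}"

def window_keys(device_id: str, start: datetime, end: datetime):
    """Enumerate every hour bucket covering [start, end) so a multi-hour read
    fans out to a known list of partitions instead of a cross-partition scan."""
    ts = start.replace(minute=0, second=0, microsecond=0)
    keys = []
    while ts < end:
        keys.append(partition_key(device_id, ts))
        ts += timedelta(hours=1)
    return keys

start = datetime(2026, 4, 23, 22, 30, tzinfo=timezone.utc)
end = datetime(2026, 4, 24, 1, 0, tzinfo=timezone.utc)
print(window_keys("sensor-42", start, end))
```

A three-hour window resolves to three known partition keys, so each read stays a cheap single-partition query.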
Real-Time Aggregation via Change Feed
Cosmos’s change feed is an ordered log of all inserts and updates. You can attach an Azure Function to listen for new events:
from statistics import mean

def aggregate_events(events, rollup_container):
    # Change-feed handler: fold each raw hour-bucket document into a daily rollup.
    for event in events:
        raw_doc = event["body"]
        day = raw_doc["hour"].split("T")[0]
        rollup_id = f"{raw_doc['deviceId']}#daily#{day}"
        # upsert_item is the Cosmos Python SDK call for insert-or-replace
        rollup_container.upsert_item({
            "id": rollup_id,
            "pk": raw_doc["deviceId"],
            "date": day,
            "eventCount": len(raw_doc["events"]),
            "avgTemp": mean(e["temperature"] for e in raw_doc["events"]),
        })
The function reads from the raw events container’s change feed and writes to a rollups container. This is a poor man’s materialized view, but it works and costs only the RUs for the rollup write, not the raw data scan.
When NOT to Use Cosmos for Time Series
- Queries that span weeks or months over many devices: Cosmos will burn RUs scanning thousands of partitions.
- Analytical queries with JOIN: NoSQL lacks the optimizer that makes ADX fast.
- Audit trails and immutability: Cosmos supports soft deletes, but ADX’s multi-tier storage and retention policies are better for compliance.
Cosmos is best for:
- Operational time-series: “get the latest reading from device X” (10 RUs).
- Multi-region low-latency writes: turn on multi-region writes, and every region writes independently.
- Hybrid workloads: you’re already using Cosmos for user profiles or inventory, and you want to append time-series without a new database.
Real-Time Intelligence in Microsoft Fabric
Microsoft Fabric is a unified analytics platform, generally available since November 2023. Its Real-Time Intelligence component is built on the same Kusto engine as ADX but hosted differently, priced differently, and integrated with Power BI and Copilot.
Architecture: Eventstream → Eventhouse → Real-Time Dashboard

Eventstream is a managed event broker. It connects to:
- Azure Event Hubs (your existing pipelines feed here).
- Azure IoT Hub (direct device connectivity).
- Kafka clusters (if you’re already Kafka-native).
- Custom apps (REST ingestion).
An Eventstream is just a routing layer; it doesn’t store data.
Eventhouse is the KQL database. It’s identical to an ADX cluster under the hood—same column store, same KQL, same materialized views. But instead of paying for a dedicated cluster (D11_v2 = ~$2k/month), you pay for Fabric capacity units:
| SKU | Cost | vCPU Equiv. | Good For |
|---|---|---|---|
| F2 | $315/month | 0.5 | Dev/test (1–10 GB/day ingestion) |
| F4 | $630/month | 1 | Small production (10–100 GB/day) |
| F8 | $1,260/month | 2 | Medium production (100–500 GB/day) |
| F16 | $2,520/month | 4 | Large production (500 GB–2 TB/day) |
| F32+ | $5,040+/month | 8+ | Enterprise (>2 TB/day) |
Capacity is shared: if you’re also running Power BI reports or Synapse analytics on the same F4 capacity, they all contend for compute. That’s a feature (unified billing) and a bug (blast radius).
KQL Querysets and Real-Time Dashboards
A KQL Queryset is an ad-hoc explorer—you write KQL directly in Fabric and see results instantly. It’s like a Jupyter notebook but for time-series queries.
A Real-Time Dashboard is a Fabric native (Power BI integrated) dashboard that queries an Eventhouse. It refreshes every few seconds and can show live KQL tables, charts, and time-series decompositions.
Migration Path from ADX to Fabric
- Create an Eventhouse in Fabric and configure it with the same ingestion settings as your ADX cluster.
- Pause your ADX cluster ingestion (but leave it running for backup).
- Reroute Eventstream to the Eventhouse.
- Export your KQL queries (they’re 100% compatible).
- Run historical backfill: use the ADX export tool to bulk-copy old data into Eventhouse (takes hours for TB-scale datasets).
- Validate queries on Eventhouse, then decommission the ADX cluster (saves $2k/month).
The migration is not a lift-and-shift: you’re buying capacity, not a cluster. Fabric owns the infrastructure; you own the Eventhouse.
Trade-Offs and Failure Modes
Azure Data Explorer
Pros: Cheapest per-GB for large-scale ingestion, best-in-class KQL, hot/cold tiering, multi-region failover.
Cons:
- Cold queries (>30 days old) take 1–2 seconds.
- Cluster management overhead: scaling, patching, heartbeat monitoring.
- Batch ingestion latency (up to several minutes with the default batching policy); real-time ingest requires aggressive batch tuning and costs more.
- KQL learning curve; SQL teams will struggle with summarize and make-series.
Cosmos DB
Pros: <10 ms point reads, global multi-region replication, change feed automation.
Cons:
- RU costs blow up on aggregation queries. A single SELECT * FROM c WHERE c.deviceId = 'X' AND c.timestamp BETWEEN ... AND ... on 30 days of data costs 10k+ RUs (~$5).
- No built-in time-series operators (no make-series, no anomaly detection).
- Time-bucketing requires application-side logic; no native time-window functions.
- Cold tier reads are impractical (same RU cost as hot data, but slow).
Real-Time Intelligence in Fabric
Pros: Unified billing with Power BI, automatic scaling, Copilot integration (coming soon).
Cons:
- Shared capacity contention; a Power BI refresh can starve your real-time queries.
- Fabric is still evolving; some ADX features (external tables, cluster federation) are missing.
- Pricing is opaque: capacity burst costs vary; long-running queries can spike unexpectedly.
- No granular access control; Fabric RBAC is workspace-level, not query-level.
Production Recommendations
Pick ADX if:
- Ingestion is >1k events/sec across any number of devices.
- Queries need time-series operators (decomposition, anomaly detection, forecasting).
- You need cold-tier analytics (6–12 month queries for forensics or model training).
- Audit and compliance are strict (retention policies, external table audit trails).
Cluster sizing:
- D11_v2 (2 cores, 14 GB RAM): Dev/test, <100 GB/day ingestion, <10 concurrent queries.
- E2a_v4 (2 cores, 16 GB RAM): Starter production, 100–500 GB/day, multi-tenant shared queries.
- L8s (8 cores, 64 GB RAM): Production at scale, 500 GB–2 TB/day, analytics workloads with 50+ concurrent users.
Pick Cosmos DB if:
- Operational queries dominate (“get the last reading for device X”).
- Multi-region active-active writes are non-negotiable.
- You’re already using Cosmos for user data, orders, or inventory—add time-series as a secondary workload.
Partition strategy:
- pk = deviceId + "#" + timeStringHour(timestamp) (e.g., "sensor-42#2026-04-24T12").
- Store raw events in an events container; pre-aggregate to dailySummary and hourlySummary containers via change feed.
Pick Fabric Real-Time Intelligence if:
- You’re already on Fabric for Power BI and Synapse.
- Unified billing and capacity are attractive (vs. managing separate cluster costs).
- Real-time dashboards in Power BI native are your primary consumption.
Capacity guidance:
- Start with F4 (1 vCPU equivalent, $630/month) for <100 GB/day.
- Move to F8 or F16 as you add concurrent Power BI reports and SQL queries.
Decision Tree

Trade-Offs and Failure Modes (continued)
Cross-Region Consistency in Cosmos
If you enable multi-region writes, cross-region reads are eventually consistent. A write in US-East typically replicates to Europe-West within a few hundred milliseconds to a few seconds. An immediate query in Europe may see stale data. If your dashboards are regional (US East only), this is fine; if you’re aggregating global data, budget for a few seconds of lag.
KQL Batch Ingestion Latency
ADX batches events to reduce write fragmentation. With default settings, a single event can sit in the ingestion buffer for minutes before being flushed to storage. If you need true real-time dashboards (sub-second latency), you must:
- Lower the ingestionbatching timeout to 5–10 seconds.
- Accept that smaller batches = more write fragmentation = slower queries.
- Alternatively, use Eventhouse in Fabric with optimized batch settings (usually 10–30 sec).
Power BI on Fabric Capacity Contention
If your Eventhouse and Power BI reports share capacity, a heavy BI refresh can starve your real-time dashboard. Mitigation:
- Separate capacity for operational real-time (F4) and BI reporting (F8+).
- Query caching in Power BI (1–5 min TTL) to reduce load on Eventhouse.
- Materialized views in Eventhouse so Power BI hits pre-aggregates, not raw telemetry.
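The query-caching mitigation is just a TTL cache in front of the Eventhouse. A minimal Python sketch of the pattern (TTLCache is illustrative, not a Power BI or Fabric API):

```python
import time

class TTLCache:
    """Serve a cached query result for `ttl_seconds` so dashboards don't
    hammer the Eventhouse on every refresh (sketch only)."""
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # key -> (cached_at, result)

    def get_or_run(self, key, run_query):
        now = self.clock()
        hit = self.store.get(key)
        if hit and now - hit[0] < self.ttl:
            return hit[1]          # fresh enough: serve from cache
        result = run_query()       # stale or missing: hit the Eventhouse
        self.store[key] = (now, result)
        return result

calls = 0
def expensive_query():
    global calls
    calls += 1
    return {"avgTemp": 23.4}

fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get_or_run("q1", expensive_query)
cache.get_or_run("q1", expensive_query)   # served from cache
fake_now[0] = 61.0
cache.get_or_run("q1", expensive_query)   # TTL expired, re-runs
print(calls)
```

Two underlying query executions for three dashboard refreshes; with a 1–5 minute TTL the savings compound quickly.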
FAQ
Q: Is Azure Time Series Insights really gone?
A: Yes. Gen2 environments were shut down March 31, 2025. Microsoft announced the deprecation in March 2023, giving a 2-year migration window. If you’re still on TSI, migrate immediately: ADX handles the workload cheaply and adds time-series operators TSI never had.
Q: Do I need ADX or can I use Cosmos for time-series?
A: Cosmos is viable only if your queries are primarily point reads (“latest value for device X”) or you’re bundling time-series with other Cosmos workloads. For analytics queries spanning days/weeks across many devices, ADX is faster and cheaper. Cosmos is a tactical choice, not a strategic one.
Q: What’s the difference between ADX and Real-Time Intelligence in Fabric?
A: They run the same KQL engine. ADX is a dedicated cluster you pay for by compute size; Fabric Real-Time is capacity you share with Power BI and Synapse. Choose ADX if you want isolation and full control; choose Fabric if you’re already paying for Fabric capacity and want unified billing.
Q: How does ADX pricing work?
A: You pay for cluster compute (e.g., D11_v2 = ~$2,200/month) plus storage (~$0.40/GB/month for hot data, as above, and ~$0.02/GB/month once archived to ADLS). For 1 TB of hot data on a D11_v2 cluster, expect roughly $2.2k compute + $400 storage ≈ $2.6k/month. Compute dominates the bill, and that buys a fully managed, redundant cluster with sub-100 ms queries on cached data. Compare to Snowflake or BigQuery at similar scale.
Q: Can I use Postgres with TimescaleDB on Azure instead?
A: Yes, Azure Database for PostgreSQL with the TimescaleDB extension is a solid alternative. You get hypertables, continuous aggregates, and compression out of the box. TimescaleDB is cheaper than ADX for small datasets (<100 GB) but slower at scale (>1 TB). See TimescaleDB hypertables, chunks, and compression for a deep dive. Single-node Postgres doesn’t scale horizontally the way ADX does.
Q: What’s the migration path from TSI Gen2?
A: Export your historical data from TSI (via REST API, 100k events/batch); ingest into ADX or Eventhouse in Fabric. Update your ingestion pipeline to point to ADX/Eventhouse instead of TSI. Rewrite queries from TSI REST API calls to KQL. For most teams, this takes 2–4 weeks.
Further Reading
Internal references:
- IoT Hub vs. Event Hub: Understanding the Differences — understand the routing layer that feeds your time-series database.
- TimescaleDB Hypertables, Chunks, and Compression — alternative time-series strategy on Postgres.
- Azure Cloud & DevOps — more Azure architecture reference posts.
External references:
- Azure Data Explorer documentation: Kusto Query Language — official KQL reference.
- Microsoft Fabric Real-Time Intelligence — Eventhouse, Eventstream, and Real-Time Dashboards.
Revised April 2026. Azure Time Series Insights sunset concluded March 31, 2025. Information current as of Kusto engine 15.2 and Microsoft Fabric v1.2.
