AWS Time Series Databases: Timestream & Keyspaces Architecture (2026)

Last updated 2026-04-27 — fully rewritten from earlier 2024 draft. Reflects AWS Timestream rebranding, Timestream for InfluxDB GA, and 2026 production workload patterns.

Teams building IoT telemetry, observability platforms, and FinOps dashboards on AWS face a confusing menu of AWS time series databases. Should you use Amazon Timestream for LiveAnalytics (serverless, two-tier storage)? Timestream for InfluxDB (managed InfluxDB v3 IOx with Parquet + S3 backend)? Cassandra-shaped Keyspaces for wide-column inserts? Or punt to S3 + Iceberg + Athena for batch analytics?

This post is the 2026 reference architecture for picking the right one. We’ll walk through each service, show you the architecture diagrams, compare cardinality limits, write throughput, query latency, and cost per GB. By the end, you’ll have a decision matrix for IoT telemetry, observability, and FinOps — and a playbook to avoid the most common pitfalls (Timestream throttling, Keyspaces hot partitions, InfluxDB regional constraints).

What this post covers:
– The AWS time-series landscape in 2026 and why AWS split the original Timestream into two distinct services.
– Timestream LiveAnalytics architecture: memory tier + magnetic tier, automatic tiering, SQL interface.
– Timestream for InfluxDB: managed InfluxDB v3 IOx, FDAP stack, Parquet-backed, better cardinality story.
– Keyspaces (serverless Cassandra) for wide-column time series and billions of row lookups.
– S3 + Iceberg + Athena for batch analytics, ML training, and cold storage.
– A decision matrix to route IoT, observability, and FinOps workloads to the right service.
– Common pitfalls and how to avoid them.


The AWS Time-Series Landscape in 2026

In 2024, AWS rebranded and split Amazon Timestream into two distinct services, each with a different architecture and trade-off profile:

1. Timestream for LiveAnalytics

Serverless, in-memory + disk tiering. Write-optimized for high-throughput telemetry at moderate cardinality. Time-partitioned storage, automatic age-based tiering. Query via SQL. Best for: dashboards with predictable partitioning and high write throughput.

2. Timestream for InfluxDB

Managed InfluxDB v3 (IOx), built on the FDAP stack (Flight, DataFusion, Arrow, Parquet). Parquet files backed by S3. SQL + InfluxQL query language. Better cardinality story than LiveAnalytics. Best for: teams already on InfluxDB or wanting open-format storage.

3. Amazon Keyspaces (Cassandra)

Serverless Cassandra. Wide-column store. Partition key = device_id + time_bucket; clustering key = timestamp. Ideal for billions of rows + predictable point lookups. Query via CQL. Best for: Cassandra-shaped access patterns and very high volume inserts.

4. OpenSearch

Full-text + time-series. Index rollups (aggregations). Beats Timestream for logging/observability if you already have an ELK stack. Query via DSL + SQL plugin.

5. S3 + Iceberg + Athena

Batch analytics + ML training. Write Parquet partitioned by date. Query with Athena (Trino) or EMR. Iceberg for transactional semantics. Cheapest for cold storage.

6. Aurora + TimescaleDB Extension

PostgreSQL-native TSDB. Good for teams already on Postgres. Hypertables + continuous aggregates. Lower cardinality ceiling than specialized TSDB.


Timestream for LiveAnalytics: Memory + Magnetic Tier

Timestream for LiveAnalytics is a fully serverless two-tier TSDB. Recent data lives in memory (millisecond queries); older data slides into magnetic storage (disk + S3, seconds-to-minutes latency).

Architecture

Timestream LiveAnalytics Architecture

Producers (IoT Core, Kinesis, Firehose, custom agents) send records via WriteRecords API. The write path:
1. Validation + schema inference (schemaless, flexible).
2. Memory store buffering (recent: minutes to hours, configurable).
3. Automatic tiering to magnetic store (older data, days to years).
4. Distributed query engine parses SQL and routes to both tiers.
5. Results merged and returned to client (CloudWatch metrics on latency, throttles).
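
The sketch below shows this write path from the producer side with the boto3 timestream-write client. It is a minimal example, not a production ingest pipeline; the database name (iot), table name (telemetry), and dimensions are illustrative placeholders.

import time
import boto3

write_client = boto3.client("timestream-write", region_name="us-east-1")

# One single-measure record; dimensions become the queryable attributes.
record = {
    "Dimensions": [
        {"Name": "device_id", "Value": "sensor-42"},
        {"Name": "site", "Value": "plant-7"},
    ],
    "MeasureName": "temperature",
    "MeasureValue": "21.3",
    "MeasureValueType": "DOUBLE",
    "Time": str(int(time.time() * 1000)),   # milliseconds since epoch
    "TimeUnit": "MILLISECONDS",
}

# Placeholder database/table names; both must already exist.
write_client.write_records(
    DatabaseName="iot",
    TableName="telemetry",
    Records=[record],
)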

Strengths:
– Automatic age-based tiering — no manual archival.
– Serverless — no capacity planning.
– SQL interface — familiar to data engineers.
– CloudWatch observability (write throughput, query latency).
– IAM + KMS for encryption at rest and in transit.

Limitations:
– Write throttling if memory tier undersized. Each table has a hard limit on write throughput.
– Schemaless = overhead. You pay for schema flexibility.
– Time-partitioned storage = must design partitions (dimension attribute + time). Bad partition design = full table scans.
– Magnetic store latency (0.5–2 seconds for cold data).
– No update or delete after insertion (immutable).

Best for:
– IoT dashboards with predictable time ranges (last 24 hours in memory, backfill from magnetic).
– Observability (metrics, events) if cardinality is moderate (< 10k dimensions per metric).
– Serverless preference — no infrastructure to manage.
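
For the dashboard case, here is a sketch of a last-24-hours rollup through the boto3 timestream-query client. The database, table, and measure names are the same illustrative placeholders used above; the query stays inside the memory-tier window.

import boto3

query_client = boto3.client("timestream-query", region_name="us-east-1")

# 1-minute averages for one device over the last 24 hours.
sql = """
SELECT device_id,
       bin(time, 1m) AS minute,
       avg(measure_value::double) AS avg_temperature
FROM "iot"."telemetry"
WHERE measure_name = 'temperature'
  AND device_id = 'sensor-42'
  AND time > ago(24h)
GROUP BY device_id, bin(time, 1m)
ORDER BY minute
"""

response = query_client.query(QueryString=sql)
for row in response["Rows"]:
    print([col.get("ScalarValue") for col in row["Data"]])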

Pricing:
– Write throughput units (WCU) — ~$0.30 per WCU-month (on-demand, varies by region).
– Magnetic store — ~$0.30–$0.35 per GB-month (cheap).
– Memory store — ~$10–15 per GB-month (expensive).


Timestream for InfluxDB (Managed InfluxDB v3 IOx)

Timestream for InfluxDB is AWS’s managed offering of InfluxDB v3, built on the FDAP stack: Flight (columnar data transfer), DataFusion (query engine), Arrow (in-memory format), Parquet (on-disk columnar storage).

Architecture

Timestream for InfluxDB FDAP IOx Architecture

Data ingestion:
1. Producers send line protocol or HTTP API to managed InfluxDB endpoint.
2. Ingest layer buffers in-memory (mutable, sorted by time).
3. Compaction engine writes sorted Parquet files to S3 every few minutes.
4. Metadata catalog (PostgreSQL) tracks table schemas and file locations.
5. Query engine: Apache Arrow Flight + DataFusion.
– Query planner: SQL or InfluxQL → physical execution plan.
– Executors push down filters to Parquet files (predicate pushdown).
– Results streamed as Arrow arrays (efficient columnar transfer).
6. AWS managed: VPC-attached, Multi-AZ HA, automated backups, scaling handled by AWS.
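
A minimal ingest sketch for step 1, writing line protocol against the InfluxDB v2-compatible HTTP write API. The endpoint URL, API token, organization, and bucket below are placeholders you would replace with your managed instance's values.

import time
import requests

# Placeholders: your Timestream for InfluxDB endpoint, token, org, and bucket.
INFLUX_URL = "https://your-instance.timestream-influxdb.example.com:8086"
TOKEN = "your-api-token"

# Line protocol: measurement,tag_set field_set timestamp(ns)
ts_ns = int(time.time() * 1e9)
line = f"sensor_readings,device_id=sensor-42,site=plant-7 temperature=21.3,humidity=40.1 {ts_ns}"

resp = requests.post(
    f"{INFLUX_URL}/api/v2/write",
    params={"org": "my-org", "bucket": "telemetry", "precision": "ns"},
    headers={"Authorization": f"Token {TOKEN}"},
    data=line,
)
resp.raise_for_status()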

Strengths:
– Open format — Parquet + S3 backend. Portable to other query engines (Trino, DuckDB, Spark).
– Better cardinality story — InfluxDB v3 designed for billions of unique tag combinations.
– FDAP architecture = no vendor lock-in to proprietary formats.
– Horizontal scaling via stateless queriers (query compute) and ingesters (write buffering).
– InfluxQL compatibility for teams migrating from self-hosted InfluxDB.

Limitations:
– Regional — single AWS region per InfluxDB cluster (no multi-region replication).
– Managed version lacks some tuning options (no direct access to Parquet config).
– Compaction window (minutes) — small write amplification cost.
– Query latency on S3 Parquet (100–500ms for mid-size datasets).

Best for:
– Teams already on InfluxDB and want managed experience.
– High cardinality + long retention (trillions of data points).
– Portability — want to query Parquet from Athena/Trino later.
– Mixed query workload (time-series metrics + analytical SQL).

Pricing:
– Ingester Nodes (data ingest) + Query Nodes (query compute) — hourly + data transfer.
– S3 storage — per GB-month.
– Cheaper than LiveAnalytics per GB for large datasets.


Keyspaces (Cassandra) for Wide-Column Time Series

Amazon Keyspaces is fully managed Apache Cassandra. It’s ideal for billion-row tables with predictable access patterns: “give me all readings for device X on day Y, sorted by timestamp.”

Architecture

Keyspaces Partition Key Design

Table schema (telemetry):

CREATE TABLE telemetry (
  device_id TEXT,
  day_bucket TEXT,          -- YYYY-MM-DD
  ts TIMESTAMP,
  sensor_reading FLOAT,
  PRIMARY KEY ((device_id, day_bucket), ts)
);

Query pattern:

SELECT * FROM telemetry 
WHERE device_id = 'sensor-42' 
  AND day_bucket = '2026-04-27' 
ORDER BY ts;

How it works:
1. Partition key = (device_id, day_bucket). All rows with same (device_id, day_bucket) live in one partition.
2. Clustering key = ts. Rows within partition sorted by timestamp.
3. Writes are append-only; range queries on ts are efficient (no full table scan).
4. AWS distributes partitions across nodes. Each node stores ~100 partitions.
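
The same insert/read path, sketched from Python with the open-source cassandra-driver. The contact point, port 9142, TLS, and LOCAL_QUORUM consistency follow the Keyspaces connection pattern; the keyspace name and service-specific credentials are placeholders.

import ssl
from cassandra import ConsistencyLevel
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

# Keyspaces requires TLS on port 9142; you may need to load the Starfield root CA here.
ssl_context = ssl.create_default_context()

# Placeholder service-specific credentials generated for an IAM user.
auth = PlainTextAuthProvider(username="svc-user-at-123456789012",
                             password="service-specific-password")

# Keyspaces expects LOCAL_QUORUM for writes, so make it the default consistency.
profile = ExecutionProfile(consistency_level=ConsistencyLevel.LOCAL_QUORUM)

cluster = Cluster(
    ["cassandra.us-east-1.amazonaws.com"],
    port=9142,
    ssl_context=ssl_context,
    auth_provider=auth,
    execution_profiles={EXEC_PROFILE_DEFAULT: profile},
)
session = cluster.connect("iot")   # keyspace name is illustrative

# Append-only write lands in the (device_id, day_bucket) partition.
session.execute(
    "INSERT INTO telemetry (device_id, day_bucket, ts, sensor_reading) "
    "VALUES (%s, %s, toTimestamp(now()), %s)",
    ("sensor-42", "2026-04-27", 21.3),
)

# Range read stays inside one partition, sorted by the ts clustering key.
rows = session.execute(
    "SELECT ts, sensor_reading FROM telemetry "
    "WHERE device_id = %s AND day_bucket = %s ORDER BY ts",
    ("sensor-42", "2026-04-27"),
)
for row in rows:
    print(row.ts, row.sensor_reading)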

Strengths:
– Billions of rows, microsecond point lookups.
– Write-optimized — LSM tree (log-structured merge).
– Serverless — no capacity planning.
– Compaction automatic (AWS handles it).
– On-demand pricing or reserved capacity.

Limitations:
– Hot partition problem: if all writes hammer one device_id, that partition overloads. Mitigation: bucket by hour or minute within the day.
– No aggregation — COUNT, SUM, etc. must be done client-side or in a separate OLAP layer.
– CQL only (no SQL).
– Eventual consistency (tunable — use LOCAL_QUORUM for strong consistency).

Best for:
– High-volume IoT telemetry (millions of rows per minute).
– Legacy Cassandra apps migrating to AWS.
– Predictable access patterns (device_id + time range).

Pricing:
– On-demand: ~$1.25 per million writes, ~$0.25 per million reads.
– Reserved capacity: lower $/op for committed volume.


When to Pick S3 + Iceberg + Athena Instead

For batch analytics and machine learning, the data lake approach beats specialized TSDB.

Architecture

S3 + Iceberg + Athena Pattern

Flow:
1. Producers → Kinesis Data Firehose.
2. Firehose batches records and writes Parquet to S3.
– Partitioning: s3://bucket/year=2026/month=04/day=27/hour=10/data.parquet
– Iceberg metadata: schema, partitions, snapshots (transactional semantics).
3. Glue Data Catalog registers table.
4. Query via:
– Athena (SQL on Parquet, serverless, seconds to minutes).
– EMR (Spark for heavy lifting, ETL, ML).
– Trino (multi-warehouse analytics).
– SageMaker (training-data featurization).
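
A sketch of step 4 with the boto3 Athena client. The database (iot_lake), table, and result-bucket names are placeholders; the WHERE clause prunes partitions so only one day of Parquet is scanned.

import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

# Partition pruning on year/month/day keeps the scan to a single day of Parquet.
sql = """
SELECT device_id, avg(sensor_reading) AS avg_reading
FROM iot_lake.telemetry
WHERE year = '2026' AND month = '04' AND day = '27'
GROUP BY device_id
"""

execution = athena.start_query_execution(
    QueryString=sql,
    QueryExecutionContext={"Database": "iot_lake"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then page through results (first row is the header).
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"][1:]:
        print([col.get("VarCharValue") for col in row["Data"]])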

Strengths:
– Cheapest per GB (S3 storage ~$0.023/GB-month).
– Open format — Parquet, Iceberg. Portable to any engine.
– Iceberg transactional semantics — schema evolution, time travel, ACID.
– Integrates with ML pipeline (SageMaker training data export).
– Retention: tier old data to Glacier (~$0.004/GB-month).

Limitations:
– Query latency: 30s–5min (Athena cold start + S3 list latency).
– No real-time queries — batch only.
– Firehose batching window: records are buffered before landing in S3 (default 5 minutes, can be reduced to as little as 60 seconds), so data is not immediately queryable.
– Schema management — Glue Catalog or manual DDL.

Best for:
– Historical analytics (daily/weekly reports).
– ML training data (combine with Feature Store).
– Archive (multi-year retention at low cost).
– Cost-sensitive workloads (batch > real-time).

Pricing:
– Firehose: ~$0.029 per GB.
– S3: ~$0.023/GB-month (standard), ~$0.004/GB-month (Glacier).
– Athena: ~$5 per TB scanned.


Decision Matrix: IoT Telemetry, Observability, FinOps

Comparison Table

Dimension | Timestream LiveAnalytics | Timestream for InfluxDB | Keyspaces | S3 + Iceberg + Athena
Ingestion rate | 10k–100k writes/sec (memory throttle) | 100k+ writes/sec | 1M+ writes/sec | 100k–1M writes/sec (Firehose)
Cardinality | < 10k dimensions | Billions of tag combos | Billions of rows, predictable partition key | Trillions of rows, batch aggregation
Query latency (hot) | 100–500ms | 200–1000ms | < 10ms point lookup | 30s–5min (Athena cold start)
Query latency (cold) | 1–2s (magnetic) | 500ms–5s (S3 Parquet) | N/A | 30s–5min
Cost per GB stored/month | $10–15 (memory), $0.30 (magnetic) | $0.30–0.50 | Variable (on-demand ops) | $0.023–0.30 (depends on tier)
Schema flexibility | Schemaless (flexible but slower) | Schemaless (FDAP optimized) | Strict CQL schema | Flexible (Parquet/Iceberg)
Query model | SQL | SQL + InfluxQL | CQL (point, range) | SQL (Athena/Trino/Spark)
Aggregation (in-DB) | Yes (SQL) | Yes (InfluxQL + SQL) | No (client-side) | Yes (SQL)
Long-term retention | Magnetic (months–years) | S3 backend (months–years) | Inefficient (append-only) | Glacier (years, cold)
Setup complexity | Low (serverless) | Low–medium (cluster sizing) | Low–medium (partition design) | Medium–high (ETL pipeline)

Workload → Service Decision Tree

Workload Decision Tree

  1. IoT telemetry, high cardinality (10k+ dimensions), long retention (1+ years)?
    Timestream for InfluxDB (open format, scales cardinality, Parquet backend).

  2. Predictable dashboards, moderate cardinality (< 5k dimensions), want serverless?
    Timestream for LiveAnalytics (automatic tiering, SQL interface, serverless).

  3. Cassandra-shaped queries (device_id + time range, billions of rows), predictable partition key?
    Keyspaces (low-latency point lookups, write-optimized, serverless).

  4. Batch analytics, ML training data, multi-year retention at low cost?
    S3 + Iceberg + Athena (cheapest, portable, lakehouse semantics).

  5. Existing OpenSearch observability stack, want logs + metrics under one roof?
    OpenSearch (with index rollups, log aggregation, dashboards).

  6. Light TSDB on top of existing Postgres, small dataset (< 10 TB)?
    Aurora + TimescaleDB extension (PostgreSQL-native, hypertables, compression).


Trade-offs and Common Pitfalls

Timestream LiveAnalytics Write Throttling

Pitfall: Memory tier undersized. Producers hit write throttling (4xx ThrottlingException responses), the backlog builds, producers cry.

Fix:
– Estimate writes per second. Each “write unit” = 1 KB write.
– Use describe-table to check current throughput.
– Scale provisioned WCU or switch to on-demand.
– Monitor CloudWatch: the UserErrors and SystemErrors metrics, plus throttling exceptions surfaced by the SDK.
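
On the producer side, back off and retry when throttled. A minimal sketch, assuming the same illustrative iot/telemetry table used earlier:

import time
import boto3
from botocore.exceptions import ClientError

write_client = boto3.client("timestream-write", region_name="us-east-1")

def write_with_backoff(records, max_retries=5):
    """Retry throttled WriteRecords calls with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return write_client.write_records(
                DatabaseName="iot", TableName="telemetry", Records=records
            )
        except ClientError as err:
            if err.response["Error"]["Code"] != "ThrottlingException":
                raise                      # rejected records, validation errors, etc.
            time.sleep(2 ** attempt)       # 1s, 2s, 4s, 8s, 16s
    raise RuntimeError("WriteRecords still throttled after retries")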

InfluxDB Regional Constraint

Pitfall: The cluster lives in a single region (e.g., us-east-1). There is no built-in cross-region replication, so disaster recovery and multi-region failover are manual exercises.

Fix:
– Plan for single-region. Use S3 backups for recovery.
– For multi-region, maintain replicas in separate regions (InfluxDB Enterprise pattern, or pump to secondary Timestream for LiveAnalytics).

Keyspaces Hot Partition

Pitfall: Single device hammered with writes. Partition key = device_id. All writes to one node.

Fix:
– Partition by (device_id, hour_bucket) or (device_id, minute_bucket).
– If a single device still exceeds per-partition write limits, add a random shard suffix to the partition key (write sharding) rather than relying on the clustering key.
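
A sketch of the bucketing idea in Python; the hourly granularity and the shard count of 8 are assumptions you would tune to your per-device write rate.

from datetime import datetime, timezone
import random

NUM_SHARDS = 8   # assumption: spread one hot device across 8 partitions

def partition_key(device_id: str, ts: datetime, shard: bool = False):
    """Build the (device_id[#shard], hour_bucket) partition key components."""
    hour_bucket = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")
    if shard:
        # Write sharding: a random suffix splits one hot device across partitions.
        # Readers must then fan out across all NUM_SHARDS at query time.
        return (f"{device_id}#{random.randrange(NUM_SHARDS)}", hour_bucket)
    return (device_id, hour_bucket)

print(partition_key("sensor-42", datetime.now(timezone.utc)))
print(partition_key("sensor-42", datetime.now(timezone.utc), shard=True))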

S3 + Athena Latency Surprise

Pitfall: First query on cold partition = 30s (S3 list + HEAD).

Fix:
– Use partition pruning in WHERE clause.
– Pre-warm partitions (dummy query).
– Use Athena provisioned capacity for predictable latency SLA.


Practical Recommendations

For IoT Telemetry

Preferred: Timestream for InfluxDB.
– Reason: High cardinality (sensors × locations × device families). Long retention (1–2 years). Open Parquet format for future queries.
– Alternative: Keyspaces if your queries are pure point lookups (device_id + timestamp range), no aggregation needed.
– Avoid: LiveAnalytics if cardinality > 10k distinct (dimension explosion).

For Observability (Metrics + Events)

Preferred: OpenSearch with index rollups + S3 cold tier.
– Reason: Unified logs + metrics. Index rollups (downsample old data). Kibana dashboards.
– Alternative: Timestream for LiveAnalytics if metrics-only, predictable partitioning, serverless preference.

For FinOps Batch Analytics

Preferred: S3 + Iceberg + Athena.
– Reason: Lowest cost per GB. Parquet partitioned by date/account. Query via Athena (ad-hoc), export to SageMaker for ML.
– Example: Daily billing ingestion → S3 (Iceberg) → FinOps dashboard (Superset on Trino).

For High-Cardinality Dashboards (10k+ unique dimension combos)

Preferred: Timestream for InfluxDB.
– Reason: FDAP stack handles high cardinality natively. InfluxQL + SQL queries.
– Acceptable: Timestream for LiveAnalytics only if you can heavily partition (e.g., device_id + region + sensor_type).

For Legacy Cassandra Apps

Preferred: Keyspaces.
– Reason: Drop-in replacement. CQL unchanged. Partition design patterns unchanged.


FAQ

Q1: Can I switch from Timestream for LiveAnalytics to InfluxDB mid-project?

A: Yes, but with manual migration. Export LiveAnalytics to S3 (via Parquet export), ingest into InfluxDB v3. Schema mapping required (tag keys, field types). Expect 1–2 weeks for large datasets (> 10 GB).
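
A hedged sketch of the export step using LiveAnalytics' UNLOAD statement through the boto3 timestream-query client. The bucket, prefix, and time range are placeholders; large exports typically need to be split into multiple UNLOAD runs, and the tag/field mapping still happens on the InfluxDB side.

import boto3

query_client = boto3.client("timestream-query", region_name="us-east-1")

# UNLOAD writes the query result to S3 as Parquet for re-ingestion elsewhere.
unload_sql = """
UNLOAD (
    SELECT device_id, measure_name, measure_value::double AS value, time
    FROM "iot"."telemetry"
    WHERE time > ago(365d)
)
TO 's3://my-export-bucket/timestream-export/'
WITH (format = 'PARQUET', compression = 'GZIP')
"""

query_client.query(QueryString=unload_sql)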

Q2: Does Keyspaces support time-series aggregations (SUM, AVG)?

A: No. Keyspaces does not support server-side aggregate functions. Workaround: compute in the application (Python, Spark) or use a separate OLAP layer (Athena, ClickHouse).

Q3: What’s the cost difference between Keyspaces on-demand and Timestream?

A: Keyspaces on-demand: ~$1.25/million writes. Timestream LiveAnalytics: ~$0.30/WCU-month (where 1 WCU ≈ 1 KB write). For bursty workloads, Keyspaces cheaper; for predictable, Timestream reserved cheaper.

Q4: Can I query Timestream for InfluxDB from Athena?

A: No, not directly. InfluxDB v3 stores Parquet in S3 but manages it internally. To query Parquet externally, you’d need to export + re-ingest into a data lake.

Q5: How long does S3 + Iceberg + Athena query take?

A: First query on a partition: ~30s (cold start, S3 listing). Subsequent queries: 1–5s (warm metadata). Athena provisioned capacity makes latency more predictable, but sub-second results are not guaranteed.


Further Reading

  • AWS Timestream: https://docs.aws.amazon.com/timestream/
  • InfluxDB v3 IOx Docs: https://influxdata.com/docs/
  • AWS Keyspaces: https://docs.aws.amazon.com/keyspaces/
  • Time-Series Benchmarks: https://iotdigitaltwinplm.com/architecture/influxdb-vs-timescaledb-vs-clickhouse-iot-time-series-2026/
  • Azure Alternative: https://iotdigitaltwinplm.com/azure/azure-time-series-databases-data-explorer-cosmos-db-architecture-2026/
  • TimescaleDB Deep Dive: https://iotdigitaltwinplm.com/architecture/timescaledb-hypertables-chunks-compression-continuous-aggregates/
  • AWS Pillar: https://iotdigitaltwinplm.com/aws/

Last Updated: 2026-04-27. Author: Riju.
