Apache Iceberg vs Paimon: Lakehouse Table Formats Compared (2026)
The lakehouse architecture has matured from concept to production standard in 2025–2026. But as data teams consolidate batch and streaming workloads, they face a pivotal choice: Apache Iceberg vs Apache Paimon. Both are open-source table formats designed for ACID compliance, schema evolution, and time-travel queries—but they differ fundamentally in heritage, write strategy, and optimal use cases.
This post compares Iceberg and Paimon across architecture, streaming primitives, catalog integration, and operational complexity. By the end, you’ll have a decision matrix to choose the right format for your lakehouse.
Origins: Netflix’s Batch Revolution vs Flink’s Streaming Inheritance
Apache Iceberg emerged from Netflix’s 2018 effort to solve the metadata scalability problem in data lakes. Netflix was running petabyte-scale Hadoop clusters with Hive tables, where ever-growing, directory-based metadata made consistent snapshots and time-travel impractical. Iceberg introduced versioned metadata as a first-class citizen, decoupling table snapshots from the data files themselves.
Apache Paimon (formerly Flink Table Store, open-sourced in 2023) took a different path. It was born from the Apache Flink community’s observation that Change Data Capture (CDC) and real-time streaming demanded a table format with built-in compaction primitives, dimension lookup, and sub-second write latency. Paimon started as a streaming-first design where LSM-trees and multi-level compaction were core, not add-ons.
This genealogy shapes everything: Iceberg optimizes for batch query engines first, streaming second. Paimon optimizes for Flink ingest first, analytics second.
Architecture: Manifest Hierarchies vs LSM-Trees
Iceberg: Manifest-Based Snapshots
Iceberg stores table metadata in a manifest hierarchy:
- Table metadata points to the latest snapshot ID
- Each snapshot references a manifest list (partition spec + file metadata)
- The manifest list enumerates manifest files per partition
- Manifest files contain entries for data files (path, row count, metrics)
The key insight: metadata is immutable and versionable. Every snapshot is a complete view of the table at a point in time, enabling cheap time-travel and concurrent reads without locks.
See arch_01.mmd for the hierarchy:
– Table → Snapshot → Manifest List → Manifest Files → Data Files
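You can see this hierarchy directly from a query engine: Iceberg exposes it through metadata tables and time-travel syntax. A minimal Spark SQL sketch, where the catalog, schema, and table names are illustrative:

```sql
-- Inspect the snapshot and manifest hierarchy via Iceberg's metadata tables.
SELECT snapshot_id, committed_at, operation
FROM lake.sales.orders.snapshots;

SELECT path, added_data_files_count, existing_data_files_count
FROM lake.sales.orders.manifests;

-- Time travel: read the table as of a snapshot ID or a wall-clock timestamp.
SELECT * FROM lake.sales.orders VERSION AS OF 8738948374938473845;
SELECT * FROM lake.sales.orders TIMESTAMP AS OF '2026-01-01 00:00:00';
```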
Paimon: Per-Partition LSM-Trees
Paimon uses a Log-Structured Merge-tree (LSM-tree) per partition, similar to RocksDB:
- Level 0: Unsorted in-memory and spillable runs (100–200 MB)
- Level 1+: Sorted runs with exponential size growth
- Compaction merges runs when level thresholds are exceeded
- Dimension tables use a dedicated snapshot-isolated LSM for fast lookups
The LSM design was chosen because:
– Sub-second write latency: writes hit Level 0 and return immediately
– Background compaction: doesn’t block ingest
– Efficient CDC: can capture deltas between compaction levels
– Natural dimension indexing: sorted runs enable fast point lookups
See arch_02.mmd for the LSM-tree per partition, with compaction triggers.
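A minimal Flink SQL sketch of a Paimon primary-key table, where each bucket inside a partition maintains its own LSM tree. The `'bucket'` and `'merge-engine'` options reflect recent Paimon releases, and the table name is illustrative:

```sql
-- A Paimon primary-key table in Flink SQL; each bucket keeps its own LSM tree.
-- Partition keys must be part of the primary key for partitioned PK tables.
CREATE TABLE orders (
    order_id BIGINT,
    user_id  BIGINT,
    amount   DECIMAL(10, 2),
    dt       STRING,
    PRIMARY KEY (dt, order_id) NOT ENFORCED
) PARTITIONED BY (dt) WITH (
    'bucket' = '4',                 -- number of buckets (LSM trees) per partition
    'merge-engine' = 'deduplicate'  -- keep the latest row per primary key
);
```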
Write Modes: CoW vs MoR, and Defaults Matter
Both formats support Copy-on-Write (CoW) and Merge-on-Read (MoR), but they diverge in philosophy:
| Aspect | Iceberg | Paimon |
|---|---|---|
| Default mode | Copy-on-Write (CoW) | Merge-on-Read (MoR) |
| CoW cost | All rows rewritten to new file | Manifest + LSM compact only |
| MoR cost | V2 requires delete-file tracking | Native; no extra bookkeeping |
| MoR latency | V2 can be slow for large deltas | Sub-second; LSM handles it |
| Delete handling | Position deletes (V2) tracked separately | Integrated in LSM |
Iceberg CoW is simpler for Spark and Trino, but rewrites entire data files even for single-row updates. The V2 format’s MoR path avoids those rewrites by writing delete files instead, at the cost of extra metadata and read-time merging.
Paimon MoR is baked into the design: updates hit Level 0 immediately, background compaction merges them later. No changelog overhead.
Operational reality: If you’re running sub-minute CDC ingests into Iceberg, you’ll want V2 MoR; if you’re running hourly batch upserts, CoW is fine. Paimon’s MoR is always there, making streaming feel natural.
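For Iceberg, the write mode is controlled per table through properties. A hedged Spark SQL sketch (table name illustrative) that switches row-level operations from the CoW default to MoR:

```sql
-- Switch an Iceberg table's row-level operations from copy-on-write (default)
-- to merge-on-read; MoR requires format version 2.
ALTER TABLE lake.sales.orders SET TBLPROPERTIES (
    'format-version'    = '2',
    'write.update.mode' = 'merge-on-read',
    'write.delete.mode' = 'merge-on-read',
    'write.merge.mode'  = 'merge-on-read'
);
```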
Streaming-First: Paimon’s CDC Advantage
Paimon ships a native CDC connector and dimension-table logic out of the box; Iceberg gained comparable change-reading only with V3’s ChangelogScan (2025):
Paimon CDC Path
- Flink CDC operator captures MySQL binlog / Kafka topics
- Flink Paimon sink writes directly to LSM Level 0
- Background compaction merges writes asynchronously
- Dimension lookups use snapshot-isolated LSM reads
Iceberg CDC Path (V3)
- Flink or Debezium buffers CDC events
- Micro-batching or upserts append to Iceberg
- ChangelogScan (new in V3) reads change deltas without full snapshots
- No native dimension logic; you build it with Flink side-inputs
See arch_03.mmd for the ingest paths side-by-side.
Latency comparison (typical):
– Paimon CDC: 100–500 ms end-to-end (MySQL binlog → query result)
– Iceberg CDC: 1–5 seconds (micro-batch + snapshot interval)
Paimon wins for real-time operational tables (orders, inventory, customer state). Iceberg wins for analytics on immutable events (clickstream, logs).
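As a sketch of the Paimon ingest path, the following Flink SQL assumes the MySQL CDC connector and a Paimon catalog are available on the classpath; hostnames, credentials, and table names are placeholders:

```sql
-- CDC source: MySQL binlog exposed as a changelog stream in Flink SQL.
CREATE TEMPORARY TABLE mysql_orders (
    order_id BIGINT,
    user_id  BIGINT,
    amount   DECIMAL(10, 2),
    PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
    'connector'     = 'mysql-cdc',
    'hostname'      = 'mysql.internal',
    'port'          = '3306',
    'username'      = 'cdc_user',
    'password'      = '******',
    'database-name' = 'shop',
    'table-name'    = 'orders'
);

-- Continuous upsert into a Paimon primary-key table; writes land in LSM Level 0
-- and background compaction merges them later.
INSERT INTO paimon_catalog.shop.orders
SELECT order_id, user_id, amount FROM mysql_orders;
```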
Catalogs: REST, Polaris, Nessie, and Hive
Table formats need a catalog to store metadata pointers:
| Catalog | Iceberg | Paimon | Use Case |
|---|---|---|---|
| REST API | Full support | Beta support | Warehouse-agnostic, distributed |
| Polaris | Native (originated at Snowflake) | Not yet | Managed service, SOC2/FedRAMP |
| Nessie | Full support | Not yet | Git-like branching, time-travel |
| Unity Catalog | Databricks managed | Not yet | Databricks Lakehouse |
| Hive Metastore | Supported | Primary | Open-source, widely deployed |
See arch_04.mmd for catalog ecosystem and compute engine integrations.
Iceberg’s catalog advantage: multiple options allow organizations to avoid vendor lock-in. Polaris (the Snowflake-initiated open REST catalog, now an Apache incubator project) and Nessie (branching model for data versioning) give Iceberg users governance flexibility.
Paimon’s catalog limitation: Hive Metastore is the primary production catalog. REST support is beta. This means:
– Existing Hive deployments integrate easily
– But Paimon lacks the multi-branch versioning story Nessie provides
– Governance for Paimon is still evolving
Compute Integration: Spark, Trino, Flink, and Specialized Engines
Both formats work across multiple engines, but with different degrees of maturity:
Iceberg’s Broad Ecosystem
- Apache Spark: Read + Write (CoW + V2 MoR)
- Trino / Presto: Read + Write
- Snowflake: Native read (managed Iceberg)
- Amazon Athena: Native support (AWS-managed)
- Google BigQuery: Read support (via REST)
- DuckDB: Full support (analytical SQL)
Paimon’s Focused Ecosystem
- Apache Flink: Native read + write (streaming + batch)
- Apache Spark: Read + Write (via REST)
- Trino: Read (via REST catalog beta)
- StarRocks: Native read (fast OLAP queries)
Takeaway: Iceberg is the de facto standard for SQL warehouses and cloud-native analytics. Paimon is strongest in Flink-centric organizations and real-time StarRocks deployments.
Schema Evolution: Both Safe, Both ACID
Both formats handle schema changes correctly:
- Add column: Safe, default value applied to existing rows
- Drop column: Safe, column metadata removed
- Rename column: Safe, metadata updated
- Change type: Validated (e.g., int → long allowed; string → int not)
Key difference: Iceberg schema versioning is explicit in metadata; Paimon’s LSM handles it implicitly. Operationally, both are equally safe.
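What these evolutions look like against an Iceberg table in Spark SQL (table name illustrative; Paimon accepts equivalent statements via Flink or Spark):

```sql
-- Safe schema changes against an Iceberg table from Spark SQL.
ALTER TABLE lake.sales.orders ADD COLUMN discount DECIMAL(10, 2);
ALTER TABLE lake.sales.orders RENAME COLUMN amount TO gross_amount;
ALTER TABLE lake.sales.orders ALTER COLUMN order_id TYPE BIGINT;  -- int -> long widening is allowed
ALTER TABLE lake.sales.orders DROP COLUMN legacy_flag;
```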
Operational Complexity: Manifest Cleanup vs Compaction Management
Iceberg Operations
- Metadata cleanup: Old snapshots accumulate; must use `expire_snapshots()` to garbage-collect
- Orphaned files: Failed writes may leave dangling files; `remove_orphan_files()` required
- No background compaction: Manual file consolidation via `rewrite_data_files()` if too many small files
Effort: Low if you automate snapshot expiration. High if you ignore orphaned files (they pile up and balloon your storage).
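These three tasks map to Spark procedures. A sketch assuming a catalog named lake with the Iceberg SQL extensions enabled; schedule them as periodic jobs:

```sql
-- Expire old snapshots (keeps the table's metadata bounded).
CALL lake.system.expire_snapshots(
    table       => 'sales.orders',
    older_than  => TIMESTAMP '2026-03-01 00:00:00',
    retain_last => 50);

-- Delete files left behind by failed writes.
CALL lake.system.remove_orphan_files(table => 'sales.orders');

-- Consolidate small files into ~512 MB targets.
CALL lake.system.rewrite_data_files(
    table   => 'sales.orders',
    options => map('target-file-size-bytes', '536870912'));
```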
Paimon Operations
- Automatic compaction: Background job continuously merges LSM levels
- Compaction tuning: Configure Level 0 size threshold, max level count
- Compaction latency: May delay recent writes while merging
Effort: Medium. You must tune compaction (L0 size, write parallelism) for your ingest rate. Too aggressive → high CPU; too lenient → bloated L0.
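A hedged example of the main compaction knobs, set as table options from Flink SQL; the option names follow recent Paimon releases, so verify them against your version’s documentation:

```sql
-- Tune the LSM write path and compaction triggers on an existing Paimon table.
ALTER TABLE paimon_catalog.shop.orders SET (
    'write-buffer-size'                 = '256 mb',  -- in-memory buffer before flushing to Level 0
    'num-sorted-run.compaction-trigger' = '5',       -- start compaction after this many sorted runs
    'num-sorted-run.stop-trigger'       = '10'       -- back-pressure writers if runs keep piling up
);
```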
Hybrid teams: Iceberg + automatic snapshot expiration is simpler; Paimon requires more hands-on tuning.
CDC Primitives: Paimon’s Dedicated Toolkit
Paimon’s CDC design includes:
- CDC connector: Kafka source directly maps to Paimon updates
- Dimension table mode: Lookup table for joins (e.g., user profiles)
- Changelog reads: Get only the delta between snapshots or timestamps, no full table scan (sketched below)
- Delete semantics: Integrated; no separate position-delete bookkeeping
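A sketch of those changelog-style reads in Flink SQL, using Paimon’s dynamic option hints; the option names follow recent Paimon documentation and the snapshot IDs are placeholders:

```sql
-- Batch read: only the rows that changed between snapshot 12 and snapshot 20.
SELECT * FROM orders /*+ OPTIONS('incremental-between' = '12,20') */;

-- Streaming read: subscribe to the table's changelog from the latest snapshot onward.
SELECT * FROM orders /*+ OPTIONS('scan.mode' = 'latest') */;
```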
Iceberg V3’s ChangelogScan added similar capabilities in 2025:
– Read change deltas without full snapshots
– Works with position deletes and append-only logs
But Iceberg’s CDC tooling is younger and less battle-tested in production than Paimon’s Flink-native CDC.
Decision Matrix: When to Pick Which
See arch_05.mmd for the decision tree. Here’s the distilled logic:
| Scenario | Recommendation | Rationale |
|---|---|---|
| Batch ETL daily/hourly, Spark + Trino | Iceberg | Mature ecosystem, simple CoW, no compaction tuning |
| Sub-minute CDC ingest, Flink primary | Paimon | Native LSM, background compaction, CDC operators ready |
| Snowflake / BigQuery primary | Iceberg | Warehouse-native support; Iceberg is managed |
| Real-time dimension tables, lookup joins | Paimon | Dimension mode, no separate side-input logic |
| Multi-branch versioning needed (Git-like) | Iceberg + Nessie | Only solution; Paimon lacks branching |
| Open-source, self-managed, all OSS engines | Iceberg | More catalog flexibility; REST + Nessie options |
| StarRocks OLAP, real-time analytics | Paimon | StarRocks native integration, fast ingests |
| Hybrid: write streaming, read batch analytics | Paimon ingest → Iceberg REST | Paimon handles CDC, Iceberg REST catalog abstracts reads |
Hybrid Architectures: Paimon + Iceberg
A practical 2026 pattern is write-side and read-side specialization:
- Ingest tier: Apache Paimon + Flink CDC
  – Capture MySQL / Kafka changes into Paimon LSM
  – Background compaction keeps operational tables fresh
  – Sub-second latency for operational reads
- Analytics tier: Iceberg snapshot export
  – Paimon periodic snapshots exported to Iceberg REST catalog
  – Iceberg snapshots consumed by Spark, Trino, Snowflake
  – Time-travel and versioning at query tier
- Catalog: Shared REST catalog (Nessie or Polaris)
  – Both formats register with the same catalog
  – Unified data discovery and governance
This pattern leverages both:
– Paimon’s streaming strength (low-latency ingest)
– Iceberg’s analytics strength (broad SQL engine support, versioning)
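One way to wire the shared-catalog piece of this pattern from a single Flink SQL session; the URIs and warehouse paths are placeholders, and the Iceberg catalog’s 'catalog-type' = 'rest' option assumes a recent iceberg-flink runtime:

```sql
-- Register the Paimon warehouse used for ingest.
CREATE CATALOG paimon_cat WITH (
    'type'      = 'paimon',
    'warehouse' = 's3://lakehouse/paimon'
);

-- Register the shared Iceberg REST catalog used by the analytics tier.
CREATE CATALOG iceberg_cat WITH (
    'type'         = 'iceberg',
    'catalog-type' = 'rest',
    'uri'          = 'https://catalog.internal/api/catalog'
);

-- Simplified periodic export (run in batch mode) from the operational table
-- into the analytics tier; real jobs usually dedupe and repartition.
INSERT OVERWRITE iceberg_cat.analytics.orders
SELECT * FROM paimon_cat.shop.orders;
```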
2026 Maturity Assessment
Apache Iceberg
- Maturity: Production-ready (v1 since 2019)
- Ecosystem: Snowflake, Databricks, AWS, Google invested
- Risk: Highest adoption; most hiring knowledge available
- Bleeding edge: V3 ChangelogScan, Puffin stats (2025)
Apache Paimon
- Maturity: Production-ready (v0.4+, gained traction in 2024–2025)
- Ecosystem: Alibaba, Tencent, ByteDance deployments; growing Flink integrations
- Risk: Smaller ecosystem; Hive Metastore primary catalog (REST beta)
- Bleeding edge: Cross-partition compaction, dynamic bucketing (2025)
Performance Benchmarks: Streaming Latency vs Batch Throughput
| Workload | Iceberg (CoW) | Iceberg (V2 MoR) | Paimon |
|---|---|---|---|
| 1M row batch insert | 15s (rewrites) | 3s (changelog) | 1s (LSM) |
| Single-row update latency | 15s (CoW) | 1–2s (MoR) | 100–500ms (LSM) |
| 1M row scan | 8s | 8s | 10s (L0 overhead) |
| CDC 100 events/sec | 5–10s batch latency | 500ms | 100ms |
Reality check: Benchmarks vary by storage (HDFS, S3, local SSD), serialization (Parquet, ORC), and compute engine. These are representative; test your workload.
Common Pitfalls
Iceberg Pitfalls
- Orphaned files: Set up snapshot expiration or risk unbounded storage
- Manifest explosion: Too many snapshots → slow metadata reads; use `rewrite_manifests()`
- V2 MoR immaturity: Smaller test surface; avoid if CoW is sufficient
Paimon Pitfalls
- Under-compacted L0: If ingest rate exceeds compaction, L0 bloats and slows reads
- Limited catalog options: Hive Metastore constraints (no branching, no fine-grained ACLs)
- Trino integration beta: REST catalog support still maturing
Migration Path: From Hive / Delta Lake
To Iceberg
- Use `spark-sql` or the Scala API to migrate Hive tables in place with the `migrate` procedure (shown below)
- Iceberg handles partitioning, stats, and schema automatically
- Existing Spark jobs work without code changes (the data source swaps to `org.apache.iceberg.spark`)
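The Spark procedures behind that migration step, assuming an Iceberg catalog named lake with the SQL extensions enabled; `snapshot` gives a non-destructive trial run before the in-place `migrate`:

```sql
-- Non-destructive trial: create an Iceberg table that shadows the Hive table.
CALL lake.system.snapshot('hive_db.table_name', 'hive_db.table_name_iceberg');

-- Migrate in place once validated (replaces the Hive table with an Iceberg one).
CALL lake.system.migrate('hive_db.table_name');
```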
To Paimon
- Use the `flink sql` CLI or the DataStream API
- Requires Flink 1.17+
- The CDC connector makes streaming migration straightforward; batch migration requires a manual Spark job
- Hive Metastore integration is smooth
FAQ: Five Common Questions
1. Can we use both Iceberg and Paimon in the same data platform?
Yes. Many platforms use Iceberg for batch analytics and Paimon for streaming ingest. Use a shared REST catalog (Nessie or Polaris) so both formats appear as one logical data warehouse. Tradeoff: dual-format support adds operational complexity (two metadata systems, two compaction strategies).
2. Does Paimon’s LSM compaction require tuning every time we scale ingest?
Often. Compaction speed depends on hardware (CPU cores, disk I/O) and LSM configuration. As ingest rate grows, Level 0 size or compaction concurrency may need adjustment. Iceberg avoids this by having no background compaction, but pays the cost in slow accumulation of small files. Start conservative (small L0 thresholds) and loosen as you tune.
3. Is Iceberg’s V2 Merge-on-Read stable enough for production in 2026?
Mostly, but with caveats. Iceberg V2 MoR works well in Spark and DuckDB; Trino support is newer. If you need V2 MoR in many engines, prioritize Iceberg V1 CoW or Paimon LSM instead. Ask your engine vendor (Databricks, Starburst, etc.) for production guarantees.
4. How does Paimon handle time-travel if compaction merges files?
Via snapshot IDs. Paimon records the LSM level structure in snapshot metadata, so you can query as of a snapshot even after later compactions. Similar to Iceberg: snapshot metadata is immutable, and the files a snapshot references are retained until that snapshot expires.
5. What’s the cost difference: Iceberg vs Paimon over a year?
In cloud storage (S3/GCS):
– Iceberg + lazy cleanup: Small files accumulate; expect 10–20% storage waste without active orphan cleanup
– Paimon + steady compaction: LSM keeps file count lower; expect 5–10% waste if you tune compaction
In compute:
– Iceberg: Minimal overhead; metadata reads are fast
– Paimon: Compaction jobs consume CPU continuously; budget for background workers
For a 10 TB daily ingest, Iceberg might cost 5–10% more in storage (orphaned files) but save on compute. Paimon balances both but requires operational attention. Net: similar TCO; pick based on workload fit, not cost.
Conclusion
Apache Iceberg vs Paimon is not a binary choice in 2026. Both are production-grade open-source table formats with distinct strengths:
- Choose Iceberg if you are batch-first (Spark/Trino/Snowflake), value ecosystem breadth, or need multi-branch versioning
- Choose Paimon if you are streaming-first (Flink CDC), need sub-second write latency, or prioritize background compaction
- Combine both if you have distinct streaming and analytics tiers, using a shared REST catalog
The lakehouse architecture has matured to the point where table format choice should be driven by workload fit, not hype. Understand your ingest patterns, query engines, and operational constraints—then pick the format (or formats) that align.
Related reads:
– Iceberg vs Delta vs Hudi: Lakehouse Table Formats Compared (2026)
– Iceberg Catalogs: Polaris vs Nessie vs Unity Comparison (2026)
– Flink vs Spark Streaming vs Kafka Streams: Real-Time Processing (2026)
Last Updated: 2026-04-29
This post is part of the IoT Digital Twin PLM content series on cloud data platforms and lakehouse architectures.
