Iceberg vs Delta vs Hudi for Lakehouses: 2026 ADR

If you are starting a lakehouse table in 2026 and someone in the room confidently says “just use Iceberg” or “everyone is on Delta now” or “Hudi is dying”, that person is wrong in at least one direction. In 2026, Iceberg vs Delta vs Hudi is no longer a turf war between three roughly equivalent formats: it is a real architectural choice with very different consequences depending on who writes to the table, who reads from it, what the latency target is, and which cloud bill you are trying to flatten. This post is a working ADR you can paste into your /docs/adr/ folder and adapt. It walks through the context, the three options as they actually stand today, a weighted decision matrix, the recommended default for net-new lakehouses, the consequences you sign up for, and the very specific scenarios where Delta Lake or Apache Hudi are the better answer. It assumes you have built a few data pipelines, that “open table format” means something to you, and that you have at least heard the words “REST catalog” and “merge-on-read” before.

Context and Problem Statement

An open table format is a metadata layer that sits on top of Parquet (or ORC, or Avro) files in object storage and turns a directory of files into a real table with ACID semantics, schema evolution, time travel, and concurrent writers. Without it, you have a data lake — cheap, flexible, and structurally hostile to anything that looks like a database. With it, you have a lakehouse: the same object storage economics, but with transactional updates, deletes, merges, and snapshot isolation.

In 2026 the three credible choices are Apache Iceberg, Delta Lake, and Apache Hudi. The reason this decision is louder in 2026 than it was in 2023 is that the warehouse vendors finally picked sides — and they did not all pick the same one. Snowflake shipped native Iceberg tables to GA in 2024 and made them a first-class storage option, not a federation hack (Snowflake Iceberg announcement). Databricks responded with Delta Lake Uniform, which writes a single set of Parquet files but exposes them through both Delta and Iceberg metadata so Snowflake, BigQuery, and Trino can read the same table (Delta Lake Uniform). Onehouse — the company founded by the original Hudi authors — doubled down on Hudi 1.0 with universal data lakehouse positioning and the open-source XTable project for cross-format translation (Hudi 1.0 release notes, Apache XTable). AWS launched S3 Tables (Iceberg-backed), Google launched BigLake Iceberg, and Microsoft Fabric standardised on Delta-with-Uniform.

The result is a market where the format you pick still influences which engines you can query with, which catalogs you can register against, and how much your CDC pipeline will cost to run — even though every vendor will tell you “we support all three”. The differences are real and they matter at scale. This ADR exists because the cost of choosing wrong, two years in, is a multi-quarter rewrite of every reader, writer, catalog binding, and IAM policy that touches the table.

The Three Options

Before the matrix, here is what each format actually is in 2026 and what it is good at. The diagram below summarises the three side-by-side.

Apache Iceberg

Iceberg is a table specification (currently spec v3) that describes tables as a tree of immutable metadata files: a table metadata JSON pointing at manifest lists, which point at manifest files, which point at the actual data files. Every commit produces a new metadata file and updates a catalog pointer atomically. This indirection is what gives Iceberg its two signature features: partition evolution (you can change how a table is partitioned without rewriting history) and hidden partitioning (queries don’t need to know the partition column to get pruning).
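A toy in-memory model can make the metadata tree and the atomic pointer swap concrete. The sketch below is plain Python, not the Iceberg library: the class and function names (`Catalog`, `TableMetadata`, `append_files`, `scan`) are illustrative assumptions, and a real table stores these structures as JSON and Avro files in object storage rather than Python objects.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Manifest:
    data_files: Tuple[str, ...]          # paths of Parquet data files


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifest_list: Tuple[Manifest, ...]  # the "manifest list" for this snapshot


@dataclass(frozen=True)
class TableMetadata:
    version: int
    snapshots: Tuple[Snapshot, ...]
    current_snapshot_id: Optional[int]


class Catalog:
    """Holds one pointer per table; commits swap it with compare-and-swap."""

    def __init__(self) -> None:
        self._pointers: Dict[str, TableMetadata] = {}

    def create(self, name: str) -> TableMetadata:
        meta = TableMetadata(version=0, snapshots=(), current_snapshot_id=None)
        self._pointers[name] = meta
        return meta

    def load(self, name: str) -> TableMetadata:
        return self._pointers[name]

    def commit(self, name: str, base: TableMetadata, new: TableMetadata) -> bool:
        # Succeed only if nobody else committed since `base` was read.
        if self._pointers[name] is not base:
            return False
        self._pointers[name] = new
        return True


def append_files(meta: TableMetadata, files: List[str]) -> TableMetadata:
    """Build a NEW metadata object; existing snapshots are never mutated."""
    previous = meta.snapshots[-1].manifest_list if meta.snapshots else ()
    snap = Snapshot(len(meta.snapshots) + 1, previous + (Manifest(tuple(files)),))
    return TableMetadata(meta.version + 1, meta.snapshots + (snap,), snap.snapshot_id)


def scan(meta: TableMetadata, snapshot_id: Optional[int] = None) -> List[str]:
    """List data files for the current snapshot, or an older one (time travel)."""
    wanted = meta.current_snapshot_id if snapshot_id is None else snapshot_id
    for snap in meta.snapshots:
        if snap.snapshot_id == wanted:
            return [f for m in snap.manifest_list for f in m.data_files]
    return []


catalog = Catalog()
v0 = catalog.create("db.events")
v1 = append_files(v0, ["a.parquet"])
assert catalog.commit("db.events", v0, v1)
# A second writer that still holds v0 loses the race and must retry on v1.
stale = append_files(v0, ["b.parquet"])
assert not catalog.commit("db.events", v0, stale)
v2 = append_files(catalog.load("db.events"), ["b.parquet"])
assert catalog.commit("db.events", v1, v2)
```

Because every snapshot keeps its own manifest list, reading an older snapshot id is just a different walk over the same immutable tree, which is roughly why time travel comes cheaply with this design.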

The big 2024–2026 story for Iceberg is the REST catalog spec. Instead of every engine speaking directly to a Hive metastore or Glue, the REST catalog defines a protocol any catalog can implement — and Tabular (acquired by Databricks in 2024), Snowflake’s Polaris (open-sourced in 2024), Nessie, Lakekeeper, Unity Catalog (now open source), and AWS Glue all expose REST endpoints. That means a Spark job, a Snowflake warehouse, a Trino cluster, and a Flink streaming app can all bind to the same catalog and see the same table with the same ACL story. Real adopters: Netflix (which created the format), Apple (mixed Iceberg + Hudi estate), Stripe, Airbnb, Pinterest, and increasingly Walmart and Adobe.
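To illustrate what "binding to the same catalog" looks like in practice, the sketch below generates the connection properties three engines would use for one REST endpoint. The URI and catalog name are hypothetical placeholders, and real deployments also need authentication and warehouse settings that are omitted here; treat it as a starting point to check against each engine's documentation, not a complete configuration.

```python
from typing import Dict

REST_URI = "https://catalog.example.com/api/catalog"  # hypothetical endpoint
CATALOG_NAME = "lake"                                  # hypothetical catalog name


def spark_conf(name: str, uri: str) -> Dict[str, str]:
    """Spark session properties for an Iceberg REST catalog."""
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
    }


def trino_properties(uri: str) -> Dict[str, str]:
    """Contents of a Trino catalog .properties file."""
    return {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": uri,
    }


def pyiceberg_properties(uri: str) -> Dict[str, str]:
    """Keyword properties for pyiceberg.catalog.load_catalog(name, **props)."""
    return {"type": "rest", "uri": uri}


if __name__ == "__main__":
    for key, value in spark_conf(CATALOG_NAME, REST_URI).items():
        print(f"{key}={value}")
```

The only thing the three configurations share is the URI, which is the point: the catalog, not the engine, decides what the table currently is.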

Delta Lake

Delta Lake represents a table as a directory of Parquet files plus a _delta_log/ directory of JSON transaction log entries and periodic Parquet checkpoints. Each commit appends a numbered JSON file describing adds, removes, and metadata changes. The project, formerly Databricks-controlled and now governed under the Linux Foundation, has matured through major releases 1.x, 2.x, 3.x, and now 4.x, picking up deletion vectors (delete without rewriting files), liquid clustering (incremental, dimension-aware clustering), V2 checkpoints, row tracking, and type widening along the way (Delta Lake protocol).
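The log-replay idea is easy to show in a few lines. The sketch below replays `add` and `remove` actions from numbered commit files to find the live data files at a given version. The log contents and file paths are made up for illustration, and the sketch ignores checkpoints, deletion vectors, and the other action types a real reader must handle.

```python
import json
from typing import Dict, List, Set

# Hypothetical log contents keyed by file name; a real table stores these as
# newline-delimited JSON objects under _delta_log/ in object storage.
DELTA_LOG: Dict[str, List[str]] = {
    "00000000000000000000.json": [
        json.dumps({"metaData": {"id": "example-table"}}),
        json.dumps({"add": {"path": "part-0000.parquet", "size": 1024}}),
        json.dumps({"add": {"path": "part-0001.parquet", "size": 2048}}),
    ],
    "00000000000000000001.json": [
        # An update rewrites part-0000 into part-0002.
        json.dumps({"remove": {"path": "part-0000.parquet"}}),
        json.dumps({"add": {"path": "part-0002.parquet", "size": 1100}}),
    ],
}


def live_files(log: Dict[str, List[str]], as_of_version: int) -> Set[str]:
    """Replay commits 0..as_of_version and return the active data files."""
    active: Set[str] = set()
    for name in sorted(log):                      # zero-padded names sort by version
        version = int(name.split(".")[0])
        if version > as_of_version:
            break
        for line in log[name]:
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return active
```

Calling `live_files(DELTA_LOG, 0)` versus `live_files(DELTA_LOG, 1)` is a small-scale version of time travel: the same log, replayed to a different version.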

The strategic move in 2024 was Delta Lake Uniform: Delta writes the data once, but generates Iceberg metadata alongside the Delta log, so any Iceberg-compatible reader (Snowflake, BigQuery, Trino, Flink) can read the same table. This essentially neutralised the “Iceberg is more open” argument for Databricks customers. Delta’s home field is Databricks itself — Photon, DLT, Unity Catalog, MLflow, and Lakeflow are all tuned for Delta first — and that integration is genuinely deep. Microsoft Fabric’s OneLake also defaults to Delta, with Uniform giving it Iceberg interop. Real adopters: Databricks’s entire customer base, Microsoft Fabric customers, Comcast, Shell, and Block.

Apache Hudi

Hudi was the original “lakehouse” format — Uber built it in 2016 specifically to handle streaming upserts on petabyte trip data. The model is built around two table types: Copy-on-Write (CoW), where each update rewrites the affected file, and Merge-on-Read (MoR), where updates are written as row-level log files and merged at read time (or during compaction). This gives Hudi a fundamentally different write profile than Iceberg or Delta: it is write-optimised, with a record-level index (a metadata table mapping primary keys to file groups) that makes upserts cheap and deterministic.
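The difference between the two table types is mostly about when the merge cost is paid. The following is a minimal in-memory sketch of a Merge-on-Read file group with a record-level index; the class names and the `default_group` parameter are invented for illustration and do not correspond to Hudi's actual classes or file layout. Copy-on-Write behaves roughly like calling `compact()` on every write.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


class MorFileGroup:
    """One file group: a base file plus an append-only log of row-level updates."""

    def __init__(self, base: Dict[str, Record]) -> None:
        self.base = dict(base)                      # key -> record
        self.log: List[Tuple[str, Record]] = []     # pending updates, in arrival order

    def upsert(self, key: str, record: Record) -> None:
        self.log.append((key, record))              # cheap write: no base rewrite

    def read(self) -> Dict[str, Record]:
        merged = dict(self.base)                    # merge cost is paid at read time
        for key, record in self.log:
            merged[key] = record
        return merged

    def compact(self) -> None:
        self.base = self.read()                     # fold the log into a new base
        self.log = []


class MorTable:
    def __init__(self, groups: Dict[str, MorFileGroup]) -> None:
        self.groups = groups
        # Record-level index: primary key -> file group id.
        self.index = {k: gid for gid, g in groups.items() for k in g.base}

    def upsert(self, key: str, record: Record, default_group: str) -> str:
        gid = self.index.setdefault(key, default_group)   # route by index, no scan
        self.groups[gid].upsert(key, record)
        return gid
```

The index lookup in `MorTable.upsert` is what keeps upserts cheap: the writer never has to scan data files to find where a key lives.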

Hudi 1.0, released in late 2024, added a redesigned timeline format, a functional index (indexes derived from expressions), non-blocking concurrency control, and partial updates — closing the gap with Iceberg on some operational ergonomics while keeping the streaming-first DNA. Onehouse positions Hudi as the universal data lakehouse, with XTable translating Hudi metadata to Iceberg and Delta for read interop. Real adopters: Uber (the original), Robinhood, Walmart, ByteDance, GE, and Notion-scale CDC workloads.

Decision Matrix

The matrix below is what you should adapt for your own ADR. The weights are deliberately opinionated — they assume a 2026 enterprise net-new lakehouse with multi-engine reads, moderate streaming, and a multi-cloud blast radius. If you are a Databricks shop, raise the weight on “vendor support” and “streaming writes” and the answer swings. If you are a CDC-heavy fintech, raise “streaming writes” and “CDC story” and Hudi pulls ahead.

Criterion	Weight	Iceberg	Delta + Uniform	Hudi 1.0
Ecosystem breadth (engines, language SDKs, OSS contributors)	20%	5.0	3.5	3.0
Streaming writes and upserts	15%	3.5	4.0	5.0
Time travel and branching	10%	5.0	4.0	3.5
Schema + partition evolution	10%	5.0	4.0	3.5
Catalog options + REST adoption	15%	5.0	3.5	3.0
Vendor + cloud support (Snowflake, BigQuery, AWS, Azure, Databricks)	15%	5.0	4.5	3.0
CDC + incremental query story	15%	3.5	4.0	5.0
Weighted total	100%	4.55	3.90	3.70

Notes on the scoring. Ecosystem breadth favours Iceberg because every major warehouse and query engine ships native Iceberg readers in 2026, and the REST catalog spec means Iceberg has the most catalog implementations. Streaming writes favours Hudi because record-level upserts with sub-minute SLOs are exactly what it was built for; Iceberg’s Flink integration is good but its merge cost is higher. Time travel is roughly a tie, with Iceberg slightly ahead because of branching and tagging in spec v2/v3. Schema and partition evolution is Iceberg’s strongest feature — partition evolution without rewriting is unique. Catalog options favours Iceberg because of the REST spec breadth (Polaris, Nessie, Lakekeeper, Unity, Glue, Tabular). Vendor support is close between Iceberg and Delta because of Uniform; Hudi lags because Snowflake and BigQuery do not ship native Hudi readers. CDC is Hudi’s killer feature — incremental queries return a stream of changes since a commit, which is exactly what downstream pipelines want.

The aggregate ranking is Iceberg > Delta > Hudi for a generic 2026 net-new lakehouse. But the gap between Delta and Hudi is small enough that two or three weight changes can flip it.
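When you adapt the matrix, it helps to compute the totals rather than maintain them by hand. The sketch below recomputes the weighted totals directly from the per-criterion scores and then tries an alternative weighting. The criterion keys and the `cdc_heavy_weights` profile are hypothetical; substitute your own weights and scores.

```python
from typing import Dict, Tuple

FORMATS = ("Iceberg", "Delta + Uniform", "Hudi 1.0")

# criterion -> (weight, (Iceberg, Delta + Uniform, Hudi 1.0)), copied from the matrix
MATRIX: Dict[str, Tuple[float, Tuple[float, float, float]]] = {
    "ecosystem": (0.20, (5.0, 3.5, 3.0)),
    "streaming_writes": (0.15, (3.5, 4.0, 5.0)),
    "time_travel": (0.10, (5.0, 4.0, 3.5)),
    "evolution": (0.10, (5.0, 4.0, 3.5)),
    "catalog": (0.15, (5.0, 3.5, 3.0)),
    "vendor_support": (0.15, (5.0, 4.5, 3.0)),
    "cdc": (0.15, (3.5, 4.0, 5.0)),
}


def weighted_totals(weights: Dict[str, float]) -> Dict[str, float]:
    """Return the weighted score per format; weights must sum to 1.0."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    totals = [0.0, 0.0, 0.0]
    for criterion, weight in weights.items():
        scores = MATRIX[criterion][1]
        for i, score in enumerate(scores):
            totals[i] += weight * score
    return {name: round(total, 2) for name, total in zip(FORMATS, totals)}


default_weights = {name: weight for name, (weight, _) in MATRIX.items()}

# Hypothetical CDC-heavy profile: streaming writes and CDC carry 60% of the weight.
cdc_heavy_weights = {
    "ecosystem": 0.10, "streaming_writes": 0.30, "time_travel": 0.05,
    "evolution": 0.05, "catalog": 0.10, "vendor_support": 0.10, "cdc": 0.30,
}

if __name__ == "__main__":
    print(weighted_totals(default_weights))
    print(weighted_totals(cdc_heavy_weights))
```

With the matrix weights this gives 4.55, 3.90, and 3.70; under the hypothetical CDC-heavy profile Hudi comes out on top at 4.25 against 4.10 for Iceberg and 3.95 for Delta.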

The Decision

For net-new lakehouses in 2026, default to Apache Iceberg with a REST catalog. The diagram below summarises the decision flow.

Specifically: pick Iceberg managed through a REST-spec catalog (Polaris, Unity OSS, Lakekeeper, Nessie, or your cloud provider’s Iceberg-REST endpoint such as AWS S3 Tables or Snowflake Polaris), with Parquet as the storage format and Spark or Flink as the primary writer. This default is right when (a) you expect more than one engine to read the table within 24 months, (b) you do not have an existing Databricks-heavy estate that would be more expensive to migrate than to extend, and (c) your streaming SLO is in the minutes-to-hour range rather than seconds.

Pick Delta Lake with Uniform enabled if your compute is more than ~70% Databricks today and your governance is built around Unity Catalog. You get the deepest engine optimisations (Photon, liquid clustering, DLT, Lakeflow), and Uniform gives you Iceberg-compatible reads for the occasional Snowflake or Trino consumer.

Pick Apache Hudi 1.0 if streaming CDC with sub-minute upsert latency on primary-keyed records is the dominant workload. Concretely: you are ingesting Kafka or Debezium streams into the lake, downstream consumers expect a continuous change feed (not just snapshot reads), and your point-lookup cost matters as much as your scan cost.
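The three rules above can be written down as a small function, which is a useful thing to keep next to the ADR so the decision is reproducible. This is a sketch under stated assumptions: the `Estate` fields are invented names for the criteria in the text, and checking the Hudi rule before the Delta rule is an assumption of this sketch, since the text does not say which exception wins when both apply.

```python
from dataclasses import dataclass


@dataclass
class Estate:
    databricks_compute_share: float      # fraction of compute on Databricks, 0.0 to 1.0
    unity_catalog_governance: bool       # governance built around Unity Catalog
    streaming_cdc_dominant: bool         # primary-keyed CDC is the dominant workload
    needs_sub_minute_upserts: bool       # upsert latency target under one minute


def recommend_format(estate: Estate) -> str:
    """Check the two exceptions first, then fall back to the default."""
    if estate.streaming_cdc_dominant and estate.needs_sub_minute_upserts:
        return "Apache Hudi 1.0"
    if estate.databricks_compute_share > 0.70 and estate.unity_catalog_governance:
        return "Delta Lake with Uniform"
    return "Apache Iceberg with a REST catalog"


if __name__ == "__main__":
    greenfield = Estate(0.2, False, False, False)
    print(recommend_format(greenfield))
```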

Consequences

Positive consequences of defaulting to Iceberg. You avoid vendor lock-in at the storage layer; the table outlives any single engine choice. You get the broadest reader ecosystem in 2026 — Snowflake, BigQuery, Redshift, Databricks (via Uniform), Trino, Spark, Flink, StarRocks, DuckDB, and ClickHouse all ship Iceberg readers. You get partition evolution and hidden partitioning, which are genuinely unique and pay for themselves the first time a partition strategy needs to change. The REST catalog spec means you can swap catalogs (Polaris to Unity OSS to a managed Tabular successor) without rewriting your tables. Time travel and branching open up dev/test workflows that were previously impossible on a lake.

Negative consequences. The REST catalog story is still maturing: Polaris and Unity OSS are both under 18 months old, and operational tooling (backup, multi-region, IAM federation) is thinner than mature Hive metastore deployments. CDC is more expensive than Hudi at high upsert rates; you will pay in compaction cost or query latency. Streaming writes from Flink work but require careful tuning of target-file-size, snapshot retention, and orphan file cleanup. Iceberg’s merge-on-read (added in spec v2) is not as mature as Hudi’s MoR, so a streaming-heavy workload may push you to a Hudi sidecar or a high-frequency compaction job. If your team has zero Iceberg operational experience, the on-call burden in the first six months is real, whereas a Databricks-shop team already has the equivalent operational skills for Delta.

When to choose Delta or Hudi instead

There are two specific architectures where the default flips. The diagram below summarises both. Pick the path that matches your reality, not the default.

Pick Delta Lake (with Uniform) if you are all-in on Databricks. “All-in” here means Unity Catalog is your governance system of record, more than 70% of your transformation compute runs on Databricks, you use DLT or Lakeflow for pipelines, and MLflow is your model registry. In that world, Delta is not just adequate — it is materially faster (Photon’s vectorised execution, deletion vectors, liquid clustering) and operationally simpler (one platform, one catalog, one billing line). Uniform handles the occasional cross-engine read. If you are on Microsoft Fabric, the answer is also Delta, because OneLake is Delta-first and Fabric’s DirectLake mode is Delta-native. Trying to force Iceberg into a Databricks-heavy estate in 2026 is mostly an ideology tax.

Pick Apache Hudi if streaming CDC at scale is the load-bearing workload. Hudi’s record-level index and MoR table type are designed for sub-minute upsert latencies on primary-keyed streams. Uber runs Hudi on multi-petabyte trip and dispatch data with seconds-to-minutes freshness. Robinhood uses Hudi for low-latency trade event capture. Walmart uses it for in-store inventory CDC. If you are ingesting Debezium change streams off Postgres or MySQL into the lake, Hudi’s incremental query and time-windowed compaction give you a substantially cheaper CDC pipeline than Iceberg-with-MoR. Hudi 1.0’s functional indexes and non-blocking concurrency closed many of the operational gaps. Pair Hudi with XTable if you need Iceberg or Delta readers downstream.
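The incremental query is the piece that makes CDC pipelines cheap, so it is worth seeing next to a snapshot read. The sketch below models a commit timeline in plain Python; the `Commit` type, the instant strings, and the account records are all invented for illustration and are not Hudi's API.

```python
from typing import Dict, List, NamedTuple


class Commit(NamedTuple):
    instant: str                     # commit time, sortable as a string
    changes: Dict[str, dict]         # primary key -> latest record in this commit


# Hypothetical timeline of three commits on a primary-keyed table.
TIMELINE: List[Commit] = [
    Commit("20260101090000", {"acct-1": {"balance": 100}}),
    Commit("20260101090100", {"acct-2": {"balance": 50}}),
    Commit("20260101090200", {"acct-1": {"balance": 80}}),
]


def snapshot_read(timeline: List[Commit]) -> Dict[str, dict]:
    """Full table state: every commit applied in order."""
    state: Dict[str, dict] = {}
    for commit in timeline:
        state.update(commit.changes)
    return state


def incremental_read(timeline: List[Commit], after_instant: str) -> Dict[str, dict]:
    """Only the records changed by commits strictly after `after_instant`."""
    changed: Dict[str, dict] = {}
    for commit in timeline:
        if commit.instant > after_instant:
            changed.update(commit.changes)
    return changed
```

A downstream job that remembers the last instant it processed only ever touches the records returned by `incremental_read`, instead of rescanning the full snapshot on every run.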

Trade-offs and Gotchas

Uniform compatibility caveats. Delta Uniform exposes Iceberg metadata for read-only access. Iceberg writes back to a Uniform table are not supported in 2026; if Trino or Spark needs to write, you are writing Delta. Some Delta features (column mapping mode, certain identity columns) reduce Iceberg compatibility — check the Uniform feature matrix before turning it on, not after.

Iceberg REST catalog adoption rate. The REST spec is well-defined but not every implementation is at parity. Polaris (open-sourced by Snowflake), Lakekeeper, Nessie, and Unity OSS all implement the spec, but namespace ACL semantics, view support, and Iceberg spec v3 features (variant types, geo types, deletion vectors v2) lag in some implementations. Test your specific catalog + engine combination against a non-trivial schema before committing.

Hudi compaction tuning. MoR tables in Hudi need compaction to stay query-performant. Get this wrong and your read latency degrades steadily. Inline compaction blocks ingest; async compaction needs its own resource pool. Hudi 1.0’s non-blocking concurrency helps, but tuning compaction triggers (number of commits, log file size, time-based) is operational work you cannot skip.
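The three trigger types mentioned above combine naturally into an "any of" policy. The sketch below shows that decision in isolation. The field names and thresholds are hypothetical and are not Hudi configuration keys; Hudi exposes its own settings for this (for example `hoodie.compact.inline.max.delta.commits`), which you should look up for your version.

```python
from dataclasses import dataclass


@dataclass
class CompactionPolicy:
    max_delta_commits: int        # compact after this many commits since last compaction
    max_log_bytes: int            # ... or once log files exceed this size
    max_seconds: int              # ... or after this much time has passed


@dataclass
class FileGroupState:
    delta_commits: int
    log_bytes: int
    seconds_since_compaction: int


def should_compact(state: FileGroupState, policy: CompactionPolicy) -> bool:
    """True if any of the three triggers has fired."""
    return (
        state.delta_commits >= policy.max_delta_commits
        or state.log_bytes >= policy.max_log_bytes
        or state.seconds_since_compaction >= policy.max_seconds
    )


# Hypothetical thresholds: 5 commits, 256 MB of logs, or 15 minutes.
policy = CompactionPolicy(max_delta_commits=5, max_log_bytes=256 * 1024 * 1024, max_seconds=900)
```

Tighter thresholds keep reads fast but spend more compute on compaction; looser ones do the opposite, which is the trade-off the tuning work is really about.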

XTable / OneTable is read-time translation, not write-time. XTable (formerly OneTable) translates metadata between Iceberg, Delta, and Hudi by generating sidecar metadata files. It is excellent for letting an engine that only speaks Format A read a Format B table. It does not turn one format into two writers at the same time — writes still go to the source format, and translation runs on a schedule. Treat XTable as a reader compatibility shim, not a multi-master story.

Catalog migration is the hard part, not file format. Moving Parquet files between formats is mechanical. Moving the catalog binding, IAM, lineage, RBAC, and the 200 dashboards that point at the old table is the actual project. Choose your catalog with as much care as your format.

Practical Recommendations

If you are starting today on a greenfield lakehouse, the concrete recommendation is:

Format: Apache Iceberg, spec v2 minimum (v3 if your engines support it).
Storage: Parquet with ZSTD compression, target file size 256–512 MB, row group size 128 MB.
Catalog: a REST catalog implementation. Pick AWS S3 Tables on AWS, Polaris for Snowflake-heavy estates, Unity OSS for Databricks-adjacent ones, and Lakekeeper or Nessie for multi-cloud or strong branching needs.
Writers: Spark for batch, Flink for streaming, with table-maintenance jobs (snapshot expiration, orphan file cleanup, manifest rewrites) running on a daily cadence — these are not optional.
Readers: Whatever your business uses — the point of Iceberg is you do not have to choose.
Time-travel SLA: Retain snapshots for at least 7 days for ops recovery; longer if you need it for audit.
Compaction: Schedule data file compaction weekly for batch tables, hourly for streaming tables; tune file count and size targets per table, not globally.
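The storage and retention bullets above map onto Iceberg table properties and Spark maintenance procedures. The sketch below generates the corresponding SQL strings. The catalog and table names are placeholders, and it picks the upper end of the recommended file-size range; verify the property names and procedure arguments against the Iceberg version you run before scheduling anything.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

MB = 1024 * 1024

# Iceberg table properties matching the storage and retention bullets above.
TABLE_PROPERTIES: Dict[str, str] = {
    "format-version": "2",
    "write.format.default": "parquet",
    "write.parquet.compression-codec": "zstd",
    "write.target-file-size-bytes": str(512 * MB),
    "write.parquet.row-group-size-bytes": str(128 * MB),
    "history.expire.max-snapshot-age-ms": str(7 * 24 * 60 * 60 * 1000),
}


def alter_table_sql(table: str, props: Dict[str, str]) -> str:
    """Spark SQL statement that applies the properties to an existing table."""
    body = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({body})"


def maintenance_sql(catalog: str, table: str, now: datetime) -> List[str]:
    """Daily maintenance as Iceberg Spark procedure calls (7-day snapshot retention)."""
    cutoff = (now - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")
    return [
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        f"CALL {catalog}.system.remove_orphan_files(table => '{table}')",
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
    ]


if __name__ == "__main__":
    # "lake" and "db.events" are placeholder names.
    print(alter_table_sql("lake.db.events", TABLE_PROPERTIES))
    for stmt in maintenance_sql("lake", "db.events", datetime.now(timezone.utc)):
        print(stmt)
```

Data file compaction (the `rewrite_data_files` procedure) is left out of the daily list on purpose, since the cadence recommended above differs between batch and streaming tables.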

If you are on Databricks today, leave existing tables as Delta with Uniform enabled, and run new multi-engine tables on Iceberg through Polaris or Unity OSS — do not migrate everything at once. If you have a streaming-heavy workload that does not fit Iceberg’s cost profile, isolate it as a Hudi table behind XTable and let downstream Iceberg consumers read the translated metadata.

Finally, write the ADR. Pick a format, write down the criteria, weight them, score them, sign the document, and date it. Two years from now, when someone asks “why are we on Iceberg”, the answer should be a paragraph from the ADR, not a Slack thread no one remembers.

FAQ

Will Delta and Iceberg merge into a single format? No, not as identical specs. Uniform already gives you Iceberg-compatible reads from Delta tables, and the practical convergence point is Parquet + metadata translation, not a unified log format. Both communities have publicly stated they will continue to evolve their own specs. Expect more cross-compatibility (Iceberg writes to Delta tables, deeper Uniform features) but not a true merge.

Is Hudi dying? No. Hudi 1.0 (late 2024) was a major release with real architectural improvements, Onehouse is funded and actively shipping, and the user base at Uber, Walmart, Robinhood, ByteDance, and others is too large to walk away. What is true is that Hudi has lost the “default open format” race to Iceberg in the enterprise net-new segment. It remains the right answer for streaming-CDC-heavy workloads and has a clear future as the specialist format in a mixed-format estate.

What about Apache XTable? XTable (formerly OneTable, donated to the ASF in 2024) generates sidecar metadata that lets a reader for one format read a table written in another. It is genuinely useful for incremental migrations, mixed-format estates, and avoiding hard format lock-in. It is not a replacement for picking a primary format — every table still has one source-of-truth metadata layout.

Iceberg REST vs Glue? REST is the spec; Glue is one implementation of it (AWS Glue Iceberg REST endpoint, GA in 2024). Glue is fine as your catalog if you are on AWS and Glue ACL semantics match your governance model. If you need branching (Nessie), open-source self-hosting (Lakekeeper, Polaris), or vendor neutrality, look beyond Glue. The point of the REST spec is that you can switch catalogs without changing your readers.

Does Snowflake natively write Iceberg tables now? Yes, since 2024 GA. Snowflake-managed Iceberg tables let Snowflake be the writer with full ACID semantics, and external Iceberg tables let Snowflake read tables written by Spark/Flink/Trino. The two modes have different feature sets — Snowflake-managed gives you Time Travel inside Snowflake; externally-managed gives you cross-engine writes.

Can I run all three formats side by side? Yes, and many large estates do. The cost is operational complexity — three catalogs, three sets of maintenance jobs, three monitoring stacks. Use XTable to keep the reader story coherent. The benefit is each format does what it is best at: Iceberg for warehouse-style analytics, Delta for the Databricks domain, Hudi for streaming CDC.

Is Apache Paimon worth considering? Paimon (the Flink-native table format) is real and growing fast in the streaming community, but for a generic enterprise net-new lakehouse in 2026 it is not yet ecosystem-deep enough to displace the big three. Worth a pilot if Flink is your primary engine.
