Is DuckLake production-ready?

Yes. DuckLake v1.0 shipped April 2026 with backward-compatibility guarantees. Companies like Definite have been running it in production for over a year.

When should I pick Iceberg over DuckLake?

Pick Iceberg when you need multi-engine federation (Spark, Trino, Snowflake), operate at 50TB+, or have compliance mandates requiring specific catalog solutions like AWS Glue or Unity Catalog.

Can I migrate from DuckLake to Iceberg later?

Yes. The data on disk is standard Parquet. DuckLake ships COPY operations to Iceberg that handle most migration mechanics. It's a project measured in weeks, not months.

Do I need a separate catalog service for DuckLake?

DuckLake requires a database catalog (Postgres, DuckDB, MySQL, or SQLite). If you already run Postgres, the catalog is free operationally. If not, you deploy Postgres instead of deploying a Polaris or Glue catalog.

DUCKLAKE VS ICEBERG: CHOOSING YOUR LAKEHOUSE FORMAT IN 2026

20/5/2026

If you’re evaluating lakehouse formats in 2026, you’re staring at the same question I was last year: DuckLake or Iceberg?

Both solve the same core problem — ACID transactions, schema evolution, and petabyte-scale analytics on object storage — but they make radically different architectural tradeoffs. Pick wrong and you’re fighting your metadata layer instead of your actual data problems.

DuckLake v1.0 shipped April 2026 with backward-compatibility guarantees. Apache Iceberg — approaching a decade of production use at Netflix, Snowflake, and AWS — is the incumbent. This isn’t a choice between good and bad. It’s a choice between two valid designs that serve different use cases. I’ll give you the decision framework so you don’t have to learn the hard way.

Who Is This Guide For?

Data engineers, platform architects, and CTOs choosing a lakehouse format for a new project or evaluating whether to migrate an existing Iceberg deployment. You know what Parquet is and you’ve probably read the Iceberg docs. What you need is a decision framework, not another explainer.

By the End of This, You’ll Know

Exactly when to pick DuckLake vs Iceberg, by data size, team count, and engine requirements
Why the catalog question is the actual question — and how each format answers it
What DuckLake v1.0’s data inlining means for streaming workloads
Where Iceberg still wins unconditionally
How to migrate between formats if you change your mind

The Verdict at a Glance

Your first question should be about scale and engine diversity. Find your row:

Workload Tier	Format	Catalog	Primary Engines
Up to 100 GB, single team	DuckLake	Postgres or DuckDB file	DuckDB-native
100 GB - 5 TB, one team	DuckLake	Managed Postgres (RDS/Cloud SQL)	DuckDB-centric + Iceberg reads via interop
1 - 50 TB, multi-team read-heavy	Either — depends on engine plans	DuckLake: Postgres. Iceberg: REST + Polaris or Lakekeeper	DuckDB-first or Spark/Trino
50 TB - 5 PB, multi-engine	Iceberg	REST catalog (Polaris, Lakekeeper) or Glue	Spark, Trino, Snowflake, Athena, BigQuery
5 PB+, regulated, multi-region	Iceberg	Your compliance-approved catalog	Whatever the org standardized on

These aren’t hard byte-count limits. The format choice tracks with how many engines, teams, and write clusters touch your data.
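To make the table easier to apply, here is a minimal sketch of the same tiers as a function. The `Workload` fields, the `recommend_format` name, and the exact cut-offs are my own illustrative assumptions. Cases the table leaves open (for example, a single team above 5 TB) are filled in by guesswork, so treat the output as a starting point for discussion, not a rule.

```python
from dataclasses import dataclass

TB = 1000  # gigabytes per terabyte, for readability only


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a workload; field names are illustrative."""
    size_gb: float                 # total data volume in GB
    teams: int                     # teams reading or writing the tables
    engines: int                   # distinct query engines sharing the tables
    catalog_mandate: bool = False  # compliance requires a specific catalog


def recommend_format(w: Workload) -> str:
    """Return 'DuckLake', 'Iceberg', or 'Either', loosely following the tier table."""
    # A compliance-approved catalog or 50 TB+ points to Iceberg in the table.
    if w.catalog_mandate or w.size_gb >= 50 * TB:
        return "Iceberg"
    # Below 50 TB, several teams or engines is the "depends on engine plans" row.
    if w.teams > 1 or w.engines > 1:
        return "Either"
    # One team, one engine: the DuckLake rows (extended up to 50 TB by assumption).
    return "DuckLake"


print(recommend_format(Workload(size_gb=50, teams=1, engines=1)))
print(recommend_format(Workload(size_gb=10 * TB, teams=3, engines=2)))
print(recommend_format(Workload(size_gb=200 * TB, teams=5, engines=4)))
```

The ordering of the checks reflects the point above: engine and team count decide the answer before byte counts do, except at the very top of the range.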

DuckLake v1.0: What Changed

DuckLake was first sketched as a spec in May 2025. A year later, v1.0 shipped with production guarantees. Here’s what you need to know.

Data inlining is the headline feature. When you insert fewer rows than a configurable threshold (default 10), DuckLake stores the data directly in the catalog database instead of writing a tiny Parquet file to object storage. The DuckDB Labs team published a streaming benchmark showing 926x faster queries and 105x faster ingestion compared to Iceberg on a streaming workload. Those numbers are vendor-published, not third-party validated, but the architectural advantage is real: Iceberg’s small-file problem requires compaction tooling, while DuckLake doesn’t create the problem in the first place.
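The inlining rule is easier to see in a toy model. The sketch below is not DuckLake's implementation or API: `ToyLake`, its two tables, and the in-memory dict standing in for object storage are all hypothetical. It only shows the routing decision described above, where batches below the threshold land in the catalog database and larger ones become files.

```python
import sqlite3

INLINE_ROW_LIMIT = 10  # mirrors the default threshold mentioned above


class ToyLake:
    """Toy model of data inlining; schema and names are illustrative only."""

    def __init__(self, inline_row_limit: int = INLINE_ROW_LIMIT):
        self.inline_row_limit = inline_row_limit
        self.catalog = sqlite3.connect(":memory:")  # stands in for Postgres etc.
        self.catalog.execute("CREATE TABLE inlined_rows (id INTEGER, payload TEXT)")
        self.catalog.execute("CREATE TABLE data_files (path TEXT, row_count INTEGER)")
        self.files = {}  # path -> rows; stands in for Parquet files on object storage

    def insert(self, rows):
        with self.catalog:  # one catalog transaction per batch
            if len(rows) < self.inline_row_limit:
                # Small write: store the rows in the catalog, create no file.
                self.catalog.executemany(
                    "INSERT INTO inlined_rows VALUES (?, ?)", rows
                )
            else:
                # Large write: "write a file" and register it in the catalog.
                path = f"data/file_{len(self.files):05d}.parquet"
                self.files[path] = list(rows)
                self.catalog.execute(
                    "INSERT INTO data_files VALUES (?, ?)", (path, len(rows))
                )

    def file_count(self) -> int:
        return self.catalog.execute("SELECT COUNT(*) FROM data_files").fetchone()[0]

    def row_count(self) -> int:
        inlined = self.catalog.execute("SELECT COUNT(*) FROM inlined_rows").fetchone()[0]
        in_files = self.catalog.execute(
            "SELECT COALESCE(SUM(row_count), 0) FROM data_files"
        ).fetchone()[0]
        return inlined + in_files


lake = ToyLake()
for i in range(100):
    lake.insert([(i, f"event-{i}")])                   # 100 single-row streaming writes
lake.insert([(1000 + i, "bulk") for i in range(50)])   # one 50-row bulk write
print(lake.file_count(), lake.row_count())             # prints "1 150"
```

A hundred streaming writes produce zero files in this model. A file-per-commit design would have produced a hundred, which is the small-file problem compaction exists to clean up.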

Production-readiness: DuckLake v1.0 comes with backward-compatibility guarantees, a stable spec, and client implementations for DataFusion, Spark, Trino, and Pandas alongside DuckDB-native. Companies like Definite — an AI-native analytics platform — have been running it in production for over a year.

Apache Iceberg: The Incumbent

Iceberg started at Netflix in 2017 to solve a specific problem: petabyte-scale data lakes where Spark jobs needed consistent snapshots and schema evolution. It solved that problem so well that it became the industry standard.

Iceberg’s design philosophy is file-based metadata with no required external dependencies. You can put an Iceberg table on a bare S3 bucket and it works. The cost of that design freedom is that the catalog has to live somewhere, and “somewhere” turned into a five-year ecosystem race: AWS Glue, Apache Polaris, Lakekeeper, Project Nessie, Hive Metastore, Snowflake’s managed catalog, the REST catalog spec. Each one is a service you operate, integrate engines with, and monitor at 2am.

The ecosystem is Iceberg’s moat. Spark, Trino, Flink, Snowflake, Athena, BigQuery, Dremio, and ClickHouse (read-only) all support it: every major engine reads Iceberg, and most also write it. If you need multi-engine federation today, Iceberg is the answer, no asterisks.

The Core Architectural Difference: Catalogs

This is the single most important thing to understand about the DuckLake vs Iceberg choice.

Iceberg committed to file-based metadata. Everything — table snapshots, manifest lists, manifest files — lives as JSON and Avro files in object storage alongside your data. The catalog is just a pointer to the current metadata location. This design means Iceberg has zero required infrastructure dependencies. It also means every query traverses a tree of file reads just to figure out what to scan.

DuckLake commits to a database-backed catalog. Metadata lives in Postgres, DuckDB, MySQL, or SQLite. This single dependency buys you two things:

ACID transactions come free — they’re how databases work, not something you have to build with file-level primitives
Data inlining — small writes land directly in the catalog database instead of creating Parquet files, eliminating the small-file compaction problem entirely

Both BigQuery and Snowflake use database-as-catalog internally (Spanner and FoundationDB respectively). DuckLake is the first lakehouse format that exposes this pattern as an open spec.

graph TD
    subgraph "Iceberg Metadata Path"
        I_Catalog[REST / Glue Catalog]
        I_Root[Root Metadata JSON]
        I_ManifestList[Manifest List Avro]
        I_Manifest[Manifest Avro]
        I_Parquet[Parquet Data Files]
        I_Catalog --> I_Root
        I_Root --> I_ManifestList
        I_ManifestList --> I_Manifest
        I_Manifest --> I_Parquet
    end
    subgraph "DuckLake Metadata Path"
        D_Catalog[SQL Catalog - Postgres / DuckDB]
        D_Parquet[Parquet Data Files]
        D_Catalog -.-> D_Parquet
    end
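The two metadata paths can be sketched as a toy planner that counts lookups. Everything here is a simplification I made up for illustration: the dict standing in for object storage, the file names, and the one-table SQLite schema are hypothetical and much smaller than either format's real metadata layout.

```python
import sqlite3

# File-based path: every hop is a separate object-store read.
object_store = {
    "metadata/v3.metadata.json": {"manifest_list": "metadata/snap-3.avro"},
    "metadata/snap-3.avro": {"manifests": ["metadata/m1.avro", "metadata/m2.avro"]},
    "metadata/m1.avro": {"data_files": ["data/a.parquet", "data/b.parquet"]},
    "metadata/m2.avro": {"data_files": ["data/c.parquet"]},
}
catalog_pointer = {"events": "metadata/v3.metadata.json"}  # the catalog's only job


def plan_file_based(table):
    """Walk root metadata -> manifest list -> manifests; return (files, reads)."""
    reads = 0

    def get(path):
        nonlocal reads
        reads += 1  # each call models one object-store round trip
        return object_store[path]

    root = get(catalog_pointer[table])
    manifest_list = get(root["manifest_list"])
    files = []
    for manifest_path in manifest_list["manifests"]:
        files.extend(get(manifest_path)["data_files"])
    return files, reads


# Database-backed path: the file list is one SQL query away.
sql_catalog = sqlite3.connect(":memory:")
sql_catalog.execute("CREATE TABLE data_files (table_name TEXT, path TEXT)")
sql_catalog.executemany(
    "INSERT INTO data_files VALUES (?, ?)",
    [("events", "data/a.parquet"), ("events", "data/b.parquet"), ("events", "data/c.parquet")],
)


def plan_sql_catalog(conn, table):
    """Return (files, queries) using a single catalog query."""
    rows = conn.execute(
        "SELECT path FROM data_files WHERE table_name = ? ORDER BY path", (table,)
    ).fetchall()
    return [r[0] for r in rows], 1


print(plan_file_based("events"))
print(plan_sql_catalog(sql_catalog, "events"))
```

Both planners arrive at the same three data files. The difference is how many sequential lookups it takes to get there, and in the file-based path that number grows with the number of manifests.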

DuckLake vs Iceberg: Side by Side

Dimension	Apache Iceberg	DuckLake
Catalog	Pointer-based (REST, Glue, Hive, Polaris)	Database-native (Postgres, DuckDB, MySQL, SQLite)
Metadata format	JSON table metadata + Avro manifest lists and manifests	SQL database tables
ACID transactions	Optimistic concurrency on object storage	Database transactions
Small writes	Creates tiny Parquet files — compaction needed	Inlined in catalog — zero files
Streaming	Requires compaction tooling	Data inlining handles it natively
Engine ecosystem	Spark, Trino, Flink, Snowflake, Athena, BigQuery, Dremio, ClickHouse	DuckDB-native, DataFusion, Spark (via MotherDuck), Trino (community), Pandas
Scaling model	Horizontal through object storage	Catalog database is coordination point
Starting fresh	Deploy a catalog service (Polaris, Glue, etc.)	Bring a database you already run
Production track record	2017+, Netflix, Snowflake, AWS	2025+, Definite, select early adopters

Real-World Production Use Cases

The format debate is informed by who’s actually running each in production today. Here’s what the landscape looks like as of mid-2026.

DuckLake in Production

Definite, an AI-native analytics platform, migrated their entire infrastructure from Snowflake to DuckDB in May 2024 and later adopted DuckLake as their lakehouse format. Their production system powers customer dashboards, AI agent queries, and data pipelines. Co-founder John Mark explained the decision: “We already run Postgres for product state. Adding a Postgres-backed DuckLake catalog cost us nothing operationally — and it gave us ACID semantics over the lake without adding a service.” They published the full business case and an operator’s verdict after a year in production.

On Reddit’s r/DuckDB, multiple engineers report running DuckLake in production for analytics workloads in the “few GB per day” range, with one planning a full rollout across their data platform by end of 2026.

UK consultancy endjin published a comprehensive three-part analysis concluding DuckLake’s simplified architecture positions it as a potential disruptor to established lakehouse formats, particularly for teams that already run a database.

InfoQ covered DuckLake 1.0 as a notable data engineering milestone in May 2026, highlighting the SQL-catalog-metadata approach as a fundamental rethinking of lakehouse architecture.

Iceberg in Production

Netflix created Iceberg in 2017 to solve a specific problem: Spark jobs needing consistent snapshots across petabytes of data in S3. It worked so well they open-sourced it, and it became the industry standard. Netflix remains one of the largest Iceberg deployments, operating at multiple-petabyte scale with multi-region replication.

Apple, LinkedIn, and Airbnb all run Iceberg in production. Airbnb presented their migration journey at the 2025 Iceberg Summit, covering how they moved from Hive to Iceberg for their data lakehouse. Qlik’s report on Iceberg adoption cites these companies as reference deployments powering both analytics and AI workloads.

Snowflake natively reads and writes Iceberg tables — both managed catalogs and external tables. This integration alone makes Iceberg the default choice for any Snowflake-centric shop.

AWS Glue and Athena have deep Iceberg support. AWS doubled down on Iceberg as the open table format for their data lakehouse strategy.

The pattern is clear: Iceberg dominates at hyperscale with multi-engine, multi-team deployments. DuckLake is winning where teams run Postgres, use DuckDB as their primary engine, and value operational simplicity over ecosystem breadth. Both are legitimate choices for their respective use cases.

When to Pick Each

Pick DuckLake when:

You already run a database. If your stack includes Postgres, DuckLake’s catalog is just another schema. For a small team, “the catalog is free” is a meaningful operational unlock.

Your workload is AI-agent-driven. A human analyst runs maybe 50 queries a day. An AI agent doing schema inspection, query planning, and iterative refinement runs thousands. Iceberg’s metadata path walks S3 objects per read; DuckLake’s is a single SQL query. At human scale the difference is invisible. At agent scale, it compounds.

You’re building a DuckDB-centric stack. If DuckDB is your primary query engine and you don’t need Spark or Trino, DuckLake is the natural fit. For an in-depth look at DuckDB’s analytical capabilities, see our DuckDB guide.

You have streaming workloads with frequent small writes. DuckLake’s data inlining means you don’t need compaction tooling. The small-file problem is solved at write time, not patched by a maintenance job.
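The agent-scale argument above is just multiplication, so here is the back-of-the-envelope version. Every number except the 50 human queries a day is an assumption I picked for illustration: 5,000 agent queries a day, four sequential metadata reads at 50 ms each for a file-based path, and one catalog query at 5 ms for a SQL catalog. Substitute your own measurements.

```python
def daily_metadata_seconds(queries_per_day: int, round_trips: int, latency_ms: float) -> float:
    """Total seconds per day spent on metadata lookups before any data is scanned."""
    return queries_per_day * round_trips * latency_ms / 1000


# Assumed, illustrative costs per query plan (not benchmark results).
FILE_BASED = {"round_trips": 4, "latency_ms": 50}   # sequential object-store reads
SQL_CATALOG = {"round_trips": 1, "latency_ms": 5}   # one catalog query

for label, queries in [("human analyst", 50), ("AI agent", 5000)]:
    file_based = daily_metadata_seconds(queries, **FILE_BASED)
    sql_catalog = daily_metadata_seconds(queries, **SQL_CATALOG)
    print(f"{label}: file-based {file_based:.2f}s/day, SQL catalog {sql_catalog:.2f}s/day")
```

Under these assumptions the human pays about ten seconds a day either way you round it, while the agent pays on the order of a thousand seconds against a few dozen. The ratio is the same in both cases; what changes is whether the absolute cost is large enough to notice.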

Pick Iceberg when:

You need multi-engine federation. If Spark, Trino, Snowflake, and Athena all need to read the same tables today, Iceberg is the only answer.

You’re already on Snowflake with Iceberg tables. Snowflake reads and writes Iceberg natively. Migration costs almost certainly outweigh the design wins. Run the cost numbers before you touch anything.

Your compliance team has a catalog mandate. If they’ve signed off on Glue or Unity Catalog as the system of record, you don’t get to swap in Postgres. That’s an audit decision, not a technical one.

You’re operating at 50TB+ with multi-write-cluster workloads. Iceberg’s optimistic concurrency on object storage scales horizontally without a single coordinator. DuckLake’s catalog database is a coordination point — fine at low-to-mid scale, but a bottleneck at the high end.
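For readers who haven’t looked at how optimistic concurrency works, here is a minimal sketch of the commit loop. It is a simplified model, not Iceberg’s actual code: `PointerCatalog`, `commit`, and the string "metadata versions" are hypothetical names of mine. The idea it illustrates is that writers prepare new metadata without coordinating, and only the final pointer swap has to be atomic. A writer that loses the race rebuilds on top of the winner and tries again.

```python
import threading


class PointerCatalog:
    """Holds the current metadata pointer and offers an atomic compare-and-swap."""

    def __init__(self, initial: str):
        self._current = initial
        self._lock = threading.Lock()  # stands in for the catalog's atomic swap

    def current(self) -> str:
        return self._current

    def compare_and_swap(self, expected: str, new: str) -> bool:
        with self._lock:
            if self._current != expected:
                return False  # someone else committed first
            self._current = new
            return True


def commit(catalog: PointerCatalog, make_new_metadata, max_retries: int = 5) -> int:
    """Try to commit; return the number of attempts it took."""
    for attempt in range(1, max_retries + 1):
        base = catalog.current()
        candidate = make_new_metadata(base)   # uncoordinated, potentially slow work
        if catalog.compare_and_swap(base, candidate):
            return attempt
        # Conflict: loop, re-read the pointer, and rebuild on the new base.
    raise RuntimeError("commit failed after retries")


catalog = PointerCatalog("v1")
print(commit(catalog, lambda base: base + "+append"), catalog.current())
```

Retries are the cost of this design. They stay cheap while conflicts are rare, which is why it suits many independent write clusters better than routing every write through one database transaction.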

The Two Formats Are Converging

Here’s what doesn’t fit on a vendor slide: both formats store data as Parquet files. The bytes on disk are identical. A Parquet reader doesn’t know or care which catalog wrote them.

DuckLake 0.3 shipped Iceberg interoperability in September 2025: you can COPY data and table metadata between DuckLake and Iceberg in either direction. DuckLake’s deletion vectors are designed to be Iceberg-compatible.

On the Iceberg side, the V4 spec work is exploring pluggable catalog backends. A DuckLake-style RDBMS catalog could plausibly fit inside a future Iceberg spec. Whether that happens depends on community direction, but the architectural drift is real.

In eighteen months, the “DuckLake or Iceberg” question may matter less. The right move is to pick what fits your team now, knowing the migration cost in either direction is bounded.

Migration Path Between Formats

If you pick DuckLake and want to switch to Iceberg later, the data on disk doesn’t move — it’s Parquet. You export the catalog metadata, write it as Iceberg manifests, and point a catalog service at the result. DuckLake ships COPY operations to Iceberg that handle most of the mechanics.

It’s a real project — measured in weeks, not months — but it isn’t a rewrite. The migration cost in either direction is bounded.

If you’re inheriting an existing Snowflake-on-Iceberg deployment, the migration cost almost certainly outweighs the benefits. Stay where you are. The formats are converging anyway.

What You Can Actually Use Today

DuckDB v1.5.2 includes the ducklake extension — run FORCE INSTALL ducklake; LOAD ducklake; and you’re running
DuckLake v1.0 is the production-ready spec with backward-compatibility guarantees
Apache Iceberg is available through every major query engine and cloud vendor — no installation needed
Apache Polaris is now a top-level Apache project (as of April 2026) for Iceberg catalog management
For a complete managed lakehouse, Definite and MotherDuck offer DuckLake-native platforms

Need help choosing your lakehouse architecture?

I advise engineering teams on data platform architecture, lakehouse migration, and infrastructure strategy. If you’re evaluating DuckLake vs Iceberg for a real deployment, let’s talk.
