Apache Iceberg Explained: The 2026 Data Lakehouse Standard

You may have heard the term “data lake” thrown around a lot lately. Maybe you’ve heard that it’s getting messy, or slow, or unreliable. You might even have heard teams describe their data lake less as a lake and more as a swamp: petabytes of Parquet files, broken Monday-morning reports, schema drift no one can explain, and pipelines that mysteriously fail halfway through the night.

That is not an exaggeration. For years, that was the lived reality for data engineers at some of the world’s largest companies. And it’s the exact problem that led a team at Netflix to build something that has since changed the entire field of data engineering.

That something is Apache Iceberg.

But here’s the thing most articles get wrong about Iceberg. They frame it as a faster replacement for an older tool, or a cleaner way to organize files in S3. That framing misses the point entirely. Apache Iceberg is not really about files at all. It is about giving analytical data a consistent set of behaviors (transactions, versioning, schema safety, multi-engine access) that work the same way regardless of where your data lives or which tool is reading it.

Think of it as the difference between a pile of books on the floor and a proper library with a catalog system. The books are the same. The data is the same. But what you can do with them, and how reliably you can do it, is completely different.

**Data lake problems and Apache Iceberg**

In this guide, you’ll walk away understanding not just what Iceberg is, but why it was built, how it actually works under the hood, and why the choices it makes in its metadata design ripple out into real operational advantages for engineering teams every single day.

📌 What You’ll Learn in This Guide

Why legacy Hive table formats broke down at scale, and what that actually felt like
How Apache Iceberg’s three-layer metadata architecture solves those problems
What hidden partitioning, schema evolution, and time travel really mean in practice
How Iceberg compares to Delta Lake and Apache Hudi by workload type
What the V3 spec and the REST Catalog mean for the future of your data stack
The maintenance work Iceberg doesn’t eliminate (and what to do about it)

The Library That Couldn’t Keep Up: A Brief History of Big Data’s Growing Pains

To understand why Iceberg exists, you have to understand what came before it. And to understand that, it helps to think about data management the same way you’d think about running a library.

A library, when you strip it down, has three jobs. It needs storage: somewhere to put all the content. It needs processing power: some way to fulfill a visitor’s request. And it needs metadata: an organized system (like the Dewey Decimal System) that tells the librarian where everything is. These three components map almost perfectly onto any data management system. The only difference is scale.

In the early 2000s, the internet changed everything. Suddenly, organizations were generating more data than any single machine could handle. In 2006, Apache Hadoop was released as an open-source project in response. It gave companies a way to distribute storage and processing across dozens or hundreds of machines using the Hadoop Distributed File System (HDFS) and a parallel processing model called MapReduce. If you needed more capacity, you just added another machine to the cluster.

It was a genuine breakthrough. But MapReduce had a problem. Writing a MapReduce job meant writing a Java program. Data analysts who were comfortable with SQL suddenly found themselves facing a completely different language, like walking into a library and discovering that you and the librarian don’t speak the same language at all.

In 2008, Apache Hive arrived to fix that. Its main draw was its ability to translate SQL-like queries into MapReduce jobs. It also came with a bonus: the Hive Metastore, a metadata database that stored pointers to groups of files in the underlying file system. Now you could write a SQL query, Hive would check its metastore for a shortcut to the relevant files, and the job would go off to MapReduce. The librarian finally had a cheat sheet.

This worked well for a while. Then the 2010s arrived, and with them, a new wave of scale.

Smartphones and IoT devices were generating data at rates no one had anticipated. Organizations started migrating away from on-premises HDFS clusters toward cloud-based object storage, primarily Amazon S3, because it was cheaper and easier to scale. But Hive had been built with HDFS in mind. S3, which is fundamentally a key-value store rather than a true file system, behaved differently in ways that exposed Hive’s architectural assumptions.

Hive identified tables by their directory structure. A table partitioned by date lived in folders like /table/year=2024/month=01/day=01/. To find the right files for a query, the processing engine had to perform recursive directory listings. Against HDFS, this was slow. Against S3, which required thousands of separate API calls to simulate directory traversal, it became painful and sometimes inconsistent, because S3 at that time used an eventually consistent model where a freshly written file might not immediately appear in a listing.

There was a third problem on top of those two. Real-time, on-demand query engines like Presto were gaining popularity over traditional scheduled batch processing. Hive’s architecture wasn’t designed for that kind of fast, interactive workload. It was built for batch jobs that could tolerate minutes of overhead. And to make things harder, organizations weren’t willing to throw away their existing Hive setups entirely. They needed something that could coexist with batch jobs in Spark while also supporting fast interactive queries and cloud storage.

Three problems. One solution.

What Apache Iceberg Actually Is (And What It Isn’t)

In 2017, a team of engineers at Netflix, led by Ryan Blue and Dan Weeks, reached a breaking point. Their Hive-based data lake had grown to a scale where partition management alone was causing serious operational problems. A table with tens of thousands of partitions could take minutes just to plan a query. Worse, there was no way to atomically update a table. If an ETL job failed halfway through writing a partition, the table was left in a broken state, often requiring manual cleanup to prevent downstream corruption.

So they started building what would become Apache Iceberg, and they donated it to the Apache Software Foundation, where it has since become one of the most important open-source projects in the data engineering world.

Here’s the key insight behind Iceberg, stated plainly: rather than providing its own storage or compute layer, Iceberg is a layer of metadata that sits between your processing engines and your data files. It doesn’t care whether your data lives in S3, GCS, HDFS, or Azure Blob Storage. It doesn’t care whether your query engine is Spark, Trino, Flink, or Snowflake. As long as every component in your ecosystem understands Iceberg’s metadata language, they can all work together safely on the same tables.

Think of it as upgrading your library’s catalog system. The books haven’t moved. The librarians haven’t been replaced. But now the catalog is so detailed, so well-organized, and so universally understood that any librarian, no matter which department they work in, can find exactly the right books, handle reservations without conflicts, and even look back through the historical record to see what the collection looked like last Tuesday.

That’s Iceberg. And it’s a much bigger idea than “faster queries.”

The Three-Layer Architecture: How Iceberg Actually Works

The architectural brilliance of Apache Iceberg comes down to one design decision: it completely decouples the logical state of a table from the physical organization of files on disk. To understand how that works, you need to understand Iceberg’s three-layer hierarchy.

Iceberg’s table information is commonly described as three distinct layers: the Catalog, the Metadata Layer, and the Data Layer. This framing follows the structure laid out in the official Apache Iceberg table specification. Each layer has a specific job, and together they provide guarantees that Hive could never offer.

The Catalog Layer

The Catalog Layer is the entry point. Think of it as the front desk of the library. When any engine wants to read or write an Iceberg table, it starts here. The catalog maintains a single, atomic pointer (essentially a “current address”) to the latest version of the table’s metadata.

When a writer commits a change, it creates a new metadata file and asks the catalog to atomically swap its pointer to the new address. Either that swap succeeds completely, or it doesn’t happen at all. There are no partial commits, no broken intermediate states.
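The pointer swap described above behaves like a compare-and-swap. The following is a minimal in-memory sketch of that idea, not Iceberg's actual catalog code: the class `InMemoryCatalog`, the table name, and the metadata paths are all hypothetical, and a real catalog would rely on a database transaction or a conditional write rather than a Python lock.

```python
import threading


class InMemoryCatalog:
    """Toy catalog: maps a table name to the location of its current metadata file."""

    def __init__(self):
        self._pointers = {}
        self._lock = threading.Lock()

    def current(self, table):
        return self._pointers.get(table)

    def swap(self, table, expected, new):
        """Compare-and-swap: succeed only if the pointer still equals `expected`."""
        with self._lock:
            if self._pointers.get(table) != expected:
                return False  # someone else committed first; nothing changes
            self._pointers[table] = new
            return True


catalog = InMemoryCatalog()
catalog.swap("db.orders", None, "metadata/v1.metadata.json")

# Two writers both start from the same base metadata file.
base = catalog.current("db.orders")
first = catalog.swap("db.orders", base, "metadata/v2-writer-a.metadata.json")
second = catalog.swap("db.orders", base, "metadata/v2-writer-b.metadata.json")
print(first, second, catalog.current("db.orders"))
```

Only one of the two competing swaps can win. The losing writer sees a clean failure and the pointer never holds a half-applied state.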

The Metadata Layer

Beneath the catalog sits the Metadata Layer, and this is where Iceberg gets genuinely clever. The metadata layer is made up of three nested file types.

The Metadata File

The Metadata File is a JSON document that holds the table’s schema, partition specifications, sort orders, and a full history of snapshots. Every time a change is committed, a new metadata file is written rather than overwriting the old one. This immutability is intentional: it is what enables time travel and rollback, which we’ll get to shortly.

The Manifest List

The Manifest List is an Avro-encoded file that represents a specific snapshot. It lists all the manifest files that belong to that snapshot, along with summary statistics for each manifest, such as the range of values for partitioned columns. This allows query engines to perform “manifest pruning,” skipping entire manifest files that couldn’t possibly contain the data they need.

The Manifest Files

The Manifest Files are also Avro-encoded and provide the most detailed level of metadata. Each manifest file lists individual data files, along with their partition values and column-level statistics: specifically, the minimum and maximum values for each column, plus null counts.

This is the mechanism that enables file-level pruning. If your query filters for WHERE price > 500, Iceberg can look at the statistics in the manifest file and skip any data file where the maximum price is at or below 500. Files that cannot contain a match are never opened.
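The two pruning steps can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `DataFile` and `Manifest` classes, the `month` partition column, and the file names are invented for the example, and real planning reads these statistics from Avro manifests rather than Python objects.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    month: str          # partition value, e.g. "2024-01"
    price_min: float    # column lower bound recorded in the manifest
    price_max: float    # column upper bound recorded in the manifest


@dataclass
class Manifest:
    path: str
    files: list

    @property
    def month_range(self):
        # Partition summary the manifest list would carry for this manifest.
        months = [f.month for f in self.files]
        return min(months), max(months)


def plan_scan(manifests, month, min_price):
    """Files that may hold rows with the given month and price > min_price."""
    selected = []
    for manifest in manifests:
        low, high = manifest.month_range
        if not (low <= month <= high):
            continue  # manifest pruning: no file here covers the month
        for f in manifest.files:
            # File pruning: a max at or below the threshold cannot match.
            if f.month == month and f.price_max > min_price:
                selected.append(f.path)
    return selected


manifests = [
    Manifest("m1.avro", [DataFile("a.parquet", "2023-11", 5, 900),
                         DataFile("b.parquet", "2023-12", 10, 450)]),
    Manifest("m2.avro", [DataFile("c.parquet", "2024-01", 20, 480),
                         DataFile("d.parquet", "2024-01", 300, 1200)]),
]
print(plan_scan(manifests, "2024-01", 500))  # ['d.parquet']
```

The first manifest is skipped without looking at its entries, and within the second only the file whose upper bound exceeds 500 survives.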

🗂️ Apache Iceberg: Three-Layer Architecture at a Glance

CATALOG LAYER
Atomic pointer to current metadata: the single source of truth for every engine

↓

METADATA LAYER
Metadata File (JSON): schema, snapshots, partition specs
Manifest List (Avro): snapshot index + partition bounds
Manifest Files (Avro): per-file stats (min/max, nulls, path)

↓

DATA LAYER
Immutable data files (Parquet, ORC, Avro) + delete files, stored in S3, GCS, ADLS, or HDFS

The Data Layer

The bottom layer, the Data Layer, is where the actual data lives, typically in Apache Parquet files, though ORC and Avro are also supported. One important detail: data files in Iceberg are immutable. Once written, they are never modified in place. In merge-on-read mode, updates and deletes are recorded in “delete files” that track which rows have been logically removed; in copy-on-write mode, the affected files are replaced by newly written ones. Either way, existing files are never edited, and that immutability is another key reason the whole system is safe to use concurrently.

Key takeaway: The query engine no longer has to scan directories. It reads metadata, prunes aggressively at each level, and only opens the exact files it needs. For large tables, query planning can be 10 to 50 times faster than with the old directory-listing approach.

From Hive to Iceberg: What the Migration Actually Changes

Dimension	Apache Hive (Legacy)	Apache Iceberg (Modern)
How files are tracked	Directory paths and folder structure	Hierarchical metadata tree (Catalog → Manifest)
Query planning speed	Scales with partition count (slow at scale)	Metadata-driven — prunes at manifest level
Transaction safety	No atomic writes — partial failures common	Full ACID with snapshot isolation
Schema changes	Risky; can break jobs or require rewrites	Safe field-ID-based evolution; zero rewrites
Partitioning	Explicit — users must filter on partition columns	Hidden — engine prunes automatically
Cloud-native fit	Built for HDFS; poor S3 performance	Designed for object storage from day one

Hidden Partitioning: The Feature Most Teams Underestimate

Here is one of those ideas that sounds like a minor convenience until you work with it, and then you realize it changes how you think about table design entirely.

**Iceberg hidden partitioning evolution**

In Hive, physical partitioning leaked into your SQL. If a table was partitioned by a date column, users had to know that and explicitly include a filter on that column in every query. Not because the query logic demanded it, but because the engine needed it to find the right folder. A query that forgot to include WHERE event_date = '2024-01-15' could end up scanning the entire table: all years, all months, all days.

This required users to essentially memorize the physical layout of every table they queried. It also created a subtle modeling trap: partition columns often had to be stored twice, once in the raw data and once as a synthetic column matching the folder structure. Schema changes to partitioning meant touching every file in the table, coordinating across every upstream and downstream team, and usually taking the table offline during migration.

Apache Iceberg eliminates all of that with hidden partitioning. You define a partition transform in the metadata, something like months(order_date) or bucket(8, user_id), and Iceberg handles everything automatically from that point on. Users query raw data columns like order_date, and Iceberg’s metadata layer silently maps that filter to the right set of manifest files and data files. No synthetic columns. No required partition filters in user SQL. No knowledge of physical layout needed.
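To make the idea concrete, here is a small sketch of a month transform and of how a filter on the raw column can be mapped to partition values. It assumes the month transform counts whole months since January 1970; the file names and helper functions are hypothetical, and the bucket transform is left out because it depends on a specific hash function.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def months_transform(d):
    """Months since 1970-01, the value a months(...) transform derives."""
    return (d.year - EPOCH.year) * 12 + (d.month - EPOCH.month)


def partitions_for_range(start, end):
    """Month partition values that a filter start <= order_date <= end can touch."""
    return set(range(months_transform(start), months_transform(end) + 1))


# Writer side: the partition value is derived from the row, not supplied by the user.
files_by_partition = {}
for order_date, path in [
    (date(2024, 1, 15), "f1.parquet"),
    (date(2024, 2, 3), "f2.parquet"),
    (date(2024, 3, 20), "f3.parquet"),
]:
    files_by_partition.setdefault(months_transform(order_date), []).append(path)

# Reader side: filter on the raw column; the planner maps it to partitions.
wanted = partitions_for_range(date(2024, 2, 1), date(2024, 2, 29))
matched = [p for part, paths in sorted(files_by_partition.items())
           if part in wanted for p in paths]
print(matched)  # ['f2.parquet']
```

The person writing the query only ever mentions `order_date`; the derived partition value never appears in their SQL.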

But the deeper insight, the one most articles miss, is that this also makes tables dramatically safer to evolve over time.

With Hive, a change in query patterns (say, shifting from monthly to daily reporting) meant a full table rewrite. With Iceberg’s partition evolution feature, you can update the partition specification as a metadata-only operation. Old data files remain in their original layout, still associated with the old partition spec.

New data is written with the new spec. Iceberg applies the correct pruning logic to each set of files automatically, without any data movement, using a technique called split planning. A table can have multiple partition specs coexisting seamlessly, and the transition happens invisibly to the people running queries.
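A toy version of planning across two coexisting partition specs might look like the sketch below. The spec IDs, the tuple-shaped partition values, and the file paths are illustrative assumptions, not Iceberg's internal representation.

```python
from datetime import date

# Each spec maps a date to a partition value; spec 0 is monthly, spec 1 is daily.
SPECS = {
    0: lambda d: (d.year, d.month),
    1: lambda d: (d.year, d.month, d.day),
}

# Files remember the spec they were written under (hypothetical paths).
FILES = [
    {"path": "old-2023-12.parquet", "spec_id": 0, "partition": (2023, 12)},
    {"path": "old-2024-01.parquet", "spec_id": 0, "partition": (2024, 1)},
    {"path": "new-2024-02-01.parquet", "spec_id": 1, "partition": (2024, 2, 1)},
    {"path": "new-2024-02-02.parquet", "spec_id": 1, "partition": (2024, 2, 2)},
]


def plan(files, target_day):
    """Keep files whose partition matches target_day under their own spec."""
    return [f["path"] for f in files
            if SPECS[f["spec_id"]](target_day) == f["partition"]]


print(plan(FILES, date(2024, 1, 15)))  # ['old-2024-01.parquet']
print(plan(FILES, date(2024, 2, 2)))   # ['new-2024-02-02.parquet']
```

The same filter is evaluated against each file using the spec that file was written with, which is why no historical data has to move when the layout changes.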

That’s not just a performance feature. That’s a governance feature. It means the people who design tables don’t have to predict the future perfectly on day one.

Key takeaway: Hidden partitioning reduces query complexity, eliminates modeling mistakes, and allows data layouts to evolve without the enormous cost of full table rewrites.

Iceberg as a Time Machine: Snapshots, Time Travel, and Rollback

What if a table could remember every version of itself?

That question sounds rhetorical, but it’s actually the core of one of Iceberg’s most practically valuable features. Because Iceberg writes a new, immutable metadata file every time a change is committed, and because it preserves the history of snapshots in that metadata, every table automatically maintains a version history.

This is called time travel, and it works exactly as the name implies. You can query the state of a table at any point in the past, down to the exact timestamp or snapshot ID, using simple SQL syntax. That’s useful for debugging (why did the report look different last Tuesday?), for auditing (what data existed before that deletion ran?), and for machine learning reproducibility (what was the exact training dataset used for this model version?).

Even more powerful is the rollback capability. If a faulty ETL job corrupts a table, or someone accidentally deletes records they shouldn’t have, an administrator can revert the table to a previous snapshot. The operation is nearly instantaneous because it’s a metadata-only change: no data files are moved or deleted. The table’s current-snapshot reference is simply set back to the earlier snapshot, and the table is immediately back to its previous state.
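The snapshot model behind time travel and rollback can be illustrated with a small in-memory sketch. The `SnapshotLog` class is hypothetical and stores rows directly for brevity; a real table stores references to manifests and data files instead.

```python
from bisect import bisect_right


class SnapshotLog:
    """Toy table history: each commit appends an immutable snapshot."""

    def __init__(self):
        self.snapshots = []      # list of (snapshot_id, timestamp_ms, rows)
        self.current_id = None

    def commit(self, timestamp_ms, rows):
        snapshot_id = len(self.snapshots) + 1
        self.snapshots.append((snapshot_id, timestamp_ms, tuple(rows)))
        self.current_id = snapshot_id
        return snapshot_id

    def read(self, snapshot_id=None):
        wanted = self.current_id if snapshot_id is None else snapshot_id
        return next(rows for sid, _, rows in self.snapshots if sid == wanted)

    def read_as_of(self, timestamp_ms):
        """Read the latest snapshot committed at or before timestamp_ms."""
        times = [ts for _, ts, _ in self.snapshots]
        index = bisect_right(times, timestamp_ms) - 1
        if index < 0:
            raise ValueError("no snapshot exists at that time")
        return self.snapshots[index][2]

    def rollback_to(self, snapshot_id):
        # Metadata-only: move the current reference; no snapshot is rewritten.
        if not any(sid == snapshot_id for sid, _, _ in self.snapshots):
            raise ValueError("unknown snapshot")
        self.current_id = snapshot_id


table = SnapshotLog()
good = table.commit(1_000, ["a", "b"])
table.commit(2_000, ["a", "b", "c"])
table.commit(3_000, [])  # a faulty job wipes the table

print(table.read_as_of(2_500))  # ('a', 'b', 'c')
table.rollback_to(good)
print(table.read())             # ('a', 'b')
```

Rolling back changes only which snapshot counts as current. The later snapshots remain in the history until they are explicitly expired.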

LY Corporation’s presentation at the 2025 Iceberg Summit offers a striking real-world example. Operating at hundreds of petabytes with over 10 million partitions, LY’s team reported that Iceberg’s snapshot-based model enabled partial data deletion for compliance workflows and dramatically reduced their Hive Metastore pressure, two problems that had previously required custom tooling and significant manual intervention.

This “data as a sequence of immutable states” mental model is worth sitting with for a moment. It’s the same design philosophy that makes Git so powerful for code. Every commit is preserved. Every branch is trackable. Merging and rolling back are first-class operations. Iceberg brings that same philosophy to analytical data, and the operational implications are just as significant.

ACID Transactions and Multi-Engine Safety: The Real Business Case

Let’s talk about what “ACID transactions” actually means in practice for a data engineering team. The academic definition (Atomicity, Consistency, Isolation, Durability) can feel abstract until you’ve spent a Saturday morning manually cleaning up a table that a Friday-night ETL job left in a partially written state.

Iceberg uses snapshot isolation and optimistic concurrency control to ensure that every reader gets a consistent view of the data and that concurrent writers don’t corrupt each other’s work. When a query starts, it pins itself to the current snapshot ID. Any changes committed by other writers after that point are invisible to the running query: it sees a stable, frozen view of the table for its entire duration.

For writes, Iceberg assumes that concurrent changes won’t conflict (hence “optimistic”), and each writer attempts to commit its new snapshot against the catalog. If two writers collide, the catalog detects the conflict. If the changes are compatible, the second writer retries with the updated state. If they’re incompatible, the commit fails cleanly, no partial writes, no corruption.
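The retry behavior can be sketched as a loop that rebases on the latest table state whenever a commit loses a race. Everything here is a simplified assumption: the `Table` class, the version counter, and the `before_commit` hook (which exists only to simulate a rival writer) are not part of any Iceberg API, and real conflict validation is more nuanced than a version check.

```python
class Table:
    """Toy table state: a version number plus the set of committed file paths."""

    def __init__(self):
        self.version = 0
        self.files = frozenset()

    def try_commit(self, base_version, new_files):
        # The catalog accepts a commit only if it was built on the current version.
        if base_version != self.version:
            return False
        self.version += 1
        self.files = frozenset(new_files)
        return True


def append_with_retry(table, path, max_attempts=3, before_commit=None):
    """Append one file, rebasing on the latest state whenever a commit loses a race."""
    for attempt in range(1, max_attempts + 1):
        base_version, base_files = table.version, table.files
        if before_commit is not None:
            before_commit(attempt)  # hook used here to simulate a concurrent writer
        if table.try_commit(base_version, base_files | {path}):
            return attempt
    raise RuntimeError("commit failed after retries")


table = Table()


def rival(attempt):
    # Another writer sneaks in a commit during our first attempt only.
    if attempt == 1:
        table.try_commit(table.version, table.files | {"rival.parquet"})


attempts = append_with_retry(table, "ours.parquet", before_commit=rival)
print(attempts, sorted(table.files))  # 2 ['ours.parquet', 'rival.parquet']
```

The first attempt fails cleanly because the table moved on. The second attempt starts from the rival's committed state, so both appends end up in the table.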

A practitioner case study reported by the data engineering community found that teams running Spark batch jobs and Flink streaming jobs simultaneously against the same Iceberg tables experienced near-zero concurrent write conflicts and trivially fast failure recovery. Before Iceberg, coordinating multiple engines on the same dataset required careful scheduling, locking mechanisms, or most commonly, simply not doing it. After Iceberg, it became routine.

This is the operational friction argument, and it’s actually more compelling than the query speed argument for most engineering teams. Faster queries are nice. Not spending your Friday nights recovering from pipeline failures is transformative.

Schema Evolution Without the Fear

One more architectural detail deserves its own spotlight: stable field IDs.

In most data formats, columns are identified by their name or position. Rename a column, and older files that still use the original name might be misread. Reorder the schema, and positional readers break. Add a column in the middle of a schema, and you’ve potentially invalidated every downstream job that relied on column position.

Iceberg solves this permanently by assigning every column a unique, stable integer ID at creation time. When you rename a column called user_name to full_name, Iceberg updates the human-readable name in the metadata, but the underlying identifier (say, field ID 5) stays the same. Older Parquet files that were written under the name user_name are still read correctly because the engine resolves columns by field ID, and field ID 5 is now called full_name. The data is never touched. The mapping just updates.

This makes the following schema operations completely safe, with no data rewrites required:

Adding a new column (at any position)
Renaming an existing column
Dropping a column
Reordering columns
Widening a column’s type (e.g., int to long)

Each operation is independent and free of side effects. That sounds basic, but for teams managing tables that are touched by dozens of upstream producers and downstream consumers, it’s the difference between a safe ALTER TABLE statement and a multi-team coordination project.
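A short sketch shows why resolving columns by ID makes renames and additions safe. The dictionaries below are a stand-in for real schema and file metadata; the field IDs, column names, and helper functions are illustrative assumptions.

```python
# Current table schema: field ID -> current column name (names are illustrative).
schema = {1: "id", 5: "user_name"}

# An old data file stores values keyed by field ID, plus the names it was written with.
old_file = {
    "written_names": {1: "id", 5: "user_name"},
    "rows": [{1: 42, 5: "Ada Lovelace"}],
}


def rename_column(schema, old_name, new_name):
    """Metadata-only rename: same field ID, new display name."""
    field_id = next(fid for fid, name in schema.items() if name == old_name)
    return {**schema, field_id: new_name}


def read(data_file, schema):
    """Project rows through the current schema, matching columns by field ID."""
    return [{name: row.get(fid) for fid, name in schema.items()}
            for row in data_file["rows"]]


schema = rename_column(schema, "user_name", "full_name")
schema[6] = "email"  # added column: absent in old files, so it reads as None
print(read(old_file, schema))
# [{'id': 42, 'full_name': 'Ada Lovelace', 'email': None}]
```

The old file is never rewritten. The reader ignores the names the file was written with and relies only on the IDs.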

The Specification Roadmap: V1 Through V3 (and What’s Coming in V4)

📅 Apache Iceberg Specification Timeline

V1 (2017–2020) — The Analytic Foundation

Core metadata hierarchy, hidden partitioning, schema evolution, atomic snapshots. Optimized for append-only analytical workloads. The solid ground floor.

V2 (~2022) — Row-Level Mutations

Introduced Merge-on-Read with positional and equality delete files. Enabled efficient row-level updates and deletes without full file rewrites. Made transactional workloads practical.

V3 (Finalized May 2025) — High Performance + New Types

Deletion vectors (via Puffin files) for near-copy-on-write read speed. Row Lineage for CDC pipelines. Variant type for semi-structured data. Geospatial support. Nanosecond timestamps.

V4 (Proposed, 2026) — Metadata Scalability

Single-file commits to reduce I/O for high-frequency writers. Parquet-based metadata files to enable columnar metadata reads. Potentially another order-of-magnitude improvement in planning speed.

The progression from V1 to V3 tells a clear story: Iceberg started as a solution for large-scale analytics (append-heavy, batch-oriented), and has systematically expanded to support transactional, streaming, CDC, and now geospatial and semi-structured workloads, without abandoning the core metadata design that made it reliable in the first place.

The V3 deletion vector feature deserves a specific mention because it addresses one of the real performance tradeoffs of Merge-on-Read. In V2, reading a table with many small delete files required the engine to apply each delete file during the read, which increased latency. V3 replaces those small delete files with highly compressed binary bitmaps stored in Puffin files. These bitmaps are far faster for engines to apply, meaning you get most of the write efficiency of Merge-on-Read with read performance much closer to Copy-on-Write.
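The bitmap idea can be illustrated with a toy deletion vector. This sketch uses a plain Python integer as the bitmap, which is an assumption made for brevity; real deletion vectors use a compressed bitmap encoding stored in Puffin files, and the class and function names here are hypothetical.

```python
class DeletionVector:
    """Toy bitmap of deleted row positions for one data file."""

    def __init__(self):
        self._bits = 0  # Python int used as an arbitrarily long bitmap

    def delete(self, position):
        self._bits |= 1 << position

    def is_deleted(self, position):
        return bool((self._bits >> position) & 1)


def scan(rows, deletion_vector):
    """Yield live rows; the data file itself is never rewritten."""
    for position, row in enumerate(rows):
        if not deletion_vector.is_deleted(position):
            yield row


rows = ["r0", "r1", "r2", "r3", "r4"]
dv = DeletionVector()
dv.delete(1)
dv.delete(3)
print(list(scan(rows, dv)))  # ['r0', 'r2', 'r4']
```

Checking one bit per row position is cheap, which is the intuition behind the read-side improvement over applying many separate delete files.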

Apache Iceberg vs. Delta Lake vs. Apache Hudi: Choosing by Workload, Not Brand

**Data lakehouse ecosystem**

The modern data lakehouse ecosystem has three primary open table formats competing for adoption. All three solve the core problem of ACID transactions on object storage. But they reflect genuinely different design philosophies, and the right choice depends on your workload, your infrastructure, and your governance requirements.

Dimension	Apache Iceberg	Delta Lake	Apache Hudi
Origin & DNA	Netflix — open analytics	Databricks — Spark-unified	Uber — streaming/CDC
Governance	Apache Software Foundation (multi-vendor)	Linux Foundation (Databricks-led)	Apache Software Foundation
Engine Ecosystem	Broadest — Snowflake, BigQuery, Athena, Trino, Spark, Flink, Dremio	Deep Spark/Databricks; cross-format via UniForm	Strong Spark/Flink; cross-format via XTable
Best For	Multi-engine, governance-heavy, open architectures	Spark-first teams on Databricks	High-frequency ingestion, CDC pipelines
Partitioning Model	Hidden partitioning + partition evolution	Liquid Clustering (column-order based)	Coarse partitions + record-level indexes

The honest summary: if your team lives in Databricks and Spark, Delta Lake is the path of least resistance and has excellent tooling. If your primary need is high-frequency upserts from a CDC pipeline, Hudi’s indexing model is genuinely more mature. If you are building a multi-engine, vendor-neutral architecture, or if you need the broadest interoperability across cloud warehouses and open-source engines, Iceberg is the strongest choice.

Importantly, the three formats are converging. Delta’s UniForm feature generates Iceberg metadata on top of Delta tables for cross-format reads. Apache XTable (incubating), which grew out of the Hudi ecosystem, supports multi-directional translation between formats. The practical reality for enterprises in 2026 is that you will likely encounter all three formats and need tooling that can read any of them.

What Iceberg Does Not Solve: Maintenance, Cost, and Governance Tradeoffs

This is the section most Iceberg articles skip, and it’s arguably the most useful one if you’re about to deploy this in production.

Iceberg does not eliminate table stewardship. It changes the nature of it.

With Hive, the maintenance burden was largely reactive: you cleaned up after failures, manually reorganized partitions, and dealt with metastore corruption. With Iceberg, maintenance is more deliberate and predictable, but it still requires consistent attention.

The most common operational issue in Iceberg is small file accumulation. Streaming ingestion pipelines, in particular, can produce thousands of tiny files per hour.

Without regular compaction, query performance degrades over time as the engine is forced to open and read many small files instead of a few large ones. The recommended target is data files in the 100MB to 1GB range. Organizations should build compaction jobs into their table maintenance schedule, not as a one-off, but as a regular automated workflow.
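As a rough illustration of what a compaction planner does, the sketch below groups small files into bins using a first-fit-decreasing heuristic. The 512 MB target is an assumed value inside the range mentioned above, and the function is a simplified stand-in rather than the algorithm any particular engine uses.

```python
def plan_compaction(file_sizes_mb, target_mb=512):
    """Group small files into bins of at most target_mb (first-fit decreasing)."""
    bins = []
    for size in sorted(file_sizes_mb, reverse=True):
        if size >= target_mb:
            continue  # already large enough; leave it alone
        for group in bins:
            if sum(group) + size <= target_mb:
                group.append(size)
                break
        else:
            bins.append([size])
    # A group of one file gains nothing from being rewritten.
    return [group for group in bins if len(group) > 1]


print(plan_compaction([600, 300, 200, 120, 90, 40, 8]))
# [[300, 200, 8], [120, 90, 40]]
```

Each returned group would be rewritten as one larger file, turning six small files into two in this example while leaving the 600 MB file untouched.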

The second issue is snapshot accumulation. Because Iceberg preserves every snapshot, storage costs grow over time. Snapshot expiration, the process of removing old snapshots and the files they reference from active use, needs to be run regularly. This is not a delete operation on your live data; it only removes files that are no longer referenced by any retained snapshot.

Third, orphan file cleanup matters. If a write job crashes before it can register its files in a new snapshot, those files land in storage without being pointed to by any metadata. Over time, orphaned files can represent significant storage waste. Iceberg provides a built-in procedure to identify and remove them.

A practical maintenance schedule for a production Iceberg table might look like this:

Run compaction after every major batch load, or on a daily schedule for streaming tables
Run snapshot expiration on a weekly schedule, retaining 7–14 days of history by default
Run orphan file cleanup on a monthly schedule as storage hygiene
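The schedule above could be driven by a small helper that builds the corresponding statements. This sketch assumes a Spark environment with Iceberg's SQL extensions, where `rewrite_data_files`, `expire_snapshots`, and `remove_orphan_files` are available as stored procedures; the catalog name `my_catalog`, the table `db.orders`, and the helper itself are hypothetical, and the code only builds strings rather than executing them.

```python
from datetime import datetime, timedelta


def maintenance_statements(catalog, table, now, retain_days=7):
    """Build Spark SQL CALL statements for the three routine maintenance tasks."""
    cutoff = (now - timedelta(days=retain_days)).strftime("%Y-%m-%d %H:%M:%S")
    return {
        "compact": f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        "expire": (
            f"CALL {catalog}.system.expire_snapshots("
            f"table => '{table}', older_than => TIMESTAMP '{cutoff}')"
        ),
        "orphans": (
            f"CALL {catalog}.system.remove_orphan_files("
            f"table => '{table}', older_than => TIMESTAMP '{cutoff}')"
        ),
    }


statements = maintenance_statements("my_catalog", "db.orders", datetime(2026, 1, 15))
for name, sql in statements.items():
    print(name, "->", sql)
```

In practice each statement would be passed to the engine by whatever scheduler the team already uses, with the three tasks running at their own cadence.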

The key mental shift: you’re not fighting fires. You’re running a managed lifecycle. That’s a fundamentally better situation than Hive, but it is not zero work.

The REST Catalog Revolution: Why Engine Neutrality Is Now Real

One of the most consequential developments in the Iceberg ecosystem, and one of the least discussed in introductory articles, is the Iceberg REST Catalog specification.

Before the REST spec, every query engine that wanted to talk to an Iceberg catalog had to include a language-specific library for each catalog type. A Trino cluster needed Glue-specific code. A Spark job needed Hive Metastore-specific code. This created a maintenance burden and a coupling between compute engines and catalog implementations that undermined the whole point of an open format.

The REST Catalog specification, introduced in 2022, changes this entirely. It defines a simple, standardized HTTP API that any service can implement to become an Iceberg catalog. Any engine that speaks HTTP, which is all of them, can now talk to any catalog through the same interface. The catalog becomes a neutral, independently scalable service rather than a library bundled into each processing engine.
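As a rough illustration of how little an engine needs in order to talk to such a catalog, the sketch below builds (but does not send) a "load table" request using only the Python standard library. The route shape follows my reading of the REST Catalog OpenAPI definition, where multi-level namespaces are joined with the 0x1F unit separator; the host name, token, and `load_table_request` helper are hypothetical.

```python
import urllib.request
from urllib.parse import quote


def load_table_request(base_url, namespace, table, token, prefix=""):
    """Build a GET request for a table's metadata from a REST catalog.

    namespace: tuple of namespace levels, e.g. ("analytics", "events").
    prefix: optional catalog-assigned path prefix (assumed empty here).
    """
    ns = quote("\x1f".join(namespace), safe="")
    parts = ["v1"]
    if prefix:
        parts.append(quote(prefix, safe=""))
    parts += ["namespaces", ns, "tables", quote(table, safe="")]
    url = base_url.rstrip("/") + "/" + "/".join(parts)
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
        method="GET",
    )


# Hypothetical endpoint and token; nothing is sent over the network here.
req = load_table_request(
    "https://catalog.example.com", ("analytics", "events"), "clicks", token="dummy-token"
)
print(req.get_method(), req.full_url)
```

Because the request is plain HTTP plus JSON, the same call can come from Spark, Trino, Flink, or a ten-line script, which is the whole point of the specification.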

This is what makes platforms like Apache Polaris and Databricks Unity Catalog meaningful as governance layers. They implement the REST Catalog spec and can manage Iceberg tables across any cloud, any storage backend, and any engine. A table registered in Polaris can be queried from Snowflake, Spark, Trino, and Flink using the same credentials and the same endpoint, with consistent permissions enforced at the catalog level.

The credential vending feature built on top of this is equally important for enterprise security. Rather than granting every engine a broad IAM role with access to an entire S3 bucket, the REST Catalog vends temporary, scoped access tokens for the specific files required for each query. Access control is enforced centrally, consistently, and in a way that’s decoupled from the compute layer.

As the AWS documentation on S3 authorization models notes, short-lived, scoped credentials are the recommended approach for secure object storage access. Iceberg’s credential vending model aligns directly with that recommendation.

Common Mistakes Teams Make When Adopting Iceberg

Even with all of Iceberg’s design advantages, there are a handful of recurring mistakes that engineering teams make during adoption.

Skipping compaction

The first and most common is skipping compaction. Teams migrate to Iceberg, enjoy the improved reliability and schema safety, and then wonder why query performance degrades over the following months. The culprit is almost always small file accumulation from streaming ingestion or frequent small batch loads. Build compaction into your pipeline from day one, not as an afterthought.

Over-partitioning

The second mistake is over-partitioning. Coming from Hive, many teams instinctively create highly granular partition schemes, partitioning by hour or by minute, because fine-grained partitioning was the primary performance lever in the old model. In Iceberg, the metadata’s column-level statistics handle a lot of that pruning work automatically.

Over-partitioning leads to enormous numbers of manifest entries and actually degrades planning performance. Start with coarser partitioning (daily or monthly) and let the statistics do the rest.
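A small model shows why coarse partitions plus per-file statistics can replace hourly partitioning. It assumes a daily-partitioned table whose files each carry min/max bounds for an `event_time` column, and that rows are roughly clustered by time within files; `FileStats` and `prune` are illustrative names, not Iceberg classes.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileStats:
    path: str
    partition_day: str         # coarse daily partition value
    min_event_time: datetime   # lower bound recorded for event_time
    max_event_time: datetime   # upper bound recorded for event_time


def prune(files, day, start, end):
    """Keep files in partition `day` whose [min, max] range overlaps [start, end)."""
    return [
        f.path
        for f in files
        if f.partition_day == day
        and f.max_event_time >= start
        and f.min_event_time < end
    ]


# One daily partition holding 24 files, each covering a single hour of data.
files = [
    FileStats(
        path=f"day=2026-01-01/hour-{h:02d}.parquet",
        partition_day="2026-01-01",
        min_event_time=datetime(2026, 1, 1, h, 0, 0),
        max_event_time=datetime(2026, 1, 1, h, 59, 59),
    )
    for h in range(24)
]
hits = prune(files, "2026-01-01", datetime(2026, 1, 1, 9), datetime(2026, 1, 1, 10))
print(hits)  # ['day=2026-01-01/hour-09.parquet']
```

The query for a single hour reads one file out of 24 even though the table has no hourly partition. The caveat is that bounds only prune well when data is sorted or clustered on the filtered column.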

Treating the catalog as an afterthought

The third mistake is treating the catalog as an afterthought. Which catalog implementation you choose, AWS Glue, Hive Metastore, Nessie, Polaris, or a REST-compatible service, has significant implications for performance, access control, and cross-engine compatibility. Teams that start with a simple file-based catalog and try to migrate to a service-based catalog later often find the process harder than they expected. Plan your catalog strategy early.

Not setting snapshot retention policies

The fourth mistake is not setting snapshot retention policies. Running without snapshot expiration is like never clearing your email trash folder. Eventually the accumulated metadata becomes a problem. Set retention policies on day one and automate the cleanup jobs.
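One way to set retention on day one is through table properties. The sketch below renders an `ALTER TABLE` statement for the two properties that, to my knowledge, Iceberg's expiration actions read as defaults (`history.expire.max-snapshot-age-ms` and `history.expire.min-snapshots-to-keep`). The table name and helper functions are hypothetical, and the seven-day window is just one choice from the range suggested earlier; verify the property names against your Iceberg version before relying on them.

```python
from datetime import timedelta


def retention_properties(max_age, min_snapshots_to_keep):
    """Translate a human-friendly retention policy into string table properties."""
    return {
        "history.expire.max-snapshot-age-ms": str(int(max_age.total_seconds() * 1000)),
        "history.expire.min-snapshots-to-keep": str(min_snapshots_to_keep),
    }


def as_alter_table_sql(table, props):
    """Render the properties as a Spark-style ALTER TABLE statement."""
    assignments = ", ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({assignments})"


props = retention_properties(timedelta(days=7), min_snapshots_to_keep=5)
sql = as_alter_table_sql("lake.analytics.events", props)  # hypothetical table name
print(sql)
```

Setting the properties does not delete anything by itself; a scheduled expiration job still has to run and pick them up.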

Finally, many teams conflate Iceberg with the lakehouse architecture broadly. Iceberg is the table format layer. It does not provide query optimization, caching, data virtualization, or data quality monitoring on its own. Those capabilities come from the engines and platforms built on top of Iceberg. Understanding the boundary between what Iceberg does and what the surrounding stack does helps teams set realistic expectations and make better architectural decisions.

Where Iceberg Fits: Batch, Streaming, CDC, and AI Pipelines

One of the strongest arguments for Iceberg as a strategic choice, rather than just a technical one, is that it can serve as the single table format for batch, streaming, and incremental workloads at the same time.

Batch analytics has always been Iceberg’s home turf. Large Spark jobs reading and writing Parquet files against Iceberg tables benefit from fast query planning, safe schema evolution, and efficient partition pruning.

Figure: Apache Iceberg as a strategic single table format

Streaming ingestion via Apache Flink works well with Iceberg’s dynamic sink, which can automatically handle schema evolution from an incoming data stream and write data in micro-batches while preserving transactional integrity. The V3 row lineage feature makes Flink-to-Iceberg pipelines even more capable for incremental processing.

Change Data Capture pipelines, where operational database changes are continuously replicated into the lake, are a natural fit for Iceberg’s row-level mutation support, especially with the V3 deletion vectors improving the read performance of frequently updated tables.

And increasingly, Iceberg is appearing in AI and ML data pipelines. The combination of time travel (for reproducible training datasets), schema safety (for feature store management), and multi-engine access (for the diverse tooling used in ML workflows) makes it a strong foundation for governed AI data infrastructure.

According to O’Reilly’s published coverage of Iceberg deployment patterns, BI reporting, data quality workflows, and CDC represent the three most common production use cases, which aligns well with the broad workload coverage the format was designed to support.

The 2026 Horizon: What’s Coming Next

The Iceberg community is actively working on two major capabilities for the near term.

The first is server-side scan planning. Currently, when a query engine plans a query against an Iceberg table, it downloads the manifest files and performs the file-level pruning itself. A proposed addition to the REST Catalog specification would allow the engine to delegate this work to the catalog service.

The catalog, which can maintain in-memory caches of manifest data, returns a pre-pruned list of data files directly. This reduces metadata transfer over the network and can dramatically accelerate planning on tables with millions of files.

The second is interoperable views. Today, a SQL view created in Spark may not be readable by Trino because the SQL dialects differ. The community is developing a standard for storing view definitions in an engine-agnostic intermediate format, using frameworks like SQLGlot for dialect translation.

Similarly, work is progressing on a common metadata format for materialized views, allowing one engine to maintain a pre-computed result and another engine to query it, with the metadata layer tracking freshness relative to the base tables.

Both of these developments push Iceberg further toward its ultimate vision: a complete, engine-neutral control plane for analytical data that makes the storage layer truly invisible to the tools built on top of it.

Conclusion: A New Operating Model, Not Just a Better File Format

Apache Iceberg’s journey from a Netflix engineering project in 2017 to the foundational standard of the modern data lakehouse is not a story about files or formats. It is a story about what happens when you design a system around the right abstraction.

The old abstraction was the directory. It was simple, universally understood, and completely inadequate for the demands of multi-engine, multi-cloud, transactional analytical workloads at scale.

The new abstraction is the snapshot, a complete, immutable, metadata-rich description of a table’s state at a point in time. That abstraction gives you transactions, time travel, schema safety, hidden partitioning, multi-engine access, and a governance model that scales from a single team to an entire enterprise.

The shift Iceberg enables is from managing files to managing states. And once you make that shift, the full set of capabilities becomes available: safe concurrent writes, instant rollback, reproducible queries, evolving schemas, and a table that any engine in your stack can read with equal confidence.

For teams evaluating this decision today, it is worth noting that adopting Iceberg is no longer a cutting-edge choice. It is the industry-standard choice. Snowflake, BigQuery, AWS Athena, Trino, Spark, Flink, and Dremio all support it natively. The REST Catalog specification has made cross-engine interoperability practical. The V3 specification has extended the format’s reach into streaming, CDC, and semi-structured data.

The question is no longer whether Iceberg is ready for production. The question is whether your organization is ready to stop treating your data lake as a collection of files and start treating it as something more valuable: a structured, governed, versioned system that any authorized tool can interact with safely, today and five years from now.

That is the real promise of Apache Iceberg. And based on where the ecosystem is heading, it is a promise that is being kept.

❓ FAQ: Apache Iceberg — Frequently Asked Questions

What is Apache Iceberg in simple terms?

Apache Iceberg is an open table format for large analytical datasets. It adds a structured metadata layer between your data files (stored in S3, GCS, or HDFS) and the query engines that read them (like Spark, Trino, or Snowflake). This metadata layer enables ACID transactions, schema evolution, time travel, and multi-engine access — making your data lake behave more like a reliable database.

Why is Apache Iceberg better than Apache Hive for modern workloads?

Hive tracks data via directory structures, which becomes slow and inconsistent at scale on cloud object storage. Iceberg replaces directory listing with a hierarchical metadata tree that enables fast query planning (10–50x faster in many cases), atomic commits that prevent partial write failures, safe schema evolution without data rewrites, and hidden partitioning that removes the need for users to know the physical layout of tables.

What is time travel in Apache Iceberg?

Time travel in Iceberg allows you to query the historical state of a table at any past timestamp or snapshot ID. Because Iceberg writes a new, immutable metadata file with every commit and preserves the full snapshot history, you can access previous versions of the data using SQL syntax. This is useful for debugging pipeline failures, auditing data changes, and reproducing training datasets for machine learning models.
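Under the hood, a timestamp-based time travel query boils down to finding the last snapshot committed at or before the requested time. Here is a minimal sketch of that lookup, assuming the snapshot log is available as a sorted list of `(committed_at, snapshot_id)` pairs; `snapshot_as_of` and the snapshot IDs are illustrative. Engines expose the same idea through SQL, for example Spark SQL's `TIMESTAMP AS OF` and `VERSION AS OF` clauses.

```python
from bisect import bisect_right
from datetime import datetime, timezone


def snapshot_as_of(history, ts):
    """Return the snapshot id that was current at `ts`, or None if the table did not exist yet.

    history: list of (committed_at, snapshot_id) sorted by committed_at ascending.
    """
    times = [committed_at for committed_at, _ in history]
    i = bisect_right(times, ts)  # number of snapshots committed at or before ts
    return history[i - 1][1] if i else None


def utc(year, month, day):
    return datetime(year, month, day, tzinfo=timezone.utc)


log = [(utc(2026, 1, 1), 101), (utc(2026, 1, 5), 102), (utc(2026, 1, 14), 103)]
print(snapshot_as_of(log, utc(2026, 1, 10)))   # 102
print(snapshot_as_of(log, utc(2025, 12, 31)))  # None
```

Once the snapshot ID is resolved, the engine reads exactly the files that snapshot references, which is why results are reproducible as long as the snapshot has not been expired.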

How does Apache Iceberg compare to Delta Lake?

Both formats provide ACID transactions and schema evolution on object storage. Delta Lake is optimized for Spark and Databricks environments and uses a sequential transaction log. Iceberg uses a snapshot-driven metadata tree and was designed for engine neutrality from the beginning — it has broader native support across cloud warehouses (Snowflake, BigQuery, Athena) and open-source engines. If you’re deeply invested in Databricks and Spark, Delta is excellent. For multi-engine, vendor-neutral architectures, Iceberg is typically the stronger choice.

Does Apache Iceberg require maintenance?

Yes. Iceberg reduces the reactive, ad hoc maintenance common with Hive, but it introduces a deliberate table lifecycle management discipline. You should regularly run compaction to prevent small file accumulation, snapshot expiration to control storage growth, and orphan file cleanup to remove unreferenced files from failed writes. These are predictable, automatable workflows — but skipping them will degrade performance over time.

What is the Iceberg REST Catalog and why does it matter?

The REST Catalog specification defines a standardized HTTP API for catalog interaction. It means any query engine that speaks HTTP can talk to any Iceberg catalog — without engine-specific libraries. This enables true multi-engine access to the same tables through a neutral catalog service (like Apache Polaris or Unity Catalog), with consistent access control, credential vending, and governance regardless of which tool is running the query.

What’s new in Apache Iceberg V3?

Finalized in May 2025, the V3 specification introduced deletion vectors (stored in Puffin files) for near-copy-on-write read performance even on frequently updated tables, Row Lineage for efficient CDC pipeline support, a Variant type for semi-structured JSON-like data, native geospatial types (Geometry and Geography), and nanosecond timestamp precision for scientific and financial workloads.

Is Apache Iceberg suitable for AI and machine learning pipelines?

Yes. Iceberg’s combination of time travel (for reproducible training datasets), schema safety (for feature store management), and multi-engine access (for the diverse tooling used in ML workflows) makes it well-suited for governed AI data infrastructure. As AI workloads demand larger, more carefully curated datasets, the auditability and versioning capabilities of Iceberg become increasingly important to ensure model reproducibility and compliance.
