16 March 2026

#Apache Iceberg

Key Concepts


S.No | Topic | Sub-Topics
1 | Apache Iceberg | Data Lake vs Data Warehouse, Table Format Concept, Iceberg Architecture Overview, Iceberg vs Hive Tables
2 | Iceberg Ecosystem | Iceberg with Apache Spark, Apache Flink, Trino/Presto, Hive, Databricks
4 | Iceberg Table Concepts | Table Metadata, Data Files, Manifest Files, Manifest Lists, Snapshot Concept
5 | Catalogs in Iceberg | Hadoop Catalog, Hive Metastore Catalog, REST Catalog, Glue Catalog, Catalog Configuration
6 | Creating Iceberg Tables | SQL Table Creation, Spark API Table Creation, Table Properties, Table Location, Schema Definition
7 | Schema Evolution | Add Columns, Rename Columns, Drop Columns, Column Type Promotion, Backward Compatibility
8 | Partitioning | Hidden Partitioning, Partition Transforms, Partition Spec Evolution, Partition Pruning, Best Practices
9 | Data Operations | Insert Data, Overwrite Data, Append Mode, Batch Writes, Streaming Writes
10 | Querying Iceberg Tables | SQL Queries, Spark SQL Queries, Filtering Data, Aggregation Queries, Query Optimization
11 | Time Travel | Snapshot IDs, Querying Past Data, Snapshot Retention, Use Cases, SQL Examples
12 | Snapshots | Snapshot Creation, Snapshot Metadata, Snapshot Expiration, Snapshot Rollback, Snapshot History
13 | Data Versioning | Versioned Tables, Commit Operations, Metadata Tracking, Branching Concepts, Tagging Data Versions
14 | ACID Transactions | Atomic Operations, Consistency Guarantees, Isolation Levels, Concurrent Writes, Failure Recovery
15 | File Formats | Parquet Support, ORC Support, Avro Support, Compression Options, File Size Optimization
16 | Manifest Files | Manifest Structure, Metadata Tracking, File Listings, Performance Benefits, Manifest Merging
17 | Metadata Management | Metadata Files, Metadata Evolution, Metadata Size Optimization, Metadata Cleanup, Metadata Storage
18 | Data Compaction | Small File Problem, Compaction Strategies, Rewrite Data Files, Optimize Table, Scheduling Compaction
19 | Delete Operations | Equality Deletes, Position Deletes, Delete Files, Row-Level Deletes, Delete Performance
20 | Update Operations | Row Updates, Update Strategies, Merge Statements, Update Performance, Handling Conflicts
21 | Merge Operations | MERGE INTO Syntax, Upserts, CDC Processing, Incremental Updates, Merge Optimization
22 | Streaming Integration | Spark Structured Streaming, Flink Streaming, Kafka Integration, Streaming Writes, Streaming Reads
23 | Performance Optimization | Partition Design, File Size Tuning, Metadata Optimization, Query Planning, Caching Strategies
24 | Security | Access Control, Authentication, Authorization, Data Encryption, Governance Policies
25 | Monitoring | Query Monitoring, Table Metrics, Logging, Alerts Setup, Performance Tracking
26 | Iceberg Maintenance | Expire Snapshots, Remove Orphan Files, Rewrite Manifests, Table Optimization, Maintenance Scheduling
27 | Integration with Data Lakes | AWS S3 Storage, Azure Data Lake Storage, Google Cloud Storage, HDFS Storage, Hybrid Storage
28 | Comparison | Iceberg vs Delta Lake, Iceberg vs Apache Hudi, Performance Comparison, Feature Comparison, Use Case Differences
29 | Production Best Practices | Data Layout Design, Partition Strategies, Governance, Performance Monitoring, Disaster Recovery
30 | Real World Use Cases | Data Lakehouse Architecture, Incremental Data Pipelines, CDC Data Processing, Data Warehousing, Machine Learning Data Pipelines

Interview Questions

    BASIC

  • What is Apache Iceberg?
  • Why was Apache Iceberg created?
  • What problems does Iceberg solve in data lakes?
  • What is a table format?
  • What is the difference between Iceberg and Hive tables?
  • What are Iceberg snapshots?
  • What is schema evolution?
  • What is hidden partitioning?
  • Which engines support Iceberg?
  • What file formats are supported by Iceberg?
  • What is a catalog in Iceberg?
  • What is the Hadoop catalog?
  • What is the Hive Metastore catalog?
  • How do you create an Iceberg table?
  • What is an append operation?
  • What is an overwrite operation?
  • What is time travel in Iceberg?
  • What is partition pruning?
  • What metadata does Iceberg maintain?
  • What is ACID compliance?
  • What is snapshot isolation?
  • How does Iceberg handle large tables?
  • What is a manifest file?
  • What is a manifest list?
  • What is a table metadata file?

    INTERMEDIATE

  • Explain Iceberg table architecture.
  • How does Iceberg enable schema evolution safely?
  • What is the difference between a partition spec and a partition field?
  • How does Iceberg support concurrent writes?
  • Explain snapshot lifecycle.
  • How does rollback work in Iceberg?
  • What is metadata versioning?
  • Explain equality deletes.
  • Explain position deletes.
  • How does Iceberg manage small files?
  • What is compaction in Iceberg?
  • How does Iceberg optimize query performance?
  • Explain partition transforms.
  • How does Iceberg support streaming workloads?
  • What is snapshot expiration?
  • Explain orphan file removal.
  • What is metadata cleanup?
  • Explain Iceberg commit process.
  • How does Iceberg guarantee atomic commits?
  • Explain Iceberg table properties.
  • How do you query historical data?
  • What are branches and tags in Iceberg?
  • How does Iceberg integrate with Spark?
  • How does Iceberg integrate with Flink?
  • Explain the Iceberg REST catalog.

    ADVANCED

  • Explain the Iceberg metadata tree structure.
  • How does Iceberg avoid partition explosion?
  • Explain write amplification in Iceberg.
  • How are manifests merged?
  • Explain scan planning in Iceberg.
  • What is vectorized reading in Iceberg?
  • How does Iceberg handle schema ID tracking?
  • Explain optimistic concurrency control.
  • How does Iceberg manage multi-engine consistency?
  • Explain incremental reads.
  • How does Iceberg support CDC pipelines?
  • Explain MERGE INTO implementation.
  • How does Iceberg handle updates internally?
  • Explain delete file application during reads.
  • How does Iceberg reduce metadata bottlenecks?
  • Explain file pruning mechanisms.
  • How are statistics stored in Iceberg?
  • Explain snapshot lineage.
  • How does Iceberg support branching workflows?
  • Explain Iceberg table rewrite operations.
  • How does Iceberg interact with object storage?
  • Explain commit retries and conflict resolution.
  • How does Iceberg enable reproducible analytics?
  • Explain metadata caching strategies.
  • How does Iceberg scale to petabyte datasets?

    EXPERT

  • Design a lakehouse architecture using Iceberg.
  • Explain Iceberg internals during query planning.
  • How would you tune Iceberg for low-latency analytics?
  • Explain a metadata file compaction strategy.
  • How does Iceberg compare with Delta Lake internals?
  • How does Iceberg compare with Apache Hudi architecture?
  • Design a CDC ingestion pipeline using Iceberg and Kafka.
  • How do you implement multi-tenant Iceberg tables?
  • Explain Iceberg failure recovery mechanisms.
  • How do you manage schema governance at scale?
  • Explain Iceberg security integration patterns.
  • How would you optimize cost in cloud object storage?
  • Design a real-time analytics system using Iceberg.
  • Explain snapshot branching for experimentation.
  • How do you debug corrupted Iceberg metadata?
  • Explain an upgrade compatibility strategy for Iceberg.
  • How would you implement data governance using Iceberg?
  • Explain cross-engine transaction guarantees.
  • How do you monitor Iceberg tables in production?
  • Design a petabyte-scale Iceberg deployment.
  • Explain a methodology for benchmarking Iceberg performance.
  • How do you handle massive small-file ingestion pipelines?
  • Explain a disaster recovery strategy for Iceberg tables.
  • How would you implement data version auditing?
  • Explain the future roadmap and emerging features of Apache Iceberg.