Prime_Questions: #Apache Iceberg

Key Concepts

S.No	Topic	Sub-Topics
1	Apache Iceberg	Data Lake vs Data Warehouse, Table Format Concept, Iceberg Architecture Overview, Iceberg vs Hive Tables
2	Iceberg Ecosystem	Iceberg with Apache Spark, Apache Flink, Trino/Presto, Hive, Databricks
4	Iceberg Table Concepts	Table Metadata, Data Files, Manifest Files, Manifest Lists, Snapshot Concept
5	Catalogs in Iceberg	Hadoop Catalog, Hive Metastore Catalog, REST Catalog, Glue Catalog, Catalog Configuration
6	Creating Iceberg Tables	SQL Table Creation, Spark API Table Creation, Table Properties, Table Location, Schema Definition
7	Schema Evolution	Add Columns, Rename Columns, Delete Columns, Change Column Types, Backward Compatibility
8	Partitioning	Hidden Partitioning, Partition Transforms, Partition Spec Evolution, Partition Pruning, Best Practices
9	Data Operations	Insert Data, Overwrite Data, Append Mode, Batch Writes, Streaming Writes
10	Querying Iceberg Tables	SQL Queries, Spark SQL Queries, Filtering Data, Aggregation Queries, Query Optimization
11	Time Travel	Snapshot IDs, Querying Past Data, Snapshot Retention, Use Cases, SQL Examples
12	Snapshots	Snapshot Creation, Snapshot Metadata, Snapshot Expiration, Snapshot Rollback, Snapshot History
13	Data Versioning	Versioned Tables, Commit Operations, Metadata Tracking, Branching Concepts, Tagging Data Versions
14	ACID Transactions	Atomic Operations, Consistency Guarantees, Isolation Levels, Concurrent Writes, Failure Recovery
15	File Formats	Parquet Support, ORC Support, Avro Support, Compression Options, File Size Optimization
16	Manifest Files	Manifest Structure, Metadata Tracking, File Listings, Performance Benefits, Manifest Merging
17	Metadata Management	Metadata Files, Metadata Evolution, Metadata Size Optimization, Metadata Cleanup, Metadata Storage
18	Data Compaction	Small File Problem, Compaction Strategies, Rewrite Data Files, Optimize Table, Scheduling Compaction
19	Delete Operations	Equality Deletes, Position Deletes, Delete Files, Row-Level Deletes, Delete Performance
20	Update Operations	Row Updates, Update Strategies, Merge Statements, Update Performance, Handling Conflicts
21	Merge Operations	MERGE INTO Syntax, Upserts, CDC Processing, Incremental Updates, Merge Optimization
22	Streaming Integration	Spark Structured Streaming, Flink Streaming, Kafka Integration, Streaming Writes, Streaming Reads
23	Performance Optimization	Partition Design, File Size Tuning, Metadata Optimization, Query Planning, Caching Strategies
24	Security	Access Control, Authentication, Authorization, Data Encryption, Governance Policies
25	Monitoring	Query Monitoring, Table Metrics, Logging, Alerts Setup, Performance Tracking
26	Iceberg Maintenance	Expire Snapshots, Remove Orphan Files, Rewrite Manifests, Table Optimization, Maintenance Scheduling
27	Integration with Data Lakes	AWS S3 Storage, Azure Data Lake Storage, Google Cloud Storage, HDFS Storage, Hybrid Storage
28	Comparison	Iceberg vs Delta Lake, Iceberg vs Apache Hudi, Performance Comparison, Feature Comparison, Use Case Differences
29	Production Best Practices	Data Layout Design, Partition Strategies, Governance, Performance Monitoring, Disaster Recovery
30	Real World Use Cases	Data Lakehouse Architecture, Incremental Data Pipelines, CDC Data Processing, Data Warehousing, Machine Learning Data Pipelines
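Row 8 above lists hidden partitioning and partition transforms. As a rough illustration only, the sketch below mimics a day-style transform in plain Python: the partition value is derived from a timestamp column, so writers and readers never handle a separate partition column. The names day_transform and partition_rows are hypothetical helpers, not part of any Iceberg API, and timestamps are assumed to be timezone-aware.

```python
from collections import defaultdict
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Return the number of days between 1970-01-01 and the UTC date of ts."""
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


def partition_rows(rows, ts_field="event_time"):
    """Group rows by the derived day value (hypothetical helper).

    The partition value is computed from the timestamp column, which is the
    core idea behind hidden partitioning: no extra column is stored in rows.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[day_transform(row[ts_field])].append(row)
    return dict(groups)


rows = [
    {"id": 1, "event_time": datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)},
    {"id": 2, "event_time": datetime(2024, 1, 1, 23, 59, tzinfo=timezone.utc)},
    {"id": 3, "event_time": datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc)},
]
partitions = partition_rows(rows)
```

A filter on event_time can then be mapped to a range of day values, which is what lets an engine prune partitions without the user referencing them.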

Interview Questions

BASIC

What is Apache Iceberg?
Why was Apache Iceberg created?
What problems does Iceberg solve in data lakes?
What is a table format?
What is the difference between Iceberg and Hive tables?
What are Iceberg snapshots?
What is schema evolution?
What is hidden partitioning?
Which engines support Iceberg?
What file formats are supported by Iceberg?
What is a catalog in Iceberg?
What is the Hadoop catalog?
What is the Hive Metastore catalog?
How do you create an Iceberg table?
What is an append operation?
What is an overwrite operation?
What is time travel in Iceberg?
What is partition pruning?
What metadata does Iceberg maintain?
What is ACID compliance?
What is snapshot isolation?
How does Iceberg handle large tables?
What is a manifest file?
What is a manifest list?
What is a table metadata file?
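Several of the questions above (snapshots, manifest files, manifest lists, table metadata, time travel) refer to the same hierarchy. The following is a toy model of that hierarchy, not Iceberg's real file layout or API: all class and method names are assumptions made for illustration, and the manifest list is simplified to a tuple of manifests held by each snapshot.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str
    record_count: int


@dataclass(frozen=True)
class Manifest:
    # A manifest tracks a group of data files.
    data_files: Tuple[DataFile, ...]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    # Stands in for the manifest list: the manifests visible in this snapshot.
    manifests: Tuple[Manifest, ...]


@dataclass
class TableMetadata:
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None

    def append(self, files: List[DataFile]) -> int:
        """Create a new snapshot that reuses old manifests and adds one."""
        parent = self.snapshots.get(self.current_snapshot_id)
        inherited = parent.manifests if parent else ()
        new_id = len(self.snapshots) + 1
        self.snapshots[new_id] = Snapshot(
            new_id, self.current_snapshot_id, inherited + (Manifest(tuple(files)),)
        )
        self.current_snapshot_id = new_id
        return new_id

    def scan(self, snapshot_id: Optional[int] = None) -> List[str]:
        """List data file paths for the current or a past snapshot."""
        sid = self.current_snapshot_id if snapshot_id is None else snapshot_id
        if sid is None:
            return []
        return [f.path for m in self.snapshots[sid].manifests for f in m.data_files]


table = TableMetadata()
first = table.append([DataFile("a.parquet", 10)])
second = table.append([DataFile("b.parquet", 5)])
```

Calling table.scan(first) reads the table as of the first snapshot, which is the idea behind time travel: old snapshots stay readable because an append adds new metadata rather than rewriting existing files.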

INTERMEDIATE

Explain Iceberg table architecture.
How does Iceberg enable schema evolution safely?
What is the difference between a partition spec and a partition field?
How does Iceberg support concurrent writes?
Explain the snapshot lifecycle.
How does rollback work in Iceberg?
What is metadata versioning?
Explain equality deletes.
Explain position deletes.
How does Iceberg manage small files?
What is compaction in Iceberg?
How does Iceberg optimize query performance?
Explain partition transforms.
How does Iceberg support streaming workloads?
What is snapshot expiration?
Explain orphan file removal.
What is metadata cleanup?
Explain the Iceberg commit process.
How does Iceberg guarantee atomic commits?
Explain Iceberg table properties.
How do you query historical data?
What are branches and tags in Iceberg?
How does Iceberg integrate with Spark?
How does Iceberg integrate with Flink?
Explain the Iceberg REST catalog.
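The questions on concurrent writes, the commit process, and atomic commits all come down to one idea: a writer prepares new metadata against a base version, then asks the catalog to swap the table pointer only if the base is still current. The sketch below shows that pattern in plain Python under stated assumptions: ToyCatalog, compare_and_swap, commit_with_retries, and the metadata file names are hypothetical, and an in-process lock stands in for whatever atomic operation a real catalog provides.

```python
import threading


class CommitConflict(Exception):
    """Raised when a commit cannot be applied after all retries."""


class ToyCatalog:
    """Holds one pointer to the current metadata file (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pointer = "v0.metadata.json"

    def current(self) -> str:
        return self._pointer

    def compare_and_swap(self, expected: str, new: str) -> bool:
        # Swap only if no other writer has committed since `expected` was read.
        with self._lock:
            if self._pointer != expected:
                return False
            self._pointer = new
            return True


def next_version(base: str) -> str:
    """Derive the next metadata file name, e.g. v1 -> v2 (naming is assumed)."""
    number = int(base[1:].split(".")[0])
    return f"v{number + 1}.metadata.json"


def commit_with_retries(catalog, make_new_metadata, max_attempts=3) -> str:
    """Optimistic commit: re-read the base and retry when the swap fails."""
    for _ in range(max_attempts):
        base = catalog.current()
        candidate = make_new_metadata(base)
        if catalog.compare_and_swap(base, candidate):
            return candidate
    raise CommitConflict(f"gave up after {max_attempts} attempts")


catalog = ToyCatalog()
committed = commit_with_retries(catalog, next_version)
```

A real writer would also check on retry whether the concurrent change conflicts with its own (for example, both rewriting the same files); this sketch skips that validation and simply rebuilds on the latest base.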

ADVANCED

Explain the Iceberg metadata tree structure.
How does Iceberg avoid partition explosion?
Explain write amplification in Iceberg.
How are manifests merged?
Explain scan planning in Iceberg.
What is vectorized reading in Iceberg?
How does Iceberg handle schema ID tracking?
Explain optimistic concurrency control.
How does Iceberg manage multi-engine consistency?
Explain incremental reads.
How does Iceberg support CDC pipelines?
Explain the MERGE INTO implementation.
How does Iceberg handle updates internally?
Explain delete file application during reads.
How does Iceberg reduce metadata bottlenecks?
Explain file pruning mechanisms.
How are statistics stored in Iceberg?
Explain snapshot lineage.
How does Iceberg support branching workflows?
Explain Iceberg table rewrite operations.
How does Iceberg interact with object storage?
Explain commit retries and conflict resolution.
How does Iceberg enable reproducible analytics?
Explain metadata caching strategies.
How does Iceberg scale to petabyte datasets?
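For the questions on scan planning, file pruning, and statistics, a small example can make the mechanism concrete. The sketch below assumes each data file carries lower and upper bounds for one integer column and skips any file whose range cannot overlap the query range. FileStats and prune_files are hypothetical names for illustration; real planning works on manifest entries and supports many more predicate types.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    path: str
    lower: int  # smallest value of the column in this file
    upper: int  # largest value of the column in this file


def prune_files(files: List[FileStats], query_lo: int, query_hi: int) -> List[str]:
    """Keep files whose [lower, upper] range may overlap [query_lo, query_hi].

    A file is skipped only when its bounds prove that no row can match, so
    pruning never drops matching rows; it may still keep files without matches.
    """
    return [
        f.path
        for f in files
        if f.upper >= query_lo and f.lower <= query_hi
    ]


files = [
    FileStats("f1.parquet", 1, 10),
    FileStats("f2.parquet", 11, 20),
    FileStats("f3.parquet", 21, 30),
]
to_read = prune_files(files, 12, 22)
```

Here a query for values between 12 and 22 reads only f2.parquet and f3.parquet. Pruning of this kind works best when data is sorted or clustered on the filtered column, since overlapping ranges across files leave little to skip.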

EXPERT

Design a lakehouse architecture using Iceberg.
Explain Iceberg internals during query planning.
How would you tune Iceberg for low-latency analytics?
Explain metadata file compaction strategy.
How does Iceberg compare with Delta Lake internals?
How does Iceberg compare with Apache Hudi architecture?
Design CDC ingestion using Iceberg + Kafka.
How do you implement multi-tenant Iceberg tables?
Explain Iceberg failure recovery mechanisms.
How do you manage schema governance at scale?
Explain Iceberg security integration patterns.
How would you optimize cost in cloud object storage?
Design real-time analytics using Iceberg.
Explain snapshot branching for experimentation.
How do you debug corrupted Iceberg metadata?
Explain Iceberg upgrade compatibility strategy.
How would you implement data governance using Iceberg?
Explain cross-engine transaction guarantees.
How do you monitor Iceberg tables in production?
Design a petabyte-scale Iceberg deployment.
Explain Iceberg performance benchmarking methodology.
How do you handle massive small-file ingestion pipelines?
Explain disaster recovery strategy for Iceberg tables.
How would you implement data version auditing?
Explain future roadmap and emerging features of Apache Iceberg.

16 March 2026