Prime_Questions: #Spark Streaming

Key Concepts

S.No	Topic	Sub-Topics
1	Spark Streaming	What is Spark Streaming, Real-time data, Micro-batch processing, Advantages, Use cases
2	Spark Streaming Architecture	Driver, Receiver, DStream, Scheduler, Executors
3	DStream Basics	Definition, Creation, Operations, RDDs, Transformations
4	Creating DStreams	From sources: Kafka, Flume, TCP sockets, File streams, Custom receivers
5	Transformations on DStreams	map(), flatMap(), filter(), reduceByKey(), window()
6	Window Operations	window(), slideDuration, reduceByKeyAndWindow(), groupByKeyAndWindow(), Examples
7	Stateful Transformations	updateStateByKey(), mapWithState(), Example, Use cases, Performance
8	Actions on DStreams	print(), count(), saveAsTextFiles(), foreachRDD(), Examples
9	Data Sources Integration	Kafka, Flume, HDFS, Socket, Custom sources
10	Sinks / Output Operations	print(), saveAsTextFiles(), saveAsObjectFiles(), foreachRDD(), write to DB
11	Checkpointing	Definition, Directory setup, Purpose, Examples, Fault tolerance
12	Receiver Types	Reliable receiver, Unreliable receiver, Custom receiver, Receiver lifecycle, Examples
13	Transformations: map vs flatMap	map(), flatMap(), Use cases, Examples, Differences
14	Transformations: reduceByKey	reduceByKey(), reduceByKeyAndWindow(), Examples, Use cases, Performance
15	Transformations: join in streaming	join(), leftOuterJoin(), rightOuterJoin(), fullOuterJoin(), Example
16	Transformations: union & transform	union(), transform(), Example, Use cases, Combining multiple streams
17	Handling Late Data	Watermarks (Structured Streaming), Window operations, State management, withWatermark(), Examples
18	Kafka Integration	DirectStream vs ReceiverStream, Kafka parameters, Offset management, Example, Best practices
19	Flume Integration	Spark Streaming + Flume, Push vs Pull, Receiver setup, Example, Best practices
20	File Stream Source	HDFS integration, Local files, Monitoring new files, Examples, Performance considerations
21	Structured Streaming Introduction	Differences from DStream, High-level API, DataFrames & Datasets, Fault-tolerance, Example
22	Structured Streaming Sources	Kafka, File, Socket, Rate source, Custom sources
23	Structured Streaming Sinks	Console, File, Kafka, foreachBatch, Memory
24	Event Time & Watermarks	Definition, Handling late data, withWatermark(), Examples, Use cases
25	Window Operations in Structured Streaming	window(), slideDuration, groupBy window(), Examples, Performance tips
26	Stateful Operations in Structured Streaming	mapGroupsWithState(), flatMapGroupsWithState(), Examples, Use cases, Performance
27	Performance Tuning	Batch interval, Partitioning, Backpressure, Checkpointing, Resource tuning
28	Fault Tolerance & Reliability	Checkpointing, Write-ahead logs, Replay, Receiver reliability, Structured Streaming guarantees
29	Monitoring & Debugging	Spark UI, Streaming metrics, Logs, Executor monitoring, Performance tuning
30	Real-world Examples	Log analytics, IoT data processing, Real-time dashboards, Clickstream analysis, Recommendations
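
The window and slide parameters in topics 6 and 14 can be illustrated without a cluster. The sketch below is a plain-Python model, not the Spark API: `windowed_counts` is a hypothetical helper, the batches are made-up data, and window and slide lengths are assumed to be given as a number of batches (Spark itself requires both to be multiples of the batch interval).

```python
from collections import Counter


def windowed_counts(batches, window_len, slide_len):
    """Model of a windowed count-by-key over micro-batches.

    batches: list of lists of words, one inner list per batch interval.
    window_len, slide_len: lengths in units of batches (a simplification).
    Returns one Counter per emitted window.
    """
    results = []
    # A window is emitted every slide_len batches and covers
    # the most recent window_len batches (fewer at the start).
    for end in range(slide_len, len(batches) + 1, slide_len):
        start = max(0, end - window_len)
        counts = Counter()
        for batch in batches[start:end]:
            counts.update(batch)
        results.append(counts)
    return results


batches = [["a", "b"], ["a"], ["b", "b"], ["c"]]
out = windowed_counts(batches, window_len=3, slide_len=2)
for window in out:
    print(dict(window))
```

With a window of 3 batches and a slide of 2, consecutive windows overlap by one batch, so the single-element batch `["a"]` is counted in both emitted windows.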

Interview Questions

Basic

What is Spark Streaming?
Explain real-time data processing.
What is a micro-batch in Spark Streaming?
Difference between batch and streaming.
What is a DStream?
How is a DStream created?
What are the basic DStream transformations?
What are the basic DStream actions?
Explain map() transformation in streaming.
Explain flatMap() transformation in streaming.
Explain filter() transformation in streaming.
Explain reduceByKey() transformation in streaming.
Explain count() action in streaming.
Explain print() action in streaming.
How to read from a socket stream?
How to read from a file stream?
Difference between reliable and unreliable receivers.
What is the role of the driver in Spark Streaming?
What is the role of executors in streaming?
How is batch interval configured?
What is the default checkpointing mechanism?
How do you stop a streaming context?
Explain foreachRDD() action.
What is the Spark Streaming UI?
Explain the use cases of Spark Streaming.
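
For the micro-batch and batch-interval questions above, a small plain-Python model may help when answering. It is only a sketch: `to_micro_batches` is a hypothetical helper, timestamps are assumed to be non-negative seconds, and real Spark Streaming batches data by arrival time at the receiver rather than by a field in the record.

```python
def to_micro_batches(records, batch_interval):
    """Group (timestamp, value) records into consecutive batches.

    Each batch covers batch_interval seconds; intervals with no
    records still produce an (empty) batch, as in a running stream.
    """
    if not records:
        return []
    last_index = max(int(ts // batch_interval) for ts, _ in records)
    batches = [[] for _ in range(last_index + 1)]
    for ts, value in records:
        batches[int(ts // batch_interval)].append(value)
    return batches


records = [(0.2, "a"), (0.9, "b"), (1.5, "c"), (3.1, "d")]
batches = to_micro_batches(records, batch_interval=1.0)
print(batches)
```

Each inner list plays the role of one RDD in a DStream: transformations such as map() or filter() are applied to every batch in turn.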

Intermediate

Explain window operations in Spark Streaming.
What is slide interval?
Difference between window duration and slide duration.
Explain reduceByKeyAndWindow().
Explain groupByKeyAndWindow().
What are stateful transformations?
Explain updateStateByKey().
Explain mapWithState().
How do you integrate Spark Streaming with Kafka?
What is the direct Kafka stream (createDirectStream)?
What is Receiver-based Kafka stream?
How do you handle offsets in Kafka?
Explain Spark Streaming integration with Flume.
Explain push-based vs pull-based Flume integration.
How to read from HDFS in streaming?
How to read from S3 in streaming?
Explain streaming file source options.
Explain output operations: saveAsTextFiles().
Explain output operations: saveAsObjectFiles().
Explain output operations: foreachRDD() to database.
Explain fault tolerance in Spark Streaming.
What are write-ahead logs (WAL)?
Explain receiver reliability.
Explain backpressure mechanism in Spark Streaming.
What is the role of batch scheduling?
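
The updateStateByKey() question above is easier to reason about with a model of its contract. The following is a plain-Python sketch, not Spark code: `update_state_by_key` and `running_sum` are hypothetical names, and state is held in an ordinary dict. It mirrors two documented behaviours: the update function receives the new values and the previous state for a key, and returning None removes that key's state.

```python
def update_state_by_key(state, batch, update_func):
    """Apply update_func(new_values, previous_state) to every key."""
    grouped = {}
    for key, value in batch:
        grouped.setdefault(key, []).append(value)

    new_state = {}
    # Every key that already has state is visited on every batch,
    # even when the batch contains no new values for it.
    for key in set(state) | set(grouped):
        updated = update_func(grouped.get(key, []), state.get(key))
        if updated is not None:  # None drops the key from the state
            new_state[key] = updated
    return new_state


def running_sum(new_values, previous):
    return sum(new_values) + (previous or 0)


state = {}
for batch in [[("a", 1), ("b", 2)], [("a", 3)], []]:
    state = update_state_by_key(state, batch, running_sum)
print(state)
```

Visiting every stored key on every batch is the cost that makes this style of state update expensive for large key spaces, which is a common follow-up when comparing it with mapWithState().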

Advanced

Explain Structured Streaming.
Difference between DStream API and Structured Streaming API.
What are the sources in Structured Streaming?
What are the sinks in Structured Streaming?
Explain event-time processing.
Explain watermarks in streaming.
How to handle late data using watermarks?
Explain streaming aggregation.
Explain window aggregation in Structured Streaming.
Explain stateful aggregations.
Explain mapGroupsWithState().
Explain flatMapGroupsWithState().
Explain join operations in streaming.
Explain stream-stream join vs stream-static join.
Explain stream-stream outer joins.
Explain checkpointing in Structured Streaming.
Explain exactly-once semantics in streaming.
Explain output modes: append, complete, update.
Explain processing-time triggers.
Explain continuous processing mode.
Explain schema inference in streaming.
Explain custom sources in Structured Streaming.
Explain foreachBatch() in Structured Streaming.
Explain streaming aggregation with watermarking.
Explain performance tuning for Structured Streaming.
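
For the watermark and late-data questions above, the following plain-Python sketch models the general idea. It is an assumption-laden simplification, not Spark code: `run_with_watermark` is a hypothetical helper, event times are integers, windows are tumbling, and the watermark is taken as the maximum event time seen so far minus the allowed delay, advanced only between batches.

```python
def run_with_watermark(batches, delay, window):
    """Count events per tumbling window, dropping events that are too late.

    batches: list of lists of integer event times.
    delay:   how far the watermark trails the max event time seen.
    window:  tumbling window length.
    """
    max_event_time = None
    watermark = None
    counts = {}    # window start -> number of accepted events
    dropped = []   # event times rejected as too late
    for batch in batches:
        for t in batch:
            start = (t // window) * window
            # Too late once the watermark has passed the window's end.
            if watermark is not None and start + window <= watermark:
                dropped.append(t)
            else:
                counts[start] = counts.get(start, 0) + 1
        # The watermark moves only after the whole batch is processed.
        if batch:
            batch_max = max(batch)
            if max_event_time is None or batch_max > max_event_time:
                max_event_time = batch_max
            watermark = max_event_time - delay
    return counts, dropped


counts, dropped = run_with_watermark(
    [[1, 12, 14], [3, 25], [8, 16]], delay=10, window=10
)
print(counts, dropped)
```

In this run the event at time 3 is still accepted because the watermark is only 4 when it arrives, while the event at time 8 is dropped because the watermark has reached 15 and the window [0, 10) is already closed.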

Expert

Explain state store in Structured Streaming.
Explain recovery from failures in streaming.
Explain backpressure in Structured Streaming.
Explain memory and executor tuning for streaming.
Explain shuffle optimization in streaming joins.
Explain handling skewed streaming data.
Explain checkpointing and lineage recovery.
Explain streaming aggregation optimizations.
Explain watermarks with multiple streams.
Explain latency vs throughput trade-offs.
Explain using Kafka offsets with checkpointing.
Explain exactly-once vs at-least-once delivery.
Explain stateful streaming performance tuning.
Explain streaming joins with large datasets.
Explain stream-stream join optimization.
Explain integrating streaming with machine learning.
Explain handling late-arriving events.
Explain multi-window aggregations.
Explain Structured Streaming with event time vs processing time.
Explain monitoring streaming jobs with Spark UI.
Explain streaming metrics and logs.
Explain resource allocation and dynamic scaling.
Explain memory spill and disk management in streaming.
Explain streaming ETL pipelines.
Explain real-world streaming applications and case studies.

14 January 2026