Prime_Questions: #SparkSession & SparkContext

Key Concepts

S.No	Topic	Sub-Topics
1	Spark	What is Apache Spark, Spark components, Advantages, Use cases, Ecosystem overview
2	SparkSession	Definition, Role, Features, Entry point for APIs, Supported languages
3	SparkContext	Definition, Role, Driver & Executor, Cluster Manager interaction, SparkConf
4	SparkSession vs SparkContext	Difference, When to use, Entry points, APIs handled, Example code
5	Creating SparkSession	builder(), getOrCreate(), appName(), master(), config(), enableHiveSupport()
6	Accessing SparkContext	spark.sparkContext, Usage, Properties, Configuration, Example code
7	SparkConf Configuration	Setting app name, Master URL, Executor memory, Shuffle partitions, Dynamic allocation
8	Managing SparkContext	start, stop, active contexts, reuse, pitfalls of multiple contexts
9	SparkSession Builder Options	appName(), master(), config(), enableHiveSupport(), getOrCreate()
10	SparkSession Properties	spark.sql.shuffle.partitions, spark.executor.memory, spark.driver.memory, spark.serializer, spark.sql.catalogImplementation
11	Accessing Spark Version	spark.version, sparkContext.version, examples, compatibility check, best practices
12	SparkContext Properties	applicationId, applicationName, defaultParallelism, defaultMinPartitions, uiWebUrl
13	Local vs Cluster Mode	local, local[n], standalone cluster, YARN, Mesos, Kubernetes
14	Stopping & Restarting SparkContext	stop(), getOrCreate(), pitfalls, multiple SparkContexts, examples
15	Creating DataFrames via SparkSession	from RDD, from JSON, from CSV, from Parquet, from Hive table
16	Creating Datasets via SparkSession	from case class, from RDD, typed vs untyped, Scala/Java API, examples
17	Accessing Hive via SparkSession	enableHiveSupport(), CREATE DATABASE, USE database, CREATE TABLE, SQL queries
18	Spark Logging & Monitoring	setLogLevel(), Web UI port, Executor logs, Application metrics, Event logging
19	SparkSession Temporary Views	createOrReplaceTempView(), createGlobalTempView(), SQL queries, examples, lifespan
20	Spark Configuration at Runtime	spark.conf.set(), spark.conf.get(), examples, tuning, best practices
21	Broadcast Variables via SparkContext	Definition, Usage, Example, Benefits, Best practices
22	Accumulators via SparkContext	Definition, Usage, Numeric & custom, Example, Pitfalls
23	Parallelism & Partitions	defaultParallelism, numPartitions, repartition(), coalesce(), examples
24	RDD Creation via SparkContext	parallelize(), textFile(), wholeTextFiles(), from existing collections, key-value RDDs
25	Checkpointing via SparkContext	Purpose, Directory setup, Example, Difference from caching, Use cases
26	Dynamic Resource Allocation	Configuration, enable/disable, SparkConf, Executors, Scaling
27	SparkSession Integration with Data Sources	CSV, JSON, Parquet, JDBC, Hive tables
28	Best Practices for SparkSession & SparkContext	Single session per app, Reuse SparkContext, Proper stop(), Config tuning, Logging
29	Performance Tuning	shuffle partitions, memory config, caching, checkpointing, partitioning strategies
30	Wrap-up & Real-world Examples	Sample applications, ETL pipelines, ML pipelines, Streaming context, Integration examples

Interview Questions

Basic

What is SparkSession?
What is SparkContext?
What is the difference between SparkSession and SparkContext?
How do you create a SparkSession in Spark 2.x?
How do you access SparkContext from SparkSession?
Which Spark version introduced SparkSession?
Can SparkContext run without SparkSession?
What is the role of SparkContext?
What are the main components initialized by SparkSession?
How do you stop a SparkSession and SparkContext?
What is SparkConf?
How do you configure Spark properties?
What is the default master in SparkSession?
Can you have multiple SparkSessions in the same JVM?
What is the role of SparkSession.builder()?
How do you enable Hive support in SparkSession?
What is the difference between SparkContext.textFile() and SparkSession.read.text()?
What is the difference between local and cluster mode?
What is the default Spark UI port?
What happens if you try to create a second SparkContext?
What is the difference between SparkSession and SQLContext?
What is the difference between SparkSession and HiveContext?
What languages does SparkSession support?
What is the entry point for DataFrame API?
What is the entry point for RDD API?

Intermediate

What is the relationship between SparkSession and SparkContext?
How can you configure Spark properties at runtime?
How does SparkSession manage HiveContext and SQLContext?
Explain the use of spark.conf in SparkSession.
What is the lifecycle of SparkContext?
Can multiple SparkContexts exist in the same JVM?
How does SparkSession simplify DataFrame and Dataset creation?
What is the difference between local[*] and local[1] in SparkContext?
Explain dynamic allocation of executors.
How do you retrieve Spark version from SparkSession?
How do you set Spark logging level programmatically?
What is the difference between the stop() and close() methods of SparkSession?
How does SparkSession handle temporary views?
Explain the use of getOrCreate() method in SparkSession.
What is the default storage level for cache() on an RDD, and how does it differ for a DataFrame?
How do you broadcast a variable using SparkContext?
How do you read JSON files using SparkSession?
How do you read CSV files using SparkSession?
How do you read Parquet files using SparkSession?
What is the difference between cache() and persist()?
How do you retrieve the master URL from SparkContext?
How do you programmatically set the number of shuffle partitions?
Explain the difference between SparkSession and SparkContext for streaming.
How do you enable Arrow optimization for Pandas in SparkSession?
How do you configure checkpoint directory in SparkContext?

Advanced

How does SparkContext interact with the Cluster Manager?
Explain how SparkContext schedules tasks on executors.
How does SparkSession handle Catalyst optimization?
What are the internal components of SparkSession?
Explain the DAG creation process in SparkContext.
How does SparkContext manage executor memory?
How does SparkSession handle multiple catalogs?
What is the difference between SparkSession.builder().getOrCreate() and new SparkSession()?
How does SparkContext handle failures and retries?
Explain dynamic allocation and its configuration in SparkSession.
How does SparkContext manage RDD lineage and recomputation?
Explain broadcast variables and accumulators in SparkContext.
How does SparkSession optimize joins internally?
How do you configure memory fractions for execution and storage?
Explain lazy evaluation in SparkContext and its benefits.
How does SparkSession integrate with Hive metastore?
How do you monitor SparkSession and SparkContext metrics programmatically?
Explain how SparkContext handles shuffle and stage tasks.
What happens when SparkSession enables Hive support internally?
Explain checkpointing in SparkContext and its use cases.
How does SparkContext manage task locality?
What is the difference between SparkSession and SparkContext for DataFrame caching?
How does SparkSession interact with Spark SQL Engine?
Explain the role of SparkContext in RDD partitioning.
How does SparkSession handle streaming queries?

Expert

Explain internal communication between SparkSession and SparkContext.
How does SparkSession handle multiple users in the same JVM?
Explain memory management roles of SparkContext during caching.
How does SparkContext optimize shuffle operations internally?
How can SparkSession be extended for custom DataSource APIs?
Explain the DAG scheduler interaction with SparkContext.
How does SparkContext handle speculative execution?
Explain fault tolerance mechanisms managed by SparkContext.
How does SparkSession handle query plan caching?
How does SparkContext manage dynamic resource allocation?
Explain SparkContext internal task retry mechanism.
How does SparkSession handle advanced optimization like whole-stage codegen?
Explain how SparkContext schedules wide vs narrow transformations.
How does SparkSession integrate with external catalogs and Hive?
Explain checkpointing internals and lineage truncation in SparkContext.
How does SparkContext handle broadcast variable serialization?
How does SparkSession handle complex aggregations?
Explain SparkContext interactions with shuffle service for fault tolerance.
How does SparkSession optimize query plans for joins?
Explain task scheduling strategies used by SparkContext.
How does SparkContext manage executor blacklisting (called exclusion, excludeOnFailure, since Spark 3.1)?
How does SparkSession handle session isolation across applications?
Explain internal mechanisms of DataFrame caching in SparkSession.
How does SparkContext manage memory spill to disk?
Explain advanced tuning techniques for SparkSession and SparkContext in large clusters.

14 January 2026
