14 January 2026

#SparkSession & SparkContext

Key Concepts


S.No | Topic | Sub-Topics
1 | Spark | What is Apache Spark, Spark components, Advantages, Use cases, Ecosystem overview
2 | SparkSession | Definition, Role, Features, Entry point for APIs, Supported languages
3 | SparkContext | Definition, Role, Driver & Executor, Cluster Manager interaction, SparkConf
4 | SparkSession vs SparkContext | Difference, When to use, Entry points, APIs handled, Example code
5 | Creating SparkSession | builder(), getOrCreate(), appName(), master(), config(), enableHiveSupport()
6 | Accessing SparkContext | spark.sparkContext, Usage, Properties, Configuration, Example code
7 | SparkConf Configuration | Setting app name, Master URL, Executor memory, Shuffle partitions, Dynamic allocation
8 | Managing SparkContext | start, stop, active contexts, reuse, pitfalls of multiple contexts
9 | SparkSession Builder Options | appName(), master(), config(), enableHiveSupport(), getOrCreate()
10 | SparkSession Properties | spark.sql.shuffle.partitions, spark.executor.memory, spark.driver.memory, spark.serializer, spark.sql.catalogImplementation
11 | Accessing Spark Version | spark.version, sparkContext.version, examples, compatibility check, best practices
12 | SparkContext Properties | applicationId, applicationName, defaultParallelism, defaultMinPartitions, uiWebUrl
13 | Local vs Cluster Mode | local, local[n], standalone cluster, YARN, Mesos, Kubernetes
14 | Stopping & Restarting SparkContext | stop(), getOrCreate(), pitfalls, multiple SparkContexts, examples
15 | Creating DataFrames via SparkSession | from RDD, from JSON, from CSV, from Parquet, from Hive table
16 | Creating Datasets via SparkSession | from case class, from RDD, typed vs untyped, Scala/Java API, examples
17 | Accessing Hive via SparkSession | enableHiveSupport(), createDatabase, useDatabase, createTable, SQL queries
18 | Spark Logging & Monitoring | setLogLevel(), UI Web Port, Executor logs, Application metrics, Event logging
19 | SparkSession Temporary Views | createOrReplaceTempView(), createGlobalTempView(), SQL queries, examples, lifespan
20 | Spark Configuration at Runtime | spark.conf.set(), spark.conf.get(), examples, tuning, best practices
21 | Broadcast Variables via SparkContext | Definition, Usage, Example, Benefits, Best practices
22 | Accumulators via SparkContext | Definition, Usage, Numeric & custom, Example, Pitfalls
23 | Parallelism & Partitions | defaultParallelism, numPartitions, repartition(), coalesce(), examples
24 | RDD Creation via SparkContext | parallelize(), textFile(), wholeTextFiles(), from existing collections, key-value RDDs
25 | Checkpointing via SparkContext | Purpose, Directory setup, Example, Difference from caching, Use cases
26 | Dynamic Resource Allocation | Configuration, enable/disable, SparkConf, Executors, Scaling
27 | SparkSession Integration with Data Sources | CSV, JSON, Parquet, JDBC, Hive tables
28 | Best Practices for SparkSession & SparkContext | Single session per app, Reuse SparkContext, Proper stop(), Config tuning, Logging
29 | Performance Tuning | shuffle partitions, memory config, caching, checkpointing, partitioning strategies
30 | Wrap-up & Real-world Examples | Sample applications, ETL pipelines, ML pipelines, Streaming context, Integration examples
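
Topics 5, 6, 11, and 14 of the outline can be illustrated with a short PySpark sketch. It assumes PySpark is installed and uses a local master; the helper names `runtime_options`, `build_session`, and `demo`, the app name `outline-demo`, and all values are hypothetical, chosen only for illustration.

```python
def runtime_options(shuffle_partitions=8):
    """Return example Spark properties as strings (illustrative values only)."""
    return {"spark.sql.shuffle.partitions": str(shuffle_partitions)}


def build_session(app_name="outline-demo", master="local[2]", options=None):
    """Create or reuse a SparkSession via the builder pattern."""
    # Imported lazily so the helpers above load even where PySpark is absent.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master(master)
    for key, value in (options or {}).items():
        builder = builder.config(key, value)
    # getOrCreate() returns the active session if one already exists.
    return builder.getOrCreate()


def demo():
    """Start a session, inspect its SparkContext, then shut both down."""
    spark = build_session(options=runtime_options())
    sc = spark.sparkContext  # the SparkContext owned by the session
    print(spark.version, sc.appName, sc.master, sc.defaultParallelism)
    spark.stop()  # stops the underlying SparkContext as well
```

Calling `demo()` on a machine with PySpark prints the Spark version and a few SparkContext properties before stopping the session. On a real cluster the master URL would normally be supplied by `spark-submit` rather than hard-coded.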

Interview Questions

Basic

  • What is SparkSession?
  • What is SparkContext?
  • Difference between SparkSession and SparkContext?
  • How do you create a SparkSession in Spark 2.x?
  • How do you access SparkContext from SparkSession?
  • Which Spark version introduced SparkSession?
  • Can SparkContext run without SparkSession?
  • What is the role of SparkContext?
  • What are the main components initialized by SparkSession?
  • How do you stop a SparkSession and SparkContext?
  • What is SparkConf?
  • How do you configure Spark properties?
  • What is the default master in SparkSession?
  • Can you have multiple SparkSessions in the same JVM?
  • What is the role of SparkSession.builder()?
  • How do you enable Hive support in SparkSession?
  • What is the difference between SparkContext.textFile() and SparkSession.read.text()?
  • What is the difference between local and cluster mode?
  • What is the default Spark UI port?
  • What happens if you try to create a second SparkContext?
  • What is the difference between SparkSession and SQLContext?
  • What is the difference between SparkSession and HiveContext?
  • What languages does SparkSession support?
  • What is the entry point for DataFrame API?
  • What is the entry point for RDD API?

Intermediate

  • What is the relationship between SparkSession and SparkContext?
  • How can you configure Spark properties at runtime?
  • How does SparkSession manage HiveContext and SQLContext?
  • Explain the use of spark.conf in SparkSession.
  • What is the lifecycle of SparkContext?
  • Can multiple SparkContexts exist in the same JVM?
  • How does SparkSession simplify DataFrame and Dataset creation?
  • What is the difference between local[*] and local[1] in SparkContext?
  • Explain dynamic allocation of executors.
  • How do you retrieve Spark version from SparkSession?
  • How do you set Spark logging level programmatically?
  • What is the difference between the stop() and close() methods of SparkSession?
  • How does SparkSession handle temporary views?
  • Explain the use of getOrCreate() method in SparkSession.
  • What is the default storage level when caching an RDD created through SparkContext?
  • How do you broadcast a variable using SparkContext?
  • How do you read JSON files using SparkSession?
  • How do you read CSV files using SparkSession?
  • How do you read Parquet files using SparkSession?
  • What is the difference between cache() and persist()?
  • How do you retrieve the master URL from SparkContext?
  • How do you programmatically set the number of shuffle partitions?
  • Explain the difference between SparkSession and SparkContext for streaming.
  • How do you enable Arrow optimization for Pandas in SparkSession?
  • How do you configure checkpoint directory in SparkContext?
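
Several of the intermediate questions concern settings changed on a live session. The following is a minimal sketch, assuming an already created session object and Spark 3.x (in Spark 2.x the Arrow property is named `spark.sql.execution.arrow.enabled`). The function name `apply_runtime_settings` and its default values are hypothetical.

```python
def apply_runtime_settings(spark, shuffle_partitions=64, log_level="WARN",
                           checkpoint_dir=None):
    """Apply common runtime settings to an existing SparkSession.

    `spark` is expected to be a pyspark.sql.SparkSession; only its `conf`
    and `sparkContext` attributes are used here.
    """
    # SQL properties can be changed per session while the application runs.
    spark.conf.set("spark.sql.shuffle.partitions", str(shuffle_partitions))
    # Arrow speeds up conversion between Spark and pandas DataFrames (Spark 3.x name).
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    # Log level and checkpoint directory belong to the SparkContext.
    spark.sparkContext.setLogLevel(log_level)
    if checkpoint_dir is not None:
        spark.sparkContext.setCheckpointDir(checkpoint_dir)
    # Read the value back to confirm what the session now reports.
    return spark.conf.get("spark.sql.shuffle.partitions")
```

Properties such as `spark.executor.memory` are fixed when the application starts, so they belong in the builder or `SparkConf` rather than in `spark.conf.set()`.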

Advanced

  • How does SparkContext interact with the Cluster Manager?
  • Explain how SparkContext schedules tasks on executors.
  • How does SparkSession handle Catalyst optimization?
  • What are the internal components of SparkSession?
  • Explain the DAG creation process in SparkContext.
  • How does SparkContext manage executor memory?
  • How does SparkSession handle multiple catalogs?
  • What is the difference between SparkSession.builder().getOrCreate() and new SparkSession()?
  • How does SparkContext handle failures and retries?
  • Explain dynamic allocation and its configuration in SparkSession.
  • How does SparkContext manage RDD lineage and recomputation?
  • Explain broadcast variables and accumulators in SparkContext.
  • How does SparkSession optimize joins internally?
  • How do you configure memory fractions for execution and storage?
  • Explain lazy evaluation in SparkContext and its benefits.
  • How does SparkSession integrate with Hive metastore?
  • How do you monitor SparkSession and SparkContext metrics programmatically?
  • Explain how SparkContext handles shuffle and stage tasks.
  • What happens when SparkSession enables Hive support internally?
  • Explain checkpointing in SparkContext and its use cases.
  • How does SparkContext manage task locality?
  • What is the difference between SparkSession and SparkContext with respect to DataFrame caching?
  • How does SparkSession interact with Spark SQL Engine?
  • Explain the role of SparkContext in RDD partitioning.
  • How does SparkSession handle streaming queries?

Expert

  • Explain internal communication between SparkSession and SparkContext.
  • How does SparkSession handle multiple users in the same JVM?
  • Explain memory management roles of SparkContext during caching.
  • How does SparkContext optimize shuffle operations internally?
  • How can SparkSession be extended for custom DataSource APIs?
  • Explain the DAG scheduler interaction with SparkContext.
  • How does SparkContext handle speculative execution?
  • Explain fault tolerance mechanisms managed by SparkContext.
  • How does SparkSession handle query plan caching?
  • How does SparkContext manage dynamic resource allocation?
  • Explain SparkContext internal task retry mechanism.
  • How does SparkSession handle advanced optimization like whole-stage codegen?
  • Explain how SparkContext schedules wide vs narrow transformations.
  • How does SparkSession integrate with external catalogs and Hive?
  • Explain checkpointing internals and lineage truncation in SparkContext.
  • How does SparkContext handle broadcast variable serialization?
  • How does SparkSession handle complex aggregations?
  • Explain SparkContext interactions with shuffle service for fault tolerance.
  • How does SparkSession optimize query plans for joins?
  • Explain task scheduling strategies used by SparkContext.
  • How does SparkContext manage executor blacklisting (renamed executor exclusion, excludeOnFailure, in Spark 3.1)?
  • How does SparkSession handle session isolation across applications?
  • Explain internal mechanisms of DataFrame caching in SparkSession.
  • How does SparkContext manage memory spill to disk?
  • Explain advanced tuning techniques for SparkSession and SparkContext in large clusters.

Related Topics