| S.No | Topic | Sub-Topics |
|------|-------|------------|
| 1 | Spark | What is Apache Spark, Spark components, Advantages, Use cases, Ecosystem overview |
| 2 | SparkSession | Definition, Role, Features, Entry point for APIs, Supported languages |
| 3 | SparkContext | Definition, Role, Driver & Executor, Cluster Manager interaction, SparkConf |
| 4 | SparkSession vs SparkContext | Difference, When to use, Entry points, APIs handled, Example code |
| 5 | Creating SparkSession | builder(), getOrCreate(), appName(), master(), config(), enableHiveSupport() |
| 6 | Accessing SparkContext | spark.sparkContext, Usage, Properties, Configuration, Example code |
| 7 | SparkConf Configuration | Setting app name, Master URL, Executor memory, Shuffle partitions, Dynamic allocation |
| 8 | Managing SparkContext | start, stop, active contexts, reuse, pitfalls of multiple contexts |
| 9 | SparkSession Builder Options | appName(), master(), config(), enableHiveSupport(), getOrCreate() |
| 10 | SparkSession Properties | spark.sql.shuffle.partitions, spark.executor.memory, spark.driver.memory, spark.serializer, spark.sql.catalogImplementation |
| 11 | Accessing Spark Version | spark.version, sparkContext.version, examples, compatibility check, best practices |
| 12 | SparkContext Properties | applicationId, applicationName, defaultParallelism, defaultMinPartitions, uiWebUrl |
| 13 | Local vs Cluster Mode | local, local[n], standalone cluster, YARN, Mesos, Kubernetes |
| 14 | Stopping & Restarting SparkContext | stop(), getOrCreate(), pitfalls, multiple SparkContexts, examples |
| 15 | Creating DataFrames via SparkSession | from RDD, from JSON, from CSV, from Parquet, from Hive table |
| 16 | Creating Datasets via SparkSession | from case class, from RDD, typed vs untyped, Scala/Java API, examples |
| 17 | Accessing Hive via SparkSession | enableHiveSupport(), CREATE DATABASE, USE database, CREATE TABLE, SQL queries |
| 18 | Spark Logging & Monitoring | setLogLevel(), Web UI port, Executor logs, Application metrics, Event logging |
| 19 | SparkSession Temporary Views | createOrReplaceTempView(), createGlobalTempView(), SQL queries, examples, lifespan |
| 20 | Spark Configuration at Runtime | spark.conf.set(), spark.conf.get(), examples, tuning, best practices |
| 21 | Broadcast Variables via SparkContext | Definition, Usage, Example, Benefits, Best practices |
| 22 | Accumulators via SparkContext | Definition, Usage, Numeric & custom, Example, Pitfalls |
| 23 | Parallelism & Partitions | defaultParallelism, numPartitions, repartition(), coalesce(), examples |
| 24 | RDD Creation via SparkContext | parallelize(), textFile(), wholeTextFiles(), from existing collections, key-value RDDs |
| 25 | Checkpointing via SparkContext | Purpose, Directory setup, Example, Difference from caching, Use cases |
| 26 | Dynamic Resource Allocation | Configuration, enable/disable, SparkConf, Executors, Scaling |
| 27 | SparkSession Integration with Data Sources | CSV, JSON, Parquet, JDBC, Hive tables |
| 28 | Best Practices for SparkSession & SparkContext | Single session per app, Reuse SparkContext, Proper stop(), Config tuning, Logging |
| 29 | Performance Tuning | shuffle partitions, memory config, caching, checkpointing, partitioning strategies |
| 30 | Wrap-up & Real-world Examples | Sample applications, ETL pipelines, ML pipelines, Streaming context, Integration examples |