| S.No | Topic | Sub-Topics |
| --- | --- | --- |
| 1 | Spark | Overview, Features, Spark vs Hadoop, Components, Use cases |
| 2 | Spark Architecture | Driver, Executors, Cluster Manager, DAG, Spark Context |
| 3 | RDD Basics | Resilient Distributed Dataset, Creation, Transformations, Actions, Persistence |
| 4 | RDD Advanced | Partitioning, Caching, Checkpointing, Narrow vs Wide transformations, Lineage |
| 5 | DataFrames Basics | Creation, Schema, Columns, show(), printSchema() |
| 6 | DataFrames Operations | Select, Filter, GroupBy, Aggregate, Join |
| 7 | DataFrames Advanced | UDF, Window Functions, Pivot, Explode, Sorting |
| 8 | Spark SQL | SQLContext, SparkSession, Running SQL Queries, Caching tables, Temp views |
| 9 | Datasets Basics | Typed API, Creation, Transformations, Actions, Interoperability with DataFrames |
| 10 | Data Loading | CSV, JSON, Parquet, Avro, ORC |
| 11 | Data Writing | saveAsTable, write.parquet, write.json, Overwrite, Append |
| 12 | Spark Streaming Basics | DStream, StreamingContext, Window operations, Input sources, Output operations |
| 13 | Spark Structured Streaming | SparkSession, readStream, writeStream, Triggers, Watermarking |
| 14 | Streaming Advanced | Stateful processing, Aggregations, Joins, Checkpointing, Fault tolerance |
| 15 | Spark MLlib Basics | Machine Learning Pipeline, Transformers, Estimators, Features, Label encoding |
| 16 | MLlib Algorithms | Linear Regression, Logistic Regression, Decision Trees, Random Forest, Clustering |
| 17 | MLlib Feature Engineering | StandardScaler, MinMaxScaler, OneHotEncoder, VectorAssembler, PCA |
| 18 | MLlib Model Evaluation | Train/Test Split, CrossValidator, Metrics (Accuracy, RMSE), ParamGridBuilder, Evaluators |
| 19 | Spark GraphX Basics | Graphs, Vertices, Edges, Pregel, Graph operators |
| 20 | GraphX Advanced | PageRank, Connected Components, Triangle Count, Shortest Path, Graph Algorithms |
| 21 | Spark Performance Tuning | Partitioning, Caching, Serialization, Shuffle, Broadcast variables |
| 22 | Spark Deployment | Standalone Mode, YARN, Mesos, Kubernetes, Cluster configuration |
| 23 | Spark with Python (PySpark) | RDD, DataFrame API, UDF, MLlib, Integration with Pandas |
| 24 | Spark with Scala | RDD API, DataFrame API, Typed Datasets, MLlib, Spark SQL |
| 25 | Spark with Java | RDD API, DataFrame API, Datasets, MLlib, JavaRDD |
| 26 | Spark with R (SparkR) | DataFrames, SparkR SQL, Machine Learning, Integration with RStudio, Visualization |
| 27 | Debugging & Logging | Logs, Spark UI, DAG visualization, Event logs, Executor monitoring |
| 28 | Security in Spark | Authentication, Authorization, SSL, Encryption, Kerberos |
| 29 | Spark Ecosystem | Hive, HBase, Kafka, Cassandra, Hadoop integration |
| 30 | End-to-End Project | Data ingestion, Cleaning, Transformations, Machine Learning, Deployment |