| S.No | Topic | Sub-Topics |
|------|-------|------------|
| 1 | Spark SQL | What is Spark SQL, SQL vs DataFrame API, Use cases, Architecture overview, Components |
| 2 | Spark SQL Architecture | Catalyst optimizer, Logical plan, Physical plan, Tungsten engine, Execution flow |
| 3 | SparkSession & SQL Entry Points | SparkSession.sql(), SQLContext, HiveContext, Configurations, Best practices |
| 4 | Creating Tables & Views | Managed tables, External tables, Temporary views, Global temporary views, CTAS |
| 5 | Data Types & Schema | Primitive types, Complex types, Struct, Array, Map |
| 6 | Reading Data Sources | CSV, JSON, Parquet, ORC, Avro basics |
| 7 | Writing Data using SQL | INSERT INTO, INSERT OVERWRITE, Save modes, Partition writes, Bucketing |
| 8 | Basic SELECT Queries | Select columns, Expressions, Aliases, DISTINCT, LIMIT |
| 9 | Filtering & WHERE Clause | WHERE conditions, AND/OR, BETWEEN, IN, LIKE |
| 10 | Sorting & Ordering | ORDER BY, SORT BY, DISTRIBUTE BY, CLUSTER BY, Performance impact |
| 11 | Aggregation Functions | COUNT, SUM, AVG, MIN, MAX |
| 12 | GROUP BY & HAVING | Grouping rules, Multiple group keys, HAVING clause, Aggregation filters, Optimization |
| 13 | Join Types in Spark SQL | Inner join, Left join, Right join, Full join, Cross join |
| 14 | Join Optimization Techniques | Broadcast joins, Shuffle joins, Join hints, Skew handling, AQE |
| 15 | Subqueries & CTEs | Scalar subqueries, Correlated subqueries, WITH clause, Nested queries, Optimization |
| 16 | Window Functions | OVER clause, PARTITION BY, ORDER BY, Ranking functions, Analytical functions |
| 17 | String & Date Functions | String manipulation, Date arithmetic, Timestamp functions, Formatting, Parsing |
| 18 | Handling NULLs | IS NULL, IS NOT NULL, COALESCE, NVL, NULLIF |
| 19 | Complex Data Types | Arrays, Maps, Structs, explode, LATERAL VIEW, JSON functions |
| 20 | User Defined Functions (UDF) | Creating UDFs, Registering UDFs, Performance impact, When to avoid UDFs, Alternatives |
| 21 | Partitioning & Bucketing | Table partitioning, Static vs dynamic partitions, Bucketing concepts, Query pruning, Performance |
| 22 | Performance Optimization | Predicate pushdown, Column pruning, Caching tables, AQE, Cost-based optimizer |
| 23 | Execution Plans & Debugging | EXPLAIN, Logical vs physical plans, DAG stages, Common bottlenecks, Tuning |
| 24 | Integration with Hive | Hive metastore, HiveQL support, External tables, SerDe, Compatibility issues |
| 25 | Transactional Tables | ACID tables, Delta Lake basics, MERGE, UPDATE, DELETE, Time travel |
| 26 | Structured Streaming with SQL | Streaming tables, Continuous queries, Watermarking, Window aggregations, Triggers |
| 27 | Security & Access Control | Table permissions, Column masking, Row-level security, Auditing, Best practices |
| 28 | Error Handling & Data Quality | Bad records handling, Schema mismatch, Try-catch patterns, Data validation, Logging |
| 29 | Best Practices & SQL Standards | Naming conventions, Query readability, Anti-patterns, Reusability, Testing |
| 30 | Real-world Use Cases & Projects | ETL pipelines, Data warehousing, Reporting, Optimization review, End-to-end project |