| S.No | Topic | Sub-Topics |
|---|---|---|
| 1 | Scikit-learn | What is scikit-learn, Installation, Key features, ML workflow, Supported algorithms |
| 2 | Scikit-learn API Basics | Estimators, fit(), predict(), transform(), Pipelines, Model persistence |
| 3 | Data Loading & Inspection | Built-in datasets, load_*, fetch_*, Data shapes, Feature names, Target variables |
| 4 | Data Preprocessing | Scaling, Normalization, Encoding categorical data, Missing values, Feature transformation |
| 5 | Feature Scaling Techniques | StandardScaler, MinMaxScaler, RobustScaler, Normalizer, When to scale |
| 6 | Handling Missing Data | SimpleImputer, Strategies, Missing indicators, Pipeline usage, Best practices |
| 7 | Encoding Categorical Variables | LabelEncoder, OneHotEncoder, OrdinalEncoder, Handling unknowns, Sparse output |
| 8 | Train-Test Split | train_test_split, Stratification, Random state, Data leakage, Validation sets |
| 9 | Linear Regression | LinearRegression, Assumptions, Coefficients, Evaluation metrics, Use cases |
| 10 | Logistic Regression | Binary vs multiclass, Regularization, Solver options, Class weights, Evaluation |
| 11 | Model Evaluation Metrics | Accuracy, Precision, Recall, F1-score, Confusion matrix |
| 12 | Cross-Validation | K-Fold, StratifiedKFold, cross_val_score, cross_validate, Bias-variance tradeoff |
| 13 | k-Nearest Neighbors | KNN classifier, KNN regressor, Distance metrics, Choosing K, Performance |
| 14 | Support Vector Machines | SVC, SVR, Kernels, Hyperparameters, Margin maximization |
| 15 | Decision Trees | Tree structure, Gini vs entropy, Overfitting, Pruning, Feature importance |
| 16 | Ensemble Learning | Bagging, Boosting, Random Forest, Extra Trees, Voting classifiers |
| 17 | Random Forest | RandomForestClassifier, Hyperparameters, Feature importance, OOB score, Use cases |
| 18 | Gradient Boosting | GradientBoosting, XGBoost intro, LightGBM intro, Learning rate, Tree depth |
| 19 | Naive Bayes | GaussianNB, MultinomialNB, BernoulliNB, Assumptions, Applications |
| 20 | Clustering Algorithms | KMeans, Hierarchical clustering, DBSCAN, Silhouette score, Use cases |
| 21 | Dimensionality Reduction | PCA, Kernel PCA, Explained variance, Feature compression, Visualization |
| 22 | Anomaly Detection | Isolation Forest, One-Class SVM, LOF, Use cases, Evaluation challenges |
| 23 | Model Selection & Tuning | GridSearchCV, RandomizedSearchCV, Hyperparameters, Scoring, Best estimators |
| 24 | Pipelines & ColumnTransformer | Pipeline, FeatureUnion, ColumnTransformer, End-to-end ML, Avoiding leakage |
| 25 | Imbalanced Datasets | Class imbalance, SMOTE, Class weights, Evaluation metrics, Best practices |
| 26 | Text Feature Extraction | CountVectorizer, TF-IDF, N-grams, Stop words, Sparse matrices |
| 27 | Model Persistence | joblib, pickle, Saving models, Loading models, Versioning |
| 28 | Model Interpretation | Coefficients, Feature importance, Permutation importance, Partial dependence, SHAP intro |
| 29 | Scikit-learn Pipelines in Production | Reproducibility, Monitoring, Data drift, Model updates, Best practices |
| 30 | Scikit-learn Best Practices | Code structure, Experiment tracking, Documentation, Common pitfalls, Next steps |