11 January 2026

#Scikit

Key Concepts


S.No | Topic | Sub-Topics
1 | Scikit-learn | What is scikit-learn, Installation, Key features, ML workflow, Supported algorithms
2 | Scikit-learn API Basics | Estimators, fit(), predict(), transform(), Pipelines, Model persistence
3 | Data Loading & Inspection | Built-in datasets, load_*, fetch_*, Data shapes, Feature names, Target variables
4 | Data Preprocessing | Scaling, Normalization, Encoding categorical data, Missing values, Feature transformation
5 | Feature Scaling Techniques | StandardScaler, MinMaxScaler, RobustScaler, Normalizer, When to scale
6 | Handling Missing Data | SimpleImputer, Strategies, Missing indicators, Pipeline usage, Best practices
7 | Encoding Categorical Variables | LabelEncoder, OneHotEncoder, OrdinalEncoder, Handling unknowns, Sparse output
8 | Train-Test Split | train_test_split, Stratification, Random state, Data leakage, Validation sets
9 | Linear Regression | LinearRegression, Assumptions, Coefficients, Evaluation metrics, Use cases
10 | Logistic Regression | Binary vs multiclass, Regularization, Solver options, Class weights, Evaluation
11 | Model Evaluation Metrics | Accuracy, Precision, Recall, F1-score, Confusion matrix
12 | Cross-Validation | K-Fold, StratifiedKFold, cross_val_score, cross_validate, Bias-variance tradeoff
13 | k-Nearest Neighbors | KNN classifier, KNN regressor, Distance metrics, Choosing K, Performance
14 | Support Vector Machines | SVC, SVR, Kernels, Hyperparameters, Margin maximization
15 | Decision Trees | Tree structure, Gini vs entropy, Overfitting, Pruning, Feature importance
16 | Ensemble Learning | Bagging, Boosting, Random Forest, Extra Trees, Voting classifiers
17 | Random Forest | RandomForestClassifier, Hyperparameters, Feature importance, OOB score, Use cases
18 | Gradient Boosting | GradientBoosting, XGBoost intro, LightGBM intro, Learning rate, Tree depth
19 | Naive Bayes | GaussianNB, MultinomialNB, BernoulliNB, Assumptions, Applications
20 | Clustering Algorithms | KMeans, Hierarchical clustering, DBSCAN, Silhouette score, Use cases
21 | Dimensionality Reduction | PCA, Kernel PCA, Explained variance, Feature compression, Visualization
22 | Anomaly Detection | Isolation Forest, One-Class SVM, LOF, Use cases, Evaluation challenges
23 | Model Selection & Tuning | GridSearchCV, RandomizedSearchCV, Hyperparameters, Scoring, Best estimators
24 | Pipelines & ColumnTransformer | Pipeline, Feature unions, ColumnTransformer, End-to-end ML, Avoid leakage
25 | Imbalanced Datasets | Class imbalance, SMOTE, Class weights, Evaluation metrics, Best practices
26 | Text Feature Extraction | CountVectorizer, TF-IDF, N-grams, Stop words, Sparse matrices
27 | Model Persistence | joblib, pickle, Saving models, Loading models, Versioning
28 | Model Interpretation | Coefficients, Feature importance, Permutation importance, Partial dependence, SHAP intro
29 | Scikit-learn with Pipelines in Production | Reproducibility, Monitoring, Data drift, Model updates, Best practices
30 | Scikit-learn Best Practices | Code structure, Experiment tracking, Documentation, Common pitfalls, Next steps

Interview Questions

Basic Level

  1. What is scikit-learn?
  2. What type of library is scikit-learn?
  3. Which language is scikit-learn written in?
  4. What are estimators in scikit-learn?
  5. What is the fit() method?
  6. What is the predict() method?
  7. Difference between fit() and transform()?
  8. What is supervised learning?
  9. What is unsupervised learning?
  10. What is train_test_split?
  11. What are features and labels?
  12. What is a dataset in scikit-learn?
  13. What are built-in datasets?
  14. What is the accuracy score?
  15. What is a confusion matrix?
  16. What is overfitting?
  17. What is underfitting?
  18. What is a regression problem?
  19. What is a classification problem?
  20. What is clustering?
  21. What is scaling?
  22. What is normalization?
  23. What is LabelEncoder?
  24. What is OneHotEncoder?
  25. What are model parameters?
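
For questions 4 to 7, the fit/transform/predict convention can be sketched without scikit-learn. `ToyStandardScaler` and `ToyMeanRegressor` below are hypothetical stand-ins written for these notes, not scikit-learn classes. They only mimic the pattern where `fit()` learns state from data and `transform()` or `predict()` applies it.

```python
from statistics import mean, pstdev


class ToyStandardScaler:
    """Hypothetical transformer that mimics the fit/transform convention."""

    def fit(self, values):
        # fit() only learns state from the data and returns self so calls can chain.
        self.mean_ = mean(values)
        self.scale_ = pstdev(values) or 1.0  # fall back to 1.0 for constant data
        return self

    def transform(self, values):
        # transform() applies the learned state without changing it.
        return [(v - self.mean_) / self.scale_ for v in values]


class ToyMeanRegressor:
    """Hypothetical predictor that always predicts the mean training label."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


train = [1.0, 2.0, 3.0]
scaler = ToyStandardScaler().fit(train)   # statistics come from training data only
scaled_train = scaler.transform(train)
scaled_new = scaler.transform([4.0])      # new data reuses the training statistics

model = ToyMeanRegressor().fit(scaled_train, [10.0, 20.0, 30.0])
predictions = model.predict(scaled_new)
```

Learned attributes end in an underscore (`mean_`, `scale_`), following the naming style scikit-learn uses for fitted state.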

Intermediate Level

  1. What is StandardScaler?
  2. Difference between MinMaxScaler and StandardScaler?
  3. What is logistic regression?
  4. Explain linear regression in scikit-learn.
  5. What is KNN?
  6. How does KNN work?
  7. What is SVM?
  8. What are kernels in SVM?
  9. What is a decision tree?
  10. What are entropy and the Gini index?
  11. What is Random Forest?
  12. What is ensemble learning?
  13. What is cross-validation?
  14. What is K-Fold cross-validation?
  15. What is StratifiedKFold?
  16. What is GridSearchCV?
  17. What is RandomizedSearchCV?
  18. What are hyperparameters?
  19. What is the bias-variance tradeoff?
  20. What is an ROC curve?
  21. What is AUC?
  22. What are precision and recall?
  23. What is F1-score?
  24. What is feature importance?
  25. What is PCA?

Advanced Level

  1. How does PCA work internally?
  2. What is explained variance?
  3. Difference between PCA and LDA?
  4. What is Gradient Boosting?
  5. Difference between Bagging and Boosting?
  6. What is AdaBoost?
  7. What is Isolation Forest?
  8. What is DBSCAN?
  9. How does KMeans clustering work?
  10. What is silhouette score?
  11. What is feature selection?
  12. Difference between feature selection and extraction?
  13. What is Recursive Feature Elimination?
  14. What is a pipeline in scikit-learn?
  15. Why are pipelines important?
  16. What is ColumnTransformer?
  17. How to handle categorical features?
  18. How does scikit-learn handle missing values?
  19. What is SimpleImputer?
  20. What is model persistence?
  21. Difference between pickle and joblib?
  22. What is a partial dependence plot?
  23. What is permutation importance?
  24. How to avoid data leakage?
  25. How to handle imbalanced datasets?

Expert Level

  1. How does the scikit-learn architecture work?
  2. Explain estimator, transformer, predictor design.
  3. How does scikit-learn optimize performance?
  4. What is warm_start?
  5. How does scikit-learn use NumPy internally?
  6. What are sparse matrices?
  7. How does scikit-learn handle sparse data?
  8. What is SGDClassifier?
  9. Difference between batch and online learning?
  10. How to scale scikit-learn for large datasets?
  11. What are the limitations of scikit-learn?
  12. Difference between scikit-learn and TensorFlow?
  13. Difference between scikit-learn and PyTorch?
  14. How to integrate scikit-learn with pandas?
  15. What is a custom estimator?
  16. How to implement a custom transformer?
  17. What is the scoring parameter?
  18. How to evaluate regression models?
  19. What is R² score?
  20. What is model drift?
  21. How to monitor models in production?
  22. What is reproducibility in ML?
  23. How to set random_state?
  24. Explain numerical stability issues.
  25. What are best practices in scikit-learn?

Related Topics