20 November 2025

#Data_Science

Key Concepts


| Topic | SubTopics | Levels (Basic / Intermediate / Advanced / Expert) |
| --- | --- | --- |
| Python for Data Science | NumPy, Pandas, Matplotlib, Seaborn, SciPy | ✔️ ✔️ ✔️ |
| Statistics & Probability | Descriptive Stats, Inferential Stats, Probability Dist., Hypothesis Testing, Sampling | ✔️ ✔️ ✔️ ✔️ |
| Machine Learning | Supervised, Unsupervised, Model Evaluation, Regularization, Feature Engineering | ✔️ ✔️ ✔️ ✔️ |
| Deep Learning | Neural Networks, CNN, RNN, LSTM, Attention | ✔️ ✔️ ✔️ |
| Data Wrangling | Cleaning, Missing Values, Outliers, Encoding, Transformation | ✔️ ✔️ ✔️ |
| Data Visualization | Matplotlib, Seaborn, Plotly, Dash, Storytelling | ✔️ ✔️ ✔️ ✔️ |
| Big Data Technologies | Hadoop, Spark, Hive, HDFS, Streaming | ✔️ ✔️ ✔️ |
| SQL & NoSQL | Joins, Window Functions, Aggregations, MongoDB, Indexing | ✔️ ✔️ ✔️ ✔️ |
| Cloud for Data Science | AWS S3, Lambda, SageMaker, GCP BigQuery, Azure ML | ✔️ ✔️ ✔️ |
| MLOps | CI/CD, Model Deployment, Monitoring, ML Pipelines, Docker & Kubernetes | ✔️ ✔️ ✔️ |
| Feature Engineering | Scaling, Encoding, PCA, Feature Selection, Time-Series Features | ✔️ ✔️ ✔️ ✔️ |
| Time Series Analysis | ARIMA, SARIMA, ETS, Prophet, LSTM, Forecasting | ✔️ ✔️ ✔️ |
| NLP | Tokenization, Embeddings, Transformers, LLMs, Text Classification | ✔️ ✔️ ✔️ |
| Data Engineering | ETL, Data Pipelines, Airflow, DB Design, Data Lakes | ✔️ ✔️ ✔️ |
| Model Optimization | Hyperparameter Tuning, Grid/Random Search, Bayesian Opt., Pruning | ✔️ ✔️ ✔️ |
| Deployment & Scaling | REST APIs, Flask/FastAPI, Batch vs Real-Time, Scaling Models | ✔️ ✔️ ✔️ |
| Data Ethics & Governance | Fairness, Bias, Explainability (XAI), GDPR, Security | ✔️ ✔️ ✔️ ✔️ |
| Mathematics for DS | Linear Algebra, Calculus, Optimization, Vectorization | ✔️ ✔️ ✔️ ✔️ |
| Business Analytics | KPI Analysis, Dashboards, A/B Testing, Insights | ✔️ ✔️ ✔️ |
| Research & Experimentation | Experiment Design, A/B Testing, Simulation, Causal Inference | ✔️ ✔️ ✔️ |

Interview Questions

🟢 Basic Level

  1. What is Data Science?
  2. What is the difference between Data Science, Data Analytics, and Machine Learning?
  3. What are structured and unstructured data?
  4. What is a dataset?
  5. What is the role of a data scientist?
  6. What is sampling? Why is it used?
  7. Explain mean, median, and mode.
  8. What is standard deviation?
  9. What is variance?
  10. What is correlation?
  11. What is a histogram?
  12. What is a box plot?
  13. What is a scatter plot used for?
  14. Difference between a population and a sample.
  15. What is a null hypothesis?
  16. What are outliers?
  17. What is data cleaning?
  18. What is feature scaling?
  19. What is normalization?
  20. What is a train-test split?
  21. What is supervised learning?
  22. What is unsupervised learning?
  23. What is a classification problem?
  24. What is a regression problem?
  25. What is overfitting?
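
As a quick reference for questions 7 to 9 above (mean, median, mode, standard deviation, variance), here is a minimal sketch using Python's standard `statistics` module. The sample values are hypothetical and chosen only so the results come out round; the population forms (`pvariance`, `pstdev`) are used, which is an assumption about which definition the question intends.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)            # arithmetic average
median = statistics.median(sample)        # middle of the sorted values
mode = statistics.mode(sample)            # most frequent value
variance = statistics.pvariance(sample)   # population variance
std_dev = statistics.pstdev(sample)       # population standard deviation

print(mean, median, mode, variance, std_dev)
```

For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.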

🟡 Intermediate Level

  1. What is underfitting?
  2. What is cross-validation?
  3. What is k-fold cross-validation?
  4. What is a confusion matrix?
  5. Explain precision and recall.
  6. What is the F1-score?
  7. What is ROC-AUC?
  8. What is logistic regression?
  9. How does k-NN work?
  10. What is Naive Bayes?
  11. What is a decision tree?
  12. Explain Random Forest.
  13. What is gradient boosting?
  14. What is XGBoost?
  15. What is feature engineering?
  16. What is one-hot encoding?
  17. What is label encoding?
  18. What is PCA (Principal Component Analysis)?
  19. What is multicollinearity?
  20. What is regularization?
  21. Difference between L1 and L2 regularization.
  22. What is a cost function?
  23. What is the bias-variance tradeoff?
  24. What is EDA (Exploratory Data Analysis)?
  25. What are missing value handling techniques?
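
For questions 4 to 6 above (confusion matrix, precision, recall, F1-score), the definitions can be sketched in plain Python. The helper name `precision_recall_f1` and the label lists are hypothetical, made up for illustration; in practice a library such as scikit-learn would usually be used instead.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```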

🔵 Advanced Level

  1. What is a neural network?
  2. What is backpropagation?
  3. Explain gradient descent.
  4. What is stochastic gradient descent?
  5. Explain the vanishing gradient problem.
  6. Difference between CNN and RNN.
  7. What is LSTM?
  8. What is the attention mechanism?
  9. What are word embeddings?
  10. Explain TF-IDF.
  11. What is time-series forecasting?
  12. What is ARIMA?
  13. What is stationarity in time series?
  14. What is autocorrelation?
  15. What is cross-correlation?
  16. What is anomaly detection?
  17. What is k-means clustering?
  18. What is hierarchical clustering?
  19. What is the silhouette score?
  20. What is DBSCAN?
  21. What is model drift?
  22. What is data leakage?
  23. What is MLOps?
  24. What is Docker in ML deployment?
  25. Explain how a REST API serves an ML model.
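
For question 3 above (gradient descent), a minimal one-dimensional sketch follows. The function name `gradient_descent`, the objective f(x) = (x - 3)^2, the learning rate, and the step count are all assumptions picked for illustration.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Hypothetical objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(minimum)  # close to 3, the minimizer of f
```

Stochastic gradient descent (question 4) follows the same loop but estimates the gradient from a random subset of the data at each step.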

🔴 Expert Level

  1. Explain the transformer architecture.
  2. What are large language models (LLMs)?
  3. What is reinforcement learning?
  4. Explain Q-learning.
  5. What is a Markov Decision Process (MDP)?
  6. What is approximate inference?
  7. Explain Bayesian networks.
  8. What is causal inference?
  9. What is counterfactual prediction?
  10. What is SHAP?
  11. What is LIME?
  12. Explain gradient boosting in detail.
  13. How does CatBoost handle categorical variables?
  14. What is a feature store?
  15. Explain Data Lake vs Data Warehouse.
  16. What is Delta Lake?
  17. What is distributed training?
  18. What is the parameter server architecture?
  19. What is a vector database?
  20. What is embedding dimensionality reduction?
  21. Explain the end-to-end ML lifecycle.
  22. What is model observability?
  23. What is a multi-armed bandit algorithm?
  24. Explain optimization with Adam and RMSProp.
  25. Explain real-time ML pipelines (Kafka + Spark + ML model).
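
For question 4 above (Q-learning), the core tabular update rule can be sketched in a few lines. The helper name `q_update`, the state and action names, and the values of `alpha` (learning rate) and `gamma` (discount factor) are hypothetical, chosen for illustration only.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value."""
    best_next = max(q[next_state].values())   # greedy value of the next state
    td_target = reward + gamma * best_next    # bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state Q-table.
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1")
print(new_value)
```

The update moves Q(s0, right) halfway toward the target 1.0 + 0.9 * 2.0, because `alpha` is 0.5.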

Related Topics