Prime_Questions: #Data

#Data_Science

Define Data Science

Define Data Profiling

What is regression? Which models can you use to solve a regression problem?

What is linear regression? When do we use it?

What is gradient descent? How does it work?

What is the normal equation?

What is SGD (stochastic gradient descent)? What's the difference from the usual gradient descent?

What is overfitting?

What is K-fold cross-validation?

What is classification? Which models would you use to solve a classification problem?

What is logistic regression? When do we need to use it?

What is sigmoid? What does it do?

What is accuracy?

What is the confusion table? What are the cells in this table?

What is precision, recall, and F1-score?

What is the ROC curve? When to use it?

What is AUC (AU ROC)? When to use it?

What is the PR (precision-recall) curve?

What is the area under the PR curve? Is it a useful metric?

What is regularization? Why do we need it?

What is feature selection? Why do we need it?

What is random forest?

What are gradient boosting trees?

What is ReLU? How is it better than sigmoid or tanh?

What is dropout? Why is it useful? How does it work?

What is backpropagation? How does it work? Why do we need it?

What is Adam? What's the main difference between Adam and SGD?

What is model checkpointing?

What is transfer learning? How does it work?

What is object detection? Do you know any architectures for that?

What is object segmentation? Do you know any architectures for that?

What is bag of words? How can we use it for text classification?

What is TF-IDF? How is it useful for text classification?

What is unsupervised learning?

What is clustering? When do we need it?

What is the curse of dimensionality? Why do we care about it?

What is the ranking problem? Which models can you use to solve it?

What are precision and recall at k?

What is mean average precision at k?

What is a recommender system?

What is collaborative filtering?

What is the cold start problem?

What is a time series?

What methods for solving linear regression do you know?

What are MSE and RMSE?

What are decision trees?

What are the main parameters of the decision tree model?

What are the benefits of a single decision tree compared to more complex models?

What are the main parameters of the random forest model?

What are the potential problems with many large trees?

What are the main parameters in the gradient boosting model?

What are the problems with sigmoid as an activation function?

What are augmentations? Why do we need them? What kind of augmentations do you know? How to choose which augmentations to use?

What are the advantages and disadvantages of bag of words?

What are N-grams? How can we use them?

What are word embeddings? Why are they useful? Do you know Word2Vec?

What other clustering algorithms do you know?

What are good unsupervised baselines for text information retrieval?

What are good baselines when building a recommender system?

What are the problems with using trees for solving time series problems?

What do we do with categorical variables?

How do we check if a variable follows the normal distribution?

How do we choose K in K-fold cross-validation? What's your favorite K?

How do we evaluate classification models?

How do we select the right regularization parameters?

How do we interpret weights in linear models?

How do we train decision trees?

How do we handle categorical variables in decision trees?

How do we select the depth of the trees in random forest?

How do we know how many trees we need in random forest?

How do you approach tuning parameters in XGBoost or LightGBM?

How do you select the number of trees in the gradient boosting model?

How do we use SGD (stochastic gradient descent) for training a neural net?

How do we decide when to stop training a neural net?

How do you do an online evaluation of a new ranking algorithm?

How to validate your models?

How to interpret the AU ROC score?

How to set the learning rate?

How to select K for K-means?

Are CNNs resistant to rotations? What happens to the predictions of a CNN if an image is rotated?

Are there any differences between continuous and discrete variables when it comes to feature importance of gradient boosting models?

Can we have both L1 and L2 regularization components in a linear model?

Can we use L1 regularization for feature selection?

Can we use L2 regularization for feature selection?

Can we formulate the search problem as a classification problem? How?

Can you explain how cross-validation works?

Can you tell us how you approach the model training process?

Do we want to have a constant learning rate, or is it better to change it throughout training?

Do you know any other ways to get word embeddings?

Do you know how K-means works?

Do you know how DBScan works?

Do you know any dimensionality reduction techniques?

Do you know how to use gradient boosting trees for ranking?

Feature importance in gradient boosting trees: what are the possible options?

How can we know which features are more important for the decision tree model?

How can we use machine learning for text classification?

How can you use neural nets for text classification?

How can we use CNN for text classification?

How can we use machine learning for search?

How can we get training data for our ranking algorithms?

How can we use clicks data as the training data for ranking algorithms?

What does L2 regularization look like in a linear model?

How does a usual fully-connected feed-forward neural network work?

How does max pooling work? Are there other pooling techniques?

How is time series different from the usual regression problem?

What does L1 regularization look like in a linear model?

How large should N be for our bag of words when using N-grams?

How can we initialize the weights of a neural network?

How can we use neural nets for computer vision?

How can we incorporate implicit feedback (clicks, etc.) into our recommender systems?

How would you evaluate your ranking algorithms? Which offline metrics would you use?

If a weight for one variable is higher than for another, can we say that this variable is more important?

If there's a trend in our series, how can we remove it? And why would we want to do it?

If you have a sentence with multiple words, you may need to combine multiple word embeddings into one. How would you do it?

In which cases is AU PR better than AU ROC?

Is accuracy always a good metric?

Is feature selection important for linear models?

Is it easy to parallelize training of a random forest model? How can we do it?

Is it possible to parallelize training of a gradient boosting model? How to do it?

Is logistic regression a linear model? Why?

Possible approaches to solving the cold start problem?

Precision-recall trade-off

What happens to our linear regression model if we have three columns in our data: x, y, z, and z is a sum of x and y?

What happens to our linear regression model if the column z in the data is a sum of columns x and y and some random noise?

What happens when we have correlated features in our data?

What happens when the learning rate is too large? Too small?

What if we want to build a model for predicting prices? Are prices distributed normally? Do we need to do any pre-processing for prices?

What if, instead of finding the best split, we randomly select a few splits and just pick the best of them? Will it work?

What if we set all the weights of a neural network to 0?

What kind of regularization techniques are applicable to linear models?

What kinds of problems can neural nets solve?

What kind of CNN architectures for classification do you know?

What regularization techniques for neural nets do you know?

What's a convolutional layer?

What's pooling in CNN? Why do we need it?

What's singular value decomposition? How is it typically used for machine learning?

What's the normal distribution? Why do we care about it?

What's the effect of L2 regularization on the weights of a linear model?

What's the difference between L2 and L1 regularization?

What's the interpretation of the bias term in linear models?

What's the difference between random forest and gradient boosting?

What's the difference between grid search parameter tuning strategy and random search? When to use one or another?

What's the learning rate?

When do we need to perform feature normalization for linear models? When is it okay not to do it?

When would you use Adam and when SGD?

When would you choose K-means and when DBScan?

Which feature selection techniques do you know?

Which metrics for evaluating regression models do you know?

Which model would you use for text classification with bag of words features?

Which models do you know for solving time series problems?

Which optimization techniques for training neural nets do you know?

Which parameter tuning strategies (in general) do you know?

Which regularization techniques do you know?

Why do we need to split our data into three parts: train, validation, and test?

Why do we need one-hot encoding?

Why do we need randomization in random forest?

Why do we need activation functions?

Why do we actually need convolutions? Can't we use fully-connected layers for that?

Would you prefer gradient boosting trees model or logistic regression when doing text classification with bag of words?

Would you prefer gradient boosting trees model or logistic regression when doing text classification with embeddings?

You have a series with only one variable "y" measured at time t. How do you predict "y" at time t+1? Which approaches would you use?

You have a series with a variable "y" and a set of features. How do you predict "y" at t+1? Which approaches would you use?

Describe the pandas operations commonly used in data science

Define the structure of Artificial Neural Networks

Define backpropagation and how it works

Define lambda functions with an example

Define Power Analysis in R

What are the types of machine learning?

What is supervised learning in machine learning?

What is unsupervised learning in machine learning?

What are the commonly used Python packages?

What are the commonly used R packages?

What is precision?

What is recall?

What is a normal distribution?

What is overfitting?

What is underfitting?

What is a univariate analysis?

What is the Pearson correlation?

What is the common perception about visualization?

What are the time series algorithms?

What is the basic responsibility of a Data Scientist?

Why does SAS stand out as the best among data analytics tools?

What is RUN-Group processing?

What is the right way to validate a SAS program?

What is meant by precision and recall?

What is deep learning?

What is the F1 score?

What is the difference between machine learning and data mining?

What are confounding variables?

What are the types of biases that can occur during sampling?

What is an alias in an import statement? Why is it used?

What is a nonparametric test used for?

What are the pros and cons of the decision tree algorithm?

What are the pros and cons of the Naive Bayes algorithm?

What are the types of skewness?

What is skewed data?

What is the skewness of this data? 27 ; 28 ; 30 ; 32 ; 34 ; 38 ; 41 ; 42 ; 43 ; 44 ; 46 ; 53 ; 56 ; 62

What is an outlier?

What are the applications of data science?

What are the steps in exploratory data analysis?

What are the types of data available in enterprises?

What are the various types of analysis, depending on the type of data?

What is the difference between primary data and secondary data?

What is the difference between qualitative and quantitative data?

What is a histogram?

What are the common measures of central tendency?

What are quartiles?

What are the commonly used error metrics in regression tasks?

What are the commonly used error metrics for classification tasks?

What is it called when there is more than one explanatory variable in a regression task?

What are residuals in a regression task?

What are the main classifications in Machine learning?

What are the main types of supervised learning tasks?

What is the R-squared value?

What are some common ways of imputation?

What is the difference between a Series and a list?

What parameter is used to update the data without explicitly assigning data to a variable?

What is the difference between a dictionary and a set?

What is the function to create a train-test split?

What is pickling?

What is unpickling?

What are the most common web frameworks of Python?

What are lambda functions in Python, and how are they different from def (defining functions) in Python?

What is your opinion on our current data process?

What do you understand by the term normal distribution?

What is data visualization?

What Is a System?

What are the different benefits of language?

What are the two main elements of the hottest architecture?

What is logistic regression?

What are the different features of the machine learning process?

What is a normal distribution?

What is linear regression?

What is interpolation and extrapolation?

What is Power Analysis?

What is K-means? How can you select K for K-means?

What is a recommender system?

What is TF-IDF vectorization?

What is the Cluster Model?

What is the regulatory model?

What are eigenvectors and eigenvalues?

What is PyLab?

What is PEP8?

What is monkey patching in Python?

What is a list comprehension?

What is the output of the code below?

What are the basic assumptions for linear regression?

What is the benefit of dimension reduction before applying an SVM?

What is Data Science?

What are the skills required in Data Science?

What is Machine Learning?

What is the difference between Traditional Programming and Machine Learning?

What are the types of Machine Learning Algorithms?

What are the main components of a data science project?

What percentage of time is usually required for each component in data science projects?

What is Artificial Intelligence?

What is Deep Learning (DL)?

What is Backpropagation?

What is Stochastic Gradient Descent?

What is the difference between iloc and loc?

What package is used to import data from the Oracle server?

What does the review process do?

What do Dummies do?

What is the curtain?

What is the removal of data backward in advance?

What is Unequal Data?

What is standardization?

What is pandas in data science, and for which data is it suitable?

What is a p-value?

What is Data science? What is the role of Machine Learning in Data science?

What do you mean by Type I error and Type II error in hypothesis testing?

What is Logistic regression? How will you evaluate your Logistic regression model?

What is the difference between ANOVA and t-test?

What is the difference between Overfitting and Underfitting?

What are the steps involved in an analytics project?

What are the main packages used in Python for data science and machine learning?

What are the assumptions required for linear regression?

What is the difference between Covariance and correlation?

What is Gradient descent?

What is Regularization?

What do you mean by Imbalanced Classes?

What is the bias-variance trade-off?

What are the evaluation metrics for classification algorithms?

What are Ensemble, Bagging and Boosting?

What skills are required to learn data science with Python?

What are the types of joins?

What types of database requests does Flask allow?

What are the various methods for sequential supervised learning?

In which areas is pattern recognition used?

What are the supported data types in Python?

What is Flask? Is Flask equivalent to the MVC model?

What are the types of Bias?

What are the Different Data Structures in R?

Name the commonly used algorithms.

Name a few methods for missing value treatment.

Name some Classification Algorithms.

Name some Python libraries used in machine learning.

Name some supervised and unsupervised deep learning algorithms.

Name some Python libraries used in deep learning.

Write code to sort a DataFrame in Python in descending order.

Write code using pandas.

Write the syntax for creating a string variable.

List the types of machine learning techniques.

Write a query that returns the Details of each department and a count of the number of Students in each:

Write the syntax for accessing a module written in Python from C.

List the components of relational evaluation techniques.

List the paradigms of ensemble methods.

Write a program to convert uppercase letters to lowercase.

Explain how to capture the correlation between a continuous and a categorical variable.

Explain what regularization is and why it is useful.

Explain the use of decorators.

Explain supervised and unsupervised machine learning

Explain Confusion Matrix

Explain Normal Distribution

Explain Covariance and Correlation in Data Science

Explain Linear Regression

Explain Collaborative Filtering

Explain Python Dictionary

Explain Auto Encoder

Explain Rmarkdown

Explain K-means clustering.

Explain why data cleaning is important in analysis.

Explain the split(), sub(), and subn() methods of the "re" module.

Explain the use of the // (floor division) operator in Python.

Explain sequence learning.

Why should you use NumPy arrays instead of nested Python lists?

Why is an import statement required in Python?

Why is data important in data analysis?

Why is data analysis an important part of the analysis?

Why is a useful metric?

Why is a Python tuple preferred over a Python list?

Which metric acts like accuracy in classification problem statement?

Which Python library is used for data visualization?

Which function is used to get descriptive statistics of a dataframe?

Which function can be used to filter a DataFrame?

Which language is suitable for text analysis? R or Python?

Which tool should you use to find the bugs?

Which Python library is used for machine learning?

Which symbol is used to add a comment in R language?

How, and by what methods, can data visualizations be used effectively?

How to understand the problems faced during data analysis?

How do you choose the right chart when creating a visualization?

How can I achieve accuracy in the first model that I built?

How do I enhance a SAS analyst?

How is the F1 score used?

How can you randomize the items of a list in place in Python?

How to get indices of N maximum values in a NumPy array?

How do you make 3D plots/visualizations using NumPy/SciPy?

How to access a specific script inside a module?

How to create a series with letters as index?

How to convert n number of series to a dataframe?

How to select a section of a dataframe?

How are exceptions handled in Python?

How do you differentiate between KNN and K-means clustering?

Why is data cleaning an important part of the process?

How do Data Scientists use statistics?

How is machine learning used in real-world scenarios?

How does data modeling differ from database design?

How to sort items of the list in Python?

How do you check whether a pandas DataFrame is empty?

How to Assign Code to the List?

How to read an Excel file in pandas?

How often should you update an algorithm?

How will you define supervised and unsupervised learning?

How will you evaluate your regression model based on R2, Adjusted R2 and tolerance?

How will you define the number of clusters in a K-means clustering algorithm?

How is kNN different from K-means clustering?

How is gradient descent helpful in ML?

How would you create an empty NumPy array?

How would you make a Python script executable on Unix?

How will you reverse a list?

How will you remove the last object from a list?

How to find the best approximate solution to the knapsack problem in a given time using the best algorithm?

Where to seek help in case of discrepancies in Tableau?

Where is the Naive Bayes algorithm mostly used for classification?

Who is a Data Scientist?

Difference between supervised and unsupervised machine learning?

Difference between Machine learning and Data Mining?

Difference between an Array and a Linked list?

Difference between "long" and "wide" format data?

Difference between univariate, bivariate and multivariate analysis?

Difference between Supervised and unsupervised?

In R, how will you load a .csv file?

In R, describe the usage of the next statement.

Advantages of Tableau Prep?

Write an algorithm for sorting a numeric dataset in Python.

Are the aliases used for a module fixed/static?

Can Random forest be used for classification and regression?

Can the values in a tuple be replaced?

Can you briefly describe the scientific method?

Can you cite some examples where a false positive is more important than a false negative?

Can you cite some examples where a false negative is more important than a false positive?

Can you cite some examples where both false positives and false negatives are equally important?

Can you explain the difference between a validation set and a test set?

Can you write the formula to calculate R-squared?

Can you provide sample code for creating a data frame in order to perform slicing in pandas?

Can you explain a few things about ShinyR?

Can you compare R and Python in terms of their usefulness for data science?

Can you write R code?

Cite an example where both false negatives and false positives have equal importance.

Compare SAS, R, and Python programming?

Data science career

Data Science job areas

What is BY-group processing?

Describe univariate, bivariate and multivariate analysis?

Describe the feature selection approaches that are used to pick the correct variables.

Describe batch and epoch in deep learning.

Describe LSTM

Differences between overfitting and underfitting

Differentiate between univariate, bivariate, and multivariate analysis.

Do you know any SAS functions and Call Routines?

Can you explain the term botnet?

Do you know the various components of the grammar of graphics in R?

Explain the Naïve Bayes algorithm.

For a categorical variable, what is the process to check frequency distribution?

Give examples of supervised and unsupervised ML algorithms.

Give me two important tasks in pandas.

Give me the steps for an analytics project

Give an example of optimizing Python code.

Give us a pictorial representation of the Decision Tree algorithm in Data Science

Give an example of unzipping.

If you are given employees' first and last names, which data type in Python would you use to store them?

Import of a flat file / CSV in pandas

Is multiprocessing possible in Python?

Is it possible to merge two (2) data frames in R? If yes, how is that done?

List out the different classification algorithms

List out the Kernel functions available in SVM

List out few functions that are available in dplyr package

List out the Supervised Learning Functions.

Given a list of tweets, find the 10 most used hashtags.

Mention the characteristics of a symmetric data distribution.

Mention a few important Python skills for data analytics.

Mention the functions that are used to copy objects in Python

Mention Any Five Algorithms of Machine Learning.

Mention the different types of sequence learning processes.

Now companies are heavily investing their money and time to make the dashboards. Why?

Write a one-liner program that counts the number of capital letters in a file.

Provide the Life cycle of Data Science

Provide the technical concepts handled in Supervised, Unsupervised and Reinforcement Learning

Provide the different types of Biases that occur during Sampling and give a one-line definition for each type.

Provide the various layers of CNN

Provide the Machine Learning libraries along with their benefits

Provide the basic steps to create a new R6 class

Provide an example for False Positive in Data Science

Provide the various Deep Learning frameworks

Scope and Applications of Statistics

The Difference Between Data Modeling and Database Design?

The performance of the K modular system?

What's the difference between a regression and a classification problem?

You find that the data is stored in HDFS format and want to know how it is structured. Which command should you use to identify the names of the HDFS keys?

You are given a dataset and you have built a decision tree model on top of it. You got an accuracy of 98%. Why shouldn't you be happy with your model performance?

Data Ingestion: Connect and ingest data from APIs, DBs, and logs.
Data Pipeline: Build and maintain ETL/ELT workflows.
Data Cleaning: Handle missing data and outliers.
Data Transformation: Convert raw data using SQL or Spark.
Data Storage: Manage data lakes and warehouses.
Batch Processing: Utilize Spark, Hadoop, Hive for processing.
Stream Processing: Real-time data with Kafka, Flink.
Database Design: Create schemas and indexes.
SQL Tuning: Optimize queries for performance.
NoSQL Management: Use MongoDB, Cassandra.
Data Security: Implement encryption and masking.
Data Governance: Ensure data compliance.
Cloud Integration: Use AWS, GCP, Azure for storage/processing.
Orchestration: Manage workflows with Airflow.
Data Monitoring: Implement data quality checks.
Logging: Track and log data flows.
Data Versioning: Manage data changes with tools like DVC.
IaC: Automate infrastructure setup with Terraform.
Data Modeling: Design fact/dimension tables.
Replication: Ensure data availability.
Backup/Recovery: Plan for data recovery.
Performance Tuning: Optimize data pipelines.
Data Lake Management: Organize unstructured data.
Data Lineage: Track data movement.
Data Archival: Implement cold storage.
Version Control: Manage code with Git.
Auditing: Track data access.
Containerization: Use Docker/Kubernetes for scalability.
Event-Driven Pipelines: Implement event-based systems.
Partitioning: Manage faster reads/writes.
Monitoring/Alerting: Set up tools like Prometheus.
API Development: Build data APIs.
ML Integration: Help with model deployment.
Data Cataloging: Document datasets with tools.
Spark Tuning: Optimize Spark jobs.
Distributed Computing: Handle parallel processing.
Scalability: Design scalable solutions.
DevOps: Collaborate for CI/CD pipelines.
Sharding: Distribute data for load balancing.
Metadata Management: Organize metadata.
Concurrency Control: Ensure data consistency.
Load Balancing: Optimize for high throughput.
Data Compression: Use Parquet/ORC for efficiency.
Query Caching: Cache frequent queries.
Data Discovery: Make data easy to find.
Business Collaboration: Translate requirements into data solutions.
Scheduling Pipelines: Automate pipeline runs.
Cost Optimization: Optimize cloud performance/cost.
Data Flow Understanding: Manage the full data lifecycle.
Documentation: Keep pipelines/processes well-documented.

01 January 2021
