All Data Scientist Interview Flashcards
All 150 Data Scientist interview flashcards.
Easy (50)
- What is the difference between supervised and unsupervised learning?
- What is the difference between classification and regression?
- When is the median preferred over the mean as a measure of central tendency?
- What is overfitting?
- What is underfitting?
- Why do you split data into training and test sets?
- What is a feature in machine learning?
- What is a label (target) in supervised learning?
- Why does correlation not imply causation?
- What is an outlier?
- What does standard deviation measure?
- What shape does a normal distribution have?
- What does a p-value represent?
- What is a null hypothesis?
- What is mean imputation?
- What is one-hot encoding?
- What does accuracy measure in classification?
- What does SQL GROUP BY do?
- Name three common SQL aggregate functions.
- What does a histogram show?
- What is a scatter plot used to visualize?
- What is the difference between categorical and numeric variables?
- What is the difference between a sample and a population?
- What is bias in a model in simple terms?
- What does high variance mean for a model?
- What is training data?
- What is a Pandas DataFrame?
- What does mean absolute error (MAE) measure?
- Why scale features before some algorithms?
- Name two ways to handle missing values.
- What is label encoding?
- How do you find the median of a sorted list?
- What is the mode of a dataset?
- How is the range of a dataset calculated?
- What is the difference between a bar chart and a histogram?
- What is the difference between a dependent and independent variable?
- What is a cross-tabulation (contingency table)?
- What does the 90th percentile mean?
- What is data cleaning?
- What is binary classification?
- What is time series data?
- What is random sampling?
- What does boolean indexing do in pandas?
- What does min-max normalization do?
- What does it mean for a distribution to be right-skewed?
- What is data aggregation?
- What does a confusion matrix show?
- In a dataset used to predict house prices, which column is the target?
- What are descriptive statistics?
- Why does the data type of a column matter in analysis?
Medium (50)
- What is the bias-variance tradeoff?
- What is the difference between precision and recall?
- What does the F1 score balance?
- What does ROC AUC measure?
- What is k-fold cross-validation?
- What do L1 and L2 regularization do?
- What does gradient descent do?
- What happens if the learning rate is too high?
- Name two ways to handle class imbalance.
- What is feature engineering?
- What problem does multicollinearity cause in linear regression?
- How does a decision tree make predictions?
- Why does a random forest usually beat a single decision tree?
- How does gradient boosting build a model?
- What is the purpose of an A/B test?
- What does statistical significance tell you?
- What is the difference between a Type I and Type II error?
- What does a 95% confidence interval mean?
- What is the difference between standardization and normalization?
- Why use a separate validation set in addition to a test set?
- What is data leakage?
- What does logistic regression output?
- How do you compute recall from a confusion matrix?
- What does principal component analysis (PCA) do?
- How does k-means clustering work?
- What is a hyperparameter?
- What is grid search used for?
- When might you impute with the median instead of the mean?
- What does feature importance tell you in a tree model?
- Why use stratified sampling for a train/test split?
- How can you encode a categorical feature with thousands of categories?
- What does a Pearson correlation of -0.9 indicate?
- What is a loss function?
- What is the difference between sigmoid and softmax?
- Why establish a baseline model?
- Name two ways to reduce overfitting.
- What is bootstrapping in statistics?
- Name one method to detect outliers numerically.
- Which test checks association between two categorical variables?
- How can you reduce right skew in a feature?
- What is an ensemble model?
- When would you prioritize recall over precision?
- Why must preprocessing like scaling be fit inside cross-validation folds?
- Why can't you use random k-fold CV for time series?
- What does the central limit theorem state?
- Why perform feature selection?
- How does changing the classification threshold affect precision and recall?
- How does RMSE differ from MAE?
- What is the cold start problem in recommendation systems?
- What is concept/model drift?
Hard (50)
- What does increasing the L1 penalty (lambda) in lasso do to coefficients?
- What three components make up expected prediction error?
- How does XGBoost reduce overfitting beyond standard gradient boosting?
- What do SHAP values explain?
- What does it mean for a classifier to be well-calibrated?
- What is Platt scaling used for?
- Why correct for multiple comparisons?
- How does Benjamini-Hochberg differ from Bonferroni?
- What is a confounder and why does it bias causal estimates?
- What is a propensity score used for?
- What property must a valid instrumental variable satisfy?
- What assumption underlies difference-in-differences?
- How does a linear SVM's objective differ from logistic regression's?
- What does the kernel trick enable?
- What is the curse of dimensionality?
- What does the Expectation-Maximization algorithm do?
- How does a GMM differ from k-means?
- What is the core difference between Bayesian and frequentist inference?
- What is a posterior distribution?
- Why use MCMC methods like Gibbs sampling or Metropolis-Hastings?
- What causes vanishing gradients in deep networks?
- What does batch normalization do?
- How does dropout regularize a neural network?
- What does the attention mechanism compute?
- What is the time and memory complexity of vanilla self-attention as a function of sequence length n?
- What is an embedding?
- What does word2vec's skip-gram model learn?
- What does perplexity measure for a language model?
- Why can ROC AUC be misleading on highly imbalanced data?
- What is survivorship bias?
- What is Simpson's paradox?
- What is heteroscedasticity and why does it matter in regression?
- Why shrink the covariance matrix in high dimensions?
- How do tree ensembles capture feature interactions automatically?
- Why use out-of-fold predictions for stacking?
- How do you prevent leakage in target (mean) encoding?
- What is right-censored data in survival analysis?
- What does the Cox proportional hazards model assume?
- How does a multi-armed bandit differ from an A/B test?
- How does Thompson sampling choose an action?
- What does statistical power analysis determine?
- How does the percentile bootstrap construct a confidence interval?
- How does L2 regularization relate to Bayesian inference?
- Why can too many boosting rounds overfit?
- What does quantile regression estimate?
- What is the double descent phenomenon?
- What is the difference between covariate shift and label shift?
- What problem does a feature store solve?
- Name two practices that improve ML reproducibility.
- Why must the evaluation metric align with the business objective?
