The ___________ test in Logistic Regression can be used to assess if the Logit link function is the correct specification for the model.

  • AIC
  • Hosmer-Lemeshow
  • Likelihood-ratio
  • Link
The Link test in Logistic Regression can be used to determine whether the Logit link function is the correct specification for the model: the outcome is refit on the model's linear predictor and its square, and a statistically significant squared term signals misspecification (of the link function or of the linear predictor itself).
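
As a quick illustration, the link test can be carried out by hand with statsmodels; this is only a sketch, and the synthetic data, variable names, and model setup below are assumptions rather than part of the question:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic binary data for illustration only.
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(500, 2)))
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ np.array([0.5, 1.0, -1.0])))))

# Step 1: fit the original logistic model and compute its linear predictor.
fit = sm.Logit(y, X).fit(disp=0)
xb = X @ fit.params                      # linear predictor on the logit scale

# Step 2: refit the outcome on the linear predictor and its square (the link test).
Z = sm.add_constant(np.column_stack([xb, xb ** 2]))
link_fit = sm.Logit(y, Z).fit(disp=0)

# A significant squared term suggests the logit specification is questionable.
print("p-value of squared linear predictor:", link_fit.pvalues[2])
```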

Which field utilizes Machine Learning to recommend products or media to consumers based on their past behavior?

  • Autonomous Driving
  • Education
  • Healthcare
  • Recommender Systems
Recommender Systems use machine learning algorithms to suggest products, media, or content to users based on their past interactions and behavior, creating personalized experiences.
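
To make the idea concrete, here is a minimal item-based collaborative-filtering sketch in plain NumPy; the toy ratings matrix and the similarity-weighted scoring are illustrative assumptions, not a production recommender:

```python
import numpy as np

# Toy user-item ratings matrix (rows = users, columns = items); 0 = not rated.
# The numbers here are entirely made up for illustration.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
], dtype=float)

# Item-item cosine similarity computed on the rating columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

# Score unseen items for user 0 as a similarity-weighted sum of their past ratings.
user = R[0]
scores = sim @ user
scores[user > 0] = -np.inf        # don't re-recommend items the user already rated
print("Recommend item:", int(np.argmax(scores)))
```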

You built a regression model and it's yielding a very low R-Squared value. What could be the reason and how would you improve it?

  • Data noise; Apply data cleaning
  • Incorrect model; Change the model
  • Poorly fitted; Improve the model fit
  • Too many features; Reduce features
A low R-Squared value might indicate that the model doesn't fit the data well. This could be due to an incorrect choice of model, underfitting (for example, a linear model applied to a clearly nonlinear relationship), noisy data, or a genuinely weak relationship between the predictors and the target. Improving the fit by selecting a more appropriate algorithm, engineering better features, or tuning hyperparameters can address the problem.
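
For example, the hedged sketch below (synthetic data and an arbitrary quadratic relationship, chosen purely for illustration) shows a low R-Squared caused by underfitting, and how simple feature engineering raises it:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic nonlinear data: a plain linear model underfits it.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=300)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear = LinearRegression().fit(X_tr, y_tr)
print("Linear R^2:   ", linear.score(X_te, y_te))     # near zero: underfit

# Feature engineering (adding a quadratic term) fixes the specification.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X_tr, y_tr)
print("Quadratic R^2:", poly.score(X_te, y_te))       # much higher
```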

What is Bootstrapping, and how does it differ from Cross-Validation?

  • A method for resampling data with replacement
  • A technique for training ensemble models
  • A technique to reduce bias
  • A type of Cross-Validation
Bootstrapping is a method for resampling data with replacement, used to estimate the sampling distribution of a statistic (for example, its standard error or a confidence interval) from a single sample. It differs from Cross-Validation, where the data is split without replacement into training and validation folds. Bootstrapping is mainly about estimating the properties of an estimator, while Cross-Validation assesses a model's out-of-sample performance.
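
A small sketch of the difference, using synthetic data; the statistic, sample size, and model below are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
rng = np.random.default_rng(0)

# Bootstrapping: resample WITH replacement to estimate the variability of a
# statistic (here, the standard error of the mean of y).
boot_means = [rng.choice(y, size=len(y), replace=True).mean() for _ in range(1000)]
print("Bootstrap SE of mean(y):", np.std(boot_means))

# Cross-validation: split WITHOUT replacement to estimate model performance.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("5-fold CV R^2:", scores.mean())
```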

What are the main challenges in training a Machine Learning model with imbalanced datasets?

  • Computational complexity
  • Dimensionality reduction
  • Lack of suitable algorithms
  • Overfitting to the majority class
Training on imbalanced datasets can lead to models that are biased towards the majority class, since they have seen far more examples of it. This typically makes the model perform poorly on the minority class, and overall accuracy becomes a misleading metric, because always predicting the majority class already scores highly.
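
One common mitigation is to reweight the classes. The sketch below uses a synthetic 95/5 dataset and scikit-learn's class_weight="balanced" as one of several possible remedies, comparing a plain and a reweighted model on per-class metrics:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic 95/5 imbalanced binary problem.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Plain model: tends to favor the majority class.
plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Reweighted model: penalizes mistakes on the minority class more heavily.
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_tr, y_tr)

# Compare per-class precision/recall rather than overall accuracy.
print(classification_report(y_te, plain.predict(X_te)))
print(classification_report(y_te, weighted.predict(X_te)))
```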

While estimating the coefficients in Simple Linear Regression, you find that one of the assumptions is not met. How would this affect the reliability of the predictions?

  • Increase Accuracy
  • Make Predictions More Reliable
  • Make Predictions Unreliable
  • No Effect
If the assumptions of Simple Linear Regression (linearity, independent errors, constant error variance, and approximately normal errors) are not met, the reliability of the predictions may be compromised: coefficient estimates can become biased or inefficient, and standard errors, confidence intervals, and prediction intervals may no longer be valid.
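
Assumption violations can often be detected from the residuals. The sketch below uses synthetic heteroscedastic data, with Breusch-Pagan and Durbin-Watson chosen as two of many possible diagnostics, purely to illustrate the idea:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

# Synthetic data with heteroscedastic errors (violates constant variance).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + rng.normal(scale=0.5 * x)     # error variance grows with x

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Breusch-Pagan: a small p-value signals heteroscedasticity (unreliable std errors).
bp_stat, bp_pvalue, _, _ = het_breuschpagan(fit.resid, X)
print("Breusch-Pagan p-value:", bp_pvalue)

# Durbin-Watson: values far from 2 suggest autocorrelated residuals.
print("Durbin-Watson:", durbin_watson(fit.resid))
```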

You have a dataset with many correlated features, and you decide to use PCA. How would you determine which Eigenvectors to keep?

  • By choosing the eigenvectors with the highest eigenvalues
  • By randomly selecting eigenvectors
  • By selecting the eigenvectors with negative eigenvalues
  • By using all eigenvectors without exception
You would keep the eigenvectors corresponding to the highest eigenvalues, as they explain the most variance in the data. The smaller an eigenvalue, the less variance its eigenvector (principal component) captures, so low-eigenvalue components can usually be dropped with little loss of information.
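
A minimal sketch of that selection rule, assuming a small synthetic dataset and a 95% explained-variance cutoff (both arbitrary choices for illustration):

```python
import numpy as np

# Toy data with correlated features; rows are samples, columns are features.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([base[:, 0], 0.9 * base[:, 0] + 0.1 * base[:, 1], base[:, 1]])
X = X - X.mean(axis=0)                      # center before PCA

# Eigendecomposition of the covariance matrix.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]           # sort by descending eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep enough eigenvectors to explain, say, 95% of the variance.
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.95) + 1)
X_reduced = X @ eigvecs[:, :k]
print("Explained variance ratios:", eigvals / eigvals.sum(), "| components kept:", k)
```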

You need to improve the performance of a weak learner. Which boosting algorithm would you select, and why?

  • AdaBoost
  • Any boosting algorithm will suffice
  • Gradient Boosting without considering the loss function
  • Random Boosting
AdaBoost is a boosting algorithm designed to improve the performance of weak learners. By adjusting the weights of misclassified instances and focusing on them in subsequent models, AdaBoost iteratively corrects errors and enhances the overall model's performance.
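
A short sketch comparing a single decision stump with an AdaBoost ensemble of stumps on synthetic data; scikit-learn's default base learner for AdaBoostClassifier is a depth-1 tree, and the dataset and settings here are illustrative only:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# A single decision stump is a classic weak learner.
stump = DecisionTreeClassifier(max_depth=1)
print("Stump accuracy:   ", cross_val_score(stump, X, y, cv=5).mean())

# AdaBoost reweights misclassified samples and combines many stumps.
boosted = AdaBoostClassifier(n_estimators=200, random_state=0)
print("AdaBoost accuracy:", cross_val_score(boosted, X, y, cv=5).mean())
```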

You are using KNN for a regression problem. What are the special considerations in selecting K and the distance metric, and how would you evaluate the model's performance?

  • Choose K and metric considering data characteristics, evaluate using regression metrics
  • Choose fixed K and Manhattan metric, evaluate using recall
  • Choose large K and any metric, evaluate using accuracy
  • Choose small K and Euclidean metric, evaluate using precision
For KNN regression, choose K and the distance metric with the data's characteristics in mind (noise level, dimensionality, and feature scales), and evaluate the model with regression metrics such as RMSE or MAE rather than classification metrics like accuracy or precision.
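
A hedged sketch of that workflow, assuming scaled features, a small grid over K and two distance metrics, and RMSE as the scoring function (all illustrative choices):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=500, n_features=5, noise=15.0, random_state=0)

# Scale features first: distance-based methods are sensitive to feature scales.
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())

# Search over K and the distance metric, scoring with a regression metric (RMSE).
grid = GridSearchCV(
    pipe,
    param_grid={
        "kneighborsregressor__n_neighbors": [3, 5, 11, 21],
        "kneighborsregressor__metric": ["euclidean", "manhattan"],
    },
    scoring="neg_root_mean_squared_error",
    cv=5,
)
grid.fit(X, y)
print("Best params:", grid.best_params_, "| CV RMSE:", -grid.best_score_)
```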

A dataset contains both categorical and numerical features. Which ensemble method might be suitable, and what preprocessing might be required?

  • Random Forest with no preprocessing
  • Random Forest with normalization
  • Random Forest with one-hot encoding
  • Random Forest with scaling
Random Forest is an ensemble method suitable for handling both categorical and numerical features. For categorical features, one-hot encoding might be required to convert them into a numerical format that the algorithm can process.
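
A minimal sketch of such a pipeline, assuming a made-up mixed-type dataset and one-hot encoding only for the categorical column (tree-based models generally don't need scaling for the numeric ones):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny made-up dataset with one categorical and two numerical features.
df = pd.DataFrame({
    "city":   ["NY", "SF", "NY", "LA", "SF", "LA"] * 20,
    "age":    [25, 32, 47, 51, 29, 38] * 20,
    "income": [40_000, 85_000, 62_000, 90_000, 55_000, 72_000] * 20,
    "bought": [0, 1, 0, 1, 1, 0] * 20,
})
X, y = df.drop(columns="bought"), df["bought"]

# One-hot encode the categorical column; pass the numeric columns through as-is.
pre = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["city"])],
    remainder="passthrough",
)
model = Pipeline([("pre", pre), ("rf", RandomForestClassifier(random_state=0))])
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean())
```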