How do ensemble methods like Random Forest and Gradient Boosting help in improving the model's performance?

  • By focusing on one strong model
  • By increasing overfitting
  • By leveraging multiple models to achieve better accuracy and robustness
  • By reducing computational complexity
Ensemble methods like Random Forest and Gradient Boosting combine the strengths of multiple models to achieve better accuracy and robustness. By leveraging a diverse set of models, they often outperform single models, especially on complex tasks, and reduce the risks of overfitting.

When applying the K-Nearest Neighbors algorithm, scaling the features is essential because it ensures that each feature contributes __________ to the distance computation.

  • differently
  • equally
  • maximally
  • minimally
Scaling features in KNN ensures that each feature contributes equally to the distance computation, preventing features with larger scales from dominating.

Describe how Machine Learning algorithms are implemented in sentiment analysis and customer feedback systems.

  • Drug Discovery
  • Image Recognition
  • Inventory Management
  • Text Classification
Sentiment analysis in customer feedback systems often involves text classification techniques. Machine learning algorithms like SVM, Naïve Bayes, or deep learning models can categorize customer comments into positive, negative, or neutral sentiment.

Why might it be important to consider interaction effects in a Multiple Linear Regression model?

  • It captures complex relationships
  • It increases accuracy independently
  • It reduces bias
  • It simplifies the model
Considering interaction effects is essential to capture complex relationships between variables that might not be apparent when considering each variable separately.

You are working on a dataset with an imbalanced class distribution. How would you apply Cross-Validation to ensure that each fold maintains the same class distribution?

  • Applying Cross-Validation without folding
  • Using Leave-One-Out Cross-Validation
  • Using k-fold Cross-Validation with random sampling
  • Using stratified k-fold Cross-Validation
Using stratified k-fold Cross-Validation ensures that each fold maintains the same class distribution by having the same proportion of each class as the entire dataset. It's a suitable choice for imbalanced class distribution, as it guarantees that each fold is a representative sample of the overall class proportions in the dataset.

What is Accuracy in the context of classification metrics?

  • False Positives / Total predictions
  • Total correct predictions / Total predictions
  • True Negatives / (True Negatives + False Positives)
  • True Positives / (True Positives + False Negatives)
Accuracy is the ratio of correct predictions to the total number of predictions. It gives an overall measure of how well the model is performing, but may not be suitable for imbalanced datasets where one class dominates.

The slope coefficient in Simple Linear Regression gives the _________ change in the dependent variable for a one-unit change in the independent variable.

  • Absolute
  • Constant
  • Incremental
  • Marginal
The slope coefficient in Simple Linear Regression gives the marginal change in the dependent variable for a one-unit change in the independent variable.

What type of learning algorithm utilizes labeled data to make predictions?

  • Reinforcement Learning
  • Semi-supervised Learning
  • Supervised Learning
  • Unsupervised Learning
Supervised Learning uses labeled data, where the output is known, to train the algorithm and make predictions.

The assumption that the relationship between the independent and dependent variable is linear in Simple Linear Regression is called the assumption of _________.

  • Homoscedasticity
  • Independence
  • Linearity
  • Normality
The assumption of linearity ensures that the relationship between the independent and dependent variable is linear, which is fundamental to Simple Linear Regression.

A set of input variables and corresponding target values used to evaluate a model's performance is referred to as a _________ set.

  • evaluation
  • testing
  • training
  • validation
A "testing" set consists of input variables and corresponding target values used to assess a machine learning model's performance on unseen data, allowing for a more robust evaluation.

What is the Confusion Matrix, and what information does it provide about a classification model?

  • A matrix representing classification errors
  • A matrix representing feature importance
  • A matrix representing model's coefficients
  • A matrix representing model's hyperparameters
The Confusion Matrix is a table that describes the performance of a classification model by categorizing predictions into True Positives, False Positives, True Negatives, and False Negatives. It gives detailed insight into where the model is making mistakes.

You've developed a Polynomial Regression model with a high-degree polynomial, and it's performing exceptionally well on the training data but poorly on the test data. What might be the issue, and how would you address it?

  • Add more features
  • Increase the degree
  • Reduce the degree or apply regularization
  • Use a different algorithm entirely
The issue likely is overfitting due to the high-degree polynomial. Reducing the degree or applying regularization techniques like Ridge or Lasso can help to reduce the model's complexity and improve generalization to unseen data.