You have built an SVM for a binary classification problem but the model is overfitting. What changes can you make to the kernel or hyperparameters to improve the model?

  • Change the kernel's color
  • Change to a simpler kernel or adjust the regularization parameter 'C'
  • Ignore overfitting
  • Increase the kernel's complexity
Overfitting can be mitigated by choosing a simpler kernel or adjusting the regularization parameter 'C', allowing for a better balance between bias and variance.
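The idea above can be sketched in code. This is an illustrative comparison (toy data, hypothetical hyperparameter values) showing how a large `C` with a flexible RBF kernel memorizes the training set, while a simpler kernel with a smaller `C` generalizes better:

```python
# Sketch: reducing SVM overfitting by simplifying the kernel and lowering 'C'.
# Dataset and hyperparameter values are illustrative, not prescriptive.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A flexible RBF kernel with a large C can nearly memorize the training set.
overfit = SVC(kernel="rbf", C=1000.0, gamma=1.0).fit(X_tr, y_tr)

# A simpler kernel (or smaller C) trades training accuracy for generalization.
regular = SVC(kernel="linear", C=0.1).fit(X_tr, y_tr)

print(overfit.score(X_tr, y_tr), overfit.score(X_te, y_te))
print(regular.score(X_tr, y_tr), regular.score(X_te, y_te))
```

Comparing the train/test gap of the two models makes the bias-variance trade-off concrete: the gap shrinks as the model is constrained.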

How does DBSCAN handle outliers compared to other clustering algorithms?

  • Considers them as part of existing clusters
  • Ignores them completely
  • Treats isolated points as noise
  • Treats them as individual clusters
DBSCAN handles outliers by treating isolated points as noise rather than forcing them into existing clusters or spinning them off as new clusters. This lets DBSCAN identify clusters of varying shapes and sizes while ignoring sparse or irrelevant points, making it more robust to noise and outliers than many other clustering methods.
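This behavior is easy to see with scikit-learn, where DBSCAN assigns the label `-1` to noise points. The data below is a hypothetical toy example: two tight clusters plus one far-away outlier.

```python
# Sketch: DBSCAN labels isolated points -1 ("noise") instead of forcing
# them into a cluster. Points and parameters are illustrative.
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # cluster A
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # cluster B
              [20.0, 20.0]])                        # isolated outlier

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
print(labels)  # the last point gets label -1 (noise)
```

K-Means, by contrast, would assign the outlier to whichever centroid is nearest, dragging that centroid toward it.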

What could be the potential problems if the assumptions of Simple Linear Regression are not met?

  • Model May Become Biased or Inefficient
  • Model May Overfit
  • Model Will Always Fail
  • No Impact on Model
If the assumptions of Simple Linear Regression are not met, the model may become biased or inefficient, yielding unreliable estimates. Violated assumptions (such as non-linearity, heteroscedasticity, or correlated errors) also undermine the validity of the associated statistical tests and confidence intervals.

Ridge regularization adds a ________ penalty to the loss function, which helps to constrain the coefficients.

  • L1
  • L1 and L2
  • L2
Ridge regularization adds an L2 penalty to the loss function, which helps to reduce the coefficients' magnitude without setting them to zero.
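The shrinking-without-zeroing behavior can be verified directly. This is a minimal sketch with synthetic data, comparing ordinary least squares against Ridge (in scikit-learn the penalty strength is the `alpha` parameter):

```python
# Sketch: the L2 penalty in Ridge shrinks the coefficient norm but does
# not set coefficients exactly to zero. Data and alpha are illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=50)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # alpha controls the L2 penalty strength

# The L2 penalty reduces the overall coefficient magnitude...
print(np.linalg.norm(ols.coef_), np.linalg.norm(ridge.coef_))
# ...but no coefficient is driven exactly to zero (unlike L1/Lasso).
print(ridge.coef_)
```

Lasso (L1) would instead zero out some coefficients entirely, which is the key contrast behind this question.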

Imagine you are working with a large dataset, and the Elbow Method is computationally expensive. What alternative methods might you consider for determining the number of clusters?

  • Double the number of centroids
  • Gap Statistic, Silhouette Method
  • Randomly choose the number of clusters
  • Use the Elbow Method with reduced data
Alternatives like the Gap Statistic and the Silhouette Method can determine the optimal number of clusters when the Elbow Method is impractical. Both quantify cluster cohesion and separation, giving a numeric criterion to maximize rather than requiring a visual judgment about where an "elbow" occurs.
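As a sketch of the Silhouette Method (toy data, illustrative range of k): fit K-Means for several candidate values of k and pick the one with the highest average silhouette score.

```python
# Sketch: choosing k by maximizing the silhouette score. The dataset and
# candidate range are illustrative.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # in [-1, 1]; higher is better

best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

For genuinely well-separated blobs, the silhouette score tends to peak near the true number of centers.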

Balancing the _________ in a training dataset is vital to ensure that the model does not become biased towards one particular outcome.

  • classes
  • features
  • models
  • parameters
Balancing the "classes" in a training dataset ensures that the model does not become biased towards one class, leading to a more accurate and fair representation of the data. This is especially crucial in classification tasks.
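Two common ways to balance classes are shown below: reweighting the loss, and oversampling the minority class. This is an illustrative sketch; the 90/10 split and the estimator choice are assumptions, not from the original text.

```python
# Sketch: handling class imbalance via class weights or oversampling.
# The imbalance ratio and model are illustrative.
from collections import Counter
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)
print(Counter(y))  # heavily skewed toward class 0

# Option 1: reweight the loss so minority-class errors cost more.
clf = LogisticRegression(class_weight="balanced").fit(X, y)

# Option 2: oversample the minority class until the counts match.
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, n_samples=int((y == 0).sum()),
                      random_state=0)
X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
print(Counter(y_bal))  # 1:1 after oversampling
```

Without either step, a classifier trained on the raw data can achieve high accuracy simply by favoring the majority class.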

Overfitting in Polynomial Regression can be visualized by a graph where the polynomial curve fits even the _________ in the training data.

  • accuracy
  • linearity
  • noise
  • stability
A graph showing overfitting in Polynomial Regression will exhibit the polynomial curve fitting even the noise in the training data, not just the underlying trend.
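The effect can be demonstrated numerically: on noisy data with a linear trend, a high-degree polynomial drives the training error down by bending through the noise. Degrees and noise level here are illustrative.

```python
# Sketch: a high-degree polynomial fits the noise, not just the trend.
# Data, degrees, and noise scale are illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(scale=0.2, size=x.size)  # linear trend + noise

low = np.polyfit(x, y, deg=1)   # captures the underlying trend
high = np.polyfit(x, y, deg=9)  # wiggles through individual noisy points

# Training error shrinks as degree grows, even though the extra wiggles
# only track noise — the numeric counterpart of the overfitting graph.
err_low = np.mean((np.polyval(low, x) - y) ** 2)
err_high = np.mean((np.polyval(high, x) - y) ** 2)
print(err_low, err_high)
```

Plotting both fits against the scatter would show the degree-9 curve oscillating between points while the degree-1 line tracks the trend.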

In a case where you have a dataset with numerous outliers, which clustering algorithm would you choose and why?

  • DBSCAN due to robustness to outliers
  • DBSCAN due to sensitivity to noise
  • K-Means due to robustness to noise
  • K-Means due to sensitivity to outliers
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) would be suitable since it's robust to outliers. It can identify dense clusters and leave outliers as unclassified, making it effective in such scenarios.

How can you detect whether a model is overfitting or underfitting the data?

  • By analyzing the training and validation errors
  • By increasing model complexity
  • By looking at the model's visualizations
  • By reducing model complexity
Detecting overfitting or underfitting can be done "by analyzing the training and validation errors." Overfitting shows high training accuracy but low validation accuracy, while underfitting shows poor performance on both.

Describe a scenario where you would use the F1-Score as the main performance metric, and explain why it would be suitable.

  • In a balanced dataset, to ensure model fairness
  • In a scenario where only false negatives are important
  • In an imbalanced dataset, to balance both false positives and false negatives
F1-Score is especially suitable for imbalanced datasets, as it balances both Precision and Recall, ensuring that the model does not bias towards the majority class. It gives an equal weight to false positives and false negatives, providing a more holistic evaluation of the model's performance.
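The failure mode that motivates F1 is easy to reproduce: on a 95/5 imbalanced label set, a model that always predicts the majority class scores 95% accuracy but an F1 of 0. The numbers below are illustrative.

```python
# Sketch: accuracy vs. F1 on imbalanced data. A majority-class predictor
# looks strong on accuracy yet never identifies a single positive.
from sklearn.metrics import accuracy_score, f1_score

y_true = [0] * 95 + [1] * 5   # 95% negatives, 5% positives
y_pred = [0] * 100            # always predict the majority class

print(accuracy_score(y_true, y_pred))               # 0.95 — looks great
print(f1_score(y_true, y_pred, zero_division=0))    # 0.0 — no positives found
```

Because F1 is the harmonic mean of Precision and Recall, it collapses to zero whenever the model fails on the minority class, exposing exactly what accuracy hides.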