The ________ technique in classification helps in enhancing the model's ability to generalize by using different subsets of data during training.

  • Clustering
  • Cross-validation
  • Feature extraction
  • Overfitting
Cross-validation partitions the dataset into subsets (folds). The model is trained on all but one fold and evaluated on the held-out fold, and this is repeated so that each fold serves as the test set once. Averaging the resulting scores gives an estimate of how well the model generalizes to unseen data.
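
As a rough illustration of the fold rotation, here is a minimal sketch in plain Python. The helper names `k_fold_indices` and `cross_validation_splits` are made up for this example, and the folds are contiguous; in practice the data is usually shuffled first and a library routine is used instead.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validation_splits(n_samples, k):
    """Yield (train_indices, test_indices), holding out each fold once."""
    folds = k_fold_indices(n_samples, k)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test


splits = list(cross_validation_splits(n_samples=10, k=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Every sample appears in exactly one test fold, so each data point is used for evaluation once and for training in the remaining rounds.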

Consider a situation where you're applying DBSCAN to a high-dimensional dataset. What challenges might you face, and how could you address them?

  • All of the above
  • Difficulty in visualizing; Reduce dimensionality
  • High computational cost; Optimize the algorithm
  • Risk of overfitting; Increase MinPts
High-dimensional data poses several challenges for DBSCAN: a risk of overfitting (noise being grouped into spurious clusters), difficulty visualizing the result, and high computational cost. Increasing MinPts makes the core-point condition stricter and so helps prevent overfitting, dimensionality reduction techniques such as PCA make the data easier to visualize, and optimizing the algorithm reduces the computational demands.
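
One reason distance-based methods such as DBSCAN struggle in high dimensions is that pairwise distances tend to look alike, which makes a neighborhood radius hard to choose. The sketch below illustrates this on uniformly random points; the data, the function name `relative_contrast`, and the chosen sizes are assumptions for the example and do not come from any particular dataset.

```python
import math
import random


def relative_contrast(n_points, n_dims, rng):
    """(max - min) / min of distances from one random query to random points."""
    query = [rng.random() for _ in range(n_dims)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


rng = random.Random(0)  # fixed seed so the run is repeatable
contrast_low = relative_contrast(n_points=200, n_dims=2, rng=rng)
contrast_high = relative_contrast(n_points=200, n_dims=500, rng=rng)
print(f"2 dimensions:   {contrast_low:.2f}")
print(f"500 dimensions: {contrast_high:.2f}")
```

The contrast between the nearest and farthest point is much smaller in 500 dimensions than in 2, which is one argument for reducing dimensionality before clustering.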

Explain the process of selecting the number of principal components in PCA.

  • By choosing an arbitrary number
  • By selecting all eigenvectors
  • By using only the first eigenvector
  • By using the elbow method and the cumulative explained variance
The number of principal components in PCA can be selected by plotting the cumulative explained variance against the number of components and looking for an "elbow", the point beyond which adding more components does not significantly increase the explained variance.
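
A common variant of this rule keeps the smallest number of components that reaches a chosen variance threshold. The sketch below assumes the eigenvalues of the covariance matrix are already available; the eigenvalues, the 90% threshold, and the function name `components_for_variance` are hypothetical choices for illustration.

```python
from itertools import accumulate


def components_for_variance(eigenvalues, threshold):
    """Smallest number of components whose cumulative explained variance
    ratio reaches the threshold. Eigenvalues are sorted descending first."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = [running / total for running in accumulate(ordered)]
    for k, ratio in enumerate(cumulative, start=1):
        if ratio >= threshold:
            return k, cumulative
    return len(ordered), cumulative


# Hypothetical eigenvalues of a covariance matrix, for illustration only.
eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
k, cumulative = components_for_variance(eigenvalues, threshold=0.90)
print(cumulative)  # [0.5, 0.75, 0.875, 0.9375, 1.0]
print(k)           # 4
```

Here the first component alone explains half the variance, and the gain per added component shrinks quickly after that, which is the flattening an elbow plot makes visible.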

The regularization parameter 'C' in SVM controls the trade-off between maximizing the margin and minimizing the _________.

  • Kernel size
  • Margin
  • Misclassification
  • Variance
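The 'C' parameter controls the trade-off between maximizing the margin and minimizing misclassification. A large C penalizes misclassified training points heavily, which tends to give a narrower margin; a small C tolerates more misclassification in exchange for a wider margin.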
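
To make the trade-off concrete, the sketch below evaluates the standard soft-margin objective, 0.5 * ||w||^2 + C * (sum of hinge losses), for a fixed linear separator. The toy points, the weights, and the function name `soft_margin_objective` are assumptions for illustration; no training is performed.

```python
def soft_margin_objective(w, b, points, labels, C):
    """0.5 * ||w||^2 + C * sum of hinge losses max(0, 1 - y * (w.x + b))."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    hinge_total = 0.0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hinge_total += max(0.0, 1.0 - y * score)
    return margin_term + C * hinge_total


# Hypothetical 2-D toy data: the second point lies inside the margin
# and the last point is on the wrong side of the boundary.
points = [(2.0, 0.0), (0.5, 0.0), (-1.0, 0.0), (0.5, 1.0)]
labels = [1, 1, -1, -1]
w, b = (1.0, 0.0), 0.0

small_c = soft_margin_objective(w, b, points, labels, C=1.0)
large_c = soft_margin_objective(w, b, points, labels, C=10.0)
print(small_c, large_c)  # 2.5 20.5
```

The margin term is the same in both cases (0.5); only the weight on the violations changes, so with a larger C an optimizer has more incentive to reduce violations even at the cost of a narrower margin.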

How does clustering differ from classification?

  • Clustering and Classification are the same
  • Clustering is supervised; Classification is unsupervised
  • Clustering is unsupervised; Classification is supervised
  • Clustering uses regression
Clustering is an unsupervised learning technique that groups similar data points, whereas Classification is a supervised learning technique that assigns predefined labels to instances.
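
The difference shows up in what each method receives as input. In the minimal sketch below, a one-dimensional k-means groups values without any labels, while a nearest-centroid classifier learns from labels supplied in advance. The data, label names, and helper functions are hypothetical.

```python
def nearest(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda i: abs(value - centers[i]))


def kmeans_1d(values, centers, n_iters=10):
    """Unsupervised: groups values using only their positions, no labels."""
    centers = list(centers)
    for _ in range(n_iters):
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest(v, centers)].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return [nearest(v, centers) for v in values]


def fit_centroid_classifier(values, labels):
    """Supervised: learns one centroid per predefined label."""
    by_label = {}
    for v, label in zip(values, labels):
        by_label.setdefault(label, []).append(v)
    return {label: sum(vs) / len(vs) for label, vs in by_label.items()}


def predict(centroids, value):
    """Return the label whose centroid is closest to value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
cluster_ids = kmeans_1d(values, centers=[0.0, 6.0])
centroids = fit_centroid_classifier(values, ["small"] * 3 + ["large"] * 3)
print(cluster_ids)              # [0, 0, 0, 1, 1, 1]
print(predict(centroids, 4.5))  # large
```

The cluster ids 0 and 1 carry no meaning beyond group membership, whereas the classifier returns one of the predefined labels.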

What are the limitations of Deep Learning as compared to other Machine Learning techniques?

  • Easier interpretability and requires more data
  • More interpretable and less efficient
  • Requires less data and is more complex
  • Requires more data and is often less interpretable
Deep Learning typically requires more data for effective training and often results in models that are less interpretable compared to traditional Machine Learning models.

Can you explain the assumptions underlying linear regression?

  • Independence of features, Normality of target variable, Linearity of relationship, Constant variance
  • Normal distribution of errors, Linearity of relationship, Independence of residuals, Constant variance
  • Normality of residuals, Constant variance, Independence of residuals, Linearity of relationship
  • Normality of residuals, Linearity of relationship, Multicollinearity, Independence of features
Linear regression assumes that the relationship between the dependent and independent variables is linear, errors are normally distributed, residuals are independent, and the variance of residuals is constant across all levels of the independent variables. When these assumptions are violated, the coefficient estimates and the inferences drawn from them, such as confidence intervals, can become unreliable.
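
These assumptions are usually checked on the residuals of a fitted model. The sketch below fits a simple least-squares line and compares the residual variance for low and high values of x as a crude constant-variance check. The data and helper names are hypothetical, and a real analysis would normally use residual plots or formal tests.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def variance(values):
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Crude constant-variance check: compare residual spread for low vs high x.
half = len(xs) // 2
print("low-x residual variance: ", variance(residuals[:half]))
print("high-x residual variance:", variance(residuals[half:]))
```

Residual variances that differ greatly between the two halves would hint that the constant-variance assumption does not hold.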

How does Polynomial Regression differ from Simple Linear Regression?

  • It fits a polynomial curve
  • It fits a straight line
  • It is used only for classification
  • It uses more variables
While Simple Linear Regression fits a straight line to the data, Polynomial Regression fits a polynomial curve, allowing for more flexibility in modeling non-linear relationships.
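
Polynomial regression can be viewed as linear regression on the expanded features 1, x, x^2, and so on. The sketch below is a plain-Python illustration that solves the normal equations directly; the helper names are hypothetical, and a numerical library would normally handle the fitting.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] for y = c0 + c1*x + c2*x^2 ..."""
    features = [[x ** p for p in range(degree + 1)] for x in xs]
    size = degree + 1
    # Normal equations: (F^T F) c = F^T y
    ftf = [[sum(f[i] * f[j] for f in features) for j in range(size)]
           for i in range(size)]
    fty = [sum(f[i] * y for f, y in zip(features, ys)) for i in range(size)]
    return solve_linear_system(ftf, fty)


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** p for p, c in enumerate(coeffs))


def sum_squared_error(coeffs, xs, ys):
    return sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]  # exactly quadratic, no noise

line = fit_polynomial(xs, ys, degree=1)
curve = fit_polynomial(xs, ys, degree=2)
print("line SSE: ", sum_squared_error(line, xs, ys))
print("curve SSE:", sum_squared_error(curve, xs, ys))
```

On this noise-free quadratic data the straight line leaves a large error, while the degree-2 fit recovers the generating coefficients almost exactly.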

Autonomous vehicles rely on Machine Learning algorithms for tasks like ____________ and ____________.

  • Disease Prediction, Weather Forecasting
  • Object Detection, Path Planning
  • Risk Management, Drug Development
  • Text Classification, Fraud Detection
Autonomous vehicles use Machine Learning for Object Detection and Path Planning, recognizing obstacles and determining optimal routes.

In what scenario would the AUC be a more informative metric than simply using Accuracy?

  • When the class distribution is balanced
  • When the class distribution is imbalanced
  • When the model has only one class
The AUC (Area Under the ROC Curve) can be more informative than Accuracy when the class distribution is imbalanced. It measures the model's ability to discriminate between the positive and negative classes, whereas Accuracy can look high simply because the model predicts the majority class most of the time.
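
The effect is easy to reproduce with a degenerate model. The sketch below uses hypothetical data with 95 negatives and 5 positives and a "model" that assigns every sample the same score; AUC is computed from its pairwise definition, the probability that a random positive outscores a random negative.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative
    (ties count as half). Equivalent to the area under the ROC curve."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(positives) * len(negatives))


# Hypothetical imbalanced data: 95 negatives, 5 positives.
labels = [0] * 95 + [1] * 5
# A "model" that gives every sample the same low score and predicts negative.
scores = [0.1] * 100
predictions = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(labels, predictions))  # 0.95
print(roc_auc(labels, scores))        # 0.5
```

Accuracy reports 95% even though the model never identifies a positive case, while an AUC of 0.5 shows it ranks positives no better than chance.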