In PCA, if an Eigenvalue is close to zero, it indicates that the corresponding Eigenvector may ________.
- be a principal component
- be discarded
- be of high magnitude
- explain high variance
If an Eigenvalue in PCA is close to zero, the corresponding Eigenvector (principal direction) explains very little of the variance in the data and may therefore "be discarded". Dropping such components reduces dimensionality while retaining the essential information.
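The idea can be sketched with a toy eigendecomposition in plain NumPy (the synthetic data, the near-constant third column, and the 1e-3 variance threshold are all illustrative assumptions, not part of any standard recipe):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # two genuinely informative axes
X = np.column_stack([X, X[:, 0] * 1e-6])      # third axis carries almost no variance

# PCA via eigendecomposition of the covariance matrix.
cov = np.cov(X, rowvar=False)
eigvals, _ = np.linalg.eigh(cov)              # returned in ascending order
eigvals = eigvals[::-1]                       # largest first
explained = eigvals / eigvals.sum()

# Eigenvalues near zero explain almost no variance; those directions can be dropped.
keep = explained > 1e-3                       # illustrative threshold
print(keep)                                   # [ True  True False]
```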
A colleague is assessing a regression model using only the Adjusted R-Squared. What considerations or additional metrics might you suggest, and why?
- Include MAE; because it's less sensitive to outliers
- Include MSE; because it's the standard metric
- Include RMSE; because it's more interpretable
- Include both RMSE and MAE; for a more comprehensive assessment
While Adjusted R-Squared is useful, including both Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) provides a more comprehensive assessment. RMSE penalizes larger errors more heavily, while MAE is less sensitive to outliers and reflects the typical error magnitude. Together, they offer a more nuanced view of the model's performance.
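A minimal sketch of how the two metrics differ, using made-up predictions (pure NumPy, no particular library API assumed):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0, 4.0])   # illustrative targets
y_pred = np.array([2.5, 5.0, 4.0, 8.0, 4.5])   # illustrative predictions

errors = y_true - y_pred
mae = np.mean(np.abs(errors))                  # typical error; robust to outliers
rmse = np.sqrt(np.mean(errors ** 2))           # squares errors, so large misses dominate

print(mae)    # 0.7
print(rmse)   # ~0.866; RMSE >= MAE always, and the gap widens with large errors
```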
The ________ technique in classification helps in enhancing the model's ability to generalize by using different subsets of data during training.
- Clustering
- Cross-validation
- Feature extraction
- Overfitting
Cross-validation is a technique where the dataset is partitioned into different subsets (folds), and the model is trained and tested on different combinations of these folds. It helps in assessing the model's ability to generalize to unseen data.
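As a sketch, 5-fold cross-validation might look like this with scikit-learn (assuming scikit-learn is available; the iris dataset and logistic regression model are just placeholders):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold CV: train on four folds, score on the held-out fold, rotate five times.
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())  # average generalization estimate across the folds
```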
Consider a situation where you're applying DBSCAN to a high-dimensional dataset. What challenges might you face, and how could you address them?
- All of the above
- Difficulty in visualizing; Reduce dimensionality
- High computational cost; Optimize the algorithm
- Risk of overfitting; Increase MinPts
High-dimensional data can present several challenges for DBSCAN: distance measures become less meaningful as dimensionality grows (the curse of dimensionality), there is a risk of overfitting to noise, visualization is difficult, and computational costs are high. Increasing MinPts can help prevent spurious clusters, dimensionality reduction techniques like PCA aid both distance quality and visualization, and optimizing the algorithm helps reduce computational demands.
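One possible way to combine these mitigations (scikit-learn assumed; the synthetic data and the eps/min_samples values are illustrative choices, not tuned recommendations):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Two well-separated groups embedded in 50 dimensions.
X = np.vstack([rng.normal(0, 0.3, size=(50, 50)),
               rng.normal(5, 0.3, size=(50, 50))])

# Reduce dimensionality first so Euclidean distances stay meaningful.
X_red = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# A higher min_samples makes core-point status harder to earn, curbing spurious clusters.
labels = DBSCAN(eps=1.0, min_samples=10).fit_predict(X_red)
print(len(set(labels) - {-1}))  # number of clusters found (noise is labeled -1)
```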
How does clustering differ from classification?
- Clustering and Classification are the same
- Clustering is supervised; Classification is unsupervised
- Clustering is unsupervised; Classification is supervised
- Clustering uses regression
Clustering is an unsupervised learning technique that groups similar data points, whereas Classification is a supervised learning technique that assigns predefined labels to instances.
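The contrast can be shown side by side (scikit-learn assumed; iris is just a convenient example dataset):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Clustering (unsupervised): only X is seen; the algorithm invents its own group ids.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Classification (supervised): the predefined labels y are part of training.
clf = LogisticRegression(max_iter=1000).fit(X, y)
preds = clf.predict(X)

print(sorted(set(clusters.tolist())))  # [0, 1, 2] -- arbitrary ids with no meaning
print(sorted(set(preds.tolist())))     # [0, 1, 2] -- these are the actual class labels
```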
What are the limitations of Deep Learning as compared to other Machine Learning techniques?
- Easier interpretability and requires more data
- More interpretable and less efficient
- Requires less data and is more complex
- Requires more data and is often less interpretable
Deep Learning typically requires more data for effective training and often results in models that are less interpretable compared to traditional Machine Learning models.
Can you explain the assumptions underlying linear regression?
- Independence of features, Normality of target variable, Linearity of relationship, Constant variance
- Normal distribution of errors, Linearity of relationship, Independence of residuals, Constant variance
- Normality of residuals, Constant variance, Independence of residuals, Linearity of relationship
- Normality of residuals, Linearity of relationship, Multicollinearity, Independence of features
Linear regression assumes that the relationship between the dependent and independent variables is linear, that errors are normally distributed, that residuals are independent, and that the variance of residuals is constant (homoscedasticity) across all levels of the independent variables. Violations of these assumptions can distort the model's estimates and their interpretation.
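Since the assumptions are stated on the residuals, a quick sanity check after fitting might look like this (a pure-NumPy sketch on synthetic data that satisfies the assumptions by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, size=200)  # linear, constant-variance noise

# Fit ordinary least squares and inspect the residuals.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)

# The mean residual is ~0 by construction of OLS; the spread should be roughly
# constant across x (homoscedasticity) and bell-shaped (normality) -- in practice
# one would confirm this with residual plots or a formal normality test.
print(abs(resid.mean()) < 1e-8)   # True
print(0.8 < resid.std() < 1.2)    # True: close to the noise scale of 1.0
```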
How does Polynomial Regression differ from Simple Linear Regression?
- It fits a polynomial curve
- It fits a straight line
- It is used only for classification
- It uses more variables
While Simple Linear Regression fits a straight line to the data, Polynomial Regression fits a polynomial curve, allowing for more flexibility in modeling non-linear relationships.
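A small NumPy sketch of the difference on deliberately curved data (the quadratic data and the degree choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 100)
y = x ** 2 + rng.normal(0, 0.1, size=100)   # clearly non-linear relationship

# Degree 1 fits a straight line; degree 2 fits a polynomial curve.
line = np.polyval(np.polyfit(x, y, 1), x)
curve = np.polyval(np.polyfit(x, y, 2), x)

sse_line = np.sum((y - line) ** 2)
sse_curve = np.sum((y - curve) ** 2)
print(sse_curve < sse_line)  # True: the curve captures curvature the line cannot
```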
Autonomous vehicles rely on Machine Learning algorithms for tasks like ____________ and ____________.
- Disease Prediction, Weather Forecasting
- Object Detection, Path Planning
- Risk Management, Drug Development
- Text Classification, Fraud Detection
Autonomous vehicles use Machine Learning for Object Detection and Path Planning, recognizing obstacles and determining optimal routes.
In what scenario would the AUC be a more informative metric than simply using Accuracy?
- When the class distribution is balanced
- When the class distribution is imbalanced
- When the model has only one class
The AUC (Area Under the ROC Curve) can be more informative than Accuracy when dealing with an imbalanced class distribution. It measures the model's ability to rank positive instances above negative ones across all classification thresholds, whereas Accuracy can be inflated by a trivial model that always predicts the majority class.
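The classic failure case can be made concrete (scikit-learn metrics assumed; the 95/5 split and the always-negative model are deliberately extreme):

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Heavily imbalanced labels: 95 negatives, 5 positives.
y_true = np.array([0] * 95 + [1] * 5)

# A degenerate model that always predicts the majority class,
# with constant scores (no ability to rank positives above negatives).
y_pred = np.zeros(100, dtype=int)
scores = np.zeros(100)

acc = accuracy_score(y_true, y_pred)
auc = roc_auc_score(y_true, scores)
print(acc)  # 0.95 -- looks excellent despite learning nothing
print(auc)  # 0.5  -- reveals chance-level discrimination
```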