Can you discuss the geometric interpretation of Eigenvectors in PCA?

  • They align with the mean of the data
  • They define the direction of maximum variance
  • They define the scaling of the data
  • They represent clusters in the data
Geometrically, the eigenvectors of the data's covariance matrix define the directions of maximum variance: the eigenvector with the largest eigenvalue points along the direction of greatest variance, and each subsequent eigenvector is orthogonal to the ones before it. These eigenvectors become the axes onto which the original data is projected, transforming it into a new coordinate system whose leading axes capture as much variance as possible.
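For illustration, here is a minimal NumPy sketch (the data and variable names are made up for this example) that recovers the principal axes by eigendecomposing the covariance matrix of centered data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])  # correlated 2-D data

X_centered = X - X.mean(axis=0)            # PCA assumes centered data
cov = np.cov(X_centered, rowvar=False)     # covariance matrix of the features
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: symmetric matrix, ascending eigenvalues

order = np.argsort(eigvals)[::-1]          # sort axes by explained variance, descending
components = eigvecs[:, order]             # columns are the principal axes

X_projected = X_centered @ components      # data in the new coordinate system
print("variance along each principal axis:", eigvals[order])
```

The eigenvalues report how much variance lies along each axis, which is why the leading eigenvector is the direction of maximum variance.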

How does Machine Learning contribute to the overall goals of Artificial Intelligence?

  • By focusing only on neural networks
  • By limiting the scope of AI
  • By providing algorithms that can learn and adapt from data
  • By reducing the need for data
Machine Learning contributes to AI by providing algorithms that can learn and adapt from data, allowing for intelligent decision-making and pattern recognition.

How does LDA specifically maximize between-class variance while minimizing within-class variance?

  • By finding the eigenvectors of the scatter matrices
  • By finding the vectors that maximize the ratio of between-class scatter to within-class scatter
  • By setting thresholds for class labels
  • By using gradient descent
LDA maximizes between-class variance while minimizing within-class variance by finding the vectors that maximize the ratio of between-class scatter to within-class scatter (Fisher's criterion: maximize J(w) = (w^T S_B w) / (w^T S_W w), where S_B and S_W are the between-class and within-class scatter matrices). These vectors turn out to be the leading eigenvectors of S_W^-1 S_B, and projecting onto them ensures optimal class separation.
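As a concrete sketch of the two-class case (assuming NumPy; the synthetic data is illustrative), the direction maximizing Fisher's ratio has the closed form w proportional to S_W^-1 (mu1 - mu0):

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.normal(loc=[0, 0], size=(100, 2))   # class 0 samples
X1 = rng.normal(loc=[3, 2], size=(100, 2))   # class 1 samples

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

# Within-class scatter: sum of the per-class scatter matrices.
S_W = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)

# For two classes, the w maximizing (w^T S_B w) / (w^T S_W w)
# is proportional to S_W^-1 (mu1 - mu0).
w = np.linalg.solve(S_W, mu1 - mu0)
w /= np.linalg.norm(w)
print("discriminant direction:", w)
```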

The performance of an LDA model can be evaluated using ___________, which considers both within-class and between-class variances.

  • accuracy metrics
  • error rate
  • feature selection
  • principal components
"Accuracy metrics" that consider both within-class and between-class variances can be used to evaluate the performance of an LDA model. It gives a comprehensive view of how well the model has separated the classes.

What is the difference between simple linear regression and multiple linear regression?

  • Number of dependent variables
  • Number of equations
  • Number of independent variables
  • Number of observations
Simple linear regression involves one independent variable to predict a dependent variable, whereas multiple linear regression uses two or more independent variables for prediction. The inclusion of more variables in multiple linear regression allows for more complex models and can lead to a better understanding of the relationships between variables.
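A short scikit-learn sketch contrasting the two (the synthetic data and coefficients are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))                   # three candidate predictors
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

simple = LinearRegression().fit(X[:, [0]], y)   # one independent variable
multiple = LinearRegression().fit(X, y)         # several independent variables

print("simple R^2:  ", simple.score(X[:, [0]], y))
print("multiple R^2:", multiple.score(X, y))    # including the second relevant predictor improves fit
```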

What are some common applications for each of the four types of Machine Learning: Supervised, Unsupervised, Semi-Supervised, and Reinforcement?

  • Specific to finance
  • Specific to healthcare
  • Specific to manufacturing
  • Varies based on the problem domain
The applications of these four types of Machine Learning vary based on the problem domain rather than being confined to a specific industry. For example, supervised learning powers spam filtering and price prediction, unsupervised learning drives customer segmentation, semi-supervised learning helps label large datasets from a small annotated subset, and reinforcement learning is used in robotics and game playing.

Explain how Ridge and Lasso handle multicollinearity among the features.

  • Both eliminate correlated features
  • Both keep correlated features
  • Ridge eliminates correlated features; Lasso keeps them
  • Ridge keeps correlated features; Lasso eliminates them
Ridge regularization keeps correlated features but shrinks their coefficients, spreading weight across the correlated group; Lasso can eliminate some of them outright by driving their coefficients exactly to zero.
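This difference is easy to see with two nearly duplicated features (a sketch; the data is synthetic and the alpha values are arbitrary):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(3)
x = rng.normal(size=200)
X = np.column_stack([x, x + rng.normal(scale=0.01, size=200)])  # two highly correlated features
y = 3.0 * x + rng.normal(scale=0.1, size=200)

print("ridge:", Ridge(alpha=1.0).fit(X, y).coef_)  # shrinks both coefficients but keeps both features
print("lasso:", Lasso(alpha=0.1).fit(X, y).coef_)  # typically drives one of the pair to exactly zero
```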

Describe the relationship between the Logit function, Odds Ratio, and the likelihood function in Logistic Regression.

  • The Logit function is used for multi-class, Odds Ratio for binary, likelihood for regression
  • The Logit function maps probabilities to log-odds, Odds Ratio quantifies effect on odds, likelihood function is used for estimation
  • The Logit function maps probabilities to odds, Odds Ratio quantifies effect on odds, likelihood function maximizes probabilities
  • They are unrelated
In Logistic Regression, the Logit function maps probabilities to log-odds, the Odds Ratio quantifies the effect of predictors on odds, and the likelihood function is used to estimate the model parameters by maximizing the likelihood of observing the given data.
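To connect the three pieces in code (a sketch assuming scikit-learn; the dataset is just an example), note that LogisticRegression fits its coefficients by maximizing the likelihood of the observed labels, each coefficient is a change in the log-odds (the logit), and exponentiating a coefficient yields an odds ratio:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The solver maximizes the (regularized) log-likelihood of the labels.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
coef = model.named_steps["logisticregression"].coef_[0]

# Each coefficient is the change in log-odds per unit (here: per standard
# deviation) increase of a feature; exp(coef) is the corresponding odds ratio.
odds_ratios = np.exp(coef)
print("first five odds ratios:", odds_ratios[:5])
```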

What is overfitting, and why is it a problem in Machine Learning models?

  • Fitting a model too loosely to training data
  • Fitting a model too well to training data, ignoring generalization
  • Ignoring irrelevant features
  • Including too many variables
Overfitting occurs when a model fits the training data too well, capturing noise rather than the underlying pattern. The result is poor generalization: predictions on new, unseen data are much worse than performance on the training set would suggest.
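The gap is easy to demonstrate with polynomial regression of increasing degree (a sketch; the data, degrees, and seed are arbitrary):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
X = rng.uniform(-1, 1, size=(40, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(scale=0.2, size=40)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (3, 15):  # a modest fit versus a likely-overfit one
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree, "train R^2:", round(model.score(X_train, y_train), 3),
          "test R^2:", round(model.score(X_test, y_test), 3))
```

The high-degree model typically scores near-perfectly on the training split while doing worse on the test split, which is the signature of overfitting.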

You've been asked to optimize the features for a given model. What strategies might you use, and why?

  • Both feature engineering and scaling
  • Feature engineering
  • Feature scaling
  • Random feature selection
Feature engineering involves creating new features or transforming existing ones to better represent the underlying patterns. Feature scaling, such as normalization or standardization, helps to standardize the range of features, enhancing the model's ability to learn. Both strategies together contribute to optimizing the model by improving convergence and interpretability.
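Both strategies fit naturally into a single pipeline. A scikit-learn sketch (the dataset, degree, and alpha are placeholders):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = load_diabetes(return_X_y=True)

# Feature engineering (interaction terms) followed by feature scaling,
# then a regularized linear model.
model = make_pipeline(
    PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
    StandardScaler(),
    Ridge(alpha=1.0),
)
print("cross-validated R^2:", cross_val_score(model, X, y, cv=5).mean())
```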

You are working on a project where you have an abundance of features. How do you decide which features to include in your model and why?

  • Apply feature selection techniques
  • Randomly pick features
  • Use all features
  • Use only numerical features
Applying feature selection techniques like mutual information, correlation-based methods, or tree-based methods helps in removing irrelevant or redundant features. This enhances the model's performance by reducing overfitting and improving interpretability.
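As one concrete option (a sketch; the choice of k=10 is arbitrary), scikit-learn's SelectKBest can rank features by mutual information and keep only the most informative ones:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)

selector = SelectKBest(score_func=mutual_info_classif, k=10).fit(X, y)
X_reduced = selector.transform(X)  # keep only the 10 most informative features
print("kept feature indices:", selector.get_support(indices=True))
```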

In K-Means clustering, the algorithm iteratively assigns each data point to the nearest _______, recalculating the centroids until convergence.

  • Centroid
  • Cluster
  • Data Point
  • Distance Metric
In K-Means, the algorithm assigns each data point to the nearest centroid and recalculates the centroids until convergence.
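The assign-then-recompute loop is short enough to write out directly. A minimal NumPy sketch (no empty-cluster handling; the data and seed are illustrative):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iter):
        # Assign each data point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):  # converged: centroids stopped moving
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc=[0, 0], size=(50, 2)), rng.normal(loc=[5, 5], size=(50, 2))])
labels, centroids = kmeans(X, k=2)
print("final centroids:\n", centroids)
```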