You have built a Logistic Regression model, but the link test indicates that the Logit link function may not be appropriate. What could be the issue?
- Incorrect loss function
- Multicollinearity
- Non-linearity between predictors and log-odds
- Overfitting
If the link test fails, it suggests a non-linear relationship between the predictors and the log-odds of the response, which violates the linearity-in-the-logit assumption of Logistic Regression. Transforming predictors or adding polynomial terms can often restore an adequate fit.
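A minimal sketch of this failure mode, on synthetic data where the true log-odds depend on x squared (all data and parameters here are illustrative, assuming scikit-learn is available): a logit model linear in x fits poorly, while adding a squared term restores the fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: the true log-odds are quadratic in x, so a model that is
# linear in x violates the linearity-in-the-logit assumption.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(2000, 1))
log_odds = x[:, 0] ** 2 - 2.0
p = 1 / (1 + np.exp(-log_odds))
y = rng.random(2000) < p

linear = LogisticRegression().fit(x, y)
quadratic = LogisticRegression().fit(np.hstack([x, x ** 2]), y)

print(linear.score(x, y))                          # close to the base rate
print(quadratic.score(np.hstack([x, x ** 2]), y))  # substantially higher
```

Comparing the two training accuracies is a quick diagnostic: if augmenting the predictors sharply improves the fit, the original linear-in-the-logit specification was likely inadequate.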
You notice that your KNN model is highly sensitive to outliers. What might be causing this, and how could the choice of K and distance metric help in alleviating this issue?
- Choose a larger K and an appropriate distance metric to mitigate sensitivity
- Choose a small K and ignore outliers
- Focus only on the majority class
- Outliers have no effect
Choosing a larger K and an appropriate distance metric can mitigate sensitivity to outliers: a larger K averages over more neighbors, so no single anomalous point dominates the vote, and a less outlier-sensitive metric (e.g., Manhattan rather than Euclidean distance) further reduces the influence of extreme values.
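A toy sketch of the K effect (the points and labels below are made up for illustration, assuming scikit-learn): a single mislabeled outlier flips a 1-NN prediction, while a larger K votes it down.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [0.2], [0.4], [0.6], [0.55]])
y = np.array([0, 0, 0, 0, 1])   # the last point is a mislabeled outlier

query = [[0.5]]
k1 = KNeighborsClassifier(n_neighbors=1).fit(X, y)
k5 = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(k1.predict(query))   # [1] -- the outlier is the single nearest neighbor
print(k5.predict(query))   # [0] -- the majority vote overrides the outlier
```

The `metric` parameter of `KNeighborsClassifier` (e.g., `metric="manhattan"`) can be varied in the same way to experiment with distance choices.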
Explain how weighting the contributions of the neighbors can improve the KNN algorithm's performance.
- Allows more influence from nearer neighbors
- Improves sensitivity to outliers
- Increases bias
- Reduces complexity
Weighting the contributions of the neighbors allows nearer neighbors to have more influence on the prediction, often leading to improved performance in KNN.
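A small sketch of distance weighting with scikit-learn's `weights="distance"` option (data invented for illustration): with uniform weights the plain majority wins, while with distance weights one very close neighbor dominates.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [1.1], [1.2]])
y = np.array([0, 1, 1, 1])
query = [[0.1]]

uniform = KNeighborsClassifier(n_neighbors=4, weights="uniform").fit(X, y)
weighted = KNeighborsClassifier(n_neighbors=4, weights="distance").fit(X, y)
print(uniform.predict(query))   # [1] -- plain majority of the 4 neighbors
print(weighted.predict(query))  # [0] -- the nearby neighbor outweighs the rest
```

With `weights="distance"`, each neighbor's vote is scaled by the inverse of its distance to the query, so close neighbors contribute far more than distant ones.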
Can you differentiate between Logistic Regression and K-Nearest Neighbors (KNN) in terms of use case and functionality?
- LR is for classification, KNN for classification; LR uses probability, KNN uses distance
- LR is for classification, KNN for regression; LR uses distance, KNN uses probability
- LR is for classification, KNN for regression; LR uses probability, KNN uses distance
- LR is for regression, KNN for classification; LR uses distance, KNN uses probability
Logistic Regression is a parametric classifier: it models the probability of a binary outcome with a logistic function of a learned linear combination of the features. KNN is also used for classification but is non-parametric and instance-based: it predicts by majority vote among the 'K' nearest training points under a distance metric. The fundamental difference lies in the approach: LR learns a probability model, while KNN relies on distances to stored examples.
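The contrast can be seen directly in scikit-learn (synthetic data for illustration): LR exposes learned coefficients that define the log-odds, while KNN exposes distances to stored training points.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Parametric: the fitted model is a coefficient vector (plus intercept)
lr = LogisticRegression().fit(X, y)
print(lr.coef_)            # one weight per feature

# Non-parametric: the "model" is the training set itself
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
dist, idx = knn.kneighbors(X[:1])
print(dist)                # distances to the 5 nearest stored points
```

Both estimators offer `predict_proba`, but for LR the probabilities come from the logistic function, whereas for KNN they are neighbor vote fractions.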
Your K-Means clustering algorithm is converging to a local minimum. What role might centroid initialization play in this, and how could you address it?
- Increase the number of clusters
- Initialize centroids based on labels
- Poor initialization; Try multiple random initializations
- Use a fixed number of centroids
Converging to a local minimum in K-Means is often due to poor centroid initialization. Running the algorithm multiple times with different random initializations (or using a smarter seeding scheme such as k-means++) and keeping the lowest-inertia solution helps avoid poor local minima and yields a more nearly globally optimal clustering.
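A sketch with scikit-learn's `n_init` parameter, which controls the number of random restarts (blob data and cluster counts are illustrative; `init="random"` is used here to make local minima more likely than the k-means++ default):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=5, random_state=42)

# A single random initialization may get stuck in a local minimum
# (higher inertia); n_init=10 keeps the best of ten restarts.
single = KMeans(n_clusters=5, init="random", n_init=1, random_state=0).fit(X)
multi = KMeans(n_clusters=5, init="random", n_init=10, random_state=0).fit(X)
print(single.inertia_, multi.inertia_)
```

The best-of-several run can never have higher inertia than a single run with the same seed, which is why multiple restarts are the standard remedy.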
How does LDA specifically maximize between-class variance while minimizing within-class variance?
- By finding the eigenvectors of the scatter matrices
- By finding the vectors that maximize the ratio of between-class scatter to within-class scatter
- By setting thresholds for class labels
- By using gradient descent
LDA maximizes between-class variance while minimizing within-class variance by finding the projection vectors that maximize the ratio of between-class scatter to within-class scatter. Formally, these directions solve the generalized eigenvalue problem S_B w = λ S_W w, which ensures optimal class separation in the projected space.
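A NumPy sketch of the two-class Fisher criterion (synthetic Gaussian classes; in the two-class case the optimal direction reduces to w ∝ S_W⁻¹(μ₁ − μ₀)):

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.normal([0, 0], 1.0, size=(100, 2))   # class 0
X1 = rng.normal([3, 1], 1.0, size=(100, 2))   # class 1

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
# Within-class scatter: sum of the per-class scatter matrices
Sw = np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)
# Direction maximizing between-class over within-class scatter
w = np.linalg.solve(Sw, mu1 - mu0)

# Projected class means are far apart relative to the projected spread
proj0, proj1 = X0 @ w, X1 @ w
print(proj1.mean() - proj0.mean(), proj0.std(), proj1.std())
```

With more than two classes, the same criterion yields the leading eigenvectors of S_W⁻¹ S_B, which is what library implementations compute.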
How does Machine Learning contribute to the overall goals of Artificial Intelligence?
- By focusing only on neural networks
- By limiting the scope of AI
- By providing algorithms that can learn and adapt from data
- By reducing the need for data
Machine Learning contributes to AI by providing algorithms that can learn and adapt from data, allowing for intelligent decision-making and pattern recognition.
The performance of an LDA model can be evaluated using ___________, which considers both within-class and between-class variances.
- accuracy metrics
- error rate
- feature selection
- principal components
"Accuracy metrics" that consider both within-class and between-class variances can be used to evaluate the performance of an LDA model. It gives a comprehensive view of how well the model has separated the classes.
In K-Means clustering, the algorithm iteratively assigns each data point to the nearest _______, recalculating the centroids until convergence.
- Centroid
- Cluster
- Data Point
- Distance Metric
In K-Means, the algorithm assigns each data point to the nearest centroid and recalculates the centroids until convergence.
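The assign-then-update loop can be sketched in a few lines of NumPy (the points and starting centroids below are made up for illustration):

```python
import numpy as np

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.2, 4.9]])
centroids = np.array([[0.0, 0.0], [5.0, 5.0]])   # illustrative start

for _ in range(10):
    # Assignment step: each point goes to its nearest centroid
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Update step: each centroid moves to the mean of its assigned points
    new = np.array([X[labels == k].mean(axis=0) for k in range(2)])
    if np.allclose(new, centroids):   # convergence: centroids stopped moving
        break
    centroids = new

print(labels)      # [0 0 1 1]
print(centroids)
```

On this tiny dataset the loop converges after one update, with the two left-hand points in one cluster and the two right-hand points in the other.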
You are working on a project where you have an abundance of features. How do you decide which features to include in your model and why?
- Apply feature selection techniques
- Randomly pick features
- Use all features
- Use only numerical features
Applying feature selection techniques such as mutual information, correlation-based methods, or tree-based importance measures helps remove irrelevant or redundant features. This improves model performance by reducing overfitting and makes the model easier to interpret.
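One of the mentioned techniques, mutual information, can be sketched with scikit-learn's `SelectKBest` (synthetic data with 5 informative features among 20; the choice of `k=5` is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# 20 features, only 5 of which carry signal; the rest are noise
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)

# Score every feature by mutual information with y and keep the top 5
selector = SelectKBest(mutual_info_classif, k=5).fit(X, y)
print(selector.get_support(indices=True))   # indices of retained features
X_reduced = selector.transform(X)
```

In practice, `k` would be tuned by cross-validation rather than fixed in advance, and the selector fitted only on training data to avoid leakage.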