Can you explain the impact of different centroid initialization methods on the K-Means clustering results?
- Alters convergence and final cluster formation
- Has no impact
- Increases accuracy but reduces speed
- Increases the number of clusters
K-Means converges only to a local optimum, so different centroid initialization methods (e.g., random selection versus k-means++) can alter the convergence rate and the final cluster formation. Poor initialization may lead to suboptimal clustering or slow convergence.
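A minimal sketch of this effect, assuming scikit-learn and synthetic blob data (both are illustrative choices, not part of the question): each run uses a single initialization so the difference between random and k-means++ starting centroids shows up in the final inertia.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data for illustration only
X, _ = make_blobs(n_samples=500, centers=4, random_state=42)

for init in ("random", "k-means++"):
    # n_init=1: a single initialization per run, so its effect is visible
    km = KMeans(n_clusters=4, init=init, n_init=1, random_state=0).fit(X)
    print(f"{init}: final inertia = {km.inertia_:.2f}")
```

Lower inertia indicates tighter clusters; k-means++ typically reaches a good solution more reliably than purely random starts.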
Imagine you need to classify documents but have only a few labeled examples. How would you leverage semi-supervised learning in this scenario?
- Combine trial and error approaches
- Use clustering exclusively
- Utilize both labeled and unlabeled data
- Utilize only the labeled data
In this scenario, semi-supervised learning leverages both the limited labeled examples and the abundant unlabeled data to build an effective classification model.
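One hedged sketch of the idea, using scikit-learn's self-training wrapper and synthetic features as a stand-in for vectorized documents: unlabeled samples are marked with -1, and the base classifier iteratively pseudo-labels its most confident predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

# Hypothetical stand-in for 1,000 vectorized documents
X, y = make_classification(n_samples=1000, random_state=0)

# Keep only 50 labels; scikit-learn marks unlabeled samples with -1
y_partial = np.copy(y)
y_partial[50:] = -1

# Self-training: the base classifier pseudo-labels its most confident
# unlabeled samples and retrains on the expanded labeled set
model = SelfTrainingClassifier(LogisticRegression()).fit(X, y_partial)
print(model.predict(X[:5]))
```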
What are the assumptions that must be met in Simple Linear Regression?
- Homogeneity, Variability, Linearity
- Independence, Homoscedasticity, Linearity, Normality
- Linearity, Categorization, Independence
- Linearity, Quadratic, Exponential
The assumptions in Simple Linear Regression include Independence (of errors), Homoscedasticity (equal variance), Linearity, and Normality (of errors).
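A rough sketch of how two of these assumptions might be checked in practice, assuming SciPy and made-up data (the variance comparison below is an informal stand-in for a formal homoscedasticity test such as Breusch-Pagan):

```python
import numpy as np
from scipy import stats

# Hypothetical data with a roughly linear relationship
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.5 * x + 1.0 + rng.normal(0, 1, 100)

# Fit simple linear regression and compute residuals
slope, intercept, r, p, se = stats.linregress(x, y)
residuals = y - (slope * x + intercept)

# Normality of errors: Shapiro-Wilk (null hypothesis: residuals are normal)
print("Shapiro-Wilk p-value:", stats.shapiro(residuals).pvalue)

# Homoscedasticity, informally: residual spread in the low vs. high half of x
order = np.argsort(x)
lo, hi = residuals[order][:50], residuals[order][50:]
print("Residual variance (low x, high x):", lo.var(), hi.var())
```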
How does LDA differ from Principal Component Analysis (PCA)?
- LDA and PCA have the same goal and method
- LDA focuses on unsupervised learning while PCA focuses on supervised learning
- LDA is concerned with maximizing between-class variance, while PCA focuses on maximizing total variance
- LDA uses Eigenvalues, while PCA uses Eigenvectors
LDA aims to maximize between-class variance and minimize within-class variance for classification, while PCA maximizes total variance without considering class labels. PCA is therefore used mainly for dimensionality reduction rather than class separation.
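A short comparison sketch, using scikit-learn and the Iris dataset purely as an illustrative example: PCA never sees the labels, while LDA requires them.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA ignores y: directions of maximum total variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA uses y: directions that best separate the classes
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # both (150, 2), but different projections
```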
The R-Squared value can be artificially inflated by adding more predictors, but the ________ helps mitigate this issue.
- Adjusted R-Squared
- MAE
- MSE
- RMSE
The R-Squared value can be artificially increased by adding irrelevant predictors. Adjusted R-Squared mitigates this by accounting for the number of predictors, penalizing models for unnecessary complexity. It provides a more balanced evaluation of the model's fit and helps avoid the overfitting that comes from simply adding more predictors.
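The penalty is visible in the formula Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), sketched below with hypothetical numbers: a sixth predictor that nudges R² from 0.800 to 0.801 actually lowers the adjusted value.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical numbers: a nearly useless 6th predictor barely moves R^2,
# so the adjusted value drops instead of rising
print(adjusted_r2(0.800, n=100, p=5))  # ~0.789
print(adjusted_r2(0.801, n=100, p=6))  # ~0.788
```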
The K-Means clustering algorithm iteratively updates the _________ to minimize the sum of squared distances within each cluster.
- Centroids
- Distance metric
- Learning rate
- Number of clusters
The K-Means algorithm works by iteratively updating the centroids, minimizing the sum of squared distances from each point to its assigned centroid, thus forming cohesive clusters.
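A bare-bones sketch of that loop in NumPy (illustrative only; it uses random data and does not handle empty clusters):

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Minimal K-Means: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its assigned points
        # (empty clusters are not handled in this sketch)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return centroids, labels

centroids, labels = kmeans(np.random.default_rng(1).normal(size=(200, 2)), k=3)
print(centroids)
```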
How does the choice of Epsilon affect the clustering results in DBSCAN?
- It affects the minimum points in a cluster
- It changes the clustering speed
- It determines the radius of the neighborhood around a point
- It modifies the clustering algorithm's underlying formula
The choice of Epsilon in DBSCAN determines the maximum radius of the neighborhood around a data point. By adjusting this value, one can control how close points must be to form a cluster, affecting the clustering's granularity, shape, and size. It's a crucial parameter to tune for achieving desired clustering results.
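A quick way to see this, assuming scikit-learn and the two-moons toy dataset: sweeping eps changes how many clusters form and how many points end up as noise (label -1).

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Hypothetical data with two non-convex clusters
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

for eps in (0.05, 0.2, 0.5):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = (labels == -1).sum()
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```

Too small an eps fragments the data into noise; too large an eps merges distinct clusters.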
A machine learning model is suffering from high computational costs and overfitting. How could dimensionality reduction be implemented to solve these problems?
- Add more features
- Apply PCA or LDA, depending on the data type
- Increase the model's complexity
- Reduce the dataset size
Applying dimensionality reduction techniques like PCA or LDA can significantly reduce the feature space and computational cost while retaining most of the important information. It also helps address overfitting by simplifying the model, making it less likely to capture noise in the data. Increasing model complexity or adding more features would exacerbate the problem, and reducing the dataset size may lead to loss of information.
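One possible sketch, assuming scikit-learn and the digits dataset purely for illustration: PCA cuts the 64 pixel features down to 16 components before the classifier ever sees them.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)  # 64 features per image

# Reduce 64 features to 16 principal components before classifying
model = make_pipeline(PCA(n_components=16), LogisticRegression(max_iter=1000))
print(cross_val_score(model, X, y, cv=5).mean())
```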
In unsupervised learning, the model learns to find patterns and structures from _________ data, where no specific output values are provided.
- Balanced
- Labelled
- Sparse
- Unlabelled
In unsupervised learning, the model learns from unlabelled data, finding hidden patterns and structures without specific output values or guidance.
How does the Logit function transform the probability in Logistic Regression?
- Maps odds to log-odds
- Maps odds to probability
- Maps probability to log-odds
- Maps probability to odds
The logit function in Logistic Regression takes a probability p and maps it to log-odds: logit(p) = ln(p / (1 − p)). It is the inverse of the sigmoid function used to model probabilities.
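A tiny round-trip sketch of the two functions (plain NumPy, illustrative values):

```python
import numpy as np

def logit(p):
    """Map a probability p in (0, 1) to log-odds."""
    return np.log(p / (1 - p))

def sigmoid(z):
    """Inverse of the logit: map log-odds back to a probability."""
    return 1 / (1 + np.exp(-z))

p = 0.8
print(logit(p))           # ~1.386 (log-odds)
print(sigmoid(logit(p)))  # 0.8 (round trip)
```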
What is the bias-variance tradeoff in Machine Learning?
- A tradeoff between supervised and unsupervised learning
- A tradeoff between the complexity and the size of a model
- A tradeoff between the learning rate and the number of epochs
- A tradeoff between underfitting and overfitting
The bias-variance tradeoff refers to the balancing act between underfitting (high bias, low variance) and overfitting (low bias, high variance). A model with high bias oversimplifies the problem, while a model with high variance fits noise in the training data and fails to generalize.
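One way to watch the tradeoff, assuming scikit-learn and made-up quadratic data: an underfit line, a well-matched quadratic, and an overfit degree-15 polynomial score very differently on held-out data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy quadratic data
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 100).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 1, 100)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)

for degree in (1, 2, 15):  # underfit, good fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x_tr, y_tr)
    print(degree, model.score(x_tr, y_tr), model.score(x_te, y_te))
```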
Explain the difference between parametric and non-parametric models.
- The ability to update parameters during training
- The flexibility in form
- The number of features used
- The use of hyperparameters
Parametric models assume a specific form for the function they're approximating, such as a linear relationship, and have a fixed number of parameters. Non-parametric models make fewer assumptions about the function's form, often resulting in more flexibility but also requiring more data.
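A compact contrast, assuming scikit-learn and synthetic sine-wave data: linear regression commits to a fixed linear form with two parameters, while k-nearest neighbors keeps the training data around and adapts its shape to it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Hypothetical nonlinear data
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

# Parametric: assumes a linear form, learns two fixed parameters
linear = LinearRegression().fit(X, y)

# Non-parametric: no fixed form; predictions come from stored neighbors
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

print(linear.score(X, y), knn.score(X, y))
```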