A core point in DBSCAN is a point that has at least MinPts points within _________ distance of itself.
- Epsilon
- border point
- cluster
- noise point
A core point in DBSCAN has at least MinPts points within Epsilon distance of itself. Epsilon defines the radius of the neighborhood around the point; if that neighborhood contains enough points (MinPts or more), the point is considered a core point.
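As a minimal sketch of this definition using scikit-learn's DBSCAN (assuming sklearn is available; the data and the eps/min_samples values are illustrative — note that in sklearn, min_samples counts the point itself):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of three points each, plus one isolated point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
              [10.0, 10.0]])

# eps is the Epsilon radius; min_samples is MinPts (the point itself counts).
db = DBSCAN(eps=0.5, min_samples=3).fit(X)

print(db.core_sample_indices_)  # indices of the core points
print(db.labels_)               # cluster labels; -1 marks noise
```

Each point in the two tight groups has three neighbors (including itself) within the 0.5 radius, so all six are core points; the isolated point has no neighbors within Epsilon and is labeled noise.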
When multicollinearity is present in a dataset, it can make the coefficients of the variables ___________ and hard to interpret.
- insignificant
- reliable
- stable
- unstable
Multicollinearity can make the coefficients of the variables unstable and sensitive to small changes in the data. This makes individual coefficients unreliable and the model difficult to interpret.
Can you list some applications of Machine Learning?
- Finance, Cooking
- Games, Cooking
- Games, Healthcare
- Healthcare, Finance, Marketing
Machine Learning is applied in various domains such as healthcare (for predicting diseases, personalizing treatments), finance (for fraud detection, risk management), marketing (for customer segmentation, targeted advertising), and more. Its versatility has made it an essential tool in modern technology.
You are facing an overfitting problem in a linear model. How would you use Ridge, Lasso, or ElasticNet to address this issue?
- Decrease regularization strength
- Increase regularization strength
- Remove all regularization
Increasing the regularization strength can help to prevent overfitting by constraining the model complexity and reducing variance.
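As a minimal sketch with scikit-learn's Ridge (assuming sklearn is available; the data and alpha values are illustrative — the same alpha parameter controls strength in Lasso and ElasticNet as well):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
# Few samples, many features: a setting prone to overfitting.
X = rng.normal(size=(30, 20))
y = X[:, 0] + 0.1 * rng.normal(size=30)

weak = Ridge(alpha=0.01).fit(X, y)     # weak regularization
strong = Ridge(alpha=100.0).fit(X, y)  # strong regularization

# A larger alpha shrinks the coefficient vector, constraining model
# complexity and reducing variance at the cost of some bias.
print(np.linalg.norm(weak.coef_), np.linalg.norm(strong.coef_))
```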
In the context of Decision Trees, how can overfitting be controlled using pruning techniques?
- By increasing the number of features
- By increasing the tree complexity
- By reducing the training data
- By reducing the tree complexity
Overfitting in Decision Trees can be controlled using pruning techniques by reducing the tree's complexity. By removing branches that add little predictive power, the model becomes less sensitive to noise in the training data and generalizes better to unseen examples.
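One way to sketch this in scikit-learn is cost-complexity pruning via the ccp_alpha parameter (the dataset and alpha value here are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree grows until the leaves are (nearly) pure.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# ccp_alpha > 0 enables cost-complexity pruning, removing branches
# whose impurity reduction does not justify their complexity.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_tr, y_tr)

print(full.get_n_leaves(), pruned.get_n_leaves())
```

The pruned tree has fewer leaves, i.e. lower complexity, which is what reduces sensitivity to noise in the training data.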
What is underfitting, and how does it differ from overfitting?
- Enhancing model complexity; similar to overfitting
- Fitting the model too closely to the training data; same as overfitting
- Fitting the model too loosely to the training data; opposite of overfitting
- Reducing model complexity; similar to overfitting
Underfitting occurs when a model fits the training data too loosely and fails to capture the underlying pattern; it is the opposite of overfitting, where the model fits the training data too closely and captures noise.
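A minimal NumPy sketch of underfitting (all values illustrative): fitting a straight line to clearly quadratic data cannot capture the curvature, so even the training error stays large.

```python
import numpy as np

x = np.linspace(-3, 3, 50)
y = x ** 2  # a clearly nonlinear target

# Degree-1 fit underfits: a line cannot capture the curvature.
lin = np.polyval(np.polyfit(x, y, 1), x)
# Degree-2 fit matches the generating process.
quad = np.polyval(np.polyfit(x, y, 2), x)

mse_lin = np.mean((y - lin) ** 2)    # large even on the training data
mse_quad = np.mean((y - quad) ** 2)  # essentially zero
print(mse_lin, mse_quad)
```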
What are the challenges in imbalanced classification problems?
- Balanced data
- Equal representation of all classes
- No challenges
- Overfitting to the majority class
Imbalanced classification problems, where the classes are not equally represented, can lead to models that are biased toward the majority class. This can result in poor performance on the minority class and often requires special techniques, such as resampling or class weighting, to address.
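One such technique is class weighting; a minimal sketch with scikit-learn's LogisticRegression (the synthetic 95/5 class split and feature values are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# 95% majority class (label 0), 5% minority class (label 1).
n_maj, n_min = 950, 50
X = np.vstack([rng.normal(0.0, 1.0, (n_maj, 2)),
               rng.normal(1.5, 1.0, (n_min, 2))])
y = np.array([0] * n_maj + [1] * n_min)

plain = LogisticRegression().fit(X, y)
# class_weight='balanced' reweights errors inversely to class frequency.
balanced = LogisticRegression(class_weight='balanced').fit(X, y)

rec_plain = (plain.predict(X)[y == 1] == 1).mean()
rec_bal = (balanced.predict(X)[y == 1] == 1).mean()
print(rec_plain, rec_bal)  # minority-class recall typically improves
```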
In a scenario where dimensionality reduction is essential but preserving the original features' meaning is also crucial, how would you approach using PCA?
- You would avoid PCA and use another method
- You would carefully interpret the principal components in terms of original features
- You would perform PCA on a subset of the original features
- You would use PCA without considering the original features' meaning
In this scenario, carefully interpreting the principal components in terms of the original features is the key to preserving their meaning while still benefiting from dimensionality reduction.
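In scikit-learn, this interpretation is done by inspecting the loadings in the fitted model's components_ attribute; a minimal sketch on the Iris dataset (chosen here purely for illustration):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

data = load_iris()
X = StandardScaler().fit_transform(data.data)

pca = PCA(n_components=2).fit(X)

# Each row of components_ gives the loading of every original feature
# on one principal component; the largest-magnitude loadings tell you
# which original features dominate that component.
for i, comp in enumerate(pca.components_):
    order = np.argsort(np.abs(comp))[::-1]
    print(f"PC{i + 1}:",
          [(data.feature_names[j], round(float(comp[j]), 2)) for j in order])
```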
What is the primary goal of clustering algorithms?
- To classify labeled data
- To find patterns and group similar data together
- To predict outcomes
- To solve reinforcement learning problems
The primary goal of clustering algorithms is to find patterns in the data and group similar data points together without using any labeled responses.
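A minimal sketch with k-means (assuming scikit-learn; the three synthetic blobs are illustrative) — note that the algorithm receives no labels, only the raw points:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated blobs; no labels are given to the algorithm.
X = np.vstack([rng.normal(c, 0.3, (50, 2))
               for c in [(0, 0), (5, 5), (0, 5)]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)  # one centroid per discovered group
```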
You notice that a Decision Tree is providing inconsistent results on different runs. How might you investigate and correct the underlying issue, possibly involving entropy, Gini Index, or pruning techniques?
- Analyze the randomness in splitting and apply consistent pruning techniques
- Change to a different algorithm
- Ignore inconsistent results
- Increase tree depth
Inconsistent results often stem from randomness in how the data is split and how ties between candidate splits are broken. Fixing the random seed and applying consistent pruning techniques can produce more stable, reproducible results, and attention to the splitting criterion, such as entropy or the Gini Index, can further refine the model's behavior.
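In scikit-learn the relevant knob is the random_state parameter; a minimal sketch (the Iris dataset is used purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# When several candidate splits tie, the choice among them is random;
# fixing random_state makes repeated fits reproducible. The criterion
# parameter selects the splitting measure ('gini' or 'entropy').
a = DecisionTreeClassifier(random_state=0, criterion='gini').fit(X, y)
b = DecisionTreeClassifier(random_state=0, criterion='gini').fit(X, y)

print((a.predict(X) == b.predict(X)).all())  # identical predictions
```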
In classification, when a model is biased toward predicting one class over another, it is known as a(n) ________ problem.
- Clustering
- Imbalanced classification
- Multiclass classification
- Overfitting
When a model consistently predicts one class over another, particularly when the classes are not equally represented, this is known as an imbalanced classification problem.
How does Lasso regression differ from Ridge regression?
- Both use L1 regularization
- Both use L2 regularization
- Lasso uses L1 regularization, Ridge uses L2
- Lasso uses L2 regularization, Ridge uses L1
Lasso (Least Absolute Shrinkage and Selection Operator) regression uses L1 regularization, which can lead to some coefficients being exactly zero, thus performing feature selection. Ridge regression uses L2 regularization, which shrinks the coefficients but doesn't set them to zero. These different regularization techniques define their behavior and application.
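The feature-selection effect of L1 can be sketched with scikit-learn (the data and alpha value are illustrative): with one informative feature and nine irrelevant ones, Lasso drives the irrelevant coefficients exactly to zero while Ridge merely shrinks them.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)  # only feature 0 matters

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=0.1).fit(X, y)

# L1 sets irrelevant coefficients exactly to zero (feature selection);
# L2 shrinks them toward zero but leaves them nonzero.
print(np.sum(lasso.coef_ == 0), np.sum(ridge.coef_ == 0))
```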