If the relationship between variables in a dataset is best fit by a curve rather than a line, you might use _________ regression.
- Linear
- Logistic
- Polynomial
- Ridge
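When a curve fits the relationship between variables better than a straight line, Polynomial regression is the appropriate choice. It models nonlinear relationships by adding polynomial terms of the predictors (such as x squared or x cubed) to the regression equation, while remaining linear in its coefficients.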
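As a rough illustration of the point above, the following sketch fits both a line and a quadratic to the same curved data and compares their errors. The data is synthetic and noise-free, and the helper name `sse` is an assumption made for this example.

```python
import numpy as np

# Synthetic data from a known quadratic, y = 2x^2 + 1 (noise-free for clarity).
x = np.linspace(-3, 3, 13)
y = 2 * x**2 + 1

# Least-squares fits: a straight line (degree 1) and a quadratic (degree 2).
line_coeffs = np.polyfit(x, y, deg=1)
quad_coeffs = np.polyfit(x, y, deg=2)

def sse(coeffs):
    """Sum of squared errors of the fitted polynomial on the data."""
    return float(np.sum((np.polyval(coeffs, x) - y) ** 2))

line_sse = sse(line_coeffs)  # large: a line cannot follow the curve
quad_sse = sse(quad_coeffs)  # near zero: the quadratic matches the data

print(f"line SSE = {line_sse:.3f}, quadratic SSE = {quad_sse:.3e}")
```

The quadratic fit recovers coefficients close to the ones used to generate the data, while the line leaves a large residual.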
You have two models with similar Accuracy but different Precision and Recall values. How would you decide which model is better for a given application?
- Choose based on the specific application's needs and tolerance for false positives/negatives
- Choose the one with higher Precision
- Choose the one with higher Recall
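When models have similar Accuracy but different Precision and Recall, the choice between them should be based on the specific application's needs. If false positives are more costly, prioritize Precision; if false negatives are more costly, prioritize Recall. For example, a spam filter usually favors Precision (legitimate mail should not be blocked), while a disease-screening test usually favors Recall (cases should not be missed).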
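A small sketch of this trade-off, using two hypothetical models whose confusion-matrix counts were chosen for illustration so that both reach the same Accuracy:

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts on the same 100 examples (20 positives, 80 negatives).
# Model A: tp=10, fp=0, fn=10, tn=80 -> accuracy (10 + 80) / 100 = 0.90
# Model B: tp=18, fp=8, fn=2,  tn=72 -> accuracy (18 + 72) / 100 = 0.90
model_a = precision_recall(tp=10, fp=0, fn=10)
model_b = precision_recall(tp=18, fp=8, fn=2)

print("A: precision=%.2f recall=%.2f" % model_a)
print("B: precision=%.2f recall=%.2f" % model_b)
```

Model A never raises a false alarm but misses half of the positives; Model B catches most positives at the cost of some false alarms. Which is preferable depends on which error is more expensive in the application.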
In what situations would it be appropriate to use Logistic Regression with the Logit link function?
- All regression problems
- Binary classification with a nonlinear relationship between the predictors and the log-odds
- Binary classification with a linear relationship between the predictors and the log-odds
- Multi-class classification
Logistic Regression with the Logit link function is particularly suited for binary classification problems where there is a linear relationship between the predictors and the log-odds of the response.
One method to mitigate multicollinearity is to apply ___________ regression, which adds a penalty term to the loss function.
- Lasso
- Logistic
- Polynomial
- Ridge
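Ridge regression mitigates multicollinearity by adding an L2 penalty (the sum of squared coefficients) to the loss function. The penalty shrinks the coefficients of correlated predictors toward zero, which reduces their variance and leads to more stable estimates. Lasso also adds a penalty term (L1), but Ridge is the standard choice when the goal is to stabilize the coefficients of correlated predictors rather than to drop some of them.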
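The stabilizing effect can be sketched with the closed-form ridge solution on a tiny, nearly collinear dataset. The data values and the penalty strength `lam` are assumptions made for this example, and the intercept is omitted to keep it short.

```python
import numpy as np

# Hypothetical toy data: x2 is almost a copy of x1, so the columns are nearly collinear.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.01, 1.98, 3.02, 3.99, 5.01])
X = np.column_stack([x1, x2])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Ordinary least squares (no intercept, to keep the sketch short).
w_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# Ridge: minimize ||y - Xw||^2 + lam * ||w||^2, solved in closed form.
lam = 1.0  # assumed penalty strength; in practice chosen by cross-validation
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

ols_norm = float(np.linalg.norm(w_ols))
ridge_norm = float(np.linalg.norm(w_ridge))

print("OLS coefficients:  ", w_ols, "norm:", ols_norm)
print("Ridge coefficients:", w_ridge, "norm:", ridge_norm)
```

For any positive `lam`, the ridge coefficient vector is no longer than the least-squares one, which is the shrinkage the explanation refers to.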
How is the Logit function related to Logistic Regression?
- It is a type of cost function
- It is an alternative name for Logistic Regression
- It's the inverse of the Sigmoid function and maps probabilities to log-odds
- It's used for multi-class classification
In Logistic Regression, the Logit function is the inverse of the Sigmoid function. It maps probabilities to log-odds and forms the link function in logistic modeling.
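The inverse relationship can be checked numerically. This is a minimal sketch using only the standard library; the function names `sigmoid` and `logit` are chosen for the example.

```python
import math

def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Map a probability p in (0, 1) to log-odds, log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

# Round trip: applying logit to sigmoid(z) recovers z (up to rounding).
z = 1.5
p = sigmoid(z)
z_back = logit(p)

print(f"z = {z}, sigmoid(z) = {p:.4f}, logit(sigmoid(z)) = {z_back:.4f}")
```

A probability of 0.5 corresponds to log-odds of 0, which is why the decision boundary of a logistic model sits where the linear predictor equals zero.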
In Decision Trees, the __________ is used to measure the impurity of a data partition or set.
- Accuracy
- Bias
- Gini Index
- Training set
In Decision Trees, the Gini Index is used to measure the impurity or disorder of a data partition or set. A lower Gini Index indicates a purer node, so candidate splits are compared by the impurity of the partitions they produce and the split with the lowest weighted impurity is preferred.
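A minimal sketch of the Gini impurity computation, assuming class labels are given as a plain list (the function name `gini` is chosen for this example):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

pure_node = gini(["yes", "yes", "yes", "yes"])  # a single class
mixed_node = gini(["yes", "yes", "no", "no"])   # an even two-class mix

print(pure_node, mixed_node)
```

A node containing one class has impurity 0, and an even two-class mix has impurity 0.5, the maximum for two classes.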
What is the goal of using entropy as a criterion in Decision Trees?
- Increase Complexity
- Increase Efficiency
- Measure Purity
- Predict Outcome
Entropy measures how mixed the class labels in a node are, that is, its purity or impurity. The attribute whose split reduces entropy the most (the highest information gain) is selected for splitting.
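The selection rule can be sketched as an information-gain calculation. The helper names and the toy label lists below are assumptions made for this example.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]                  # each child is pure
useless_split = [["yes", "yes", "no", "no"]] * 2           # children as mixed as the parent

gain_perfect = information_gain(parent, perfect_split)
gain_useless = information_gain(parent, useless_split)

print(gain_perfect, gain_useless)
```

The split that separates the classes completely earns the full parent entropy as gain, while the split that leaves both children as mixed as the parent earns none.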
How is the coefficient of determination (R-Squared) used in regression analysis?
- To describe the correlation between variables
- To detect multicollinearity
- To measure the goodness of fit of the model
- To select the best features
The coefficient of determination (R-Squared) is used to measure how well the regression model fits the observed data. It represents the proportion of variation in the dependent variable that is explained by the independent variables.
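As a sketch of the definition, R-Squared can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the sample values are chosen for this example.

```python
def r_squared(y_true, y_pred):
    """R-Squared = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0, 4.0]

perfect = r_squared(y_true, y_true)       # predictions match the data exactly
baseline = r_squared(y_true, [2.5] * 4)   # always predicting the mean of y_true

print(perfect, baseline)
```

A model that reproduces the data exactly scores 1, and a model that always predicts the mean of the observed values scores 0, since it explains none of the variation.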
Suppose you're working on a dataset with both linear and nonlinear features predicting the target variable. What regression approach might you take?
- Combine Linear and Polynomial Regression
- Linear Regression only
- Logistic Regression
- Polynomial Regression only
When dealing with a dataset with both linear and nonlinear features, combining Linear and Polynomial Regression can be an effective approach. This allows the model to capture both the linear and nonlinear relationships in the data, providing a more accurate representation of the underlying patterns.
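One way to combine the two, sketched here under the assumption that it is known which feature has the nonlinear effect, is to keep one feature linear and add a squared term only for the other, then fit everything with ordinary least squares. The data is synthetic and noise-free.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(-2, 2, size=50)  # feature with a linear effect
x2 = rng.uniform(-2, 2, size=50)  # feature with a quadratic effect

# Synthetic target: intercept 2, linear in x1, quadratic in x2 (no noise).
y = 2.0 + 3.0 * x1 + 0.5 * x2**2

# Design matrix: intercept, x1, x2, and the added polynomial term x2^2.
X = np.column_stack([np.ones_like(x1), x1, x2, x2**2])
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

print("intercept, x1, x2, x2^2 coefficients:", np.round(coeffs, 4))
```

The fitted coefficients come out close to the generating values, with the plain `x2` term near zero, because the model is still linear in its coefficients even though it contains a polynomial feature.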
Can you explain the main types of clustering in Unsupervised Learning?
- Divisive, K-Means, Gaussian Mixture
- Hierarchical, Divisive
- Hierarchical, K-Means, Gaussian Mixture
- K-Means, Hierarchical, Neural Network
Clustering in Unsupervised Learning refers to grouping data points that are similar to each other. The main types include Hierarchical (building nested clusters), K-Means (partitioning data into 'K' clusters), and Gaussian Mixture (using probability distributions to form clusters).
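Of the three types, K-Means is the simplest to sketch. The following is a minimal one-dimensional version of Lloyd's algorithm with fixed starting centroids; the function name `kmeans_1d`, the data points, and the initial centroids are assumptions made for this example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 6.0])

print(centroids, clusters)
```

With K = 2, the two groups of nearby points end up in separate clusters and each centroid settles at the mean of its group.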