You have two models with similar Accuracy but different Precision and Recall values. How would you decide which model is better for a given application?
- Choose based on the specific application's needs and tolerance for false positives/negatives
- Choose the one with higher Precision
- Choose the one with higher Recall
When models have similar Accuracy but different Precision and Recall, the choice between them should be based on the specific application's needs. If false positives are more costly, prioritize Precision; if false negatives are more costly, prioritize Recall.
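As a toy illustration (the confusion-matrix counts below are made up), precision and recall follow directly from true-positive, false-positive, and false-negative counts:

```python
def precision_recall(tp, fp, fn):
    """Compute precision and recall from confusion-matrix counts."""
    precision = tp / (tp + fp)  # of predicted positives, how many were right
    recall = tp / (tp + fn)     # of actual positives, how many were found
    return precision, recall

# Two hypothetical models with similar accuracy but different error profiles
p_a, r_a = precision_recall(tp=80, fp=5, fn=20)   # Model A: few false alarms
p_b, r_b = precision_recall(tp=95, fp=20, fn=5)   # Model B: few misses
```

Model A would be preferable when false positives are expensive (e.g. spam filtering); Model B when misses are expensive (e.g. disease screening).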
In what situations would it be appropriate to use Logistic Regression with the Logit link function?
- All regression problems
- Binary classification with a nonlinear relationship between predictors
- Binary classification with a linear relationship between predictors
- Multi-class classification
Logistic Regression with the Logit link function is particularly suited for binary classification problems where there is a linear relationship between the predictors and the log-odds of the response.
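A minimal sketch of this idea (the weight and intercept below are illustrative, not fitted): the model is linear on the log-odds scale, and the sigmoid converts that linear predictor into a probability.

```python
import math

def predict_proba(x, w, b):
    """Logistic model for one predictor: linear in x on the logit scale."""
    log_odds = w * x + b                   # linear predictor (the logit)
    return 1 / (1 + math.exp(-log_odds))   # sigmoid maps log-odds -> probability
```

At the decision boundary the linear predictor is zero, so the probability is exactly 0.5.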
One method to mitigate multicollinearity is to apply ___________ regression, which adds a penalty term to the loss function.
- Lasso
- Logistic
- Polynomial
- Ridge
Ridge regression is a technique that can mitigate multicollinearity by adding a penalty term to the loss function. The penalty shrinks the coefficients, reducing the instability caused by correlated variables and leading to more stable estimates.
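A minimal sketch of the closed-form ridge solution w = (XᵀX + λI)⁻¹Xᵀy for the two-feature case, written out by hand (the perfectly collinear data below is contrived to show that λ > 0 keeps the problem solvable where ordinary least squares would fail):

```python
def ridge_two_features(X, y, lam):
    """Closed-form ridge for 2 features: w = (X'X + lam*I)^-1 X'y."""
    a = sum(r[0] * r[0] for r in X) + lam   # X'X diagonal + penalty
    b = sum(r[0] * r[1] for r in X)         # X'X off-diagonal
    d = sum(r[1] * r[1] for r in X) + lam
    e = sum(r[0] * t for r, t in zip(X, y)) # X'y
    f = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b                     # invert the 2x2 matrix
    return ((d * e - b * f) / det, (a * f - b * e) / det)

# Perfectly collinear features: unregularized OLS has det = 0, ridge does not
w1, w2 = ridge_two_features([[1, 1], [2, 2], [3, 3]], [2, 4, 6], lam=1.0)
```

With λ = 0 the determinant here would be zero (the normal equations are singular); ridge splits the effect evenly across the correlated features instead.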
How is the Logit function related to Logistic Regression?
- It is a type of cost function
- It is an alternative name for Logistic Regression
- It's the inverse of the Sigmoid function and maps probabilities to log-odds
- It's used for multi-class classification
In Logistic Regression, the Logit function is the inverse of the Sigmoid function. It maps probabilities to log-odds and forms the link function in logistic modeling.
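The inverse relationship can be checked numerically in a few lines (a minimal sketch using only the standard library):

```python
import math

def sigmoid(z):
    """Maps log-odds to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-z))

def logit(p):
    """Inverse of sigmoid: maps a probability to log-odds."""
    return math.log(p / (1 - p))
```

Round-tripping a value through sigmoid and then logit recovers it, and a probability of 0.5 corresponds to log-odds of 0.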
In Decision Trees, the __________ is used to measure the impurity of a data partition or set.
- Accuracy
- Bias
- Gini Index
- Training set
In Decision Trees, the Gini Index is used to measure the impurity or disorder of a data partition or set. A lower Gini Index value indicates a purer node, and it is used to determine the best splits.
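The Gini Index for a node is 1 − Σ pₖ², where pₖ is the proportion of class k in the node. A minimal sketch:

```python
def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1 - sum((cnt / n) ** 2 for cnt in counts.values())
```

A pure node scores 0; a 50/50 binary split scores 0.5, the worst case for two classes.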
What are some advanced techniques to prevent overfitting in a deep learning model?
- Regularization, Dropout, Early Stopping, Data Augmentation
- Regularization, Dropout, Early Stopping, Over-sampling
- Regularization, Dropout, Late Stopping, Data Augmentation
- Regularization, Over-sampling, Early Stopping, Data Reduction
Advanced techniques such as "Regularization, Dropout, Early Stopping, and Data Augmentation" help prevent overfitting by constraining the weights, randomly deactivating neurons during training, halting training when validation performance stops improving, and expanding the training dataset, respectively.
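One of these techniques, dropout, can be sketched in plain Python (a toy "inverted dropout" over a list of activations; the drop rate and values are illustrative):

```python
import random

def dropout(activations, p, training=True, rng=None):
    """Inverted dropout: zero each unit with prob p, scale survivors by 1/(1-p)."""
    if not training or p == 0:
        return list(activations)        # dropout is disabled at inference time
    rng = rng or random.Random()
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]
```

Scaling the surviving activations by 1/(1−p) during training keeps their expected value unchanged, so no rescaling is needed at inference.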
How is the coefficient of determination (R-Squared) used in regression analysis?
- To describe the correlation between variables
- To detect multicollinearity
- To measure the goodness of fit of the model
- To select the best features
The coefficient of determination (R-Squared) is used to measure how well the regression model fits the observed data. It represents the proportion of variation in the dependent variable that is explained by the independent variables.
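Concretely, R² = 1 − SS_res / SS_tot, the residual sum of squares relative to the total sum of squares around the mean. A minimal sketch:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot: proportion of variance explained."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)           # total variation
    ss_res = sum((y, p) and (y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1 - ss_res / ss_tot
```

A perfect fit gives R² = 1; a model that always predicts the mean gives R² = 0.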
Suppose you're working on a dataset with both linear and nonlinear features predicting the target variable. What regression approach might you take?
- Combine Linear and Polynomial Regression
- Linear Regression only
- Logistic Regression
- Polynomial Regression only
When dealing with a dataset with both linear and nonlinear features, combining Linear and Polynomial Regression can be an effective approach. This allows the model to capture both the linear and nonlinear relationships in the data, providing a more accurate representation of the underlying patterns.
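In practice this combination amounts to building one design matrix that keeps the raw linear features and adds polynomial powers of the nonlinear one, then fitting an ordinary linear least-squares model on it (libraries such as scikit-learn do this with `PolynomialFeatures` plus `LinearRegression`). A minimal sketch of constructing such a row (the degree and feature split are illustrative):

```python
def design_row(linear_features, nonlinear_feature, degree=2):
    """Row of a combined design matrix: intercept, raw linear features,
    and polynomial powers of the feature with a nonlinear effect."""
    return ([1.0]
            + list(linear_features)
            + [nonlinear_feature ** d for d in range(1, degree + 1)])
```

The fitted model stays linear in its coefficients, so standard regression machinery applies unchanged.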
Can you explain the main types of clustering in Unsupervised Learning?
- Divisive, K-Means, Gaussian Mixture
- Hierarchical, Divisive
- Hierarchical, K-Means, Gaussian Mixture
- K-Means, Hierarchical, Neural Network
Clustering in Unsupervised Learning refers to grouping data points that are similar to each other. The main types include Hierarchical (building nested clusters), K-Means (partitioning data into 'K' clusters), and Gaussian Mixture (using probability distributions to form clusters).
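To make the K-Means idea concrete, here is a minimal one-dimensional sketch of Lloyd's algorithm (the points and starting centers below are made up): assign each point to its nearest center, then move each center to its cluster mean, and repeat.

```python
def kmeans_1d(points, centers, iters=10):
    """Toy 1-D K-Means (Lloyd's algorithm)."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # assign each point to the nearest current center
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # move each center to the mean of its assigned points
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers
```

On two well-separated groups the centers converge to the group means within a couple of iterations.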
You have a Multiple Linear Regression model that is performing poorly, and you suspect multicollinearity is the issue. How would you confirm this suspicion and rectify the problem?
- Add more features
- Check the VIF and apply regularization
- Guess the correlated variables
- Increase the number of observations
You can confirm multicollinearity by checking the Variance Inflation Factor (VIF) for the variables. If high VIF values are found, applying regularization methods like Ridge regression or feature selection techniques can help rectify the problem by penalizing or removing correlated variables.
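The VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on the others (statsmodels exposes this as `variance_inflation_factor`). For the two-predictor case that R² reduces to the squared Pearson correlation, which makes a hand-rolled sketch short (the data below is illustrative):

```python
def vif_two_predictors(x1, x2):
    """VIF of x1 given one other predictor x2: 1 / (1 - R^2),
    where R^2 is the squared correlation between x1 and x2."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    var1 = sum((a - m1) ** 2 for a in x1)
    var2 = sum((b - m2) ** 2 for b in x2)
    r2 = cov * cov / (var1 * var2)   # squared Pearson correlation
    return 1 / (1 - r2)
```

A common rule of thumb treats VIF above 5 or 10 as a sign of problematic multicollinearity; uncorrelated predictors give a VIF of exactly 1.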