In a situation where you have limited data, how would you decide between using Cross-Validation or Bootstrapping, and why?
- Always use Bootstrapping
- Always use Cross-Validation
- Choose based on computational resources
- Choose based on the model, the nature of the data, and the analysis objectives
Deciding between Cross-Validation and Bootstrapping with limited data depends on the model, the nature of the data, and the analysis objectives. Cross-Validation (for example, k-fold) uses every observation for both training and validation, which gives a robust estimate of out-of-sample performance. Bootstrapping resamples the data with replacement, which makes it well suited to estimating the variability of a statistic, such as a standard error or a confidence interval. The decision should be tailored to the specific scenario.
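The two resampling schemes can be sketched with the standard library alone. This is a minimal illustration, not a prescribed procedure. `k_fold_indices` and `bootstrap_std_error` are hypothetical helper names, and the data values are made up. The first helper produces k-fold train/validation splits. The second uses the bootstrap to estimate the standard error of a statistic.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds.

    Returns (train_indices, validation_indices) pairs; every index
    appears in exactly one validation fold.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]


def bootstrap_std_error(data, stat, n_boot=1000, seed=0):
    """Estimate the standard error of `stat` by resampling `data` with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(stat(resample))
    centre = sum(estimates) / n_boot
    return (sum((e - centre) ** 2 for e in estimates) / (n_boot - 1)) ** 0.5


def mean(xs):
    return sum(xs) / len(xs)


data = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 2.5]  # illustrative values only
for train, val in k_fold_indices(len(data), k=4):
    print("train:", train, "validate:", val)
print("bootstrap SE of the mean:", bootstrap_std_error(data, mean))
```

The folds answer "how well does a model generalize?", while the bootstrap answers "how uncertain is this estimate?". This is why the objective of the analysis drives the choice.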
Dimensionality reduction can help in mitigating the problem of ___________, which refers to the difficulties of analyzing data in a high-dimensional space.
- multicollinearity
- overfitting
- scaling problems
- the curse of dimensionality
The term "curse of dimensionality" refers to the difficulties that arise when analyzing and organizing data in high-dimensional spaces. Dimensionality reduction can mitigate this problem by reducing the number of dimensions, making the data more manageable.
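One concrete symptom of the curse of dimensionality is distance concentration. As the number of dimensions grows, the nearest and farthest points from a query end up at almost the same distance. The sketch below measures the relative contrast (max − min) / min of distances from a random query point to uniformly random points. It assumes points drawn uniformly from the unit hypercube, and `relative_contrast` is a hypothetical helper name.

```python
import math
import random

def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of distances from a random query point to random points in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, p))
    return (max(dists) - min(dists)) / min(dists)

for d in (2, 10, 100, 1000):
    print(d, relative_contrast(d))
```

The contrast shrinks sharply as `d` grows, so "near" and "far" lose meaning for distance-based methods. Reducing the number of dimensions counteracts this.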
In Simple Linear Regression, the method of _________ is often used to estimate the coefficients.
- Clustering
- Gradient Descent
- Least Squares
- Neural Networks
The method of least squares is commonly used in Simple Linear Regression to estimate the coefficients by minimizing the sum of squared errors.
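For a single predictor, least squares has a closed-form solution. The slope is the sum of cross-deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the point of means. The helper names below are hypothetical and the data is illustrative.

```python
def least_squares_fit(xs, ys):
    """Return (intercept, slope) that minimise the sum of squared residuals."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """The quantity least squares minimises."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.3, 7.9, 10.2]  # illustrative data
b0, b1 = least_squares_fit(xs, ys)
print("intercept:", b0, "slope:", b1)
print("SSE:", sum_squared_errors(xs, ys, b0, b1))
```

Nudging either coefficient away from the fitted values can only increase the SSE.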
What is the F1-Score, and why might you use it instead of Precision and Recall?
- Arithmetic mean of Precision and Recall
- Geometric mean of Precision and Recall
- Harmonic mean of Precision and Recall
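The F1-Score is the harmonic mean of Precision and Recall: F1 = 2 × Precision × Recall / (Precision + Recall). It condenses both metrics into a single number that stays low unless both are reasonably high. This makes it useful when you need a balance between Precision and Recall, especially with an uneven class distribution, where accuracy alone can be misleading.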
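The formula is easy to compute from confusion-matrix counts. The sketch below also shows why the harmonic mean is used: it punishes an imbalance between Precision and Recall much more than the arithmetic mean does. `precision_recall_f1` is a hypothetical helper name, and returning 0.0 for empty denominators is an assumed convention.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall and F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumed convention: 0.0 when undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# A classifier that finds only 1 of 100 positives but is never wrong when it fires
p, r, f1 = precision_recall_f1(tp=1, fp=0, fn=99)
print("precision:", p, "recall:", r)
print("arithmetic mean:", (p + r) / 2, "F1:", f1)
```

Here the arithmetic mean is about 0.5, while F1 is below 0.02. F1 correctly flags the model as nearly useless for finding positives.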
If the assumptions of Simple Linear Regression are violated, the coefficient estimates may become _________, and predictions may not be reliable.
- Biased
- Efficient
- Improved
- Optimized
If the assumptions of Simple Linear Regression are violated, the coefficient estimates may become biased, leading to unreliable predictions.
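One common way an assumption fails is an omitted variable that is correlated with the predictor. This breaks the assumption that the errors are unrelated to x. The simulation below is a sketch with made-up coefficients. The true model is y = 1.0·x + 2.0·z + noise, with z partly driven by x, and regressing y on x alone absorbs z's effect into the slope. The estimate should land near 1.0 + 2.0 × 0.8 = 2.6 rather than the true 1.0. All names and parameters are hypothetical.

```python
import random

def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (simple linear regression)."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

def omitted_variable_slope(n=5000, seed=42):
    """Simulate y = 1.0*x + 2.0*z + noise with z correlated with x, then regress y on x only."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [0.8 * xi + rng.gauss(0, 0.6) for xi in x]  # z depends on x
    y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return ols_slope(x, y)  # z is left out, so its effect leaks into the slope

print("true effect of x: 1.0, estimated:", omitted_variable_slope())
```

Adding more data does not fix this bias. The estimate converges to the wrong value.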
In LDA, the goal is to maximize the ___________ variance and minimize the ___________ variance.
- between-class, within-class
- data, features
- features, data
- within-class, between-class
In LDA, the goal is to maximize the between-class variance and minimize the within-class variance. This yields a projection, and hence a decision boundary, in which the classes are as well separated as possible.
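For two classes, this goal is Fisher's criterion J(w) = (wᵀ(m_a − m_b))² / (wᵀ S_W w). Here m_a and m_b are the class means and S_W is the within-class scatter matrix. J is maximised by w ∝ S_W⁻¹(m_a − m_b). The sketch below works in 2-D with plain lists. The helper names and data points are hypothetical.

```python
def mean_vec(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def within_scatter(class_a, class_b):
    """S_W: sum of (p - m)(p - m)^T over both classes, each around its own mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points in (class_a, class_b):
        m = mean_vec(points)
        for p in points:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return s

def fisher_criterion(w, class_a, class_b):
    """Between-class spread divided by within-class spread after projecting onto w."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    between = (w[0] * (ma[0] - mb[0]) + w[1] * (ma[1] - mb[1])) ** 2
    s = within_scatter(class_a, class_b)
    within = sum(w[i] * s[i][j] * w[j] for i in range(2) for j in range(2))
    return between / within

def fisher_direction(class_a, class_b):
    """w proportional to S_W^-1 (m_a - m_b), using the explicit 2x2 inverse."""
    ma, mb = mean_vec(class_a), mean_vec(class_b)
    s = within_scatter(class_a, class_b)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (ma[0] - mb[0], ma[1] - mb[1])
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.5), (2.0, 2.0)]
class_b = [(4.0, 1.0), (5.0, 2.0), (6.0, 2.5), (5.0, 1.0)]
w = fisher_direction(class_a, class_b)
for name, d in [("fisher", w), ("x-axis", [1.0, 0.0]), ("y-axis", [0.0, 1.0])]:
    print(name, fisher_criterion(d, class_a, class_b))
```

No other projection direction scores higher than the Fisher direction on this ratio.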
How can interaction effects be included in a Multiple Linear Regression model?
- By creating new variables for interactions
- By increasing model complexity
- By reducing variables
- By using more data
Interaction effects can be included by creating new variables that represent the product of two interacting variables, allowing for combined effects to be modeled.
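In code, an interaction term is just an extra column holding the product of two predictors. The sketch below is illustrative only. `add_interaction` and `predict` are hypothetical helpers, and the coefficients are made up rather than fitted. With a term b3·x1·x2, the effect of x1 on the prediction becomes b1 + b3·x2, so it depends on the value of x2.

```python
def add_interaction(rows, i, j):
    """Append the product of columns i and j to every row (hypothetical helper)."""
    return [list(row) + [row[i] * row[j]] for row in rows]

def predict(coefs, x1, x2):
    """y = b0 + b1*x1 + b2*x2 + b3*x1*x2 with hypothetical coefficients."""
    b0, b1, b2, b3 = coefs
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2

rows = [[2, 3], [1, 5], [4, 0]]
print(add_interaction(rows, 0, 1))

coefs = (1.0, 2.0, 3.0, 0.5)  # made-up values for illustration
for x2 in (0, 4):
    # Change in prediction when x1 goes from 0 to 1, holding x2 fixed
    print("x2 =", x2, "effect of x1:", predict(coefs, 1, x2) - predict(coefs, 0, x2))
```

Any standard Multiple Linear Regression routine can then estimate b3 like any other coefficient.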
In a marketing campaign, you want to predict the likelihood of a customer buying a product. How might the Odds Ratio be useful in interpreting the effect of different variables?
- By quantifying the correlation between variables
- By quantifying the effect of variables on the odds of buying
- By quantifying the effect of variables on the probability of buying
- By quantifying the relationship between input variables
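The Odds Ratio quantifies how a variable changes the odds of buying. In logistic regression, exponentiating a coefficient gives the multiplicative change in the odds for a one-unit increase in that variable: a ratio above 1 raises the odds, and a ratio below 1 lowers them. This lets marketers see which factors have the most significant impact on purchase likelihood.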
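A small worked example makes the distinction between odds and probability concrete. The counts below are hypothetical: 30 of 100 customers who received a coupon bought the product, versus 10 of 100 who did not. For a logistic regression with this single binary predictor, the fitted coefficient equals the natural log of this odds ratio. The helper names are illustrative.

```python
import math

def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)

def prob_from_odds(o):
    """Convert odds back into a probability."""
    return o / (1 + o)

def odds_ratio(buy_a, no_buy_a, buy_b, no_buy_b):
    """Odds of buying in group A divided by odds of buying in group B."""
    return (buy_a / no_buy_a) / (buy_b / no_buy_b)

# Hypothetical counts: coupon group (A) vs. no-coupon group (B)
or_coupon = odds_ratio(30, 70, 10, 90)
print("odds ratio:", or_coupon)                    # about 3.86
print("log-odds coefficient:", math.log(or_coupon))
# The ratio multiplies the odds, not the probability:
print(prob_from_odds(odds(0.10) * or_coupon))     # about 0.3, group A's buying rate
```

The coupon multiplies the odds of buying by roughly 3.86, yet the purchase probability only triples (from 0.1 to 0.3). The odds ratio is a statement about odds specifically.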
You're designing a system for image recognition with a need for real-time response. Which approach would be more appropriate: Machine Learning or Deep Learning, and why?
- Both are equally appropriate
- Deep Learning, for its advanced image recognition capabilities
- Machine Learning, for its simpler models
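Deep Learning, particularly Convolutional Neural Networks (CNNs), is highly effective for image recognition and is usually preferred for such tasks. Meeting the real-time requirement then depends mainly on inference speed, which is typically addressed with lightweight network architectures and hardware acceleration such as GPUs.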
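The core operation inside a CNN layer is a 2-D convolution. Strictly, it is a cross-correlation, which is what most deep learning frameworks call convolution. A small learned kernel slides over the image and responds to local patterns such as edges. The pure-Python sketch below uses stride 1 and no padding, with a hand-picked vertical-edge kernel standing in for a learned one. It is for illustration only; real-time systems rely on heavily optimised library and GPU implementations.

```python
def conv2d(image, kernel):
    """Valid 2-D cross-correlation, stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            out[r][c] = sum(image[r + i][c + j] * kernel[i][j]
                            for i in range(kh) for j in range(kw))
    return out

def relu(feature_map):
    """Element-wise ReLU, the usual non-linearity after a convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# Dark left half, bright right half: a single vertical edge
image = [[0, 0, 1, 1] for _ in range(4)]
vertical_edge = [[-1, 1], [-1, 1]]  # hand-picked kernel; a CNN would learn its weights
print(relu(conv2d(image, vertical_edge)))
```

The response is non-zero only in the column where the edge sits. A CNN stacks many such learned filters to build up from edges to whole objects.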
Which type of Machine Learning algorithm would be best suited for predicting a continuous value?
- Classification
- Clustering
- Regression
- Reinforcement Learning
Regression algorithms are designed to predict continuous values, such as stock prices or temperatures, by learning the relationship between independent and dependent variables.
If two attributes in a Decision Tree have the same entropy, the attribute with the __________ Gini Index would generally be preferred.
- Equal
- Higher
- Lower
- Random
If two attributes in a Decision Tree have the same entropy, the attribute with the lower Gini Index would generally be preferred. A lower Gini Index indicates a purer node and would typically result in a better split.
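Both impurity measures are simple functions of the class counts in a node. A split is scored by the size-weighted impurity of its children, and lower is better for either measure. The helper names and the two candidate splits below are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy (bits) of a node given its class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def weighted_impurity(children, impurity):
    """Impurity of a split: each child's impurity weighted by its share of samples."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * impurity(child) for child in children)

split_1 = [[8, 2], [2, 8]]   # hypothetical class counts in the two children
split_2 = [[10, 5], [0, 5]]
for name, split in [("split_1", split_1), ("split_2", split_2)]:
    print(name, "entropy:", weighted_impurity(split, entropy),
          "gini:", weighted_impurity(split, gini))
```

A pure node scores 0 under both measures. A 50/50 binary node is the worst case, with entropy 1 bit and Gini 0.5.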
What is the mathematical criterion that K-Means attempts to minimize, and how does it relate to centroid initialization?
- Maximizing centroid distances to data points
- Maximizing inter-cluster distance
- Minimizing the number of clusters
- Minimizing the sum of squared distances to centroids
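K-Means minimizes the within-cluster sum of squared distances from each point to its assigned centroid. Because the algorithm converges only to a local minimum, centroid initialization affects both how quickly this criterion is minimized and the quality of the final clusters. Common remedies are smarter seeding such as k-means++ and running several initializations, keeping the result with the lowest sum of squares.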
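The sketch below implements plain Lloyd's iterations with random initialization, using k distinct data points as the starting centroids. It records the sum of squared distances (SSE) at each step, which never increases. It then shows the restart remedy: run several seeds and keep the lowest-SSE result. The function names, the restart count, and the toy data are assumptions for illustration. The sketch uses random seeding, not k-means++.

```python
import math
import random

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def sse(points, centroids, labels):
    """The K-Means objective: sum of squared distances to assigned centroids."""
    return sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))

def kmeans(points, k, n_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random init from the data
    history = []
    for _ in range(n_iter):
        labels = assign(points, centroids)                 # assignment step
        history.append(sse(points, centroids, labels))
        new_centroids = []
        for c in range(k):                                 # update step
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centroids.append([sum(xs) / len(members) for xs in zip(*members)])
            else:
                new_centroids.append(centroids[c])         # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(points, centroids)
    return centroids, labels, sse(points, centroids, labels), history

def best_of_restarts(points, k, n_restarts=10):
    """Run K-Means from several random initializations and keep the lowest SSE."""
    runs = [kmeans(points, k, seed=s) for s in range(n_restarts)]
    return min(runs, key=lambda run: run[2])

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11),
          (11, 10), (11, 11), (20, 0), (20, 1), (21, 0), (21, 1)]
for s in range(3):
    print("seed", s, "SSE:", kmeans(points, 3, seed=s)[2])
print("best of restarts:", best_of_restarts(points, 3, n_restarts=30)[2])
```

Individual seeds can stall in a poor local minimum. Keeping the best of many runs makes finding the good solution much more likely.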