What is the effect of redundant features on a machine learning model?
- All of the above
- They can lead to overfitting
- They can reduce the interpretability of the model
- They can slow down the learning process
Redundant features can lead to overfitting, slow down the learning process, and reduce the interpretability of the model.
You are required to visualize the density of data on a single continuous variable. Which type of plot would you use and why?
- Scatter plot
- Line graph
- Kernel Density Plot
- Bar graph
A Kernel Density Plot is the best option to visualize the density of data on a single continuous variable. This plot provides a smooth curve that gives a clear idea about the density of the distribution.
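As a minimal sketch of what a kernel density plot draws, the snippet below builds a Gaussian kernel density estimate by hand using only the standard library. The sample values and the bandwidth of 0.5 are made up for illustration; in practice a plotting library would usually choose the bandwidth and draw the curve.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x using Gaussian kernels."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per sample, centred on that sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density

# Illustrative (made-up) observations of one continuous variable.
samples = [1.0, 1.2, 1.9, 2.1, 2.4, 3.0, 5.5]
density = gaussian_kde(samples, bandwidth=0.5)  # bandwidth chosen arbitrarily

# Evaluate the curve on a grid; these (x, y) pairs are what a plot would show.
step = 0.1
grid = [i * step for i in range(-30, 101)]
curve = [(x, density(x)) for x in grid]

# A density should integrate to roughly 1 (Riemann-sum check).
area = sum(y * step for _, y in curve)
print(f"area under curve: {area:.3f}")
print(f"density near 2.0: {density(2.0):.3f}, near 4.5: {density(4.5):.3f}")
```

The curve is higher around 2.0, where most of the sample values cluster, than around 4.5, which is the kind of information a scatter plot or bar graph of a single variable does not show directly.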
What is the key purpose of Predictive Modeling in data analysis?
- To confirm a pre-existing hypothesis
- To generate future data
- To make predictions based on the data
- To understand the underlying structure of the data
The key purpose of predictive modeling is to make predictions based on the data. Using various statistical techniques and machine learning algorithms, predictive modeling enables analysts to predict future outcomes based on historical data.
What are the limitations of using a modified Z-score for outlier detection?
- It assumes data is normally distributed
- It cannot handle missing values
- It is sensitive to extreme values
- It uses median instead of mean, which may not always be appropriate
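A limitation of the modified Z-score is that it relies on the median and the median absolute deviation (MAD) rather than the mean and standard deviation. That makes it robust to extreme values, but it is not always the best choice: for data that are close to normally distributed, the mean and standard deviation are more efficient estimators, and the score is undefined when the MAD is zero (for example, when more than half the values are identical).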
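The calculation can be sketched as follows, assuming the commonly used form 0.6745 * (x - median) / MAD and the conventional cut-off of 3.5. Both data sets are made up, and the function name is only illustrative.

```python
import statistics

def modified_z_scores(values):
    """Modified Z-score: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Happens when more than half of the values are identical.
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]

# Made-up data with one obvious extreme value.
data = [10, 11, 12, 12, 13, 14, 95]
scores = modified_z_scores(data)

THRESHOLD = 3.5  # conventional cut-off, an assumption rather than a rule
outliers = [v for v, s in zip(data, scores) if abs(s) > THRESHOLD]
print("outliers:", outliers)

# Degenerate case: most values are equal, so the MAD collapses to zero.
try:
    modified_z_scores([5, 5, 5, 5, 9])
    mad_failed = False
except ValueError:
    mad_failed = True
print("MAD-zero case rejected:", mad_failed)
```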
How are correlation coefficients affected when transformations are applied to the data?
- Correlation coefficients can change
- Correlation coefficients can decrease
- Correlation coefficients can increase
- Transformations do not affect correlation coefficients
Correlation coefficients can change when transformations are applied to the data, and the effect depends on both the transformation and the data. Pearson's coefficient is unchanged by positive linear rescaling (adding a constant or multiplying by a positive constant), but nonlinear transformations such as a logarithm can linearize a relationship, reduce skewness, or spread out tightly clustered values, any of which can raise or lower the coefficient.
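A small sketch of this effect, using a hand-written Pearson coefficient and made-up data in which y grows exponentially with x. Taking the logarithm of y linearizes the relationship, while a linear rescaling of x leaves the coefficient as it was.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y is an exponential function of x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(v) for v in x]

r_raw = pearson(x, y)                         # nonlinear relationship
r_log = pearson(x, [math.log(v) for v in y])  # log transform linearizes it
r_lin = pearson([2 * v + 3 for v in x], y)    # linear rescaling of x only

print(f"raw: {r_raw:.3f}, after log(y): {r_log:.3f}, after 2x+3: {r_lin:.3f}")
```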
For a multimodal distribution, which measure of central tendency may not be very informative?
- Mean
- Median
- Mode
For a multimodal distribution (a distribution with more than one peak), the mean may not be very informative. It can fall in the low-density region between the peaks, where few or no observations lie, so it does not represent a typical value.
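To make this concrete, here is a minimal sketch with made-up bimodal data: one cluster of values around 2 and another around 10. The mean lands in the gap between the two clusters.

```python
import statistics

# Made-up bimodal data: one cluster near 2, another near 10.
data = [1, 2, 2, 2, 3, 3, 9, 10, 10, 10, 11]

mean_value = statistics.mean(data)
modes = statistics.multimode(data)  # every value tied for most frequent

# Distance from the mean to the closest actual observation.
nearest_gap = min(abs(v - mean_value) for v in data)

print(f"mean: {mean_value:.2f}")
print(f"modes: {modes}")
print(f"closest observation is {nearest_gap:.2f} away from the mean")
```

In this sample no observation lies within 2 units of the mean, whereas the modes point at the two peaks directly.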
A ___________ skewness indicates that the data distribution is skewed to the left.
- Any of these
- Negative
- Positive
- Zero
Negative skewness describes a distribution whose left tail is longer or fatter than its right tail. In such distributions the mean is pulled toward the left tail, so the median and the mode are typically greater than the mean.
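As an illustration, the sketch below computes the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5) for a small made-up sample with a long left tail.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up sample: most values sit between 7 and 10, with one low value at 1.
data = [1, 7, 8, 8, 9, 9, 9, 10, 10]

skew = skewness(data)
mean_value = statistics.mean(data)
median_value = statistics.median(data)

print(f"skewness: {skew:.3f}")
print(f"mean: {mean_value:.3f}, median: {median_value}")
```

For this sample the coefficient is negative and the mean falls below the median, matching the left-skewed pattern described above.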
In what scenario would you choose standardization over Min-Max scaling?
- When the algorithm requires features to be on the same scale and the data is normally distributed
- When the maximum and minimum values are unknown
- When there are no outliers in the data
- When you need to normalize the distribution
You would choose standardization over Min-Max scaling when the algorithm requires features to be on the same scale and the data is normally distributed. Unlike Min-Max scaling, standardization does not bound values to a fixed range, which suits algorithms that do not require input features to lie within a specific interval.
What is the effect of standardization (z-score) on the mean and standard deviation of the dataset?
- It changes the mean to 0 and standard deviation to 1
- It changes the mean to 1 and standard deviation to 0
- It changes the mean to the median of the dataset and standard deviation to 1
- It doesn't affect the mean and standard deviation
Standardization changes the mean of the dataset to 0 and its standard deviation to 1. It shifts and rescales the values without changing the shape of the distribution, so the result matches a standard normal distribution only if the original data were normally distributed.
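A minimal sketch of both scalings on made-up data, using the population standard deviation for the z-score. The helper names are illustrative only.

```python
import statistics

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population std dev."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Min-Max scaling: map the smallest value to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Made-up feature with one large value.
data = [2.0, 4.0, 4.0, 5.0, 7.0, 100.0]

z = standardize(data)
m = min_max(data)

print(f"z-scores: mean={statistics.mean(z):.6f}, std={statistics.pstdev(z):.6f}")
print(f"z-score range: {min(z):.3f} to {max(z):.3f}")
print(f"min-max range: {min(m):.3f} to {max(m):.3f}")
```

The z-scores have mean 0 and standard deviation 1 but are not confined to a fixed interval, while the Min-Max values always span 0 to 1.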
Improper handling of missing data can affect the ________ of a model, thereby impacting its ability to generalize on unseen data.
- bias-variance tradeoff
- overfitting
- regularization
- underfitting
Improper handling of missing data can adversely affect the bias-variance tradeoff of a model. This can lead to issues such as overfitting or underfitting, which impact the model's ability to generalize to unseen data.
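One way such distortion can arise is sketched below, assuming a simple mean-imputation strategy on made-up values. Filling every gap with the mean keeps the mean unchanged but shrinks the variance, so the data look less spread out than they really are.

```python
import statistics

# Made-up feature with two missing entries (None).
raw = [2.0, 4.0, None, 8.0, None, 10.0]

observed = [v for v in raw if v is not None]
fill = statistics.mean(observed)

# Naive mean imputation: every gap receives the same value.
imputed = [fill if v is None else v for v in raw]

var_observed = statistics.variance(observed)
var_imputed = statistics.variance(imputed)

print(f"mean before: {statistics.mean(observed):.2f}, after: {statistics.mean(imputed):.2f}")
print(f"variance before: {var_observed:.2f}, after: {var_imputed:.2f}")
```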