Your EDA reveals a non-normal distribution of data in your dataset. How might this insight affect your choice of machine learning models or algorithms?
- You should always normalize your data
- You should use only non-parametric models
- You should use only unsupervised learning models
- Your choice of ML models might be influenced, as some models make certain assumptions about the data distribution
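The distribution of your data can influence which machine learning models or algorithms are appropriate. Some models make explicit distributional assumptions: linear regression assumes normally distributed residuals (which matters mainly for inference, such as confidence intervals and p-values), while linear discriminant analysis and Gaussian Naive Bayes assume the features are normally distributed within each class. Logistic regression, by contrast, does not require normally distributed inputs. If a model's assumptions are violated, it may perform poorly or give unreliable estimates. Understanding the data distribution can therefore guide you in choosing an appropriate model or in deciding whether to transform your data.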
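As a rough illustration of checking a distribution and deciding whether to transform it, the sketch below tests a feature for normality and applies a log transform. The data is synthetic (a log-normal sample standing in for a real skewed feature), and the helper name `looks_normal` and the 0.05 threshold are assumptions made for this example, not part of the quiz.

```python
import numpy as np
from scipy import stats

# Synthetic right-skewed feature (assumption: stands in for a real column)
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)


def looks_normal(values, alpha=0.05):
    """Hypothetical helper: True if Shapiro-Wilk does not reject normality."""
    _, p_value = stats.shapiro(values)
    return bool(p_value > alpha)


# Skewness near 0 suggests symmetry; large positive values mean a long right tail
skew_before = stats.skew(x)

# A log transform is one common option for strictly positive, right-skewed data
x_log = np.log(x)
skew_after = stats.skew(x_log)

print(f"raw:  skew={skew_before:.2f}, looks normal: {looks_normal(x)}")
print(f"log:  skew={skew_after:.2f}, looks normal: {looks_normal(x_log)}")
```

If the transformed feature still fails the check, a model with weaker distributional assumptions (for example, a tree-based model) may be a more suitable choice than further transformation.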
Related Quiz
- Under what conditions would the median be a better measure of central tendency than the mean?
- You have a dataset with many tied ranks. Which correlation coefficient would you prefer to use, and why?
- In the context of handling missing data, what does 'imputation' mean?
- Which machine learning models are more susceptible to the issue of feature redundancy?
- Outliers are _________ observations that lie an abnormal distance from other values in a dataset.