Your exploratory data analysis (EDA) reveals that the data in your dataset is not normally distributed. How might this insight affect your choice of machine learning models or algorithms?

  • You should always normalize your data
  • You should use only non-parametric models
  • You should use only unsupervised learning models
  • Your choice of ML models might be influenced, as some models make certain assumptions about the data distribution
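The distribution of data can influence the choice of machine learning models or algorithms. Some models make explicit assumptions about the data distribution. For example, ordinary least-squares linear regression assumes normally distributed errors (residuals) for its confidence intervals and hypothesis tests to be valid, and Gaussian Naive Bayes and linear discriminant analysis assume that features are normally distributed within each class. If these assumptions are violated, the model may perform poorly or its statistical conclusions may be unreliable. Other models, such as decision trees and random forests, make no such assumption. Therefore, understanding the data distribution can guide you in choosing the most appropriate models or in deciding whether to transform your data.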
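As an illustration of "deciding whether to transform your data", the sketch below measures the skewness of a feature and applies a log transform when the feature is strictly positive and strongly right-skewed. It is a minimal example under stated assumptions: the data is simulated from a log-normal distribution, the helper names `sample_skewness` and `suggest_transform` are hypothetical, and the skewness threshold of 1.0 is a common rule of thumb rather than a fixed standard.

```python
import numpy as np

def sample_skewness(x: np.ndarray) -> float:
    """Fisher-Pearson coefficient of skewness (biased, population form)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    m2 = np.mean(centered ** 2)  # second central moment (variance)
    m3 = np.mean(centered ** 3)  # third central moment
    return float(m3 / m2 ** 1.5)

def suggest_transform(x, skew_threshold=1.0):
    """Hypothetical helper: log-transform strictly positive, strongly right-skewed data.

    The threshold of 1.0 is an assumed rule of thumb, not a standard value.
    """
    x = np.asarray(x, dtype=float)
    skew = sample_skewness(x)
    if skew > skew_threshold and np.all(x > 0):
        return np.log(x), "log"
    return x, "none"

rng = np.random.default_rng(seed=0)
# Simulated right-skewed feature (e.g., incomes); log-normal by construction
feature = rng.lognormal(mean=0.0, sigma=1.0, size=1000)

transformed, applied = suggest_transform(feature)
print(f"skew before: {sample_skewness(feature):.2f}")
print(f"transform applied: {applied}")
print(f"skew after:  {sample_skewness(transformed):.2f}")
```

Because the simulated feature is log-normal, its logarithm is normally distributed, so the skewness drops close to zero after the transform. Such a transform matters mainly for models with distributional assumptions; tree-based models are generally insensitive to monotonic transforms of a feature.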