Imagine you are dealing with a large dataset where outliers are sporadically distributed across multiple variables. How would you decide which outlier handling method to use?
- Apply different methods for different variables
- Use removal for all variables
- Use transformation for all variables
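The best approach is to apply different methods to different variables. The right method depends on the nature of each variable and on the cause of its outliers: a value produced by a data-entry error can reasonably be removed or corrected, while a genuine extreme value in a skewed variable is usually better handled by a transformation or by capping.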
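As a rough illustration of choosing a method per variable, here is a minimal sketch in plain Python. The column names, the sample values, the `handle_outliers` helper, and the three strategy labels (`"remove"`, `"log"`, `"cap"`) are all hypothetical and chosen only for this example. Outliers are flagged with the common 1.5 × IQR rule, which is one convention among several.

```python
import math
import statistics


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def handle_outliers(columns, strategy):
    """Apply a per-column strategy ('cap', 'log', or 'remove').

    `columns` is a dict mapping column name to a list of numbers,
    with all lists the same length (one entry per row).
    """
    out = {name: list(vals) for name, vals in columns.items()}
    n_rows = len(next(iter(out.values())))
    keep = [True] * n_rows  # rows that survive every 'remove' rule

    for name, method in strategy.items():
        if method == "log":
            # Compress a right-skewed variable instead of dropping rows.
            out[name] = [math.log1p(v) for v in out[name]]
            continue
        if method not in ("cap", "remove"):
            raise ValueError(f"unknown method: {method}")
        lo, hi = iqr_fences(out[name])
        if method == "cap":
            # Pull extreme values back to the nearest fence.
            out[name] = [min(max(v, lo), hi) for v in out[name]]
        else:
            # Mark rows whose value falls outside the fences for removal.
            keep = [flag and lo <= v <= hi for flag, v in zip(keep, out[name])]

    return {
        name: [v for v, flag in zip(vals, keep) if flag]
        for name, vals in out.items()
    }


# Hypothetical data: each variable has a different kind of outlier.
data = {
    "age": [25, 31, 29, 40, 35, 28, 33, 420],          # 420 looks like an entry error
    "income": [30e3, 42e3, 38e3, 55e3, 47e3, 36e3, 44e3, 900e3],  # skewed but plausible
    "score": [61.0, 70.0, 66.0, 74.0, 68.0, 5.0, 72.0, 69.0],     # one extreme low value
}
cleaned = handle_outliers(
    data, {"age": "remove", "income": "log", "score": "cap"}
)
```

In this sketch the fences for each column are computed on the full data before any rows are dropped. Whether to recompute them after removal is a design choice that depends on the analysis.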
Related Quiz
- If missingness depends on unobserved data, the missing data mechanism is usually categorized as __________.
- When applying multiple imputation, increasing the number of imputations can help reduce the ____________.
- ________ correlation is more appropriate when dealing with ordinal variables.
- The _______ method of feature selection involves removing features one by one until the removal of further features decreases model accuracy.
- Suppose you have a dataset with many missing values and outliers. In which step of the EDA process would you primarily deal with these issues?