Suppose your machine learning model shows a significant shift in performance when transitioning from the training set to the test set. How could mishandling missing data contribute to this issue?
- It may have caused an imbalance in the data distribution between the sets.
- It may have caused overfitting.
- It may have led to the model learning irrelevant patterns.
- It may have led to underfitting.
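If missing data is handled differently in the training and test sets (for example, rows with missing values are dropped from one set but imputed in the other, or each set is imputed with its own statistics), the two sets can end up with different feature distributions. The model is then evaluated on data that no longer resembles what it was trained on, and its performance shifts.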
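As a rough illustration of what consistent handling can look like, the sketch below learns a fill value from the training set only and reuses it on the test set, then contrasts that with imputing the test set from its own mean. The helper names `fit_mean` and `impute` and the toy data are hypothetical, chosen only for this example; mean imputation is just one possible strategy.

```python
from statistics import mean


def fit_mean(values):
    """Return the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    return mean(observed)


def impute(values, fill_value):
    """Replace each None with fill_value, leaving observed values as-is."""
    return [fill_value if v is None else v for v in values]


# Hypothetical toy feature column; None marks a missing value.
train = [1.0, 2.0, None, 3.0]
test = [2.0, None, 4.0]

# Consistent: the fill value is learned from the training set only
# and reused unchanged on the test set.
train_fill = fit_mean(train)
train_imputed = impute(train, train_fill)
test_consistent = impute(test, train_fill)

# Inconsistent: the test set is imputed with its own mean, so a missing
# value is filled differently from what the model saw during training.
test_inconsistent = impute(test, fit_mean(test))

print(train_imputed)
print(test_consistent)
print(test_inconsistent)
```

In the consistent case, a missing value means the same thing to the model in both sets. In the inconsistent case, the preprocessing itself introduces a difference between the sets, on top of any real difference in the data.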