Imagine you're dealing with a classification model. The dataset has a significant amount of missing data that was replaced with the mean. How could this decision have impacted the model's performance?
- It could distort the feature's statistical properties.
- It could increase the model's accuracy.
- It could lead to overfitting.
- It could lead to underfitting.
Replacing missing data with the mean distorts the feature's statistical properties: the mean is preserved, but the variance shrinks and correlations with other features are weakened. This can affect what the model learns and how well it predicts.
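The variance effect can be checked on a small example. This is a minimal sketch using only the standard library; the feature values are made up for illustration, and `None` is assumed to mark a missing entry.

```python
import statistics

# Hypothetical feature values; None marks a missing entry.
values = [2.0, 4.0, None, 8.0, None, 10.0, None, 6.0]

observed = [v for v in values if v is not None]
mean_observed = statistics.mean(observed)

# Mean imputation: every missing entry becomes the observed mean.
imputed = [mean_observed if v is None else v for v in values]

var_observed = statistics.variance(observed)  # sample variance, observed values only
var_imputed = statistics.variance(imputed)    # sample variance after imputation

print(f"mean before: {mean_observed:.2f}, after: {statistics.mean(imputed):.2f}")
print(f"variance before: {var_observed:.2f}, after: {var_imputed:.2f}")
```

The imputed entries sit exactly at the mean, so they add nothing to the squared deviations while increasing the sample size, which is why the variance drops.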
What is the potential impact of outliers on the analysis of a dataset?
- All of these
- Can affect the statistical significance
- Can influence assumptions of the analysis
- Can lead to incorrect conclusions
Outliers can have significant effects on our conclusions and can affect the basic assumptions of our analyses. They can also impact the statistical significance of the data.
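As a rough illustration, the sketch below adds a single extreme point to a small made-up sample and compares summary statistics before and after. The data and the `pearson` helper are hypothetical, written here only for the example.

```python
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Made-up sample with only a weak linear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [3, 1, 4, 2, 5, 3]

# The same sample with one extreme observation appended.
x_out = x + [20]
y_out = y + [30]

r_clean = pearson(x, y)
r_outlier = pearson(x_out, y_out)

print(f"correlation without outlier: {r_clean:.2f}")
print(f"correlation with outlier:    {r_outlier:.2f}")
print(f"mean of y:   {statistics.mean(y):.2f} -> {statistics.mean(y_out):.2f}")
print(f"median of y: {statistics.median(y)} -> {statistics.median(y_out)}")
```

One point is enough to turn a weak correlation into a strong one and to pull the mean well away from the bulk of the data, while the median does not move.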
You are conducting a study on the effectiveness of a new drug. Patients rate their pain levels before and after the treatment on a scale of 1-10. What type of data are these ratings?
- Continuous data
- Nominal data
- Ordinal data
- Ratio data
Patients' pain levels are ordinal data: the ratings have a natural order (1-10), but the intervals between adjacent levels are not necessarily equal.
How many variables can a heatmap typically visualize at once?
- Any number
- Four
- Three
- Two
A heatmap can visualize any number of variables at once. In a correlation heatmap, for example, every variable appears as both a row and a column, and each cell uses color to encode the value for one pair of variables.
Which Python library is specifically useful for creating interactive plots?
- NumPy
- Plotly
- SciPy
- Seaborn
Plotly is a Python graphing library for building interactive, publication-quality graphs. It is well suited to interactive dashboards, data analysis, and visualizations.
Which key metric of model evaluation is most affected by mishandling missing data?
- Accuracy
- F1 Score
- Precision
- Recall
All of these metrics can be affected, but accuracy is often the one most affected by mishandling missing data. Poorly imputed values can cause the model to learn incorrect patterns, which results in inaccurate predictions.
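For reference, the four metrics listed above can all be derived from the same confusion counts. The sketch below assumes binary labels encoded as 0 and 1; the function name `classification_metrics` and the example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels and predictions for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Comparing these values for models trained with different missing-data strategies is one way to see which metric moves the most on a given dataset.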
How can pairwise deletion affect the correlation between variables?
- It can cause overfitting
- It can deflate the correlation
- It can inflate the correlation
- It can lead to underfitting
Pairwise deletion can inflate the correlation between variables. Each correlation is computed from a different subset of rows (those where both variables are present), so the estimates can be inconsistent with one another and overly optimistic.
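The "different subset for each pair" point can be made concrete with a small sketch. The rows, the column names `a`, `b`, `c`, and the helper `complete_pairs` are all hypothetical; `None` is assumed to mark a missing value.

```python
from itertools import combinations

# Made-up dataset; None marks a missing value.
rows = [
    {"a": 1.0,  "b": 2.0,  "c": None},
    {"a": 2.0,  "b": None, "c": 1.0},
    {"a": 3.0,  "b": 6.0,  "c": 2.0},
    {"a": None, "b": 8.0,  "c": 5.0},
    {"a": 5.0,  "b": 10.0, "c": 4.0},
    {"a": 6.0,  "b": 12.0, "c": None},
]

def complete_pairs(rows, u, v):
    """Pairwise deletion: keep rows where both u and v are present."""
    return [(r[u], r[v]) for r in rows
            if r[u] is not None and r[v] is not None]

# Number of rows that feed each pairwise correlation.
pair_counts = {
    (u, v): len(complete_pairs(rows, u, v))
    for u, v in combinations(["a", "b", "c"], 2)
}

# Listwise deletion for comparison: keep only fully observed rows.
listwise_count = sum(
    1 for r in rows if all(value is not None for value in r.values())
)

print("rows used per pair:", pair_counts)
print("rows used with listwise deletion:", listwise_count)
```

Here each pair draws on a different set of rows, so the resulting correlations do not describe one common sample, which is the source of the inconsistency described above.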
A _____ is a visualization tool that displays pairwise relationships in a dataset.
- bar chart
- histogram
- pairplot
- scatter plot
A pairplot is a visualization tool that displays pairwise relationships in a dataset. It shows all bivariate relationships between combinations of variables in a grid format, making it easy to visualize and compare all relationships simultaneously.
The Central Limit Theorem states that the sum of a large number of independent and identically distributed variables will approximately follow a _____ Distribution, regardless of the shape of the original distribution.
- Binomial
- Normal
- Poisson
- Uniform
The Central Limit Theorem states that the sum of a large number of independent and identically distributed variables with finite variance will approximately follow a Normal distribution, regardless of the shape of the original distribution.
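A quick simulation can make this plausible. The sketch below sums draws from a Uniform(0, 1) distribution, which is far from bell-shaped, and compares the sums with what a Normal distribution would predict. The sample sizes and the seed are arbitrary choices for the example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n_terms = 50    # number of uniform draws in each sum
n_sums = 5000   # number of sums to simulate

sums = [sum(random.random() for _ in range(n_terms)) for _ in range(n_sums)]

# A single Uniform(0, 1) draw has mean 1/2 and variance 1/12.
expected_mean = n_terms * 0.5
expected_sd = (n_terms / 12) ** 0.5

sample_mean = statistics.mean(sums)
sample_sd = statistics.stdev(sums)

# For a Normal distribution, about 68% of values fall within one
# standard deviation of the mean.
within_1sd = sum(
    1 for s in sums if abs(s - expected_mean) <= expected_sd
) / n_sums

print(f"mean: {sample_mean:.3f} (theory {expected_mean:.3f})")
print(f"sd:   {sample_sd:.3f} (theory {expected_sd:.3f})")
print(f"share within 1 sd: {within_1sd:.3f}")
```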
You are using a box plot to analyze a dataset and observe that the upper whisker is much longer than the lower whisker. What could this indicate about your data?
- Data has negative skewness
- Data has positive skewness
- Data is evenly distributed
- Data is normally distributed
If the upper whisker in a box plot is much longer than the lower whisker, it can indicate that the data has positive (right) skewness: the values above the median are spread over a wider range than the values below it, forming a longer right tail.
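The whisker lengths can also be computed directly. This sketch assumes the common Tukey convention, where each whisker extends to the most extreme data point within 1.5 times the interquartile range of the box; plotting libraries may use slightly different quartile definitions. The sample is made up.

```python
import statistics

# Made-up right-skewed sample.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 8, 11]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Tukey fences: points beyond them are drawn as individual outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme points still inside the fences.
lower_whisker_end = min(v for v in data if v >= lower_fence)
upper_whisker_end = max(v for v in data if v <= upper_fence)

lower_whisker_len = q1 - lower_whisker_end
upper_whisker_len = upper_whisker_end - q3

print(f"Q1={q1}, median={median}, Q3={q3}")
print(f"lower whisker length: {lower_whisker_len}")
print(f"upper whisker length: {upper_whisker_len}")
print(f"mean={statistics.mean(data):.2f} vs median={median}")
```

In this sample the upper whisker is longer than the lower one and the mean lies above the median, both of which are consistent with positive skew.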