If you want to represent both the distribution and density of data, a _____ plot is a good choice.
- Bar
- Line
- Scatter
- Violin
A violin plot is a good choice to represent both the distribution and density of data. It combines a box plot with a kernel density plot, so it shows summary statistics alongside the shape of the distribution.
What are the potential downsides of removing outliers from your dataset?
- It always improves the quality of the dataset
- It might discard important information
- It might introduce noise into the dataset
Removing outliers might discard potentially important information that could significantly influence the analysis results.
A correlation coefficient of 0 implies ________.
- A strong negative relationship
- A strong positive relationship
- An uncertain relationship
- No linear relationship
A correlation coefficient of 0 implies no linear relationship between the variables. However, it doesn't necessarily mean that there is no relationship at all, as the relationship could be non-linear.
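As a minimal sketch of this point, the example below computes the Pearson correlation for a perfectly quadratic relationship and for a linear one. The helper `pearson_r` and the sample values are illustrative assumptions, not part of the original question.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative data: x values symmetric around zero
xs = [-3, -2, -1, 0, 1, 2, 3]
ys_quadratic = [x ** 2 for x in xs]   # y is fully determined by x, but not linearly
ys_linear = [2 * x + 1 for x in xs]   # y is a linear function of x

r_quadratic = pearson_r(xs, ys_quadratic)
r_linear = pearson_r(xs, ys_linear)

print("quadratic:", r_quadratic)
print("linear:", r_linear)
```

Here the quadratic data has a coefficient of zero even though y depends entirely on x, which is why a scatter plot is worth checking before concluding that two variables are unrelated.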
In the EDA process, what does 'wrangling' refer to?
- Cleaning and transforming data
- Formulating hypothesis
- Interpreting data
- Visualizing data
Wrangling in the EDA process refers to the cleaning and transforming of data to facilitate subsequent analysis. This could involve addressing missing values, correcting inconsistencies, or reshaping the data structure.
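The three kinds of wrangling mentioned above could look roughly like this in pandas. The column names and values are made up for illustration, and mean imputation is only one of several possible ways to address missing values.

```python
import pandas as pd

# Hypothetical raw data with missing values and inconsistent labels
raw = pd.DataFrame({
    "city": [" Paris", "paris", "Berlin", None],
    "temp_jan": [5.0, None, 1.0, 3.0],
    "temp_feb": [6.0, 7.0, None, 4.0],
})

# Address missing values: drop rows with no city label
clean = raw.dropna(subset=["city"]).copy()

# Correct inconsistencies: normalize whitespace and capitalization
clean["city"] = clean["city"].str.strip().str.title()

# Address missing values: fill numeric gaps with each column's mean
temp_cols = ["temp_jan", "temp_feb"]
clean[temp_cols] = clean[temp_cols].fillna(clean[temp_cols].mean())

# Reshape: go from one column per month to one row per (city, month)
tidy = clean.melt(id_vars="city", var_name="month", value_name="temp")
print(tidy)
```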
What happens to a model's performance when missing data is not handled correctly?
- It depends on the model.
- It deteriorates.
- It improves.
- It remains the same.
When missing data is not handled correctly, it can distort the underlying data distribution and lead to incorrect model learning, ultimately deteriorating the model's performance.
The 'style' and 'context' functions in Seaborn are used to set the ___________ of the plots.
- aesthetic and context
- axis labels
- layout and structure
- size and color
The 'style' function in Seaborn (set_style) sets the overall aesthetic look of the plot, including background color, grids, and spines. The 'context' function (set_context) sets the context parameters, which scale plot elements such as fonts and line widths for the setting in which the plot will be presented (paper, notebook, talk, or poster).
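A short sketch of how these two settings might be applied, assuming Seaborn is installed. The particular choices of "whitegrid" and "talk" are arbitrary examples.

```python
import seaborn as sns

# Style controls the look: background, grid lines, spines
sns.set_style("whitegrid")

# Context controls the scale of fonts and lines for the presentation setting
sns.set_context("talk")

# Inspect the parameters that are now in effect
style_params = sns.axes_style()
context_params = sns.plotting_context()

print("grid enabled:", style_params["axes.grid"])
print("font size:", context_params["font.size"])
```

Both functions change global defaults, so they affect every plot drawn afterwards in the same session.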
How does the choice of the threshold affect the number of identified outliers using the Z-score method?
- A higher threshold identifies more outliers
- A lower threshold identifies more outliers
- It has no effect
- The threshold value is irrelevant in the Z-score method
A data point is flagged as an outlier when the absolute value of its Z-score exceeds the threshold. The lower the threshold, the more data points exceed it, and so more outliers are identified.
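The effect of the threshold can be sketched with the standard library alone. The function `zscore_outliers`, the sample values, and the use of the population standard deviation are assumptions made for this illustration.

```python
import statistics

def zscore_outliers(values, threshold):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    return [v for v in values if abs(v - mean) / std > threshold]

# Illustrative sample with one clearly extreme value
values = [10, 11, 12, 12, 13, 13, 14, 15, 20, 35]

flagged_strict = zscore_outliers(values, threshold=3.0)
flagged_medium = zscore_outliers(values, threshold=2.0)
flagged_loose = zscore_outliers(values, threshold=0.7)

for threshold, flagged in [(3.0, flagged_strict), (2.0, flagged_medium), (0.7, flagged_loose)]:
    print(f"threshold {threshold}: {len(flagged)} outlier(s) -> {flagged}")
```

On this sample, each step down in the threshold flags at least as many points as the one before it.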
Outliers are _________ observations that lie an abnormal distance from other values in a dataset.
- Anomalous
- Erroneous
- Random
- Statistical
Anomalous is the correct term. Outliers are anomalous observations that lie an abnormal distance from other values in a random sample from a population.
An unusually large or small value in a dataset, which does not follow the general trend of the rest of the data, is known as an ________.
- Aberration
- Anomaly
- Deviation
- Outlier
An unusually large or small value in a dataset, which does not follow the general trend of the rest of the data, is known as an outlier.
In a _____ plot, the width of the "violin" indicates the frequency or density of data.
- Bar
- Box
- Scatter
- Violin
In a violin plot, the width of the "violin" (the density curve mirrored on each side) varies with the estimated density of data points at a given level. The wider the violin at a value, the higher the density of data points around that value.
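To make the link between width and density concrete, here is a rough sketch of a Gaussian kernel density estimate, the kind of estimate a violin plot typically draws. The helper `gaussian_kde`, the sample data, and the bandwidth of 0.5 are all assumptions chosen for illustration; plotting libraries usually pick the bandwidth automatically.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return a function that estimates density at a point with a Gaussian kernel."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density

# Illustrative data: a large cluster near 2 and a smaller one near 6
data = [1.8, 1.9, 2.0, 2.0, 2.1, 2.2, 5.8, 6.0, 6.2]
density = gaussian_kde(data, bandwidth=0.5)

# Draw a crude sideways "violin": the bar length stands in for the width
for level in [2.0, 4.0, 6.0]:
    width = density(level)
    print(f"value {level:.1f}: " + "#" * round(width * 40))
```

The bar is longest at the larger cluster, shorter at the smaller one, and nearly empty in the gap between them, which mirrors how a violin bulges and narrows.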