How does negative kurtosis affect the tails of a data distribution?
- It has no effect on the tails of the distribution.
- It makes the distribution perfectly symmetrical.
- It makes the tails of the distribution heavier.
- It makes the tails of the distribution lighter.
Negative excess kurtosis describes a platykurtic distribution: its tails are lighter than those of a normal distribution, so extreme outliers are less frequent. Such distributions often look flatter around the center, but kurtosis is primarily a measure of tail weight rather than of peakedness or spread.
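To make the idea of tail weight concrete, here is a minimal sketch that computes population excess kurtosis by hand (fourth central moment divided by the squared second central moment, minus 3). The helper name `excess_kurtosis` and both sample lists are made up for illustration.

```python
from statistics import fmean

def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    mean = fmean(values)
    m2 = fmean([(x - mean) ** 2 for x in values])  # second central moment
    m4 = fmean([(x - mean) ** 4 for x in values])  # fourth central moment
    return m4 / m2 ** 2 - 3

# Illustrative data: evenly spread values with no extreme observations.
flat = list(range(1, 101))

# Illustrative data: mostly central values plus a few extreme ones.
heavy = [0] * 96 + [-10, 10, -12, 12]

print(excess_kurtosis(flat))   # negative: light tails (platykurtic)
print(excess_kurtosis(heavy))  # positive: heavy tails (leptokurtic)
```

The evenly spread sample comes out close to -1.2, the excess kurtosis of a uniform distribution, while the sample with a few extreme values is strongly positive.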
What type of plot is often used for visualizing the relationship between two continuous variables?
- Bar plot
- Box plot
- Histogram
- Scatter plot
Scatter plots are ideal for visualizing the relationship between two continuous variables. Each point in the scatter plot corresponds to the values of two variables.
What is the process of removing an entire row when any single data point within it is missing called?
- Listwise Deletion
- Mean Imputation
- Pairwise Deletion
- Regression Imputation
The process of removing an entire row when any single data point within it is missing is called 'Listwise Deletion'. Also known as 'Complete Case Analysis', this technique is straightforward and fast, but it can potentially discard valuable data and introduce bias if the missingness is not completely at random.
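As a rough sketch of listwise deletion, the snippet below drops every record that has at least one missing field. The records and the helper name `listwise_delete` are hypothetical, and `None` is assumed to mark a missing value. With pandas, `DataFrame.dropna()` performs the same row-wise removal by default.

```python
# Made-up records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000, "city": "Lyon"},
    {"age": None, "income": 61000, "city": "Oslo"},
    {"age": 29, "income": None, "city": "Turin"},
    {"age": 41, "income": 48000, "city": "Porto"},
]

def listwise_delete(records):
    """Keep only records in which no field is missing (None)."""
    return [r for r in records if all(v is not None for v in r.values())]

complete = listwise_delete(rows)
print(len(rows), "->", len(complete))
```

Half of the records are discarded here even though each incomplete row is missing only one field, which shows how quickly this approach can throw away usable data.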
What functionality does the Seaborn library add over Matplotlib?
- 3D plotting
- Interactive plotting
- Real-time plotting
- Statistical plotting
Matplotlib is a powerful library for creating a wide range of plots. Seaborn builds on it with high-level statistical plotting functions, allowing users to create more informative and attractive visualizations with fewer lines of code.
Which measure of central tendency can be used for both quantitative and qualitative data?
- Mean
- Median
- Mode
- None of the above
The "Mode" is the measure of central tendency that can be used for both quantitative and qualitative data. It is the value that appears most frequently in a data set, and it is the only measure of central tendency that can be used with nominal data.
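A small illustration using Python's standard `statistics` module shows the same function working on nominal and numeric data. The sample lists are invented for the example.

```python
from statistics import mode, multimode

colors = ["red", "blue", "red", "green", "blue", "red"]  # nominal data
scores = [3, 5, 5, 2, 5, 3]                              # quantitative data

print(mode(colors))  # most frequent category
print(mode(scores))  # most frequent number

# When several values tie, multimode returns all of them.
print(multimode(["a", "b", "a", "b"]))
```

A mean or median of `colors` would be meaningless, since the categories have no numeric value or order.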
Which method for dealing with missing data might introduce bias if the data is not missing completely at random?
- Listwise Deletion
- Mean/Median/Mode Imputation
- Pairwise Deletion
- Regression Imputation
Mean/Median/Mode Imputation might introduce bias if the data is not missing completely at random. If missingness follows a systematic pattern, the observed values are not representative of the whole, so filling gaps with their mean, median, or mode yields biased estimates. Because every missing entry receives the same value, this approach also understates the variability of the data.
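The effect can be sketched with a toy example in which the largest values are the ones that go missing. The "true" values are hypothetical and would not be known in practice; they are included only so the distortion is visible.

```python
from statistics import fmean, pvariance

# Hypothetical complete data; in a real dataset these would be unknown.
true_values = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]

# Missingness depends on the value itself: the three largest go missing.
observed = [v if v <= 22 else None for v in true_values]

present = [v for v in observed if v is not None]
fill = fmean(present)  # mean of the observed values only
imputed = [fill if v is None else v for v in observed]

print(fmean(true_values), fmean(imputed))          # imputed mean is too low
print(pvariance(true_values), pvariance(imputed))  # imputed variance is too small
```

Both the mean and the variance of the imputed series fall below those of the complete data, because the fill value is computed only from the smaller, observed values.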
You find that the Z-score and modified Z-score methods give different sets of outliers for the same dataset. How will you reconcile this?
- Assume the Z-score method is correct
- Assume the modified Z-score method is correct
- Consider the intersection of both methods
- Further inspect the data and the assumptions of each method
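When two methods give different sets of outliers, it's best to further inspect the data and the assumptions of each method before drawing conclusions. The Z-score relies on the mean and standard deviation, which are themselves pulled by extreme values, while the modified Z-score relies on the median and the median absolute deviation, which are more robust to them.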
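The following sketch shows one way the two methods can disagree. It assumes the commonly used cutoffs of 3 for the Z-score and 3.5 for the modified Z-score, along with the conventional 0.6745 scaling constant. The function names and the data are illustrative.

```python
from statistics import fmean, median, pstdev

def z_outliers(values, threshold=3.0):
    """Flag values whose distance from the mean exceeds threshold std devs."""
    mean, sd = fmean(values), pstdev(values)
    return [x for x in values if abs(x - mean) / sd > threshold]

def modified_z_outliers(values, threshold=3.5):
    """Flag values using the median and the median absolute deviation (MAD).

    Assumes the MAD is non-zero; a zero MAD would need separate handling.
    """
    med = median(values)
    mad = median([abs(x - med) for x in values])
    return [x for x in values if 0.6745 * abs(x - med) / mad > threshold]

# Made-up sample: eight clustered values and two large ones.
data = [10, 11, 12, 12, 13, 13, 14, 15, 40, 45]

print(z_outliers(data))           # nothing flagged
print(modified_z_outliers(data))  # the two large values are flagged
```

Here the two large values inflate the mean and standard deviation enough to hide themselves from the Z-score, while the median-based score still flags them. Neither result is automatically right, which is why checking the data and each method's assumptions matters.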
What is the underlying JavaScript library that Plotly uses to render its graphics?
- D3.js
- Node.js
- React.js
- jQuery
Plotly's JavaScript library, plotly.js, is built on D3.js (Data-Driven Documents) and uses it to render its graphics. D3.js is a JavaScript library for producing dynamic and interactive data visualizations in web browsers.
Readability in data visualization refers to how easily the audience can __________.
- Analyze the underlying code
- Download the graph
- Interact with the graph
- Understand the represented data
Readability in data visualization refers to how easily the audience can understand the represented data. This includes the clarity of text elements (labels, title, caption), color scheme, and whether the choice of plot type makes sense for the represented data.
In the context of handling missing data, what does 'imputation' mean?
- Adding artificial data
- Deleting data points
- Filling in missing data with substituted values
- Transforming data
In the context of handling missing data, 'imputation' refers to the process of filling in missing data with substituted values. These values can be determined in a variety of ways such as using measures of central tendency (mean, median, mode), predictive models, or other techniques.
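A minimal sketch of imputation by a measure of central tendency might look like the following. The `impute` helper, the `STRATEGIES` mapping, and the sample data are hypothetical, and `None` is assumed to mark a missing value. Model-based imputation is not covered here.

```python
from statistics import fmean, median, mode

# Maps a strategy name to the function that computes the fill value.
STRATEGIES = {"mean": fmean, "median": median, "mode": mode}

def impute(values, strategy="median"):
    """Replace None entries with a summary of the observed values."""
    observed = [v for v in values if v is not None]
    fill = STRATEGIES[strategy](observed)
    return [fill if v is None else v for v in values]

ages = [23, None, 31, 27, None, 95]  # made-up data with one extreme value

print(impute(ages, "mean"))    # fill value is pulled up by 95
print(impute(ages, "median"))  # fill value is less affected by 95
print(impute(["cat", None, "dog", "cat"], "mode"))  # works for categories
```

The choice of strategy changes the substituted value noticeably in this sample: the mean of the observed ages is 44, while the median is 29.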