While analyzing a dataset using a box plot, you notice that there are several data points plotted as circles. What might these circles represent?
- Data within the interquartile range
- Data within the whiskers
- Median values
- Outliers
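In a box plot, data points plotted as individual circles beyond the whiskers usually represent outliers. Under the common convention, these are values more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile.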
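As a rough illustration of which points end up drawn as circles, the sketch below computes the 1.5 × IQR fences by hand. The function name `tukey_outliers`, the sample values, and the choice of the "inclusive" quartile method are assumptions made for this example; plotting libraries may compute quartiles slightly differently.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return the values lying outside the k * IQR fences (hypothetical helper)."""
    # Quartiles via linear interpolation over the sample ("inclusive" method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Made-up sample: seven clustered values and one extreme value.
scores = [10, 12, 13, 14, 15, 16, 18, 45]
flagged = tukey_outliers(scores)
print(flagged)  # [45]
```

Here the quartiles are 12.75 and 16.5, so the upper fence is 22.125 and only 45 falls outside it.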
What is the key difference between 'removal' and 'transformation' of outliers?
- Removal changes the data distribution, while transformation does not
- Removal deals with extreme values, while transformation does not
- Removal discards outliers, while transformation modifies their values
- Removal is a type of data cleaning, while transformation is not
The key difference is that removal discards outliers from the dataset, while transformation keeps them but modifies their values (for example, by capping or log-scaling them) to reduce their impact.
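The contrast can be sketched on a small made-up sample. Capping values at the IQR fences is only one possible transformation, chosen here as an assumption for illustration; the helper name `iqr_fences` and the data are hypothetical.

```python
import statistics

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

data = [10, 12, 13, 14, 15, 16, 18, 45]
low, high = iqr_fences(data)

# Removal: the outlier is dropped, so the dataset gets shorter.
removed = [v for v in data if low <= v <= high]

# Transformation (capping): the outlier stays but is pulled back to the fence.
capped = [min(max(v, low), high) for v in data]

print(len(data), len(removed), len(capped))  # 8 7 8
print(capped[-1])                            # 22.125
```

Removal loses an observation, while capping preserves the sample size at the cost of altering a value.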
What is an outlier in the context of Exploratory Data Analysis?
- A data point that falls outside of the normal range
- A data point that is a duplicate
- A data point that is missing
- A frequently occurring data point
In statistics, an outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In simple terms, an outlier is a value that is significantly different from other similar values.
You're working with a dataset where a few observations are vastly different from the rest. Which method, Z-score or IQR, would be more robust to use for outlier detection?
- Either would work equally well
- IQR
- Neither would be effective
- Z-score
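The IQR method is more robust than the Z-score for outlier detection in this scenario. Z-scores are computed from the mean and standard deviation, both of which are themselves pulled toward extreme values, so the outliers can partly mask their own presence. The IQR is based on quartiles, which are largely unaffected by a few extreme observations.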
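A small comparison makes the masking effect concrete. This is a sketch under stated assumptions: the sample data is made up, the function names are hypothetical, and the cutoffs (|z| > 3 and 1.5 × IQR) are common conventions rather than fixed rules.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing to flag
    return [v for v in values if abs(v - mean) / std > threshold]

def iqr_outliers(values, k=1.5):
    """Flag values outside the k * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

data = [10, 12, 13, 14, 15, 16, 18, 45]
print(zscore_outliers(data))  # []
print(iqr_outliers(data))     # [45]
```

The value 45 inflates the standard deviation to about 10.5, so its own Z-score is only about 2.6 and it slips under the threshold, while the quartile-based fences still flag it.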
What is the underlying JavaScript library that Plotly uses to render its graphics?
- D3.js
- Node.js
- React.js
- jQuery
Plotly uses D3.js (Data-Driven Documents) under the hood to render its graphics. D3.js is a JavaScript library for producing dynamic and interactive data visualizations in web browsers.
Readability in data visualization refers to how easily the audience can __________.
- Analyze the underlying code
- Download the graph
- Interact with the graph
- Understand the represented data
Readability in data visualization refers to how easily the audience can understand the represented data. This includes the clarity of text elements (labels, title, caption), color scheme, and whether the choice of plot type makes sense for the represented data.
A teacher is analyzing test scores and finds that the distribution is bimodal, with one peak at 70 and another at 90. Which measure of central tendency might not be the best choice in this situation, and why?
- Mean, because it doesn't reflect the peaks
- Median, because it doesn't reflect the bimodality
- Mode, because there are two peaks
- None, because all are suitable
The mean might not be the best choice in this situation because it does not reflect the two peaks. It collapses the data into a single central value that falls between the peaks, which does not accurately represent either of the two distinct groups in a bimodal distribution.
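A quick check with made-up scores clustered around 70 and 90 illustrates the point. The data below is hypothetical and chosen only so that the two peaks are obvious.

```python
import statistics

# Hypothetical test scores with one cluster near 70 and another near 90.
scores = [68, 70, 70, 70, 72, 88, 90, 90, 90, 92]

mean_score = statistics.fmean(scores)
modes = statistics.multimode(scores)

print(mean_score)  # 80.0
print(modes)       # [70, 90]
print(scores.count(80))  # 0
```

The mean lands at 80, a score that no student in this sample actually received, whereas the modes recover both peaks.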
You are given a dataset that reports the salaries at a company. The CEO's salary is significantly higher than that of the other employees. Which measure of central tendency would give a more representative measure of the typical salary?
- Mean
- Median
- Mode
- None would be representative
The median would be a more representative measure of the typical salary. The CEO's salary is an outlier that pulls the mean upward, while the median, being the middle value of the sorted data, is barely affected by a single extreme value.
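The effect is easy to see on a toy payroll. The salary figures below are invented for illustration, with the last entry standing in for the CEO.

```python
import statistics

# Hypothetical salaries: five employees plus one CEO.
salaries = [40_000, 45_000, 50_000, 55_000, 60_000, 1_000_000]

mean_salary = statistics.fmean(salaries)
median_salary = statistics.median(salaries)

print(round(mean_salary))  # 208333
print(median_salary)       # 52500.0
```

The mean is higher than every salary except the CEO's, while the median sits in the middle of the ordinary employees' salaries.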
Which of the following graphs can help identify outliers in a univariate dataset?
- Bar Chart
- Box Plot
- Line Graph
- Pie Chart
A box plot can help identify outliers in a univariate dataset because it draws values that fall beyond the whiskers as individual points.
How does Spearman's correlation handle ties compared to Kendall's Tau?
- It doesn't handle ties
- It handles ties better than Kendall's Tau
- It handles ties worse than Kendall's Tau
- The method of handling ties is the same
Spearman's correlation coefficient handles ties worse than Kendall's Tau. Both are rank correlation coefficients, but Spearman's correlation deals with ties only by assigning each member of a tied group the mean of the ranks the group would have received if the values weren't tied, whereas the Tau-b variant of Kendall's Tau includes an explicit adjustment for ties.
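The mean-rank rule used for ties can be sketched in a few lines. The function name `average_ranks` is hypothetical and the implementation is a minimal illustration of the ranking step only, not of the full Spearman computation.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend 'end' to cover the whole run of equal values.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Positions start..end are 0-based, so add 1 for 1-based ranks.
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```

The two tied values would have occupied ranks 2 and 3, so each receives 2.5. SciPy's `scipy.stats.rankdata` applies the same "average" rule by default.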