While analyzing a dataset using a box plot, you notice that there are several data points plotted as circles. What might these circles represent?

  • Data within the interquartile range
  • Data within the whiskers
  • Median values
  • Outliers
In a box plot, data points plotted as circles often represent outliers.

What is the key difference between 'removal' and 'transformation' of outliers?

  • Removal changes the data distribution, while transformation does not
  • Removal deals with extreme values, while transformation does not
  • Removal discards outliers, while transformation modifies their values
  • Removal is a type of data cleaning, while transformation is not
The key difference between 'removal' and 'transformation' of outliers is that removal discards outliers from the dataset, while transformation modifies the values of outliers to reduce their impact.

What is an outlier in the context of Exploratory Data Analysis?

  • A data point that falls outside of the normal range
  • A data point that is a duplicate
  • A data point that is missing
  • A frequently occurring data point
In statistics, an outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In simple terms, an outlier is a value that is significantly different from other similar values.

You're working with a data set where a few observations are vastly different from the rest. Which method, Z-score or IQR, would be more robust to use for outlier detection?

  • Either would work equally well
  • IQR
  • Neither would be effective
  • Z-score
The IQR method is more robust than Z-score for outlier detection in this scenario, as Z-scores can be significantly affected by extreme values.

What is the underlying JavaScript library that Plotly uses to render its graphics?

  • D3.js
  • Node.js
  • React.js
  • jQuery
Plotly uses D3.js (Data-Driven Documents) under the hood to render its graphics. D3.js is a JavaScript library for producing dynamic and interactive data visualizations in web browsers.

Readability in data visualization refers to how easily the audience can __________.

  • Analyze the underlying code
  • Download the graph
  • Interact with the graph
  • Understand the represented data
Readability in data visualization refers to how easily the audience can understand the represented data. This includes the clarity of text elements (labels, title, caption), color scheme, and whether the choice of plot type makes sense for the represented data.

You have a data set with a large number of outliers. Which measure of dispersion should you use to best describe the data set, and why?

  • Interquartile range (IQR) because it is robust to outliers
  • Range because it covers all values
  • Standard deviation because it gives the average spread
  • Variance because it squares the differences
When dealing with a large number of outliers in a data set, the "Interquartile range (IQR)" is the most suitable measure of dispersion. This is because it measures the statistical spread between the 25th and 75th percentiles, thus excluding outliers.

A teacher is analyzing test scores and finds that the distribution is bimodal, with one peak at 70 and another at 90. Which measure of central tendency might not be the best choice in this situation, and why?

  • Mean, because it doesn't reflect the peaks
  • Median, because it doesn't reflect the bimodality
  • Mode, because there are two peaks
  • None, because all are suitable
The "Mean" might not be the best choice in this situation because it does not reflect the two peaks. The mean would give a single central value, which does not accurately represent the two distinct groups in a bimodal distribution.

You are given a dataset where the salaries of a company are reported. The CEO's salary is significantly higher than the rest of the employees. Which measure of central tendency would give a more representative measure of the typical salary?

  • Mean
  • Median
  • Mode
  • None would be representative
The "Median" would be a more representative measure of the typical salary. Because the CEO's salary is an outlier and would significantly skew the mean, the median provides a more accurate central measure by considering the middle value in the sorted data.

Which of the following graphs can help identify outliers in a univariate dataset?

  • Bar Chart
  • Box Plot
  • Line Graph
  • Pie Chart
A box plot is a type of graph that can help identify outliers in a univariate dataset.