How does the IQR method categorize a data point as an outlier?

  • By comparing it to the mean
  • By comparing it to the median
  • By comparing it to the standard deviation
  • By seeing if it falls below Q1-1.5IQR or above Q3+1.5IQR
The IQR method categorizes a data point as an outlier by checking whether it falls below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 - Q1 is the spread of the middle 50% of the data.
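A minimal sketch of the rule, assuming NumPy is available and using a small hypothetical dataset. Note that `np.percentile` defaults to linear interpolation between data points, and other tools may use slightly different quartile conventions, which can shift the fences a little.

```python
import numpy as np

def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(values, [25, 75])  # NumPy's default linear interpolation
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical sample with one suspicious value
data = np.array([12, 13, 13, 14, 15, 15, 16, 17, 18, 45], dtype=float)
lower, upper = iqr_fences(data)

# A point is an outlier if it lies outside the fences
outliers = data[(data < lower) | (data > upper)]
print(lower, upper)  # 8.0 22.0
print(outliers)      # [45.]
```

Here Q1 = 13.25 and Q3 = 16.75, so IQR = 3.5 and only 45 lies outside the fences.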

You're working with a data set that does not follow a normal distribution. Which method, Z-score or IQR, should be used for detecting outliers?

  • Both are suitable
  • IQR
  • Neither is suitable
  • Z-score
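In this case, the IQR method is the better choice because it is based on quartiles and does not assume any particular distribution. The Z-score method relies on the mean and standard deviation, and its usual cutoff (for example |z| > 3) is only meaningful when the data are roughly normally distributed.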
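To illustrate the difference, here is a small sketch on a hypothetical right-skewed sample, assuming a Z-score threshold of 3 and the standard 1.5 × IQR fences. The long tail inflates the mean and standard deviation, so the Z-score rule flags nothing, while the quartile-based fences still flag the extreme values.

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    data = np.asarray(values, dtype=float)
    z = (data - data.mean()) / data.std()  # population standard deviation (ddof=0)
    return data[np.abs(z) > threshold].tolist()

def iqr_outliers(values, k=1.5):
    data = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    return data[(data < q1 - k * iqr) | (data > q3 + k * iqr)].tolist()

# Hypothetical right-skewed data: most values are small, a few are large
skewed = [1, 1, 2, 2, 2, 3, 3, 3, 4, 5, 20, 25]
print(zscore_outliers(skewed))  # []
print(iqr_outliers(skewed))     # [20.0, 25.0]
```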

You are visualizing a heatmap and notice a row with colors drastically different than the rest. What might this indicate about the corresponding variable?

  • The variable has a unique distribution
  • The variable has many missing values
  • The variable is an outlier
  • The variable is unrelated to the others
If a row in a correlation heatmap has colors that differ drastically from the rest, it may indicate that the corresponding variable is unrelated to the other variables in the dataset, or that it relates to them very differently than they relate to one another.
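A sketch of how such a row arises, assuming the heatmap is drawn from a Pearson correlation matrix. The variables `a`, `b`, `c`, and `noise` are hypothetical and simulated with a fixed seed; `noise` is generated independently of the others, so its row in the matrix stays near 0 while the other rows show strong correlations.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
n = 1_000
a = rng.normal(size=n)

df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.5, size=n),  # strongly positively related to a
    "c": -a + rng.normal(scale=0.5, size=n),     # strongly negatively related to a
    "noise": rng.normal(size=n),                 # generated independently of a, b, c
})

corr = df.corr()  # Pearson correlation matrix, the usual input to a heatmap
print(corr.round(2))
```

To draw the heatmap itself, the matrix could be passed to Seaborn, for example `sns.heatmap(corr, annot=True, vmin=-1, vmax=1)`; the `noise` row would then appear in near-neutral colors while the other rows are strongly colored.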

How does standard deviation differ in a sample versus a population?

  • The denominator in the calculation of the sample standard deviation is (n-1)
  • The standard deviation of a sample is always larger
  • The standard deviation of a sample is always smaller
  • They are calculated in the same way
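The standard deviation of a sample differs from that of a population in how it is calculated. For a sample, the sum of squared deviations is divided by (n-1) instead of n. This is Bessel's correction, which compensates for the bias introduced by measuring deviations around the sample mean rather than the unknown population mean.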
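A minimal sketch of both formulas using only the standard library, checked against `statistics.pstdev` (population) and `statistics.stdev` (sample). The data values are a hypothetical example chosen so the population result is exact.

```python
import math
import statistics

def std_dev(values, sample=True):
    """Standard deviation: divide by n - 1 for a sample, by n for a population."""
    n = len(values)
    mean = sum(values) / n
    squared_dev = sum((x - mean) ** 2 for x in values)
    denominator = n - 1 if sample else n  # Bessel's correction when sample=True
    return math.sqrt(squared_dev / denominator)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev(data, sample=False))  # 2.0
print(std_dev(data, sample=True))   # about 2.138
```

For the same list of numbers, the (n-1) version is always slightly larger when n > 1. That is a property of the formula, not a claim that a sample's standard deviation always exceeds the population's.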

What does a correlation coefficient close to 0 indicate about the relationship between two variables?

  • A perfect negative linear relationship
  • A perfect positive linear relationship
  • A very strong linear relationship
  • No linear relationship
A correlation coefficient close to 0 indicates little or no linear relationship between the two variables: changes in one variable are not consistently associated with proportional changes in the other. It does not necessarily mean that there is no relationship at all, as there may be a non-linear relationship.
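A small sketch of the last point, computing Pearson's r by hand on hypothetical data. With x symmetric around 0, y = x² is completely determined by x, yet the linear correlation is exactly 0; a linear relationship such as y = 2x + 1 gives r = 1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [-3, -2, -1, 0, 1, 2, 3]
r_quadratic = pearson_r(x, [v ** 2 for v in x])  # 0.0: strong but non-linear relationship
r_linear = pearson_r(x, [2 * v + 1 for v in x])  # 1.0 up to floating-point rounding
print(r_quadratic, r_linear)
```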

What step comes after 'wrangling' in the EDA process?

  • Communicating
  • Concluding
  • Exploring
  • Questioning
Once the data has been 'wrangled', i.e., cleaned and transformed, the next step in the EDA process is 'exploring'. This stage involves examining the data through summary statistics and visualizations.
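As a rough sketch of how the two steps hand off in pandas, assuming a small hypothetical table: wrangling fixes types and removes incomplete and duplicate rows, and exploring then starts from summary statistics and correlations (plots such as histograms would typically follow).

```python
import pandas as pd

# Hypothetical raw data as it might arrive: ages stored as text, some gaps, a duplicate row
raw = pd.DataFrame({
    "age": ["34", "29", None, "41", "29", "52"],
    "income": [52000, 48000, 61000, None, 48000, 75000],
})

# Wrangling: convert types, drop incomplete rows, drop exact duplicates
clean = (
    raw.assign(age=pd.to_numeric(raw["age"]))
       .dropna()
       .drop_duplicates()
)

# Exploring: summary statistics and relationships between variables
summary = clean.describe()
print(summary)
print(clean.corr())
```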

You find that both Z-score and modified Z-score methods give different sets of outliers for the same dataset. How will you reconcile this?

  • Assume the Z-score method is correct
  • Assume the modified Z-score method is correct
  • Consider the intersection of both methods
  • Further inspect the data and the assumptions of each method
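When two methods flag different sets of outliers, it's best to inspect the data further and check the assumptions of each method before drawing conclusions. The standard Z-score uses the mean and standard deviation, which are themselves pulled toward extreme values, whereas the modified Z-score uses the median and the median absolute deviation (MAD) and is therefore more robust. Simply trusting one method, or keeping only the points both methods agree on, can hide genuine outliers.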
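A sketch showing how the two methods can disagree on the same hypothetical data. It assumes the conventional choices of |z| > 3 for the Z-score and 0.6745 × (x - median) / MAD with a cutoff of 3.5 for the modified Z-score, and it assumes MAD > 0 (the division fails otherwise).

```python
import numpy as np

def zscore_outliers(values, threshold=3.0):
    data = np.asarray(values, dtype=float)
    z = (data - data.mean()) / data.std()  # mean and std are inflated by the extreme value
    return data[np.abs(z) > threshold].tolist()

def modified_zscore_outliers(values, threshold=3.5):
    data = np.asarray(values, dtype=float)
    median = np.median(data)
    mad = np.median(np.abs(data - median))  # median absolute deviation; assumed > 0
    modified_z = 0.6745 * (data - median) / mad
    return data[np.abs(modified_z) > threshold].tolist()

data = [10, 10, 11, 11, 12, 12, 13, 100]
print(zscore_outliers(data))           # []
print(modified_zscore_outliers(data))  # [100.0]
```

Inspecting the data explains the disagreement: with only 8 points, a Z-score computed with the population standard deviation can never exceed √(n - 1) ≈ 2.65, so the cutoff of 3 cannot be reached, while the median-based score is barely affected by the value 100.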

To create multiple plots in one figure in Matplotlib, you would use the ___________ function.

  • heatmap
  • pairplot
  • subplot
  • violinplot
The 'subplot' function in Matplotlib (plt.subplot, or the closely related plt.subplots) is used to create multiple plots in a single figure by arranging them in a grid of rows and columns.
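A minimal sketch of a 2 × 2 grid built with `plt.subplot(nrows, ncols, index)`, where the index counts panels from 1. The plotted functions are arbitrary examples, and the non-interactive "Agg" backend is an assumption so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)
funcs = [("sin", np.sin), ("cos", np.cos), ("tanh", np.tanh), ("square", np.square)]

fig = plt.figure(figsize=(8, 6))
for i, (name, func) in enumerate(funcs, start=1):
    ax = plt.subplot(2, 2, i)  # 2 rows, 2 columns, i-th panel (1-based)
    ax.plot(x, func(x))
    ax.set_title(name)

fig.tight_layout()  # reduce overlap between panel titles and axis labels
```

The same layout can be created in one call with `fig, axes = plt.subplots(2, 2)`, which returns a 2 × 2 array of axes to index into.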

What is the full form of NMAR in the context of missing data?

  • Never Missing At Random
  • No Missing At Random
  • Not Measured At Random
  • Not Missing At Random
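In the context of missing data, NMAR stands for Not Missing At Random (also written MNAR, Missing Not At Random). It means that the probability of a value being missing depends on the unobserved value itself, for example when people with very high incomes are less likely to report their income.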
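A small simulation of why NMAR matters, assuming a hypothetical survey in which everyone with an income above 55 (in thousands) declines to answer. Because missingness depends on the missing values themselves, simply analyzing the reported values gives a biased estimate.

```python
import statistics

# Hypothetical true incomes, in thousands
true_incomes = [20, 25, 30, 35, 40, 45, 50, 60, 80, 120]

# Assumed NMAR mechanism: values above 55 go unreported
observed = [x if x <= 55 else None for x in true_incomes]
reported = [x for x in observed if x is not None]

true_mean = statistics.mean(true_incomes)
naive_mean = statistics.mean(reported)
print(true_mean, naive_mean)  # 50.5 35
```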

The _________ library in Python allows for the creation of complex animated plots and provides widgets to allow for interactive plots.

  • Bokeh
  • Matplotlib
  • Plotly
  • Seaborn
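Bokeh is a library for building interactive, browser-based plots. It supports animated and streaming plots (for example, through periodic callbacks in a Bokeh server app) and includes widgets such as sliders, buttons, and dropdown menus that can be linked to plots, which makes it well suited to dynamic, interactive visualizations.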
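A minimal sketch of a widget-driven plot, assuming Bokeh 3.x. A `Slider` is linked to a line plot through a `CustomJS` callback that recomputes the curve in the browser, so no Python server is needed; the sine curve and parameter values are arbitrary examples. The result is rendered to a standalone HTML string with `file_html`.

```python
import numpy as np
from bokeh.embed import file_html
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, CustomJS, Slider
from bokeh.plotting import figure
from bokeh.resources import CDN

x = np.linspace(0, 10, 200)
source = ColumnDataSource(data={"x": x, "y": np.sin(x)})

plot = figure(width=500, height=300, title="y = sin(f * x)")
plot.line("x", "y", source=source, line_width=2)

slider = Slider(start=0.1, end=5, value=1, step=0.1, title="Frequency f")

# JavaScript that runs in the browser whenever the slider value changes
callback = CustomJS(args=dict(source=source, slider=slider), code="""
    const f = slider.value;
    const x = source.data.x;
    const y = Array.from(x, (t) => Math.sin(f * t));
    source.data = { x, y };
""")
slider.js_on_change("value", callback)

layout = column(slider, plot)
html = file_html(layout, CDN, "Bokeh slider demo")  # standalone HTML page as a string
```

Writing `html` to a file and opening it in a browser shows the plot updating as the slider moves.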