Which of the following graphs can help identify outliers in a univariate dataset?

  • Bar Chart
  • Box Plot
  • Line Graph
  • Pie Chart
A box plot is a type of graph that can help identify outliers in a univariate dataset.
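
A box plot flags outliers because its whiskers conventionally stop at 1.5 times the interquartile range (IQR) beyond the quartiles, and points past the whiskers are drawn individually. The sketch below applies that rule numerically. The function name `iqr_outliers` and the sample values are made up for illustration, and the 1.5 multiplier is the common convention rather than a fixed rule.

```python
import statistics

def iqr_outliers(values):
    """Return the values lying beyond the 1.5 * IQR fences (hypothetical helper)."""
    # Quartiles from the standard library; "inclusive" treats the data as
    # the whole population rather than a sample from a larger one.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Illustrative data (made up): one value sits far above the rest.
sample = [10, 12, 12, 13, 14, 15, 16, 18, 45]
print(iqr_outliers(sample))
```

For this sample the quartiles are 12 and 16, so the fences fall at 6 and 22 and only the value 45 lies outside them.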

How does Spearman's correlation handle ties compared to Kendall's Tau?

  • It doesn't handle ties
  • It handles ties better than Kendall's Tau
  • It handles ties worse than Kendall's Tau
  • The method of handling ties is the same
Spearman's correlation coefficient handles ties worse than Kendall's Tau. Both are rank correlation coefficients, but they treat ties differently. Spearman's correlation assigns every value in a tied group the mean of the ranks the group would have received if the values weren't tied, whereas Kendall's Tau (in its tau-b form) corrects for ties explicitly in its formula.
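
The mean-rank treatment of ties can be sketched in a few lines. The helper name `average_ranks` and the input values are illustrative assumptions, not part of any library. Spearman's coefficient would then be computed on ranks produced this way.

```python
def average_ranks(values):
    """Rank values from 1..n, giving each tied group the mean of its ranks."""
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` to cover every value equal to the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions are 0-based and ranks are 1-based, hence the +1.
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

# The two 20s would occupy ranks 2 and 3, so each receives 2.5.
print(average_ranks([10, 20, 20, 30]))
```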

In the context of outlier detection, a data point with a Z-score above or below _______ is typically considered an outlier.

  • 1.5
  • 2
  • 2.5
  • 3
A data point with a Z-score above 3 or below -3 is usually considered an outlier. However, this threshold can vary depending on the context.
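
As a minimal sketch of this rule, the function below standardizes each value and keeps those whose absolute Z-score exceeds the threshold. The name `zscore_outliers`, the use of the population standard deviation, and the sample data are assumptions made for illustration. The threshold is a parameter because, as noted, 3 is a convention rather than a fixed cutoff.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds `threshold` (hypothetical helper)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (an assumption)
    if std == 0:
        # All values are identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs((v - mean) / std) > threshold]

# Illustrative data (made up): twenty identical readings and one extreme value.
sample = [10] * 20 + [100]
print(zscore_outliers(sample))
```

In very small samples a single extreme value inflates the standard deviation so much that it may never reach the threshold, which is one reason the cutoff is sometimes lowered.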

Even after concluding, it's crucial to '______' effectively in the EDA process, as this step is where your findings are shared and potentially acted upon.

  • communicate
  • conclude
  • question
  • wrangle
Even after concluding, it's crucial to 'communicate' effectively in the EDA process, as this step is where your findings are shared and potentially acted upon. Communication is not only about presenting the findings, but also about making sure that they are understood and can be acted upon.

Suppose you are using a correlation matrix to understand the relationship between multiple features. You come across a correlation coefficient of -0.85 between two features. What does this indicate?

  • A strong negative linear relationship
  • A strong positive linear relationship
  • A weak positive linear relationship
  • No relationship
A correlation coefficient of -0.85 indicates a strong negative linear relationship between two features. This means that as one feature increases, the other tends to decrease.
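
Each entry of a correlation matrix is a pairwise coefficient, which can be computed by hand as below. This is a sketch of Pearson's r under the assumption that the matrix uses Pearson correlation. The function name `pearson_r` and the two feature lists are made up for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Illustrative features (made up): feature_b mostly falls as feature_a rises.
feature_a = [1, 2, 3, 4, 5, 6]
feature_b = [12, 9, 10, 6, 5, 2]
r = pearson_r(feature_a, feature_b)
print(round(r, 2))  # negative and close to -1: a strong negative linear relationship
```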

Replacing missing values with the median of the existing values is known as _____ imputation.

  • Mean
  • Median
  • Mode
  • Pairwise
Replacing missing values with the median of the existing values is known as 'median' imputation. This technique is useful for skewed distributions as the median is less affected by outliers than the mean.
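
A minimal sketch of median imputation, assuming missing entries are represented as `None`. The function name `impute_median` and the sample values are illustrative, not taken from any library.

```python
import statistics

def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

# Illustrative data (made up): the extreme value 1000 barely moves the median,
# whereas it would pull the mean of the observed values up to 281.25.
raw = [30, None, 45, 50, None, 1000]
print(impute_median(raw))
```

Here the observed values are 30, 45, 50 and 1000, so both gaps are filled with their median, 47.5.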

In a survey about income levels, some individuals chose not to disclose their earnings. How would you categorize this missing data?

  • MAR
  • MCAR
  • NMAR
  • Not missing data
This would be NMAR (Not Missing at Random) because the missingness depends on the value of the unobserved data itself, here the income level (i.e., people with higher or lower incomes may be more likely to omit this information).

In a correlation matrix, the value -1 signifies a perfect _____ correlation between two variables.

  • negative
  • neutral
  • positive
  • random
In a correlation matrix, a value of -1 signifies a perfect negative correlation between two variables. This means that as one variable increases, the other decreases proportionally, and vice versa.

Outliers can make a histogram appear ____, thereby distorting the true distribution of the data.

  • skewed
  • spread out
  • symmetrical
  • uniform
Outliers can cause a histogram to appear skewed: they create bars that stand alone far from the main distribution and stretch one tail of the plot.

Imagine you're working on a data project where the 'wrangle' phase is taking significantly longer than expected. How might this impact the rest of your EDA process?

  • It could delay subsequent steps and overall analysis timeline.
  • The communication phase will be quicker.
  • The explore phase might be shortened to make up for lost time.
  • The rest of the process will not be impacted.
If the 'wrangle' phase takes significantly longer than expected, it could delay subsequent steps and the overall timeline for the analysis. The EDA process is often iterative, and delays in one phase could reduce the time available for later phases. Proper time management and planning are crucial for a successful data analysis project.