How does the IQR method categorize a data point as an outlier?

  • By comparing it to the mean
  • By comparing it to the median
  • By comparing it to the standard deviation
  • By checking whether it falls below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR
The IQR method flags a data point as an outlier if it falls below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 - Q1 is the range spanned by the middle 50% of the data.
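The rule can be sketched in Python using only the standard library. This is a minimal sketch that assumes quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which uses linear interpolation. Other quartile conventions can shift Q1 and Q3 slightly on small samples. The function name `iqr_outliers` and the sample data are hypothetical.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return the values lying outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

# Hypothetical sample with one extreme value.
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 10, 10, 100]
print(iqr_outliers(data))  # → [100]
```

Here Q1 = 10.75 and Q3 = 13.25, so the fences are 7.0 and 17.0, and only 100 falls outside them. Raising `k` widens the fences. A value of 3 is a common choice for flagging only extreme outliers.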

You're working with a data set that does not follow a normal distribution. Which method, Z-score or IQR, should be used for detecting outliers?

  • Both are suitable
  • IQR
  • Neither is suitable
  • Z-score
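In this case, IQR is the better choice. It relies on quartiles and makes no assumption about the shape of the distribution. The Z-score method, by contrast, measures distance from the mean in standard deviations and assumes the data are approximately normally distributed.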
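The contrast shows up on a small, right-skewed sample. This sketch assumes the common convention of flagging |z| > 3 using the population standard deviation. The helper names and data are hypothetical. The two extreme values inflate both the mean and the standard deviation, so neither reaches |z| = 3. The IQR fences are built from quartiles and flag both values.

```python
import statistics

def zscore_outliers(data, threshold=3.0):
    """Flag values whose z-score (distance from the mean in SDs) exceeds threshold."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return [x for x in data if abs(x - mean) / sd > threshold]

def iqr_outliers(data, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; no distributional assumption."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]

# Hypothetical right-skewed sample with two extreme values.
skewed = [10, 10, 10, 11, 12, 12, 12, 13, 13, 14, 15, 90, 100]
print("z-score:", zscore_outliers(skewed))  # → z-score: []
print("IQR:", iqr_outliers(skewed))         # → IQR: [90, 100]
```

The largest z-score here is about 2.5. The IQR fences are 6.5 and 18.5.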

You are visualizing a heatmap and notice a row with colors drastically different from the rest. What might this indicate about the corresponding variable?

  • The variable has a unique distribution
  • The variable has many missing values
  • The variable is an outlier
  • The variable is unrelated to the others
In a correlation heatmap, a row whose colors differ drastically from the rest suggests one of two things. Either the corresponding variable is unrelated to the other variables in the dataset, or it relates to them very differently than they relate to one another.

How does standard deviation differ in a sample versus a population?

  • The denominator in the calculation of the sample standard deviation is (n-1)
  • The standard deviation of a sample is always larger
  • The standard deviation of a sample is always smaller
  • They are calculated in the same way
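Sample and population standard deviations differ in their denominator. The population formula divides the sum of squared deviations by n, while the sample formula divides by (n-1). This is Bessel's correction. Deviations are measured from the sample mean, which is computed from the same data, so they are systematically too small. Dividing by (n-1) compensates and makes the sample variance an unbiased estimator of the population variance.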
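Here are the two formulas side by side as a minimal sketch. The helper names are hypothetical. The standard library's `statistics.pstdev` (population, divides by n) and `statistics.stdev` (sample, divides by n - 1) serve as a cross-check.

```python
import math
import statistics

def population_sd(data):
    """Square root of the mean squared deviation: divide by n."""
    mu = sum(data) / len(data)
    return math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))

def sample_sd(data):
    """Bessel-corrected estimate: divide by (n - 1) instead of n."""
    xbar = sum(data) / len(data)
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (len(data) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_sd(data), statistics.pstdev(data))  # → 2.0 2.0
print(sample_sd(data), statistics.stdev(data))       # both sqrt(32/7), about 2.138
```

The squared deviations sum to 32. Dividing by n = 8 gives a variance of 4, while dividing by n - 1 = 7 gives about 4.571.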

What does a correlation coefficient close to 0 indicate about the relationship between two variables?

  • A perfect negative linear relationship
  • A perfect positive linear relationship
  • A very strong linear relationship
  • No linear relationship
A correlation coefficient close to 0 indicates that there is no linear relationship between the two variables: as one variable increases, the other does not tend to rise or fall along a straight line. This does not necessarily mean there is no relationship at all, since the variables may still be related non-linearly.

What step comes after 'wrangling' in the EDA process?

  • Communicating
  • Concluding
  • Exploring
  • Questioning
Once the data has been 'wrangled', i.e., cleaned and transformed, the next step in the EDA process is 'exploring'. This stage involves examining the data through statistical summaries and visualizations.

Which type of analysis is most commonly used for hypothesis testing?

  • CDA
  • Data Visualization
  • EDA
  • Predictive Modeling
CDA (Confirmatory Data Analysis) is most commonly used for hypothesis testing. While EDA is used to formulate hypotheses, CDA uses statistical techniques to confirm or reject these hypotheses.
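One example of a confirmatory technique is a two-sided permutation test for a difference in group means. It is one of many possible tests and is used here only because it needs nothing beyond the standard library. The function name, resample count, seed, and data are hypothetical. A small p-value is evidence against the null hypothesis that group membership has no effect on the mean.

```python
import random
import statistics

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels, as the null hypothesis allows
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)

control = [1, 2, 3, 4, 5]
treated = [11, 12, 13, 14, 15]
p = permutation_test(control, treated)
print(f"p = {p:.4f}")  # below 0.05 -> reject "equal means" at the 5% level
```

Only 2 of the 252 possible splits of these ten values are as extreme as the observed one, so the estimated p-value lands near 0.008.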

How does negative kurtosis affect the tails of a data distribution?

  • It has no effect on the tails of the distribution.
  • It makes the distribution perfectly symmetrical.
  • It makes the tails of the distribution heavier.
  • It makes the tails of the distribution lighter.
Negative (excess) kurtosis describes a platykurtic distribution. Its tails are lighter than those of a normal distribution, so extreme values and outliers are less frequent. Such distributions often look flatter-topped than a normal curve, and the uniform distribution is a common example.
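Excess kurtosis can be computed from the second and fourth central moments as m4 / m2² - 3, which makes a normal distribution score 0. This minimal sketch uses the population (biased) moment estimator. Libraries may apply small-sample corrections. The helper name and data are hypothetical. Evenly spaced values approximate a uniform distribution, whose excess kurtosis is -1.2. A mostly central sample with two extreme values gives a strongly positive result for contrast.

```python
def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

uniform_like = list(range(1000))     # flat, light-tailed
heavy_tailed = [0] * 98 + [-10, 10]  # mostly central, two extreme values
print(round(excess_kurtosis(uniform_like), 3))  # → -1.2
print(excess_kurtosis(heavy_tailed))            # → 47.0
```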

What type of plot is often used for visualizing the relationship between two continuous variables?

  • Bar plot
  • Box plot
  • Histogram
  • Scatter plot
Scatter plots are ideal for visualizing the relationship between two continuous variables. Each point represents one observation, placed according to its values on the two variables.

What is the process of removing an entire row when any single data point within it is missing called?

  • Listwise Deletion
  • Mean Imputation
  • Pairwise Deletion
  • Regression Imputation
The process of removing an entire row when any single data point within it is missing is called 'Listwise Deletion', also known as 'Complete Case Analysis'. This technique is straightforward and fast, but it can discard valuable data. It can also introduce bias if the data are not missing completely at random (MCAR).
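Listwise deletion amounts to keeping only the complete rows. This is a minimal sketch in plain Python that assumes missing values are represented as `None`. The records are hypothetical. In pandas, `DataFrame.dropna()` with its default settings behaves the same way and drops any row that contains a missing value.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 41, "income": 70000},
]

def listwise_delete(records):
    """Keep only rows in which every field is present (complete-case analysis)."""
    return [r for r in records if all(v is not None for v in r.values())]

complete = listwise_delete(rows)
print(f"kept {len(complete)} of {len(rows)} rows")  # → kept 2 of 4 rows
```

Half the rows are discarded even though each incomplete row is missing only one field. This illustrates the data loss that makes listwise deletion costly.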