What are the key components to focus on during the 'communicate' step in EDA?

  • Cleaning and transforming data
  • Ensuring the insights are effectively conveyed to relevant stakeholders
  • Only sharing the raw data
  • Reordering the EDA steps
During the communication phase of the EDA process, the key focus is to ensure that the insights, findings, or conclusions drawn from the analysis are effectively conveyed to the relevant stakeholders. This might involve presenting the insights in a simple and understandable manner, making use of visualizations, and tailoring the communication to the audience's needs and context.

If a row with at least one missing value is deleted, the process is known as _____.

  • Listwise Deletion
  • Mean Imputation
  • Mode Imputation
  • Pairwise Deletion
If a row with at least one missing value is deleted, the process is known as 'listwise deletion'. Although it is a simple method, it can result in loss of valuable information if the missing data is not completely random.

Given a set of data that follows a Binomial Distribution, how would you estimate the parameters of the distribution?

  • By applying the Central Limit Theorem
  • By computing the mean and standard deviation
  • By taking the square root of the data
  • By using a chi-squared test
The parameters of a Binomial Distribution can be estimated by computing the mean and standard deviation of the data.

The _____ of a histogram can significantly influence the representation of data.

  • Bin width
  • Color
  • Shape
  • Size
The bin width of a histogram is critical in data representation. If it's too large, it may smooth over the details of the distribution. If it's too small, the histogram may be too cluttered or noisy.

Outliers can potentially _______ the interpretation of the data.

  • Complicate
  • Improve
  • Simplify
  • Skew
Outliers can skew the interpretation of the data. They can affect the mean and standard deviation, thus distorting the overall understanding of the data.

What are the potential risks associated with incorrectly assuming that data are MCAR when they are actually MAR?

  • Bias in parameter estimates
  • Both underestimation of standard errors and bias in parameter estimates
  • No potential risks
  • Underestimation of standard errors
If data are incorrectly assumed to be MCAR when they are actually MAR, it can lead to both underestimation of standard errors and bias in parameter estimates, leading to inaccurate analyses and conclusions.

You notice that the data from some weather sensors is missing because the sensors malfunctioned when the temperature dropped below a certain level. What type of missing data does this represent?

  • MAR
  • MCAR
  • NMAR
  • Not missing data
This would be MAR (Missing at Random) because the missingness is related to an observed data (the temperature). The missing data is not random, but it doesn't depend on the unobserved data itself.

How is the shape of a Normal Distribution usually described?

  • Bell-shaped
  • Skewed to the left
  • Skewed to the right
  • Uniformly flat
A Normal Distribution is described as bell-shaped. It is symmetric around the mean, and most of the data falls close to the mean with fewer values further away.

What could be potential drawbacks of using regression imputation?

  • Can lead to an underestimation of errors
  • Can lead to biased results if relationships between variables are non-linear
  • Does not handle missing values
  • No drawbacks
The potential drawbacks of using regression imputation are that it can lead to an underestimation of errors or variances. This happens because it estimates missing values using a deterministic function (i.e., regression), but does not account for the inherent uncertainty associated with the missing values.

Which measure of central tendency is most affected by outliers in the data set?

  • All of them
  • Mean
  • Median
  • Mode
The "Mean" or the average is the measure of central tendency that is most affected by outliers in a data set. The mean considers every value in the data set, and hence, extreme values (outliers) can significantly affect its value.