In a positively skewed distribution, which is greater: mean or median?

  • Both are equal
  • Mean
  • Median
In a positively skewed distribution, the mean is generally greater than the median. Positive skew means the distribution has a long right tail. The extreme values in that tail pull the mean upward, while the median depends only on the middle of the ordered data and barely moves.
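
As a small illustration, assuming a made-up right-skewed sample with one large value, Python's standard `statistics` module shows the mean landing well above the median:

```python
from statistics import mean, median

# Hypothetical right-skewed sample: most values are small, one is very large.
incomes = [1, 2, 2, 3, 3, 3, 4, 50]

print(mean(incomes))    # 8.5: the tail value 50 pulls the mean up
print(median(incomes))  # 3.0: the middle of the sorted data barely moves
```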

In a leptokurtic distribution, the kurtosis value is ___________ than 0.

  • Any of these
  • Equal
  • Greater
  • Less
A leptokurtic distribution has excess kurtosis (kurtosis minus 3, the value for a normal distribution) greater than 0, indicating a sharper peak and fatter tails than a normal distribution.
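
A minimal sketch of the population excess kurtosis, m4 / m2² − 3, computed by hand on two hypothetical samples. Under its defaults (`fisher=True`, `bias=True`), `scipy.stats.kurtosis` should report the same definition; the sample values here are invented for illustration:

```python
def excess_kurtosis(xs):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n  # second central moment (variance)
    m4 = sum((x - mu) ** 4 for x in xs) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

peaked = [0] * 8 + [-5, 5]  # tall centre with heavy tails
flat = [1, 2, 3, 4, 5]      # evenly spread values

print(excess_kurtosis(peaked))  # 2.0 -> leptokurtic (> 0)
print(excess_kurtosis(flat))    # about -1.3 -> platykurtic (< 0)
```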

In a model-based imputation, the choice of the model has a direct impact on the ____________ of the imputation process.

  • Accuracy
  • All of the above
  • Complexity
  • Time
The choice of the model in a model-based imputation method directly affects the accuracy of the imputation process. If the chosen model does not accurately reflect the true data generation process, the imputed values may be biased, leading to incorrect conclusions.
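
To make the point concrete, here is a sketch that fills the same missing value with two different models: mean imputation (the simplest possible model) and a least-squares line fitted on the complete rows. The data, where y = 2x exactly, and the helper names `fit_line` and `impute` are hypothetical:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def impute(xs, ys, model):
    """Fill None entries in ys using the chosen model, trained on complete rows."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    obs_x = [x for x, _ in observed]
    obs_y = [y for _, y in observed]
    if model == "mean":
        fill = sum(obs_y) / len(obs_y)
        return [fill if y is None else y for y in ys]
    if model == "linear":
        slope, intercept = fit_line(obs_x, obs_y)
        return [slope * x + intercept if y is None else y for x, y in zip(xs, ys)]
    raise ValueError(f"unknown model: {model}")

# Hypothetical data where y = 2x exactly; the last y is missing.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, None]

print(impute(xs, ys, "mean"))    # [2, 4, 6, 8, 5.0]
print(impute(xs, ys, "linear"))  # [2, 4, 6, 8, 10.0]
```

The mean model ignores the relationship between x and y and imputes 5.0. The linear model matches how this data was generated and recovers 10.0.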

In the context of data visualization, what is a pairplot primarily used for?

  • Comparing multiple variables at once
  • Showing the correlation between two variables
  • Visualizing the distribution of a single variable
  • Visualizing the relationship between two variables
Pairplots are primarily used for comparing multiple variables at once. A pairplot draws a grid with a scatter plot for every pair of variables and, typically, each variable's own distribution on the diagonal, so the relationships among all the variables can be inspected together.
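
In practice the plot itself usually comes from a library such as seaborn (`seaborn.pairplot`). The sketch below avoids any plotting dependency and only enumerates the cells such a grid contains. It summarises each diagonal cell by the variable's range and each off-diagonal cell by Pearson's r, as a text stand-in for the histogram and scatter plot. The columns and the `pair_grid` helper are hypothetical:

```python
from itertools import product
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def pair_grid(data):
    """Describe each cell of a pairplot-style grid for a dict of columns."""
    grid = {}
    for row, col in product(data, repeat=2):
        if row == col:
            # Diagonal: one variable's distribution (summarised by its range here).
            grid[(row, col)] = ("hist", min(data[row]), max(data[row]))
        else:
            # Off-diagonal: scatter of two variables (summarised by Pearson's r).
            grid[(row, col)] = ("scatter", round(pearson(data[col], data[row]), 3))
    return grid

# Hypothetical columns.
data = {
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 80],
    "age": [40, 30, 35, 25],
}
for cell, summary in pair_grid(data).items():
    print(cell, summary)
```

With three variables the grid has 3 × 3 = 9 cells. This is why a pairplot becomes crowded quickly as the number of columns grows.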

Which category of missing data implies that the probability of missingness is related to the observed data?

  • MAR
  • MCAR
  • NMAR
MAR, which stands for Missing At Random, implies that the probability of missingness depends on the observed data but not on the missing values themselves once the observed data are taken into account.
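
A small sketch, assuming a hypothetical survey in which younger respondents skip the income question more often. Comparing missingness rates across groups of an observed variable (age) shows the kind of dependence MAR describes. Such a check cannot, on its own, rule out NMAR, because the unobserved values are never available to test against:

```python
def missing_rate_by_group(rows, group_of, target):
    """Share of rows with target == None, per group of an observed variable."""
    counts = {}
    for row in rows:
        g = group_of(row)
        total, missing = counts.get(g, (0, 0))
        counts[g] = (total + 1, missing + (row[target] is None))
    return {g: missing / total for g, (total, missing) in counts.items()}

# Hypothetical survey rows: age is always observed, income sometimes missing.
ages = [22, 25, 28, 35, 41, 47, 52, 60]
incomes = [None, None, 31000, 48000, None, 61000, 70000, 66000]
rows = [{"age": a, "income": i} for a, i in zip(ages, incomes)]

rates = missing_rate_by_group(
    rows, lambda r: "under 30" if r["age"] < 30 else "30+", "income"
)
print(rates)  # income is missing far more often in the "under 30" group
```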

A company has asked you to build a model that can predict customer churn based on a set of features. Which type of data analysis will you perform?

  • All are equally suitable
  • CDA
  • EDA
  • Predictive Modeling
Predictive Modeling would be most suitable in this case. It involves the application of machine learning algorithms to the data in order to make predictions about future outcomes, in this case, customer churn.
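
As a minimal sketch of what predictive modeling means here, the example below trains a logistic-regression classifier by batch gradient descent in plain Python and scores unseen customers. The features (tenure in years, support calls last month), the labels, and the hyperparameters `lr` and `epochs` are all hypothetical. A real project would more likely use an established library and a proper train/test split:

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(X, y, lr=0.05, epochs=5000):
    """Fit logistic-regression weights and bias by batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one customer: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def churn_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical features: [tenure in years, support calls last month]; 1 = churned.
X = [[0.5, 5], [1.0, 4], [4.0, 1], [5.0, 0]]
y = [1, 1, 0, 0]

w, b = train_logistic(X, y)
print(churn_probability(w, b, [0.8, 6]))  # new customer, many calls: high
print(churn_probability(w, b, [6.0, 0]))  # long tenure, no calls: low
```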

________ correlation is more appropriate when dealing with ordinal variables.

  • Covariance
  • Kendall's Tau
  • Pearson's
  • Spearman's
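Spearman's correlation is more appropriate when dealing with ordinal variables. Pearson's correlation measures linear association between the raw values. Spearman's correlation is computed on the ranks of the values, so it only requires that the data can be ordered, which makes it suitable for ordinal data.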
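
A sketch of Spearman's correlation as Pearson's correlation applied to ranks, assuming hypothetical 1 to 5 ratings from two reviewers. Tied values receive the average of the ranks they span, a common convention for ordinal scales with many ties. `scipy.stats.spearmanr` is the usual library route. The hand-rolled version only shows the idea:

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical ordinal ratings (1 = poor ... 5 = excellent) from two reviewers.
reviewer_a = [1, 2, 3, 4, 5]
reviewer_b = [2, 1, 4, 3, 5]
print(spearman(reviewer_a, reviewer_b))  # 0.8
```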

Under what circumstances is NMAR typically observed in a dataset?

  • All of the above
  • When data missingness is associated with the missing data itself
  • When data missingness is random
  • When data missingness is unrelated to observed and unobserved data
NMAR (Not Missing At Random) is typically observed when the missingness is related to the value of the missing data itself. This is the most challenging type of missingness to handle because the mechanism depends on values that were never observed, so it cannot be verified from the available data alone.
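
A small simulation, assuming hypothetical incomes where every value of 60,000 or more goes unreported. Because the missingness depends on the income itself, the mean of the observed values is biased downward, and nothing in the observed data reveals how large that bias is:

```python
from statistics import mean

# Hypothetical incomes; under this NMAR mechanism, high earners decline to answer.
true_incomes = [28000, 35000, 42000, 51000, 64000, 90000, 120000]
reported = [x if x < 60000 else None for x in true_incomes]

observed = [x for x in reported if x is not None]
print(mean(true_incomes))  # about 61429 (not available in practice)
print(mean(observed))      # 39000: biased downward by the missingness mechanism
```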

The '______' step in the EDA process involves formulating the questions you want to answer with your data.

  • communicating
  • concluding
  • questioning
  • wrangling
The first step in the EDA process, 'questioning,' involves formulating the questions that you want to answer with your data. It's during this step that you define what you want to achieve with your analysis and what problems you are trying to solve.

Anomalies or outliers in the dataset can be identified through the process of ________.

  • CDA
  • EDA
  • Machine Learning
  • Predictive Modeling
Anomalies or outliers in the dataset can be identified through the process of EDA. Various techniques such as visualization methods (like box plots and scatter plots) and statistical methods (like the IQR method or the Z-score method) can be used to detect outliers during EDA.
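
A minimal sketch of the two statistical methods named above, using only the standard library on hypothetical sensor readings. The IQR rule flags values beyond 1.5 × IQR outside the quartiles. The z-score rule flags values more than a chosen number of standard deviations from the mean:

```python
from statistics import mean, pstdev, quantiles

def iqr_outliers(data, k=1.5):
    """Values below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

def zscore_outliers(data, threshold=3.0):
    """Values whose distance from the mean exceeds threshold standard deviations."""
    mu, sigma = mean(data), pstdev(data)
    return [x for x in data if abs(x - mu) / sigma > threshold]

# Hypothetical sensor readings with one obvious anomaly.
readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(iqr_outliers(readings))                    # [102]
print(zscore_outliers(readings, threshold=2.5))  # [102]
print(zscore_outliers(readings))                 # [] with the default threshold of 3
```

The last call shows a known weakness of the z-score method on small samples. The extreme value inflates the standard deviation it is measured against, so it can fall under a threshold of 3. The quartile-based IQR rule is less affected.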