You are working with a dataset in which participants declined to answer sensitive questions because of personal discomfort. How would you classify this type of missing data?

MAR
MCAR
NMAR
Not missing data

This is an example of NMAR (Not Missing at Random) because the probability of missingness depends on the unobserved data itself (i.e., the sensitive information that participants chose not to provide).


When an outlier is the result of a data entry error, the best approach would often be ________.

Binning
Imputation
Removal
Transformation

When outliers are due to data entry errors, they carry no meaningful information about the underlying process, so removing them is usually the most appropriate method.
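
As a minimal sketch of removal, the example below flags values with Tukey's 1.5 × IQR rule and drops them. The `ages` list is hypothetical, and the value 290 is assumed to be a typo for 29. A rule like this only flags candidates; whether a flagged value really is an entry error still has to be checked against the source record.

```python
import statistics

# Hypothetical ages; 290 is assumed to be a data entry error (a typo for 29).
ages = [23, 25, 31, 28, 27, 30, 26, 290, 29, 24]

# Tukey's rule: flag points more than 1.5 * IQR outside the quartiles.
q1, _, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

cleaned = [a for a in ages if lower <= a <= upper]
removed = [a for a in ages if not (lower <= a <= upper)]

print("removed:", removed)
print("mean before:", statistics.mean(ages))
print("mean after:", statistics.mean(cleaned))
```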


Under what conditions might a model-based method be preferred over other imputation methods?

When a known and well-fitting model can be assumed for the data
When the amount of missing data is negligible
When the data is missing completely at random
When the data is not missing at random

A model-based method might be preferred over other imputation methods when a known and well-fitting model can be assumed for the data. It is a principled way of handling missing data under the assumption that the data follows a specific statistical model, for example a linear or logistic regression.
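
One simple instance of this idea is regression imputation, sketched below under the assumption that `y` depends linearly on `x`. The data values are hypothetical, and the least-squares fit is written out by hand so the example needs no external libraries.

```python
# Hypothetical data: y is missing (None) for one row.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, None, 8.1, 9.9]

# Fit y = intercept + slope * x by least squares on the complete rows only.
pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
mean_x = sum(xi for xi, _ in pairs) / len(pairs)
mean_y = sum(yi for _, yi in pairs) / len(pairs)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs)
sxx = sum((xi - mean_x) ** 2 for xi, _ in pairs)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Replace each missing y with the fitted model's prediction.
imputed = [
    yi if yi is not None else intercept + slope * xi
    for xi, yi in zip(x, y)
]
print(imputed)
```

Filling in the predicted value directly ignores the residual noise around the fitted line, so this kind of deterministic imputation tends to understate variability. If the assumed model fits poorly, the imputed values inherit that misfit.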


What does the acronym MCAR stand for in the context of missing data?

Missing Coefficient At Random
Missing Completely And Regularly
Missing Completely At Random
Missing Conditionally At Random

MCAR stands for Missing Completely At Random. This occurs when the probability of missing data on a variable is unrelated to any other measured variable and is also unrelated to the variable itself.
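
The three mechanisms can be stated compactly. The notation here is an assumption introduced for illustration: $R$ is the missingness indicator, and $Y_{\text{obs}}$ and $Y_{\text{mis}}$ are the observed and missing parts of the data.

```latex
\begin{align*}
\text{MCAR:} \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R) \\
\text{MAR:}  \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{NMAR:} \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \text{ still depends on } Y_{\text{mis}}
\end{align*}
```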


Which plot can be considered a combination of a box plot and a rotated kernel density plot?

Histogram
Line plot
Scatter plot
Violin plot

A violin plot can be considered a combination of a box plot and a rotated kernel density plot. It shows the full shape of the distribution alongside the summary statistics that a box plot provides.
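
To make the combination concrete, the sketch below draws Matplotlib violins and overlays a narrow box plot at the same positions. The two samples (`group_a`, `group_b`) are hypothetical, and the `Agg` backend is an assumption so the script runs without a display.

```python
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

random.seed(0)
# Two hypothetical samples: one symmetric, one right-skewed.
group_a = [random.gauss(0, 1) for _ in range(200)]
group_b = [random.expovariate(1.0) for _ in range(200)]

fig, ax = plt.subplots(figsize=(5, 3))
# The violin bodies are mirrored kernel density estimates.
violins = ax.violinplot([group_a, group_b], showextrema=False)
# A narrow box plot at the same positions adds the median and quartiles.
boxes = ax.boxplot([group_a, group_b], widths=0.1)
ax.set_xticks([1, 2])
ax.set_xticklabels(["group_a", "group_b"])
# Call fig.savefig(...) or plt.show() to view the result.
```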


What could be the implications of a high degree of skewness for statistical inference?

A high degree of skewness implies a high degree of kurtosis.
A high degree of skewness may bias the statistical inference.
A high degree of skewness may reduce the standard deviation.
Skewness does not impact statistical inference.

A high degree of skewness can bias statistical inference because it pulls the mean away from the bulk of the data. Many statistical techniques assume a normal distribution, so strong skewness can violate their assumptions and lead to incorrect conclusions.
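
The effect on the mean can be checked on a small sample. The sketch below uses a hypothetical right-skewed list, `incomes`, and a hand-written moment-based skewness coefficient; `sample_skewness` is an illustrative helper, not a standard-library function.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness: third central moment divided by
    the second central moment raised to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample: one value far above the rest.
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 250]

print("mean:", statistics.mean(incomes))
print("median:", statistics.median(incomes))
print("skewness:", sample_skewness(incomes))
```

A positive coefficient together with a mean well above the median is the typical signature of right skew.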


You are conducting a study on the annual rainfall in various cities. The data recorded is in millimeters. What type of data is this?

Continuous data
Discrete data
Nominal data
Ordinal data

Rainfall measurement in millimeters is a continuous variable because it can take on any value within a range.


If a distribution is bimodal with two distinct peaks, there may be _____ distinct mode(s).

None
One
Three
Two

A bimodal distribution has two distinct modes, one at each peak, so the answer is "Two".
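
For a discrete sample, the modes can be listed with `statistics.multimode`. The `scores` list below is hypothetical and deliberately simplified: it has two values that tie for the highest count. In a continuous bimodal distribution the two peaks are local maxima of the density and need not be equally high.

```python
import statistics

# Hypothetical discrete sample with two equally frequent values.
scores = [1, 2, 2, 2, 3, 4, 5, 6, 6, 6, 7]

# multimode returns every value that ties for the highest count.
modes = statistics.multimode(scores)
print(modes)
```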


The data missingness mechanism that could lead to the most bias if not addressed properly is __________.

All missing data
MAR
MCAR
NMAR

The NMAR (Not Missing at Random) mechanism could lead to the most bias if not addressed properly, because the missingness is related to the unobserved data itself.
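
A small simulation illustrates the difference. Everything below is a hypothetical setup: the normal distribution, the 30% MCAR rate, and the rule that values above 60 go missing 80% of the time are assumptions chosen only to make the effect visible.

```python
import random
import statistics

random.seed(42)
# Hypothetical "true" values, before any answers go missing.
true_values = [random.gauss(50, 10) for _ in range(10_000)]

# MCAR: every value has the same 30% chance of being missing.
mcar_observed = [v for v in true_values if random.random() > 0.3]

# NMAR: values above 60 are missing 80% of the time, so the chance of
# being missing depends on the very value that goes unobserved.
nmar_observed = [
    v for v in true_values
    if not (v > 60 and random.random() < 0.8)
]

true_mean = statistics.mean(true_values)
mcar_mean = statistics.mean(mcar_observed)
nmar_mean = statistics.mean(nmar_observed)
print(f"true {true_mean:.2f}  MCAR {mcar_mean:.2f}  NMAR {nmar_mean:.2f}")
```

Under these assumptions the MCAR mean stays close to the true mean, while the NMAR mean is pulled downward because high values are underrepresented among the observed data.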


Which measure of central tendency is most affected by outliers in the data set?

All of them
Mean
Median
Mode

The "Mean" or the average is the measure of central tendency that is most affected by outliers in a data set. The mean considers every value in the data set, and hence, extreme values (outliers) can significantly affect its value.
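
A quick check with hypothetical numbers shows the effect: adding a single extreme value moves the mean a long way, shifts the median slightly, and leaves the mode unchanged.

```python
import statistics

# Hypothetical sample, then the same sample with one extreme value added.
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]

for label, values in [("original", data), ("with outlier", with_outlier)]:
    print(
        label,
        "mean:", statistics.mean(values),
        "median:", statistics.median(values),
        "mode:", statistics.mode(values),
    )
```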


The _________ function in Matplotlib is used to create a figure and a set of subplots.

heatmap
pairplot
subplot
subplots

The 'subplots' function in Matplotlib is used to create a figure and a set of subplots. This function provides a convenient way to create both a figure and one or more subplots with a single call.
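
A minimal usage sketch follows. The plotted values are placeholders, and the `Agg` backend is an assumption so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt

# One call returns the figure and an array of Axes (here 1 row, 2 columns).
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

axes[0].plot([1, 2, 3], [1, 4, 9])
axes[0].set_title("line")

axes[1].bar(["a", "b", "c"], [3, 1, 2])
axes[1].set_title("bars")

fig.tight_layout()
```

By contrast, `plt.subplot` (singular) adds one Axes at a time to the current figure.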


How does the missing data mechanism affect the effectiveness of multiple imputation?

Affects only if data is missing at random
Affects only if data is not missing at random
Doesn't affect
Significantly affects

The missing data mechanism significantly affects the effectiveness of multiple imputation. If data is missing completely at random (MCAR), even simple approaches such as complete-case analysis generally give unbiased estimates, and multiple imputation remains valid under the weaker missing at random (MAR) assumption. If data is not missing at random (NMAR), the results might be biased even with multiple imputation. The effectiveness also depends on how accurately the imputation model reflects the data-generating process.
