What kind of distribution is indicated by a skewness of zero?

  • A bimodal distribution.
  • A negatively skewed distribution.
  • A normal distribution.
  • A positively skewed distribution.
A skewness of zero is indicative of a "Normal Distribution". In a perfect normal distribution, both tails are equal, so they balance each other out, and hence, the skewness is zero.

If missing data is not properly addressed, the model's ________ can be significantly affected.

  • F1 score
  • accuracy
  • precision
  • recall
If missing data is not handled correctly, it can lead to biases in the data, which can adversely affect the model's accuracy.

In a clinical trial, the average recovery time for a new drug is drastically increased due to a patient who took an unusually long time to recover. How would this patient's data point be classified in this context?

  • A normal data point
  • An error
  • An outlier
  • nan
This patient's data point would be classified as an outlier as it deviates significantly from the other data points.

What are the effects of outliers on the results of a hypothesis testing procedure?

  • All of these
  • Can affect the statistical significance
  • Can lead to type I errors
  • Can lead to type II errors
Outliers can affect the results of a hypothesis testing procedure in several ways. They can lead to Type I or Type II errors, and can also affect the statistical significance of the test, thereby potentially leading to incorrect conclusions.

What role does bin size play in outlier detection when using a histogram?

  • Bin size can influence outlier detection
  • Bin size does not influence outlier detection
  • Larger bin size always increases outlier visibility
  • Smaller bin size always increases outlier visibility
The bin size in a histogram can influence the visibility of outliers. Depending on how the data is binned, an outlier may or may not be clearly visible.

The mode is the only measure of central tendency that can be used for _____ data.

  • Categorical
  • Interval
  • Numerical
  • Ordinal
The "Mode" is the only measure of central tendency that can be used for "Categorical" data. This is because it simply represents the most frequently occurring category or value.

You are developing a linear regression model and notice that despite a high R-squared value, none of your independent variables are statistically significant. What might be the potential issue here?

  • Data leakage
  • High variance
  • Multicollinearity
  • Underfitting
This could be due to multicollinearity. Multicollinearity inflates the variances of the parameter estimates, which might lead to none of them being statistically significant. Despite this, the overall model might still be significant, leading to a high R-squared value.

You have a dataset with an odd number of observations. If you were to calculate both the mean and median, how would adding a very large value to the dataset affect these measures of central tendency?

  • Both would increase
  • Both would remain unchanged
  • Only the mean would increase
  • Only the median would increase
Adding a very large value to the dataset would increase the "Mean" because it takes into account all values in the data set. However, the "Median" would not be affected unless the new value changes the middle value of the ordered data set.

In _____ deletion, all data from a participant is discarded if any single value is missing.

  • Listwise
  • Pairwise
  • Random
  • Systematic
In 'listwise' deletion, all data from a participant is discarded if any single value is missing. It is the simplest form of dealing with missing data but can lead to significant loss of information if missing data is not completely at random.

A correlation coefficient of +1 between two variables indicates what kind of relationship?

  • No relationship
  • Perfect negative linear relationship
  • Perfect positive linear relationship
  • Weak relationship
A correlation coefficient of +1 between two variables indicates a perfect positive linear relationship. This means that if one variable increases, the other variable also increases at a constant rate, and vice versa.