You are analyzing the number of calls received by a call center per hour. Which distribution would be most suitable for modeling this data and why?

  • Binomial Distribution because it represents the number of successes in a given number of trials
  • Normal Distribution because it represents continuous data
  • Poisson Distribution because it models the number of events occurring in a fixed interval of time
  • Uniform Distribution because all outcomes are equally likely
The Poisson Distribution is the most suitable choice because it models the number of events (calls) occurring in a fixed interval of time (one hour), assuming the events occur independently and at a roughly constant average rate.
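
As a minimal sketch of how this could be applied, the snippet below evaluates the Poisson probability mass function directly. The average rate of 12 calls per hour is a made-up value for illustration, and `poisson_pmf` is a hypothetical helper name, not part of any library.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events in an interval whose mean count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Assumed average rate: 12 calls per hour (illustrative value only).
rate = 12.0

p_exactly_10 = poisson_pmf(10, rate)
p_at_most_5 = sum(poisson_pmf(k, rate) for k in range(6))

print(f"P(exactly 10 calls in an hour) = {p_exactly_10:.4f}")
print(f"P(at most 5 calls in an hour)  = {p_at_most_5:.4f}")
```

In practice the rate would be estimated from the data, for example as the mean number of calls per hour.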

Consider a data distribution with a positive skewness and a high kurtosis. What does this scenario indicate about the distribution?

  • It has a symmetrical distribution.
  • It has evenly spread out values.
  • It has many values clustered around the left tail with potential outliers.
  • It has many values clustered around the right tail with potential outliers.
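Positive skewness means the distribution has a long tail to the right: most values are concentrated at the lower (left) end, and a few much larger values stretch the tail out. High kurtosis means the tails are heavy and the peak is sharp, so extreme values are more likely. Together, they indicate that most values cluster on the left, with potential outliers among the larger values.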
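
The two measures can be sketched from their moment-based definitions, as below. The data are invented for illustration, the helper names are hypothetical, and the population (biased) formulas are used here; statistical packages may apply a sample-size correction by default.

```python
import statistics


def skewness(xs):
    """Population skewness: the mean of the cubed standardized deviations."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def excess_kurtosis(xs):
    """Population kurtosis minus 3, so a normal distribution scores about 0."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 4 for x in xs) / len(xs) - 3.0


# Made-up data: most values sit at the low end, with one large value on the right.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]

print(f"skewness        = {skewness(data):.2f}")
print(f"excess kurtosis = {excess_kurtosis(data):.2f}")
```

Both values come out positive for this sample, matching the "clustered on the left, outliers on the right" reading.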

What range of values does a dataset typically have after Min-Max scaling?

  • -1 to 1
  • 0 to 1
  • Depends on the dataset
  • Depends on the feature
Min-Max scaling rescales each feature to a given range by mapping its minimum to the lower bound and its maximum to the upper bound: x' = (x - min) / (max - min) for the default range. That default range is 0 to 1, so after Min-Max scaling the values typically lie between 0 and 1. A different target range, such as -1 to 1, can be chosen if needed.
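
A minimal sketch of the transformation for a single feature follows. The function name `min_max_scale` and its parameters are hypothetical, and returning the lower bound for a constant feature is an assumption made here to avoid dividing by zero.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: no spread to rescale (assumed convention).
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


feature = [10, 20, 30, 50]
print(min_max_scale(feature))              # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale(feature, -1.0, 1.0))   # same shape, rescaled to -1..1
```

Because the minimum and maximum come from the data, a single extreme value compresses every other value into a narrow part of the range.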

What is the term for the measure of how spread out the values in a data set are?

  • Central Tendency
  • Dispersion
  • Kurtosis
  • Skewness
The measure of how spread out the values in a data set are is called "Dispersion". Common measures of dispersion include the range, interquartile range (IQR), variance, and standard deviation.
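
These four measures can be computed with Python's standard `statistics` module, as in the sketch below. The data are arbitrary illustrative values, and the quartiles use the module's default method, so other tools may report a slightly different IQR.

```python
import statistics

# Arbitrary illustrative data.
data = [4, 8, 15, 16, 23, 42]

data_range = max(data) - min(data)

# quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, _median, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # sample standard deviation

print(f"range    = {data_range}")
print(f"IQR      = {iqr}")
print(f"variance = {variance}")
print(f"std dev  = {std_dev:.3f}")
```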

You've created a histogram of your data and you notice a few bars standing alone far from the main distribution. What might this suggest?

  • Data is evenly distributed
  • Normal distribution
  • Outliers
  • Skewness
In a histogram, bars that stand alone far from the main distribution often suggest the presence of outliers.
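
To illustrate, the sketch below prints a simple text histogram of made-up data. The bin width of 5 is an arbitrary choice for this example.

```python
from collections import Counter

# Made-up data: a main cluster in the low 20s plus two distant values.
data = [21, 22, 22, 23, 23, 23, 24, 24, 25, 26, 58, 61]
bin_width = 5  # arbitrary choice for illustration

# Map each value to the start of its bin and count values per bin.
counts = Counter((x // bin_width) * bin_width for x in data)

# Print every bin from the first to the last, including the empty ones.
for start in range(min(counts), max(counts) + bin_width, bin_width):
    print(f"{start:>3}-{start + bin_width - 1:<3} {'#' * counts[start]}")
```

The bars for 55-59 and 60-64 are separated from the main cluster by several empty bins, which is the pattern that flags candidate outliers.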

You have a dataset where the relationships between variables are not linear. Which correlation method is better to use and why?

  • Covariance
  • Kendall's Tau
  • Pearson's correlation coefficient
  • Spearman's correlation coefficient
For non-linear relationships between variables, Spearman's correlation coefficient is the better choice. It is computed on the ranks of the values rather than the values themselves, so it measures how consistently one variable increases or decreases with the other (a monotonic relationship) and does not require that relationship to be linear.
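
The difference can be sketched by computing Spearman's coefficient as Pearson's coefficient applied to ranks. The helper names below are hypothetical, and y = x^3 is an invented example of a relationship that is monotonic but not linear.

```python
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


x = list(range(1, 11))
y = [v ** 3 for v in x]  # monotonic but clearly non-linear

print(f"Pearson  = {pearson(x, y):.3f}")
print(f"Spearman = {spearman(x, y):.3f}")
```

For this data Spearman's coefficient is 1 (up to floating-point rounding) because the ordering is perfectly preserved, while Pearson's coefficient falls below 1 because the points do not lie on a straight line.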

Which of the following is a type of data distribution?

  • Age Bracket Distribution
  • Binomial Distribution
  • Household Distribution
  • Sales Distribution
The Binomial Distribution is the only option that is a probability distribution. It describes the number of successes in a fixed number of independent Bernoulli trials, each with the same probability of success.
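
As a small illustration of that definition, the sketch below evaluates the binomial probability mass function. The function name is hypothetical and the coin-flip numbers are an invented example.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Invented example: exactly 3 heads in 10 flips of a fair coin.
print(binomial_pmf(3, 10, 0.5))  # 0.1171875

# The probabilities over all possible success counts sum to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(f"{total:.6f}")
```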

Describe the impact of skewness and kurtosis on parametric testing.

  • They can improve the accuracy of parametric testing.
  • They can invalidate the results of parametric testing.
  • They can reduce the variance in parametric testing.
  • They do not impact parametric testing.
Skewness and kurtosis can invalidate the results of parametric testing. Many parametric tests assume that the data follows a normal distribution. If the data is highly skewed or has high kurtosis, these assumptions are violated, and the test results may not be valid.

If a distribution is leptokurtic, what does it signify about the data?

  • The data has a high variance.
  • The data is heavily tailed with potential outliers.
  • The data is less outlier-prone.
  • The data is normally distributed.
A leptokurtic distribution has heavier tails than a normal distribution, typically with a sharper peak, so extreme values (potential outliers) occur more often. Data of this kind tends to show more frequent large jumps away from the mean.

A potential drawback of the Z-score method for outlier detection is that it assumes the data is _______ distributed.

  • exponentially
  • logistically
  • normally
  • uniformly
The Z-score method assumes that the data is normally distributed. This is a drawback because many datasets do not follow a normal distribution, in which case the usual Z-score thresholds can be misleading.
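
A minimal sketch of the method is shown below. The function name is hypothetical, the data are made up, and the cutoff of 3 standard deviations is a common convention assumed here rather than a fixed rule.

```python
import statistics


def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs((v - mean) / std) > threshold]


# Made-up data: 20 values between 48 and 52, plus one extreme value.
data = [48, 49, 50, 51, 52] * 4 + [120]

print(z_score_outliers(data))  # [120]
```

The mean and standard deviation are themselves pulled toward extreme values, so on skewed or heavy-tailed data the method can miss outliers or flag ordinary points.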