Under what conditions does a binomial distribution approximate a normal distribution?

  • When the events are not independent
  • When the number of trials is large and the probability of success is not too close to 0 or 1
  • When the number of trials is small
  • When the probability of success changes with each trial
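The binomial distribution approaches the normal distribution as the number of trials n gets large, provided that the probability of success p is not too close to 0 or 1. A common rule of thumb is that both np and n(1 - p) should be at least about 10 (some texts use 5). This result is known as the De Moivre–Laplace theorem.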
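As a rough illustration, the sketch below compares an exact binomial probability with its normal approximation in a favorable case and an unfavorable one. The function names and the chosen values of n and p are illustrative assumptions, and the continuity correction (adding 0.5) is a standard refinement rather than part of the theorem itself.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

# Large n, p far from 0 and 1: the two values nearly agree.
err_good = abs(binom_cdf(50, 100, 0.5) - normal_approx_cdf(50, 100, 0.5))

# Small n, p close to 0: the approximation is visibly off.
err_bad = abs(binom_cdf(0, 10, 0.05) - normal_approx_cdf(0, 10, 0.05))

print(f"error with n=100, p=0.5:  {err_good:.4f}")
print(f"error with n=10,  p=0.05: {err_bad:.4f}")
```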

What are the degrees of freedom in a Chi-square test for goodness of fit?

  • The number of categories minus 1
  • The number of categories plus 1
  • The number of observations minus 1
  • The number of observations plus 1
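In a Chi-square test for goodness of fit, the degrees of freedom are calculated as the number of categories minus 1. One degree of freedom is lost because the expected counts must add up to the total number of observations; each parameter estimated from the data removes one more.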
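A minimal sketch of the calculation, assuming a hypothetical set of 120 rolls of a six-sided die tested against a fair-die hypothesis (the counts are made up for illustration):

```python
# Hypothetical observed counts for faces 1..6 (120 rolls in total).
observed = [18, 22, 20, 25, 15, 20]

n = sum(observed)
# Under the fair-die hypothesis every face is expected equally often.
expected = [n / len(observed)] * len(observed)

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus 1.
df = len(observed) - 1

print(f"chi-square = {chi_sq:.2f}, df = {df}")
```

The statistic would then be compared against a Chi-square distribution with `df` degrees of freedom.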

If events A and B are independent, what is the P(A ∩ B)?

  • P(A) * P(B)
  • P(A) + P(B)
  • P(A) - P(B)
  • P(A) / P(B)
If events A and B are independent, the probability of both events occurring (P(A ∩ B)) is the product of their individual probabilities (P(A) * P(B)). This is a direct result of the Multiplication Rule for independent events.
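The rule can be checked by exact enumeration. The sketch below assumes two fair dice and two illustrative events (first die even, second die greater than 4); the events and helper names are hypothetical choices, not part of the question.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on an outcome."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def a(o):
    return o[0] % 2 == 0   # A: first die shows an even number

def b(o):
    return o[1] > 4        # B: second die shows 5 or 6

p_a = prob(a)
p_b = prob(b)
p_ab = prob(lambda o: a(o) and b(o))

print(p_a, p_b, p_ab)          # P(A), P(B), P(A ∩ B)
print(p_ab == p_a * p_b)       # the multiplication rule holds
```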

In what type of data distribution do the mean, median, and mode coincide?

  • Negatively skewed distribution
  • Normal distribution
  • Positively skewed distribution
  • Uniform distribution
In a normal distribution, the mean, median, and mode all coincide, meaning they have the same value. A normal distribution is symmetrical, with the majority of observations clustering around the central peak; therefore, the mean, median, and mode all fall at the center.
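A small made-up sample can illustrate the idea. The symmetric data below only mimics the bell shape of a normal distribution; it is not drawn from one, and both lists are assumptions chosen so the arithmetic is easy to follow.

```python
from statistics import mean, median, mode

# Symmetric, single-peaked sample: the three measures agree.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
sym_stats = (mean(symmetric), median(symmetric), mode(symmetric))

# Positively skewed sample: the long right tail pulls the mean upward.
skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10]
skew_stats = (mean(skewed), median(skewed), mode(skewed))

print("symmetric:", sym_stats)
print("skewed:   ", skew_stats)
```

In the skewed sample the mode is smallest, the median sits in between, and the mean is largest, which is the usual ordering for positive skew.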

What does a histogram represent in data visualization?

  • The change of a variable over time
  • The correlation between two variables
  • The frequency distribution of a single variable
  • The relationship between three variables
A histogram is a graphical representation of the frequency distribution of a single variable, usually a continuous one. To construct a histogram, the first step is to "bin" the range of values, i.e., divide the entire range of values into a series of consecutive, non-overlapping intervals, and then count how many values fall into each interval. The resulting bar heights give an estimate of the variable's probability distribution.
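The binning step can be sketched in a few lines. The function name, the equal-width bins, and the sample heights are illustrative assumptions; plotting libraries typically offer more flexible binning rules.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins.

    Assumes the values are not all identical (otherwise the bin width is 0).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value belongs to the last bin, not a bin past the end.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical heights in centimetres.
heights = [150, 152, 155, 160, 161, 163, 165, 170, 171, 180]
counts = histogram_counts(heights, 3)

# One text bar per bin: [150, 160), [160, 170), [170, 180].
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * c}")
```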

What is the primary assumption for performing a Z-test?

  • All of the above
  • The data is normally distributed
  • The population standard deviation is known
  • The sample size is very large
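The primary assumption for a Z-test is that the population standard deviation is known. A large sample size or a normally distributed population is what makes the test statistic approximately normal, but the known standard deviation is what distinguishes the Z-test from the t-test, so it is the primary assumption.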
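A minimal sketch of a two-sided one-sample Z-test shows where the known standard deviation enters. The function name, the sample, and the values of `mu0` and `sigma` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sample_z_test(sample, mu0, sigma):
    """Two-sided Z-test of H0: population mean == mu0.

    sigma is the KNOWN population standard deviation; it is not
    estimated from the sample.
    """
    standard_error = sigma / sqrt(len(sample))
    z = (mean(sample) - mu0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical measurements; sigma = 6 is assumed known in advance.
sample = [102, 98, 105, 110, 99, 104, 101, 107, 110]
z, p_value = one_sample_z_test(sample, mu0=100, sigma=6)

print(f"z = {z:.2f}, p = {p_value:.4f}")
```

If `sigma` had to be estimated from the sample instead, a t-test would be the usual choice.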

What does a negative kurtosis indicate about the distribution of the dataset?

  • The distribution has a perfect bell shape
  • The distribution is less outlier-prone than a normal distribution
  • The distribution is more outlier-prone than a normal distribution
  • The distribution is skewed
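Negative kurtosis here means negative excess kurtosis, i.e., kurtosis below the value of 3 that a normal distribution has (plain kurtosis can never be negative). It indicates that the distribution has lighter tails than the normal distribution; such distributions are called platykurtic. Lighter tails mean fewer outliers (extreme values), thus the distribution is less outlier-prone than a normal distribution.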
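The sign of the excess kurtosis can be seen on two made-up datasets. This sketch uses the simple population (biased) moment formula; statistical packages often apply a small-sample correction, so their values may differ slightly.

```python
def excess_kurtosis(data):
    """Population excess kurtosis: fourth moment / variance^2, minus 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # variance
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

# Evenly spread values with no extremes: light tails.
flat = list(range(1, 11))

# Mostly identical values plus two extremes: heavy tails.
heavy = [-10] + [0] * 8 + [10]

k_flat = excess_kurtosis(flat)
k_heavy = excess_kurtosis(heavy)

print(f"flat:  {k_flat:.3f}")    # negative, less outlier-prone
print(f"heavy: {k_heavy:.3f}")   # positive, more outlier-prone
```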

What does the F-statistic signify in an ANOVA test?

  • The ratio of between-group variability to within-group variability
  • The ratio of total variability to within-group variability
  • The ratio of within-group variability to between-group variability
  • The ratio of within-group variability to total variability
In an ANOVA test, the F-statistic is the ratio of the between-group variability to the within-group variability. In other words, it compares how much the group means vary around the overall mean with how much the individual observations vary around their own group mean. A larger F-statistic means the differences between the group means are large relative to the variation within the groups, which is evidence against the hypothesis that all group means are equal.
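For a one-way ANOVA the ratio can be computed directly from the sums of squares. The sketch below uses three small hypothetical groups; the function name and the data are assumptions for illustration.

```python
def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    n = len(all_values)          # total number of observations
    k = len(groups)              # number of groups
    grand_mean = sum(all_values) / n

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical groups: the third group's mean is clearly higher.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = f_statistic(groups)

print(f"F = {f_value:.2f}")
```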

What assumption about the residuals of a linear regression model does homoscedasticity refer to?

  • The residuals are independent
  • The residuals are normally distributed
  • The residuals have a linear relationship with the dependent variable
  • The residuals have constant variance
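Homoscedasticity refers to the assumption that the residuals (errors) have constant variance at each level of the independent variable(s). When the assumption is violated (heteroscedasticity), the usual standard errors are unreliable, and so are the confidence intervals and hypothesis tests built on them.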
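One crude way to look for a violation is to fit the line and compare the residual spread for low and high values of x. The sketch below does this on two deterministic, made-up datasets; the helper names and the split-in-half comparison are illustrative assumptions, not a formal test such as Breusch-Pagan.

```python
def ols_residuals(x, y):
    """Residuals from a simple least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def spread_ratio(x, y):
    """Mean squared residual for the upper half of x over the lower half.

    Assumes x is sorted in ascending order. A ratio near 1 is consistent
    with constant variance; a ratio far from 1 suggests it is not constant.
    """
    res = ols_residuals(x, y)
    half = len(res) // 2
    lower = sum(r * r for r in res[:half]) / half
    upper = sum(r * r for r in res[half:]) / (len(res) - half)
    return upper / lower

x = list(range(1, 21))
sign = [1 if xi % 2 == 0 else -1 for xi in x]   # alternating +1 / -1

# Errors of constant size: homoscedastic.
y_const = [2 * xi + s for xi, s in zip(x, sign)]
# Errors that grow with x: heteroscedastic (a "fan" shape).
y_fan = [2 * xi + 0.5 * xi * s for xi, s in zip(x, sign)]

ratio_const = spread_ratio(x, y_const)
ratio_fan = spread_ratio(x, y_fan)

print(f"constant errors: {ratio_const:.2f}")
print(f"growing errors:  {ratio_fan:.2f}")
```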

How does stratified random sampling differ from simple random sampling?

  • Stratified random sampling always involves larger sample sizes than simple random sampling
  • Stratified random sampling involves dividing the population into subgroups and selecting individuals from each subgroup
  • Stratified random sampling is the same as simple random sampling
  • Stratified random sampling only selects individuals from a single subgroup
Stratified random sampling differs from simple random sampling in that it first divides the population into non-overlapping groups, or strata, based on specific characteristics, and then selects a simple random sample from each stratum. This can ensure that each subgroup is adequately represented in the sample, which can increase the precision of estimates.
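The procedure can be sketched with the standard library. The function name, the proportional allocation (the same fraction from every stratum), and the urban/rural population are illustrative assumptions; other allocation schemes exist.

```python
import random

def stratified_sample(population, key, fraction, seed=0):
    """Draw a simple random sample of the given fraction from each stratum.

    key maps an item to its stratum label. At least one item is taken
    from every stratum so that no subgroup is left out.
    """
    rng = random.Random(seed)

    # Step 1: divide the population into non-overlapping strata.
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)

    # Step 2: take a simple random sample within each stratum.
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 80 urban and 20 rural residents.
population = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]
sample = stratified_sample(population, key=lambda person: person[0], fraction=0.1)

urban_count = sum(1 for person in sample if person[0] == "urban")
rural_count = sum(1 for person in sample if person[0] == "rural")
print(urban_count, rural_count)
```

A simple random sample of 10 from the same population could, by chance, contain no rural residents at all; the stratified version always includes them in proportion.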