What is the z-value associated with a 95% confidence interval in a standard normal distribution?

  • 1.64
  • 1.96
  • 2
  • 2.33
The z-value associated with a 95% confidence interval in a standard normal distribution is approximately 1.96: about 95% of the area under the standard normal curve lies between -1.96 and +1.96. A 95% confidence interval for a mean therefore extends 1.96 standard errors (not standard deviations of the raw data) on either side of the sample mean.
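
As a quick check, the critical value can be recovered from the inverse CDF of the standard normal. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

confidence = 0.95
alpha = 1 - confidence

# A two-sided interval leaves alpha / 2 of the probability in each tail,
# so the critical value is the quantile at 1 - alpha / 2 (here 0.975).
z = NormalDist().inv_cdf(1 - alpha / 2)

print(round(z, 2))  # 1.96
```

Using 0.95 directly as the quantile would instead give the one-sided value of about 1.64, which is the first distractor above.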

How is the interquartile range different from the range in handling outliers?

  • Both exclude outliers
  • Both include outliers
  • The interquartile range does not include outliers, the range does
  • The interquartile range includes outliers, the range does not
The interquartile range, which is the difference between the upper quartile (Q3) and the lower quartile (Q1), spans the middle 50% of the data. Because extreme values fall outside that middle half, the interquartile range is largely unaffected by outliers. The range, on the other hand, is the difference between the maximum and minimum data values, so a single outlier can change it substantially.
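
The contrast can be illustrated with a small, made-up dataset. The sketch below assumes the default quartile method of the standard-library `statistics.quantiles`; other quartile conventions give slightly different IQR values, and the helper names are hypothetical.

```python
from statistics import quantiles


def data_range(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


def iqr(values):
    """Interquartile range, Q3 - Q1, using the library's default method."""
    q1, _median, q3 = quantiles(values, n=4)
    return q3 - q1


clean = [10, 12, 13, 14, 15, 16, 18]   # illustrative data
with_outlier = clean + [95]            # same data plus one extreme value

print(data_range(clean), data_range(with_outlier))  # 8 85
print(iqr(clean), iqr(with_outlier))                # IQR moves only slightly
```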

How can 'outliers' impact the result of K-means clustering?

  • Outliers can distort the shape and size of the clusters
  • Outliers can lead to fewer clusters
  • Outliers can lead to more clusters
  • Outliers don't impact K-means clustering
Outliers can have a significant impact on the result of K-means clustering. They can distort the shape and size of the clusters, as they may pull the centroid towards them, creating less accurate and meaningful clusters.
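
The pull on a centroid follows from the K-means update step, in which each centroid is recomputed as the mean of its assigned points. The sketch below shows only that update step on made-up 2-D points, not a full K-means implementation; `centroid` is a hypothetical helper.

```python
def centroid(points):
    """Mean position of a list of (x, y) points, as in the K-means update step."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


cluster = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2), (1.0, 1.0)]  # tight group near (1, 1)
outlier = (11.0, 11.0)                                      # one distant point

clean_center = centroid(cluster)
pulled_center = centroid(cluster + [outlier])

print(clean_center)   # approximately (1.0, 1.0)
print(pulled_center)  # approximately (3.0, 3.0)
```

A single distant point moves the centroid well away from every genuine member of the cluster, which in turn can change which points are assigned to it on the next iteration.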

A positive Pearson's Correlation Coefficient indicates a ________ relationship between two variables.

  • inverse
  • linear
  • perfect
  • positive
A positive Pearson's Correlation Coefficient indicates a positive relationship between two variables. This means that as one variable increases, the other tends to increase as well, and as one decreases, the other tends to decrease.
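
The sign of the coefficient can be seen by computing it directly from its definition. The data below are invented for illustration, and `pearson_r` is a hypothetical helper written out by hand rather than a library call.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-deviation divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_dev / (spread_x * spread_y)


hours = [1, 2, 3, 4, 5]            # illustrative data
scores = [52, 55, 61, 64, 70]      # rises with hours

r_positive = pearson_r(hours, scores)
r_negative = pearson_r(hours, [-s for s in scores])  # flipped trend

print(r_positive > 0, r_negative < 0)  # True True
```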

What are the assumptions made in simple linear regression?

  • Homogeneity, normality, and symmetry
  • Independence, homogeneity, and linearity
  • Linearity, homoscedasticity, and normality
  • Symmetry, linearity, and independence
The assumptions made in simple linear regression include linearity (the relationship between the independent and dependent variables is linear), homoscedasticity (the variance of the residuals is constant across all levels of the independent variable), and normality (the residuals are normally distributed). Independence of the observations is commonly listed as a further assumption.
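
These assumptions are usually checked on the residuals of the fitted line. The sketch below fits a least-squares line by hand on made-up data and computes the residuals that one would then inspect (for example, by plotting them against x); `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5, 6]                       # illustrative data
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

slope, intercept = fit_line(xs, ys)

# Residual = observed value minus fitted value. A visible curve in the residuals
# suggests non-linearity; a fan shape suggests non-constant variance.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

for x, r in zip(xs, residuals):
    print(x, round(r, 3))
```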

Principal Component Analysis (PCA) is a dimensionality reduction technique that projects the data into a lower dimensional space called the _______.

  • eigen space
  • feature space
  • subspace
  • variance space
PCA is a technique that projects the data into a new, lower-dimensional subspace. This subspace consists of principal components which are orthogonal to each other and capture the maximum variance in the data.
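
For intuition, the projection can be sketched in pure Python for the simplest case: 2-D points reduced to a 1-D subspace. This is an illustration under that assumption, not a general PCA routine; it uses the closed-form largest eigenvalue of a 2x2 covariance matrix, and all names and data are hypothetical.

```python
import math


def first_principal_component(points):
    """Return (unit direction, variance along it, centered points) for 2-D data."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    largest = (sxx + syy) / 2 + math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)

    # Matching eigenvector; fall back to an axis when the data are uncorrelated.
    if sxy != 0:
        direction = (sxy, largest - sxx)
    elif sxx >= syy:
        direction = (1.0, 0.0)
    else:
        direction = (0.0, 1.0)
    norm = math.hypot(direction[0], direction[1])
    return (direction[0] / norm, direction[1] / norm), largest, centered


points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8)]  # illustrative data
direction, variance_captured, centered = first_principal_component(points)

# Project each centered 2-D point onto the 1-D subspace spanned by `direction`.
scores = [x * direction[0] + y * direction[1] for x, y in centered]
print(direction, scores)
```

The sample variance of `scores` equals the largest eigenvalue, which is the sense in which the first component captures the maximum variance.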

The range of a dataset is sensitive to _______.

  • Mean
  • Median
  • Mode
  • Outliers
The range of a dataset is sensitive to outliers. Because the range is calculated as the difference between the maximum and minimum values, an outlier (an extremely high or low value) can greatly increase the range.

How is the Chi-square statistic calculated in a goodness of fit test?

  • The differences between observed and expected frequencies are averaged
  • The differences between observed and expected frequencies are divided by the expected frequencies
  • The differences between observed and expected frequencies are squared and summed
  • The differences between observed and expected frequencies are squared, summed, and then the square root is taken
In a Chi-square goodness of fit test, the Chi-square statistic is calculated by squaring the difference between each observed and expected frequency, dividing each squared difference by the corresponding expected frequency, and then summing the results: χ² = Σ (O - E)² / E.
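
The standard Pearson statistic, Σ (O - E)² / E, can be sketched in a few lines. The counts below are invented (60 rolls of a die assumed fair), and `chi_square_statistic` is a hypothetical helper rather than a library function.

```python
def chi_square_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 12, 9, 11, 6, 14]   # illustrative counts for faces 1..6
expected = [10] * 6                # 60 rolls of a fair die

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))  # 4.2
```

The statistic is then compared against a Chi-square distribution with (number of categories - 1) degrees of freedom, here 5.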

In a ________ distribution, the events occur with a known constant mean rate and independently of the time since the last event.

  • Binomial
  • Normal
  • Poisson
  • Uniform
The Poisson distribution models the number of events happening in a fixed interval of time or space, given a constant mean rate of occurrence and independence of the time since the last event.
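
The probability of observing exactly k events follows from the Poisson formula P(k) = e^(-λ) λ^k / k!. The sketch below evaluates it for an assumed rate of 3 events per interval; `poisson_pmf` and the rate are illustrative choices.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events in an interval with mean rate `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


rate = 3.0  # assumed mean number of events per interval

for k in range(6):
    print(k, round(poisson_pmf(k, rate), 4))
```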

How is the probability of the complement of an event A calculated?

  • 1 - P(A)
  • P(A) * P(A')
  • P(A) + P(A')
  • P(A) - P(A')
The probability of the complement of an event A, denoted as P(A') or P(not A), is calculated as 1 - P(A). This is because an event and its complement are mutually exclusive and exhaustive (either the event occurs or it does not), so P(A) + P(A') = 1.
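
The complement rule is most useful for "at least one" questions. The sketch below uses a fair six-sided die as an assumed example and exact fractions from the standard library.

```python
from fractions import Fraction

p_six = Fraction(1, 6)        # P(A): rolling a six on one fair die
p_not_six = 1 - p_six         # P(A') = 1 - P(A)

# "At least one six in four rolls" is the complement of "no six in four rolls".
p_no_six_in_four = p_not_six ** 4
p_at_least_one_six = 1 - p_no_six_in_four

print(p_not_six)            # 5/6
print(p_at_least_one_six)   # 671/1296
```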