The measure of how much individual sample means will vary is called the __________ error.

  • Absolute
  • Margin of
  • Sampling
  • Standard
The standard error of a statistic is the standard deviation of its sampling distribution: the spread of the values the statistic would take across many repeated samples of the same size from the same population. It measures how precisely the statistic estimates the corresponding population parameter, and it is used to build confidence intervals and test hypotheses. For the sample mean, the standard error is σ/√n, usually estimated by s/√n. It tells us how much the mean is expected to vary from one sample to another.
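As a small illustration, the estimated standard error of the mean, s/√n, can be computed with Python's standard `statistics` module. The function name `standard_error_of_mean`, the sample data, and the simulated population are made up for this sketch. The second part draws many samples (with replacement) from a known population to show that the spread of their means comes out close to σ/√n.

```python
import math
import random
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    s = statistics.stdev(sample)  # sample SD, n - 1 in the denominator
    return s / math.sqrt(len(sample))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(standard_error_of_mean(data), 4))  # 0.7559

# Empirical check: draw many samples of size n (with replacement) from a
# known population and measure how much their means vary.
rng = random.Random(42)
population = list(range(1, 101))
n = 25
sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(2000)]
observed_se = statistics.stdev(sample_means)
theoretical_se = statistics.pstdev(population) / math.sqrt(n)
print(round(observed_se, 2), round(theoretical_se, 2))
```

The two printed values in the last line should be close but not identical, because the simulated spread is itself estimated from a finite number of samples.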

How does changing the units of measurement affect the standard deviation and variance of a dataset?

  • It decreases them
  • It depends on the new units
  • It doesn't affect them
  • It increases them
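Changing the units of measurement rescales the data, so the standard deviation and variance change with it, and the size of the effect depends on the conversion. If every value is multiplied by a constant c (for example, converting meters to centimeters, c = 100), the standard deviation is multiplied by |c| and the variance by c². Adding a constant, such as the offset between two temperature scales, shifts every value equally and leaves both measures unchanged. Under a purely multiplicative change of units, the relative dispersion measured by the coefficient of variation (standard deviation divided by the mean) stays the same; a conversion that also adds an offset, such as Celsius to Fahrenheit, does change it.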
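The scaling rules above can be checked numerically with a short sketch. The height and temperature values are invented for illustration, and `coefficient_of_variation` is a hypothetical helper defined here using the sample standard deviation.

```python
import statistics

def coefficient_of_variation(values):
    """Relative dispersion: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.fmean(values)

heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
heights_cm = [100 * h for h in heights_m]   # multiplicative change, c = 100

temps_c = [10.0, 15.0, 20.0, 25.0, 30.0]
temps_f = [1.8 * t + 32 for t in temps_c]   # multiply by 1.8, then shift by 32

sd_ratio = statistics.stdev(heights_cm) / statistics.stdev(heights_m)           # about 100 (= c)
var_ratio = statistics.variance(heights_cm) / statistics.variance(heights_m)    # about 10000 (= c**2)
temp_sd_ratio = statistics.stdev(temps_f) / statistics.stdev(temps_c)           # about 1.8; the +32 has no effect

print(sd_ratio, var_ratio, temp_sd_ratio)
print(coefficient_of_variation(heights_m), coefficient_of_variation(heights_cm))  # equal
print(coefficient_of_variation(temps_c), coefficient_of_variation(temps_f))       # different
```

The ratios print as approximately 100, 10000, and 1.8 rather than exactly, because the converted values carry small floating-point rounding errors.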

What is the principle of equally likely outcomes?

  • All outcomes are equally probable
  • All outcomes are identical
  • All outcomes are independent
  • All outcomes are mutually exclusive
The principle of equally likely outcomes is a basic assumption in the classical definition of probability. It states that if an experiment has n outcomes, and there's no reason to believe that any one outcome is more likely than any other, then each outcome is assumed to have an equal probability of 1/n. For example, in tossing a fair coin, heads and tails are equally likely.
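Under this principle, the probability of an event is simply the number of favorable outcomes divided by the total number of outcomes. A minimal sketch, assuming a small sample space that can be listed explicitly; `classical_probability` is a hypothetical helper name, and `Fraction` keeps the answers exact.

```python
from fractions import Fraction
from itertools import product

def classical_probability(sample_space, event):
    """P(event) = favorable outcomes / total outcomes, assuming all outcomes are equally likely."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))

die = list(range(1, 7))
print(classical_probability(die, lambda x: x % 2 == 0))  # 1/2

two_dice = list(product(die, repeat=2))  # 36 equally likely ordered pairs
print(classical_probability(two_dice, lambda pair: sum(pair) == 7))  # 1/6
```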

________ clustering is a density-based clustering method that can find arbitrarily shaped clusters and is less affected by outliers.

  • DBSCAN
  • Hierarchical
  • K-means
  • Spectral
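DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering method that can find arbitrarily shaped clusters and is less affected by outliers. It uses two parameters: a neighborhood radius (eps) and a minimum number of points. A point with at least that many points (counting itself) within eps is a core point. Clusters grow by linking core points that lie within eps of each other, together with the non-core border points they reach. Points that cannot be reached from any core point are labeled as noise instead of being forced into a cluster, which is why outliers have little effect on the result.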
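The core-point and expansion rules just described can be sketched from scratch in a few lines. This is a minimal O(n²) version of the classic algorithm using Euclidean distance, not an optimized implementation; the names `dbscan` and `region_query` and the toy points are illustrative. For real data, scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps` and `min_samples`) is the usual choice.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)  # None = not yet visited
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding through it
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The isolated point at (50, 50) ends up labeled -1 rather than being attached to either cluster, which is the outlier behavior the explanation refers to.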

What does the standard deviation measure in a dataset?

  • Central tendency
  • Dispersion
  • Kurtosis
  • Skewness
The standard deviation measures the amount of variation, or dispersion, in a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.
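To make "close to the mean" versus "spread out" concrete, the sketch below computes the population standard deviation (divide by N) for two invented datasets that share the same mean. `population_sd` is a hypothetical helper; the standard library's `statistics.pstdev` gives the same result, and `statistics.stdev` gives the sample version (divide by N - 1).

```python
import math
import statistics

def population_sd(values):
    """Square root of the mean squared deviation from the mean (divide by N)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

tight = [9, 10, 10, 10, 11]    # mean 10, values close to the mean
spread = [0, 5, 10, 15, 20]    # mean 10, values far from the mean

print(population_sd(tight))       # about 0.632, i.e. sqrt(0.4)
print(population_sd(spread))      # about 7.071, i.e. sqrt(50)
print(statistics.pstdev(spread))  # standard-library equivalent
```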

How can you determine skewness of a distribution using a box plot?

  • By the height of the box
  • By the position of the median in the box
  • By the width of the box
  • It cannot be determined from a box plot
The skewness of a distribution can be judged from a box plot by the position of the median within the box. If the median is not centered (i.e., the first and third quartiles are not equidistant from it), the data are skewed. In a vertical box plot, a median closer to the bottom of the box (Q1) indicates positive (right) skew, and a median closer to the top (Q3) indicates negative (left) skew. A longer whisker on the side of the wider half of the box supports the same conclusion.
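The "median position in the box" rule can be turned into a number with Bowley's quartile skewness, (Q3 + Q1 - 2·Q2) / (Q3 - Q1), which is positive when the median sits nearer Q1 and negative when it sits nearer Q3. This measure is swapped in here as an illustration rather than taken from the text; the sketch uses `statistics.quantiles` with its default quartile method, and other quartile conventions give slightly different values. The datasets are invented.

```python
import statistics

def quartile_skewness(values):
    """Bowley (quartile) skewness: (Q3 + Q1 - 2*Q2) / (Q3 - Q1).

    Positive when the median is closer to Q1 (bottom of a vertical box),
    negative when it is closer to Q3, zero when it is centered.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 21]
symmetric = [1, 2, 3, 4, 5, 6, 7]
left_skewed = [-x for x in right_skewed]

for name, values in [("right", right_skewed), ("symmetric", symmetric), ("left", left_skewed)]:
    print(name, round(quartile_skewness(values), 3))
```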

The ________ score measures how close each point in a cluster is to the other points in its own cluster compared with the points in the nearest neighboring cluster.

  • boundary
  • distance
  • proximity
  • silhouette
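The silhouette score compares, for each point, its mean distance a to the other points in its own cluster with its mean distance b to the points of the nearest neighboring cluster: s = (b - a) / max(a, b). The overall score is the average of s over all points and ranges from -1 to +1. Values near +1 indicate dense, well-separated clusters; values near 0 indicate overlapping clusters, with points lying close to the boundary between clusters; negative values suggest that points have been assigned to the wrong cluster.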
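The formula s = (b - a) / max(a, b) can be sketched directly. This from-scratch version assumes Euclidean distance, at least two clusters, and the common convention that a point in a singleton cluster scores 0; the function names and toy points are illustrative. scikit-learn provides a tested implementation as `sklearn.metrics.silhouette_score`.

```python
import math

def silhouette_values(points, labels):
    """Per-point silhouette s(i) = (b - a) / max(a, b), Euclidean distance.

    a: mean distance to the other members of the point's own cluster
    b: smallest mean distance to the members of any other cluster
    """
    clusters = set(labels)
    values = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            values.append(0.0)  # singleton cluster: conventionally scored 0
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        values.append((b - a) / max(a, b))
    return values

def silhouette_score(points, labels):
    """Overall score: the mean silhouette value over all points."""
    values = silhouette_values(points, labels)
    return sum(values) / len(values)

points = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(points, [0, 0, 1, 1]))  # close to +1: well separated
print(silhouette_score(points, [0, 1, 0, 1]))  # negative: points mixed up
```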

What types of scales of measurement are suitable for non-parametric tests?

  • Nominal, ordinal, interval, and ratio
  • Only interval and ratio
  • Only nominal and ordinal
  • Only ratio
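Non-parametric tests, as a group, can be used with nominal, ordinal, interval, and ratio scales of measurement, although each individual test has its own requirements: chi-square tests work with nominal (categorical) counts, while rank-based tests such as the Mann-Whitney U test need data that are at least ordinal. This flexibility is one reason non-parametric tests are sometimes chosen over parametric ones, since many parametric tests require interval or ratio data (and often additional assumptions such as normality).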

If the p-value from a Mann-Whitney U test is less than the significance level, you would ________ the null hypothesis.

  • accept
  • either accept or reject
  • fail to reject
  • reject
If the p-value from a Mann-Whitney U test is less than the chosen significance level (commonly 0.05), you would reject the null hypothesis that the two groups come from the same distribution and conclude that the difference between the groups is statistically significant.
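The sketch below shows where that p-value comes from and how the decision rule is applied. It computes U from average ranks and a two-sided p-value from the large-sample normal approximation, with no tie or continuity correction, so it is only a rough illustration. The helper names (`rank_with_ties`, `mann_whitney_u`, `decide`) and the data are hypothetical. For real analyses, `scipy.stats.mannwhitneyu` handles exact and corrected p-values. Note that when p is not below α, the conclusion is "fail to reject", not "accept", the null hypothesis.

```python
import math

def rank_with_ties(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """U for sample x and a two-sided p-value from the normal approximation
    (no tie or continuity correction)."""
    n1, n2 = len(x), len(y)
    ranks = rank_with_ties(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of N(0, 1)
    return u1, p_value

def decide(p_value, alpha=0.05):
    """Reject H0 when p < alpha; otherwise fail to reject it."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

group_a = [1, 2, 3, 4, 5, 6, 7, 8]
group_b = [9, 10, 11, 12, 13, 14, 15, 16]
u, p = mann_whitney_u(group_a, group_b)
print(u, round(p, 5), decide(p))
```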

What does the Law of Large Numbers state?

  • It states that as the size of a sample is increased, the mean value of the sample will get closer to the mean or expected value of the population.
  • It states that if an event is repeated under identical conditions, the probability of the event remains the same.
  • It's a rule which states that the sum of the probabilities of all possible events is 1.
  • It's the law that states the probability of an event is always constant.
The Law of Large Numbers states that as the sample size grows, the mean of independent, identically distributed observations gets closer to the population mean (the expected value). In other words, as the number of trials increases, the observed proportion of each outcome converges to its theoretical probability; for a fair coin, the proportion of heads approaches 0.5.
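A quick simulation makes the coin example concrete. This sketch flips a simulated fair coin with a seeded random generator and records the proportion of heads at a few checkpoints; the function name, checkpoints, and seed are arbitrary choices for illustration. Early proportions can wander noticeably from 0.5, while later ones settle close to it.

```python
import random

def running_proportion_heads(n_flips, checkpoints=(10, 100, 1_000, 10_000, 100_000), seed=0):
    """Simulate fair coin flips and record the proportion of heads at each checkpoint."""
    rng = random.Random(seed)
    heads = 0
    proportions = {}
    for i in range(1, n_flips + 1):
        heads += rng.random() < 0.5  # True counts as one head
        if i in checkpoints:
            proportions[i] = heads / i
    return proportions

result = running_proportion_heads(100_000)
for n, p in result.items():
    print(f"{n:>7} flips: proportion of heads = {p:.4f}")
```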