________ clustering is a density-based clustering method that can find arbitrarily shaped clusters and is less affected by outliers.

  • DBSCAN
  • Hierarchical
  • K-means
  • Spectral
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering method that can find arbitrarily shaped clusters and is less affected by outliers. It grows each cluster outward from points that have at least a minimum number of neighbors within a given radius, and it labels points in sparse regions as noise rather than forcing them into a cluster.
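
The density idea can be made concrete with a small sketch. The code below is a minimal, unoptimized illustration written with the standard library only; the function name `dbscan`, the parameter names `eps` and `min_pts`, and the toy points are assumptions made for this example and are not taken from any particular library.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNVISITED = -1, None
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (point i included).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE           # sparse region; may become a border point later
            continue
        cluster += 1                    # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster     # noise reachable from a core point is a border point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is also a core point: keep growing
    return labels

# Hypothetical toy data: two tight groups and one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The outlier at (50, 50) ends up with the label -1 instead of distorting either cluster, which is the sense in which the method is less affected by outliers.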

What does the standard deviation measure in a dataset?

  • Central tendency
  • Dispersion
  • Kurtosis
  • Skewness
The standard deviation measures the amount of variation or dispersion in a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range.
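
As a quick illustration, the sketch below compares two made-up datasets that share the same mean but differ in spread. It uses `statistics.pstdev` (the population standard deviation) from the Python standard library; the variable names and values are assumptions chosen for the example.

```python
import statistics

# Two hypothetical datasets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
spread = [10, 30, 50, 70, 90]

tight_sd = statistics.pstdev(tight)    # population standard deviation
spread_sd = statistics.pstdev(spread)

print("tight :", statistics.mean(tight), round(tight_sd, 3))
print("spread:", statistics.mean(spread), round(spread_sd, 3))
```

The means are identical, so a measure of central tendency cannot tell the two sets apart; the standard deviation can.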

How can you determine skewness of a distribution using a box plot?

  • By the height of the box
  • By the position of the median in the box
  • By the width of the box
  • It cannot be determined from a box plot
The skewness of a distribution can be gauged from a box plot by looking at the position of the median in the box. If the median is not in the center of the box (i.e., the quartiles are not equidistant from the median), the data is likely skewed. In a vertical box plot, a median closer to the bottom of the box (the first quartile) suggests positive skew, and a median closer to the top (the third quartile) suggests negative skew. A noticeably longer whisker on one side is a further sign of skew toward that side.
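
The position of the median inside the box can also be turned into a number. The sketch below uses the quartile (Bowley) skewness coefficient, which is a technique brought in for illustration rather than something the question itself names; the helper `quartile_skewness` and the sample values are assumptions.

```python
import statistics

def quartile_skewness(data):
    """Bowley coefficient: (Q3 + Q1 - 2*median) / (Q3 - Q1).

    Positive when the median sits closer to Q1 (bottom of the box),
    negative when it sits closer to Q3 (top of the box), zero when centered.
    """
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return (q3 + q1 - 2 * q2) / (q3 - q1)

# Hypothetical samples: a long right tail, its mirror image, and a symmetric set.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 21]
left_tailed = [22 - x for x in right_tailed]
symmetric = list(range(1, 12))

for name, data in [("right", right_tailed), ("left", left_tailed), ("symmetric", symmetric)]:
    print(name, round(quartile_skewness(data), 3))
```

The sign of the coefficient matches what the eye reads off the box: positive for the right-tailed sample, negative for its mirror image.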

What does the Law of Large Numbers state?

  • It states that as the size of a sample is increased, the mean value of the sample will get closer to the mean or expected value of the population.
  • It states that if an event is repeated under identical conditions, the probability of the event remains the same.
  • It's a rule which states that the sum of the probabilities of all possible events is 1.
  • It's the law that states the probability of an event is always constant.
The Law of Large Numbers states that as a sample size grows, its mean gets closer to the average of the whole population. In other words, as the number of experiments increases, the actual ratio of outcomes will converge on the theoretical, or expected, ratio of outcomes.
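
A short simulation can make this visible. The sketch below rolls a fair six-sided die, whose expected value is 3.5, and prints the sample mean for growing sample sizes; the seed and the sample sizes are arbitrary choices for the example.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n

expected = 3.5  # (1 + 2 + 3 + 4 + 5 + 6) / 6
for n in (10, 1_000, 100_000):
    mean = sample_mean(n)
    print(f"n={n:>7}  mean={mean:.4f}  |error|={abs(mean - expected):.4f}")
```

Small samples can land far from 3.5, while the largest sample should sit very close to it.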

If the p-value from a Mann-Whitney U test is less than the significance level, you would ________ the null hypothesis.

  • accept
  • either accept or reject
  • fail to reject
  • reject
If the p-value from a Mann-Whitney U test is less than the significance level (often 0.05), you would reject the null hypothesis, suggesting there is a significant difference between the groups.
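
The decision rule can be sketched in a few lines. The code below computes the U statistic by direct pair counting and a two-sided p-value from the normal approximation, without continuity or tie correction, so it is a simplified illustration rather than a replacement for a statistics library. The function name, the data, and the choice of `alpha = 0.05` are assumptions.

```python
import math

def mann_whitney_u(x, y):
    """Return (U for x, approximate two-sided p-value)."""
    # U counts the (xi, yj) pairs with xi > yj; ties count one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail area of the standard normal
    return u, p

# Hypothetical measurements for two independent groups.
group_a = [12, 15, 14, 10, 13, 16, 11, 17]
group_b = [22, 25, 19, 24, 21, 23, 20, 18]

alpha = 0.05
u, p = mann_whitney_u(group_a, group_b)
decision = "reject" if p < alpha else "fail to reject"
print(f"U={u}, p={p:.5f}: {decision} the null hypothesis")
```

Here every value in `group_a` is below every value in `group_b`, so U is 0 and the p-value falls well under the significance level.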

The graphical representation of residuals versus predicted values is known as a ________ plot.

  • Box
  • Histogram
  • Residual
  • Scatter
A residual plot is a graph that shows the residuals on the vertical axis and the predicted (fitted) values, or alternatively the independent variable, on the horizontal axis. If the points are randomly dispersed around the horizontal zero line, a linear regression model is appropriate for the data; a systematic pattern such as a curve or a funnel shape suggests that a non-linear model or a transformation may be more appropriate.
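
The sketch below shows what such a pattern looks like numerically, without drawing the plot. It fits a straight line by least squares to made-up data that actually follow a curve, then lists each fitted value next to its residual; the helper `fit_line` and the data are assumptions for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical data with a curved (quadratic) relationship.
xs = list(range(0, 11))
ys = [x * x for x in xs]

slope, intercept = fit_line(xs, ys)
fitted = [slope * x + intercept for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# These (fitted, residual) pairs are the points a residual plot would show.
for f, r in zip(fitted, residuals):
    print(f"fitted={f:7.2f}  residual={r:7.2f}")
```

The residuals are positive at both ends and negative in the middle. That U-shape, rather than a random scatter around zero, is the signal that a straight line is the wrong model here.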

What can the Mann-Whitney U test tell you about the shape of your distributions?

  • It can confirm if your distributions are normal
  • It can confirm if your distributions are skewed
  • It can confirm if your distributions have equal variances
  • It cannot tell you anything about the shape of your distributions
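The Mann-Whitney U test does not provide information about the shape of the distributions. It is a non-parametric, rank-based test that does not assume the data follow a particular distribution such as the normal. Note that reading a significant result as a difference in medians does require the two distributions to have similar shapes, which has to be checked separately.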

What is the purpose of multiple linear regression analysis?

  • To classify data into different categories
  • To cluster data into different groups
  • To examine the relationship between several independent variables and a dependent variable
  • To predict the outcome of a binary dependent variable
Multiple linear regression analysis is used to understand the relationship between several independent (explanatory) variables and a dependent (response) variable. It can also be used for predicting the mean value of the dependent variable given the values of the independent variables.
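
A minimal sketch of the idea follows, using only the standard library. It fits an intercept and two slopes by solving the normal equations, which is one common approach assumed here for illustration. The helper names `solve` and `fit` and the noise-free data, generated from y = 1 + 2*x1 + 3*x2, are made up for the example.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(rows, ys):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]              # prepend the intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(X, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: two independent variables, one dependent variable.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefs = fit(rows, ys)
print([round(c, 6) for c in coefs])

# Predicted mean response for a new observation with x1 = 7, x2 = 8.
prediction = coefs[0] + coefs[1] * 7 + coefs[2] * 8
print(round(prediction, 6))
```

Each slope describes how the dependent variable changes with one independent variable while the other is held fixed.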

What is the relationship between the eigenvalue of a component and the variance of that component in PCA?

  • It depends on the dataset
  • There is no relationship
  • They are directly proportional
  • They are inversely proportional
The eigenvalue of a component in PCA is directly proportional to the variance of that component. When PCA is performed on the covariance matrix, the eigenvalue is in fact equal to the variance of the data along that principal component, so a larger eigenvalue corresponds to a larger amount of variance explained.
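
This relationship can be checked numerically on a small two-dimensional example. The sketch below builds the sample covariance matrix, gets its eigenvalues from the closed form for a symmetric 2x2 matrix, projects the centered data onto the first eigenvector, and compares the variance of the projections with the first eigenvalue. The data and variable names are assumptions.

```python
import math
import statistics

# Hypothetical two-dimensional data with a strong positive correlation.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3]

n = len(xs)
mx, my = statistics.mean(xs), statistics.mean(ys)

# Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
syy = sum((y - my) ** 2 for y in ys) / (n - 1)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Closed-form eigenvalues of a symmetric 2x2 matrix.
half_trace = (sxx + syy) / 2
root = math.sqrt(half_trace ** 2 - (sxx * syy - sxy ** 2))
lam1, lam2 = half_trace + root, half_trace - root

# Unit eigenvector for lam1 (this form is valid because sxy is nonzero here).
vx, vy = sxy, lam1 - sxx
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

# Project the centered data onto the first principal component.
scores = [(x - mx) * vx + (y - my) * vy for x, y in zip(xs, ys)]
var_pc1 = statistics.variance(scores)

print("eigenvalue 1       :", lam1)
print("variance along PC1 :", var_pc1)
print("share of variance  :", lam1 / (lam1 + lam2))
```

The two printed values agree up to floating-point error, and the ratio `lam1 / (lam1 + lam2)` is the usual "proportion of variance explained" by the first component.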

_________ sampling is a method where every individual in the population has an equal chance of being selected.

  • Cluster
  • Simple Random
  • Stratified
  • Systematic
Simple random sampling is a basic type of sampling method where each individual in the population has an equal chance of being selected. This avoids selection bias, so the sample tends to be representative of the population, although any single sample can still differ from the population by chance, especially when it is small.
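
The equal-chance property can be checked empirically. The sketch below repeatedly draws a simple random sample of 3 from a made-up population of 10 with `random.sample`, which samples without replacement, and records how often each individual is chosen; the population, sample size, seed, and number of trials are arbitrary choices for the example.

```python
import random
from collections import Counter

rng = random.Random(42)        # fixed seed so the run is repeatable
population = list(range(10))   # hypothetical individuals labeled 0..9
sample_size = 3
trials = 20_000

counts = Counter()
for _ in range(trials):
    counts.update(rng.sample(population, sample_size))

# Each individual should be selected in about sample_size / len(population) of the draws.
freqs = {person: counts[person] / trials for person in population}
for person, freq in freqs.items():
    print(person, round(freq, 3))
```

Every individual should appear in roughly 30% of the samples, which is what "equal chance of being selected" means in practice.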