What does the slope of the regression line represent in simple linear regression?

  • It represents the change in the dependent variable for a one-unit change in the independent variable
  • It represents the error term
  • It represents the independent variable
  • It represents the strength of the correlation
The slope of the regression line in simple linear regression represents the predicted change in the dependent variable for a one-unit change in the independent variable. Its sign gives the direction of the linear relationship. The strength of the relationship is measured by the correlation coefficient, not by the slope, because the slope depends on the units of both variables.
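As a minimal sketch of how the slope is read, the following fits a least-squares line in plain Python. The function name `fit_line` and the study-hours data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

With this data, each additional hour studied is associated with a predicted increase of about 4.1 points in the score.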

What is the purpose of a Chi-square test for goodness of fit?

  • To compare the means of two groups
  • To compare the variance of two groups
  • To determine the correlation between two variables
  • To test if a data set follows a given theoretical distribution
The Chi-square test for goodness of fit is used to test whether observed data fits a specific theoretical distribution. It compares the observed frequency in each category with the frequency that would be expected under that distribution.
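The comparison of observed and expected frequencies can be sketched as follows. The die-roll counts are hypothetical, `chi_square_statistic` is an illustrative name, and the critical value 11.070 is the standard table value for 5 degrees of freedom at the 0.05 level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 rolls of a six-sided die.
observed = [8, 12, 9, 11, 10, 10]
total = sum(observed)
# Under the null hypothesis (a fair die) every face is expected equally often.
expected = [total / 6] * 6

statistic = chi_square_statistic(observed, expected)

# Table value for df = 6 - 1 = 5 at alpha = 0.05 (assumed significance level).
CRITICAL_VALUE = 11.070
reject_null = statistic > CRITICAL_VALUE

print(f"chi-square = {statistic:.3f}, reject null: {reject_null}")
```

Here the statistic is far below the critical value, so the result is non-significant and the fair-die hypothesis is not rejected.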

A ________ result in the Chi-square test for goodness of fit indicates that the observed distribution does not significantly differ from the expected distribution.

  • negative
  • non-significant
  • significant
  • skewed
A non-significant result in the Chi-square test for goodness of fit indicates that the observed distribution does not significantly differ from the expected distribution. In other words, we do not have enough evidence to reject the null hypothesis.

A ________ test is a common non-parametric statistical method.

  • ANOVA
  • Mann-Whitney U
  • Regression
  • T
The Mann-Whitney U test is a common non-parametric statistical method used to compare two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed.
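A rough sketch of the U statistic itself, computed by counting pairs in plain Python. The function name `mann_whitney_u` and the two small samples are assumptions for illustration; a p-value would still need a table or a statistics library.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b), counting for each pair which group's value is larger.

    A tie between two values contributes 0.5 to each group's count.
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two U values always sum to the number of pairs.
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical measurements from two independent groups.
control = [1.2, 2.3, 3.1, 4.8]
treated = [2.9, 5.0, 6.1, 7.4]

u_control, u_treated = mann_whitney_u(control, treated)
print(u_control, u_treated)
```

Because only the ordering of the values matters, no assumption about normality is needed. A small U for one group means its values tend to rank below the other group's.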

In what scenarios might Spearman's rank correlation coefficient be a better choice than Pearson's?

  • When both variables are normally distributed
  • When the data contains outliers or is not normally distributed
  • When the relationship between variables is linear
  • When the relationship between variables is non-linear and non-monotonic
Spearman's rank correlation coefficient is a non-parametric measure of correlation, meaning it can be used when the data is not normally distributed. It is also less sensitive to outliers compared to Pearson's coefficient. Further, it can be used to measure monotonic relationships, whether they are linear or not.
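To illustrate the difference, here is a minimal sketch that computes Spearman's coefficient as Pearson's coefficient applied to ranks. The helper names and the cubic data are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# A monotonic but non-linear relationship: y = x ** 3.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]

print(f"Pearson  = {pearson(x, y):.3f}")
print(f"Spearman = {spearman(x, y):.3f}")
```

For this data Spearman's coefficient is 1 because the ordering is perfectly preserved, while Pearson's coefficient is below 1 because the relationship is not a straight line.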

How can you determine skewness of a distribution using a box plot?

  • By the height of the box
  • By the position of the median in the box
  • By the width of the box
  • It cannot be determined from a box plot
The skewness of a distribution can be judged from a box plot by looking at the position of the median in the box. If the median is not in the center of the box (i.e., the quartiles are not equidistant from the median), the data is likely skewed. If the median is closer to the bottom of the box, the data is positively skewed, and if it is closer to the top, the data is negatively skewed. Unequal whisker lengths point the same way: a longer upper whisker suggests positive skew.
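The same reading can be done numerically by comparing the two halves of the box. This is only a heuristic sketch; `box_skew_direction` and the sample data are hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default method.

```python
import statistics


def box_skew_direction(data, tol=1e-9):
    """Classify skew from where the median sits between Q1 and Q3."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    lower_half = median - q1  # bottom part of the box
    upper_half = q3 - median  # top part of the box
    if abs(upper_half - lower_half) <= tol:
        return "roughly symmetric"
    # Median near the bottom of the box means the upper half is longer.
    return "positive" if upper_half > lower_half else "negative"


right_tailed = [1, 2, 2, 3, 3, 4, 5, 8, 13]
left_tailed = [1, 6, 9, 10, 11, 11, 12, 12, 13]
balanced = [1, 2, 3, 4, 5]

for name, sample in [("right_tailed", right_tailed),
                     ("left_tailed", left_tailed),
                     ("balanced", balanced)]:
    print(name, "->", box_skew_direction(sample))
```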

What does the standard deviation measure in a dataset?

  • Central tendency
  • Dispersion
  • Kurtosis
  • Skewness
The standard deviation measures the amount of variation or dispersion in a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range.
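A small sketch using Python's `statistics` module shows two made-up datasets with the same mean but very different spread. The variable names are illustrative.

```python
import statistics

# Two hypothetical datasets with the same mean (10).
tight = [9, 10, 10, 11]
wide = [2, 6, 14, 18]

for name, values in [("tight", tight), ("wide", wide)]:
    mean = statistics.mean(values)
    # pstdev treats the values as a whole population (divides by n).
    spread = statistics.pstdev(values)
    print(f"{name}: mean = {mean}, standard deviation = {spread:.3f}")
```

The mean alone cannot tell these datasets apart; the standard deviation does.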

________ clustering is a density-based clustering method that can find arbitrarily shaped clusters and is less affected by outliers.

  • DBSCAN
  • Hierarchical
  • K-means
  • Spectral
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering method that can find arbitrarily shaped clusters and is less affected by outliers. It groups points that have at least a minimum number of neighbors within a given radius, grows each cluster outward through such dense points, and labels points in sparse regions as noise.
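The density-based idea can be sketched in a few lines of plain Python. This is a simplified, unoptimized illustration rather than a library implementation; the names `dbscan`, `region_query`, `eps`, and `min_pts`, and the toy points, are assumptions chosen for the example.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing
    return labels


# Two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]

labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point ends up labelled as noise instead of being forced into a cluster, which is why the method is less affected by outliers.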

What is the principle of equally likely outcomes?

  • All outcomes are equally probable
  • All outcomes are identical
  • All outcomes are independent
  • All outcomes are mutually exclusive
The principle of equally likely outcomes is a basic assumption in the classical definition of probability. It states that if an experiment has n outcomes, and there's no reason to believe that any one outcome is more likely than any other, then each outcome is assumed to have an equal probability of 1/n. For example, in tossing a fair coin, heads and tails are equally likely.
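Under this assumption, a probability is simply the number of favorable outcomes divided by the total number of outcomes. A short sketch with exact fractions follows; `classical_probability` is an illustrative name and the fair die is an assumed example.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(event) = favorable outcomes / total outcomes, assuming equal likelihood."""
    favorable = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(favorable, len(sample_space))


die = [1, 2, 3, 4, 5, 6]

p_each = Fraction(1, len(die))                   # every face has probability 1/n
p_even = classical_probability({2, 4, 6}, die)   # three favorable faces out of six

print(p_each, p_even)
```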

How does changing the units of measurement affect the standard deviation and variance of a dataset?

  • It decreases them
  • It depends on the new units
  • It doesn't affect them
  • It increases them
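Changing the units of measurement changes the scale of the data, and hence the values of the standard deviation and variance. If every value is multiplied by a constant c, the standard deviation is multiplied by |c| and the variance by c squared, so both increase when the data is scaled up and decrease when it is scaled down. For such a pure rescaling, the relative dispersion, as measured by the coefficient of variation, remains the same. A conversion that also adds a constant (such as Celsius to Fahrenheit) shifts the mean without adding spread, so the coefficient of variation does change in that case.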
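The effect of a unit change can be checked directly. The sketch below uses hypothetical height and temperature data, and `coefficient_of_variation` is an illustrative helper.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation relative to the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Pure rescaling: centimeters to meters (multiply by 1/100).
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]

sd_cm, sd_m = statistics.pstdev(heights_cm), statistics.pstdev(heights_m)
var_cm, var_m = statistics.pvariance(heights_cm), statistics.pvariance(heights_m)
cv_cm, cv_m = coefficient_of_variation(heights_cm), coefficient_of_variation(heights_m)

print(f"sd:  {sd_cm:.4f} cm  -> {sd_m:.4f} m    (factor 1/100)")
print(f"var: {var_cm:.4f}    -> {var_m:.6f}     (factor 1/10000)")
print(f"cv:  {cv_cm:.4f}     -> {cv_m:.4f}      (unchanged)")

# Rescaling plus a shift: Celsius to Fahrenheit.
temps_c = [10, 15, 20, 25, 30]
temps_f = [c * 9 / 5 + 32 for c in temps_c]

cv_c, cv_f = coefficient_of_variation(temps_c), coefficient_of_variation(temps_f)
print(f"cv:  {cv_c:.4f} (C)  -> {cv_f:.4f} (F)  (changes because of the +32 shift)")
```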

The measure of how much individual sample means will vary is called the __________ error.

  • Absolute
  • Margin of
  • Sampling
  • Standard
The standard error of a statistic is the standard deviation of its sampling distribution, that is, of the values the statistic would take across many repeated samples of the same size. It is used to build confidence intervals and to test hypotheses. For sample means, the standard error describes how much the mean varies from one sample to another, and it is estimated by the sample standard deviation divided by the square root of the sample size.
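A small simulation can make this concrete. The sketch below assumes a normal population with mean 50 and standard deviation 10 and samples of size 25; these numbers, the seed, and the helper name `standard_error_of_mean` are arbitrary choices for illustration.

```python
import math
import random
import statistics


def standard_error_of_mean(sample):
    """Estimate the standard error of the mean from a single sample: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


random.seed(0)  # fixed seed so the run is repeatable
POP_MEAN, POP_SD, N = 50.0, 10.0, 25

# Draw many samples and record the mean of each one.
sample_means = []
for _ in range(2000):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

empirical_se = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = POP_SD / math.sqrt(N)         # sigma / sqrt(n)

print(f"empirical SE   = {empirical_se:.3f}")
print(f"theoretical SE = {theoretical_se:.3f}")
```

The spread of the simulated sample means should land close to the theoretical value, and it shrinks as the sample size grows.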

What is the key characteristic of a symmetric distribution?

  • It has a mean of zero
  • It has a mode at the peak
  • It has no outliers
  • It has the same shape on the left and right when split vertically at the center
The key characteristic of a symmetric distribution is that it has the same shape on the left and right when split vertically at the center, where the mean and the median coincide. This means that values equally far from the center on either side occur with equal frequency.
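One common numerical check of symmetry is the moment-based skewness coefficient, which is zero when deviations on the two sides of the mean balance out. The following is a minimal sketch; `moment_skewness` and the two datasets are illustrative assumptions.

```python
def moment_skewness(values):
    """Third central moment divided by the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Mirror image around 3: the frequencies match on both sides.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
# Long tail to the right.
right_skewed = [1, 1, 1, 2, 2, 3, 5, 9]

print(f"symmetric:    {moment_skewness(symmetric):.3f}")
print(f"right-skewed: {moment_skewness(right_skewed):.3f}")
```

A skewness of zero is necessary for symmetry but does not prove it, so a plot of the data remains the more direct check.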