The conditional probability of A given B is denoted as ________.

  • P(A + B)
  • P(A / B)
  • P(A | B)
  • P(A ∩ B)
The conditional probability of A given B is denoted as P(A | B). It is defined as P(A ∩ B) / P(B) whenever P(B) > 0.
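As a small worked sketch, the definition P(A | B) = P(A ∩ B) / P(B) can be checked by enumeration. The fair six-sided die and the events A ("roll is even") and B ("roll is greater than 3") are illustrative assumptions, not part of the question.

```python
from fractions import Fraction

# Sample space of one fair six-sided die (an assumed example).
outcomes = set(range(1, 7))
A = {x for x in outcomes if x % 2 == 0}   # roll is even: {2, 4, 6}
B = {x for x in outcomes if x > 3}        # roll is greater than 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

# P(A | B) = P(A ∩ B) / P(B), defined only when P(B) > 0.
p_a_given_b = prob(A & B) / prob(B)
print(p_a_given_b)  # 2/3
```

Knowing that the roll exceeds 3 raises the probability of an even roll from 1/2 to 2/3 in this example.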

What is the primary goal of random sampling?

  • To always select the same individuals
  • To ensure that every member of the population has an equal chance of being selected
  • To select individuals who are likely to give the desired results
  • To select the individuals who are easiest to reach
The primary goal of random sampling is to ensure that every member of the population has an equal chance of being selected. This helps to reduce bias and increase the likelihood that the sample is representative of the population, which makes the results more valid and generalizable.
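A minimal sketch of simple random sampling, assuming a hypothetical population of 1,000 member IDs and a sample size of 50. The fixed seed is there only to make the sketch reproducible.

```python
import random

# Hypothetical population of 1,000 member IDs.
population = list(range(1000))

rng = random.Random(42)  # fixed seed only so the sketch is reproducible
# random.sample draws without replacement; each member is equally likely to be chosen.
sample = rng.sample(population, k=50)

print(len(sample), len(set(sample)))  # 50 50
```

Because the draw is without replacement, no member appears twice in the sample.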

Why is it important to consider the power of a test when designing a study?

  • To ensure the study can detect an effect if it exists
  • To ensure the study does not detect an effect if it does not exist
  • To maximize the chance of a Type I error
  • To minimize the chance of a Type I error
The power of a test is the probability that it correctly rejects a false null hypothesis, that is, that it detects an effect that truly exists. Power equals 1 − β, where β is the probability of a Type II error (false negative), so a high-power test is less likely to miss a real effect. Power depends on the sample size, the significance level, and the size of the effect, so when designing a study it's important to choose a sample size and significance level that provide enough power to detect an effect of the expected size.
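To illustrate how sample size drives power, here is a rough sketch for one simple case: a one-sided, one-sample z-test with known standard deviation. The function name `z_test_power` and the effect size, sigma, and sample sizes are assumptions chosen for illustration.

```python
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test with known sigma.

    effect is the true mean shift under the alternative (assumed >= 0).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under H0
    shift = effect * n ** 0.5 / sigma        # mean of the test statistic under H1
    return 1 - std_normal.cdf(z_crit - shift)

# Power grows with the sample size for a fixed effect and significance level.
for n in (10, 30, 100):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n), 3))
```

With an effect of zero the function returns the significance level itself, which is the Type I error rate rather than power in any useful sense.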

_______ is a measure of how spread out the numbers in a dataset are around the mean.

  • Median
  • Range
  • Standard Deviation
  • Variance
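Standard deviation is a measure of how spread out the numbers in a dataset are around the mean. It is the square root of the variance, that is, the square root of the average squared deviation from the mean, so it can be read as a typical distance between a data point and the mean, expressed in the same units as the data. The higher the standard deviation, the more spread out the data is.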
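A short sketch of the calculation, using made-up values and the population form (dividing by n). The mean absolute deviation is computed alongside it only for comparison, since it is a different measure of spread that is sometimes confused with the standard deviation.

```python
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

mu = mean(data)
# Population variance: average squared deviation from the mean.
variance = sum((x - mu) ** 2 for x in data) / len(data)
std_dev = variance ** 0.5
# Mean absolute deviation: average unsigned distance from the mean.
mean_abs_dev = sum(abs(x - mu) for x in data) / len(data)

print(variance, std_dev, mean_abs_dev)  # 4.0 2.0 1.5
print(pstdev(data))                     # 2.0, the library's population standard deviation
```

For a sample rather than a full population, `statistics.stdev` divides by n − 1 instead of n.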

In the context of cluster analysis, what is the 'centroid'?

  • The average distance between clusters
  • The geometric center of a cluster
  • The largest point in a cluster
  • The smallest point in a cluster
The centroid is the geometric center of a cluster. In other words, it's the coordinate-wise mean of all the points in that cluster.
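As a minimal sketch, the centroid can be computed by averaging each coordinate separately. The helper name `centroid` and the three 2-D points are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

cluster = [(1.0, 2.0), (3.0, 4.0), (5.0, 0.0)]  # hypothetical 2-D cluster
print(centroid(cluster))  # (3.0, 2.0)
```

The centroid does not have to coincide with any actual data point in the cluster.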

What is the effect of monotonic transformations on Spearman’s rank correlation coefficient?

  • They decrease the coefficient
  • They don't affect the coefficient
  • They increase the coefficient
  • They make the coefficient negative
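Monotonically increasing transformations do not affect Spearman’s rank correlation coefficient. This is because Spearman's correlation is based on the rank order of the data, and increasing transformations preserve this order. A monotonically decreasing transformation of one variable reverses its ranks, which keeps the magnitude of the coefficient but flips its sign.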
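The rank-based reasoning can be checked with a small sketch. It computes Spearman's coefficient as the Pearson correlation of the ranks, assumes there are no tied values, and uses made-up data. An increasing transform (exp) leaves the coefficient unchanged, while a decreasing transform (negation) flips its sign.

```python
import math

def ranks(values):
    """Ranks 1..n for values without ties (ties are not handled in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1.0, 2.5, 3.0, 4.2, 5.1]
y = [2.0, 1.0, 4.0, 3.0, 5.0]

rho = spearman(x, y)
rho_exp = spearman([math.exp(v) for v in x], y)  # increasing transform of x
rho_neg = spearman([-v for v in x], y)           # decreasing transform of x
print(rho, rho_exp, rho_neg)
```

Here `rho` and `rho_exp` agree because exp does not change the ordering of `x`, and `rho_neg` has the same magnitude with the opposite sign.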

What's the difference between a histogram and a bar plot?

  • Bar plots are for continuous data, histograms for categorical data
  • Both are for continuous data only
  • Histograms are for continuous data, bar plots for categorical data
  • There is no difference
The main difference between a histogram and a bar plot is the type of data they represent. A histogram is used for continuous data, where the bins represent ranges of data, while a bar plot is used for categorical data to compare the frequency or count of different categories.
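A text-only sketch of the distinction, using made-up data: the histogram counts continuous values that fall into adjacent numeric bins (the 10 cm bin width is an arbitrary choice), while the bar plot counts occurrences of each separate category.

```python
from collections import Counter

# Continuous data: a histogram groups values into adjacent numeric bins.
heights_cm = [150.2, 163.5, 158.9, 171.0, 169.4, 176.8, 181.3, 165.1]
bin_width = 10
hist = Counter(int(h // bin_width) * bin_width for h in heights_cm)
for left in sorted(hist):
    print(f"[{left}, {left + bin_width}): {'#' * hist[left]}")

# Categorical data: a bar plot shows one bar per category, in any order.
colors = ["red", "blue", "red", "green", "blue", "red"]
bars = Counter(colors)
for category, count in bars.items():
    print(f"{category}: {'#' * count}")
```

Changing `bin_width` changes the shape of the histogram, whereas the bar heights depend only on the category counts.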

What is the error term in a simple linear regression model?

  • It is the dependent variable
  • It is the difference between the observed and predicted values
  • It is the independent variable
  • It is the slope of the regression line
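The error term in a simple linear regression model is the difference between the observed values and the values predicted by the regression line. It captures the variability in the dependent variable that is not explained by the independent variable in the model. Strictly speaking, the error term refers to the true, unknown regression line; in practice it is estimated by the residuals, the differences between the observed values and the fitted values.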
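A minimal sketch of how residuals (the observable estimates of the error term) come out of a least squares fit. The helper `fit_line` and the data are illustrative assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # illustrative observations

b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * a for a in x]
# Residual = observed value minus the value predicted by the fitted line.
residuals = [obs - pred for obs, pred in zip(y, predicted)]
print(b0, b1)
print(residuals)
```

With an intercept in the model, least squares residuals sum to zero up to floating-point rounding.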

What can be inferred if the residuals are not randomly distributed in the residual plot?

  • The data has no outliers
  • The data is perfectly linear
  • The linear regression model is a perfect fit for the data
  • The linear regression model is not a good fit for the data
If the residuals are not randomly distributed (e.g., if they form a pattern), it suggests that the linear regression model is not a good fit for the data. This could be because the relationship between the variables is not linear, or because the data exhibits heteroscedasticity (unequal variances of errors), among other reasons.
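To make the idea of a residual pattern concrete, the sketch below fits a straight line to data that is deliberately curved (y = x², an assumed example). The residuals are positive at both ends and negative in the middle, which is the kind of systematic shape a residual plot would reveal.

```python
x = [-2, -1, 0, 1, 2]
y = [a ** 2 for a in x]  # curved relationship: y = x^2

# Least squares line fitted by hand.
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Residual = observed value minus the value predicted by the fitted line.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
print(residuals)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The fitted line here is flat (slope 0, intercept 2), so it misses the curvature entirely, and the U-shaped residuals expose that.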

What type of data is used in the Chi-square test for goodness of fit?

  • Categorical data
  • Continuous data
  • Interval data
  • Ordinal data
The Chi-square test for goodness of fit is used with categorical data. It compares the observed frequencies in each category with the frequencies we would expect to see if the data followed the hypothesized theoretical distribution.
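As a rough sketch of the comparison, the statistic can be computed by hand for a hypothetical experiment: 60 rolls of a die tested against the hypothesis that it is fair. The observed counts are made up for illustration.

```python
# Hypothetical counts from 60 die rolls, keyed by face.
observed = {"1": 8, "2": 12, "3": 9, "4": 11, "5": 6, "6": 14}
total = sum(observed.values())
# Under the fair-die hypothesis every face is expected equally often.
expected = {face: total / len(observed) for face in observed}

# Pearson's statistic: sum over categories of (observed - expected)^2 / expected.
chi_square = sum((observed[f] - expected[f]) ** 2 / expected[f] for f in observed)
degrees_of_freedom = len(observed) - 1
print(chi_square, degrees_of_freedom)
```

The statistic is about 4.2 with 5 degrees of freedom. The 5% critical value of the chi-square distribution with 5 degrees of freedom is roughly 11.07, so these counts would not lead to rejecting the fair-die hypothesis.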