What does it mean if the p-value in a Chi-square test is smaller than the significance level?

  • The alternative hypothesis is true
  • The null hypothesis is true
  • The test result is insignificant
  • There is not enough evidence to reject the null hypothesis
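If the p-value in a Chi-square test is smaller than the significance level, we reject the null hypothesis in favor of the alternative hypothesis. In a test of independence, this suggests a statistically significant association between the variables. Rejecting the null hypothesis does not prove that the alternative is true; it means that data at least as extreme as those observed would be unlikely if the null hypothesis were true.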
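
A minimal standard-library sketch of the decision rule for a 2x2 test of independence. The counts are made up for illustration, and the helper name `chi_square_2x2` is hypothetical. It computes the uncorrected Pearson statistic. SciPy's `chi2_contingency`, for comparison, applies Yates' continuity correction to 2x2 tables by default, so its numbers would differ slightly.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / grand total.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # A 2x2 table has (2-1)*(2-1) = 1 degree of freedom; for df = 1 the
    # chi-square survival function equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = group A / group B, columns = outcome yes / no.
observed = [[30, 10],
            [15, 25]]
alpha = 0.05
stat, p = chi_square_2x2(observed)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
if p < alpha:
    print("Reject H0: evidence of an association")
else:
    print("Fail to reject H0")
```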

How does multicollinearity affect the coefficients in multiple linear regression?

  • It doesn't affect the coefficients
  • It makes the coefficients less interpretable
  • It makes the coefficients more precise
  • It makes the coefficients negative
Multicollinearity refers to a situation in which two or more predictor variables in a multiple regression model are highly correlated. This correlation inflates the standard errors of the coefficient estimates, so the estimates become unstable (small changes in the data can shift them substantially or even flip their signs), less reliable, and harder to interpret.
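
The instability can be seen in a small, hand-built example. This sketch assumes hypothetical data in which `x2` is almost a copy of `x1`, and the helper `fit_two_predictors` (also hypothetical) solves the ordinary least squares normal equations for two predictors directly. Moving each response value by at most 0.1 changes the individual coefficients a great deal, while their sum, which is what drives the predictions, stays the same.

```python
def fit_two_predictors(x1, x2, y):
    """OLS fit of y = b0 + b1*x1 + b2*x2 via the 2x2 normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2  # close to 0 when x1 and x2 are nearly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2

# Hypothetical data: x2 is x1 plus a tiny perturbation (correlation about 0.998).
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.9, -1.1, 0.0, 0.9, 2.1]
y_a = [-4.0, -2.0, 0.0, 2.0, 4.0]
y_b = [-3.9, -2.1, 0.0, 1.9, 4.1]  # each value moved by at most 0.1

_, b1_a, b2_a = fit_two_predictors(x1, x2, y_a)
_, b1_b, b2_b = fit_two_predictors(x1, x2, y_b)
print(f"fit A: b1 = {b1_a:.2f}, b2 = {b2_a:.2f}")  # about 2 and 0
print(f"fit B: b1 = {b1_b:.2f}, b2 = {b2_b:.2f}")  # about 1 and 1
print(f"b1 + b2: {b1_a + b2_a:.2f} vs {b1_b + b2_b:.2f}")  # both about 2
```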

What is multicollinearity and how does it affect multiple linear regression?

  • It is the correlation between dependent variables and it has no effect on regression
  • It is the correlation between errors and it makes the regression model more accurate
  • It is the correlation between independent variables and it can cause instability in the regression coefficients
  • It is the correlation between residuals and it causes bias in the regression coefficients
Multicollinearity refers to a high correlation among the independent variables in a regression model. Because it requires at least two predictors, it cannot occur in simple linear regression, which has only one. It does not necessarily reduce the predictive power of the model as a whole, but it can make the estimates of individual regression coefficients unstable and difficult to interpret.
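
A common way to quantify multicollinearity is the variance inflation factor, VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others. The sketch below covers only the two-predictor case, where R^2 reduces to the squared Pearson correlation; the data and helper names are hypothetical. A frequently quoted rule of thumb treats VIF values above 5 or 10 as a warning sign.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    # With exactly two predictors, regressing x1 on x2 gives R^2 = r^2,
    # so VIF = 1 / (1 - r^2), and it is the same for both predictors.
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-1.9, -1.1, 0.0, 0.9, 2.1]   # nearly a copy of x1
x3 = [1.0, -1.0, 0.0, -1.0, 1.0]   # uncorrelated with x1

print(f"VIF(x1, x2) = {vif_two_predictors(x1, x2):.1f}")
print(f"VIF(x1, x3) = {vif_two_predictors(x1, x3):.1f}")
```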

The distribution of all possible sample means is known as a __________.

  • Normal Distribution
  • Population Distribution
  • Sampling Distribution
  • Uniform Distribution
In statistics, the sampling distribution is the probability distribution of a statistic (such as the sample mean) over all possible random samples of the same size drawn from a population. Because each sample can (and usually will) give a different value of the statistic, the sampling distribution describes how those values are distributed.
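
For a tiny population the definition can be applied literally by enumerating every possible sample. This sketch assumes a hypothetical population of four values and samples of size 2 drawn with replacement; exact fractions keep the arithmetic free of rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from statistics import mean, pvariance

population = [1, 2, 3, 4]   # hypothetical tiny population
n = 2                       # sample size (sampling with replacement)

# Every ordered sample of size n, and the mean of each one.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]
sampling_distribution = Counter(sample_means)

for value, count in sorted(sampling_distribution.items()):
    print(f"sample mean = {float(value):.1f}  probability = {count}/{len(sample_means)}")

# The sample means are centered on the population mean, and their
# variance equals the population variance divided by n.
print("mean of sample means:", float(mean(sample_means)))
print("variance of sample means:", float(pvariance(sample_means)))
```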

How is 'K-means' clustering different from 'hierarchical' clustering?

  • Hierarchical clustering creates a hierarchy of clusters, while K-means does not
  • Hierarchical clustering uses centroids, while K-means does not
  • K-means requires the number of clusters to be defined beforehand, while hierarchical clustering does not
  • K-means uses a distance metric to group instances, while hierarchical clustering does not
K-means clustering requires the number of clusters to be defined beforehand, while hierarchical clustering does not. Hierarchical clustering forms a dendrogram from which the user can choose the number of clusters based on the problem requirements.
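
To make the "k must be chosen beforehand" point concrete, here is a minimal one-dimensional sketch of Lloyd's algorithm, the standard k-means procedure. The function name, the data, and the naive initialization (the first k points) are assumptions for illustration only. Note that `k` is a required argument, whereas a hierarchical method would return a full merge tree instead.

```python
def kmeans_1d(points, k, max_iter=100):
    """Lloyd's algorithm on 1-D data; the number of clusters k is an input."""
    centroids = list(points[:k])  # naive initialization: the first k points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster ends up empty.
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids

data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels, centroids = kmeans_1d(data, k=2)
print(labels, centroids)  # [0, 0, 0, 1, 1, 1] [1.5, 8.5]
```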

Under what conditions does a binomial distribution approximate a normal distribution?

  • When the events are not independent
  • When the number of trials is large and the probability of success is not too close to 0 or 1
  • When the number of trials is small
  • When the probability of success changes with each trial
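As the number of trials n grows, the binomial distribution approaches a normal distribution with mean np and variance np(1 − p), provided that the probability of success p is not too close to 0 or 1. This result is known as the De Moivre–Laplace theorem. A common rule of thumb is to use the approximation only when np ≥ 5 and n(1 − p) ≥ 5 (some texts require 10).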
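
A quick numerical check compares the exact binomial probability P(X ≤ k) with its normal approximation, using a continuity correction of 0.5. The two parameter sets are hypothetical: one satisfies the rule of thumb and one (np = 0.5) clearly does not.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with a continuity correction of 0.5."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = (k + 0.5 - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z

for n, p, k in [(100, 0.5, 55), (10, 0.05, 0)]:
    exact = binom_cdf(k, n, p)
    approx = normal_approx_cdf(k, n, p)
    print(f"n={n}, p={p}, np={n * p}: exact={exact:.4f}, normal={approx:.4f}")
```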

In what type of data distribution do the mean, median, and mode coincide?

  • Negatively skewed distribution
  • Normal distribution
  • Positively skewed distribution
  • Uniform distribution
In a normal distribution, the mean, median, and mode coincide, meaning they have the same value. The normal distribution is symmetric and unimodal, with most observations clustering around the central peak, so all three measures fall at the center. The same holds for any symmetric unimodal distribution, not only the normal.
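
The contrast with skewed data can be checked with Python's `statistics` module. Both samples below are hypothetical: one is symmetric with a single peak, the other has a long right tail, where the ordering mode < median < mean is typical (though not guaranteed for every skewed sample).

```python
from statistics import mean, median, mode

symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]       # symmetric, single peak at 3
right_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 9]    # long right tail

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(f"{name}: mean={mean(data):.2f}, median={median(data)}, mode={mode(data)}")
```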

What does a histogram represent in data visualization?

  • The change of a variable over time
  • The correlation between two variables
  • The frequency distribution of a single variable
  • The relationship between three variables
A histogram is a graphical representation of the distribution of a dataset. It is an estimate of the probability distribution of a continuous variable. To construct a histogram, the first step is to "bin" the range of values, i.e., divide the entire range of values into a series of intervals, and then count how many values fall into each interval.
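
The bin-then-count step can be sketched in a few lines. The helper name and data are hypothetical; the bin convention (every bin half-open on the right except the last, which also includes its right edge) mirrors the one NumPy's `numpy.histogram` documents.

```python
from bisect import bisect_right

def histogram_counts(values, edges):
    """Count values per bin. Bins are [edges[i], edges[i+1]), except the last,
    which also includes its right edge. Values outside the range are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue
        i = bisect_right(edges, v) - 1  # index of the bin whose left edge is <= v
        if i == len(counts):            # v equals the last edge
            i -= 1
        counts[i] += 1
    return counts

data = [1.2, 2.5, 2.7, 3.1, 4.8, 5.0]
edges = [1, 2, 3, 4, 5]
print(histogram_counts(data, edges))  # [1, 2, 1, 2]
```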

What is the primary assumption for performing a Z-test?

  • All of the above
  • The data is normally distributed
  • The population standard deviation is known
  • The sample size is very large
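The assumption that defines a Z-test, and distinguishes it from a t-test, is that the population standard deviation is known. The test also relies on the sample mean being approximately normally distributed, which holds when the data are normal or, by the central limit theorem, when the sample is large; these conditions support the test but are not its primary assumption.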
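
A minimal one-sample, two-sided Z-test shows where the known standard deviation enters: it is plugged directly into the standard error instead of being estimated from the sample. The function name and the numbers are hypothetical.

```python
import math

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample Z-test; sigma is the KNOWN population standard deviation."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical numbers: 36 observations with mean 103, testing H0: mu = 100,
# with the population standard deviation assumed known to be 15.
z, p = one_sample_z_test(sample_mean=103, mu0=100, sigma=15, n=36)
print(f"z = {z:.2f}, p = {p:.3f}")
```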

What does a negative kurtosis indicate about the distribution of the dataset?

  • The distribution has a perfect bell shape
  • The distribution is less outlier-prone than a normal distribution
  • The distribution is more outlier-prone than a normal distribution
  • The distribution is skewed
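A negative (excess) kurtosis value, i.e., kurtosis below the normal distribution's value of 3, indicates that the distribution has lighter tails than the normal distribution. It is often also described as having a flatter peak, although kurtosis mainly reflects tail behavior. Extreme values are therefore less common, so the distribution is less outlier-prone than a normal distribution.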
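
The sign of excess kurtosis can be checked directly from its definition, m4 / m2^2 - 3, where m2 and m4 are the second and fourth central moments. The two samples are hypothetical: evenly spread values with no extremes, and values concentrated at the center with two far-out points.

```python
def excess_kurtosis(data):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

evenly_spread = list(range(1, 11))      # flat, no extreme values
spiky_with_tails = [0] * 8 + [-5, 5]    # mostly at the center, two extremes

print(f"{excess_kurtosis(evenly_spread):.3f}")     # negative: lighter tails
print(f"{excess_kurtosis(spiky_with_tails):.3f}")  # positive: heavier tails
```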