In what type of data distribution do the mean, median, and mode coincide?

Negatively skewed distribution
Normal distribution
Positively skewed distribution
Uniform distribution

In a normal distribution, the mean, median, and mode all coincide, meaning they have the same value. A normal distribution is symmetrical, with the majority of observations clustering around the central peak; therefore, the mean, median, and mode all fall at the center.
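
A quick check with Python's standard `statistics` module. Both datasets are made up for illustration. In the symmetric, single-peaked sample the three measures agree. In the right-skewed sample they separate, with the long right tail pulling the mean above the median.

```python
import statistics

# Hypothetical symmetric, single-peaked sample: values pile up at the centre
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
# Hypothetical positively (right-) skewed sample: a long tail to the right
right_skewed = [1, 1, 1, 2, 2, 3, 4, 7, 10]

def centre_summary(data):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

print(centre_summary(symmetric))     # mean, median and mode are all 3
print(centre_summary(right_skewed))  # mean > median > mode
```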

If events A and B are independent, what is P(A ∩ B)?

P(A) * P(B)
P(A) + P(B)
P(A) - P(B)
P(A) / P(B)

If events A and B are independent, the probability of both events occurring (P(A ∩ B)) is the product of their individual probabilities (P(A) * P(B)). This is a direct result of the Multiplication Rule for independent events.
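
The multiplication rule can be checked exactly by enumerating a small sample space. This is a sketch that assumes a hypothetical example: two fair dice, with A = "first die is even" and B = "second die shows 5 or 6". The events are independent because they involve different dice.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def a(o):
    return o[0] % 2 == 0  # first die shows an even number

def b(o):
    return o[1] > 4  # second die shows 5 or 6

p_a, p_b = prob(a), prob(b)
p_both = prob(lambda o: a(o) and b(o))
print(p_a, p_b, p_both, p_a * p_b)  # 1/2 1/3 1/6 1/6
```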

What are the degrees of freedom in a Chi-square test for goodness of fit?

The number of categories minus 1
The number of categories plus 1
The number of observations minus 1
The number of observations plus 1

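In a Chi-square test for goodness of fit, the degrees of freedom are the number of categories minus 1 (k − 1), provided no population parameters are estimated from the data. Each parameter that is estimated from the data removes one further degree of freedom.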
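
A minimal sketch of the statistic and its degrees of freedom. It assumes hypothetical counts from 60 rolls of a die tested against the "fair die" hypothesis, where no parameters are estimated. Looking up a p-value would need a chi-square distribution function, for example from SciPy, which is not used here.

```python
# Hypothetical data: 60 rolls of a die we want to test for fairness
observed = [8, 12, 9, 11, 10, 10]
k = len(observed)
n = sum(observed)
expected = [n / k] * k  # fair die: each face is expected n/k = 10 times

# Pearson's chi-square statistic: sum of (O - E)^2 / E over the categories
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# No parameters were estimated from the data, so df = k - 1
df = k - 1
print(f"chi-square = {chi_square:.2f}, df = {df}")  # chi-square = 1.00, df = 5
```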

What does inference in multiple linear regression primarily involve?

Calculating the mean of the residuals
Creating the scatter plot
Drawing the best fit line
Interpreting the coefficients

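Inference in multiple linear regression primarily involves interpreting the estimated coefficients of the model and assessing, through standard errors, confidence intervals, and hypothesis tests, whether they differ from zero. Each coefficient represents the expected change in the response variable for a one-unit change in the corresponding explanatory variable, holding all other explanatory variables constant.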
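
To make the coefficient interpretation concrete, here is a standard-library sketch that fits ordinary least squares by solving the normal equations. The data are hypothetical and noise-free, generated from y = 1 + 2·x1 − 3·x2, so the fit should recover those values. The helper names `solve` and `fit_ols` are illustrative. The sketch shows point estimates only; standard errors and tests would normally come from a statistics library such as statsmodels.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] from the normal equations X'X b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    p = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical, noise-free data generated from y = 1 + 2*x1 - 3*x2
rows = [(0, 1), (1, 0), (2, 2), (3, 1), (4, 3), (5, 2)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in rows]

intercept, b1, b2 = fit_ols(rows, y)
# b1 is about 2: holding x2 fixed, a one-unit rise in x1 raises the expected y by 2
# b2 is about -3: holding x1 fixed, a one-unit rise in x2 lowers the expected y by 3
print(round(intercept, 6), round(b1, 6), round(b2, 6))
```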

What does kurtosis measure in a dataset?

Central tendency
Dispersion
Skewness
The "tailedness" of the distribution

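Kurtosis measures how heavy the tails of a distribution are compared with those of a normal distribution. It is the standardized fourth central moment, E[(X − μ)⁴] / σ⁴, which equals 3 for any normal distribution. Excess kurtosis subtracts 3, so positive values indicate heavier tails (more extreme values) than a normal distribution and negative values indicate lighter tails.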
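
A sketch of the moment-based definition, using the plain population-moment formula on two made-up datasets. Libraries often apply small-sample bias corrections, so their values can differ slightly from this one.

```python
def kurtosis(data):
    """Population kurtosis m4 / m2**2 (equals 3 for a normal distribution)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2

def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores 0."""
    return kurtosis(data) - 3

light_tails = [-1, 1, -1, 1, -1, 1]              # no extreme values at all
heavy_tails = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]  # mostly central, two extreme values

print(excess_kurtosis(light_tails))  # -2.0 (lighter tails than normal)
print(excess_kurtosis(heavy_tails))  # 2.0 (heavier tails than normal)
```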

How is the variance related to the standard deviation in a data set?

The variance is the average of the standard deviation
The variance is the square of the standard deviation
The variance is the square root of the standard deviation
The variance is twice the standard deviation

The variance is the square of the standard deviation; equivalently, the standard deviation is the square root of the variance. Both measure dispersion, but the standard deviation is expressed in the same units as the data, while the variance is in squared units. For example, if the data are in minutes, the variance is in minutes squared.
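
A short check with Python's `statistics` module on hypothetical waiting times. Population formulas (`pvariance`, `pstdev`) are used here. The sample versions (`variance`, `stdev`) obey the same square relationship.

```python
import math
import statistics

waiting_minutes = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical waiting times

variance = statistics.pvariance(waiting_minutes)  # 4, in minutes squared
std_dev = statistics.pstdev(waiting_minutes)      # 2, in minutes

# The standard deviation squared gives back the variance
print(math.isclose(std_dev ** 2, variance))  # True
```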

What is the assumption made when computing the Pearson correlation coefficient?

The correlation is zero
The variables are independent
The variables are normally distributed
There is a linear relationship between variables

When computing the Pearson correlation coefficient, it is assumed that there is a linear relationship between the variables. Pearson's r measures only the strength of linear association, so a strong nonlinear relationship can still give r close to 0. For significance tests on r, it is also commonly assumed that the variables are continuous and that the data are homoscedastic (i.e., the variance of the errors is the same across all levels of the variables).
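
A standard-library sketch of Pearson's r, showing why linearity matters. On made-up data, a perfectly linear relationship gives r = 1, while a perfect but symmetric quadratic relationship gives r = 0.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # close to 1: perfectly linear
print(pearson_r(x, [v * v for v in x]))      # 0.0: perfectly dependent, but not linearly
```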

Conditional independence of A and B given C means that knowing that C has occurred does not change the ________ between A and B.

Difference
Intersection
Ratio
Relationship

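Conditional independence of A and B given C means that, once we know C has occurred, A and B carry no further information about each other: learning whether B occurred does not change the probability of A. Formally, P(A ∩ B | C) = P(A | C) · P(B | C). This does not imply that A and B are independent when C is not known, and unconditional independence does not imply conditional independence either.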
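
An exact computation with fractions makes the distinction concrete. The model is a hypothetical assumption chosen for illustration. C is a fair coin. Given C, the events A and B occur independently, each with probability 9/10 if C occurred and 1/10 if it did not. A and B are independent given C but not independent overall, because both track C.

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: P(C) = 1/2; given C, A and B are independent with
# probability 9/10 each if C occurred and 1/10 each if it did not
P_C = Fraction(1, 2)
P_GIVEN = {True: Fraction(9, 10), False: Fraction(1, 10)}

def joint(a, b, c):
    """P(A=a, B=b, C=c) under the model above."""
    pc = P_C if c else 1 - P_C
    pa = P_GIVEN[c] if a else 1 - P_GIVEN[c]
    pb = P_GIVEN[c] if b else 1 - P_GIVEN[c]
    return pc * pa * pb

def prob(pred):
    """Probability of the (a, b, c) outcomes that satisfy pred."""
    return sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3) if pred(a, b, c))

p_c = prob(lambda a, b, c: c)
# Given C: P(A and B | C) equals P(A | C) * P(B | C)
p_ab_given_c = prob(lambda a, b, c: a and b and c) / p_c
p_a_given_c = prob(lambda a, b, c: a and c) / p_c
p_b_given_c = prob(lambda a, b, c: b and c) / p_c
print(p_ab_given_c, p_a_given_c * p_b_given_c)  # 81/100 81/100

# Without conditioning: P(A and B) differs from P(A) * P(B)
p_ab = prob(lambda a, b, c: a and b)
p_a = prob(lambda a, b, c: a)
p_b = prob(lambda a, b, c: b)
print(p_ab, p_a * p_b)  # 41/100 1/4
```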

What is the purpose of 'normalization' or 'standardization' in the pre-processing step of cluster analysis?

To decrease the number of clusters
To ensure that all features contribute equally to the distance calculation
To handle missing values
To increase the computational complexity

Normalization or standardization ensures that all features contribute equally to the final distance calculation, regardless of their original scale. Without this step, features with larger scales would dominate the distance calculation, potentially leading to misleading clusters.
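
A small sketch of z-score standardization before computing Euclidean distances. It uses three hypothetical customers described by age and income. On the raw scale, income (measured in dollars) dominates, so customer A appears closest to C. After standardization both features count, and A's nearest neighbour becomes B.

```python
import math
import statistics

# Hypothetical customers: (age in years, annual income in dollars)
customers = [(25, 40_000), (35, 42_000), (60, 41_000)]

def standardize(rows):
    """Z-score each column: subtract its mean, then divide by its population standard deviation."""
    columns = list(zip(*rows))
    params = [(statistics.mean(c), statistics.pstdev(c)) for c in columns]
    return [tuple((v - m) / s for v, (m, s) in zip(row, params)) for row in rows]

scaled = standardize(customers)

# Raw scale: income gaps (in dollars) swamp age gaps (in years), so A looks closer to C
raw_ab = math.dist(customers[0], customers[1])
raw_ac = math.dist(customers[0], customers[2])
# Standardized scale: both features contribute, and A is now closer to B
std_ab = math.dist(scaled[0], scaled[1])
std_ac = math.dist(scaled[0], scaled[2])
print(f"raw:          d(A,B) = {raw_ab:.1f}, d(A,C) = {raw_ac:.1f}")
print(f"standardized: d(A,B) = {std_ab:.2f}, d(A,C) = {std_ac:.2f}")
```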

How does the height of a bar in a histogram relate to the frequency of the data?

It has no relation with the frequency
It represents the cumulative frequency
It represents the mean frequency
It represents the relative frequency

The height of a bar in a histogram represents the frequency (or relative frequency) of data for that particular bin. This means the taller the bar, the more data falls into that specific interval.
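
A minimal sketch of how bar heights are computed for equal-width bins, using hypothetical measurements. The helper `histogram_counts` is illustrative. Dividing each count by the number of observations gives the relative-frequency heights.

```python
def histogram_counts(data, low, high, bins):
    """Count how many values fall in each of `bins` equal-width intervals over [low, high]."""
    width = (high - low) / bins
    counts = [0] * bins
    for x in data:
        if not low <= x <= high:
            continue  # ignore values outside the plotted range
        i = min(int((x - low) / width), bins - 1)  # x == high goes in the last bin
        counts[i] += 1
    return counts

data = [1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.9]  # hypothetical measurements
counts = histogram_counts(data, low=1, high=5, bins=4)  # bins [1,2), [2,3), [3,4), [4,5]
relative = [c / len(data) for c in counts]  # bar heights of a relative-frequency histogram
print(counts)  # [1, 2, 3, 1]
```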

A statistical test has more power to detect an effect if the effect size is ______.

Equal to the sample size
Large
Small
Unchanged

For a fixed sample size and significance level, the power of a test increases with the effect size, the magnitude of the difference or relationship you're testing for. Larger effect sizes create a larger signal relative to the noise, making it easier to detect an effect if one exists.
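
The effect of effect size on power can be computed directly for a simple case. This sketch assumes a two-sided one-sample z-test with known standard deviation. Under the alternative, the test statistic is Normal(d·√n, 1), where d = (μ1 − μ0)/σ is the standardized effect size. The function name `z_test_power` is illustrative, and other tests have different power formulas.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test (known sigma) for standardized effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5  # mean of the test statistic when the effect is real
    # Probability that the statistic lands in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for d in (0.2, 0.5, 0.8):  # Cohen's conventional "small", "medium", "large" effects
    print(d, round(z_test_power(d, n=25), 3))
```

With no effect (d = 0), the formula reduces to α, the false-positive rate. Power then rises toward 1 as d grows.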

In what kind of scenario is the Central Limit Theorem used?

It's used only when dealing with a uniform distribution.
It's used to determine whether an event will occur.
It's used to predict the future.
It's used when we want to make inferences about a population based on a sample.

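The Central Limit Theorem (CLT) states that, for independent (or nearly independent) observations from a population with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size grows, whatever the shape of the population. This is what allows us to make inferences about a population based on a sample, for example confidence intervals and hypothesis tests for a population mean.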
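
A quick simulation sketch. It repeatedly draws samples from a flat Uniform(0, 1) population and collects their means. The sample size, number of replications, and seed are arbitrary assumptions. The CLT predicts that the means cluster around 0.5, with standard deviation σ/√n = √(1/12)/√n.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible
n = 50                   # size of each sample
replications = 2000      # number of samples drawn

# Population: Uniform(0, 1), which is flat rather than bell-shaped (mean 0.5, sd sqrt(1/12))
sample_means = [
    statistics.fmean(rng.random() for _ in range(n)) for _ in range(replications)
]

# CLT prediction: sample means are approximately Normal(0.5, sqrt(1/12) / sqrt(n))
predicted_sd = (1 / 12) ** 0.5 / n ** 0.5
observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)
print(round(observed_mean, 4), round(observed_sd, 4), round(predicted_sd, 4))
```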
