What is the goal of 'hierarchical' clustering?
- To create a hierarchy or a tree of clusters
- To find the centroid of clusters
- To find the most diverse instances in the dataset
- To predict the outcome of a new instance
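The goal of hierarchical clustering is to create a hierarchy, or tree, of nested clusters. It is built either bottom-up (agglomerative: each point starts as its own cluster and the two closest clusters are merged at each step) or top-down (divisive). The hierarchy is usually visualized as a dendrogram, which can be cut at any height to obtain a flat set of clusters.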
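As a minimal sketch, assuming SciPy is installed, agglomerative (bottom-up) hierarchical clustering can be run with `scipy.cluster.hierarchy`. The six points are made up, and Ward linkage is only one of several possible merge criteria. `fcluster` cuts the tree into a chosen number of flat clusters, and `dendrogram(..., no_plot=True)` returns the tree layout without drawing it.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Toy 2-D points forming two well-separated groups (illustrative data only)
points = np.array([
    [1.0, 1.0], [1.2, 0.9], [0.8, 1.1],   # group near (1, 1)
    [5.0, 5.0], [5.1, 4.8], [4.9, 5.2],   # group near (5, 5)
])

# Agglomerative clustering: repeatedly merge the two closest clusters.
Z = linkage(points, method="ward")
# Each row of Z records one merge: (cluster i, cluster j, merge distance, new size)
print(Z)

# Cut the tree so that at most 2 flat clusters remain
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)

# Dendrogram layout only (no figure); pass ax=... with matplotlib to draw it
tree = dendrogram(Z, no_plot=True)
print(tree["ivl"])  # leaf order along the horizontal axis
```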
What is the primary purpose of the Mann-Whitney U test?
- To calculate the correlation between two variables
- To compare the means of two independent groups
- To compare the medians of two independent groups
- To compare the variances of two independent groups
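The Mann-Whitney U test (also called the Wilcoxon rank-sum test) is a nonparametric test of whether two independent samples come from the same distribution. It works on the ranks of the pooled observations, and its null hypothesis is that a randomly chosen value from one group is equally likely to be larger or smaller than a randomly chosen value from the other. When the two distributions have the same shape, this amounts to testing whether their medians are equal, which is why the test is commonly described as a comparison of medians.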
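A short illustration, assuming SciPy's `scipy.stats.mannwhitneyu`. The two samples are invented so that every value in `group_a` is smaller than every value in `group_b`; in that extreme case the U statistic for the first sample is 0 and the p-value is very small.

```python
from scipy.stats import mannwhitneyu

# Two small independent samples (made-up values for illustration)
group_a = [12, 15, 11, 18, 14, 13, 16]
group_b = [22, 19, 25, 21, 24, 20, 23]

# Two-sided test: H0 says neither group tends to produce larger values than the other
result = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(result.statistic, result.pvalue)
```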
What is the Multiplication Rule of Probability primarily used for?
- To calculate the joint probability of two independent events
- To calculate the probability of either of two events occurring
- To divide one probability by another
- To subtract one probability from another
The Multiplication Rule in probability is used to calculate the joint probability of two events. In its general form, P(A and B) = P(A) × P(B | A). For independent events, P(B | A) = P(B), so the probability that both occur is simply the product of their individual probabilities, P(A) × P(B).
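The arithmetic can be checked with exact fractions. This sketch uses a fair coin and a fair die as an independent example and, for contrast, two aces drawn from a 52-card deck without replacement as a dependent one, where the second factor must be the conditional probability P(second ace | first ace).

```python
from fractions import Fraction

# Independent events: a fair coin lands heads AND a fair die shows a 6
p_heads = Fraction(1, 2)
p_six = Fraction(1, 6)
p_both = p_heads * p_six  # P(A and B) = P(A) * P(B)
print(p_both)  # 1/12

# Dependent events: two aces in a row, drawn without replacement
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)  # one ace and one card already removed
p_two_aces = p_first_ace * p_second_ace_given_first  # P(A) * P(B | A)
print(p_two_aces)  # 1/221
```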
What is the concept of significance level in hypothesis testing?
- The amount of data needed to support the alternative hypothesis
- The difference between the null and alternative hypotheses
- The probability of rejecting a true null hypothesis
- The proportion of the sample that supports the null hypothesis
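The significance level, denoted α (alpha), is the probability of rejecting the null hypothesis when it is actually true, i.e. the Type I error rate. It is chosen before the test is run (0.05 is a common choice), and the null hypothesis is rejected when the p-value falls below α.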
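One way to see what α means is to simulate many experiments in which the null hypothesis is true and count how often a test rejects it. The sketch below assumes NumPy and SciPy; the seed, group size, and number of experiments are arbitrary choices. It runs 10,000 two-sample t-tests on groups drawn from the same distribution, so every rejection is a Type I error and the rejection rate should land near α = 0.05.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
alpha = 0.05
n_experiments, n_per_group = 10_000, 30

# Both groups come from the SAME normal distribution, so H0 is true every time
a = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n_per_group))
b = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, n_per_group))

# One two-sample t-test per row
pvalues = ttest_ind(a, b, axis=1).pvalue

# Fraction of experiments that wrongly reject H0 at level alpha
type_i_rate = np.mean(pvalues < alpha)
print(type_i_rate)  # expected to be close to alpha
```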
What is the difference between a one-sample t-test and a two-sample t-test?
- All of the above
- The number of hypotheses being tested
- The number of samples being compared
- The type of data being used
The key difference between a one-sample t-test and a two-sample t-test lies in the number of samples being compared. A one-sample t-test compares the mean of a single sample with a known or hypothesized value, while a two-sample t-test compares the means of two independent samples.
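A minimal comparison, assuming SciPy; the measurements and the reference value 5.0 are made up. `ttest_1samp` tests one sample against a fixed mean, while `ttest_ind` compares the means of two independent samples (by default assuming equal variances).

```python
from scipy.stats import ttest_1samp, ttest_ind

# Made-up measurements for illustration
sample = [5.1, 4.9, 5.3, 5.2, 4.8, 5.0, 5.4]   # mean 5.1
other = [5.8, 6.1, 5.9, 6.3, 6.0, 5.7, 6.2]    # mean 6.0

# One-sample t-test: does the mean of `sample` differ from 5.0?
one = ttest_1samp(sample, popmean=5.0)

# Two-sample t-test: do `sample` and `other` have different means?
two = ttest_ind(sample, other)

print(one.statistic, one.pvalue)
print(two.statistic, two.pvalue)
```

Here the first sample's mean is close to 5.0, so the one-sample test does not reject, while the two samples differ by almost a full unit and the two-sample test does.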
How do non-parametric statistical methods deal with outliers compared to parametric methods?
- They are more robust to outliers
- They are more sensitive to outliers
- They don't handle outliers
- They eliminate outliers before analysis
Non-parametric statistical methods are generally more robust to outliers than parametric methods. Many non-parametric tests work with medians or ranks: making the largest value far larger leaves every rank (and the median) unchanged, whereas the mean used in parametric tests can be pulled far off by a single extreme value.
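A quick check with Python's standard `statistics` module and SciPy's `rankdata`, on toy data: replacing the largest value with an extreme one moves the mean a long way but leaves the median and the ranks unchanged.

```python
from statistics import mean, median

from scipy.stats import rankdata

data = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]  # last value replaced by an extreme outlier

print(mean(data), median(data))                  # 3 3
print(mean(with_outlier), median(with_outlier))  # 102 3

# Rank-based methods see the same ranks in both cases
print(rankdata(data), rankdata(with_outlier))
```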
The number of trials in a binomial distribution is ________, whereas in a Poisson distribution, it's theoretically infinite.
- dependent on the sample size
- dependent on the success rate
- fixed
- infinite
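In a binomial distribution, the number of trials n is fixed in advance (a fixed number of independent trials, each with the same success probability). The Poisson distribution can be viewed as the limit of the binomial as n grows without bound while the mean n·p stays equal to λ, so there is no upper limit on the number of events it can count.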
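The limiting relationship can be checked numerically. This sketch uses only the standard library; λ = 3 and k = 2 are arbitrary choices. It holds the mean n·p = λ fixed and shows the binomial probability of exactly k successes approaching the Poisson probability as the number of trials n grows.

```python
from math import comb, exp, factorial

lam, k = 3.0, 2  # average count λ and the count k whose probability we compare


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)


# Keep n * p = lam fixed and let the number of trials grow
for n in (10, 100, 1_000, 10_000):
    print(n, binomial_pmf(k, n, lam / n), poisson_pmf(k, lam))
```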
The closer the correlation coefficient is to __ or __, the stronger the correlation.
- -1 or 1
- 0
- 0.5 or -0.5
- 1 or 2
The correlation coefficient measures the strength and direction of a linear relationship between two variables. It ranges from -1 to 1: -1 indicates a perfect negative linear relationship, 1 a perfect positive one, and 0 no linear relationship. Thus, the closer the correlation coefficient is to -1 or 1, the stronger the correlation.
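A small sketch with NumPy's `corrcoef`, on toy data: points lying exactly on an increasing or decreasing line give r = 1 or r = -1 (up to floating-point rounding), while a scattered pattern gives a value much closer to 0.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry is Pearson's r
r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]    # points on an increasing line
r_neg = np.corrcoef(x, -3 * x + 10)[0, 1]  # points on a decreasing line
r_weak = np.corrcoef(x, [2.0, 1.0, 4.0, 3.0, 2.0])[0, 1]  # scattered points

print(r_pos, r_neg, r_weak)
```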
A _______ test is used when the population variance is known.
- Chi-square
- F
- T
- Z
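A Z-test is used when the population variance is known. Under the null hypothesis its test statistic follows the standard normal distribution. When the variance has to be estimated from the sample, a t-test is used instead.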
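A worked one-sample Z-test using only the standard library. The numbers (hypothesized mean 50, known σ = 5, n = 25, observed mean 52) are hypothetical. The statistic is z = (x̄ - μ0) / (σ / √n), and the two-sided p-value comes from the standard normal CDF.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs
mu0, sigma = 50.0, 5.0      # hypothesized mean and KNOWN population SD
n, sample_mean = 25, 52.0   # sample size and observed sample mean

# z: distance of the sample mean from mu0, in standard-error units
z = (sample_mean - mu0) / (sigma / sqrt(n))

# Two-sided p-value from the standard normal distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(z, p_value)  # z = 2.0, p ≈ 0.0455
```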
How do you interpret the coefficients of interaction terms in a regression model?
- The interaction coefficient indicates the effect of one variable at a specific level of the other variable
- The interaction coefficient indicates the joint effect of the variables, independent of their individual effects
- The interaction coefficient is a measure of the correlation between the variables
- The interaction coefficient represents the average effect of two variables
The interaction coefficient in a regression model describes how the effect of one independent variable on the dependent variable changes with the level of another independent variable. In a model such as y = b0 + b1·x1 + b2·x2 + b3·x1·x2, the slope of x1 is b1 + b3·x2, so the effect of x1 depends on the value of x2, and b3 is the change in that effect per unit increase in x2. A nonzero interaction coefficient therefore captures the interaction effect between the two variables.
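A minimal sketch, assuming NumPy and a noise-free hypothetical model y = 1 + 2·x1 + 3·x2 + 4·x1·x2 with a binary x2. Fitting ordinary least squares with an extra x1·x2 column recovers the coefficients, and the slope of x1 then works out to b1 + b3·x2: about 2 when x2 = 0 and about 6 when x2 = 1.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(0, 10, size=200)
x2 = rng.integers(0, 2, size=200).astype(float)  # binary moderator (e.g. a group flag)

# Hypothetical true model without noise, so OLS recovers it exactly
y = 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

# Design matrix: intercept, both main effects, and the interaction (product) term
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X, y, rcond=None)[0]

# The effect of x1 depends on x2: dy/dx1 = b1 + b3 * x2
print("slope of x1 when x2 = 0:", b1)
print("slope of x1 when x2 = 1:", b1 + b3)
```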
________ data is data that can be organized or ranked in a specific order.
- Continuous
- Discrete
- Nominal
- Ordinal
Ordinal data is a type of categorical data whose categories can be organized or ranked in a meaningful order. For example, customer satisfaction ratings (satisfied, neutral, dissatisfied) can be ordered from most to least satisfied, even though the distance between neighbouring categories is not defined.
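One way to handle ordinal data in plain Python is to store the levels in their natural order and sort by position rather than alphabetically. The level names come from the satisfaction example above; the responses are made up.

```python
# Ordinal levels from lowest to highest: the order is meaningful,
# but the distance between neighbouring levels is not
LEVELS = ["dissatisfied", "neutral", "satisfied"]
rank = {level: i for i, level in enumerate(LEVELS)}

responses = ["satisfied", "dissatisfied", "neutral", "satisfied", "neutral"]

# Sort by rank, not alphabetically
ordered = sorted(responses, key=rank.__getitem__)
print(ordered)  # ['dissatisfied', 'neutral', 'neutral', 'satisfied', 'satisfied']

# With an odd number of responses, the middle element is the median level
median_level = ordered[len(ordered) // 2]
print(median_level)  # neutral
```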
The correlation coefficient is denoted by the letter __.
- C
- P
- R
- S
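The correlation coefficient, which measures the degree of linear relationship between two variables, is denoted by the letter r: lowercase r for the sample Pearson correlation, and the Greek letter ρ (rho) for the population value. Uppercase R is usually reserved for the multiple correlation coefficient in regression, whose square is the coefficient of determination, R².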