What is the probability of an event that is certain to happen?
- 0
- 0.5
- 1
- The probability is undefined for certain events
The probability of an event that is certain to happen is 1. This is based on the definition of probability as a measure that takes values between 0 and 1, inclusive. An event with a probability of 1 is a sure event.
The optimal number of clusters in K-means clustering is often determined using the ________ method.
- elbow
- foot
- hand
- knee
The optimal number of clusters in K-means clustering is often determined using the elbow method. This involves plotting a measure of fit, such as the explained variation or the within-cluster sum of squares, against the number of clusters and choosing the "elbow" of the curve: the point beyond which adding more clusters yields only small improvements.
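A minimal sketch of the elbow method, assuming the within-cluster sum of squares (WCSS, called `inertia_` in scikit-learn's `KMeans`) as the fit measure. It uses a plain Lloyd's K-means on 1-D data; the helper `kmeans_1d`, the deterministic spread-out initialization (chosen only for reproducibility instead of random or k-means++ seeding), and the toy data with three obvious groups are all hypothetical.

```python
def kmeans_1d(points, k, max_iter=100):
    """Lloyd's K-means on 1-D data; returns (centroids, wcss)."""
    ordered = sorted(points)
    n = len(ordered)
    # Assumption: deterministic init that spreads the starting centroids
    # across the sorted data, so results are reproducible.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        # Update step: move each centroid to its cluster mean (keep it if empty).
        new_centroids = [sum(c) / len(c) if c else centroids[j]
                         for j, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # Within-cluster sum of squares for the final centroids.
    wcss = sum(min((x - c) ** 2 for c in centroids) for x in points)
    return centroids, wcss


# Hypothetical toy data with three well-separated groups.
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
wcss = {k: kmeans_1d(data, k)[1] for k in range(1, 6)}
for k, w in wcss.items():
    print(k, round(w, 2))
```

On this data the WCSS falls steeply from k = 1 to k = 3 and barely changes afterwards, so the elbow sits at k = 3.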
The _______ test compares the means of two independent groups.
- Chi-square
- Independent t
- Paired t
- Z
An independent t-test (also called a two-sample t-test) compares the means of two independent groups.
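As a sketch, the classic (Student's) independent t statistic pools the two sample variances, assuming both groups share the same population variance. The function name `pooled_t_test` and the measurements are hypothetical; `scipy.stats.ttest_ind(a, b)` computes the same statistic with its default `equal_var=True` and also returns a p-value, while `equal_var=False` gives Welch's variant.

```python
import math
import statistics


def pooled_t_test(a, b):
    """Student's two-sample t statistic assuming equal variances.

    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(a), len(b)
    # Sample variances with an (n - 1) denominator.
    v1, v2 = statistics.variance(a), statistics.variance(b)
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, n1 + n2 - 2


group_a = [5.1, 4.9, 5.3, 5.0, 5.2]  # hypothetical measurements
group_b = [4.6, 4.8, 4.5, 4.7, 4.9]
t, df = pooled_t_test(group_a, group_b)
print(round(t, 3), df)  # 4.0 8
```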
How does a higher R-squared value impact the inference in multiple linear regression?
- It decreases the number of observations
- It improves the interpretability of the model
- It increases the residuals
- It makes the model more complex
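The R-squared value measures the proportion of the variance in the dependent variable that is predictable from the independent variables. A higher R-squared value, closer to 1, implies that a higher proportion of the variability in the response variable is explained by the predictors, which improves the model's interpretability and its in-sample predictive power. Because R-squared never decreases when predictors are added, adjusted R-squared is a better guide when comparing models with different numbers of predictors.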
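The definition R² = 1 − SS_res / SS_tot can be computed directly from observed values and model predictions. This sketch also includes the standard adjusted R², 1 − (1 − R²)(n − 1)/(n − p − 1), which penalizes the number of predictors p. The function names and the observed/predicted values are hypothetical.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)               # total variation
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """R-squared penalized for p predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical observed values and predictions from a one-predictor model.
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.8, 5.1, 7.2, 8.9]
r2 = r_squared(y, y_hat)
adj = adjusted_r_squared(r2, n=len(y), p=1)
print(round(r2, 4), round(adj, 4))  # 0.995 0.9925
```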
In multiple linear regression, the __________ is used to test whether a group of variables contributes to the prediction of the response.
- Chi-square test
- F-test
- T-test
- Z-test
The F-test is used in multiple regression to test whether at least one of the predictors' regression coefficients is not equal to zero. In other words, it tests whether the predictors, taken together (or a specified subset of them), are significant in explaining the response variable.
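A sketch of the F statistic written in terms of R² values, assuming both models are fitted by least squares with an intercept on the same n observations. Comparing a full model with p_full predictors against a reduced model that drops q of them gives F = ((R²_full − R²_reduced)/q) / ((1 − R²_full)/(n − p_full − 1)); setting R²_reduced = 0 and q = p_full recovers the overall F-test. The function name and the R² values are hypothetical; a p-value could then be obtained with, for example, `scipy.stats.f.sf(F, q, n - p_full - 1)`.

```python
def partial_f_statistic(r2_full, r2_reduced, n, p_full, q):
    """F statistic for dropping q predictors from a model with p_full predictors.

    With r2_reduced = 0 and q = p_full this is the overall F-test that
    all slope coefficients are zero.
    """
    numerator = (r2_full - r2_reduced) / q          # gain in fit per dropped predictor
    denominator = (1 - r2_full) / (n - p_full - 1)  # residual variance share per df
    return numerator / denominator


# Hypothetical values: full model with 3 predictors fitted on 50 observations.
overall = partial_f_statistic(0.6, 0.0, n=50, p_full=3, q=3)
group = partial_f_statistic(0.6, 0.5, n=50, p_full=3, q=2)
print(round(overall, 2), round(group, 2))  # 23.0 5.75
```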
In a scatter plot, a __________ trend suggests a positive relationship between variables.
- Downward
- Horizontal
- Upward
- Vertical
In a scatter plot, an upward trend suggests a positive relationship between the variables: as one variable increases, the other tends to increase as well.
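One way to quantify the direction of a scatter-plot trend is the Pearson correlation coefficient, whose sign matches an upward (positive) or downward (negative) linear trend. This is a minimal stdlib sketch; the function name `pearson_r` and the data points are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
up = [2, 4, 5, 4, 5]    # hypothetical upward trend
down = [5, 4, 5, 4, 2]  # same values reversed: downward trend
print(round(pearson_r(x, up), 3), round(pearson_r(x, down), 3))  # 0.775 -0.775
```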
What is the null hypothesis in an ANOVA test?
- The means of all groups are different
- The means of all groups are equal
- The variances of all groups are different
- The variances of all groups are equal
The null hypothesis in an ANOVA test is that the means of all groups are equal. If the p-value obtained from the ANOVA test is less than the significance level, the null hypothesis is rejected, implying that there is a significant difference between at least two of the group means.
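The one-way ANOVA F statistic compares variability between group means to variability within groups; a large F is evidence against the null hypothesis of equal means. This sketch computes F and its degrees of freedom by hand; the function name and the three groups are hypothetical, and `scipy.stats.f_oneway` returns the same statistic together with a p-value.

```python
def one_way_anova_f(*groups):
    """One-way ANOVA F statistic with its (between, within) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


f, df_b, df_w = one_way_anova_f([4, 5, 6], [6, 7, 8], [8, 9, 10])
print(f, df_b, df_w)  # 12.0 2 6
```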
A distribution with a positive ________ has a long tail in the positive direction.
- Kurtosis
- Mean
- Median
- Skewness
A distribution with positive skewness is said to be positively skewed or right-skewed, which means it has a long tail in the positive direction on the number line.
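A minimal sketch of the moment-based (Fisher-Pearson) sample skewness, g1 = m3 / m2^1.5, which is what `scipy.stats.skew` computes with its default `bias=True`. A symmetric sample gives 0, and a long right tail gives a positive value. The function name and the samples are hypothetical.

```python
def skewness(data):
    """Moment-based (Fisher-Pearson) sample skewness g1 = m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 1, 2, 2, 3, 10]  # hypothetical: one large value
print(skewness(symmetric), skewness(right_tail) > 0)  # 0.0 True
```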
What is the key difference between a discrete and a continuous random variable?
- Discrete variables are predictable, continuous variables are not
- Discrete variables can only take on a countable number of values, continuous variables can take on any value within a certain range
- Discrete variables can take on any value, continuous variables can take on only integer values
- There's no difference between discrete and continuous random variables
Discrete random variables are variables that can only take on a countable number of values, such as integers, while continuous random variables can take on any value within a certain range or interval.
When would you prefer to use the median instead of the mean as a measure of central tendency?
- When the data has outliers
- When the data is in large quantity
- When the data is normally distributed
- When the data is uniformly distributed
The median is preferred over the mean when the data is skewed or has outliers. Outliers can pull the mean far from the bulk of the data and give a distorted picture, whereas the median, the middle value of the data arranged in order of magnitude, is much less sensitive to extreme values. This makes the median a better measure of central tendency for skewed distributions.
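A quick illustration with Python's standard `statistics` module: adding a single extreme value to a hypothetical sample shifts the mean dramatically while the median barely moves.

```python
import statistics

values = [30, 32, 35, 38, 40]  # hypothetical data, no outliers
with_outlier = values + [400]  # the same data plus one extreme value

print(statistics.mean(values), statistics.median(values))  # 35 35
print(round(statistics.mean(with_outlier), 2), statistics.median(with_outlier))  # 95.83 36.5
```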