The ________ is the most frequent value in a data set.
- Mean
- Median
- Mode
The mode is the value that appears most frequently in a data set. A set of data may have one mode, more than one mode, or no mode at all.
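The three cases can be illustrated with a minimal sketch using Python's standard `statistics.multimode`; the sample lists are made up for illustration. Note that `multimode` returns every value when all counts are tied, a situation usually described as the data having no mode.

```python
from statistics import multimode

# Hypothetical sample data, chosen only to show each case.
one_mode = multimode([1, 2, 2, 3])       # a single most frequent value
two_modes = multimode([1, 1, 2, 2, 3])   # two values tie for most frequent
all_tied = multimode([1, 2, 3])          # every value appears once

# When every value is tied, the data set is usually said to have no mode.
has_no_mode = len(all_tied) == len(set([1, 2, 3]))
```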
In the context of a continuous random variable, the ________ function gives the probability that the variable takes a value less than or equal to a certain value.
- Cumulative Distribution Function
- Probability Density Function
- Probability Mass Function
- Random Function
The Cumulative Distribution Function (CDF) of a random variable gives the probability that the variable takes a value less than or equal to a given value: F(x) = P(X <= x). For a continuous random variable, the CDF is the integral of the probability density function up to x; for a discrete random variable, it is the sum of the probability mass function over all values up to x.
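As a small sketch of the definition P(X <= x), the code below evaluates the CDF of a standard normal variable with `statistics.NormalDist` and, for contrast, builds the CDF of a fair six-sided die by summing its probability mass function. The die example and the helper name `die_cdf` are illustrative assumptions, not part of the question.

```python
from fractions import Fraction
from statistics import NormalDist

# Continuous case: standard normal variable Z.
Z = NormalDist(mu=0.0, sigma=1.0)
p_below_mean = Z.cdf(0.0)    # half of the area lies at or below the mean
p_below_196 = Z.cdf(1.96)    # about 0.975

# Discrete case (hypothetical helper): fair six-sided die.
def die_cdf(x):
    """P(X <= x): sum the PMF (1/6 per face) over all faces <= x."""
    return sum(Fraction(1, 6) for face in range(1, 7) if face <= x)

p_at_most_4 = die_cdf(4)     # 4 faces out of 6
```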
What happens if the assumption of homoscedasticity is violated in simple linear regression?
- It has no effect on the regression model
- It makes the regression model more accurate
- It makes the regression model perfectly fit the data
- It makes the standard errors and confidence intervals invalid
Homoscedasticity is the assumption that the variance of the residuals is constant across all levels of the independent variable. If this assumption is violated (a condition known as heteroscedasticity), the ordinary least squares coefficient estimates remain unbiased but are no longer efficient, and the usual estimates of the standard errors become biased. This, in turn, can make the confidence intervals and hypothesis tests invalid.
The _______ of a confidence interval is the complement of the total area under the curve that is excluded on both sides of the curve.
- Confidence level
- Margin of error
- Population parameter
- Standard error
The confidence level is the area under the curve that lies inside the interval. The total area excluded in the two tails is the significance level, alpha, so the confidence level equals 1 minus alpha. For example, a 95% confidence interval excludes 2.5% in each tail. The margin of error is a different quantity: it is the half-width of the interval, and it grows as the confidence level increases.
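As a rough illustration of how the excluded tail area relates to the width of an interval, the sketch below computes a z-based interval for a mean with known standard deviation. The sample summary values (`sample_mean`, `sigma`, `n`) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

confidence_level = 0.95
alpha = 1 - confidence_level                   # total area excluded in both tails
z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%

# Hypothetical sample summary, assumed only for this example.
sample_mean, sigma, n = 50.0, 10.0, 100

margin_of_error = z_crit * sigma / sqrt(n)     # half-width of the interval
interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
```

Raising `confidence_level` shrinks `alpha`, which raises `z_crit` and therefore widens the interval.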
What type of correlation does the Spearman's Rank Correlation test measure?
- Correlation of variances
- Linear correlation
- Monotonic correlation
- Polynomial correlation
Spearman's Rank Correlation test measures monotonic correlation: how consistently one variable increases (or decreases) as the other increases. Because it works on the ranks of the values, it does not require the relationship between the variables to be linear.
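The difference between linear and monotonic correlation can be sketched with a cubic relationship, which is strictly increasing but not linear. The helper functions below are illustrative and assume there are no tied values; Spearman's coefficient is computed here as the Pearson correlation of the ranks.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson (linear) correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """Ranks 1..n of the values; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]     # monotonic but not linear

r_linear = pearson(xs, ys)    # noticeably below 1
r_rank = spearman(xs, ys)     # 1 up to rounding: the ranks agree exactly
```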
Why might you use a non-parametric test over a parametric one?
- The data does not meet the assumptions for a parametric test
- The data follows a normal distribution
- The data has no outliers
- The data set is very large
Non-parametric tests might be used over parametric ones when the data does not meet the assumptions for a parametric test, such as when the data does not follow a normal distribution, when the variances are not equal across groups, or when the data are ordinal or nominal rather than interval or ratio.
What is the shape of a normal distribution?
- Skewed to the left
- Skewed to the right
- Symmetrical bell curve
- Uniform flat shape
The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution with a bell-shaped curve. It is symmetrical around its mean, and its density peaks at the mean, so values near the mean occur more frequently than values far from it.
What does the 'power of a test' signify in hypothesis testing?
- The probability of correctly rejecting a false null hypothesis
- The probability of incorrectly accepting a true null hypothesis
- The probability of making a Type I error
- The probability of making a Type II error
The power of a statistical test is the probability that it correctly rejects a false null hypothesis. In other words, it is 1 minus the probability of making a Type II error.
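The relationship power = 1 minus P(Type II error) can be sketched for one simple case: a one-sided z-test of a mean with known standard deviation. The function name `z_test_power` and the numeric inputs are assumptions made for illustration; other tests need different formulas.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test (H1: mean is larger) with known sigma.

    effect is the true difference between the actual mean and the
    null-hypothesis mean.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection threshold under H0
    shift = effect * sqrt(n) / sigma         # how far H1 moves the test statistic
    beta = std_normal.cdf(z_crit - shift)    # P(Type II error)
    return 1 - beta

# Hypothetical inputs: same effect and sigma, two sample sizes.
power_small_n = z_test_power(effect=0.5, sigma=2.0, n=25)
power_large_n = z_test_power(effect=0.5, sigma=2.0, n=100)
```

With no true effect (`effect=0`), the function returns `alpha`, which is the Type I error rate rather than meaningful power.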
The sum of all probabilities in a discrete probability distribution is always ________.
- 0
- 1
- Negative
- Variable
For a discrete random variable, the sum of all probabilities must equal 1. This is because the distribution enumerates every possible outcome of the random variable, and exactly one of those outcomes must occur.
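A quick check of this property, using the sum of two fair dice as an assumed example and exact fractions to avoid rounding error:

```python
from fractions import Fraction

# Build the PMF of the sum of two fair six-sided dice.
pmf = {}
for a in range(1, 7):
    for b in range(1, 7):
        pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

total = sum(pmf.values())   # exactly 1
```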
The probability of correctly rejecting a false null hypothesis is known as the ______ of the test.
- Power
- Size
- Type I error rate
- Type II error rate
The power of a test is the probability that it correctly rejects a false null hypothesis. This is essentially the ability of the test to detect an effect or difference if it truly exists. It's the complement of the Type II error rate (beta).
What is the Central Limit Theorem and why is it important for sampling distributions?
- It guarantees that large samples are always normally distributed
- It says that the sample mean equals the population mean
- It states that every statistic has a normal distribution
- It states that the sampling distribution of a mean will approach normality as the sample size increases
The Central Limit Theorem (CLT) is a fundamental theorem in statistics. It states that the sampling distribution of the mean of independent observations approaches a normal distribution as the sample size increases, regardless of the shape of the population distribution, provided the population has a finite variance. The CLT is important because it lets us make statistical inferences about the population mean using the properties of the normal distribution, even when the population itself is not normal.
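A small simulation can make the theorem tangible. The sketch below draws samples from an exponential distribution (strongly right-skewed, with mean 1 and standard deviation 1) and measures the skewness of the resulting sample means. The choice of distribution, sample sizes, trial count, and seed are all assumptions for illustration.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(0)   # fixed seed so the run is repeatable

def sample_means(n, trials=5000):
    """Means of `trials` independent samples of size n from Exp(1)."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

def skewness(values):
    """Standardized third moment; close to 0 for a symmetric distribution."""
    m = fmean(values)
    s = pstdev(values)
    return fmean(((v - m) / s) ** 3 for v in values)

means_n2 = sample_means(2)
means_n50 = sample_means(50)

skew_n2 = skewness(means_n2)     # still clearly right-skewed
skew_n50 = skewness(means_n50)   # much closer to 0, i.e. more nearly normal
spread_n50 = pstdev(means_n50)   # near 1 / sqrt(50), the standard error
```

The sample means stay centered on the population mean of 1, while their skewness shrinks and their spread narrows as the sample size grows.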
What is the range of a discrete random variable?
- All negative numbers
- All positive numbers
- All real numbers
- The set of all possible outcomes
The range of a discrete random variable is the set of all possible outcomes or values that the variable can take.