The ________ of a random variable is the sum of all its possible values, each multiplied by the probability of that value.
- Distribution
- Expected value
- Mean
- Variance
The "expected value" of a discrete random variable is the sum of all possible values it can take, each multiplied by the probability of that value. For a continuous random variable the sum becomes an integral. It is the mean, or long-run average, of the random variable and is a fundamental concept in probability theory and statistics.
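The weighted-sum definition is short enough to compute directly. The sketch below uses a hypothetical helper, `expected_value`, that takes a mapping from each value to its probability. It uses `Fraction` so the arithmetic stays exact. The helper also checks that the probabilities sum to 1, because any valid distribution must satisfy that.

```python
from fractions import Fraction

def expected_value(outcomes):
    """Return E[X] = sum of x * P(X = x) for a discrete distribution.

    `outcomes` maps each value x to its probability P(X = x)
    (hypothetical helper for illustration).
    """
    if sum(outcomes.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in outcomes.items())

# A fair six-sided die: each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(expected_value(die))  # 7/2
```

The result, 3.5, is not a value the die can actually show. It is the average you would approach over many rolls.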
What assumptions are made when conducting an ANOVA test?
- Independent observations, no outliers, equal sample sizes
- Independent observations, normal distribution of variables, no outliers
- Independent observations, normally distributed residuals, homoscedasticity
- No missing data, normally distributed residuals, no outliers
ANOVA makes three key assumptions: 1) Observations are independent. 2) Residuals (in one-way ANOVA, the differences between each observation and its group mean) are normally distributed. 3) The variance of the residuals is the same for all groups (homoscedasticity).
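To make "residuals" concrete, here is a minimal one-way ANOVA sketch written from scratch. The function name `one_way_anova` and the sample groups are hypothetical. The fitted value for each observation is its group mean, so the residuals are observation minus group mean. The F statistic compares between-group variability with within-group variability. Printing the per-group residual variances is only an informal look at homoscedasticity, not a formal test.

```python
from statistics import mean, variance

def one_way_anova(groups):
    """Return (F statistic, residuals) for a one-way ANOVA.

    The fitted value for each observation is its group mean, so each
    residual is observation - group mean.
    """
    all_obs = [x for g in groups for x in g]
    grand_mean = mean(all_obs)
    k, n = len(groups), len(all_obs)
    # Between-group sum of squares: how far group means sit from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    residuals = [[x - mean(g) for x in g] for g in groups]
    # Within-group sum of squares: squared residuals.
    ss_within = sum(r ** 2 for rs in residuals for r in rs)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, residuals

groups = [[4, 5, 6], [6, 7, 8], [9, 10, 11]]
f_stat, residuals = one_way_anova(groups)
print(f_stat)
# Informal homoscedasticity check: residual variance in each group.
print([variance(r) for r in residuals])
```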
What does a scatter plot with points clustered tightly around a line indicate?
- A strong correlation
- A weak correlation
- An undefined correlation
- No correlation
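When points in a scatter plot are clustered tightly around a line, it indicates a strong linear correlation between the two variables. The correlation is positive if the line slopes upward and negative if it slopes downward. The line in question is usually the least-squares regression line, or line of best fit.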
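The usual numeric summary of "how tightly the points hug a line" is the Pearson correlation coefficient r, which ranges from −1 to 1. The sketch below computes r by hand. The function name `pearson_r` and the data are made up for illustration. Both y series trend upward, but the tightly clustered one gives r close to 1 and the scattered one gives a noticeably smaller value.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5, 6]
tight = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # close to y = 2x
loose = [3.0, 1.0, 7.5, 4.0, 12.0, 6.0]   # upward trend, much more scatter
print(round(pearson_r(xs, tight), 3))
print(round(pearson_r(xs, loose), 3))
```

Because r only measures linear association, points lying tightly on a curve can still produce a modest r.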
The _________ states that the sampling distribution of the sample mean approaches a normal distribution as the sample size gets larger, regardless of the shape of the population distribution (provided the population has finite variance).
- Central Limit Theorem
- Law of Large Numbers
- Probability Rule
- Sampling Distribution
The Central Limit Theorem states that, for independent samples from a population with finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size grows, regardless of the shape of the population distribution. This is why normal-based probability calculations, such as confidence intervals for a mean, can be applied even when the underlying data are not normally distributed.
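A quick simulation can make the theorem visible. This sketch assumes an Exponential(1) population, chosen only because it is strongly right-skewed with mean 1. It tracks one symptom of asymmetry: the share of sample means that fall below the population mean. A symmetric, normal-shaped sampling distribution centered on the mean would give 0.5. For samples of size 1, the share is 1 − e⁻¹ ≈ 0.632. As the sample size grows, the share should move toward 0.5. The function name and the seed are arbitrary.

```python
import random

def share_below_mu(sample_size, reps=5000, seed=0):
    """Fraction of sample means below the population mean (mu = 1) when
    samples of `sample_size` are drawn from an Exponential(1) population."""
    rng = random.Random(seed)
    below = 0
    for _ in range(reps):
        sample_mean = sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        below += sample_mean < 1.0
    return below / reps

shares = {n: share_below_mu(n) for n in (1, 5, 50)}
for n, share in shares.items():
    print(f"n = {n:>2}: share of sample means below mu = {share:.3f}")
```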
________ is a measure of asymmetry of a probability distribution.
- Mean
- Median
- Mode
- Skewness
Skewness is a measure of the asymmetry of a probability distribution about its mean. Its sign gives the direction of the longer tail: positive for a long right tail and negative for a long left tail. Its magnitude indicates how pronounced the asymmetry is.
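One common way to quantify skewness is the moment coefficient g₁ = m₃ / m₂^(3/2). Here m₂ and m₃ are the second and third central moments of the data. This quantity is often called the Fisher–Pearson coefficient of skewness. The sketch below implements it directly. The function name and datasets are illustrative. A long right tail gives a positive value, the mirror image gives a negative value, and symmetric data gives zero.

```python
def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

right_tail = [1, 2, 2, 3, 3, 3, 4, 10]
left_tail = [-x for x in right_tail]
symmetric = [1, 2, 3, 4, 5]
print(skewness(right_tail) > 0, skewness(left_tail) < 0, skewness(symmetric))  # True True 0.0
```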
What is the difference between frequentist and Bayesian statistics?
- Bayesians use Bayes' theorem, frequentists do not
- Frequentists believe in probability and Bayesians do not
- Frequentists interpret probability as a long-run frequency, Bayesians as a degree of belief
- There is no difference
Frequentist statistics interprets probability as the long-run frequency of events, whereas Bayesian statistics interprets probability as a degree of belief, or subjective probability. The Bayesian approach uses Bayes' theorem to combine a prior distribution with the likelihood of the observed data, producing a posterior distribution that reflects the updated belief.
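A small coin-flip example shows the two approaches side by side. It is only one illustrative case, assuming a Beta prior on the probability of heads. The Beta prior is conjugate to the binomial likelihood, so Bayes' theorem reduces to adding the observed counts to the prior parameters. The frequentist point estimate is simply the sample proportion. The helper name `beta_binomial_update` is hypothetical.

```python
from fractions import Fraction

def beta_binomial_update(alpha, beta, successes, trials):
    """Posterior Beta parameters after observing `successes` in `trials`
    Bernoulli trials, starting from a Beta(alpha, beta) prior."""
    return alpha + successes, beta + (trials - successes)

# Uniform prior Beta(1, 1) on P(heads); observe 7 heads in 10 flips.
a_post, b_post = beta_binomial_update(1, 1, successes=7, trials=10)
posterior_mean = Fraction(a_post, a_post + b_post)  # Bayesian point estimate
sample_proportion = Fraction(7, 10)                 # frequentist estimate
print(a_post, b_post, posterior_mean, sample_proportion)  # 8 4 2/3 7/10
```

The prior pulls the Bayesian estimate slightly toward 1/2. With more data, the two estimates converge.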
What are confidence intervals used for in statistics?
- To determine the median of a sample
- To determine the spread of data in a sample
- To estimate the population parameter
- To find the mean of a sample
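A confidence interval gives a range of plausible values for an unknown population parameter, such as the population mean. The confidence level (for example, 95%) means that if the sampling procedure were repeated many times, about that proportion of the intervals constructed would contain the true parameter. Confidence intervals do not describe the mean, median, or spread of the sample itself.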
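The long-run interpretation can be checked by simulation. The sketch below uses a normal-quantile interval, x̄ ± z·s/√n. It is a large-sample approximation chosen because Python's standard library has `NormalDist` but no t distribution, and small samples would call for a t quantile instead. The sketch assumes a hypothetical normal population with μ = 10 and σ = 2, builds many intervals from fresh samples of size 30, and counts how often they cover μ. Expect a value close to 0.95, typically slightly below it, because the z quantile understates the uncertainty when σ is estimated from only 30 observations.

```python
import random
from statistics import NormalDist, mean, stdev

def z_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean:
    x_bar +/- z * s / sqrt(n), using a normal (not t) quantile."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * stdev(sample) / n ** 0.5
    x_bar = mean(sample)
    return x_bar - half_width, x_bar + half_width

rng = random.Random(1)
mu, sigma, n, reps = 10.0, 2.0, 30, 2000
hits = 0
for _ in range(reps):
    lo, hi = z_interval([rng.gauss(mu, sigma) for _ in range(n)])
    hits += lo <= mu <= hi  # does this interval contain the true mean?
coverage = hits / reps
print(f"{coverage:.3f} of intervals contain mu")
```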
How does skewness affect the mean and median of a dataset?
- In a positively skewed distribution, the mean is greater than the median
- In a positively skewed distribution, the median is greater than the mean
- Skewness affects only the mean
- Skewness does not affect the mean and median
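In a positively skewed distribution, the mean is typically greater than the median, because extreme values in the long right tail pull the mean toward that tail while the median is barely affected. In a negatively skewed distribution, the mean is typically less than the median, because the mean is pulled toward the left tail. This is a reliable rule of thumb rather than a strict law, and some distributions violate it.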
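A tiny made-up dataset shows the effect. A single large value creates a right tail that moves the mean but not the median. Negating every value mirrors the tail to the left and reverses the ordering.

```python
from statistics import mean, median

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]  # long right tail (the 10)
left_skewed = [-x for x in right_skewed]  # mirror image: long left tail

print(mean(right_skewed), median(right_skewed))  # 3.5 3.0
print(mean(left_skewed), median(left_skewed))    # -3.5 -3.0
```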
What can make the results of a chi-square goodness-of-fit test unreliable?
- Having a large sample size
- Having a small sample size
- Having equal expected frequencies in all categories
- Having normally distributed data
A small sample size can make a chi-square goodness-of-fit test unreliable. The test statistic only approximately follows a chi-square distribution, and the approximation becomes poor when expected frequencies are small. A common rule of thumb is that every category should have an expected count of at least 5.
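The sketch below computes the goodness-of-fit statistic, Σ (observed − expected)² / expected, and applies the expected-count rule of thumb. The function name `chi_square_gof`, the threshold parameter `min_expected`, and the die-roll counts are all illustrative assumptions. In a full test, the statistic would then be compared with a chi-square distribution with k − 1 degrees of freedom, where k is the number of categories and no parameters are estimated from the data. The function uses `Fraction` so the example arithmetic is exact.

```python
from fractions import Fraction

def chi_square_gof(observed, expected_probs, min_expected=5):
    """Return (statistic, expected counts, rule-of-thumb flag) for a
    chi-square goodness-of-fit test. The flag is False when any expected
    count is below `min_expected`."""
    n = sum(observed)
    expected = [n * p for p in expected_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    adequate = all(e >= min_expected for e in expected)
    return statistic, expected, adequate

fair_die = [Fraction(1, 6)] * 6
stat, expected, adequate = chi_square_gof([8, 12, 9, 11, 10, 10], fair_die)
print(stat, adequate)    # 1 True   (60 rolls: every expected count is 10)
_, _, adequate_small = chi_square_gof([1, 3, 2, 2, 3, 1], fair_die)
print(adequate_small)    # False    (12 rolls: every expected count is 2)
```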
In _________ sampling, the population is divided into subgroups, and a simple random sample is drawn from each subgroup.
- Cluster
- Simple Random
- Stratified
- Systematic
In stratified sampling, the population is divided into non-overlapping groups, or strata, such as age groups, income levels, or gender. Then, a simple random sample is taken from each stratum. Stratified random sampling can give more precise estimates than a simple random sample of the same size when the units within each stratum are similar with respect to the characteristic of interest.
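The procedure can be sketched in a few lines. This version assumes proportional allocation, where each stratum contributes the same fraction of its members. Other allocation schemes exist. The population, the `stratified_sample` helper, and its parameters are hypothetical. Each stratum is sampled without replacement using `random.Random.sample`, so every unit within a stratum has the same chance of selection.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw a simple random sample of round(fraction * size) units from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(fraction * len(members))
        sample.extend(rng.sample(members, k))  # SRS without replacement
    return sample

# Hypothetical population: 600 people in three age groups of unequal size.
population = ([("18-34", i) for i in range(300)]
              + [("35-54", i) for i in range(200)]
              + [("55+", i) for i in range(100)])
sample = stratified_sample(population, stratum_of=lambda unit: unit[0], fraction=0.1)
print(dict(Counter(group for group, _ in sample)))  # {'18-34': 30, '35-54': 20, '55+': 10}
```

Every age group appears in the sample in proportion to its size, which a single simple random sample of 60 would only guarantee on average.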