How does stratified random sampling differ from simple random sampling?
- Stratified random sampling always involves larger sample sizes than simple random sampling
- Stratified random sampling involves dividing the population into subgroups and selecting individuals from each subgroup
- Stratified random sampling is the same as simple random sampling
- Stratified random sampling only selects individuals from a single subgroup
Stratified random sampling differs from simple random sampling in that it first divides the population into non-overlapping groups, or strata, based on specific characteristics, and then selects a simple random sample from each stratum. This ensures that every subgroup is represented in the sample, which can increase the precision of estimates, particularly when the strata differ from one another on the variable being measured.
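As a minimal sketch of the procedure described above, the following Python code groups a population by stratum and then draws a simple random sample from each group. The population, the `stratified_sample` helper, and the 20% sampling fraction are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw a simple random sample of the given fraction from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in population:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Take at least one unit so that no stratum is left out.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 90 undergraduates and 10 graduate students.
population = [("undergrad", i) for i in range(90)] + [("grad", i) for i in range(10)]
sample = stratified_sample(population, stratum_of=lambda unit: unit[0], fraction=0.2)
counts = {label: sum(1 for unit in sample if unit[0] == label)
          for label in ("undergrad", "grad")}
print(counts)
```

A simple random sample of 20 units from the same population could contain no graduate students at all; sampling within each stratum rules that out.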
Why are bar plots commonly used in data analysis?
- To compare the frequency of categorical variables
- To show the change of a variable over time
- To show the distribution of a single variable
- To show the relationship between two continuous variables
Bar plots are commonly used in data analysis to compare the frequency, count, or proportion of categorical variables. Each category is represented by a separate bar, and the length or height of the bar represents its corresponding value.
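The counting step behind a bar plot can be sketched with Python's standard library. The survey responses below are made up for illustration, and the bars are drawn as text rather than with a plotting library.

```python
from collections import Counter

# Hypothetical survey responses: how each person commutes.
responses = ["bus", "car", "bike", "car", "bus", "car", "walk", "car"]
counts = Counter(responses)

# One bar per category; the bar length equals the category's count.
for category, count in counts.most_common():
    print(f"{category:<5} {'#' * count} ({count})")
```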
What does inference in multiple linear regression primarily involve?
- Calculating the mean of the residuals
- Creating the scatter plot
- Drawing the best fit line
- Interpreting the coefficients
Inference in multiple linear regression primarily involves interpreting the coefficients of the model and assessing whether they differ significantly from zero. Each coefficient represents the expected change in the response variable for a one-unit increase in its explanatory variable, assuming all other explanatory variables are held constant.
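The "held constant" interpretation can be illustrated with a small sketch. The coefficients below are hypothetical values, not estimates fitted from data, and the variable names (`sqft`, `age`) are assumptions made for the example.

```python
# Hypothetical coefficients for: price = intercept + b_sqft * sqft + b_age * age
intercept, b_sqft, b_age = 50.0, 2.0, -0.5

def predict(sqft, age):
    """Predicted response for given values of the two explanatory variables."""
    return intercept + b_sqft * sqft + b_age * age

# Increase sqft by one unit while holding age constant.
effect_of_sqft = predict(1001, 10) - predict(1000, 10)

# Increase age by one unit while holding sqft constant.
effect_of_age = predict(1000, 11) - predict(1000, 10)

print(effect_of_sqft, effect_of_age)
```

Each difference equals the corresponding coefficient, which is exactly what the coefficient is meant to express.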
What are the degrees of freedom in a Chi-square test for goodness of fit?
- The number of categories minus 1
- The number of categories plus 1
- The number of observations minus 1
- The number of observations plus 1
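In a Chi-square test for goodness of fit, the degrees of freedom are calculated as the number of categories minus 1 (assuming no parameters are estimated from the data). Because the category counts must sum to the total number of observations, once all but one count is known, the last one is fixed.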
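A short worked example may help. The die-roll counts below are hypothetical, and the null hypothesis is assumed to be that all six faces are equally likely.

```python
# Hypothetical counts for faces 1 to 6 over 120 rolls of a die.
observed = [18, 22, 20, 25, 15, 20]

n = sum(observed)          # total number of observations
k = len(observed)          # number of categories
expected = [n / k] * k     # equal expected counts under the null hypothesis

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = k - 1                 # categories minus 1, not observations minus 1

print(f"chi-square = {chi_square:.2f}, df = {df}")
```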
If events A and B are independent, what is P(A ∩ B)?
- P(A) * P(B)
- P(A) + P(B)
- P(A) - P(B)
- P(A) / P(B)
If events A and B are independent, the probability of both events occurring (P(A ∩ B)) is the product of their individual probabilities (P(A) * P(B)). This is a direct result of the Multiplication Rule for independent events.
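The rule can be checked by enumerating a small sample space. The sketch below assumes two fair six-sided dice and two events chosen for illustration: A, the first die is even, and B, the second die shows more than 4.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event over the equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def event_a(outcome):
    return outcome[0] % 2 == 0   # first die is even

def event_b(outcome):
    return outcome[1] > 4        # second die shows 5 or 6

p_a = prob(event_a)
p_b = prob(event_b)
p_a_and_b = prob(lambda o: event_a(o) and event_b(o))

print(p_a, p_b, p_a_and_b)
```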
In what type of data distribution do the mean, median, and mode coincide?
- Negatively skewed distribution
- Normal distribution
- Positively skewed distribution
- Uniform distribution
In a normal distribution, the mean, median, and mode all coincide, meaning they have the same value. A normal distribution is symmetric with a single central peak, so the balance point (mean), the midpoint (median), and the most likely value (mode) all fall at the center. In a skewed distribution, by contrast, the mean is pulled toward the longer tail and the three measures separate.
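This can be checked with `statistics.NormalDist` from Python's standard library. The parameters (mean 100, standard deviation 15) and the small skewed sample are arbitrary choices for illustration.

```python
from statistics import NormalDist, mean, median, mode

# A normal distribution with arbitrary illustrative parameters.
dist = NormalDist(mu=100, sigma=15)
normal_center = (dist.mean, dist.median, dist.mode)

# A small positively skewed sample for contrast (hypothetical values).
skewed = [1, 2, 2, 3, 10]
skewed_center = (mean(skewed), median(skewed), mode(skewed))

print(normal_center)
print(skewed_center)
```

In the skewed sample the single large value pulls the mean above the median and mode.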
What does a histogram represent in data visualization?
- The change of a variable over time
- The correlation between two variables
- The frequency distribution of a single variable
- The relationship between three variables
A histogram is a graphical representation of the frequency distribution of a single variable. It serves as an estimate of the probability distribution of a continuous variable. To construct a histogram, the first step is to "bin" the range of values, i.e., divide the entire range into a series of consecutive, non-overlapping intervals, and then count how many values fall into each interval.
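The binning step can be sketched in a few lines of Python. The `histogram_counts` helper, the data, and the choice of five equal-width bins over the range 0 to 10 are assumptions made for illustration; the last bin is treated as closed on the right so that the maximum value is counted.

```python
def histogram_counts(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width intervals."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if not low <= v <= high:
            continue  # ignore values outside the binned range
        # The min() places v == high in the last bin rather than past it.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical measurements.
values = [1, 2, 2, 3, 5, 6, 6, 6, 8, 10]
counts = histogram_counts(values, low=0, high=10, n_bins=5)
print(counts)
```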
A statistical test has more power to detect an effect if the effect size is ______.
- Equal to the sample size
- Large
- Small
- Unchanged
The power of a test is influenced by the effect size, that is, the magnitude of the difference or relationship being tested for. Larger effect sizes increase the power of a test because they create a larger signal relative to the noise, making it easier to detect an effect if one exists.
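To make the relationship concrete, the sketch below computes the power of a one-sided, one-sample z-test with known standard deviation, a deliberately simple case chosen for illustration. The effect size is assumed to be standardized (the mean difference divided by the standard deviation), and the sample size of 25 and significance level of 0.05 are arbitrary.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided one-sample z-test for a standardized effect size."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)   # rejection threshold
    # Under the alternative, the test statistic is shifted by effect_size * sqrt(n).
    return 1 - std_normal.cdf(critical - effect_size * sqrt(n))

powers = {d: z_test_power(d, n=25) for d in (0.2, 0.5, 0.8)}
for d, power in powers.items():
    print(f"effect size {d}: power {power:.2f}")
```

With the sample size and significance level held fixed, power rises as the effect size grows.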
How does the height of a bar in a histogram relate to the frequency of the data?
- It has no relation with the frequency
- It represents the cumulative frequency
- It represents the mean frequency
- It represents the relative frequency
The height of a bar in a histogram represents the frequency (or relative frequency) of data for that particular bin, provided the bins have equal width. This means the taller the bar, the more data falls into that specific interval.
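Frequencies and relative frequencies differ only by a constant factor, so either scale gives the histogram the same shape. A brief sketch, using hypothetical bin counts:

```python
# Hypothetical counts for five equal-width bins.
counts = [1, 3, 1, 3, 2]
total = sum(counts)

# Relative frequency: the share of all observations that fall in each bin.
relative = [c / total for c in counts]

# The tallest bar is the same on either scale.
tallest_by_count = counts.index(max(counts))
tallest_by_relative = relative.index(max(relative))

print(relative)
```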
What is the purpose of 'normalization' or 'standardization' in the pre-processing step of cluster analysis?
- To decrease the number of clusters
- To ensure that all features contribute equally to the distance calculation
- To handle missing values
- To increase the computational complexity
Normalization or standardization ensures that all features contribute equally to the final distance calculation, regardless of their original scale. Without this step, features with larger scales would dominate the distance calculation, potentially leading to misleading clusters.
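The effect of scale on distance can be seen in a small sketch. The three data points, the feature names (income and age), and the `standardize` and `income_share` helpers are hypothetical; standardization here means converting each feature to z-scores using the population standard deviation.

```python
from statistics import mean, pstdev

def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1 (z-scores)."""
    center, spread = mean(column), pstdev(column)
    return [(x - center) / spread for x in column]

def income_share(point_a, point_b):
    """Fraction of the squared Euclidean distance contributed by income."""
    d_income = (point_a[0] - point_b[0]) ** 2
    d_age = (point_a[1] - point_b[1]) ** 2
    return d_income / (d_income + d_age)

# Hypothetical data: income in dollars, age in years.
incomes = [30000, 31000, 32000]
ages = [25, 60, 26]

raw = list(zip(incomes, ages))
scaled = list(zip(standardize(incomes), standardize(ages)))

raw_share = income_share(raw[0], raw[1])
scaled_share = income_share(scaled[0], scaled[1])
print(f"income share of distance: raw {raw_share:.3f}, scaled {scaled_share:.3f}")
```

On the raw scale, income accounts for almost all of the distance between the first two points even though their ages differ by 35 years; after standardization, the age difference carries most of the weight.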