What does polynomial regression allow you to model?
- Correlations
- Data distribution
- Non-linear relationships
- Relationships between variables
Polynomial regression allows modeling of non-linear relationships. Whereas simple linear regression models the relationship between variables as a straight line, polynomial regression fits a curve by including powers of the independent variable (such as its square or cube) as predictors, so it can capture relationships that change direction across the range of the independent variable.
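As a rough illustration of the idea, the sketch below fits a straight line and a quadratic to the same made-up data and compares their squared errors. The helper names `fit_polynomial` and `sum_squared_error` are hypothetical, and the normal-equations solver is a teaching device under these assumptions, not a recommended production method.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (illustrative only)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / a[i][i]
    return coeffs  # coeffs[k] multiplies x**k


def sum_squared_error(xs, ys, coeffs):
    """Sum of squared residuals of the fitted polynomial."""
    predict = lambda x: sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


# Made-up data that follows a curve (y = x^2 + 1), not a straight line.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 + 1 for x in xs]

line = fit_polynomial(xs, ys, degree=1)
quad = fit_polynomial(xs, ys, degree=2)
line_error = sum_squared_error(xs, ys, line)
quad_error = sum_squared_error(xs, ys, quad)
```

On this data the straight line leaves a large residual error, while the quadratic recovers the curve almost exactly.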
What does it mean if the Chi-square test for goodness of fit is statistically significant?
- The observed data and theoretical distribution are negatively correlated
- The observed data and theoretical distribution are positively correlated
- The observed data differs significantly from what we would expect if it followed the theoretical distribution
- The observed data fits the theoretical distribution perfectly
If the Chi-square test for goodness of fit is statistically significant, this means that the observed data differs significantly from what we would expect if the data followed the theoretical distribution.
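A minimal sketch of the calculation behind this test, assuming hypothetical counts from 60 rolls of a die that is supposed to be fair. The observed counts are invented for illustration; 11.070 is the standard Chi-square critical value for 5 degrees of freedom at the 0.05 level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for faces 1..6 over 60 rolls (invented data).
observed = [5, 8, 9, 8, 10, 20]
# A fair die predicts 60 / 6 = 10 rolls per face.
expected = [10] * 6

statistic = chi_square_statistic(observed, expected)

# Critical value for df = 6 - 1 = 5 at alpha = 0.05.
CRITICAL_VALUE = 11.070
is_significant = statistic > CRITICAL_VALUE
```

Here the statistic is 13.4, which exceeds the critical value, so these counts would be judged to differ significantly from what a fair die predicts.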
In multiple regression, model selection aims to choose the most _______ model that best predicts the response variable.
- complex
- overfit
- parsimonious
- simple
In multiple regression, model selection aims to choose the most parsimonious model that best predicts the response variable. A parsimonious model achieves the desired level of explanation or prediction with as few predictor variables as possible.
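One common way to weigh fit against the number of predictors is adjusted R-squared, which penalizes each added predictor. It is only one of several criteria (AIC and BIC are others). The sketch below uses invented R-squared values for two hypothetical models fitted to the same 30 observations.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_observations - 1) / (n_observations - n_predictors - 1)


N = 30  # hypothetical sample size

# Invented results: the larger model fits only marginally better.
small_model = adjusted_r_squared(r_squared=0.800, n_observations=N, n_predictors=2)
large_model = adjusted_r_squared(r_squared=0.805, n_observations=N, n_predictors=6)

prefer_small = small_model > large_model
```

With these numbers the two-predictor model has the higher adjusted R-squared (about 0.785 versus 0.754), so the extra four predictors do not earn their place.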
What do we call an experiment in probability theory?
- A process that produces outcomes
- A statistical analysis
- A test of a hypothesis
- An observation of a random variable
In probability theory, an experiment refers to a process or procedure that produces outcomes. The outcomes depend on chance or randomness. For example, tossing a coin or rolling a die is considered a random experiment because the outcome is not certain but depends on chance.
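To make the terms concrete, the sketch below enumerates the outcomes of a simple experiment, tossing a coin twice, and computes the probability of one event. It assumes a fair coin, so that all four outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# All possible outcomes of tossing a coin twice: HH, HT, TH, TT.
sample_space = list(product("HT", repeat=2))

# Event of interest: at least one head.
event = [outcome for outcome in sample_space if "H" in outcome]

# With equally likely outcomes, P(event) = favourable outcomes / all outcomes.
probability = Fraction(len(event), len(sample_space))
```

The experiment has four outcomes, three of which contain a head, giving a probability of 3/4.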
The Central Limit Theorem allows us to make inferences about the ________ based on sample data.
- Data Distribution
- Hypothesis
- Population
- Sample
The Central Limit Theorem allows us to make inferences about the population based on sample data. It states that, for a large enough sample size, the sampling distribution of the sample mean is approximately normal and centered on the population mean, regardless of the shape of the population distribution (provided its variance is finite). This enables us to estimate population parameters, such as the mean, from sample data and to quantify the uncertainty of those estimates.
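A small simulation can illustrate the theorem. The sketch below assumes a strongly skewed population (exponential, with mean 1 and standard deviation 1) and draws many samples of size 50; the sample size, number of samples, and seed are arbitrary choices.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 50
NUM_SAMPLES = 2000

# Mean of each sample drawn from the skewed exponential population.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(NUM_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)

# Theory: sample means center on 1.0 with standard error 1 / sqrt(50).
standard_error = 1.0 / math.sqrt(SAMPLE_SIZE)

# For a normal distribution, about 68% of values lie within one standard error.
within_one_se = sum(abs(m - 1.0) <= standard_error for m in sample_means) / NUM_SAMPLES
```

Even though individual draws are far from normal, the sample means cluster around the population mean with a spread close to the theoretical standard error.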
In a Chi-square test, the null hypothesis is that the two variables are ________.
- causally related
- correlated
- dependent
- independent
In a Chi-square test for independence, the null hypothesis is always that the two variables are independent. This means that knowing the value of one variable does not help predict the value of the other variable.
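The sketch below shows how the expected counts under independence are obtained from the row and column totals of a contingency table. The 2x2 counts are invented; 3.841 is the standard Chi-square critical value for 1 degree of freedom at the 0.05 level.

```python
# Hypothetical 2x2 table of counts (rows: group A / group B, columns: yes / no).
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Under independence, expected count = row total * column total / grand total.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

statistic = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(observed))
    for j in range(len(observed[0]))
)

# Critical value for df = (2 - 1) * (2 - 1) = 1 at alpha = 0.05.
CRITICAL_VALUE = 3.841
reject_independence = statistic > CRITICAL_VALUE
```

For this table the expected counts are 20, 20, 30 and 30, and the statistic is about 16.67, so the null hypothesis of independence would be rejected.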
How does a p-value relate to the significance level in a hypothesis test?
- A higher p-value indicates a more significant result
- A smaller p-value means the result is less likely to have occurred by chance
- The p-value does not depend on the significance level
- The p-value is the probability that the null hypothesis is true
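The p-value is the probability of obtaining a result at least as extreme as the one actually observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true. A smaller p-value means the observed result would be less likely to arise by chance alone under the null hypothesis; if the p-value is smaller than the significance level (alpha), we reject the null hypothesis.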
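As a worked example of this definition, the sketch below computes an exact one-sided p-value for a hypothetical experiment: 9 heads in 10 coin tosses, with the null hypothesis that the coin is fair. The function name `binomial_upper_tail` is an illustrative helper, not a library call.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): results at least as extreme as k successes."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


ALPHA = 0.05  # significance level chosen before looking at the data

# Hypothetical observation: 9 heads in 10 tosses of a supposedly fair coin.
p_value = binomial_upper_tail(9, 10)  # (10 + 1) / 1024

reject_null = p_value < ALPHA
```

The p-value is 11/1024, roughly 0.011, which is below alpha, so the null hypothesis of a fair coin would be rejected at the 0.05 level.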
One common feature of non-parametric methods is the use of ________ rather than raw data points.
- averages
- frequencies
- medians
- ranks
One common feature of non-parametric methods is the use of ranks rather than raw data points. Working with ranks makes these methods more robust to outliers and removes the need to assume a specific distribution for the data.
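The robustness to outliers can be seen directly by ranking two made-up samples that differ only in how extreme their largest value is. The helper `average_ranks` below is a hypothetical sketch that assigns tied values the average of their positions, a common convention in rank-based tests.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the group of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


# Two invented samples: identical except for the size of the largest value.
mild = [3, 1, 4, 1, 5]
extreme = [3, 1, 4, 1, 500]

ranks_mild = average_ranks(mild)
ranks_extreme = average_ranks(extreme)
```

Both samples produce the ranks 3, 1.5, 4, 1.5 and 5, so a rank-based statistic is unaffected by how far the outlier lies from the rest of the data.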
The ________ of a box plot are used to indicate variability outside the upper and lower quartiles.
- Bars
- Outliers
- Tails
- Whiskers
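The whiskers of a box plot indicate the variability of the data outside the upper and lower quartiles. Depending on the convention, they extend either to the minimum and maximum data values or to the most extreme values that lie within 1.5 times the interquartile range of the quartiles, with points beyond that range plotted individually as outliers.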
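The sketch below computes whisker ends under the 1.5 times interquartile range convention for a small invented dataset. Quartile definitions vary between tools; the `inclusive` method of `statistics.quantiles` is used here as one assumption among several reasonable choices.

```python
import statistics

# Invented data with one value far from the rest.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: points beyond these limits are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that are still inside the fences.
lower_whisker = min(x for x in data if x >= lower_fence)
upper_whisker = max(x for x in data if x <= upper_fence)
outliers = [x for x in data if x < lower_fence or x > upper_fence]
```

For this dataset the whiskers run from 1 to 9, and the value 30 falls outside the upper fence of 14.5, so it would be drawn as a separate outlier point.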
The first Principal Component is the direction in the dataset that captures the ______ variance in the data.
- least
- median
- most
- random
The first Principal Component is the direction (or vector) in the multidimensional space along which the data varies the most, so it captures the most variance in the data.
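As a minimal sketch, the first principal component of a small two-dimensional dataset can be found as the dominant eigenvector of its covariance matrix. The points below are invented and scattered around the line y = x. Power iteration is used here for simplicity; it is one of several ways to find this eigenvector.

```python
import math

# Made-up 2-D points scattered around the line y = x.
points = [(-2.0, -1.8), (-1.0, -1.2), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Sample covariance matrix [[cxx, cxy], [cxy, cyy]].
cxx = sum(x * x for x, _ in centered) / (n - 1)
cxy = sum(x * y for x, y in centered) / (n - 1)
cyy = sum(y * y for _, y in centered) / (n - 1)

# Power iteration: repeatedly apply the covariance matrix and renormalize.
v = (1.0, 0.0)
for _ in range(100):
    wx = cxx * v[0] + cxy * v[1]
    wy = cxy * v[0] + cyy * v[1]
    norm = math.hypot(wx, wy)
    v = (wx / norm, wy / norm)

# Variance of the data along v, and its share of the total variance.
variance_along_v = v[0] * (cxx * v[0] + cxy * v[1]) + v[1] * (cxy * v[0] + cyy * v[1])
explained_ratio = variance_along_v / (cxx + cyy)
```

The resulting direction points roughly along the diagonal, and it accounts for more than 99% of the total variance in this dataset.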