Which method is commonly used to find the best fitting line in simple linear regression?
- K-means clustering
- Neural network
- The method of least squares
- The method of maximum likelihood
The method of least squares is commonly used to find the best fitting line in simple linear regression. It minimizes the sum of the squares of the residuals (the vertical distances between the observed and predicted values).
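The closed-form least-squares estimates for simple linear regression can be computed directly. Below is a minimal sketch using a small invented dataset (the numbers are purely illustrative):

```python
# Least-squares fit of y = a + b*x for a toy dataset.
# The slope b minimizes the sum of squared residuals:
#   b = sum((x - x_bar)*(y - y_bar)) / sum((x - x_bar)^2)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # roughly y = 2x

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
a = mean_y - b * mean_x  # intercept passes through (x_bar, y_bar)

# Residuals: vertical distances between observed and predicted values.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(round(b, 3), round(a, 3))
```

A useful property to verify: with an intercept in the model, the least-squares residuals always sum to zero.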
What is a Type II error in the context of hypothesis testing?
- Accepting a false null hypothesis
- Accepting a true null hypothesis
- Rejecting a false null hypothesis
- Rejecting a true null hypothesis
A Type II error occurs when the null hypothesis is false, but it is not rejected. It is also known as a "false negative" result.
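A Type II error rate can be estimated by simulation. In this hypothetical setup, H0 claims the mean is 0, but the true mean is 0.3; each trial runs a two-sided z-test (known sigma) and a failure to reject the false H0 counts as a false negative:

```python
import math
import random

random.seed(0)

# H0: mu = 0, but the true mean is 0.3 (so H0 is false).
n, true_mu, sigma, alpha = 25, 0.3, 1.0, 0.05
z_crit = 1.96  # two-sided 5% critical value

trials = 2000
type2 = 0
for _ in range(trials):
    sample = [random.gauss(true_mu, sigma) for _ in range(n)]
    z = (sum(sample) / n - 0.0) / (sigma / math.sqrt(n))
    if abs(z) < z_crit:      # fail to reject the false H0 -> Type II error
        type2 += 1

beta = type2 / trials
print(f"estimated Type II error rate: {beta:.2f}")
```

The estimated rate (often denoted beta) depends on the effect size, sample size, and significance level; 1 - beta is the power of the test.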
The ________ in a Chi-square test for independence represents the sum of the squared differences between observed and expected frequencies, divided by the expected frequencies.
- Chi-square statistic
- correlation coefficient
- p-value
- standard deviation
The Chi-square statistic in a Chi-square test for independence represents the sum of the squared differences between observed and expected frequencies, divided by the expected frequencies. This statistic measures the degree to which the observed frequencies deviate from the frequencies that would be expected under the null hypothesis of independence.
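Given observed and expected frequencies, the statistic is a one-line sum. The frequencies below are invented for illustration:

```python
# Chi-square statistic: sum of (O - E)^2 / E over all cells.
observed = [30, 14, 34, 45, 57, 20]
expected = [20, 20, 30, 40, 60, 30]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 3))
```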
How do you calculate the expected frequency in a Chi-square test?
- By calculating the mode of the observed frequencies
- By dividing the total frequency by the number of categories
- By multiplying the row total and column total and dividing by the total number of observations
- By taking the mean of the observed frequencies
In a Chi-square test, the expected frequency for each cell in the contingency table is calculated by multiplying the row total and column total and then dividing by the total number of observations.
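This rule is easy to apply to a whole contingency table at once. The 2x3 table of observed counts below is a hypothetical example:

```python
# Expected frequency per cell: (row total * column total) / grand total.
table = [
    [20, 30, 50],
    [30, 20, 50],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

expected = [[r * c / grand for c in col_totals] for r in row_totals]
print(expected)
```

Note that each row of `expected` sums to the corresponding row total, and each column to the column total, as the formula guarantees.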
Pearson's Correlation Coefficient ranges from ________ to ________.
- -1 to 1
- -2 to 2
- 0 to 1
- 0 to 2
The Pearson Correlation Coefficient measures the linear relationship between two variables and can range from -1 to 1. A value of -1 means there is a perfect negative correlation, while a value of 1 means there is a perfect positive correlation.
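The coefficient can be computed from its definition. The toy data below lie exactly on a line with positive slope, so r comes out at the upper end of the range:

```python
import math

# Pearson's r = sum((x - x_bar)(y - y_bar)) /
#               sqrt(sum((x - x_bar)^2) * sum((y - y_bar)^2))
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]   # perfect positive linear relationship

mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
r = cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
print(r)
```

Negating every `y` value would flip the sign and give r = -1, a perfect negative correlation.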
What is the difference between nominal and ordinal data?
- Nominal data can be ordered
- Nominal data cannot be ordered
- Ordinal data can be ordered
- Ordinal data cannot be ordered
Nominal and ordinal data are both types of categorical data. The key difference between the two is that while nominal data cannot be ordered or ranked, ordinal data can. Nominal data represents simple categories or groups with no order or priority. Examples include colors or city names. Ordinal data, on the other hand, represents categories that can be ranked or ordered. Examples include Likert scale data (e.g., a five-point scale from "strongly disagree" through "strongly agree"), educational level (high school, BA, MA, PhD), etc.
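The distinction can be made concrete in code: a nominal list has no meaningful sort order, while an ordinal list can be sorted once its ranking is made explicit (the rank mapping below is a hypothetical example):

```python
# Nominal: categories with no inherent order; any ordering is arbitrary.
colors = ["red", "blue", "green"]

# Ordinal: categories with a natural order, encoded as an explicit rank.
levels = ["MA", "high school", "PhD", "BA"]
rank = {"high school": 0, "BA": 1, "MA": 2, "PhD": 3}
ordered = sorted(levels, key=rank.get)
print(ordered)
```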
What is the purpose of a residual plot in multiple linear regression?
- All of the above
- To check for independence of errors
- To check for linearity
- To check for normality
A residual plot in multiple linear regression is used to check several of the model's assumptions at once. Residuals scattered randomly around zero with no visible pattern suggest the errors are independent and the relationship is linear; a constant vertical spread indicates homoscedasticity; and a roughly symmetric scatter is consistent with normally distributed errors (though normality is often checked more directly with a Q-Q plot of the residuals).
What kind of data is best suited for the Wilcoxon Signed Rank Test?
- Both Continuous and Ordinal data
- Continuous data
- Nominal data
- Ordinal data
The Wilcoxon Signed Rank Test is best suited for both continuous and ordinal data. It is a non-parametric test: rather than assuming normality, it works with the signed ranks of the paired differences, so it only requires that those differences can be meaningfully ordered.
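A minimal sketch of the signed-rank statistic for paired data follows. The before/after numbers are invented, and the absolute differences are deliberately distinct so simple 1-based ranking works; real implementations (e.g., `scipy.stats.wilcoxon`) assign average ranks to ties:

```python
# Wilcoxon Signed Rank statistic for paired samples (toy data,
# no tied |differences| and no zero differences, to keep ranking simple).
before = [10, 12, 15, 18, 11, 16, 20, 9]
after  = [7, 13, 11, 12, 13, 11, 13, 17]

diffs = [b - a for b, a in zip(before, after)]
diffs = [d for d in diffs if d != 0]          # drop zero differences

# Rank the absolute differences from smallest to largest (1-based).
by_abs = sorted(diffs, key=abs)
ranks = {d: i + 1 for i, d in enumerate(by_abs)}

w_plus = sum(ranks[d] for d in diffs if d > 0)   # sum of positive ranks
w_minus = sum(ranks[d] for d in diffs if d < 0)  # sum of negative ranks
print(w_plus, w_minus)
```

The test statistic is conventionally the smaller of the two rank sums; `w_plus + w_minus` always equals n(n+1)/2 for n nonzero differences.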
What is the relationship between a cumulative distribution function and a probability density function?
- The cumulative distribution function is the integral of the probability density function
- The probability density function is the integral of the cumulative distribution function
- There is no relationship between them
- They are the same thing
The cumulative distribution function (CDF) and the probability density function (PDF) are closely related. For a continuous random variable, the CDF is the integral of the PDF. This means that the PDF is the derivative of the CDF.
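This relationship can be checked numerically. Using the exponential distribution with rate 2 as an example, integrating the PDF from 0 to 1 by the trapezoid rule should reproduce the closed-form CDF at 1:

```python
import math

lam = 2.0

def pdf(x):
    # Exponential PDF: f(x) = lam * exp(-lam * x) for x >= 0
    return lam * math.exp(-lam * x)

def cdf(x):
    # Closed-form CDF: F(x) = 1 - exp(-lam * x)
    return 1 - math.exp(-lam * x)

# Trapezoid approximation of the integral of pdf over [0, 1].
n = 100_000
h = 1.0 / n
integral = sum(pdf(i * h) + pdf((i + 1) * h) for i in range(n)) * h / 2

print(round(integral, 6), round(cdf(1.0), 6))
```

The same check works in the other direction: a finite-difference derivative of `cdf` recovers `pdf`, since the PDF is the derivative of the CDF.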
How do bias and variability affect sampling methods?
- Bias and variability always increase the accuracy of estimates
- Bias and variability are unrelated concepts in statistics
- Bias increases the spread of a data distribution, and variability leads to consistent errors
- Bias leads to consistent errors in one direction, and variability refers to the spread of a data distribution
Bias and variability are two key concepts in sampling methods. Bias refers to consistent, systematic errors that lead to an overestimate or underestimate of the true population parameter. Variability refers to the spread or dispersion of a data distribution, or in this context, the sampling distribution. Lower bias and lower variability are generally desirable to increase the accuracy and precision of estimates.
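The distinction can be illustrated by simulation. In this hypothetical setup the true population mean is 10; estimator A is the plain sample mean (unbiased), while estimator B adds a constant offset, making it biased. Bias shows up as a shifted average estimate, variability as the spread of the estimates:

```python
import random
import statistics

random.seed(1)

true_mu = 10.0

def draw_sample():
    # One sample of 30 observations from the "population".
    return [random.gauss(true_mu, 2.0) for _ in range(30)]

est_a = [statistics.mean(draw_sample()) for _ in range(3000)]        # unbiased
est_b = [statistics.mean(draw_sample()) + 1.0 for _ in range(3000)]  # biased by +1

# Average estimate reveals bias; standard deviation reveals variability.
print(round(statistics.mean(est_a), 2), round(statistics.mean(est_b), 2))
print(round(statistics.stdev(est_a), 2), round(statistics.stdev(est_b), 2))
```

Both estimators have the same variability here, but B is systematically off by one unit in the same direction, which is exactly what bias means.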