What does a residual plot tell us about the fit of the model?
- It indicates how well the model's predictions match the actual data
- It indicates the variance of the residuals
- It shows the correlation between the dependent and independent variables
- It shows the relationship between the dependent and independent variables
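A residual plot shows the residuals on the y-axis and the independent variable (or the fitted values) on the x-axis. If the points are randomly dispersed around the horizontal axis with no visible pattern, a linear regression model is appropriate for the data. A systematic pattern, such as a curve or a funnel shape, suggests that the linear model is misspecified, for example because the relationship is non-linear or the error variance is not constant.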
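The pattern check described above can be sketched in a few lines. This is an illustration only: the data, the noise level, and the helper name `line_fit_residuals` are assumptions made for the demonstration, and the residuals are printed rather than plotted.

```python
import random

def line_fit_residuals(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return the residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
xs = [float(x) for x in range(-5, 6)]

# Hypothetical data: a truly linear relationship with small noise,
# and a quadratic relationship with no noise.
linear_ys = [1.0 + 2.0 * x + rng.uniform(-1.0, 1.0) for x in xs]
curved_ys = [x * x for x in xs]

res_linear = line_fit_residuals(xs, linear_ys)
res_curved = line_fit_residuals(xs, curved_ys)

# The first list scatters around zero with no pattern; the second is
# U-shaped (positive at both ends, negative in the middle).
print([round(r, 2) for r in res_linear])
print([round(r, 2) for r in res_curved])
```

Plotting each list against `xs` would give the two kinds of residual plot the explanation contrasts.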
Can PCA be used for both supervised and unsupervised learning?
- No
- Only for supervised learning
- Only for unsupervised learning
- Yes
PCA itself is an unsupervised learning technique: its algorithm does not use any class label information. However, the transformed dataset from PCA can be used as input for subsequent supervised learning tasks.
What is the effect of outliers on PCA?
- It depends on the distribution of the data
- They can distort the principal components
- They enhance the performance of PCA
- They have no effect on PCA
Outliers can significantly distort the principal components identified by PCA, as they can artificially inflate the variance along their direction. It's generally a good practice to address outliers before applying PCA.
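A small two-dimensional sketch of this effect follows. It is not a full PCA implementation: it uses the closed-form direction of the leading eigenvector of a 2x2 covariance matrix, and the data points and the function name `first_pc_angle_degrees` are made up for illustration.

```python
import math

def first_pc_angle_degrees(points):
    """Angle (in degrees) of the first principal axis of 2-D points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((x - x_bar) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - y_bar) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points) / (n - 1)
    # Closed form for the major-axis direction of a symmetric 2x2 matrix.
    return math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))

# Hypothetical data: eleven points on the line y = 0.5 * x ...
clean = [(float(x), 0.5 * x) for x in range(-5, 6)]
# ... plus a single extreme point far above the line.
with_outlier = clean + [(0.0, 100.0)]

angle_clean = first_pc_angle_degrees(clean)
angle_outlier = first_pc_angle_degrees(with_outlier)
print(f"first axis without outlier: {angle_clean:.1f} degrees")
print(f"first axis with outlier:    {angle_outlier:.1f} degrees")
```

In this example one added point is enough to rotate the first principal axis from the direction of the line to nearly vertical, because the outlier dominates the variance along the y direction.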
What is the concept of "Type I" error in the context of hypothesis testing?
- Failing to reject a false null hypothesis
- Failing to reject a true alternative hypothesis
- Rejecting a false alternative hypothesis
- Rejecting a true null hypothesis
A Type I error in hypothesis testing is the incorrect rejection of a true null hypothesis. In other words, it happens when the researcher concludes that the null hypothesis is false when, in fact, it is true. The probability of making a Type I error is the significance level of the test, denoted by the Greek letter alpha (α).
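One way to make this concrete is a simulation in which the null hypothesis is true by construction, so every rejection is a Type I error. The sketch below assumes a two-sided z-test with known standard deviation. The sample size, number of trials, and seed are arbitrary choices for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
N = 30          # observations per simulated study
TRIALS = 2000   # number of simulated studies

rng = random.Random(42)  # fixed seed for reproducibility
z_crit = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96 for alpha = 0.05

rejections = 0
for _ in range(TRIALS):
    # The null hypothesis (mean = 0, sigma = 1) is true for every sample.
    sample = [rng.gauss(0.0, 1.0) for _ in range(N)]
    z = fmean(sample) * math.sqrt(N)  # z = (x_bar - 0) / (1 / sqrt(N))
    if abs(z) > z_crit:
        rejections += 1  # rejecting a true null: a Type I error

type_i_rate = rejections / TRIALS
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

The observed rate should land close to the chosen significance level of 0.05, up to simulation noise.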
When can we apply the Chi-square test for goodness of fit?
- When the data are continuously distributed
- When the data are normally distributed
- When we have categorical data and want to see if it follows a specific distribution
- When we want to compare means
The Chi-square test for goodness of fit is used when we have categorical data and want to see whether it follows a specific distribution. It compares the observed count in each category with the count expected under that distribution.
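As a worked sketch, the statistic sums (observed - expected)^2 / expected over the categories. The die-roll counts below are invented for illustration, and the critical value is the standard table value for 5 degrees of freedom at the 5% level.

```python
def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts from 120 rolls of a six-sided die.
observed = [18, 22, 16, 25, 19, 20]
total = sum(observed)
expected = [total / 6] * 6  # a fair die predicts 20 rolls per face

chi2 = chi_square_statistic(observed, expected)

# Table value for df = 6 - 1 = 5 at the 5% significance level.
CRITICAL_5PCT_DF5 = 11.070
reject_fair_die = chi2 > CRITICAL_5PCT_DF5

print(f"chi-square = {chi2:.2f}, reject fairness: {reject_fair_die}")
```

Here the statistic is well below the critical value, so these counts give no evidence against a fair die.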
How does Spearman's Rank Correlation react to outliers as compared to Pearson's correlation?
- Both are equally sensitive to outliers
- Less sensitive to outliers
- More sensitive to outliers
- Neither is sensitive to outliers
Spearman's Rank Correlation is less sensitive to outliers than Pearson's correlation. This is because Spearman's correlation is based on rank orders rather than raw data values, making it more robust against outliers.
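The difference can be seen on a small invented dataset in which one value is replaced by an extreme outlier that still preserves the ordering. The helpers below are a minimal sketch: `ranks` assumes there are no tied values, and Spearman's coefficient is computed as Pearson's correlation of the ranks.

```python
import math

def pearson(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n of the values; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result

def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: y equals x, except the last value is an extreme outlier.
xs = [float(x) for x in range(1, 11)]
ys = xs[:-1] + [1000.0]

r_pearson = pearson(xs, ys)
r_spearman = spearman(xs, ys)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

The outlier keeps its rank (it is still the largest value), so Spearman's coefficient stays at 1, while Pearson's coefficient drops to roughly 0.53 because it depends on the raw magnitudes.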
What does a null hypothesis represent in statistical testing?
- A condition of no effect or no difference
- A specific outcome of the experiment
- An effect or difference exists
- The sample size is large enough for the test
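The null hypothesis is the statement in statistical inference that there is no effect or no difference, for example that a parameter equals a specified value or that two groups have the same mean. A test then assesses whether the observed data are inconsistent enough with this statement to reject it.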
How does the rate parameter affect the shape of a Poisson distribution?
- All of the above
- It determines the kurtosis of the distribution
- It determines the skewness of the distribution
- It does not affect the shape of the distribution
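The rate parameter (lambda) in a Poisson distribution is the average number of occurrences per interval, and it equals both the mean and the variance of the distribution, so it directly affects the shape. The skewness is 1/√λ and the excess kurtosis is 1/λ, so higher lambda values result in distributions that are less skewed to the right and closer to a normal shape.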
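The dependence of the shape on lambda can be checked numerically. For a Poisson distribution the skewness is 1/sqrt(lambda) and the excess kurtosis is 1/lambda. The sketch below recovers both from the probability mass function. The function name `poisson_shape` and the truncation point `k_max` are assumptions for illustration, and the truncation is adequate only for moderate lambda.

```python
import math

def poisson_shape(lam, k_max=200):
    """Return (skewness, excess kurtosis) computed from the Poisson PMF.

    The infinite sum is truncated at k_max, which is fine for moderate lam.
    """
    pmf = [math.exp(-lam)]
    for k in range(1, k_max + 1):
        pmf.append(pmf[-1] * lam / k)  # P(k) = P(k - 1) * lam / k

    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    third = sum((k - mean) ** 3 * p for k, p in enumerate(pmf))
    fourth = sum((k - mean) ** 4 * p for k, p in enumerate(pmf))
    return third / var ** 1.5, fourth / var ** 2 - 3.0

for lam in (1.0, 4.0, 16.0):
    skew, kurt = poisson_shape(lam)
    print(f"lambda = {lam:4.1f}  skewness = {skew:.3f}  excess kurtosis = {kurt:.3f}")
```

Both quantities shrink toward zero as lambda grows, which is the sense in which the distribution becomes more symmetric and more bell-shaped.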
How does the Central Limit Theorem relate to the use of Z-tests?
- It allows for the assumption that the sample mean distribution is normally distributed
- It enables the calculation of the sample standard deviation
- It increases the power of the test
- It reduces the impact of outliers in the sample
The Central Limit Theorem states that, for a sufficiently large sample drawn from a population with finite variance, the sampling distribution of the sample mean is approximately normal. This allows us to use Z-tests even when the population itself is not normally distributed.
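A simulation sketch of this idea: draw repeated samples from a strongly skewed population, standardize each sample mean as a Z-test would, and check how often the result falls inside the usual normal critical values. The choice of an Exponential(1) population, the sample size, the number of trials, and the seed are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, fmean

N = 50          # observations per sample
TRIALS = 4000   # number of simulated samples

rng = random.Random(7)  # fixed seed for reproducibility
z_crit = NormalDist().inv_cdf(0.975)  # two-sided 5% critical value

# Exponential(1) population: right-skewed, mean 1, standard deviation 1.
inside = 0
for _ in range(TRIALS):
    sample = [rng.expovariate(1.0) for _ in range(N)]
    z = (fmean(sample) - 1.0) / (1.0 / math.sqrt(N))
    if abs(z) <= z_crit:
        inside += 1

coverage = inside / TRIALS
print(f"Share of standardized means within +/-{z_crit:.2f}: {coverage:.3f}")
```

If the normal approximation holds, the share should be close to 0.95 even though the individual observations are far from normal.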
What is the difference between a one-sample t-test and a two-sample t-test?
- All of the above
- The number of hypotheses being tested
- The number of samples being compared
- The type of data being used
The key difference between a one-sample t-test and a two-sample t-test lies in the number of samples being compared. A one-sample t-test compares the mean of a single sample to a known or hypothesized value, while a two-sample t-test compares the means of two different samples with each other.
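The two test statistics can be sketched side by side. The measurements below are invented for illustration. The two-sample version shown is Welch's form, which does not assume equal variances; that choice is an assumption here, since the explanation above does not specify a variant.

```python
import math
from statistics import mean, variance

def one_sample_t(sample, mu0):
    """t statistic comparing one sample mean with a hypothesized value mu0."""
    standard_error = math.sqrt(variance(sample) / len(sample))
    return (mean(sample) - mu0) / standard_error

def welch_two_sample_t(a, b):
    """Welch's t statistic comparing the means of two independent samples."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical measurements.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [4.6, 4.4, 4.8, 4.5, 4.7]

t_one = one_sample_t(group_a, mu0=5.0)          # one sample vs. the value 5.0
t_two = welch_two_sample_t(group_a, group_b)    # two samples vs. each other

print(f"one-sample t: {t_one:.3f}")
print(f"two-sample t: {t_two:.3f}")
```

The one-sample function takes a single dataset and a reference value, while the two-sample function takes two datasets and no reference value, mirroring the distinction in the explanation.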