What does a residual plot tell us about the fit of the model?

  • It indicates how well the model's predictions match the actual data
  • It indicates the variance of the residuals
  • It shows the correlation between the dependent and independent variables
  • It shows the relationship between the dependent and independent variables
A residual plot shows the residuals on the y-axis and the independent variable (or, commonly, the fitted values) on the x-axis. If the points are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; a systematic pattern, such as a curve or a funnel shape, suggests that a non-linear model or a transformation of the data is more appropriate.

Can PCA be used for both supervised and unsupervised learning?

  • No
  • Only for supervised learning
  • Only for unsupervised learning
  • Yes
No, PCA is an unsupervised learning technique: its algorithm uses only the feature values and no class label information. However, the transformed dataset from PCA can be used as input to subsequent supervised learning tasks.

What is the effect of outliers on PCA?

  • It depends on the distribution of the data
  • They can distort the principal components
  • They enhance the performance of PCA
  • They have no effect on PCA
Outliers can significantly distort the principal components identified by PCA, as they can artificially inflate the variance along their direction. It's generally a good practice to address outliers before applying PCA.
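
The effect can be sketched in two dimensions, where the direction of the first principal component has a closed form. The helper `first_pc_angle` and the data points are hypothetical, written only for this illustration.

```python
import math

def first_pc_angle(points):
    """Angle in degrees of the first principal component of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form orientation of the leading eigenvector of a 2x2 symmetric matrix.
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy))

# Assumed data: points spread along the x-axis with small vertical noise.
clean = [(float(x), 0.1 if x % 2 else -0.1) for x in range(-5, 6)]
# The same data with one extreme point far above the rest.
with_outlier = clean + [(0.0, 50.0)]

angle_clean = first_pc_angle(clean)
angle_outlier = first_pc_angle(with_outlier)
print(f"clean: {angle_clean:.1f} degrees, with outlier: {angle_outlier:.1f} degrees")
```

Without the outlier the first component lies along the x-axis (about 0 degrees). With it, the component swings to roughly vertical, because the single point dominates the variance.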

What is the concept of "Type I" error in the context of hypothesis testing?

  • Failing to reject a false null hypothesis
  • Failing to reject a true alternative hypothesis
  • Rejecting a false alternative hypothesis
  • Rejecting a true null hypothesis
A Type I error in hypothesis testing is the incorrect rejection of a true null hypothesis. The probability of making this error is denoted by the Greek letter alpha (α). In other words, a Type I error happens when the researcher concludes that the null hypothesis is false when, in fact, it is true.

What is the difference between a one-sample t-test and a two-sample t-test?

  • All of the above
  • The number of hypotheses being tested
  • The number of samples being compared
  • The type of data being used
The key difference between a one-sample t-test and a two-sample t-test lies in the number of samples being compared. A one-sample t-test compares the mean of a single sample to a known or hypothesized value, while a two-sample t-test compares the means of two independent samples.
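
The two test statistics can be sketched with the standard library. The function names and sample values are assumptions for illustration, and the two-sample version uses the unequal-variance (Welch) form, which is one common choice among several. Only the t statistics are computed here, not p-values.

```python
from math import sqrt
from statistics import mean, variance

def one_sample_t(sample, mu0):
    """t statistic comparing one sample mean to a hypothesized value mu0."""
    return (mean(sample) - mu0) / sqrt(variance(sample) / len(sample))

def two_sample_t(a, b):
    """Welch t statistic comparing the means of two independent samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical measurements.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2]
group_b = [4.6, 4.8, 4.5, 4.7, 4.9]

t_one = one_sample_t(group_a, mu0=5.0)   # one sample vs. a fixed value
t_two = two_sample_t(group_a, group_b)   # two samples vs. each other
print(f"one-sample t = {t_one:.3f}, two-sample t = {t_two:.3f}")
```

The one-sample function takes a single list plus a reference value, while the two-sample function takes two lists and estimates the variability of both.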

What is the concept of significance level in hypothesis testing?

  • The amount of data needed to support the alternative hypothesis
  • The difference between the null and alternative hypotheses
  • The probability of rejecting a true null hypothesis
  • The proportion of the sample that supports the null hypothesis
The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true.
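
A small simulation can make this concrete. The sketch below repeatedly tests a null hypothesis that is true by construction and counts how often it is rejected. The sample size, trial count, and the use of a z-test with known standard deviation are assumptions made to keep the example short.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable

alpha = 0.05
n = 30           # observations per simulated study (assumed)
trials = 10_000  # number of simulated studies (assumed)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96

rejections = 0
for _ in range(trials):
    # The null hypothesis (mean 0, standard deviation 1) is true for every sample.
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = fmean(sample) * n ** 0.5  # z statistic with known sigma = 1
    if abs(z) > z_crit:
        rejections += 1  # rejecting here is a Type I error

type_i_rate = rejections / trials
print(f"Observed rejection rate under a true null: {type_i_rate:.3f}")
```

The observed rejection rate should land close to the chosen α of 0.05, since every rejection in this setup is a false one.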

What is the Multiplication Rule of Probability primarily used for?

  • To calculate the joint probability of two independent events
  • To calculate the probability of either of two events occurring
  • To divide one probability by another
  • To subtract one probability from another
The Multiplication Rule in probability is used to calculate the joint probability of two events. In general, P(A and B) = P(A) × P(B | A). For two independent events this reduces to the product of their individual probabilities, P(A and B) = P(A) × P(B).
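
The rule for independent events can be checked by enumerating outcomes. The two-dice setup and the event definitions below are assumptions chosen for the example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first, second) outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate over outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_even(o):
    return o[0] % 2 == 0   # event A: first die shows an even number

def second_high(o):
    return o[1] > 4        # event B: second die shows 5 or 6

p_a = prob(first_even)
p_b = prob(second_high)
p_both = prob(lambda o: first_even(o) and second_high(o))

print(p_a, p_b, p_both)  # prints: 1/2 1/3 1/6
```

Counting outcomes directly gives 1/6 for the joint event, which matches the product 1/2 × 1/3.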

What is the primary purpose of the Mann-Whitney U test?

  • To calculate the correlation between two variables
  • To compare the means of two independent groups
  • To compare the medians of two independent groups
  • To compare the variances of two independent groups
The Mann-Whitney U test is a nonparametric significance test for determining whether two independent samples were drawn from populations with the same distribution. When the two distributions are assumed to have the same shape, it can be interpreted as a test of the null hypothesis that the medians of the two groups are equal.
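
The U statistic itself is simple to compute by comparing every cross-group pair. The function `mann_whitney_u` and the sample values are hypothetical; this sketch computes only the statistic, not a p-value.

```python
def mann_whitney_u(a, b):
    """Return (U_a, U_b): the number of cross-group pairs each group 'wins'.

    A pair (x, y) with x from a and y from b counts 1 for a if x > y,
    and 0.5 for each group if the two values are tied.
    """
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    u_b = len(a) * len(b) - u_a  # the two statistics always sum to n_a * n_b
    return u_a, u_b

# Hypothetical samples from two independent groups.
group_a = [12, 15, 11, 18]
group_b = [14, 20, 22, 19, 25]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)  # prints: 2.0 18.0
```

A very lopsided split such as 2 versus 18 indicates that values in one group tend to be larger than values in the other. The test uses only these orderings, not the magnitudes of the differences.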

What is the goal of 'hierarchical' clustering?

  • To create a hierarchy or a tree of clusters
  • To find the centroid of clusters
  • To find the most diverse instances in the dataset
  • To predict the outcome of a new instance
The goal of hierarchical clustering is to create a hierarchy or a tree of clusters. This hierarchy can be visually represented in a dendrogram.
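
One way to build such a hierarchy is agglomerative clustering, sketched below for one-dimensional points. The single-linkage distance, the function name `single_linkage`, and the data are assumptions for illustration; other linkage rules are equally valid.

```python
def single_linkage(points):
    """Agglomerative clustering of 1-D points; returns the list of merges.

    Each merge is (cluster_1, cluster_2, distance), recorded in the order
    the merges happen, which is the information a dendrogram displays.
    """
    clusters = [[p] for p in points]  # start with every point in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges

merges = single_linkage([1.0, 1.5, 5.0, 5.2, 9.0])
for left, right, dist in merges:
    print(f"merge {left} + {right} at distance {dist:.1f}")
```

Reading the merges from first to last traces the tree from its leaves to its root: nearby points join first, and the final merge combines everything into one cluster.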

________ is a popular method for cluster analysis that partitions the data into non-hierarchical clusters.

  • DBSCAN
  • Hierarchical
  • K-means
  • PCA
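K-means is a popular method for cluster analysis that partitions the data into non-hierarchical clusters. The algorithm alternates between two steps: it assigns each data point to the cluster whose centroid is nearest (typically by Euclidean distance), then recomputes each centroid as the mean of the points assigned to it, repeating until the assignments stop changing.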
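
A minimal sketch of the iterative procedure follows. The function `kmeans`, the fixed iteration count, the hand-picked starting centroids, and the data are all assumptions for illustration; practical implementations usually add random initialization and a convergence check.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Basic K-means: alternate nearest-centroid assignment and centroid update."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: math.dist(p, centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D data with two visible groups, and K = 2 starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (9.0, 9.5)])
print(centroids)
print(clusters)
```

Each point ends up in exactly one cluster and no cluster contains another, which is what makes the result a flat partition rather than a hierarchy.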