What effect does a high leverage point have on a multiple linear regression model?

It can significantly affect the estimate of the regression coefficients
It does not affect the model
It increases the R-squared value
It leads to homoscedasticity

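High leverage points are observations with extreme values on the predictor variables. Because they lie far from the bulk of the predictor data, they can have a disproportionate influence on the estimated regression coefficients, particularly when their response values depart from the trend of the other observations. This can make the fitted model less reliable.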
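As a rough illustration of how leverage is measured, the sketch below uses the simple (one-predictor) regression case, where the leverage of observation i is h_i = 1/n + (x_i - mean(x))^2 / Sxx. In multiple regression the leverages are the diagonal entries of the hat matrix X(X'X)^-1 X'. The data values, the helper name `leverages`, and the 2p/n cutoff (a common rule of thumb, not a strict test) are assumptions made for this example.

```python
def leverages(x):
    """Leverage h_i of each observation in a simple linear regression with intercept."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]


# Hypothetical predictor values; 20.0 sits far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
h = leverages(x)

# Rule-of-thumb cutoff 2p/n, with p = 2 parameters (intercept and slope).
threshold = 2 * 2 / len(x)
flagged = [xi for xi, hi in zip(x, h) if hi > threshold]

print([round(hi, 3) for hi in h])
print("flagged as high leverage:", flagged)
```

Leverage depends only on the predictor values, so a flagged point still needs to be checked against its response before concluding that it distorts the fit.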


How does multicollinearity affect the interpretation of regression coefficients?

It has no effect on the interpretation of the coefficients.
It increases the value of the coefficients.
It makes the coefficients less interpretable and reliable.
It makes the coefficients more interpretable and reliable.

Multicollinearity inflates the standard errors of the estimated regression coefficients, so small changes in the data can cause large changes in the estimates. Hence, it makes the individual coefficients less reliable and harder to interpret.
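One common way to quantify this is the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where the VIF of either predictor reduces to 1 / (1 - r^2), with r the Pearson correlation between the two predictors. The data and the helper names `pearson_r` and `vif_two_predictors` are assumptions for illustration.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2_collinear = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # roughly 2 * x1
x2_unrelated = [3.0, 1.0, 4.0, 1.0, 5.0, 3.0]

vif_high = vif_two_predictors(x1, x2_collinear)
vif_low = vif_two_predictors(x1, x2_unrelated)
print("VIF with near-duplicate predictor:", round(vif_high, 1))
print("VIF with weakly related predictor:", round(vif_low, 2))
```

With more than two predictors, each VIF comes from regressing one predictor on all the others, which this sketch does not attempt.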


The Wilcoxon Signed Rank Test uses the _______ of differences for ranking.

distributions
magnitudes
nan
signs

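The Wilcoxon Signed Rank Test uses the magnitudes of differences for ranking. It ranks the absolute values of the paired differences and then attaches the sign of each difference to its rank.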


The probability of an event A, given that another event B has occurred, is called the ________ probability of A given B.

Conditional
Independent
Joint
Marginal

The probability of an event A, given that another event B has occurred, is called the conditional probability of A given B. It is denoted as P(A|B) and, when P(B) > 0, equals P(A ∩ B) / P(B).
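A small worked example may help. It uses the standard definition P(A|B) = P(A ∩ B) / P(B), valid when P(B) > 0. The two-dice setup and the choice of events A and B are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# B: the first die shows 6.  A: the total is at least 10.
B = {o for o in outcomes if o[0] == 6}
A = {o for o in outcomes if sum(o) >= 10}


def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(event), len(outcomes))


p_a = prob(A)
p_a_given_b = prob(A & B) / prob(B)

print("P(A)   =", p_a)
print("P(A|B) =", p_a_given_b)
```

Knowing that the first die shows 6 raises the probability of a total of at least 10 from 1/6 to 1/2, which shows that A and B are not independent.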


The sum of the squared loadings for a factor (i.e., the column in the factor matrix) which represents the variance in all the variables accounted for by the factor is known as _______ in factor analysis.

communality
eigenvalue
factor variance
total variance

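The sum of the squared loadings for a factor (i.e., the column in the factor matrix), which represents the variance in all the variables accounted for by that factor, is known as the eigenvalue in factor analysis. By contrast, the sum of squared loadings across a variable's row is that variable's communality.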
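The column-versus-row distinction can be shown with a few lines of arithmetic. The loading matrix below is made up for illustration. Note also that the column sums of squared loadings match the eigenvalues for an unrotated principal-component style solution; after rotation they are usually reported simply as sums of squared loadings.

```python
# Hypothetical loading matrix: rows are variables, columns are factors.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.1, 0.9],
    [0.2, 0.6],
]
n_factors = len(loadings[0])

# Column sums of squared loadings: variance accounted for by each factor.
eigenvalues = [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]

# Row sums of squared loadings: variance of each variable explained by the factors.
communalities = [sum(value ** 2 for value in row) for row in loadings]

print("per-factor sums (eigenvalues):", [round(e, 2) for e in eigenvalues])
print("per-variable sums (communalities):", [round(c, 2) for c in communalities])
```

Both sets of sums add up to the same total, since each squares and sums every entry of the matrix once.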


When the residuals exhibit a pattern or trend rather than a random scatter, it is a sign of _________.

Autocorrelation
Model misspecification
Overfitting
Underfitting

When the residuals exhibit a pattern or trend rather than a random scatter, it can be a sign of model misspecification, i.e., the model doesn't properly capture the relationship between the predictors and the outcome variable.
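To see what such a pattern looks like, the sketch below fits a straight line by ordinary least squares to data that is actually quadratic. The data and the helper name `fit_line` are assumptions for this example.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data where the true relationship is y = x^2.
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(residuals)
```

The residuals are positive at both ends and negative in the middle. This U shape, rather than a random scatter around zero, suggests that the linear form is missing a curved term.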


The branch of statistics that involves using a sample to draw conclusions about a population is called ________ statistics.

descriptive
inferential
numerical
qualitative

Inferential statistics is the branch of statistics that involves using a sample to draw conclusions about a population. It takes data from a sample and makes inferences about the larger population from which the sample was drawn. For example, inferential statistics might use data from a sample of women to infer something about the mean weight of all women.
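A minimal sketch of one such inference is an approximate 95% confidence interval for a population mean, computed from a sample. The weights are made-up values, and the interval uses a normal approximation. For a sample this small, a t-based interval would be the more usual choice and would be somewhat wider.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample of weights in kilograms.
sample = [58.2, 61.5, 64.0, 66.3, 59.8, 70.1, 62.7, 65.4, 68.9, 60.6]

sample_mean = mean(sample)
sem = stdev(sample) / math.sqrt(len(sample))  # standard error of the mean

# Two-sided 95% interval under a normal approximation.
z = NormalDist().inv_cdf(0.975)
lower, upper = sample_mean - z * sem, sample_mean + z * sem

print(f"sample mean: {sample_mean:.2f}")
print(f"approximate 95% CI for the population mean: ({lower:.2f}, {upper:.2f})")
```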


What is the primary purpose of factor analysis in data science?

To categorize data
To classify data
To identify underlying variables (factors)
To predict future outcomes

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. Its primary purpose is to identify the underlying structure and relationships within a set of variables.


How does the 'elbow method' help in determining the optimal number of clusters in K-means clustering?

By calculating the average distance between all pairs of clusters
By comparing the silhouette scores for different numbers of clusters
By creating a dendrogram of clusters
By finding the point in the plot of within-cluster sum of squares where the decrease rate sharply shifts

The elbow method involves plotting the within-cluster sum of squares (WCSS) as a function of the number of clusters and picking the elbow of the curve as the number of clusters to use. This 'elbow' is the point beyond which adding another cluster no longer reduces the WCSS significantly, so it is taken as a reasonable estimate of the optimal number of clusters.
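The idea can be sketched with a tiny one-dimensional K-means. Everything here is an assumption made for illustration: the data, the deterministic initialization, and especially the 10% cutoff used to pick the elbow automatically. In practice the elbow is usually read off the plot by eye.

```python
def kmeans_wcss(points, k, iterations=50):
    """Run 1-D K-means (Lloyd's algorithm) and return the final WCSS."""
    pts = sorted(points)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [pts[(2 * i + 1) * len(pts) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in pts:
            nearest = min(range(k), key=lambda c: (p - centers[c]) ** 2)
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min((p - c) ** 2 for c in centers) for p in pts)


# Hypothetical data with three well-separated groups.
data = [1, 2, 3, 10, 11, 12, 20, 21, 22]
ks = [1, 2, 3, 4, 5]
wcss = [kmeans_wcss(data, k) for k in ks]
drops = [wcss[i] - wcss[i + 1] for i in range(len(wcss) - 1)]

# Illustrative heuristic: the elbow is the first k whose next drop is
# less than 10% of the drop that led to it.
elbow = next(ks[i + 1] for i in range(len(drops) - 1) if drops[i + 1] < 0.1 * drops[i])

for k, value in zip(ks, wcss):
    print(k, round(value, 2))
print("elbow at k =", elbow)
```

WCSS keeps falling as k grows, so the method looks for the point where the decrease levels off instead of looking for a minimum.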


The bin width (and thus number of categories or ranges) in a histogram can dramatically affect the ________, skewness, and appearance of the histogram.

Interpretation
Mean
Median
Mode

The bin width and the number of bins in a histogram can dramatically affect the interpretation, skewness, and overall appearance of the histogram. This is because the choice of bin size can influence the level of detail visible in the histogram, potentially either obscuring or highlighting certain patterns in the data.
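The effect is easy to reproduce by binning the same values with different widths. The data and the helper name `histogram` are assumptions for this sketch, which counts values per bin rather than drawing a plot.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count values per bin; each bin is keyed by its left edge."""
    return Counter((v // bin_width) * bin_width for v in values)


# Hypothetical right-skewed data.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 8, 8, 9, 12, 13, 18]

for width in (1, 5, 10):
    counts = histogram(data, width)
    print(f"bin width {width}:")
    for left_edge in sorted(counts):
        print(f"  [{left_edge}, {left_edge + width}): {'#' * counts[left_edge]}")
```

With a width of 1 the peak at 3 and the gaps in the tail are visible. With a width of 10 the same data collapses into two bars and most of that detail disappears.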
