The probability of an event A, given that another event B has occurred, is called the ________ probability of A given B.

Conditional
Independent
Joint
Marginal

The probability of an event A, given that another event B has occurred, is called the conditional probability of A given B. It is denoted as P(A | B) and, when P(B) > 0, equals P(A and B) / P(B).
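
As a minimal sketch of the definition, the following enumerates the outcomes of two fair dice and computes a conditional probability by counting. The dice setup, the two events, and the helper names `prob` and `conditional` are illustrative assumptions, not part of the question.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under equal likelihood."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def conditional(a, b):
    """P(A | B) = P(A and B) / P(B); only defined when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    return prob(lambda o: a(o) and b(o)) / p_b

def sum_is_8(o):        # event A (hypothetical)
    return o[0] + o[1] == 8

def first_is_even(o):   # event B (hypothetical)
    return o[0] % 2 == 0

p_a = prob(sum_is_8)
p_a_given_b = conditional(sum_is_8, first_is_even)
print(p_a, p_a_given_b)
```

Here knowing that the first die is even changes the probability of a sum of 8, so the two events are not independent.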

The sum of the squared loadings for a factor (i.e., down its column of the factor matrix), which represents the variance in all the variables accounted for by that factor, is known as the _______ in factor analysis.

communality
eigenvalue
factor variance
total variance

The sum of the squared loadings for a factor (i.e., down its column of the factor matrix) is that factor's eigenvalue. It represents the variance in all the variables that the factor accounts for. Communality, by contrast, is the sum of squared loadings across a variable's row.
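
The column-versus-row distinction can be sketched in a few lines. The loading matrix below is made up for illustration and assumes an unrotated solution with orthogonal factors.

```python
# Hypothetical loading matrix: rows are variables, columns are factors.
loadings = [
    [0.8, 0.1],
    [0.7, 0.2],
    [0.6, 0.3],
    [0.1, 0.9],
]
n_factors = len(loadings[0])

# Eigenvalue of a factor: sum of squared loadings down its column.
eigenvalues = [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]

# Communality of a variable: sum of squared loadings across its row.
communalities = [sum(value ** 2 for value in row) for row in loadings]

print(eigenvalues)
print(communalities)
```

Both lists add up to the same total, because each is just a different way of summing every squared loading in the matrix.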

When the residuals exhibit a pattern or trend rather than a random scatter, it is a sign of _________.

Autocorrelation
Model misspecification
Overfitting
Underfitting

When the residuals exhibit a pattern or trend rather than a random scatter, it can be a sign of model misspecification, i.e., the model doesn't properly capture the relationship between the predictors and the outcome variable. A typical case is fitting a straight line to a relationship that is actually curved.
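
A small sketch of what such a pattern looks like: a straight line is fitted by ordinary least squares to data that are actually quadratic. The data are synthetic and chosen only to make the effect obvious.

```python
# Synthetic data: the true relationship is y = x^2, but we fit a straight line.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares estimates for a simple linear model y = a + b*x.
slope = (
    sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    / sum((a - mean_x) ** 2 for a in x)
)
intercept = mean_y - slope * mean_x

# Residual = observed value minus fitted value.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
print(residuals)
```

The residuals are positive at both ends and negative in the middle, a U shape rather than a random scatter, which suggests the linear model is missing a curved term.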

The branch of statistics that involves using a sample to draw conclusions about a population is called ________ statistics.

descriptive
inferential
numerical
qualitative

Inferential statistics is the branch of statistics that involves using a sample to draw conclusions about a population. It takes data from a sample and makes inferences about the larger population from which the sample was drawn. For example, inferential statistics might use data from a sample of women to infer something about the mean weight of all women.

What is the primary purpose of factor analysis in data science?

To categorize data
To classify data
To identify underlying variables (factors)
To predict future outcomes

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. Its primary purpose is to identify the underlying structure and relationships within a set of variables.

What is the impact of data transformation on the decision to use non-parametric tests?

A suitable data transformation may make it possible to use a parametric test
Data transformation always leads to non-parametric tests
Data transformation always makes data normally distributed
Data transformation does not affect the choice between parametric and non-parametric tests

A suitable data transformation may make it possible to use a parametric test instead of a non-parametric test. Transformations can help to stabilize variances, normalize the data, or linearize relationships between variables, allowing for the use of parametric tests that might have more statistical power.
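
As a rough illustration of "normalizing" by transformation, the sketch below compares the skewness of a right-skewed sample before and after a log transform. The sample values and the `skewness` helper (a simple moment-based estimate) are assumptions made for the example.

```python
from math import log10

def skewness(values):
    """Moment-based skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample spanning several orders of magnitude.
raw = [1, 10, 100, 1000, 10000]
logged = [log10(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(skew_raw, skew_logged)
```

The raw values are strongly right-skewed, while the log-transformed values are symmetric. Whether a parametric test is then appropriate still depends on checking its other assumptions.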

If two events are independent, what is the conditional probability of one given the other?

0
1
Equal to the unconditional probability of that event
Undefined

If two events A and B are independent, the conditional probability of one event given the other is simply the probability of the event itself: P(A | B) = P(A), provided P(B) > 0. This is because, for independent events, the occurrence of one does not change the probability of the other.
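
The identity can be checked by counting on a small sample space. The sketch assumes two fair dice and two hypothetical events, one defined on each die so that they are independent.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) by counting."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_is_even(o):    # event A depends only on the first die
    return o[0] % 2 == 0

def second_is_six(o):    # event B depends only on the second die
    return o[1] == 6

p_a = prob(first_is_even)
p_b = prob(second_is_six)
p_a_and_b = prob(lambda o: first_is_even(o) and second_is_six(o))

# Conditional probability from the definition P(A | B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b
print(p_a, p_a_given_b)
```

Because P(A and B) equals P(A) times P(B) here, dividing by P(B) returns P(A) unchanged.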

In what situation could a "Type II" error occur during hypothesis testing?

When the alternative hypothesis is false
When the null hypothesis is false but not rejected
When the null hypothesis is rejected
When the null hypothesis is true

A Type II error, also known as a false negative, occurs when the null hypothesis is false, but we fail to reject it.
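
To make the idea concrete, the sketch below computes the Type II error probability (often called beta) for a one-sided z-test. Every number in the setup (the means, standard deviation, sample size, and significance level) is an illustrative assumption.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical one-sided z-test: H0 says mean = 0, H1 says mean > 0.
mu_null = 0.0    # mean under the null hypothesis
mu_true = 0.5    # actual mean, so the null hypothesis is false
sigma = 1.0      # population standard deviation, assumed known
n = 25           # sample size
alpha = 0.05     # significance level (Type I error rate)

std_error = sigma / sqrt(n)

# H0 is rejected when the sample mean exceeds this critical value.
critical_mean = mu_null + NormalDist().inv_cdf(1 - alpha) * std_error

# Type II error: the sample mean lands below the critical value even though
# the true mean is mu_true, so a false H0 is not rejected.
beta = NormalDist(mu_true, std_error).cdf(critical_mean)
power = 1 - beta

print(f"beta = {beta:.3f}, power = {power:.3f}")
```

Under these assumptions beta comes out a little under 0.2. A larger sample or a larger true effect would lower it.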

Under what circumstances can the mode of a data set be irrelevant or misleading?

When the data is continuous
When the data set is large
When the data set is small
When there are multiple modes

The mode can be misleading or irrelevant, especially with continuous data. The mode is the most frequently occurring value, but with continuous data each value often occurs exactly once, so every value ties for the highest frequency and it becomes difficult to define a mode in the traditional sense.
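
A short sketch of the problem, using made-up height measurements: the raw values have no repeated value, while grouping them into bins gives a usable modal class. The 5 cm bin width is an arbitrary assumption.

```python
from collections import Counter
from statistics import multimode

# Hypothetical continuous measurements (heights in cm); every value is unique.
heights = [170.2, 171.8, 173.5, 165.4, 180.1, 176.3]

# With all frequencies equal to 1, every value is returned as a "mode".
raw_modes = multimode(heights)

# Grouping into 5 cm bins (labelled by their lower edge) gives a modal class.
bin_width = 5
bins = [int(h // bin_width) * bin_width for h in heights]
modal_bin, modal_count = Counter(bins).most_common(1)[0]

print(raw_modes)
print(modal_bin, modal_count)
```

The modal class depends on the chosen bin width, which is one reason the mean or median is usually preferred for summarizing continuous data.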

A Variance Inflation Factor (VIF) greater than 5 indicates a high degree of _______ among the predictors.

correlation
distribution
multicollinearity
variance

A VIF greater than 5 is often taken as an indication of high multicollinearity among the predictors in a regression model. This could lead to imprecise and unreliable estimates of the regression coefficients.
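
For predictor j, VIF is 1 / (1 - R²), where R² comes from regressing predictor j on the other predictors. The sketch below covers only the simplest case of two predictors, where that R² is the squared Pearson correlation between them. The data and helper names are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x, y):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r_squared = pearson_r(x, y) ** 2
    return 1.0 / (1.0 - r_squared)

# Hypothetical predictors: x2 nearly duplicates x1, x3 is almost unrelated.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
x3 = [2, 1, 1, 2, 2, 1]

vif_high = vif_two_predictors(x1, x2)
vif_low = vif_two_predictors(x1, x3)
print(vif_high, vif_low)
```

The near-duplicate pair gives a VIF far above 5, while the weakly related pair stays close to 1. With more than two predictors, each R² would have to come from a full regression on all the remaining predictors.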
