What is the difference between nominal and ordinal data?
- Nominal data can be ordered
- Nominal data cannot be ordered
- Ordinal data can be ordered
- Ordinal data cannot be ordered
Nominal and ordinal data are both types of categorical data. The key difference between the two is that while nominal data cannot be ordered or ranked, ordinal data can. Nominal data represents simple categories or groups with no order or priority. Examples include colors or city names. Ordinal data, on the other hand, represents categories that can be ranked or ordered. Examples include Likert scale data (e.g., a five-point scale from "strongly disagree" through "strongly agree"), educational level (high school, BA, MA, PhD), etc.
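As a minimal sketch (with hypothetical category labels), the practical difference shows up when you try to sort: ordinal data has a meaningful rank to sort by, nominal data does not.

```python
# Nominal categories: no inherent order, so sorting them is meaningless
nominal = {"red", "green", "blue"}

# Ordinal categories: a five-point Likert scale with an assumed rank coding
likert = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
rank = {level: i for i, level in enumerate(likert)}

responses = ["agree", "neutral", "strongly agree"]
# Ordinal data can be ordered by its rank
ordered = sorted(responses, key=rank.get)
print(ordered)  # ['neutral', 'agree', 'strongly agree']
```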
What is the purpose of a residual plot in multiple linear regression?
- All of the above
- To check for independence of errors
- To check for linearity
- To check for normality
A residual plot in multiple linear regression is used to check several assumptions of the model. If the residuals scatter randomly around zero with no pattern, this supports the linearity and independence-of-errors assumptions; a roughly constant spread of residuals across fitted values supports constant variance (homoscedasticity); and a histogram or Q-Q plot of the residuals can be used to assess normality.
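A rough sketch of where the residuals in such a plot come from, using simulated data and an ordinary least-squares fit via NumPy (in practice you would scatter-plot `fitted` against `residuals`, e.g. with matplotlib):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Simulated predictors with an intercept column, plus a known linear signal
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(0, 1, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# With an intercept in the model, OLS residuals average to zero;
# a patternless scatter of residuals vs. fitted values supports the assumptions.
print(round(residuals.mean(), 6))
```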
Is the Kruskal-Wallis Test used for comparing two groups or more than two groups?
- Both
- More than two groups
- Neither
- Two groups
The Kruskal-Wallis Test is used for comparing more than two independent groups. It is the non-parametric counterpart of one-way ANOVA, based on ranks rather than raw values, and tests whether the samples come from the same distribution.
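A short example with `scipy.stats.kruskal` on three hypothetical groups of measurements:

```python
from scipy import stats

# Three independent groups (hypothetical data, clearly separated)
g1 = [12, 15, 14, 10, 13]
g2 = [22, 25, 27, 24, 26]
g3 = [18, 17, 19, 16, 20]

h, p = stats.kruskal(g1, g2, g3)
print(f"H = {h:.3f}, p = {p:.4f}")  # small p: groups likely differ
```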
When should you use the Spearman’s Rank Correlation test?
- When data is normally distributed
- When data is ordinal or not normally distributed
- When data is perfectly ranked
- When the correlation is linear
The Spearman’s Rank Correlation test should be used when data is ordinal or not normally distributed. It is a non-parametric test that does not require the assumption of normal distribution.
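To illustrate, `scipy.stats.spearmanr` on a hypothetical monotonic but non-linear relationship: Spearman works on ranks, so it captures the trend even though the relationship is not linear.

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]  # y = x**2: non-linear, but perfectly monotonic

rho, p = stats.spearmanr(x, y)
print(rho)  # ~1.0: ranks of x and y agree exactly
```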
How does the sample size impact the result of a Z-test?
- Larger sample sizes can produce more precise estimates, reducing the standard error
- Larger sample sizes increase the likelihood of a Type I error
- Sample size has no impact on the results of a Z-test
Larger sample sizes generally allow for more precise estimates of population parameters. This reduces the standard error, making the z-score larger and potentially leading to stronger evidence against the null hypothesis in a Z-test.
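The effect of sample size can be seen directly in the formulas: the standard error is sigma / sqrt(n), and the z-statistic is the observed effect divided by that standard error. A small numeric sketch (assumed sigma and effect size):

```python
import math

sigma = 10.0   # assumed population standard deviation
effect = 2.0   # observed difference from the hypothesized mean

for n in (25, 100, 400):
    se = sigma / math.sqrt(n)  # standard error shrinks as n grows
    z = effect / se            # so the z-statistic grows
    print(f"n={n:4d}  SE={se:.2f}  z={z:.2f}")
```

Quadrupling the sample size halves the standard error and doubles the z-statistic for the same observed effect.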
What is the alternative hypothesis in the context of statistical testing?
- A condition of no effect or no difference
- A specific outcome of the experiment
- An effect or difference exists
- The sample size is large enough for the test
The alternative hypothesis is the hypothesis in statistical testing that is contrary to the null hypothesis: it states that an effect or difference exists. It is usually taken to mean that the observations are the result of a real effect rather than chance.
How do bias and variability affect sampling methods?
- Bias and variability always increase the accuracy of estimates
- Bias and variability are unrelated concepts in statistics
- Bias increases the spread of a data distribution, and variability leads to consistent errors
- Bias leads to consistent errors in one direction, and variability refers to the spread of a data distribution
Bias and variability are two key concepts in sampling methods. Bias refers to consistent, systematic errors that lead to an overestimate or underestimate of the true population parameter. Variability refers to the spread or dispersion of a data distribution, or in this context, the sampling distribution. Lower bias and lower variability are generally desirable to increase the accuracy and precision of estimates.
What is the Central Limit Theorem (CLT)?
- It states that the probability of an event is the product of the probabilities of independent events.
- It states that the sum of a large number of random variables, each with finite mean and variance, will approximate a normal distribution.
- It's a rule which states that the probability of a compound event is the product of the probabilities of the independent events.
- It's the theorem which states that probabilities are equal to the number of favorable outcomes divided by the total outcomes.
The Central Limit Theorem (CLT) states that, given a sufficiently large sample size drawn from a population with finite variance, the distribution of the sample means will be approximately normal, regardless of the shape of the population's distribution. The mean of this sampling distribution equals the population mean, and its standard deviation (the standard error) shrinks as the sample size grows.
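A quick simulation shows the CLT in action: even for a strongly skewed exponential population (mean 1, standard deviation 1), the means of samples of size 100 cluster near 1 with spread close to sigma / sqrt(n) = 0.1.

```python
import numpy as np

rng = np.random.default_rng(42)
# Heavily skewed population: exponential, far from normal
sample_means = rng.exponential(scale=1.0, size=(10_000, 100)).mean(axis=1)

# CLT: the sampling distribution of the mean is approximately normal,
# centered at the population mean with standard deviation ~ sigma/sqrt(n)
print(round(sample_means.mean(), 2), round(sample_means.std(), 2))
```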
How does the type of data affect the choice of statistical analysis methods?
- It dictates the statistical tests that can be applied
- It doesn't affect the choice
- It has no influence
- It suggests the kind of visualizations that can be used
The type of data directly affects the choice of statistical analysis methods. Certain types of data require specific statistical tests. For example, nominal data may be analyzed using a chi-square test, while continuous data may be analyzed using a t-test or ANOVA.
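For instance, counts of nominal categories can be tested with a chi-square test of independence; a sketch with a hypothetical 2x2 contingency table:

```python
from scipy import stats

# Nominal data: observed counts in a 2x2 contingency table (hypothetical)
observed = [[30, 10],
            [20, 40]]

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
```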
How do you decide on the number of Principal Components to retain during PCA?
- All of the above
- By calculating the cumulative explained variance
- By checking the eigenvalues
- By using the elbow method
The number of principal components to retain can be decided in several ways: checking the eigenvalues (typically, components with eigenvalues greater than 1 are retained), using the elbow method (looking for a clear "elbow" in the scree plot), or calculating the cumulative explained variance (often, enough components to explain at least 95% of the variance are retained).
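The cumulative-explained-variance rule can be sketched with plain NumPy: the eigenvalues of the covariance matrix are the variances along the principal components, so you keep the smallest number of components whose eigenvalues sum to at least 95% of the total (simulated data with three dominant directions; the 95% cutoff is an assumed convention).

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 3 high-variance directions + 3 near-noise features
X = rng.normal(size=(500, 6)) * np.array([5.0, 3.0, 2.0, 0.3, 0.2, 0.1])
X -= X.mean(axis=0)

# Eigenvalues of the covariance matrix = variance along each component
eigvals = np.sort(np.linalg.eigvalsh(np.cov(X.T)))[::-1]
cum_ratio = np.cumsum(eigvals) / eigvals.sum()

# Retain the smallest k whose cumulative explained variance reaches 95%
k = int(np.searchsorted(cum_ratio, 0.95) + 1)
print(k, np.round(cum_ratio, 3))
```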