What information does a box plot provide about a dataset?

The correlation between variables
The exact values of all data points
The mean and standard deviation
The minimum, first quartile, median, third quartile, and maximum

A box plot (also known as a box-and-whisker plot) summarizes the distribution of a dataset through its five-number summary: the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. The box spans the interquartile range (IQR, from Q1 to Q3), with a line at the median. The whiskers extend to the minimum and maximum or, in the common Tukey convention, to the most extreme points within 1.5 × IQR of the box, with anything beyond that plotted individually as an outlier.
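
The five numbers a box plot draws can be computed directly. The following is a minimal sketch using only Python's standard library; the sample values and the helper name `five_number_summary` are made up for illustration, and the "inclusive" quartile method is an assumption (other quartile conventions give slightly different Q1 and Q3).

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    # method="inclusive" interpolates between data points, treating the data
    # as the full population; other conventions may differ slightly.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, q2, q3, data[-1]

# Hypothetical sample, chosen only for illustration.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
minimum, q1, median, q3, maximum = five_number_summary(sample)
iqr = q3 - q1  # the height of the box

print(minimum, q1, median, q3, maximum)  # 1 3.0 5.0 7.0 9
print(iqr)                               # 4.0
```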


Why is sampling without replacement often used in practice?

It allows for the inclusion of every individual in the population
It ensures that each selection is independent
It guarantees that no individual is selected more than once
It is easier than sampling with replacement

Sampling without replacement is often used in practice because it guarantees that no individual appears in the sample more than once: once an individual is selected, it cannot be chosen again for the same sample. The draws are therefore not independent, but every draw adds new information, so estimates such as the sample mean are at least as precise as they would be under sampling with replacement.
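
The difference between the two schemes can be sketched with Python's standard `random` module. The population below is hypothetical, and the fixed seed is only there to make the sketch repeatable.

```python
import random

rng = random.Random(0)  # fixed seed, used only so the sketch is repeatable
population = list(range(10))  # hypothetical population of 10 labelled individuals

# Without replacement: each individual can appear at most once.
without_replacement = rng.sample(population, k=5)

# With replacement: the same individual may be drawn more than once.
with_replacement = rng.choices(population, k=5)

print(without_replacement)
print(with_replacement)
```

`random.sample` raises a `ValueError` if `k` exceeds the population size, which reflects the fact that a sample without replacement cannot be larger than the population.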


What is the name of the rule that states that the probabilities of all possible outcomes of an experiment sum to 1?

Bayes' Theorem
Law of Large Numbers
Law of Total Probability
Rule of Complementary Events

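The intended answer is the Law of Total Probability. Strictly speaking, the statement that the probabilities of all possible outcomes sum to 1 is the normalization axiom of probability, P(Ω) = 1. The Law of Total Probability builds on it: if the events B1, ..., Bn are mutually exclusive and together cover the whole sample space, then P(A) = P(A | B1) P(B1) + ... + P(A | Bn) P(Bn). This provides a way to calculate the probability of a complex event by breaking it down over simpler, mutually exclusive cases.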
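
Breaking an event down over mutually exclusive cases can be sketched as follows. The scenario (two machines producing every item in a factory) and all of the numbers are hypothetical, chosen only for illustration.

```python
# Hypothetical setup: machines A and B together produce every item.
p_machine = {"A": 0.6, "B": 0.4}          # P(B_i): a partition, so it sums to 1
p_defect_given = {"A": 0.01, "B": 0.05}   # P(defect | B_i)

# The cases must cover the sample space: their probabilities sum to 1.
assert abs(sum(p_machine.values()) - 1.0) < 1e-12

# P(defect) = sum over i of P(defect | B_i) * P(B_i)
p_defect = sum(p_defect_given[m] * p_machine[m] for m in p_machine)
print(round(p_defect, 4))  # 0.026
```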


What are the two types of factor analysis used in data science?

Confirmatory and explanatory
Exploratory and confirmatory
Inferential and descriptive
Predictive and explanatory

Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) are the two types of factor analysis commonly used in data science. EFA is used when the structure of the underlying factors is not known, while CFA is used when the researcher has specific hypotheses about the factor structure.


The __________ Theorem states that with a large enough sample size, the sampling distribution of the mean will be approximately normally distributed.

Central Limit
Law of Large Numbers
Regression
Variance

The Central Limit Theorem is a fundamental result in probability theory and statistics. It states that, as the sample size increases, the sampling distribution of the mean approaches a normal distribution. This holds regardless of the shape of the population distribution, provided the observations are independent and the population has finite variance.
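
A small simulation can make this visible. The sketch below draws repeated samples from a strongly right-skewed exponential population and measures how asymmetric the distribution of sample means is. The helper names, the choice of population, the sample sizes, and the seed are all assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed, used only so the sketch is repeatable

def skewness(values):
    """Mean cubed z-score; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - mean) / sd) ** 3 for v in values)

def sample_means(sample_size, num_samples=2000):
    """Means of repeated samples from a right-skewed exponential population."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(num_samples)
    ]

skew_small = skewness(sample_means(sample_size=2))
skew_large = skewness(sample_means(sample_size=50))

print(f"n=2:  skewness of sample means = {skew_small:.2f}")
print(f"n=50: skewness of sample means = {skew_large:.2f}")
```

With the larger sample size, the skewness of the sample means should be much closer to zero, which is what a more nearly normal sampling distribution looks like.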


A one-way ANOVA compares ________ group(s), while a two-way ANOVA compares ________ group(s).

one; two
three or more; two or more
two; three
two; two or more

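A one-way ANOVA compares the means of three or more independent groups defined by a single factor (with only two groups it is equivalent to a two-sample t-test). A two-way ANOVA compares the means of groups defined by two factors (independent variables), each with two or more levels.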
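
To show what comparing group means involves, here is a minimal sketch of the one-way ANOVA F statistic computed by hand with the standard library. The function name `one_way_anova_f` and the three small groups are hypothetical, and the sketch stops at the F statistic rather than computing a p-value.

```python
import statistics

def one_way_anova_f(groups):
    """F statistic: between-group variability over within-group variability."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (x - statistics.fmean(g)) ** 2 for g in groups for x in g
    )

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical data: three groups defined by a single factor.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(round(f_stat, 2))  # 13.0
```

A large F value indicates that the group means differ by more than the within-group scatter would suggest.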


What are the potential disadvantages of using non-parametric statistical methods?

They always give inaccurate results
They can be less powerful than parametric tests when assumptions for parametric tests are met
They cannot be used for certain types of data
They cannot handle large data sets

Non-parametric statistical methods can be less powerful than parametric tests when the assumptions for the parametric tests are met. This is because many of them use less of the information in the data (for example, ranks rather than actual values), so they may need a larger sample to detect the same effect. Therefore, if the data do meet the assumptions of a parametric test, the parametric test is usually preferred.
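
The loss of information from ranking can be seen in a tiny sketch. The `ranks` helper below is hypothetical and assumes there are no tied values; the two datasets are invented for illustration.

```python
def ranks(values):
    """Rank each value from 1 (smallest); assumes no ties for simplicity."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

# Two hypothetical datasets with very different spacing between values.
a = [1.0, 1.1, 100.0]
b = [1.0, 50.0, 51.0]

print(ranks(a))  # [1, 2, 3]
print(ranks(b))  # [1, 2, 3]
```

A rank-based test sees these two datasets as identical, even though their actual values differ greatly.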


Hypothesis testing in statistics is a way to test the validity of a claim that is made about a _______.

Dataset
Population
Sample
Statistic

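In statistics, hypothesis testing is used to test claims about a population, typically about a population parameter such as a mean or a proportion. The sample, and the statistics computed from it, serve as the evidence for or against the claim; they are not what the claim is about.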
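
As an illustration of testing a claim about a population parameter, the sketch below runs a two-sided one-sample z-test with the standard library. Every number is hypothetical, and the test assumes the population standard deviation is known, which is rarely true in practice.

```python
import math
from statistics import NormalDist

# Hypothetical numbers, chosen only for illustration.
mu_0 = 100.0         # population mean claimed under the null hypothesis
sigma = 15.0         # population standard deviation, assumed known
n = 36               # sample size
sample_mean = 105.0  # statistic computed from the sample (the evidence)

# How many standard errors the sample mean lies from the claimed mean.
z = (sample_mean - mu_0) / (sigma / math.sqrt(n))

# Two-sided p-value under the standard normal distribution.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p = {p_value:.4f}")  # z = 2.00, p = 0.0455
```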


The _______ Information Criterion is a measure used in model selection that takes into account the goodness of fit and the simplicity of the model.

Akaike
Bayesian
Pearson
Spearman

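The Akaike Information Criterion (AIC) balances goodness of fit against model simplicity by including a penalty for the number of parameters in the model: AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. Lower values are better, which discourages overfitting. The Bayesian Information Criterion (BIC) makes a similar trade-off with a penalty that also grows with the sample size; AIC is the intended answer here.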
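
The trade-off can be sketched with the standard formula AIC = 2k - 2 ln(L). The helper name `aic` and the log-likelihood values below are hypothetical; in practice they would come from fitted models.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * num_params - 2 * log_likelihood

# Hypothetical fits: the complex model fits slightly better (higher
# log-likelihood) but uses many more parameters.
aic_simple = aic(log_likelihood=-120.0, num_params=3)
aic_complex = aic(log_likelihood=-119.0, num_params=8)

print(aic_simple)   # 246.0
print(aic_complex)  # 254.0
```

Here the simpler model has the lower AIC, because the small gain in fit does not offset the penalty for five extra parameters.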


What is the skewness value for a perfect normal distribution?

-1
0
1
It varies

For a perfect normal distribution, the skewness value is zero. This is because a normal distribution is perfectly symmetric about its mean, so its left and right tails are mirror images of each other.
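
The symmetry argument can be written out using the usual moment definition of skewness. The symbols below (mean μ, standard deviation σ, density f) are notation introduced here for illustration.

```latex
\gamma_1
  = \mathbb{E}\!\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right]
  = \frac{1}{\sigma^{3}} \int_{-\infty}^{\infty} (x-\mu)^{3}\, f(x)\, dx
```

For a normal distribution, f is symmetric about μ, so the integrand (x - μ)³ f(x) is an odd function around μ. The contributions from the left and right sides cancel exactly, giving γ₁ = 0.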

