The normal distribution is also known as the ________ distribution.

Exponential
Gaussian
Poisson
Uniform

The normal distribution is also known as the Gaussian distribution. It is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is bell-shaped.
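
The bell shape can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the choice of a standard normal (mean 0, standard deviation 1) and the evaluation points are assumptions made purely for illustration.

```python
import math
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1 (illustrative choice).
std_normal = NormalDist(mu=0.0, sigma=1.0)

peak = std_normal.pdf(0.0)     # density is highest at the mean
left = std_normal.pdf(-1.5)    # density 1.5 standard deviations below the mean
right = std_normal.pdf(1.5)    # density 1.5 standard deviations above the mean

# Probability of landing within one standard deviation of the mean.
within_one_sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(peak, left, right, within_one_sd)
```

The density is symmetric around the mean and falls away on both sides, which is what gives the curve its bell shape.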

How does the presence of outliers affect measures of dispersion like range, variance, and standard deviation?

Decreases them
Depends on the values of the outliers
Increases them
No effect

Outliers generally make measures of dispersion such as the range, variance, and standard deviation larger. The range depends only on the minimum and maximum, so a single extreme value stretches it directly. The variance and standard deviation are built from squared deviations from the mean, so an outlier (a value significantly higher or lower than the other values) contributes disproportionately and can produce a much larger measure of dispersion.
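
A small sketch of this effect, using only the standard library. The data values and the helper name `dispersion` are made up for illustration.

```python
import statistics

def dispersion(values):
    """Return the range, sample variance, and sample standard deviation."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }

clean = [10, 12, 11, 13, 12, 11, 10, 13]   # hypothetical data without outliers
with_outlier = clean + [60]                # same data plus one extreme value

before = dispersion(clean)
after = dispersion(with_outlier)

for name in before:
    print(name, before[name], "->", after[name])
```

All three measures grow once the single extreme value is added.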

What is the implication of multicollinearity in polynomial regression?

It increases the fit of the model to the training data
It increases the interpretability of the model
It reduces the complexity of the model
It reduces the precision of coefficient estimates

In polynomial regression, the terms x, x^2, x^3, and so on are often strongly correlated with one another. This multicollinearity reduces the precision of the coefficient estimates and makes them highly sensitive to minor changes in the model. The resulting estimates can be unstable and unreliable, which makes it difficult to interpret the model and draw inferences about the relationships between variables.
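
To see where the multicollinearity comes from, the sketch below measures the correlation between a predictor and its square. The predictor values and the helper `pearson` are illustrative assumptions. Centering the predictor before squaring is shown as one commonly used mitigation; it is not something the quiz itself prescribes.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

x = [float(v) for v in range(1, 21)]   # hypothetical predictor values 1..20
x_sq = [v ** 2 for v in x]
raw_corr = pearson(x, x_sq)            # x and x^2 are strongly correlated

# Centering x before squaring removes most of that correlation here.
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]
xc_sq = [v ** 2 for v in x_centered]
centered_corr = pearson(x_centered, xc_sq)

print(raw_corr, centered_corr)
```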

The Mann-Whitney U test is used when data is ________, which means it can't be reasonably fit to a normal distribution.

non-parametric
normally distributed
parametric
skewed

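The Mann-Whitney U test is a non-parametric test: it compares two independent samples using the ranks of the observations rather than assuming the data follow a normal distribution. It can therefore be used when the data can't reasonably be fit to a normal distribution.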
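
The U statistic itself only needs pairwise comparisons, which is why no distributional assumption is required. The sketch below computes it directly from its definition; the function name `mann_whitney_u` and the sample values are hypothetical, and no p-value is computed.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): U_a counts pairs where a > b, with ties counting 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

# Hypothetical skewed samples; each contains one very large value.
group_a = [1.2, 3.4, 2.2, 15.0, 2.9]
group_b = [4.1, 5.6, 6.3, 4.8, 40.0]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)
```

Because only the ordering of values matters, replacing 40.0 with 4000.0 would leave both statistics unchanged.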

What is the primary objective of statistics in data science?

Data storage
Data visualization
To make decisions based on data analysis
Web design

The primary goal of statistics in data science is to provide a foundation for decision making based on data analysis. It is a discipline that provides tools and methods to interpret and understand data, answer specific questions, and visualize data in a meaningful way. This field of study is crucial in areas where making decisions is essential, such as business strategy, scientific research, and policy making.

Does the Central Limit Theorem apply to all distributions?

No, it only applies to normal distributions.
No, it only applies to uniform distributions.
Yes, but only when the sample size is sufficiently large and the distribution has finite variance.
Yes, regardless of the sample size.

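The Central Limit Theorem (CLT) applies to the sampling distribution of the mean for a wide range of underlying distributions, provided the observations are independent, the sample size is sufficiently large, and the underlying distribution has finite variance. Under those conditions the sampling distribution of the mean is approximately normal, whatever the shape of the underlying distribution.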
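
A quick simulation can illustrate this. The sketch below draws repeated samples from an exponential distribution (strongly skewed, but with finite variance) and looks at the distribution of the sample means. The sample size, number of repetitions, and seed are arbitrary illustrative choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution with mean 1 and variance 1."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))

n = 50                                            # size of each sample
means = [sample_mean(n) for _ in range(2000)]     # 2000 repeated samples

center = statistics.fmean(means)   # should sit near the population mean, 1.0
spread = statistics.stdev(means)   # should sit near 1 / sqrt(n)

print(center, spread, 1 / n ** 0.5)
```

The sample means cluster around the population mean with a spread close to sigma / sqrt(n), as the theorem predicts, even though the individual draws are far from normal.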

What is a 'dendrogram' in hierarchical clustering?

A diagram showing the change in the number of clusters
A graph showing the distribution of clusters
A tree-like diagram that represents the hierarchy of clusters
The center point of a cluster

A dendrogram is a tree-like diagram that is used in hierarchical clustering to represent the hierarchy of clusters. Each join in the dendrogram represents two clusters merging, and the height of the join is the distance between those clusters at the point where they merge.
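
The merges and heights that a dendrogram draws can be produced with a few lines of code. This is a minimal sketch of agglomerative clustering on one-dimensional points, assuming single linkage (distance between the closest members of two clusters); the function name and the data are hypothetical, and real libraries offer other linkage rules.

```python
def single_linkage_merges(points):
    """Repeatedly merge the two closest clusters; return (left, right, height) per join."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                dist = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

merges = single_linkage_merges([1.0, 2.0, 6.0, 7.0, 15.0])
heights = [height for _, _, height in merges]

for left, right, height in merges:
    print(left, "+", right, "joined at height", height)
```

Each printed line corresponds to one join in the dendrogram, and the heights never decrease as the tree is built upward.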

When a data distribution is skewed, which measure of central tendency is typically the most reliable?

Mean
Median
Mode

The median is usually the most reliable measure of central tendency when a data distribution is skewed. Unlike the mean, the median is largely unaffected by extreme values, because it depends only on the middle of the ordered data. Therefore, in a skewed distribution, the median generally gives a better idea of the typical value than the mean.
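
A short illustration with made-up, right-skewed data (the values are hypothetical incomes in thousands):

```python
import statistics

# Seven similar values and one very large one.
incomes = [28, 30, 31, 33, 35, 36, 40, 250]

mean_income = statistics.mean(incomes)      # pulled upward by the large value
median_income = statistics.median(incomes)  # middle of the ordered data

print(mean_income, median_income)
```

The mean ends up above every value except the extreme one, while the median stays in the middle of the bulk of the data.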

Polynomial regression allows us to model a relationship between the dependent variable and independent variables as a _________.

High
Linear equation
Non-linear equation
Straight line

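Polynomial regression allows us to model the relationship between the dependent variable and independent variables as a non-linear equation. This is achieved by adding powers of the independent variables (x^2, x^3, and so on) as extra terms, allowing the model to fit curved patterns that a straight line cannot. The model remains linear in its coefficients, so it can still be fitted with ordinary least squares.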
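
As a rough sketch of the idea, the code below fits a degree-2 polynomial by expanding each x into the features (1, x, x^2) and solving the least-squares normal equations. The function name `fit_quadratic`, the data, and the use of noiseless values are assumptions for illustration; in practice a numerical library would normally be used instead.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]   # polynomial feature expansion
    # Augmented 3x4 system (X^T X | X^T y).
    system = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for r in range(3):
            if r != col:
                factor = system[r][col] / system[col][col]
                system[r] = [a - factor * b for a, b in zip(system[r], system[col])]
    return [system[i][3] / system[i][i] for i in range(3)]

xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [2.0 - 1.0 * x + 0.5 * x * x for x in xs]   # noiseless quadratic, illustration only

b0, b1, b2 = fit_quadratic(xs, ys)
print(b0, b1, b2)
```

The fitted curve is non-linear in x, but the unknowns b0, b1, and b2 enter linearly, which is why a linear solver is enough.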

How does the sample size affect the power of the Kruskal-Wallis Test?

It depends on the data
Larger sample sizes decrease power
Larger sample sizes increase power
Sample size has no effect on power

Larger sample sizes increase the power of the Kruskal-Wallis Test. Power is the probability that a test detects a true effect when there is one, that is, that it correctly rejects a false null hypothesis.
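
The relationship between sample size and power can be explored by simulation. The sketch below is one way to do it under stated assumptions: three normally distributed groups whose means differ by 0.5, the H statistic without tie correction, and the chi-square approximation with 2 degrees of freedom at the 5% level (critical value 5.991). The function names, group shifts, trial count, and seed are all illustrative choices.

```python
import random

CHI2_CRIT_DF2 = 5.991  # upper 5% point of the chi-square distribution, 2 degrees of freedom

def kruskal_h(groups):
    """Kruskal-Wallis H statistic without tie correction (adequate for continuous data)."""
    pooled = sorted((value, g) for g, grp in enumerate(groups) for value in grp)
    rank_sums = [0.0] * len(groups)
    for rank, (_, g) in enumerate(pooled, start=1):
        rank_sums[g] += rank
    n = len(pooled)
    total = sum(rs * rs / len(grp) for rs, grp in zip(rank_sums, groups))
    return 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)

def estimated_power(n_per_group, shifts=(0.0, 0.5, 1.0), trials=400, seed=1):
    """Fraction of simulated experiments in which the test rejects the null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        groups = [[rng.gauss(shift, 1.0) for _ in range(n_per_group)] for shift in shifts]
        if kruskal_h(groups) > CHI2_CRIT_DF2:
            rejections += 1
    return rejections / trials

power_small = estimated_power(5)    # 5 observations per group
power_large = estimated_power(30)   # 30 observations per group

print(power_small, power_large)
```

With the same true difference between groups, the larger samples lead to rejection of the null hypothesis far more often.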
