How do outliers affect the skewness of a dataset?

Depends on the direction of the outliers
They decrease skewness
They do not affect skewness
They increase skewness

Outliers can have a big impact on the skewness of a dataset. If the outlier is greater than the rest of the data, it will pull the skewness positive, and if it is less than the rest of the data, it will pull the skewness negative.

Discuss it

How does sample size impact the Mann-Whitney U test?

Larger sample sizes make the test less reliable
Larger sample sizes make the test more reliable
Only equal sample sizes can be used in the test
Sample size has no impact on the test

Larger sample sizes make the Mann-Whitney U test more reliable. As with most statistical tests, a larger sample size increases the power of the test, which is the probability that it will correctly reject a false null hypothesis.

Discuss it

In which situations is it appropriate to use the Wilcoxon Signed Rank Test?

When comparing the means of two independent groups
When comparing the medians of two related groups
When comparing the modes of two related groups
nan

The Wilcoxon Signed Rank Test is appropriate when comparing the medians of two related groups.

Discuss it

A ________ is a graphical representation of the distribution of a dataset, typically used to visualize the frequency of data items in successive numerical intervals.

Bar plot
Histogram
Line graph
Pie chart

A histogram is a graphical representation of the distribution of a dataset, typically used to visualize the frequency of data items in successive numerical intervals. The data range is divided into a series of intervals or 'bins' and the number of data points falling within each bin is represented by the height of a bar.

Discuss it

When a distribution has a long tail on the right, it is said to be ________ skewed.

Negatively
Normally
Positively
Uniformly

When a distribution has a long tail on the right, it is said to be positively skewed or right-skewed. In a positively skewed distribution, the mean is typically greater than the median, which is greater than the mode.

Discuss it

A random variable that takes a finite or countably infinite number of values is known as a ________ random variable.

Continuous
Dependent
Discrete
Normal

A discrete random variable is one which may take on only a countable number of distinct values and thus can be quantified. For example, you can count the change in your pocket. You can count the money in your bank account. You can count the number of heads in 50 coin tosses. These are all examples of discrete random variables.

Discuss it

A situation where two or more independent variables in a regression model are highly correlated is known as ________.

autocorrelation
heteroscedasticity
homoscedasticity
multicollinearity

Multicollinearity refers to a situation in which two or more independent variables in a regression model are highly linearly related. This can lead to unstable estimates of the regression coefficients and make it difficult to assess the effect of independent variables on the dependent variable.

Discuss it

How does multiple linear regression differ from simple linear regression?

Multiple linear regression cannot handle categorical variables, simple linear regression can
Multiple linear regression is not suitable for prediction tasks
Multiple linear regression requires a larger dataset
Multiple linear regression uses multiple independent variables, simple linear regression only uses one

The main difference between simple and multiple linear regression is the number of independent variables. While simple linear regression uses only one independent variable to predict the dependent variable, multiple linear regression uses two or more independent variables to predict the dependent variable.

Discuss it

What does the residual plot tell you in a simple linear regression analysis?

It shows the distribution of residuals and can help identify non-linearity, unequal error variances, and outliers
It shows the distribution of the independent variable
It shows the relationship between the dependent and independent variables
It tells you the strength of the correlation

A residual plot is a graph that shows the residuals on the vertical axis and the independent variable on the horizontal axis. It helps to identify non-linearity, unequal error variances (heteroscedasticity), and outliers. If the points in a residual plot are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate.

Discuss it

Two events are said to be ________ if the occurrence of one does not affect the probability of the occurrence of the other.

Dependent
Exhaustive
Independent
Mutually exclusive

Two events are said to be "independent" if the occurrence of one does not affect the probability of the occurrence of the other. For example, if you toss a coin twice, the outcome of the first toss doesn't affect the outcome of the second toss, so the two events are independent.

Discuss it