What are the effects of outliers on the results of a hypothesis testing procedure?

All of these
Can affect the statistical significance
Can lead to type I errors
Can lead to type II errors

Outliers can distort a hypothesis test in several ways. By shifting the sample mean or inflating the variance, they change the test statistic and its p-value. This can produce a false positive (Type I error), mask a real effect (Type II error), or otherwise alter the statistical significance of the result, potentially leading to incorrect conclusions.
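
As a rough illustration, the sketch below computes a one-sample t statistic by hand for a small made-up sample, with and without one extreme value. The data, the helper name `one_sample_t`, and the null value `mu0=5.0` are all assumptions chosen for the example.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the one-sample t statistic for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    return (mean - mu0) / (sd / math.sqrt(n))

# Hypothetical measurements clustered slightly above 5.0
clean = [5.1, 5.3, 5.2, 5.4, 5.0, 5.3, 5.2, 5.1]
# Same data plus one extreme value
with_outlier = clean + [12.0]

t_clean = one_sample_t(clean, mu0=5.0)           # about 4.3
t_outlier = one_sample_t(with_outlier, mu0=5.0)  # about 1.3

print(f"t without outlier: {t_clean:.2f}")
print(f"t with outlier:    {t_outlier:.2f}")
```

Here the outlier raises the mean but inflates the standard deviation far more, so the t statistic falls below the usual 5% critical value (roughly 2.3 for these sample sizes). A real effect is masked, which is a Type II error.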

What role does bin size play in outlier detection when using a histogram?

Bin size can influence outlier detection
Bin size does not influence outlier detection
Larger bin size always increases outlier visibility
Smaller bin size always increases outlier visibility

The bin size in a histogram can influence the visibility of outliers. Depending on how the data is binned, an outlier may or may not be clearly visible.
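
A minimal sketch of this effect, using a hand-written equal-width binning helper (`histogram_counts` is a hypothetical name, and the data are made up). The same ten values are binned twice, once coarsely and once finely.

```python
def histogram_counts(values, n_bins):
    """Count values in n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value falls into the last bin instead of overflowing
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

# Hypothetical data: a cluster between 1 and 5, plus one outlier at 9
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

coarse = histogram_counts(data, n_bins=2)
fine = histogram_counts(data, n_bins=8)

print(coarse)  # [8, 2]
print(fine)    # [1, 2, 3, 2, 1, 0, 0, 1]
```

With two bins the outlier shares a bin with an ordinary value and is hidden. With eight bins, the empty bins separating it from the cluster make it stand out.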

The mode is the only measure of central tendency that can be used for _____ data.

Categorical
Interval
Numerical
Ordinal

The mode is the only measure of central tendency that can be used for categorical (nominal) data, because it simply identifies the most frequently occurring category. A mean or median requires values that can at least be ordered, which unordered categories cannot.
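
For instance, Python's standard `statistics` module can find the mode of a list of labels. The colour data below are made up for illustration.

```python
import statistics

# Hypothetical categorical observations
colours = ["red", "blue", "red", "green", "blue", "red"]

most_common = statistics.mode(colours)      # single most frequent category
all_modes = statistics.multimode(colours)   # list of modes, useful when there are ties

print(most_common)  # red
print(all_modes)    # ['red']
```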

You are developing a linear regression model and notice that despite a high R-squared value, none of your independent variables are statistically significant. What might be the potential issue here?

Data leakage
High variance
Multicollinearity
Underfitting

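The likely cause is multicollinearity. When independent variables are highly correlated with one another, the variances (and hence the standard errors) of the coefficient estimates are inflated, so no individual coefficient may reach statistical significance. The predictors can still explain the response well jointly, which is why the overall model remains significant and the R-squared value stays high.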
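
One common diagnostic is the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where the VIF reduces to 1 / (1 - r^2) with r the correlation between the two predictors. The helper names and data are assumptions for illustration, and the often-quoted cut-offs of 5 or 10 are rules of thumb rather than fixed thresholds.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors: x2 is almost exactly twice x1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

vif = vif_two_predictors(x1, x2)
print(f"VIF: {vif:.1f}")  # far above 10, signalling strong multicollinearity
```

With more than two predictors, each VIF instead comes from regressing one predictor on all of the others.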

You have a dataset with an odd number of observations. If you were to calculate both the mean and median, how would adding a very large value to the dataset affect these measures of central tendency?

Both would increase
Both would remain unchanged
Only the mean would increase
Only the median would increase

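Adding a very large value increases the mean, because the mean takes every value in the dataset into account. The median is far less sensitive: with the extra observation the count becomes even, so the median becomes the average of the old middle value and the next larger one. It therefore stays the same or shifts only slightly, no matter how extreme the new value is.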
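
A quick check with made-up numbers shows the difference in sensitivity. Here the mean jumps dramatically, while the median only moves from the old middle value to the midpoint between it and its upper neighbour.

```python
import statistics

# Hypothetical dataset with an odd number of observations
data = [2, 4, 6, 8, 10]
extended = data + [1000]  # add one very large value

mean_before = statistics.mean(data)        # 6
median_before = statistics.median(data)    # 6
mean_after = statistics.mean(extended)     # about 171.7
median_after = statistics.median(extended) # 7.0, the average of 6 and 8

print(mean_before, median_before)
print(round(mean_after, 1), median_after)
```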

In _____ deletion, all data from a participant is discarded if any single value is missing.

Listwise
Pairwise
Random
Systematic

In listwise deletion, all data from a participant is discarded if any single value is missing. It is the simplest way of dealing with missing data, but it can discard a large share of the dataset and, if the data are not missing completely at random, it can bias the results.
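
The idea can be sketched in plain Python as below. The records and the helper name `listwise_delete` are hypothetical, and `None` stands in for a missing value.

```python
# Hypothetical survey records; None marks a missing answer
participants = [
    {"id": 1, "age": 34, "income": 52000, "score": 7},
    {"id": 2, "age": None, "income": 48000, "score": 6},
    {"id": 3, "age": 29, "income": 61000, "score": 8},
    {"id": 4, "age": 41, "income": 55000, "score": None},
]

def listwise_delete(rows):
    """Keep only rows in which every field has a value."""
    return [row for row in rows if all(v is not None for v in row.values())]

complete = listwise_delete(participants)
kept_ids = [row["id"] for row in complete]

print(kept_ids)  # [1, 3]
```

Participants 2 and 4 are each missing only one field, yet all of their other answers are dropped too. This is the information loss associated with listwise deletion.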

A correlation coefficient of +1 between two variables indicates what kind of relationship?

No relationship
Perfect negative linear relationship
Perfect positive linear relationship
Weak relationship

A correlation coefficient of +1 between two variables indicates a perfect positive linear relationship. This means that if one variable increases, the other variable also increases at a constant rate, and vice versa.

You've 'explored' the data and drawn some conclusions, but upon 'communicating' your findings, stakeholders have additional questions. What would be the next step in the EDA process?

Direct the stakeholders to the raw data
Ignore the questions and conclude the analysis
Revisit the questioning phase with these new questions
Wrap up the communication phase quickly

In this situation, the next step should be to revisit the 'questioning' phase with these new questions from stakeholders. Additional questions from stakeholders might reflect aspects of the data that have not been covered or require further investigation. Revisiting the questioning phase will allow these aspects to be incorporated into the analysis.

What kind of distribution is indicated by a skewness of zero?

A bimodal distribution.
A negatively skewed distribution.
A normal distribution.
A positively skewed distribution.

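Of the options listed, a skewness of zero corresponds to a normal distribution. In a perfect normal distribution both tails are equal and balance each other out, so the skewness is zero. Strictly speaking, zero skewness indicates symmetry rather than normality: any symmetric distribution, including a symmetric bimodal one, also has zero skewness.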
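
Zero skewness on its own does not establish normality, as the sketch below suggests. It computes the population (moment-based) skewness of a made-up dataset that is symmetric but clearly bimodal. The helper name `skewness` is an assumption for this example.

```python
def skewness(values):
    """Population skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical data: two clusters placed symmetrically around 5
bimodal = [1, 1, 2, 2, 8, 8, 9, 9]

skew = skewness(bimodal)
print(skew)  # zero (possibly shown with a sign), even though the data are not normal
```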

If missing data is not properly addressed, the model's ________ can be significantly affected.

F1 score
accuracy
precision
recall

If missing data is not handled correctly, it can bias the data the model learns from, which in turn can degrade the model's accuracy.
