What does a histogram represent in data visualization?

The change of a variable over time
The correlation between two variables
The frequency distribution of a single variable
The relationship between three variables

A histogram is a graphical representation of the frequency distribution of a single numeric variable, and it can be read as a rough estimate of the probability distribution of a continuous variable. To construct a histogram, the first step is to "bin" the range of values, i.e., divide the entire range into a series of consecutive, non-overlapping intervals, and then count how many values fall into each interval. The height of each bar shows that count.
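
The binning step described above can be sketched in a few lines. This is a minimal illustration, assuming equal-width bins and a range that is not degenerate (the maximum is larger than the minimum); the function name `histogram_counts` and the sample data are made up for the example.

```python
def histogram_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for v in values:
        # The maximum value is placed in the last bin instead of a new one.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]  # illustrative values only
counts = histogram_counts(data, 4)
print(counts)  # one count per bin, lowest bin first
```

Every value lands in exactly one bin, so the counts always add up to the number of observations.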


What is the primary assumption for performing a Z-test?

The data is normally distributed
The population standard deviation is known
The sample size is very large
All of the above

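The primary assumption for a Z-test is that the population standard deviation is known. The test also relies on the sample mean being approximately normally distributed, which holds when the data is normal or the sample is large, but neither of these is what sets the Z-test apart. When the population standard deviation is unknown and has to be estimated from the sample, a t-test is used instead.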
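
As a rough sketch of how the known population standard deviation enters the calculation, the one-sample Z statistic can be computed as below. The function name `one_sample_z_test`, the sample values, the hypothesized mean, and the "known" standard deviation of 3 are all assumptions made up for illustration.

```python
import math
from statistics import NormalDist


def one_sample_z_test(sample, mu0, sigma):
    """Two-sided one-sample Z-test; sigma is the KNOWN population std dev."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


sample = [52, 48, 55, 51, 49, 53, 50, 54, 47]  # illustrative values only
z, p_value = one_sample_z_test(sample, mu0=50, sigma=3)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

Note that `sigma` is passed in rather than estimated from `sample`; estimating it from the data would call for a t-test instead.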


What does a negative kurtosis indicate about the distribution of the dataset?

The distribution has a perfect bell shape
The distribution is less outlier-prone than a normal distribution
The distribution is more outlier-prone than a normal distribution
The distribution is skewed

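A negative kurtosis value (more precisely, negative excess kurtosis, which is measured relative to the normal distribution) indicates that the distribution has lighter tails than the normal distribution. Such a distribution produces fewer extreme values, so it is less outlier-prone than a normal distribution. Kurtosis describes the weight of the tails rather than how flat or peaked the center is.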
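
To make the sign of kurtosis concrete, here is a small sketch that computes population excess kurtosis (fourth central moment divided by the squared variance, minus 3). The function name `excess_kurtosis` and both datasets are made up for illustration: one is evenly spread with no extreme values, the other is mostly zeros with two extreme values.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


flat = list(range(1, 11))          # evenly spread, no extreme values
heavy = [0] * 18 + [-10, 10]       # mostly zeros plus two extreme values

print(excess_kurtosis(flat))   # negative: lighter tails than normal
print(excess_kurtosis(heavy))  # positive: heavier tails than normal
```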


What does the F-statistic signify in an ANOVA test?

The ratio of between-group variability to within-group variability
The ratio of total variability to within-group variability
The ratio of within-group variability to between-group variability
The ratio of within-group variability to total variability

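In an ANOVA test, the F-statistic is the ratio of the between-group variability to the within-group variability, each expressed as a mean square (a sum of squares divided by its degrees of freedom). In other words, it compares how much the group means differ from one another with how much the observations vary within each group. A larger F-statistic means the differences between the group means are large relative to the within-group variation, which is evidence against the hypothesis that all group means are equal.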
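
The ratio can be worked through by hand for a one-way ANOVA. The sketch below assumes the standard one-way layout; the function name `f_statistic` and the three small groups are made up for illustration.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]  # illustrative values only
f_value = f_statistic(groups)
print(f_value)
```

Here the third group sits well above the other two while the spread inside each group is small, so the between-group mean square dominates.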


What assumption about the residuals of a linear regression model does homoscedasticity refer to?

The residuals are independent
The residuals are normally distributed
The residuals have a linear relationship with the dependent variable
The residuals have constant variance

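Homoscedasticity refers to the assumption that the residuals (errors) have constant variance at each level of the independent variable(s). When this assumption is violated (heteroscedasticity), the least squares coefficient estimates are generally still unbiased, but the usual standard errors, confidence intervals, and significance tests are no longer reliable.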
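
One informal way to see what constant variance means is to fit a line and compare the spread of the residuals at low and high values of x. The sketch below is only a rough diagnostic under made-up data, not a formal statistical test; the helper names `fit_line` and `residual_variance_ratio` are hypothetical.

```python
from statistics import mean, variance


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def residual_variance_ratio(xs, ys):
    """Residual variance in the upper half of x divided by the lower half."""
    intercept, slope = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))
    residuals = [y - (intercept + slope * x) for x, y in pairs]
    half = len(residuals) // 2
    return variance(residuals[half:]) / variance(residuals[:half])


xs = list(range(1, 11))
signs = [1 if x % 2 else -1 for x in xs]
constant_noise = [2 * x + s for x, s in zip(xs, signs)]           # same spread everywhere
growing_noise = [2 * x + 0.2 * x * s for x, s in zip(xs, signs)]  # spread grows with x

ratio_constant = residual_variance_ratio(xs, constant_noise)
ratio_growing = residual_variance_ratio(xs, growing_noise)
print(ratio_constant, ratio_growing)
```

A ratio near 1 is consistent with constant variance, while a ratio far from 1 suggests the spread of the residuals changes with x.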


How does stratified random sampling differ from simple random sampling?

Stratified random sampling always involves larger sample sizes than simple random sampling
Stratified random sampling involves dividing the population into subgroups and selecting individuals from each subgroup
Stratified random sampling is the same as simple random sampling
Stratified random sampling only selects individuals from a single subgroup

Stratified random sampling differs from simple random sampling in that it first divides the population into non-overlapping groups, or strata, based on specific characteristics, and then selects a simple random sample from each stratum. This can ensure that each subgroup is adequately represented in the sample, which can increase the precision of estimates.
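
The two steps (split into strata, then draw a simple random sample from each) might look like the sketch below. It assumes proportional allocation, which is only one of several ways to size the per-stratum samples; the function name `stratified_sample` and the population of labelled tuples are made up for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(population, key, fraction, seed=0):
    """Draw a simple random sample of roughly `fraction` from each stratum."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable

    # Step 1: divide the population into non-overlapping strata.
    strata = defaultdict(list)
    for item in population:
        strata[key(item)].append(item)

    # Step 2: simple random sample within each stratum (at least one item).
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 80 members of stratum "A", 20 of stratum "B".
population = [("A", i) for i in range(80)] + [("B", i) for i in range(20)]
sample = stratified_sample(population, key=lambda item: item[0], fraction=0.1)
print(sample)
```

With a simple random sample of 10 from the same population, the small stratum "B" could end up with no members at all; sampling within each stratum rules that out.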


Why are bar plots commonly used in data analysis?

To compare the frequency of categorical variables
To show the change of a variable over time
To show the distribution of a single variable
To show the relationship between two continuous variables

Bar plots are commonly used in data analysis to compare the frequency, count, or proportion of the categories of a categorical variable. Each category is represented by a separate bar, and the length or height of the bar represents its corresponding value.
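
The quantity a bar plot displays is just a count per category. As a minimal text-only sketch (a plotting library would normally draw the bars), with made-up survey responses:

```python
from collections import Counter

responses = ["yes", "no", "yes", "maybe", "yes", "no"]  # illustrative values only
counts = Counter(responses)

# One bar per category; the bar length is the category's count.
for category, count in counts.most_common():
    print(f"{category:<6}{'#' * count} ({count})")
```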


Conditional independence of A and B given C means that, once C is known to have occurred, there is no remaining probabilistic ________ between A and B.

Difference
Intersection
Ratio
Relationship

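Conditional independence of A and B given C means that, once C is known to have occurred, no probabilistic relationship remains between A and B. Formally, P(A ∩ B | C) = P(A | C) · P(B | C), so given C, learning that B has occurred does not change the probability of A. This is not the same as ordinary independence: A and B can be dependent when C is unknown and still be conditionally independent given C.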
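
A small numeric example can make the definition concrete. The sketch below builds a made-up joint distribution over three binary events in which A and B are conditionally independent given C by construction, then checks the product rule P(A ∩ B | C) = P(A | C) · P(B | C) and compares it with the unconditional case. All probabilities and the helper name `prob` are assumptions for illustration.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model: C is a fair coin; A and B are each likely when C
# occurs and unlikely when it does not, and are independent given C.
p_c = {0: F(1, 2), 1: F(1, 2)}
p_a_given_c = {0: F(1, 10), 1: F(9, 10)}
p_b_given_c = {0: F(1, 10), 1: F(9, 10)}

joint = {}
for a, b, c in product([0, 1], repeat=3):
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    joint[(a, b, c)] = p_c[c] * pa * pb


def prob(pred):
    """Total probability of all outcomes (a, b, c) satisfying pred."""
    return sum(p for outcome, p in joint.items() if pred(*outcome))


# Given C = 1, the product rule holds.
p_c1 = prob(lambda a, b, c: c == 1)
p_ab_c1 = prob(lambda a, b, c: a and b and c == 1) / p_c1
p_a_c1 = prob(lambda a, b, c: a and c == 1) / p_c1
p_b_c1 = prob(lambda a, b, c: b and c == 1) / p_c1
print(p_ab_c1 == p_a_c1 * p_b_c1)

# Without conditioning on C, A and B are dependent.
p_ab = prob(lambda a, b, c: a and b)
p_a = prob(lambda a, b, c: a)
p_b = prob(lambda a, b, c: b)
print(p_ab == p_a * p_b)
```

The first comparison holds and the second does not: C explains the whole association between A and B in this example.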


What is the assumption made when computing the Pearson correlation coefficient?

The correlation is zero
The variables are independent
The variables are normally distributed
There is a linear relationship between variables

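When computing the Pearson correlation coefficient, it is assumed that there is a linear relationship between the variables, because the coefficient only measures the strength of linear association. Furthermore, the variables are assumed to be continuous. Significance tests on the coefficient typically add further assumptions, such as approximate normality and homoscedasticity (i.e., the spread of one variable is roughly the same across all levels of the other).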
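
The reason linearity matters can be seen with two tiny made-up datasets: one perfectly linear, and one with a perfect but non-linear (quadratic) relationship. The function name `pearson_r` and the data are assumptions for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = [-2, -1, 0, 1, 2]
linear = [2 * x + 1 for x in xs]   # exact linear relationship
quadratic = [x * x for x in xs]    # exact, but non-linear, relationship

r_linear = pearson_r(xs, linear)
r_quadratic = pearson_r(xs, quadratic)
print(r_linear, r_quadratic)
```

The quadratic data is fully determined by x, yet its coefficient is zero, so a low Pearson correlation does not by itself mean the variables are unrelated.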


How is the variance related to the standard deviation in a data set?

The variance is the average of the standard deviation
The variance is the square of the standard deviation
The variance is the square root of the standard deviation
The variance is twice the standard deviation

The variance is the square of the standard deviation; equivalently, the standard deviation is the square root of the variance. Both measure the dispersion of a dataset, but in different units: the standard deviation is expressed in the same units as the data, while the variance is expressed in squared units.
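
The relationship is easy to check with Python's standard `statistics` module. The sketch uses the population versions (`pvariance`, `pstdev`); the same relationship holds for the sample versions (`variance`, `stdev`). The data values are made up for illustration.

```python
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only

var = pvariance(data)  # population variance, in squared units
sd = pstdev(data)      # population standard deviation, in the data's units

print(var, sd)
print(abs(sd ** 2 - var) < 1e-9)  # the variance is the square of the std dev
```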
