Readability in data visualization refers to how easily the audience can __________.

Analyze the underlying code
Download the graph
Interact with the graph
Understand the represented data

Readability in data visualization refers to how easily the audience can understand the represented data. This includes the clarity of text elements (labels, title, caption), color scheme, and whether the choice of plot type makes sense for the represented data.

In the context of handling missing data, what does 'imputation' mean?

Adding artificial data
Deleting data points
Filling in missing data with substituted values
Transforming data

In the context of handling missing data, 'imputation' refers to the process of filling in missing data with substituted values. These values can be determined in a variety of ways such as using measures of central tendency (mean, median, mode), predictive models, or other techniques.
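
As a rough illustration of imputation with measures of central tendency, the sketch below fills missing entries (represented here as `None`) with the mean, median, or mode of the observed values. The `impute` helper and the `ages` list are hypothetical names invented for this example, not part of any library.

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Return a copy of `values` with None replaced by a summary statistic.

    `strategy` is one of "mean", "median", or "mode" (hypothetical helper).
    """
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

# Made-up data: two ages are missing.
ages = [20, None, 30, 30, None, 50]

mean_filled = impute(ages, "mean")      # gaps filled with 32.5
median_filled = impute(ages, "median")  # gaps filled with 30
mode_filled = impute(ages, "mode")      # gaps filled with 30
```

Predictive-model imputation follows the same pattern, except that the fill value is predicted per row instead of being a single constant.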

Imagine you are examining a correlation matrix and find that two variables have a correlation coefficient close to -1. What does this imply about the relationship between these two variables?

Their relationship is random
They are unrelated
They have a strong negative relationship
They have a weak positive relationship

A correlation coefficient close to -1 implies that the two variables have a strong negative linear relationship. This means that as one variable increases, the other tends to decrease, and vice versa.
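
To make the idea concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand. The `pearson` function and the two small datasets are assumptions made up for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up data: scores fall as hours of TV rise.
hours_tv = [1, 2, 3, 4, 5]
test_score = [90, 85, 70, 65, 50]

r = pearson(hours_tv, test_score)  # close to -1: strong negative relationship

# An exactly linear decreasing relationship gives r = -1 (up to rounding).
r_perfect = pearson([1, 2, 3], [6, 4, 2])
```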

What is the difference between skewness and kurtosis?

Skewness measures asymmetry, kurtosis measures variability.
Skewness measures center, kurtosis measures spread.
Skewness measures spread, kurtosis measures center.
Skewness measures symmetry, kurtosis measures tailedness.

Skewness measures the asymmetry of a data distribution around its mean, whereas kurtosis measures its "tailedness", that is, how much of the distribution sits in the tails. In short, skewness is about symmetry, and kurtosis is about the tails.
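
The two measures can be sketched from central moments. The code below uses the population (biased) definitions, skewness as m3 / m2^1.5 and excess kurtosis as m4 / m2^2 - 3; statistical libraries may apply sample-size corrections, so their results can differ slightly. All function names and data are hypothetical.

```python
def central_moment(data, k):
    """k-th central moment: the average of (x - mean) ** k."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** k for x in data) / len(data)

def skewness(data):
    """Population skewness: 0 for a symmetric sample, > 0 for a long right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Population kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]       # mirror image around its mean
right_skewed = [1, 1, 1, 2, 10]   # one large value stretches the right tail

skew_sym = skewness(symmetric)        # zero: no asymmetry
skew_right = skewness(right_skewed)   # positive: right-skewed
kurt_sym = excess_kurtosis(symmetric)
kurt_right = excess_kurtosis(right_skewed)  # larger: heavier tail
```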

While analyzing a dataset using a box plot, you notice that there are several data points plotted as circles. What might these circles represent?

Data within the interquartile range
Data within the whiskers
Median values
Outliers

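In a box plot, data points plotted as individual circles usually represent outliers: values that fall beyond the whiskers, which by a common convention extend 1.5 times the interquartile range (IQR) past the first and third quartiles.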
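
A minimal sketch of how such points can be identified, assuming the common 1.5 × IQR rule. The data are made up, and the quartile method (`method="inclusive"`) is a choice made for this example; plotting libraries may compute quartiles slightly differently.

```python
from statistics import quantiles

def iqr_fences(data, k=1.5):
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up data with one unusually large value.
data = [2, 3, 4, 5, 6, 7, 8, 9, 30]

lower, upper = iqr_fences(data)
# Points outside the fences are the ones a box plot would draw individually.
outliers = [x for x in data if x < lower or x > upper]
```

Here the quartiles are 4 and 8, so the fences sit at -2 and 14, and only the value 30 is flagged.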

What is the key difference between 'removal' and 'transformation' of outliers?

Removal changes the data distribution, while transformation does not
Removal deals with extreme values, while transformation does not
Removal discards outliers, while transformation modifies their values
Removal is a type of data cleaning, while transformation is not

The key difference between 'removal' and 'transformation' of outliers is that removal discards outliers from the dataset, while transformation modifies the values of outliers to reduce their impact.
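
The contrast can be sketched in a few lines. Clipping values to fixed bounds (sometimes called winsorizing) is used here as one possible transformation; a log transform would be another. The bounds and data are hypothetical, chosen only for illustration.

```python
def remove_outliers(data, low, high):
    """Removal: drop every value outside [low, high]."""
    return [x for x in data if low <= x <= high]

def clip_outliers(data, low, high):
    """Transformation: keep every row, but pull extreme values to the bounds."""
    return [min(max(x, low), high) for x in data]

# Made-up data and bounds; 30 is the only value outside [-2, 14].
data = [2, 3, 4, 5, 6, 7, 8, 9, 30]
low, high = -2, 14

removed = remove_outliers(data, low, high)  # 8 values remain, 30 is gone
clipped = clip_outliers(data, low, high)    # 9 values remain, 30 becomes 14
```

Removal shrinks the dataset, while the transformation preserves its size and only reduces the influence of the extreme value.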

Suppose you are using a correlation matrix to understand the relationships between multiple features. You come across a correlation coefficient of -0.85 between two features. What does this indicate?

A strong negative linear relationship
A strong positive linear relationship
A weak positive linear relationship
No relationship

A correlation coefficient of -0.85 indicates a strong negative linear relationship between two features. This means that as one feature increases, the other tends to decrease.

Replacing missing values with the median of the existing values is known as _____ imputation.

Mean
Median
Mode
Pairwise

Replacing missing values with the median of the existing values is known as 'median' imputation. This technique is useful for skewed distributions as the median is less affected by outliers than the mean.
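
A small sketch of why the median is the safer fill value for skewed data. The income figures (in thousands) are invented for this example; a single very large value drags the mean far above the typical observation, while the median stays put.

```python
from statistics import mean, median

# Made-up incomes in thousands; one value is missing and one is extreme.
incomes = [30, 32, None, 35, 38, 400]

observed = [v for v in incomes if v is not None]
mean_fill = mean(observed)      # pulled upward by the 400 entry
median_fill = median(observed)  # stays near the typical income

# Median imputation: replace each missing entry with the observed median.
imputed = [median_fill if v is None else v for v in incomes]
```

With these numbers the mean is 107 and the median is 35, so mean imputation would insert a value larger than every observed income except the extreme one.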

In a survey about income levels, some individuals chose not to disclose their earnings. How would you categorize this missing data?

MAR
MCAR
NMAR
Not missing data

This is NMAR (Not Missing at Random): whether an income value is missing depends on the unobserved income itself (for example, people with very high or very low incomes may be more likely to withhold it).
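
The practical consequence can be sketched with a deliberately simplified, deterministic example. It assumes that every respondent earning above a hypothetical threshold withholds their income; real non-response is rarely this clean, and all numbers below are made up.

```python
from statistics import mean

# Made-up true incomes in thousands (unknown to the analyst in practice).
true_incomes = [20, 30, 40, 50, 200, 300]

# Assumption for illustration: anyone earning above 100 declines to answer,
# so missingness depends on the unobserved value itself (NMAR).
THRESHOLD = 100
reported = [x if x <= THRESHOLD else None for x in true_incomes]

observed = [x for x in reported if x is not None]
observed_mean = mean(observed)   # computed only from those who answered
true_mean = mean(true_incomes)   # what we would see with complete data
```

The observed mean (35) badly underestimates the true mean (about 106.7), which is why NMAR data cannot be repaired simply by dropping or mean-filling the missing rows.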

_____ data can only take certain values with gaps between them.

Continuous
Discrete
Nominal
Ordinal

Discrete data can only take certain values (usually integers) and there are gaps between the values.
