The diagonals of a pairplot often show the _____ of the individual variables.

frequency distribution
mean
mode
standard deviation

In pairplot, the diagonals often show the frequency distribution of the individual variables. This provides an understanding of the distribution of individual variables in addition to their relationships with other variables.

Discuss it

The degree of tailedness in a distribution is measured by _________.

Kurtosis
Skewness
Standard Deviation
Variance

Kurtosis is a statistical measure used to describe the distribution's tails and sharpness. It measures the degree of peakedness or flatness in a distribution, or in simple terms, the 'tailedness' of the distribution.

Discuss it

In which plot can we see the distribution, median, quartiles, and outliers all at once?

Bar chart
Box plot
Pie chart
Scatter plot

A Box plot, also known as a whisker plot, displays a summary of the set of data values including minimum, first quartile (25th percentile), median, third quartile (75th percentile), and maximum. Outliers are also often indicated in box plots through the use of markers.

Discuss it

_____ data is a type of qualitative data that can be sorted into non-numerical categories.

Nominal
Ordinal
Qualitative
Quantitative

Nominal data is a type of qualitative data that can be sorted into non-numerical categories, with no order or priority.

Discuss it

One major advantage of _______ methods over filter methods for feature selection is that they can capture the interaction between input features.

Embedded
Filter
PCA
Wrapper

One major advantage of wrapper methods over filter methods for feature selection is that they can capture the interaction between input features. Unlike filter methods that evaluate each feature independently, wrapper methods consider the subset of features and can thus capture interactions among features.

Discuss it

Why is standardization (z-score) often used in machine learning algorithms?

Because it brings features to a comparable scale and is not bounded to a specific range
Because it's easy to compute
Because it's not affected by outliers
Because it's the only way to handle numerical data

Standardization, also known as Z-score normalization, is a scaling technique that subtracts the mean and divides by the standard deviation. It is often used in machine learning as it can handle features that are measured in different units by bringing them to a comparable scale. It also doesn't bound values to a specific range.

Discuss it

In ____________, different models are used to estimate the missing values based on observed data.

Mean Imputation
Mode-based Imputation
Model-based Imputation
Multiple Imputation

This process is called model-based imputation. Different statistical models are used to estimate the missing values based on the observed (non-missing) data.

Discuss it

You have a dataset with missing values and you've chosen to use multiple imputation. However, the results after applying multiple imputation are not as expected. What factors might be causing this?

Both too few and too many imputations
The model used for imputation is perfect
Too few imputations
Too many imputations

If too few imputations are used in multiple imputation, the results may not be accurate. This may lead to an underestimation of standard errors and incorrect statistical inference. Increasing the number of imputations generally leads to more accurate results.

Discuss it

What type of data typically requires more complex statistical methods for analysis?

Categorical data
Continuous data
Discrete data
Ordinal data

Continuous data usually requires more complex statistical methods for analysis because it can take on any value within a certain range. This might require techniques like regression, hypothesis testing, and advanced graphical representations.

Discuss it

In a box plot, outliers are typically represented as ______.

boxes
dots
lines
whiskers

In a box plot, outliers are typically represented as dots or points that fall outside the whiskers of the box.

Discuss it