What is the Interquartile Range (IQR)?

  • The average spread of the data
  • The range of all the data
  • The range of the middle 50% of the data
  • The spread of the most common data
The Interquartile Range (IQR) is the "Range of the middle 50% of the data". It is calculated as the difference between the upper quartile (Q3) and the lower quartile (Q1), i.e. IQR = Q3 - Q1.
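As a quick sketch of this calculation (using NumPy and an illustrative sample), the IQR can be computed from the 25th and 75th percentiles:

```python
import numpy as np

data = np.array([2, 4, 4, 5, 7, 8, 9, 11, 12, 15])

# Lower and upper quartiles (25th and 75th percentiles)
q1, q3 = np.percentile(data, [25, 75])

# IQR is the spread of the middle 50% of the data
iqr = q3 - q1  # here: 10.5 - 4.25 = 6.25
```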

Continuous data can typically be divided into which two main types?

  • Discrete and ordinal data
  • Interval and ratio data
  • Ordinal and nominal data
  • Qualitative and quantitative data
Continuous data can typically be divided into two main types: interval and ratio data. Interval data have a consistent scale but no true zero, while ratio data have a consistent scale and a true zero.

The square root of the ________ gives the standard deviation of a data set.

  • Mean
  • Median
  • Range
  • Variance
The "Variance" of a dataset is the average of the squared differences from the mean. The "Standard Deviation" is the square root of the variance, which means it is expressed in the same units as the data, making the dispersion easier to interpret.
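A minimal NumPy sketch of this relationship, using an illustrative sample:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

mean = data.mean()                      # 5.0
variance = ((data - mean) ** 2).mean()  # average squared deviation from the mean -> 4.0
std_dev = variance ** 0.5               # square root of the variance -> 2.0
```

Note that `np.std(data)` computes the same (population) standard deviation in one call.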

Mishandling missing data can lead to a high level of ________, impacting model performance.

  • bias
  • precision
  • recall
  • variance
If missing data is handled improperly, it can lead to biased training data, which can cause the model to learn incorrect or irrelevant patterns and, as a result, adversely affect its performance.

How does multiple imputation handle missing data?

  • It deletes rows with missing data
  • It estimates multiple values for each missing value
  • It fills missing data with mode values
  • It replaces missing data with a single value
Multiple imputation estimates multiple values for each missing value, instead of filling in a single value for each missing point. It reflects the uncertainty around the true value and provides more realistic estimates.
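A hand-rolled sketch of the idea in NumPy. The dataset and the number of imputations `m` are illustrative, and real implementations (e.g. MICE) draw each value from a model conditioned on the other features; here missing entries are simply resampled from the observed values to show the multiple-completed-datasets structure:

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([1.0, 2.0, np.nan, 4.0, 5.0, np.nan, 7.0])
observed = data[~np.isnan(data)]

m = 5  # number of imputed (completed) datasets
completed = []
for _ in range(m):
    filled = data.copy()
    missing_mask = np.isnan(filled)
    # draw a plausible value for each missing entry
    filled[missing_mask] = rng.choice(observed, size=missing_mask.sum())
    completed.append(filled)

# the analysis is run on each completed dataset and the results are pooled,
# so the spread across imputations reflects uncertainty about the true values
pooled_mean = np.mean([d.mean() for d in completed])
```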

Your EDA reveals a non-normal distribution of data in your dataset. How might this insight affect your choice of machine learning models or algorithms?

  • You should always normalize your data
  • You should use only non-parametric models
  • You should use only unsupervised learning models
  • Your choice of ML models might be influenced, as some models make certain assumptions about the data distribution
The distribution of the data can influence the choice of machine learning models or algorithms. Some models make assumptions about the data distribution; for example, linear regression assumes that the residuals are normally distributed. If these assumptions are violated, the model may perform poorly. Therefore, understanding the data distribution can guide you in choosing the most appropriate models or in deciding whether to transform your data.
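As a sketch of how this insight can be acted on (the sample and the `skewness` helper are illustrative), the snippet below measures skewness on a right-skewed sample and applies a log transform to bring it closer to normal:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed sample

def skewness(x):
    # Fisher-Pearson coefficient of skewness (0 for a symmetric distribution)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

# A log transform often makes right-skewed data approximately normal
transformed = np.log(data)
# skewness(data) is large and positive; skewness(transformed) is close to 0
```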

What is the key characteristic of a Uniform Distribution?

  • All values are equally likely
  • Most values are around the mean
  • Values are skewed to the left
  • Values are skewed to the right
In a Uniform Distribution, all values have the same frequency/probability. That is, they are all equally likely.
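A quick illustration in NumPy: simulating a fair six-sided die (a discrete uniform distribution) shows each outcome occurring with roughly equal frequency:

```python
import numpy as np

rng = np.random.default_rng(1)
# a fair die: every outcome 1..6 has probability 1/6
rolls = rng.integers(1, 7, size=60_000)
counts = np.bincount(rolls)[1:]        # counts for outcomes 1..6
frequencies = counts / rolls.size      # each close to 1/6 ≈ 0.1667
```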

How is the whisker of a box plot usually calculated?

  • Mean ± Standard Deviation
  • Median ± Interquartile Range
  • Minimum and maximum values of the dataset
  • Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
The whiskers of a box plot are typically based on the fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR: each whisker extends to the most extreme data point that still falls within its fence, and points beyond the fences are plotted individually as outliers.
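A NumPy sketch with an illustrative sample, computing the fences and the resulting whisker endpoints:

```python
import numpy as np

data = np.array([1, 3, 4, 5, 5, 6, 7, 8, 9, 30])  # 30 is an outlier

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# whiskers extend to the most extreme points inside the fences
lower_whisker = data[data >= lower_fence].min()   # 1
upper_whisker = data[data <= upper_fence].max()   # 9
outliers = data[(data < lower_fence) | (data > upper_fence)]  # [30]
```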

Given a machine learning algorithm that is highly sensitive to the range of input values, which scaling technique should you implement?

  • Min-Max scaling because it scales all values between 0 and 1
  • No scaling, as the original data values should be maintained
  • Robust scaling because it is not affected by outliers
  • Z-score standardization because it creates a normal distribution
Min-Max scaling is suitable when the algorithm is sensitive to the range of input values, as it scales all feature values into a specified range (usually 0-1). This ensures that all features have the same scale.
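A minimal NumPy sketch of Min-Max scaling on an illustrative feature:

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Min-Max scaling: map the values into the range [0, 1]
scaled = (x - x.min()) / (x.max() - x.min())
# -> [0.0, 0.25, 0.5, 0.75, 1.0]
```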

Your data shows a notable difference between the mean and the median values. Which type of scaling would be least affected by this discrepancy?

  • All scaling methods are affected by this discrepancy
  • Min-Max scaling because it scales all values between 0 and 1
  • Robust scaling because it uses median and quartile ranges
  • Z-score standardization because it creates a normal distribution
Robust scaling uses the median and interquartile range to scale the data, so it is not affected by the mean and is thus least affected by a discrepancy between the mean and the median.
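A minimal NumPy sketch of robust scaling on an illustrative feature containing an outlier; note that the outlier shifts the mean but leaves the median and IQR unchanged:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100 skews the mean, not the median

q1, median, q3 = np.percentile(x, [25, 50, 75])

# Robust scaling: center on the median, divide by the IQR
robust_scaled = (x - median) / (q3 - q1)
# -> [-1.0, -0.5, 0.0, 0.5, 48.5]
```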