In a Normal Distribution, approximately 95% of the data falls within _____ standard deviations of the mean.

  • 1
  • 2
  • 3
  • 4
In a Normal Distribution, approximately 95% of the data falls within 2 standard deviations of the mean. This is part of the empirical (68-95-99.7) rule: roughly 68% of values lie within 1 standard deviation, 95% within 2, and 99.7% within 3.
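A quick way to verify the 95% figure, as a minimal sketch using scipy.stats:

```python
from scipy.stats import norm

# Probability mass within ±2 standard deviations for a standard normal.
coverage = norm.cdf(2) - norm.cdf(-2)
print(f"P(-2 < Z < 2) = {coverage:.4f}")  # ~0.9545, i.e. approximately 95%
```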

How can extreme outliers impact the interpretation of the skewness of a dataset?

  • Can either increase or decrease the skewness
  • Decrease the skewness
  • Does not affect the skewness
  • Increase the skewness
The skewness of a distribution measures the extent and direction of its asymmetry. Extreme outliers can either increase or decrease skewness depending on which tail they fall in: outliers far above the mean stretch the right tail and increase skewness, while outliers far below the mean stretch the left tail and decrease it.
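A minimal sketch with scipy.stats.skew, using made-up data, shows the effect in both directions:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=5, size=1000)  # roughly symmetric sample

print(f"No outlier:   {skew(data):.3f}")                  # near 0
print(f"High outlier: {skew(np.append(data, 500)):.3f}")  # pushed positive
print(f"Low outlier:  {skew(np.append(data, -400)):.3f}") # pushed negative
```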

How do outliers affect the performance of machine learning models?

  • Decrease model accuracy
  • Increase model accuracy
  • Increase model precision
  • Increase model recall
Outliers can significantly affect the performance of machine learning models, often leading to decreased accuracy. This is because they can cause the model to learn from these anomalies rather than from the underlying pattern in the data.
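For instance, a single extreme target value can visibly distort an ordinary least-squares fit (a toy sketch with NumPy, using synthetic data):

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2 * x + 1  # clean linear relationship

slope, intercept = np.polyfit(x, y, 1)
print(f"Clean fit:        slope={slope:.2f}, intercept={intercept:.2f}")

# Add one extreme outlier and refit: the line is pulled toward it.
x_out = np.append(x, 9.0)
y_out = np.append(y, 200.0)
slope, intercept = np.polyfit(x_out, y_out, 1)
print(f"Fit with outlier: slope={slope:.2f}, intercept={intercept:.2f}")
```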

How do outliers affect the standard deviation of a dataset?

  • Can either increase or decrease the standard deviation
  • Decrease the standard deviation
  • Does not affect the standard deviation
  • Increase the standard deviation
Outliers can significantly increase the standard deviation, as the standard deviation is sensitive to extreme values. This is because its calculation squares each value's deviation from the mean, so points far from the mean contribute disproportionately to the result.
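A small numeric example (hypothetical values) makes the sensitivity obvious:

```python
import numpy as np

values = np.array([10, 12, 11, 13, 12, 11])
with_outlier = np.append(values, 100)

print(f"Std without outlier: {values.std(ddof=1):.2f}")        # ~1.05
print(f"Std with outlier:    {with_outlier.std(ddof=1):.2f}")  # ~33.5
```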

_____ are used to indicate different values in a heatmap.

  • Colors
  • Lines
  • Shapes
  • Sizes
Colors are used to indicate different values in a heatmap. The color scale represents the magnitude of the variable, with different color gradients representing different value ranges.
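A minimal matplotlib sketch of this idea, using random data:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
data = rng.random((6, 6))  # hypothetical 6x6 matrix of values

plt.imshow(data, cmap="viridis")  # the color gradient encodes magnitude
plt.colorbar(label="Value")       # legend mapping colors back to values
plt.title("Heatmap: colors indicate values")
plt.show()
```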

You are given a dataset with a high number of features. The computational resources are limited. What feature selection method might you consider?

  • Backward elimination
  • Filter methods
  • Forward selection
  • Wrapper methods
Given limited computational resources, filter methods might be a good choice. These methods are less computationally expensive than wrapper methods as they do not involve the use of any specific machine learning algorithm. Instead, they rank features based on statistical measures and remove irrelevant features based on a certain threshold or number of top features to keep.
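As one concrete example, scikit-learn's SelectKBest is a filter method that scores features statistically without training a model (a sketch on synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical wide dataset: 100 features, only 10 informative.
X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=10, random_state=0)

# Rank features by an ANOVA F-statistic and keep the top 10.
# No model is fit, so the cost stays low even with many features.
selector = SelectKBest(score_func=f_classif, k=10)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)  # (200, 100) -> (200, 10)
```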

What type of data can only take on discrete values?

  • Categorical data
  • Continuous data
  • Discrete data
  • Ordinal data
Discrete data can only take on distinct, separate values, typically produced by counting; there are no meaningful values between them. For example, the number of students in a class is discrete data: a class can have 24 or 25 students, but not 24.5.

You're working with a dataset where two features, 'age' and 'years of experience', have a high correlation. Which problem does this situation exemplify?

  • Data leakage
  • Multicollinearity
  • Overfitting
  • Underfitting
This situation exemplifies multicollinearity, a condition where two or more predictors in a multiple regression model are highly correlated. This high correlation means that 'age' and 'years of experience' provide similar information in predicting the dependent variable.
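One simple way to detect this is to check the pairwise correlation (a sketch with simulated 'age' and 'years of experience' values):

```python
import numpy as np

rng = np.random.default_rng(1)
age = rng.uniform(22, 65, size=500)
# Experience tracks age closely, with a little noise.
experience = age - 22 + rng.normal(0, 2, size=500)

r = np.corrcoef(age, experience)[0, 1]
print(f"Pearson correlation: {r:.3f}")  # close to 1.0 signals multicollinearity
```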

How is the shape of a Normal Distribution usually described?

  • Bell-shaped
  • Skewed to the left
  • Skewed to the right
  • Uniformly flat
A Normal Distribution is described as bell-shaped. It is symmetric around the mean, and most of the data falls close to the mean with fewer values further away.
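Plotting a histogram of normal samples makes the shape visible (a minimal matplotlib sketch):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
samples = rng.normal(loc=0, scale=1, size=10_000)

# The histogram traces the symmetric, bell-shaped curve.
plt.hist(samples, bins=50, density=True)
plt.title("Normal distribution: bell-shaped, symmetric about the mean")
plt.show()
```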

Suppose your machine learning model shows a significant shift in performance when transitioning from the training set to the test set. How could mishandling missing data contribute to this issue?

  • It may have caused an imbalance in the data distribution between the sets.
  • It may have caused overfitting.
  • It may have led to the model learning irrelevant patterns.
  • It may have led to underfitting.
If missing data is handled inconsistently between the training and test sets (for example, imputed with statistics computed separately on each set), the two sets can end up with different data distributions, causing the model's performance to shift between them.
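A common safeguard is to fit any imputation on the training set only and reuse the same fitted statistics on the test set. A minimal sketch with scikit-learn's SimpleImputer (hypothetical values):

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0], [2.0], [np.nan], [4.0]])
X_test = np.array([[np.nan], [10.0]])

# Fit on the training data only, then apply the SAME statistics to the
# test data so both sets are handled consistently.
imputer = SimpleImputer(strategy="mean")
X_train_filled = imputer.fit_transform(X_train)
X_test_filled = imputer.transform(X_test)  # uses the training mean

print(X_train_filled.ravel())  # [1.    2.    2.333 4.   ]
print(X_test_filled.ravel())   # [ 2.333 10.   ]
```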