Your data shows a notable difference between the mean and the median values. Which type of scaling would be least affected by this discrepancy?

All scaling methods are affected by this discrepancy
Min-Max scaling because it scales all values between 0 and 1
Robust scaling because it uses median and quartile ranges
Z-score standardization because it creates a normal distribution

Robust scaling uses the median and interquartile range to scale the data, so it is not affected by the mean and is thus least affected by a discrepancy between the mean and the median.
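
As a rough illustration of the point above, the sketch below compares robust scaling with Z-score standardization on a small made-up sample containing one extreme value. The function names `robust_scale` and `z_score` and the sample data are assumptions chosen for this example, not part of any particular library.

```python
import statistics

def robust_scale(values):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - med) / iqr for v in values]

def z_score(values):
    """Center on the mean and divide by the (population) standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical sample: the single large value pulls the mean far above the median.
data = [10, 12, 13, 15, 16, 18, 200]

robust = robust_scale(data)
standard = z_score(data)

# Spread of the six "ordinary" points after each scaling.
robust_spread = robust[5] - robust[0]
standard_spread = standard[5] - standard[0]
```

With robust scaling the median maps to 0 and the ordinary points keep a usable spread, whereas the Z-scores of those same points are all negative and squeezed together because the extreme value inflates both the mean and the standard deviation.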

When the correlation coefficient is close to 1, it implies a strong ________ relationship between the two variables.

Negative
Neutral
Positive
Zero

When the correlation coefficient is close to 1, it implies a strong positive relationship between the two variables. This means as one variable increases, the other also increases.
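
A minimal sketch of the Pearson correlation coefficient, written by hand to make the calculation visible. The helper name `pearson` and the `hours`/`score` values are invented for this example.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: the score rises as the hours rise.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

r_pos = pearson(hours, score)                 # close to +1
r_neg = pearson(hours, [-s for s in score])   # same strength, opposite sign
```

Flipping the sign of one variable flips the sign of the coefficient without changing its magnitude, which is why a value close to -1 indicates an equally strong but negative relationship.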

_____ plots can give a high-level view of a single continuous variable but may hide details about the distribution.

Bar
Box
Histogram
Scatter

Histograms can provide a high-level view of a single continuous variable by showing the frequency of data points in different bins. However, due to the binning process, some details about the distribution might be hidden.
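
The effect of binning can be sketched with a small hand-rolled counter. Everything here (the `bin_counts` helper, the two samples, and the bin edges) is made up for illustration; the last bin is treated as closed on the right, which is a common convention but still an assumption of this sketch.

```python
def bin_counts(values, edges):
    """Count values per bin; bins are [lo, hi) except the last, which is [lo, hi]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

# Two hypothetical samples with very different shapes.
bimodal = [1, 1, 2, 2, 8, 8, 9, 9]
spread_out = [0, 1, 3, 4, 6, 7, 8, 9]

coarse = [0, 5, 10]
fine = [0, 2, 4, 6, 8, 10]

coarse_bimodal = bin_counts(bimodal, coarse)
coarse_spread = bin_counts(spread_out, coarse)
fine_bimodal = bin_counts(bimodal, fine)
fine_spread = bin_counts(spread_out, fine)
```

With two wide bins both samples produce identical counts, so the histograms would look the same; only the finer bins reveal the gap in the middle of the bimodal sample.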

You've identified several outliers using the modified Z-score method in your dataset. What could be the possible reasons for their existence?

All of these
The data may have been corrupted
The dataset may contain measurement errors
The dataset may have a complex, multi-modal distribution

All these reasons could lead to the existence of outliers in a dataset.
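
For reference, the modified Z-score replaces the mean and standard deviation with the median and the median absolute deviation (MAD). The sketch below assumes the commonly used constant 0.6745 and a cutoff of 3.5; the cutoff is a convention rather than a fixed rule, and the sample data is invented.

```python
import statistics

def modified_z_scores(values):
    """Modified Z-score: 0.6745 * (x - median) / MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]

# Hypothetical readings with one suspicious value.
data = [10, 12, 13, 15, 16, 18, 200]

THRESHOLD = 3.5  # assumed cutoff, a common convention
scores = modified_z_scores(data)
outliers = [v for v, s in zip(data, scores) if abs(s) > THRESHOLD]
```

The method only flags the value; deciding whether it comes from corruption, a measurement error, or a genuine second mode in the distribution still requires inspecting the data source.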

A high ________ suggests that data points are generally far from the mean, indicating a wide spread in the data set.

Mean
Median
Standard Deviation
Variance

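A high standard deviation suggests that data points are generally far from the mean, indicating a wide spread in the dataset. It measures the variability of a distribution in the same units as the data; the wider the spread, the higher the standard deviation. Variance also grows with the spread, but it is expressed in squared units, so it does not describe the typical distance from the mean directly.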
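
A small sketch of the idea: two made-up samples with the same mean but very different spreads. The population standard deviation (`statistics.pstdev`) is used here; choosing the sample version instead would not change the comparison.

```python
import statistics

# Hypothetical samples sharing the same mean of 50.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

mean_tight = statistics.mean(tight)
mean_wide = statistics.mean(wide)

sd_tight = statistics.pstdev(tight)   # small: points sit close to the mean
sd_wide = statistics.pstdev(wide)     # large: points sit far from the mean
```

The means are identical, so the mean alone cannot distinguish the two samples; the standard deviation can.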

When the distribution is skewed to the right, it is referred to as _________ skewness.

Any of these
Negative
Positive
Zero

Positive skewness refers to a distribution where the right tail is longer or fatter than the left tail. In such distributions, the majority of the values (including the median and the mode) tend to be less than the mean.
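
One way to check the sign of the skew is the moment-based coefficient (third central moment divided by the variance raised to the power 1.5). The sketch below uses that definition; the `skewness` helper and the sample values are assumptions made for this example.

```python
import statistics

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample: one large value stretches the right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
left_skewed = [-v for v in right_skewed]  # mirror image

skew_right = skewness(right_skewed)
skew_left = skewness(left_skewed)

mean_right = statistics.mean(right_skewed)
median_right = statistics.median(right_skewed)
```

In the right-skewed sample the coefficient is positive and the mean sits above the median; mirroring the data flips the sign of the coefficient.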

The final step of the EDA process, '______,' is about presenting your conclusions in an understandable way to your audience.

communicating
concluding
questioning
wrangling

The final step of the EDA process, 'communicating,' is about presenting your conclusions in an understandable way to your audience. Insights drawn from the data are only useful if the audience can follow and act on them.

A machine learning model is overfitting on a training dataset. How could feature selection be used to address this issue?

By increasing the model complexity
By increasing the number of features
By reducing the number of features
By transforming the features

Feature selection can be used to address overfitting by reducing the number of features. Overfitting occurs when a model learns the noise in the training data, leading to poor performance on unseen data. By reducing the number of features, the complexity of the model can be reduced, which in turn can help to mitigate overfitting.
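
As one possible illustration, the sketch below applies a simple filter method: it ranks features by the absolute Pearson correlation with the target and keeps the top `k`. This is only one of many feature selection strategies, and the helper names, feature names, and values are all hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_top_k(features, target, k):
    """Keep the k feature names with the strongest absolute correlation to the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]

# Hypothetical dataset: two informative features and one that is mostly noise.
features = {
    "size": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 4],
    "noise": [7, 1, 9, 2, 5],
}
price = [150, 180, 240, 310, 360]

kept = select_top_k(features, price, k=2)
```

Dropping the weakly related column leaves the model fewer inputs through which it could memorize noise. In practice the selection should be fitted on training data only, so the held-out data remains a fair test.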

How does the probability mass function of a Binomial Distribution change with different parameters?

All of the above
It alters the skewness and kurtosis
It changes the range of possible outcomes
It impacts the center of the distribution

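All three effects occur. The Binomial Distribution has two parameters: the number of trials n and the probability of success p in each trial. Changing n changes the range of possible outcomes (0 to n), the mean np determines the center of the distribution, and both parameters affect its shape: the distribution is symmetric at p = 0.5 and becomes increasingly skewed as p moves toward 0 or 1.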
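
To see how the two parameters of a Binomial Distribution (number of trials `n`, success probability `p`) reshape its probability mass function, the sketch below computes the PMF directly and summarizes its range, center, and skewness. The `summarize` helper and the chosen parameter values are assumptions for illustration.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def summarize(n, p):
    """Numerically summarize the PMF: support, mean, and skewness."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    skew = sum((k - mean) ** 3 * pk for k, pk in enumerate(pmf)) / var ** 1.5
    return {"support": (0, n), "total": sum(pmf), "mean": mean, "skewness": skew}

symmetric = summarize(n=10, p=0.5)   # centered at 5, no skew
lopsided = summarize(n=20, p=0.1)    # wider range, centered at 2, right-skewed
```

Changing `n` widens the set of possible outcomes, `n * p` moves the center, and moving `p` away from 0.5 introduces skew.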

What type of plot is ideal for visualizing relationships among more than two variables?

Bar plot
Box plot
Pairplot
Scatter plot

Pairplot is a type of plot that is ideal for visualizing relationships among more than two variables. It creates a grid of Axes such that each variable in your data is shared in the y-axis across a single row and in the x-axis across a single column.
