When data is normally distributed, approximately 95% of the data falls within ________ standard deviations of the mean.

Four
One
Three
Two

When data is normally distributed, approximately 95% of the data falls within two standard deviations of the mean. This is known as the empirical rule, or the 68-95-99.7 rule: in a normal distribution, about 68%, 95%, and 99.7% of values lie within one, two, and three standard deviations of the mean, respectively.
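
The percentages in the rule can be checked numerically. The following is a minimal sketch using Python's standard library; the helper name `fraction_within` is an illustrative assumption, not part of the question.

```python
from statistics import NormalDist


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area under the curve between -k and +k standard deviations.
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```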

How do filter, wrapper, and embedded methods for feature selection differ from each other?

By the bias-variance tradeoff
By the computational complexity
By the problem-solving approach
By their use of machine learning models

Filter methods for feature selection score each input feature with a statistical measure, such as its correlation with the target variable, and do not involve any specific machine learning algorithm. Wrapper methods train a specific machine learning algorithm on candidate feature subsets and select the features that contribute most to the model's performance. Embedded methods perform feature selection as part of the model training process itself. The three approaches therefore differ in how they use machine learning models.
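
As an illustration of the filter approach only, the sketch below ranks features by the absolute value of their Pearson correlation with the target and keeps the top `k`. No model is trained at any point, which is what separates it from wrapper and embedded methods. The function names and the toy data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def filter_select(features, target, k):
    """Filter method: keep the k features with the highest |correlation| to the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: two informative features and one unrelated one.
features = {
    "size": [50, 60, 80, 100, 120],
    "rooms": [2, 2, 3, 4, 4],
    "noise": [7, 1, 9, 3, 5],
}
price = [150, 180, 240, 310, 360]

selected = filter_select(features, price, k=2)
print(selected)  # ['size', 'rooms']
```

A wrapper method would instead retrain a chosen model on different feature subsets and compare its scores, which is usually more expensive.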

Which type of data can take on any value within a certain range?

Categorical data
Continuous data
Discrete data
Nominal data

Continuous data can take on any value within a certain range. For example, the height of a person can be any value within the range of human heights.

Suppose you have an overfitting model. You identify that missing data was incorrectly filled with a constant value. How might this have contributed to overfitting?

The model became too complex.
The model learned noise from the data.
The model was under-regularized.
The model's hyperparameters were not optimized.

Filling missing data with a constant value can introduce noise: the filled entries do not reflect the true distribution of the feature, so the model may learn this artifact along with the underlying patterns, which leads to overfitting.
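
The distortion is easy to see on a small example. This sketch uses hypothetical readings and compares a constant fill of `0.0` with a median fill; both the data and the choice of median as the alternative are assumptions made for illustration.

```python
from statistics import median, pstdev

# Hypothetical readings with two missing entries (None).
raw = [10.0, 12.0, None, 11.0, None, 13.0]
observed = [v for v in raw if v is not None]


def fill(values, replacement):
    """Replace every missing entry with the given replacement value."""
    return [replacement if v is None else v for v in values]


constant_filled = fill(raw, 0.0)               # arbitrary constant, far from the real data
median_filled = fill(raw, median(observed))    # value taken from the observed data

print("spread of observed values:   ", round(pstdev(observed), 3))
print("spread after constant fill:  ", round(pstdev(constant_filled), 3))
print("spread after median fill:    ", round(pstdev(median_filled), 3))
```

The constant fill inflates the spread several times over, which is the kind of artificial signal a flexible model can end up fitting.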

Which type of data analysis helps the most in feature selection for Machine Learning?

All of them equally contribute.
CDA
EDA
Predictive Modeling

Exploratory data analysis (EDA) plays a significant role in feature selection for machine learning. By exploring relationships between features and the target variable and identifying potential data issues such as multicollinearity, EDA helps analysts determine which features are most relevant for a given machine learning model.

A data scientist is working on a dataset with multiple categories and subcategories. What data visualization techniques can be used to ensure the readability and aesthetics of the data presentation?

Box plot, because it shows the range and outliers
Parallel coordinates, because it can represent multiple dimensions
Scatter plot, because it shows relationships between variables
Stacked bar chart or treemap, because they can show hierarchical data

Stacked bar charts and treemaps are suitable for visualizing data with multiple categories and subcategories (hierarchical data). These charts let viewers see the total size of each main category and the size of each subcategory within it.

Why is variance considered a squared measure?

Because it involves squaring the difference from the mean
Because it is always a perfect square
Because it's derived from the square of the data values
Because it's the square root of the standard deviation

Variance is considered a squared measure because it involves squaring the difference from the mean. Squaring prevents positive and negative differences from cancelling each other out.
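
A short worked example makes the cancellation visible. The sketch below uses an arbitrary sample and the population form of the variance (dividing by the number of values); both choices are assumptions made for illustration.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean_value = sum(data) / len(data)              # 40 / 8 = 5.0
deviations = [x - mean_value for x in data]     # differences from the mean

# Raw deviations cancel out, so their sum says nothing about spread.
sum_of_deviations = sum(deviations)             # 0.0

# Squared deviations are all non-negative, so they accumulate.
variance = sum(d ** 2 for d in deviations) / len(data)   # 32 / 8 = 4.0
std_dev = variance ** 0.5                                # 2.0

print(sum_of_deviations, variance, std_dev)  # 0.0 4.0 2.0
```

Because the deviations are squared, variance is expressed in squared units of the data, and taking the square root returns the standard deviation in the original units.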

What type of data is based on measurements or counts?

Nominal data
Ordinal data
Qualitative data
Quantitative data

Quantitative data is based on measurements or counts. It's typically numerical and can be used in mathematical and statistical operations.

Which measure of central tendency is calculated by adding all the numbers and dividing by the number of numbers?

Mean
Median
Mode

The mean is calculated by adding all the numbers in the data set and then dividing by the count of numbers. It is often referred to as the average and provides a single value that represents the center of the data.
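
As a quick check of the definition, this sketch computes the mean by hand for a few hypothetical values and compares it with Python's `statistics.mean`.

```python
from statistics import mean

values = [3, 5, 8, 12]  # hypothetical sample

# Add all the numbers, then divide by how many there are.
manual_mean = sum(values) / len(values)   # 28 / 4 = 7.0

print(manual_mean)                    # 7.0
print(manual_mean == mean(values))    # True
```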

What are some common methods to handle multicollinearity in a dataset?

All of these methods can be used.
Increasing the sample size
Performing Principal Component Analysis
Removing highly correlated variables

All of the listed methods can be used to handle multicollinearity. Depending on its severity and the specific context, you might remove highly correlated variables, increase the sample size, or perform Principal Component Analysis (PCA) to create a smaller set of uncorrelated variables.
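
Of the three options, removing highly correlated variables is the simplest to sketch. The example below greedily keeps a feature only if its absolute Pearson correlation with every feature already kept stays at or below a threshold. The function names, the `0.9` threshold, and the toy data are all hypothetical choices, not a prescribed procedure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical data: the same height recorded in two units, plus an unrelated feature.
features = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [30, 22, 41, 35, 27],
}

kept = drop_correlated(features)
print(kept)  # ['height_cm', 'age']
```

Which member of a correlated pair survives depends on the iteration order, so in practice the choice is often guided by domain knowledge.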
