A company has asked you to build a model that can predict customer churn based on a set of features. Which type of data analysis will you perform?

All are equally suitable
CDA
EDA
Predictive Modeling

Predictive Modeling is the most suitable choice here. It applies machine learning algorithms to historical data in order to predict future outcomes, in this case which customers are likely to churn.
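As a rough illustration of what "predicting churn from features" can look like, here is a minimal sketch of logistic regression trained with plain gradient descent. The two features, the toy data, and the function names are all hypothetical; a real project would more likely use an established library and a proper train/test split.

```python
import math

# Hypothetical toy data: (scaled monthly charges, scaled tenure) -> churned (1) or stayed (0).
X = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.1), (0.2, 0.9), (0.3, 0.8), (0.1, 0.7)]
y = [1, 1, 1, 0, 0, 0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights and bias with batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this customer under the current parameters.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that a customer with features x churns."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

w, b = fit_logistic(X, y)
print(predict_proba(w, b, (0.85, 0.15)))  # high charges, short tenure
print(predict_proba(w, b, (0.15, 0.85)))  # low charges, long tenure
```

The first customer resembles the churners in the toy data, so the model should assign a probability above 0.5; the second resembles the retained customers.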

How does the choice of model in a model-based method impact the imputation process?

The choice of model can cause overfitting
The choice of model can influence the accuracy of the imputations
The choice of model can introduce unnecessary complexity
The choice of model has no impact

The choice of model in a model-based method can significantly influence the accuracy of the imputations. If the chosen model closely matches the actual data-generating process, the imputations will tend to be accurate. If the model is a poor fit, the imputed values may be far from the true values, leading to biased results.
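A small sketch can make this concrete. The data below are hypothetical and generated as y = 2x + 1, with the last y value treated as missing. A linear regression imputation matches the data-generating process, while a mean imputation (a model that ignores x) does not.

```python
from statistics import mean

# Hypothetical data: y = 2x + 1, and the last y value is missing (true value 13.0).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0, None]

observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]

def fit_line(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx = mean(p[0] for p in pairs)
    my = mean(p[1] for p in pairs)
    sxy = sum((px - mx) * (py - my) for px, py in pairs)
    sxx = sum((px - mx) ** 2 for px, _ in pairs)
    slope = sxy / sxx
    return slope, my - slope * mx

slope, intercept = fit_line(observed)

# Model that matches how the data were generated.
regression_imputed = slope * x[-1] + intercept
# Model that ignores x entirely: a poor fit for this data.
mean_imputed = mean(yi for _, yi in observed)

print(regression_imputed, mean_imputed)
```

Here the regression imputation recovers the true value of 13, while the mean imputation returns 7, which illustrates how a poorly chosen model biases the filled-in values.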

What is the biggest challenge in the 'wrangle' phase of the EDA process?

Communicating the insights
Dealing with missing values and other inconsistencies in the data
Defining the right questions
Drawing conclusions from the data

The biggest challenge in the wrangling phase of the EDA process is dealing with data quality issues. These can include missing values, inconsistent data entries, outliers, and other anomalies. The analyst often has to make informed decisions about how to handle these issues without introducing bias or distorting the underlying information in the data.
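The following sketch shows two typical wrangling steps on a handful of hypothetical records: normalizing inconsistent text labels and mapping placeholder strings to an explicit missing value. The field names and the set of "missing" tokens are assumptions chosen for illustration.

```python
# Hypothetical raw records with typical quality problems.
raw = [
    {"city": " New York", "age": "34"},
    {"city": "new york", "age": ""},
    {"city": "Boston", "age": "29"},
    {"city": "BOSTON ", "age": "n/a"},
]

# Assumed placeholder strings that should be treated as missing.
MISSING_TOKENS = {"", "n/a", "na", "null"}

def clean(record):
    """Normalize the city label and map known missing-value tokens to None."""
    city = record["city"].strip().title()
    age_text = record["age"].strip().lower()
    age = None if age_text in MISSING_TOKENS else int(age_text)
    return {"city": city, "age": age}

cleaned = [clean(r) for r in raw]
missing_age_count = sum(r["age"] is None for r in cleaned)

print(cleaned)
print(missing_age_count)
```

Counting the missing values before deciding how to fill or drop them is one way to keep the handling decision explicit rather than accidental.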

How can outliers influence the mean of a dataset?

Can either increase or decrease the mean
Decrease the mean
Does not affect the mean
Increase the mean

Outliers can have a large impact on the mean because every value contributes to it. Depending on whether the outlier is much higher or much lower than the other values, it can pull the mean up or down, so the mean may no longer represent a typical value in the dataset.
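A quick check with made-up numbers shows the effect in both directions. The median is included for comparison because it is far less sensitive to a single extreme value.

```python
from statistics import mean, median

# Hypothetical measurements clustered around 12.
values = [10, 12, 11, 13, 12]
with_high = values + [100]   # one unusually high value
with_low = values + [-80]    # one unusually low value

print(mean(values), median(values))
print(mean(with_high), median(with_high))  # mean pulled upward
print(mean(with_low), median(with_low))    # mean pulled downward
```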

In a positively skewed distribution, which is greater: mean or median?

Both are equal
Mean
Median

In a positively skewed distribution, the mean is generally greater than the median. Positive skewness means that the distribution has a long right tail, so extreme values in the positive direction pull the mean upward more than they move the median.
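The relationship can be checked on a small right-tailed sample. The data are hypothetical, and the skewness used here is the simple moment-based coefficient (third central moment divided by the 1.5 power of the second), which is one of several common definitions.

```python
from statistics import mean, median

# Hypothetical sample with a long right tail.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20, 40]

def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (population moments)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

print(skewness(data))            # positive for a right-tailed sample
print(mean(data), median(data))  # the mean sits above the median
```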

In a leptokurtic distribution, the excess kurtosis value is ___________ than 0.

Any of these
Equal
Greater
Less

A leptokurtic distribution has excess kurtosis greater than 0 (equivalently, kurtosis greater than 3, the value for a normal distribution). This indicates heavier tails, and often a sharper peak, than a normal distribution.
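To see where the threshold of 0 comes from, the sketch below computes excess kurtosis as the fourth central moment divided by the squared variance, minus 3. Both samples are hypothetical: one has a few extreme values in the tails, the other is evenly spread.

```python
from statistics import mean

def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

# Hypothetical heavy-tailed sample: mostly near 0, with two extreme values.
heavy_tailed = [-10, -1, -1, 0, 0, 0, 0, 1, 1, 10]
# Hypothetical evenly spread sample with no extreme values.
evenly_spread = [1, 2, 3, 4, 5, 6, 7, 8, 9]

print(excess_kurtosis(heavy_tailed))   # positive: leptokurtic
print(excess_kurtosis(evenly_spread))  # negative: platykurtic
```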

Under what circumstances is NMAR typically observed in a dataset?

All of the above
When data missingness is associated with the missing data itself
When data missingness is random
When data missingness is unrelated to observed and unobserved data

NMAR (Not Missing At Random), also written MNAR (Missing Not At Random), is typically observed when the missingness is related to the value of the missing data itself. This is the most challenging type of missingness to handle because the missingness mechanism depends on values that were never observed.
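A small simulation shows why this is hard to handle. The incomes and the non-response rule below are hypothetical: respondents with an income above 100 are assumed not to report it, so the missingness depends on the very value that is missing.

```python
from statistics import mean

# Hypothetical true incomes (in thousands).
true_incomes = [20, 25, 30, 35, 40, 60, 80, 120, 200]

# Assumed NMAR mechanism: incomes above 100 are never reported.
reported = [income if income <= 100 else None for income in true_incomes]

observed = [income for income in reported if income is not None]
true_mean = mean(true_incomes)
observed_mean = mean(observed)

print(true_mean, observed_mean)  # the observed mean understates the true mean
```

Because the analyst only sees the reported values, nothing in the observed data reveals the size of this bias, which is what separates NMAR from the other missingness types.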

The '______' step in the EDA process involves formulating the questions you want to answer with your data.

communicating
concluding
questioning
wrangling

The first step in the EDA process, 'questioning,' involves formulating the questions that you want to answer with your data. It's during this step that you define what you want to achieve with your analysis and what problems you are trying to solve.

________ correlation is more appropriate when dealing with ordinal variables.

Covariance
Kendall's Tau
Pearson's
Spearman's

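Spearman's correlation is more appropriate when dealing with ordinal variables. Unlike Pearson's, which assumes interval-scaled data and measures linear association, Spearman's correlation works with ranks, which makes it suitable for ordinal data. Kendall's Tau is also rank-based and is a reasonable alternative, particularly with small samples or many tied ranks.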
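One way to see why ranks matter is to compute Spearman's correlation by hand: replace each value with its rank (averaging ranks for ties) and then apply the Pearson formula to the ranks. The helper names and survey scores below are hypothetical.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

def spearman(a, b):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical ordinal survey scores (1 = lowest, 5 = highest).
satisfaction = [1, 2, 2, 3, 4, 5, 5]
loyalty = [1, 1, 2, 3, 3, 4, 5]
print(spearman(satisfaction, loyalty))

# A monotonic but non-linear relationship: ranks agree perfectly.
print(spearman([1, 2, 3, 4, 5], [2, 4, 8, 16, 32]))
print(pearson([1, 2, 3, 4, 5], [2, 4, 8, 16, 32]))
```

For the monotonic example the Spearman value is 1 (up to floating-point rounding), while the Pearson value is lower because the relationship is not linear.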

Anomalies or outliers in the dataset can be identified through the process of ________.

CDA
EDA
Machine Learning
Predictive Modeling

Anomalies or outliers in the dataset can be identified through the process of EDA. Various techniques such as visualization methods (like box plots and scatter plots) and statistical methods (like the IQR method or the Z-score method) can be used to detect outliers during EDA.
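The two statistical methods mentioned above can be sketched as follows. The data are hypothetical, the 1.5 x IQR fences follow the usual box plot convention, and the quartiles come from `statistics.quantiles` with its default method, so other quartile definitions may give slightly different fences. The z-score threshold of 2.5 is an assumption made for this tiny sample: with only ten points and the population standard deviation, no z-score can exceed 3.

```python
from statistics import mean, pstdev, quantiles

# Hypothetical measurements with one suspicious value.
data = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95]

def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def zscore_outliers(values, threshold=3.0):
    """Values whose absolute z-score exceeds the threshold."""
    m = mean(values)
    sd = pstdev(values)
    return [v for v in values if abs(v - m) / sd > threshold]

print(iqr_outliers(data))
print(zscore_outliers(data, threshold=2.5))
```

Both methods flag the value 95 here. Note that the outlier itself inflates the mean and standard deviation used by the z-score method, which is one reason the IQR method is often preferred for small or heavily contaminated samples.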
