How can a Uniform Distribution be transformed into a Normal Distribution?

  • By adding a constant to each value
  • By applying the Central Limit Theorem
  • By squaring each value
  • It can't be transformed
A Uniform Distribution can be transformed into an approximately Normal Distribution by applying the Central Limit Theorem, which states that the sum (or mean) of a large number of independent and identically distributed random variables tends toward a Normal Distribution, regardless of the shape of their original distribution.
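
As a minimal NumPy sketch (the number of summed terms and the sample count are arbitrary illustrative choices, not part of the question), summing many Uniform(0, 1) draws produces values whose distribution is approximately Normal:

```python
import numpy as np

rng = np.random.default_rng(42)

# Sum 30 i.i.d. Uniform(0, 1) draws per sample; by the CLT the sums
# are approximately Normal with mean n/2 and variance n/12.
n, samples = 30, 100_000
sums = rng.uniform(0, 1, size=(samples, n)).sum(axis=1)

print(f"mean ≈ {sums.mean():.3f} (expected {n / 2})")
print(f"std  ≈ {sums.std():.3f} (expected {np.sqrt(n / 12):.3f})")
```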

You are working with a normally distributed data set. How would the standard deviation help you understand the data?

  • It can tell you how spread out the data is around the mean
  • It can tell you the range of the data
  • It can tell you the skewness of the data
  • It can tell you where the outliers are
For a normally distributed dataset, the standard deviation tells you how spread out the data is around the mean. Under the empirical rule, about 68% of values fall within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations.
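
A quick check of the 68-95-99.7 rule on synthetic normal data (the mean, standard deviation, and sample size below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=100_000)  # synthetic normal data

mean, std = data.mean(), data.std()
for k in (1, 2, 3):
    within = np.mean(np.abs(data - mean) <= k * std)
    print(f"within {k} std: {within:.1%}")  # ~68%, ~95%, ~99.7%
```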

You've created a pairplot of your dataset, and one scatter plot in the grid shows a clear linear pattern. What could this potentially indicate?

  • The two variables are highly uncorrelated
  • The two variables are unrelated
  • The two variables have a strong linear relationship
  • The two variables have no relationship
If a scatter plot in a pairplot shows a clear linear pattern, this could potentially indicate that the two variables have a strong linear relationship, meaning that changes in one variable tend to correspond to proportional changes in the other.
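
For illustration, a small seaborn/pandas sketch with invented columns x, y, and z: the x–y panel of the pairplot shows the linear pattern, and the correlation matrix confirms it numerically:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.normal(size=500)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.3, size=500),  # strongly linear in x
    "z": rng.normal(size=500),                     # unrelated noise
})

sns.pairplot(df)           # the x–y panel shows a clear linear pattern
print(df.corr().round(2))  # Pearson correlation confirms it numerically
plt.show()
```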

A team of researchers has already formulated their hypotheses and now they want to test these against their collected data. What type of data analysis would be appropriate?

  • All are equally suitable
  • CDA
  • EDA
  • Predictive Modeling
Confirmatory Data Analysis (CDA) would be the most appropriate, as it involves testing pre-formulated hypotheses against the collected data to either confirm or refute them.
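
As one possible confirmatory workflow (the groups, effect size, and significance threshold below are illustrative assumptions), a two-sample t-test with SciPy tests a pre-formulated hypothesis about group means:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical pre-formulated hypothesis: the two groups have different means.
group_a = rng.normal(loc=10.0, scale=2.0, size=200)
group_b = rng.normal(loc=10.8, scale=2.0, size=200)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject the null hypothesis of equal means.")
else:
    print("Fail to reject the null hypothesis.")
```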

When performing a pairwise analysis, _____ deletion discards only the specific pairs of data where one is missing.

  • Listwise
  • Pairwise
  • Random
  • Systematic
When performing a pairwise analysis, 'pairwise' deletion excludes a case only from the specific pairwise calculations in which one of the two values is missing. It retains more data than listwise deletion, but it can lead to biased results if the data are not missing completely at random.
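
A small pandas sketch of the difference (the toy DataFrame is invented for illustration): DataFrame.corr() applies pairwise deletion by default, while calling dropna() first gives listwise deletion:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],
    "b": [2.0, np.nan, 6.0, 8.0, 10.0],
    "c": [5.0, 4.0, 3.0, 2.0, 1.0],
})

# Pairwise deletion: each correlation uses every row where *that pair*
# of columns is complete (pandas' default behaviour for .corr()).
print(df.corr())

# Listwise deletion: drop any row with a missing value first,
# so every correlation is computed on the same (smaller) subset.
print(df.dropna().corr())
```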

In regression analysis, if the Variance Inflation Factor (VIF) for a predictor is 1, this means that _________.

  • The predictor is not at all correlated with other predictors
  • The predictor is not at all correlated with the response
  • The predictor is perfectly correlated with other predictors
  • The predictor is perfectly correlated with the response
In regression analysis, a Variance Inflation Factor (VIF) of 1 indicates that the given predictor is not linearly correlated with the other predictors, implying no multicollinearity involving that predictor.
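
A short statsmodels sketch on synthetic predictors (the column names and collinearity strength are assumptions for illustration): the independent predictor's VIF sits near 1, while the collinear pair's VIFs are much larger:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
X = pd.DataFrame({
    "x1": rng.normal(size=300),
    "x2": rng.normal(size=300),
})
X["x3"] = X["x1"] * 0.9 + rng.normal(scale=0.1, size=300)  # nearly collinear with x1

exog = sm.add_constant(X)  # include an intercept before computing VIFs
for i, col in enumerate(exog.columns):
    if col != "const":
        print(f"VIF({col}) = {variance_inflation_factor(exog.values, i):.2f}")
# x2 should have a VIF near 1; x1 and x3 should be much larger.
```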

Why might PCA be considered a method of feature selection?

  • It can handle correlated features
  • It can improve model performance
  • It can reduce the dimensionality of the data
  • It transforms the data into a new space
Principal Component Analysis (PCA) can be considered a method of feature selection because it reduces the dimensionality of the data by transforming the original features into a new set of uncorrelated features. These new features, called principal components, are linear combinations of the original features and are selected to capture the most variance in the data.
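
As a brief scikit-learn sketch on synthetic data (the 95% variance threshold and the feature setup are illustrative choices), PCA reduces the feature space while reporting how much variance each component captures:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(5)
X = rng.normal(size=(500, 5))
X[:, 3] = X[:, 0] * 0.8 + rng.normal(scale=0.2, size=500)  # correlated feature

pca = PCA(n_components=0.95)  # keep enough components for 95% of the variance
X_reduced = pca.fit_transform(X)

print("original shape:", X.shape)
print("reduced shape: ", X_reduced.shape)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```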

You have found that your dataset has a high degree of multicollinearity. What steps would you consider to rectify this issue?

  • Add more data points
  • Increase the model bias
  • Increase the model complexity
  • Use Principal Component Analysis (PCA)
One way to rectify multicollinearity is to use Principal Component Analysis (PCA). PCA transforms the original variables into a new set of uncorrelated variables, thereby removing multicollinearity.
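
A minimal sketch of that effect on invented, deliberately collinear data: the original features show a strong off-diagonal correlation, while the principal components are, by construction, uncorrelated:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
x1 = rng.normal(size=400)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=400)  # highly collinear with x1
x3 = rng.normal(size=400)
X = np.column_stack([x1, x2, x3])

print(np.corrcoef(X, rowvar=False).round(2))  # strong off-diagonal correlation

components = PCA().fit_transform(X)
print(np.corrcoef(components, rowvar=False).round(2))  # ~identity: uncorrelated
```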

Which of the following best describes qualitative data?

  • Data that can be categorized
  • Data that can be ordered
  • Data that can take any value
  • Data that is numerical in nature
Qualitative data refers to non-numerical information that can be categorized based on traits and characteristics. It captures information that cannot be simply expressed in numbers.
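
For example, in pandas a qualitative column can be stored as a categorical type and summarized by counts rather than arithmetic (the columns below are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "blood_type": ["A", "O", "B", "AB", "O"],           # qualitative: categories
    "height_cm": [172.0, 168.5, 181.2, 159.9, 175.3],   # quantitative: numerical
})

df["blood_type"] = df["blood_type"].astype("category")
print(df.dtypes)
print(df["blood_type"].value_counts())  # categorical data is summarized by counts
```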

In the context of EDA, what does the concept of "data wrangling" entail?

  • Calculating descriptive statistics for the dataset
  • Cleaning, transforming, and reshaping raw data
  • Training and validating a machine learning model
  • Visualizing the data using charts and graphs
In the context of EDA, "data wrangling" involves cleaning, transforming, and reshaping raw data. This could include dealing with missing or inconsistent data, transforming variables, or restructuring data frames for easier analysis.