In a scatter plot, the _____ and _____ of the dots represent the values of two different variables.

  • color, size
  • position, color
  • position, size
  • shape, color
In a scatter plot, the position of each dot along the X (horizontal) and Y (vertical) axes represents the values of two different variables. The overall pattern of the dots shows the type and strength of the relationship between the two variables.
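
The "strength of the relationship" that a scatter plot shows visually can also be summarized as a number. The sketch below computes the Pearson correlation coefficient, which measures only linear association; the data and the variable names `hours_studied` and `exam_score` are hypothetical and chosen purely for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: each (x, y) pair would be one dot on the scatter plot.
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [52, 55, 61, 64, 70, 74]

r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.3f}")  # close to +1: a strong positive linear relationship
```

A value near +1 or -1 corresponds to dots that fall close to a straight line, while a value near 0 corresponds to a cloud with no linear trend.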

You've identified several outliers using the modified Z-score method in your dataset. What could be the possible reasons for their existence?

  • All of these
  • The data may have been corrupted
  • The dataset may contain measurement errors
  • The dataset may have a complex, multi-modal distribution
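Any of these can produce outliers. Corrupted data and measurement errors introduce values that do not reflect the underlying process, while a complex, multi-modal distribution can make legitimate points look extreme relative to a single center.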
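
For reference, the modified Z-score method mentioned in the question can be sketched as follows. It replaces the mean and standard deviation with the median and the median absolute deviation (MAD). The constant 0.6745 and the cutoff of 3.5 are the commonly cited conventions, assumed here rather than taken from this quiz, and the `readings` data is hypothetical.

```python
import statistics

def modified_z_scores(values):
    """Modified Z-scores based on the median and the MAD."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]

# Hypothetical sensor readings with one suspicious value.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]

scores = modified_z_scores(readings)
# Flag points whose absolute modified Z-score exceeds the assumed cutoff of 3.5.
outliers = [v for v, z in zip(readings, scores) if abs(z) > 3.5]
print(outliers)  # [25.0]
```

Flagging a point only says it is unusual; deciding whether it is corruption, measurement error, or a second mode still requires looking at how the data was collected.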

A high ________ suggests that data points are generally far from the mean, indicating a wide spread in the data set.

  • Mean
  • Median
  • Standard Deviation
  • Variance
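A high standard deviation suggests that data points are generally far from the mean, indicating a wide spread in the dataset. It measures the absolute variability of a distribution in the same units as the data; the wider the spread, the higher the standard deviation. Variance also grows with spread, but it is expressed in squared units, so the standard deviation is the more direct measure of distance from the mean.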
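
A small worked example may help here. The two hypothetical datasets below share the same mean but differ in spread, and the sketch uses the population formulas from Python's `statistics` module.

```python
import statistics

# Two hypothetical datasets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

mean_tight = statistics.mean(tight)
mean_wide = statistics.mean(wide)

# Population standard deviation: square root of the mean squared deviation.
sd_tight = statistics.pstdev(tight)
sd_wide = statistics.pstdev(wide)

print(mean_tight, mean_wide)        # both means are 50
print(round(sd_tight, 2), round(sd_wide, 2))  # 1.41 28.28
```

The means are identical, so only the standard deviation reveals that the second dataset is far more spread out.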

When the distribution is skewed to the right, it is referred to as _________ skewness.

  • Any of these
  • Negative
  • Positive
  • Zero
Positive skewness refers to a distribution where the right tail is longer or fatter than the left tail. In such distributions, the long tail pulls the mean upward, so the median and the mode typically lie below the mean.
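
To make the sign convention concrete, here is a minimal sketch that computes the moment-based (population) skewness, one of several skewness definitions in use. The `incomes` data is hypothetical, with a single large value forming the right tail.

```python
import statistics

def population_skewness(values):
    """Moment-based skewness: the mean of the cubed standardized deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("skewness is undefined for constant data")
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical right-skewed data: most values are small, one is very large.
incomes = [20, 22, 23, 25, 26, 28, 30, 95]

skew = population_skewness(incomes)
mean_income = statistics.fmean(incomes)
median_income = statistics.median(incomes)

print(skew > 0)                      # True: positive (right) skew
print(mean_income > median_income)   # True: the tail pulls the mean upward
```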

The final step of the EDA process, '______,' is about presenting your conclusions in an understandable way to your audience.

  • communicating
  • concluding
  • questioning
  • wrangling
The final step of the EDA process, 'communicating,' is about presenting your conclusions in an understandable way to your audience. Insights drawn from the data are only useful if the audience can follow and act on them.

A machine learning model is overfitting on a training dataset. How could feature selection be used to address this issue?

  • By increasing the model complexity
  • By increasing the number of features
  • By reducing the number of features
  • By transforming the features
Feature selection can be used to address overfitting by reducing the number of features. Overfitting occurs when a model learns the noise in the training data, leading to poor performance on unseen data. By reducing the number of features, the complexity of the model can be reduced, which in turn can help to mitigate overfitting.
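
As one illustration of reducing the number of features, the sketch below applies a simple variance filter that drops columns carrying almost no information. This is only one of many feature selection strategies and is not prescribed by the question; the function name `drop_low_variance`, the threshold, and the `features` data are all hypothetical.

```python
import statistics

def drop_low_variance(columns, threshold=0.0):
    """Keep only the columns whose population variance exceeds `threshold`."""
    return {
        name: values
        for name, values in columns.items()
        if statistics.pvariance(values) > threshold
    }

# Hypothetical feature table stored as column name -> list of values.
features = {
    "age": [23, 35, 41, 29, 52],
    "always_one": [1, 1, 1, 1, 1],        # constant, so it carries no signal
    "height_cm": [170, 165, 180, 175, 160],
}

selected = drop_low_variance(features)
print(list(selected))  # ['age', 'height_cm']
```

In practice the selected subset would be validated on held-out data, since removing features only helps if performance on unseen data improves.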

How does the probability mass function of a Binomial Distribution change with different parameters?

  • All of the above
  • It alters the skewness and kurtosis
  • It changes the range of possible outcomes
  • It impacts the center of the distribution
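The probability mass function of a Binomial Distribution depends on two parameters: the number of trials n and the probability of success p in each trial. The number of trials sets the range of possible outcomes (0 to n), the product np sets the center of the distribution, and n and p together determine its skewness and kurtosis. The distribution is symmetric when p = 0.5 and becomes more skewed as p moves toward 0 or 1.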
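
The effect of the parameters can be checked numerically. The sketch below uses the standard binomial formulas (mean np, variance np(1 - p), skewness (1 - 2p) / sqrt(np(1 - p))); the helper names `binomial_pmf` and `binomial_summary` are introduced here for illustration only.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_summary(n, p):
    """Return (mean, variance, skewness) for Binomial(n, p) with 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    variance = n * p * (1 - p)
    skewness = (1 - 2 * p) / math.sqrt(variance)
    return n * p, variance, skewness

# Changing n widens the range 0..n; changing p moves the center and the skew.
for n, p in [(10, 0.1), (10, 0.5), (20, 0.5)]:
    mean, variance, skewness = binomial_summary(n, p)
    print(f"n={n}, p={p}: outcomes 0..{n}, mean={mean:.1f}, skew={skewness:.2f}")

print(binomial_pmf(2, 4, 0.5))  # 0.375
```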

What type of plot is ideal for visualizing relationships among more than two variables?

  • Bar plot
  • Box plot
  • Pairplot
  • Scatter plot
A pairplot is ideal for visualizing relationships among more than two variables. It creates a grid of axes in which each variable is plotted on the y-axis across a single row and on the x-axis across a single column, so every pair of variables gets its own panel.

How does the uncertainty level differ in EDA, CDA, and Predictive Modeling?

  • Uncertainty is equally distributed among all three.
  • Uncertainty is highest in CDA, lower in Predictive Modeling, and lowest in EDA.
  • Uncertainty is highest in EDA, lower in CDA, and lowest in Predictive Modeling.
  • Uncertainty is highest in Predictive Modeling, lower in CDA, and lowest in EDA.
In EDA, where the primary aim is to explore patterns and relationships in the data, the level of uncertainty is highest. This reduces in CDA, which seeks to confirm the hypotheses generated during EDA. The uncertainty level is lowest in Predictive Modeling as it builds on the outcomes of EDA and CDA to make future predictions.

Can the steps of the EDA process be re-ordered or are they strictly sequential?

  • Some steps can be reordered, but not all.
  • The order of steps depends on the data set size.
  • They are strictly sequential and cannot be reordered.
  • They can be reordered based on the analysis needs.
The EDA process is generally sequential, starting from questioning and ending in communication. However, depending on the nature and needs of the analysis, some steps might be revisited. For instance, new questions might emerge during the explore phase, necessitating going back to the questioning phase. Or, additional data wrangling might be needed after exploring the data.