You have a dataset with missing values and you've chosen to use multiple imputation. However, the results after applying multiple imputation are not as expected. What factors might be causing this?

  • Both too few and too many imputations
  • The model used for imputation is perfect
  • Too few imputations
  • Too many imputations
If too few imputations are used, the pooled estimates carry extra simulation noise, so the results may not be accurate. This may lead to an underestimation of standard errors and incorrect statistical inference. Increasing the number of imputations generally leads to more stable and accurate results, at the cost of more computation.
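
Where the number of imputations m enters the pooled uncertainty can be sketched with Rubin's rules, the usual way to combine results across imputed datasets. The function name `pool_rubin` and the input numbers are hypothetical, chosen only for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    q_bar = mean(estimates)      # pooled point estimate
    w = mean(variances)          # within-imputation variance
    b = variance(estimates)      # between-imputation variance (divides by m - 1)
    t = w + (1 + 1 / m) * b      # total variance of the pooled estimate
    return q_bar, t


# Hypothetical results from m = 5 imputed datasets.
estimates = [2.0, 2.4, 1.8, 2.2, 2.1]
variances = [0.10, 0.12, 0.09, 0.11, 0.10]

pooled, total_var = pool_rubin(estimates, variances)
print(pooled, total_var)
```

With a small m, the between-imputation variance is itself estimated from only a handful of values, which is one reason the pooled standard error can be unreliable.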

What type of data typically requires more complex statistical methods for analysis?

  • Categorical data
  • Continuous data
  • Discrete data
  • Ordinal data
Continuous data often calls for more complex statistical methods because it can take on any value within a certain range. Analyses commonly rely on techniques such as regression, hypothesis tests, and graphical summaries such as histograms or density plots.

In a box plot, outliers are typically represented as ______.

  • boxes
  • dots
  • lines
  • whiskers
In a box plot, outliers are typically represented as dots or points that fall outside the whiskers of the box.
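
Which points end up drawn as dots depends on the whisker rule. A minimal sketch, assuming the common 1.5 × IQR convention (the source does not say which rule is meant); `tukey_outliers` and the data are hypothetical.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return values lying beyond k * IQR from the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


data = [12, 13, 14, 15, 15, 16, 17, 18, 19, 45]
outliers = tukey_outliers(data)
print(outliers)  # [45]
```

Plotting libraries may compute quartiles slightly differently, so the exact fences can vary from tool to tool.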

In what way does improper handling of missing data affect the generalization capability of a model?

  • Depends on the amount of missing data.
  • Hampers generalization.
  • Improves generalization.
  • No effect on generalization.
Improper handling of missing data can lead to the model learning incorrect or misleading patterns from the data. This can hamper the model's ability to generalize well to unseen data.

What is the key visual feature of a scatter plot that may indicate the presence of outliers?

  • Color coding
  • Legends
  • Points far away from the general grouping
  • Trend line
Points that are far away from the general grouping in a scatter plot may indicate the presence of outliers.

The diagonals of a pairplot often show the _____ of the individual variables.

  • frequency distribution
  • mean
  • mode
  • standard deviation
In a pairplot, the diagonal panels typically show the frequency distribution of each individual variable, for example as a histogram or density estimate. This provides an understanding of the distribution of individual variables in addition to their pairwise relationships, which appear in the off-diagonal panels.

The degree of tailedness in a distribution is measured by _________.

  • Kurtosis
  • Skewness
  • Standard Deviation
  • Variance
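Kurtosis is a statistical measure of how heavy a distribution's tails are relative to a normal distribution, or in simple terms, the 'tailedness' of the distribution. High kurtosis indicates heavy tails and more extreme values. It is often described as measuring peakedness or flatness, but that reading is unreliable because the statistic is driven mainly by the tails.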
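
A minimal sketch of excess kurtosis, computed from the second and fourth central moments with population (biased) estimators, which is an assumption made here for simplicity. The function name and the two samples are hypothetical.

```python
from statistics import fmean


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (a normal distribution gives 0)."""
    mu = fmean(values)
    m2 = fmean([(v - mu) ** 2 for v in values])  # second central moment
    m4 = fmean([(v - mu) ** 4 for v in values])  # fourth central moment
    return m4 / m2 ** 2 - 3.0


light_tails = [-1, -1, 1, 1]                       # no extreme values
heavy_tails = [-5, 0, 0, 0, 0, 0, 0, 0, 0, 5]      # two extreme values

print(excess_kurtosis(light_tails))  # -2.0
print(excess_kurtosis(heavy_tails))  # 2.0
```

The sample with rare extreme values scores higher, which matches the 'tailedness' reading.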

In which plot can we see the distribution, median, quartiles, and outliers all at once?

  • Bar chart
  • Box plot
  • Pie chart
  • Scatter plot
A box plot, also known as a box-and-whisker plot, displays a summary of the data values including the minimum, first quartile (25th percentile), median, third quartile (75th percentile), and maximum. Outliers are also often indicated in box plots as individual markers beyond the whiskers.
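
The five values a box plot summarizes can be computed directly. This is a sketch assuming the "inclusive" quartile method of Python's `statistics.quantiles`; `five_number_summary` and the scores are hypothetical.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return min, Q1, median, Q3, and max of the data."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


scores = [52, 60, 61, 65, 70, 72, 75, 80, 95]
summary = five_number_summary(scores)
print(summary)
```

When outliers are drawn separately, the whiskers usually stop at the most extreme non-outlying points rather than at the true minimum and maximum.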

_____ data is a type of qualitative data that can be sorted into non-numerical categories.

  • Nominal
  • Ordinal
  • Qualitative
  • Quantitative
Nominal data is a type of qualitative data that can be sorted into non-numerical categories, with no order or priority.

One major advantage of _______ methods over filter methods for feature selection is that they can capture the interaction between input features.

  • Embedded
  • Filter
  • PCA
  • Wrapper
One major advantage of wrapper methods over filter methods for feature selection is that they can capture the interaction between input features. Unlike filter methods, which typically score each feature independently of the others, wrapper methods evaluate candidate subsets of features with a model and can thus capture interactions among features.
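
The difference can be illustrated with a toy XOR target, where no single feature is informative but a pair is. Everything below is a hypothetical sketch: the data, the covariance-based filter score, and the lookup-table "model" scored on its own training data. A real wrapper would score subsets on held-out data.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: y = x0 XOR x1, and x2 is an unrelated column.
X = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1),
     (0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
y = [a ^ b for a, b, _ in X]


def abs_covariance(xs, ys):
    """Filter-style score: each feature is judged on its own."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return abs(sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / len(xs))


def subset_accuracy(X, y, subset):
    """Wrapper-style score: accuracy of a majority-vote lookup table."""
    groups = defaultdict(Counter)
    for row, label in zip(X, y):
        groups[tuple(row[i] for i in subset)][label] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(y)


filter_scores = [abs_covariance([row[i] for row in X], y) for i in range(3)]

subsets = [s for r in (1, 2, 3) for s in combinations(range(3), r)]
# Prefer higher accuracy, then smaller subsets.
best = max(subsets, key=lambda s: (subset_accuracy(X, y, s), -len(s)))

print(filter_scores)  # [0.0, 0.0, 0.0]
print(best)           # (0, 1)
```

The per-feature scores are all zero, so a filter would have no reason to keep x0 or x1, while the subset search finds that the two together predict the target.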