How does the Min-Max scaling differ from standardization when it comes to handling outliers?

  • Both handle outliers in the same way
  • Min-Max scaling is more sensitive to outliers than standardization
  • Min-Max scaling removes outliers, while standardization doesn't
  • Standardization is more sensitive to outliers than Min-Max scaling
Min-Max scaling is more sensitive to outliers than standardization. Min-Max scaling maps values into a fixed range (typically [0, 1]) using the minimum and maximum of the data, so a single extreme value can compress the majority of the data into a small interval. Standardization also uses statistics that outliers influence (the mean and standard deviation), but it has no bounding range, so the remaining data is not squeezed into a narrow band in the same way, which makes it more suitable when outliers are present.
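As a quick illustration, here is a minimal NumPy sketch on a made-up six-value array with one extreme point; the comments note the approximate outputs:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])  # 100.0 is the outlier

# Min-Max scaling: bounded to [0, 1] by the min and max of the data
min_max = (data - data.min()) / (data.max() - data.min())
print(min_max)  # the five non-outlier points are compressed into [0.0, ~0.04]

# Standardization: zero mean, unit variance, no bounding range
standardized = (data - data.mean()) / data.std()
print(standardized)  # non-outliers sit near -0.5; the outlier lands ~2.2 sd above the mean
```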

Suppose you have a model with a high level of precision but low recall. You notice that missing data was handled incorrectly. How might this have affected the model's performance?

  • Missing data could have affected the model's complexity.
  • Missing data might have introduced false negatives.
  • Missing data might have introduced false positives.
  • Missing data might have skewed the distribution of the data.
Incorrectly handled missing data can bias the dataset the model is trained on. If, for example, records with missing values are dropped or imputed in a way that under-represents the positive class, the model learns to miss true positives, producing false negatives and therefore a lower recall, while precision can remain high.
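As a purely hypothetical illustration (the confusion-matrix counts below are invented, not from a real model), this is how such a bias shows up in the two metrics:

```python
# Suppose bad imputation pushed many true positives below the decision
# threshold, turning them into false negatives.
tp, fp, fn = 40, 5, 60  # hypothetical counts

precision = tp / (tp + fp)  # 40 / 45 ≈ 0.89 -> high precision
recall = tp / (tp + fn)     # 40 / 100 = 0.40 -> low recall

print(f"precision={precision:.2f}, recall={recall:.2f}")
```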

Why is it important to deal with outliers before conducting data analysis?

  • To clean the data
  • To ensure accurate results
  • To normalize the data
  • To remove irrelevant variables
Dealing with outliers before analysis helps ensure accurate results: outliers can distort the data distribution and summary statistics such as the mean and standard deviation, which in turn skews any conclusions drawn from them.
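A small NumPy sketch with made-up values shows how a single outlier distorts the mean and standard deviation while the median barely moves:

```python
import numpy as np

clean = np.array([10, 11, 12, 13, 14])
with_outlier = np.append(clean, 500)  # one extreme value

print(clean.mean(), clean.std())                  # 12.0, ~1.41
print(with_outlier.mean(), with_outlier.std())    # ~93.3, ~181.9
print(np.median(clean), np.median(with_outlier))  # 12.0 vs 12.5: the median is robust
```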

Which visualization library in Python is primarily built on Matplotlib and provides a high-level interface for drawing attractive statistical graphics?

  • NumPy
  • Pandas
  • SciPy
  • Seaborn
Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for creating attractive graphics and comes with several built-in themes for styling Matplotlib graphics.
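A minimal sketch of that high-level interface (assuming Seaborn ≥ 0.11 for set_theme; load_dataset fetches the small tips example dataset over the network):

```python
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_theme(style="whitegrid")  # one of the built-in themes for styling Matplotlib
tips = sns.load_dataset("tips")   # example dataset distributed with Seaborn

# One call produces a styled statistical plot that would take more work in raw Matplotlib
sns.boxplot(data=tips, x="day", y="total_bill")
plt.show()
```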

Which plot uses kernel smoothing to give a visual representation of the density of data?

  • Box plot
  • Histogram
  • Kernel Density plot
  • Scatter plot
A Kernel Density Plot uses kernel smoothing to give a visual representation of the density of data: it estimates and depicts the probability density of a continuous variable across its range of values.
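A minimal sketch using Seaborn's kdeplot on simulated data:

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
values = rng.normal(loc=0.0, scale=1.0, size=500)  # a continuous variable

sns.kdeplot(x=values)  # kernel-smoothed estimate of the probability density
plt.xlabel("value")
plt.ylabel("estimated density")
plt.show()
```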

Regression imputation can lead to biased estimates if the data is not __________.

  • All of the above
  • Missing completely at random
  • Normally distributed
  • Uniformly distributed
Regression imputation can lead to biased estimates if the missingness of the data is not completely at random (MCAR). If there is a systematic pattern in the missingness, regression imputation could lead to bias.
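A minimal sketch using scikit-learn's IterativeImputer, one common regression-based imputer (the toy array is invented for illustration; if the NaN's position depended on the unobserved value itself, the estimate below would be biased):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X = np.array([
    [1.0, 2.0],
    [2.0, 4.1],
    [3.0, np.nan],  # missing value to be imputed
    [4.0, 7.9],
])

# Models each feature with missing values as a regression on the other features
imputer = IterativeImputer(random_state=0)
print(imputer.fit_transform(X))  # the NaN is replaced with a regression estimate (~6)
```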

Suppose you're given a task to find the outliers in the multivariate dataset. Which plot will be helpful in this context and why?

  • Bar Plot
  • Box Plot
  • Histogram
  • Scatter Plot
A scatter plot would be helpful in finding outliers in a multivariate dataset. By plotting different variable combinations, you can identify points that fall far from the general distribution, which could indicate potential outliers.
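A minimal Matplotlib sketch with simulated data; the appended point is unremarkable in each variable on its own but clearly violates the joint x-y relationship:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.normal(50, 5, 200)
y = 2 * x + rng.normal(0, 3, 200)

x = np.append(x, 40.0)   # within the usual range of x
y = np.append(y, 120.0)  # within the usual range of y, but far from the expected 2 * 40 = 80

plt.scatter(x, y)
plt.xlabel("x")
plt.ylabel("y")
plt.show()  # the multivariate outlier stands apart from the point cloud
```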

A wildlife study records the number of different bird species seen during each observation period. How would you classify this data type?

  • Continuous data
  • Discrete data
  • Nominal data
  • Ordinal data
The number of different bird species seen during each observation period is a count and therefore classified as discrete data.

How can a Uniform Distribution be transformed into a Normal Distribution?

  • By adding a constant to each value
  • By applying the Central Limit Theorem
  • By squaring each value
  • It can't be transformed
A Uniform Distribution can be transformed into an approximately Normal Distribution by applying the Central Limit Theorem, which states that the sum (or mean) of a large number of independent and identically distributed variables, irrespective of their underlying shape, tends toward a Normal Distribution.
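A minimal NumPy sketch: summing 30 independent Uniform(0, 1) draws per sample produces an approximately normal histogram:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n_vars, n_samples = 30, 10_000

# Each row is one sample: the sum of 30 iid uniform draws
sums = rng.uniform(0, 1, size=(n_samples, n_vars)).sum(axis=1)

# Approximately Normal with mean n/2 = 15 and variance n/12 = 2.5
plt.hist(sums, bins=50, density=True)
plt.xlabel("sum of 30 Uniform(0, 1) draws")
plt.show()
```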

You are working with a normally distributed data set. How would the standard deviation help you understand the data?

  • It can tell you how spread out the data is around the mean
  • It can tell you the range of the data
  • It can tell you the skewness of the data
  • It can tell you where the outliers are
For a normally distributed dataset, the standard deviation tells you how spread out the data is around the mean. By the empirical rule, about 68% of values fall within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3.
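A quick NumPy check of the empirical rule on simulated normal data:

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(loc=0.0, scale=1.0, size=100_000)

mu, sigma = data.mean(), data.std()
for k in (1, 2, 3):
    within = np.mean(np.abs(data - mu) <= k * sigma)
    print(f"within {k} sd: {within:.3f}")  # ~0.683, ~0.954, ~0.997
```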