What are the potential downsides of removing outliers from your dataset?

  • It always improves the quality of the dataset
  • It might discard important information
  • It might introduce noise into the dataset
Removing outliers might discard potentially important information that could significantly influence the analysis results.

In a correlation matrix, the value -1 signifies a perfect _____ correlation between two variables.

  • negative
  • neutral
  • positive
  • random
In a correlation matrix, a value of -1 signifies a perfect negative correlation between two variables: as one variable increases, the other decreases along an exact straight line, and vice versa.
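
As a small illustration of the explanation above, the sketch below builds two variables with an exact decreasing linear relationship and computes their correlation matrix with NumPy. The data values and the coefficients 10 and -2 are made up for this example.

```python
import numpy as np

# Hypothetical data: y falls by exactly 2 units for every unit increase in x.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 10.0 - 2.0 * x

# np.corrcoef returns the 2x2 Pearson correlation matrix of x and y.
corr = np.corrcoef(x, y)

# The off-diagonal entry is the correlation between x and y.
r = corr[0, 1]
print(corr)
```

The diagonal entries are 1 (each variable correlates perfectly with itself), and the off-diagonal entries equal -1 up to floating-point rounding.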

A correlation coefficient of 0 implies ________.

  • A strong negative relationship
  • A strong positive relationship
  • An uncertain relationship
  • No linear relationship
A correlation coefficient of 0 implies no linear relationship between the variables. However, it doesn't necessarily mean that there is no relationship at all, as the relationship could be non-linear.
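
The caveat about non-linear relationships can be seen in a minimal sketch. Here y is fully determined by x (y = x squared), yet the Pearson correlation is zero because the relationship is symmetric rather than linear. The data are invented for illustration.

```python
import numpy as np

# Hypothetical data: x is symmetric around 0 and y depends on x exactly.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2

# Pearson correlation only measures the linear part of the relationship.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A scatterplot of these points would show a clear parabola, which is why a coefficient near 0 should be checked visually before concluding that two variables are unrelated.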

In the EDA process, what does 'wrangling' refer to?

  • Cleaning and transforming data
  • Formulating hypotheses
  • Interpreting data
  • Visualizing data
Wrangling in the EDA process refers to the cleaning and transforming of data to facilitate subsequent analysis. This could involve addressing missing values, correcting inconsistencies, or reshaping the data structure.
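
The wrangling steps mentioned above might look like the following with pandas. This is only a sketch: the column names, the sample values, and the choice to fill gaps with the column mean are assumptions made for the example, not a recommended recipe.

```python
import pandas as pd

# Hypothetical raw data with inconsistent labels, a text placeholder
# for a missing number, a missing label, and a duplicate record.
raw = pd.DataFrame({
    "city": [" Paris", "paris", "Berlin", "Berlin", None],
    "temp_c": ["21.5", "21.5", "n/a", "18.0", "19.0"],
})

clean = raw.copy()

# Correct inconsistencies: trim whitespace and normalize capitalization.
clean["city"] = clean["city"].str.strip().str.title()

# Convert text to numbers; unparseable entries such as "n/a" become NaN.
clean["temp_c"] = pd.to_numeric(clean["temp_c"], errors="coerce")

# Address missing values: drop rows without a city, fill numeric gaps
# with the column mean (one of several possible strategies).
clean = clean.dropna(subset=["city"])
clean["temp_c"] = clean["temp_c"].fillna(clean["temp_c"].mean())

# Remove exact duplicate rows that appear after normalization.
clean = clean.drop_duplicates().reset_index(drop=True)
print(clean)
```

The order of the steps matters here: the two Paris rows only become recognizable as duplicates after the labels are normalized.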

What happens to a model's performance when missing data is not handled correctly?

  • It depends on the model.
  • It deteriorates.
  • It improves.
  • It remains the same.
When missing data is not handled correctly, it can distort the underlying data distribution and lead to incorrect model learning, ultimately deteriorating the model's performance.
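
One way to see the distortion described above is to compare the spread of a variable before and after a naive fix. The sketch below uses made-up values and mean imputation as the example of careless handling; other strategies distort the data in other ways.

```python
from statistics import mean, pvariance

# Hypothetical measurements; None marks a missing value.
values = [2.0, 4.0, None, 6.0, None, 8.0]

# Values that were actually observed.
observed = [v for v in values if v is not None]
fill = mean(observed)

# Naive handling: replace every gap with the observed mean.
imputed = [fill if v is None else v for v in values]

observed_var = pvariance(observed)
imputed_var = pvariance(imputed)
print(f"variance of observed values:    {observed_var:.2f}")
print(f"variance after mean imputation: {imputed_var:.2f}")
```

The imputed series has a smaller variance than the observed values because every filled-in point sits exactly at the mean. A model trained on it sees data that look less variable than they really are.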

How does EDA help in understanding the underlying structure of data?

  • By cleaning data
  • By modelling data
  • By summarizing data
  • By visualizing data
EDA, and data visualization in particular, plays a crucial role in understanding the underlying structure of data. Visual techniques such as histograms, scatterplots, and box plots can uncover patterns, trends, relationships, or outliers that would remain hidden in raw numerical data. This visual exploration can then guide statistical analysis and predictive modeling by suggesting hypotheses to test.
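
The three plot types named above can be drawn side by side with Matplotlib, as in this sketch. The data are synthetic, and a few extreme values are planted on purpose so that the box plot has something to flag.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y depends linearly on x plus noise.
rng = np.random.default_rng(42)
x = rng.normal(loc=50, scale=10, size=300)
y = 0.8 * x + rng.normal(scale=5, size=300)
y[:3] = [150.0, 160.0, 155.0]  # planted extreme values

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

axes[0].hist(x, bins=30)        # shape of a single variable
axes[0].set_title("Histogram of x")

axes[1].scatter(x, y, s=8)      # relationship between two variables
axes[1].set_title("Scatterplot of x vs y")

axes[2].boxplot(y)              # spread and extreme values
axes[2].set_title("Box plot of y")

fig.tight_layout()
```

Calling `fig.savefig(...)` with a file name of your choice, or `plt.show()` with an interactive backend, displays the result.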

What are the disadvantages of using backward elimination in feature selection?

  • It assumes a linear relationship
  • It can be computationally expensive
  • It can result in overfitting
  • It's sensitive to outliers
Backward elimination in feature selection starts with all variables and removes the least significant one at each step. Because the model must be refit after every removal, the process can be computationally expensive, especially for datasets with a large number of features.
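
To make the cost concrete, here is a simplified sketch that counts how many model fits a backward pass needs. It swaps the usual significance test for a simpler criterion (drop the column whose removal increases the residual sum of squares the least) and fits ordinary least squares without an intercept. The function names, that criterion, and the synthetic data are all assumptions of this example.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def backward_eliminate(X, y, n_keep):
    """Drop one column per round until n_keep columns remain.

    Returns the indices of the kept columns and the number of fits used.
    """
    kept = list(range(X.shape[1]))
    n_fits = 0
    while len(kept) > n_keep:
        scores = []
        for j in kept:
            trial = [c for c in kept if c != j]  # candidate set without column j
            scores.append((rss(X[:, trial], y), j))
            n_fits += 1
        _, drop = min(scores)  # removal that hurts the fit the least
        kept.remove(drop)
    return kept, n_fits

# Synthetic data: only columns 0 and 3 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

kept, n_fits = backward_eliminate(X, y, n_keep=2)
print("kept columns:", kept, "| model fits:", n_fits)
```

With 6 starting features the loop needs 6 + 5 + 4 + 3 = 18 fits to reach 2. The count grows roughly with the square of the number of features, which is where the expense comes from.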

The 'style' and 'context' functions in Seaborn are used to set the ___________ of the plots.

  • aesthetic and context
  • axis labels
  • layout and structure
  • size and color
The 'style' function in Seaborn (sns.set_style) sets the overall aesthetic of the plot, including background color, grids, and spines. The 'context' function (sns.set_context) scales plot elements such as fonts and line widths to suit the medium in which the plot will be presented (paper, notebook, talk, or poster).
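
In the Seaborn API these two settings are exposed as `set_style` and `set_context`. The short sketch below applies one of each and then reads the resulting parameters back. The choice of "whitegrid" and "talk" is arbitrary.

```python
import seaborn as sns

# Aesthetic: white background with grid lines.
sns.set_style("whitegrid")

# Context: scale fonts and lines up for presentation slides.
sns.set_context("talk")

# Both settings can be inspected as plain dictionaries.
style = sns.axes_style()           # current style parameters
context = sns.plotting_context()   # current scaling parameters

print("grid enabled:", style["axes.grid"])
print("font size:", context["font.size"])
```

Any plots drawn after these calls pick up the new look. The two settings are independent, so the same style can be reused across contexts.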

How does the choice of the threshold affect the number of identified outliers using the Z-score method?

  • A higher threshold identifies more outliers
  • A lower threshold identifies more outliers
  • It has no effect
  • The threshold value is irrelevant in the Z-score method
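A data point is flagged as an outlier when the absolute value of its Z-score exceeds the threshold. The lower the threshold (for example, 2 instead of 3), the more data points will exceed it, and thus more outliers will be identified.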
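
The effect of the threshold can be checked directly, as in the sketch below. The sample is synthetic, and the helper name `count_outliers` and the thresholds tried are choices made for this example.

```python
import numpy as np

def count_outliers(data, threshold):
    """Count points whose absolute Z-score exceeds the threshold."""
    z = (data - data.mean()) / data.std()
    return int((np.abs(z) > threshold).sum())

# Synthetic sample drawn from a standard normal distribution.
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=1000)

# Try several thresholds on the same data.
counts = {t: count_outliers(data, t) for t in (2.0, 2.5, 3.0)}
for t, n in counts.items():
    print(f"threshold {t}: {n} points flagged")
```

The count can only stay the same or shrink as the threshold rises, because every point that exceeds a higher threshold also exceeds a lower one.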

Outliers are _________ observations that lie an abnormal distance from other values in a dataset.

  • Anomalous
  • Erroneous
  • Random
  • Statistical
Anomalous is the correct term. Outliers are anomalous observations that lie an abnormal distance from other values in a random sample from a population.