If you want to represent both the distribution and density of data, a _____ plot is a good choice.

  • Bar
  • Line
  • Scatter
  • Violin
A violin plot is a good choice to represent both the distribution and density of data. It combines a box plot's summary statistics with a density plot's smoothed shape, giving a fuller picture of the distribution.
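The two ingredients mentioned above can be sketched with the standard library alone. This is a minimal illustration, not how any plotting library draws a violin: the sample values, the `gaussian_kde` helper, and the bandwidth of 0.5 are all assumptions chosen for the example.

```python
import math
import statistics


def gaussian_kde(sample, bandwidth):
    """Return a function that estimates the density of `sample` at a point x."""
    norm = len(sample) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample) / norm

    return density


# Hypothetical sample with two clusters, which quartiles alone would not reveal.
sample = [1.0, 1.2, 1.1, 0.9, 1.3, 4.8, 5.0, 5.1, 4.9, 5.2]

# Box-plot ingredient: the three quartiles.
q1, median, q3 = statistics.quantiles(sample, n=4)

# Density ingredient: a kernel density estimate (bandwidth picked by eye).
density = gaussian_kde(sample, bandwidth=0.5)

print(f"quartiles: {q1:.2f}, {median:.2f}, {q3:.2f}")
for x in (1.0, 3.0, 5.0):
    print(f"x={x:.1f}  estimated density={density(x):.3f}")
```

The median falls in the gap between the two clusters, where the estimated density is close to zero. A violin plot shows both facts at once.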

What are the potential downsides of removing outliers from your dataset?

  • It always improves the quality of the dataset
  • It might discard important information
  • It might introduce noise into the dataset
Removing outliers might discard potentially important information that could significantly influence the analysis results.
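As a rough illustration, the sketch below applies the common 1.5 × IQR rule to a made-up list of transaction amounts. The data and the `split_by_iqr` helper are hypothetical. The point is only that the dropped value may be exactly the observation an analyst cares about.

```python
import statistics


def split_by_iqr(values, k=1.5):
    """Split values into (kept, dropped) using the k * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    dropped = [v for v in values if v < low or v > high]
    return kept, dropped


# Hypothetical transaction amounts; 950 could be an error or a genuine large purchase.
amounts = [20, 22, 19, 21, 23, 20, 18, 22, 950]

kept, dropped = split_by_iqr(amounts)
print("kept:", kept)
print("dropped:", dropped)
```

Whether the dropped value is noise or signal cannot be decided from the numbers alone, so it is worth inspecting flagged values before deleting them.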

A correlation coefficient of 0 implies ________.

  • A strong negative relationship
  • A strong positive relationship
  • An uncertain relationship
  • No linear relationship
A correlation coefficient of 0 implies no linear relationship between the variables. However, it doesn't necessarily mean that there is no relationship at all, as the relationship could be non-linear.
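A small sketch of this point, using a hand-written Pearson coefficient: the variable `ys` depends entirely on `xs` through a quadratic, yet the linear correlation is zero. The data and the `pearson` helper are illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # perfectly determined by x, but not linearly

r_quadratic = pearson(xs, ys)
r_linear = pearson(xs, [2 * x + 1 for x in xs])

print("quadratic relationship:", r_quadratic)
print("linear relationship:", r_linear)
```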

In the EDA process, what does 'wrangling' refer to?

  • Cleaning and transforming data
  • Formulating hypotheses
  • Interpreting data
  • Visualizing data
Wrangling in the EDA process refers to the cleaning and transforming of data to facilitate subsequent analysis. This could involve addressing missing values, correcting inconsistencies, or reshaping the data structure.
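The three activities named above might look like this on a tiny, made-up set of records. The field names and the `clean_row` helper are hypothetical; in practice a library such as pandas would usually do this work.

```python
# Hypothetical raw records with inconsistent casing, stray spaces, and a missing value.
raw_rows = [
    {"city": " Paris ", "temp_c": "21.5"},
    {"city": "paris", "temp_c": ""},
    {"city": "BERLIN", "temp_c": "18"},
]


def clean_row(row):
    """Normalize the city name and convert the temperature to float or None."""
    city = row["city"].strip().title()          # correct inconsistencies
    text = row["temp_c"].strip()
    temp = float(text) if text else None         # make missing values explicit
    return {"city": city, "temp_c": temp}


clean_rows = [clean_row(row) for row in raw_rows]

# Reshape: one list of observed temperatures per city.
by_city = {}
for row in clean_rows:
    if row["temp_c"] is not None:
        by_city.setdefault(row["city"], []).append(row["temp_c"])

print(by_city)
```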

What happens to a model's performance when missing data is not handled correctly?

  • It depends on the model.
  • It deteriorates.
  • It improves.
  • It remains the same.
When missing data is not handled correctly, it can distort the underlying data distribution and lead to incorrect model learning, ultimately deteriorating the model's performance.
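One way the distortion can arise is sketched below: missing readings are silently encoded as 0, which drags the mean far from the observed values. The readings are invented for illustration, and dropping the gaps is only one of several possible treatments.

```python
import statistics

# Hypothetical temperature readings; None marks a missing measurement.
readings = [36.6, 36.8, None, 37.0, None, 36.7]

# Mishandled: missing values silently replaced with 0.
as_zero = [r if r is not None else 0.0 for r in readings]

# One simple alternative: use only the observed values.
observed = [r for r in readings if r is not None]

mean_zero = statistics.mean(as_zero)
mean_observed = statistics.mean(observed)

print(f"mean with missing as 0: {mean_zero:.2f}")
print(f"mean of observed only:  {mean_observed:.2f}")
```

A model trained on the zero-filled column would learn from a distribution that never existed in the real measurements.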

During an experiment, you discover that a certain variable is presenting a high number of outliers. What might this suggest about your data collection process?

  • Both are possible
  • Data collection process is accurate
  • Data collection process is flawed
  • Neither of these is possible
A high number of outliers might suggest a flaw in the data collection process, such as measurement or recording errors.

What is the primary cause of outliers in normally distributed data?

  • All of these
  • Data entry errors
  • Data processing errors
  • Measurement errors
Outliers in normally distributed data can be a result of various factors such as data entry errors, measurement errors, or errors in data processing.

In Plotly, the ________ object is the top-level container for all plot attributes.

  • Diagram
  • Figure
  • Graph
  • Plot
In Plotly, the 'Figure' object is the top-level container in which all plot-related attributes such as data and layout are stored.
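The data-plus-layout structure can be sketched as a plain dictionary, which is how a Plotly figure is represented when serialized to JSON. The trace values and title below are made up. The final comment about passing the dict to `plotly.graph_objects.Figure` is an assumption to check against the Plotly documentation, and it is left out of the runnable code so the example needs no third-party install.

```python
import json

# A figure described as plain data: a list of traces plus a layout.
figure = {
    "data": [
        {"type": "scatter", "x": [1, 2, 3], "y": [4, 1, 6], "mode": "lines+markers"},
    ],
    "layout": {
        "title": {"text": "Example figure"},
        "xaxis": {"title": {"text": "x"}},
    },
}

figure_json = json.dumps(figure, indent=2)
print(figure_json)

# Assumption: with Plotly installed, this dict could be turned into a Figure
# object, e.g. plotly.graph_objects.Figure(figure), and then displayed.
```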

How does EDA help in understanding the underlying structure of data?

  • By cleaning data
  • By modelling data
  • By summarizing data
  • By visualizing data
EDA, particularly data visualization, plays a crucial role in understanding the underlying structure of data. Visual techniques such as histograms, scatterplots, and box plots can uncover patterns, trends, relationships, or outliers that would remain hidden in raw numerical data. Visual exploration can guide statistical analysis and predictive modeling by revealing this structure and suggesting hypotheses.

What are the disadvantages of using backward elimination in feature selection?

  • It assumes a linear relationship
  • It can be computationally expensive
  • It can result in overfitting
  • It's sensitive to outliers
Backward elimination in feature selection involves starting with all variables and then removing the least significant variables one by one. This process can be computationally expensive, especially when dealing with datasets with a large number of features.
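To make the cost concrete, here is a minimal sketch of a wrapper-style variant that re-scores every candidate subset at each step and counts the evaluations. It is not the p-value-based procedure specifically. The `score` function and the feature "usefulness" values are hypothetical stand-ins for fitting and validating a real model.

```python
def backward_elimination(features, score, min_features=1):
    """Greedy backward elimination; returns (selected features, score calls)."""
    selected = list(features)
    evaluations = 0
    while len(selected) > min_features:
        # Score each subset obtained by dropping one remaining feature.
        candidates = []
        for feature in selected:
            subset = [f for f in selected if f != feature]
            candidates.append((score(subset), feature))
            evaluations += 1
        # Drop the feature whose removal leaves the best-scoring subset.
        _, to_drop = max(candidates)
        selected.remove(to_drop)
    return selected, evaluations


# Hypothetical per-feature usefulness; a real score would train a model.
usefulness = {"age": 0.40, "income": 0.35, "zip": 0.02, "id": 0.00, "height": 0.05}


def toy_score(subset):
    return sum(usefulness[f] for f in subset)


selected, evaluations = backward_elimination(list(usefulness), toy_score, min_features=2)
print("selected:", selected)
print("score evaluations:", evaluations)
```

In this variant the number of evaluations grows roughly quadratically with the number of features, and each evaluation would be a full model fit in practice.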