In Plotly, the ________ object is the top-level container for all plot attributes.

  • Diagram
  • Figure
  • Graph
  • Plot
In Plotly, the 'Figure' object is the top-level container in which all plot-related attributes such as data and layout are stored.

Which of the following can cause outliers in normally distributed data?

  • All of these
  • Data entry errors
  • Data processing errors
  • Measurement errors
Outliers in normally distributed data can be a result of various factors such as data entry errors, measurement errors, or errors in data processing.

During an experiment, you discover that a certain variable is presenting a high number of outliers. What might this suggest about your data collection process?

  • Both are possible
  • Data collection process is accurate
  • Data collection process is flawed
  • Neither of these is possible
A high number of outliers in a single variable suggests that the data collection process may be flawed, for example through measurement errors, faulty instruments, or inconsistent recording.

In a _____ plot, the width of the "violin" indicates the frequency or density of data.

  • Bar
  • Box
  • Scatter
  • Violin
In a Violin plot, the width of the "violin" (or the density plot on each side) varies with the estimated density of data points at a given level. The wider the plot, the higher the density of data points at that value.
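
The width at each level usually comes from a kernel density estimate. The sketch below is a hand-rolled Gaussian KDE in plain Python, not the routine any particular plotting library uses. The sample values and the bandwidth of 0.5 are assumptions chosen for illustration.

```python
import math

def gaussian_kde(samples, x, bandwidth):
    """Estimate the density at x as an average of Gaussian bumps."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

# Made-up sample: most values cluster near 5, one sits far away at 9.
samples = [4.6, 4.8, 5.0, 5.0, 5.1, 5.3, 5.4, 9.0]

# Half-width of the violin at each level is proportional to the density there.
for level in (5.0, 7.0, 9.0):
    density = gaussian_kde(samples, level, bandwidth=0.5)
    print(f"value {level}: density {density:.3f} " + "#" * round(density * 40))
```

The bar of `#` characters is widest near 5, where the points cluster, and nearly empty at 7, where there are none.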

An unusually large or small value in a dataset, which does not follow the general trend of the rest of the data, is known as an ________.

  • Aberration
  • Anomaly
  • Deviation
  • Outlier
An unusually large or small value in a dataset, which does not follow the general trend of the rest of the data, is known as an outlier.

Outliers are _________ observations that lie an abnormal distance from other values in a dataset.

  • Anomalous
  • Erroneous
  • Random
  • Statistical
Anomalous is the correct term. Outliers are anomalous observations that lie an abnormal distance from other values in a random sample from a population.

How does the choice of the threshold affect the number of identified outliers using the Z-score method?

  • A higher threshold identifies more outliers
  • A lower threshold identifies more outliers
  • It has no effect
  • The threshold value is irrelevant in the Z-score method
A data point is flagged when the absolute value of its Z-score exceeds the threshold. The lower the threshold, the more data points exceed it, and thus the more outliers are identified.
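
A short sketch of this effect, using only the standard library. The readings are made up, and the helper name `zscore_outliers` is hypothetical. The population standard deviation is used here as a simplifying assumption.

```python
import statistics

def zscore_outliers(values, threshold):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Made-up readings with two suspiciously large values.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 18, 25]

for threshold in (3.0, 2.0, 1.0):
    flagged = zscore_outliers(readings, threshold)
    print(f"threshold {threshold}: {len(flagged)} flagged -> {flagged}")
```

Running the loop from the strictest threshold to the loosest shows the flagged list growing as the threshold drops.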

The `set_style` and `set_context` functions in Seaborn are used to set the ___________ of the plots.

  • aesthetic and context
  • axis labels
  • layout and structure
  • size and color
The `set_style` function in Seaborn sets the overall aesthetic look of the plot, including background color, grids, and spines. The `set_context` function sets the context parameters, which scale plot elements such as fonts and line widths to suit where the plot will be presented (paper, notebook, talk, or poster).
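
A minimal sketch of both settings, assuming `seaborn` and `matplotlib` are installed. The choices of `"whitegrid"` and `"talk"` are arbitrary examples.

```python
import seaborn as sns

# Style controls the look: "whitegrid" turns on a grid over a white background.
sns.set_style("whitegrid")

# Context controls the scale: "talk" enlarges fonts and lines relative to "notebook".
sns.set_context("talk")

# Both settings are plain dictionaries of matplotlib rc parameters.
grid_enabled = sns.axes_style()["axes.grid"]
talk_font = sns.plotting_context()["font.size"]
notebook_font = sns.plotting_context("notebook")["font.size"]
print(grid_enabled, talk_font, notebook_font)
```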

What is a disadvantage of using backward elimination in feature selection?

  • It assumes a linear relationship
  • It can be computationally expensive
  • It can result in overfitting
  • It's sensitive to outliers
Backward elimination starts with all variables and removes the least significant one at each step. Because the model must be refit after every removal, the process can be computationally expensive, especially on datasets with a large number of features.
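
The cost can be made concrete by counting model fits. The sketch below assumes NumPy is available and uses a swapped-in criterion: instead of p-values, it drops the feature whose removal raises the residual sum of squares the least. The data are synthetic and the helper names are hypothetical.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ coef) ** 2))

def backward_eliminate(X, y, keep):
    """Drop one feature at a time until `keep` remain; return (features, fits)."""
    features = list(range(X.shape[1]))
    fits = 0
    while len(features) > keep:
        # Refit once per candidate removal and drop the feature that hurts least.
        scores = []
        for f in features:
            rest = [g for g in features if g != f]
            scores.append((rss(X[:, rest], y), f))
            fits += 1
        features.remove(min(scores)[1])
    return features, fits

# Synthetic data: only columns 0 and 3 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

selected, fits = backward_eliminate(X, y, keep=2)
print(sorted(selected), fits)
```

With this criterion the number of fits grows roughly with the square of the feature count, which is where the expense comes from on wide datasets.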

When would it be appropriate to use 'transformation' as an outlier handling method?

  • When the outliers are a result of data duplication
  • When the outliers are errors in data collection
  • When the outliers are extreme but legitimate data points
  • When the outliers do not significantly impact the data analysis
Transformation is appropriate to use as an outlier handling method when the outliers are extreme but legitimate data points that carry valuable information.
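
As an illustration, a log transformation is one common choice. The sketch below uses made-up incomes in which the largest value is extreme but genuine. The ratio of maximum to median is only a rough, assumed measure of how far the extreme point stands out.

```python
import math
import statistics

# Made-up incomes: the last value is extreme but genuine.
incomes = [32_000, 41_000, 38_000, 45_000, 52_000, 2_500_000]

# log10 compresses large values far more than small ones.
log_incomes = [math.log10(v) for v in incomes]

def max_to_median(values):
    """How many times larger the maximum is than the median."""
    return max(values) / statistics.median(values)

print(f"raw: max is {max_to_median(incomes):.1f}x the median")
print(f"log: max is {max_to_median(log_incomes):.2f}x the median")
```

Every observation is kept, including the extreme one, but it no longer dominates the scale.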

When applying regression imputation, what factors need to be taken into consideration?

  • Both dependent and independent variables
  • None of the variables
  • Only the dependent variable
  • Only the independent variables
When applying regression imputation, both dependent and independent variables need to be taken into consideration. The variable with missing values acts as the dependent variable, and the other variables act as predictors. A regression model is built using the complete cases, and this model is then used to predict the missing values in the incomplete cases. Therefore, it is important to carefully consider which variables to include in the regression model.
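
A minimal sketch of the procedure with a single predictor, in plain Python. The records and the variable names (`years`, `salary`) are made up. A real application would likely use several predictors and a library model.

```python
# Made-up records: (years_experience, salary); None marks a missing salary.
records = [(1, 40.0), (2, 44.0), (3, None), (4, 52.0), (5, 56.0), (6, None)]

# Fit salary = intercept + slope * years on the complete cases only.
complete = [(x, y) for x, y in records if y is not None]
mean_x = sum(x for x, _ in complete) / len(complete)
mean_y = sum(y for _, y in complete) / len(complete)
slope = sum((x - mean_x) * (y - mean_y) for x, y in complete) / sum(
    (x - mean_x) ** 2 for x, _ in complete
)
intercept = mean_y - slope * mean_x

# Predict the missing salaries from the fitted line.
imputed = [(x, y if y is not None else intercept + slope * x) for x, y in records]
print(imputed)
```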

How can EDA assist in identifying errors or anomalies in the dataset?

  • By conducting a statistical test of normality
  • By creating a correlation matrix of the variables
  • By running the dataset through a predefined ML model
  • By summarizing and visualizing the data, which can reveal unexpected values or patterns
EDA, especially through summarizing and visualizing data, can assist in identifying errors or anomalies in the dataset. Graphical representations of data often make it easier to spot unexpected values, patterns, or aberrations that may not be apparent in the raw data.
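
Even a plain numeric summary can expose such values. The sketch below uses made-up ages with two deliberate entry errors. The plausibility bounds of 0 to 120 are an assumption for this example.

```python
import statistics

# Made-up ages; 430 and -1 are the kind of entry errors a summary should expose.
ages = [34, 29, 41, 37, 430, 25, 31, -1, 38, 33]

summary = {
    "count": len(ages),
    "min": min(ages),
    "median": statistics.median(ages),
    "mean": statistics.fmean(ages),
    "max": max(ages),
}
for name, value in summary.items():
    print(f"{name:>6}: {value}")

# A simple plausibility check that the summary motivates (bounds are assumptions).
suspicious = [a for a in ages if not 0 <= a <= 120]
print("suspicious:", suspicious)
```

The minimum and maximum stand out immediately, and the gap between mean and median hints that something is pulling the average.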