In Plotly, the ________ object is the top-level container for all plot attributes.

Diagram
Figure
Graph
Plot

In Plotly, the 'Figure' object is the top-level container in which all plot-related attributes such as data and layout are stored.


Which of the following can cause outliers in normally distributed data?

All of these
Data entry errors
Data processing errors
Measurement errors

Outliers in normally distributed data can be a result of various factors such as data entry errors, measurement errors, or errors in data processing.


During an experiment, you discover that a certain variable is presenting a high number of outliers. What might this suggest about your data collection process?

Both are possible
Data collection process is accurate
Data collection process is flawed
Neither of these is possible

A high number of outliers often points to a problem in the data collection process, such as measurement or recording errors.


In a _____ plot, the width of the "violin" indicates the frequency or density of data.

Bar
Box
Scatter
Violin

In a Violin plot, the width of the "violin" (or the density plot on each side) varies with the estimated density of data points at a given level. The wider the plot, the higher the density of data points at that value.
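The width of a violin is usually drawn from a kernel density estimate. The sketch below is a hand-written Gaussian KDE in plain Python, not the routine any plotting library actually uses; the sample values and the bandwidth are assumptions chosen for illustration.

```python
import math

def gaussian_kde(samples, x, bandwidth):
    """Estimate the density at x by averaging Gaussian bumps centred on each sample."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

# Seven values cluster around 5, one value sits far away at 8.
samples = [4.8, 5.0, 5.1, 5.2, 5.0, 4.9, 5.1, 8.0]
bandwidth = 0.5  # hypothetical smoothing value

# The estimated density is what a violin plot would draw as its width.
width_near_cluster = gaussian_kde(samples, 5.0, bandwidth)
width_near_tail = gaussian_kde(samples, 8.0, bandwidth)
print(width_near_cluster > width_near_tail)
```

The violin would be wide near 5, where the points cluster, and thin near 8, where only one point lies.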


An unusually large or small value in a dataset, which does not follow the general trend of the rest of the data, is known as an ________.

Aberration
Anomaly
Deviation
Outlier

An unusually large or small value in a dataset, which does not follow the general trend of the rest of the data, is known as an outlier.


Outliers are _________ observations that lie an abnormal distance from other values in a dataset.

Anomalous
Erroneous
Random
Statistical

Anomalous is the correct term. Outliers are anomalous observations that lie an abnormal distance from other values in a random sample from a population.


How does the choice of the threshold affect the number of identified outliers using the Z-score method?

A higher threshold identifies more outliers
A lower threshold identifies more outliers
It has no effect
The threshold value is irrelevant in the Z-score method

A data point is flagged as an outlier when the absolute value of its Z-score exceeds the threshold. The lower the threshold, the more data points exceed it, and so more outliers are identified.
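The effect of the threshold can be checked with a small sketch. The helper `zscore_outliers`, the data, and the two threshold values are hypothetical; the population standard deviation is used here, which is one common choice among several.

```python
import statistics

def zscore_outliers(values, threshold):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs(v - mean) / stdev > threshold]

data = [10, 12, 11, 13, 12, 11, 10, 12, 30, 18]

strict = zscore_outliers(data, threshold=3.0)  # higher threshold
loose = zscore_outliers(data, threshold=1.5)   # lower threshold
print(len(strict), len(loose))
```

With this data the value 30 has a Z-score of roughly 2.8, so it is flagged at a threshold of 1.5 but not at 3.0.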


The 'set_style' and 'set_context' functions in Seaborn are used to set the ___________ of the plots.

aesthetic and context
axis labels
layout and structure
size and color

The 'set_style' function in Seaborn sets the overall aesthetic look of the plot, including background color, grids, and spines. The 'set_context' function scales plot elements such as labels and lines to suit the context in which the plot will be presented (paper, notebook, talk, or poster).
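A short sketch of the two calls follows, assuming `seaborn` (and its dependency `matplotlib`) is installed. The choice of "whitegrid" and "talk" is arbitrary; `axes_style` and `plotting_context` are used only to inspect the resulting parameters.

```python
import seaborn as sns

# Style controls the look: background, grid lines, spines.
sns.set_style("whitegrid")

# Context controls the scale of labels, lines, and markers.
sns.set_context("talk")

# Inspect the parameters that are now active.
style_params = sns.axes_style()
talk_font = sns.plotting_context("talk")["font.size"]
paper_font = sns.plotting_context("paper")["font.size"]
print(style_params["axes.grid"], talk_font, paper_font)
```

The "talk" context uses a larger font size than "paper", which is the scaling the context setting is meant to provide.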


What is a disadvantage of using backward elimination in feature selection?

It assumes a linear relationship
It can be computationally expensive
It can result in overfitting
It's sensitive to outliers

Backward elimination in feature selection involves starting with all variables and then removing the least significant variables one by one. This process can be computationally expensive, especially when dealing with datasets with a large number of features.
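To give a rough sense of the cost, the sketch below counts model fits under one assumed variant: each round refits the model once for every feature that could be dropped, then removes one feature. Other variants (for example, dropping by p-value from a single fit per round) need fewer fits. The function name and feature counts are hypothetical.

```python
def backward_elimination_fits(n_features, n_keep):
    """Count model fits when each round tries removing every remaining feature once."""
    fits = 0
    remaining = n_features
    while remaining > n_keep:
        fits += remaining  # one trial model per feature that could be dropped
        remaining -= 1     # the least useful feature is removed
    return fits

fits_small = backward_elimination_fits(10, 1)
fits_large = backward_elimination_fits(100, 1)
print(fits_small, fits_large)
```

Under this assumption the number of fits grows roughly with the square of the number of features, which is why wide datasets become costly.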


When would it be appropriate to use 'transformation' as an outlier handling method?

When the outliers are a result of data duplication
When the outliers are errors in data collection
When the outliers are extreme but legitimate data points
When the outliers do not significantly impact the data analysis

Transformation is appropriate to use as an outlier handling method when the outliers are extreme but legitimate data points that carry valuable information.
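One common transformation is the logarithm. The sketch below uses made-up income figures and a base-10 log, both chosen for illustration. It keeps every data point and shows how much the gap between the extreme value and the median shrinks.

```python
import math
import statistics

# Hypothetical incomes: five typical values and one extreme but legitimate value.
incomes = [32_000, 41_000, 38_000, 45_000, 36_000, 950_000]

# The log transform keeps every point but compresses large values.
log_incomes = [math.log10(v) for v in incomes]

# Compare how far the maximum sits from the median before and after.
raw_ratio = max(incomes) / statistics.median(incomes)
log_ratio = max(log_incomes) / statistics.median(log_incomes)
print(round(raw_ratio, 1), round(log_ratio, 1))
```

The extreme value stays in the dataset, so its information is preserved, but it no longer dominates the scale.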


When applying regression imputation, what factors need to be taken into consideration?

Both dependent and independent variables
None of the variables
Only the dependent variable
Only the independent variables

When applying regression imputation, both dependent and independent variables need to be taken into consideration. The variable with missing values acts as the dependent variable, and the other variables act as predictors. A regression model is built using the complete cases, and this model is then used to predict the missing values in the incomplete cases. It is therefore important to choose carefully which variables to include in the regression model.
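The procedure can be sketched with a single predictor and ordinary least squares written out by hand. The data is invented, and a real workflow would more likely use a statistics library and several predictors.

```python
# Each row is (x, y); y is the variable with a missing value (None).
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, None), (5.0, 9.8)]

# Step 1: fit the regression on the complete cases only.
complete = [(x, y) for x, y in rows if y is not None]
mean_x = sum(x for x, _ in complete) / len(complete)
mean_y = sum(y for _, y in complete) / len(complete)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in complete)
    / sum((x - mean_x) ** 2 for x, _ in complete)
)
intercept = mean_y - slope * mean_x

# Step 2: predict y for the incomplete cases and fill the gaps.
imputed = [(x, y if y is not None else intercept + slope * x) for x, y in rows]
print(imputed[3])
```

Here `y` is the dependent variable and `x` the independent variable; both feed into the fitted line that supplies the missing value.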


How can EDA assist in identifying errors or anomalies in the dataset?

By conducting a statistical test of normality
By creating a correlation matrix of the variables
By running the dataset through a predefined ML model
By summarizing and visualizing the data, which can reveal unexpected values or patterns

EDA, especially through summarizing and visualizing data, can assist in identifying errors or anomalies in the dataset. Graphical representations of data often make it easier to spot unexpected values, patterns, or aberrations that may not be apparent in the raw data.
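Even a plain summary can surface suspicious values. The sketch below uses an invented list of ages and an assumed plausibility range of 0 to 120; both are illustrative only.

```python
import statistics

# Hypothetical ages containing two entries that cannot be real.
ages = [23, 35, 41, 29, 31, 38, 27, 230, 33, -5]

# A basic summary already hints that something is off (negative minimum, huge maximum).
summary = {
    "min": min(ages),
    "max": max(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
}

# Flag values outside an assumed plausible range for a human age.
suspicious = [a for a in ages if a < 0 or a > 120]
print(summary, suspicious)
```

The gap between the mean and the median, along with the impossible minimum and maximum, points to the two entries worth checking.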
