How does negative kurtosis affect the tails of a data distribution?
- It has no effect on the tails of the distribution.
- It makes the distribution perfectly symmetrical.
- It makes the tails of the distribution heavier.
- It makes the tails of the distribution lighter.
Negative excess kurtosis describes a platykurtic distribution: its tails are lighter than those of a normal distribution, so extreme outliers are less likely. Such distributions often look flatter than a normal curve, but kurtosis is chiefly a measure of tail weight rather than of spread.
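As a rough illustration of the sign of kurtosis, the sketch below computes population excess kurtosis (the fourth standardized moment minus 3) using only the standard library. The function name `excess_kurtosis` and both sample lists are made up for this example.

```python
import statistics

def excess_kurtosis(values):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    mean = statistics.fmean(values)
    variance = statistics.pvariance(values, mu=mean)
    fourth_moment = statistics.fmean([(x - mean) ** 4 for x in values])
    return fourth_moment / variance ** 2 - 3.0

# Evenly spread values (a flat, uniform-like shape) have light tails.
flat = list(range(1, 101))
# The same values plus two far-away points have much heavier tails.
heavy = flat + [1000, -1000]

print(excess_kurtosis(flat))   # negative, i.e. platykurtic
print(excess_kurtosis(heavy))  # positive, i.e. leptokurtic
```

A normal distribution has an excess kurtosis of 0, so the sign alone indicates whether the tails are lighter or heavier than normal.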
Which type of analysis is most commonly used for hypothesis testing?
- CDA
- Data Visualization
- EDA
- Predictive Modeling
CDA (Confirmatory Data Analysis) is most commonly used for hypothesis testing. While EDA is used to formulate hypotheses, CDA uses statistical techniques to confirm or reject these hypotheses.
The _________ library in Python allows for the creation of complex animated plots and provides widgets to allow for interactive plots.
- Bokeh
- Matplotlib
- Plotly
- Seaborn
Bokeh is a powerful library for creating interactive plots, including complex animated plots, and it includes support for widgets, making it a great tool for creating dynamic, interactive visualizations.
What is the full form of NMAR in the context of missing data?
- Never Missing At Random
- No Missing At Random
- Not Measured At Random
- Not Missing At Random
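In the context of missing data, NMAR stands for Not Missing At Random: the probability that a value is missing depends on the unobserved value itself. The same mechanism is often written MNAR (Missing Not At Random).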
To create multiple plots in one figure in Matplotlib, you would use the ___________ function.
- heatmap
- pairplot
- subplot
- violinplot
The 'subplot' function in Matplotlib is used to create multiple plots in a single figure. It takes the number of rows, the number of columns, and the index of the plot to activate, which lets you arrange plots in a grid structure.
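A minimal sketch of a one-row, two-column grid built with `plt.subplot`, assuming Matplotlib is installed. The data, titles, and variable names are invented for illustration, and the `Agg` backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
fig = plt.figure(figsize=(8, 3))

# subplot(nrows, ncols, index): index counts from 1, left to right.
ax_left = plt.subplot(1, 2, 1)
ax_left.plot(x, [v ** 2 for v in x])
ax_left.set_title("Squares")

ax_right = plt.subplot(1, 2, 2)
ax_right.bar(x, [v * 2 for v in x])
ax_right.set_title("Doubles")

fig.tight_layout()
```

The related `plt.subplots` function creates the figure and every axes object in one call, which is often more convenient for larger grids.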
You find that both Z-score and modified Z-score methods give different sets of outliers for the same dataset. How will you reconcile this?
- Assume the Z-score method is correct
- Assume the modified Z-score method is correct
- Consider the intersection of both methods
- Further inspect the data and the assumptions of each method
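When two methods give different sets of outliers, it's best to further inspect the data and the assumptions of each method before drawing conclusions. The Z-score relies on the mean and standard deviation, which extreme values can distort, while the modified Z-score relies on the median and the median absolute deviation (MAD), so disagreement often points to skewed data or to outliers masking one another.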
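The disagreement can be reproduced on a small example. The sketch below is illustrative only: the function names and the data are made up, and the thresholds (3.0 for the Z-score, 3.5 for the modified Z-score, with the 0.6745 scaling constant) are common conventions rather than fixed rules.

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return [x for x in values if abs(x - mean) / stdev > threshold]

def modified_z_score_outliers(values, threshold=3.5):
    """Flag points using the median and MAD (assumes the MAD is non-zero)."""
    median = statistics.median(values)
    mad = statistics.median([abs(x - median) for x in values])
    return [x for x in values if 0.6745 * abs(x - median) / mad > threshold]

data = [10, 11, 12, 12, 13, 13, 14, 15, 40, 45]

print(z_score_outliers(data))           # []
print(modified_z_score_outliers(data))  # [40, 45]
```

Here the two large values inflate the standard deviation enough to hide themselves from the Z-score, which is one reason to examine the data before trusting either result.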
Which method for dealing with missing data might introduce bias if the data is not missing completely at random?
- Listwise Deletion
- Mean/Median/Mode Imputation
- Pairwise Deletion
- Regression Imputation
Mean/Median/Mode Imputation might introduce bias if the data is not missing completely at random. If missingness follows a systematic pattern, the observed values are not representative of the missing ones, so the imputed constant is itself biased. Because every gap receives the same value, variability is also underestimated.
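A small sketch of this effect, using hypothetical numbers: the "true" values are invented, and the rule that values above 10 go missing is an assumption chosen to make the missingness depend on the value itself.

```python
import statistics

# Hypothetical complete data, which would be unknown in practice.
true_values = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]

# Assumed missingness rule: values above 10 are never recorded.
column = [x if x <= 10 else None for x in true_values]

# Mean imputation: fill every gap with the mean of the observed values.
observed = [x for x in column if x is not None]
fill_value = statistics.fmean(observed)
imputed = [fill_value if x is None else x for x in column]

print(statistics.fmean(true_values), statistics.fmean(imputed))
print(statistics.variance(true_values), statistics.variance(imputed))
```

Both the mean and the variance of the imputed column come out lower than those of the complete data, because the largest values were the ones that went missing.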
Readability in data visualization refers to how easily the audience can __________.
- Analyze the underlying code
- Download the graph
- Interact with the graph
- Understand the represented data
Readability in data visualization refers to how easily the audience can understand the represented data. This includes the clarity of text elements (labels, title, caption), color scheme, and whether the choice of plot type makes sense for the represented data.
What is the underlying JavaScript library that Plotly uses to render its graphics?
- D3.js
- Node.js
- React.js
- jQuery
Plotly renders its graphics through plotly.js, which is built on top of D3.js (Data-Driven Documents). D3.js is a JavaScript library for producing dynamic and interactive data visualizations in web browsers.
You're working with a data set where a few observations are vastly different from the rest. Which method, Z-score or IQR, would be more robust to use for outlier detection?
- Either would work equally well
- IQR
- Neither would be effective
- Z-score
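The IQR method is more robust than Z-score for outlier detection in this scenario. Z-scores are computed from the mean and standard deviation, both of which are pulled toward extreme values, whereas the IQR depends only on the first and third quartiles and is largely unaffected by a few extreme observations.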
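As a sketch of the IQR rule, the code below flags values lying more than 1.5 times the IQR beyond the quartiles and compares the result with a Z-score cutoff of 3. The data, the helper names, and both cutoffs are assumptions for illustration. Quartiles come from `statistics.quantiles` with its default method, so other tools may give slightly different fences.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

def z_score_outliers(values, threshold=3.0):
    """Flag points more than `threshold` standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return [x for x in values if abs(x - mean) / stdev > threshold]

data = [50, 52, 53, 55, 56, 58, 60, 61, 62, 64, 300, 320]

print(iqr_outliers(data))      # [300, 320]
print(z_score_outliers(data))  # []
```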
What is an outlier in the context of Exploratory Data Analysis?
- A data point that falls outside of the normal range
- A data point that is a duplicate
- A data point that is missing
- A frequently occurring data point
In statistics, an outlier is an observation that lies an abnormal distance from other values in a random sample from a population. In simple terms, an outlier is a value that is significantly different from other similar values.
What is the key difference between 'removal' and 'transformation' of outliers?
- Removal changes the data distribution, while transformation does not
- Removal deals with extreme values, while transformation does not
- Removal discards outliers, while transformation modifies their values
- Removal is a type of data cleaning, while transformation is not
The key difference between 'removal' and 'transformation' of outliers is that removal discards outliers from the dataset, while transformation modifies the values of outliers to reduce their impact.
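The contrast can be sketched in a few lines. The data and the bounds of 10 and 20 are hypothetical. Clipping to fixed bounds and taking logarithms are only two of several possible transformations.

```python
import math

data = [12.0, 15.0, 14.0, 13.0, 500.0]
lower, upper = 10.0, 20.0  # hypothetical acceptable range

# Removal: the outlier is discarded, so the dataset shrinks.
removed = [x for x in data if lower <= x <= upper]

# Transformation by clipping: the outlier is kept but pulled to the bound.
clipped = [min(max(x, lower), upper) for x in data]

# Transformation by logarithm: every value is rescaled, compressing large ones.
logged = [math.log(x) for x in data]

print(len(data), len(removed), len(clipped), len(logged))
```

Removal leaves four observations, while both transformations keep all five and only change how far the extreme value sits from the rest.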