Given that you need to create a publication-quality figure, which Python library provides the best control over every aspect of the figure properties?
- Bokeh
- Matplotlib
- Plotly
- Seaborn
Matplotlib exposes a low-level, object-oriented API in which every element of a figure (axes, ticks, lines, text, legends) is an object whose properties can be set directly. That fine-grained control over every aspect of the figure makes it well suited to publication-quality figures.
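As an illustration of that control, the sketch below builds a small figure through the object-oriented API. The figure size, font sizes, and styling choices are arbitrary assumptions for the example, not recommended publication settings.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)

# Create the Figure and Axes objects explicitly (object-oriented API)
fig, ax = plt.subplots(figsize=(3.5, 2.5), dpi=300)
line, = ax.plot(x, np.sin(x), color="black", linewidth=1.0, label="sin(x)")

# Each element is an object whose properties can be set individually
ax.set_xlabel("x (rad)", fontsize=8)
ax.set_ylabel("amplitude", fontsize=8)
ax.tick_params(axis="both", labelsize=7, direction="in")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.legend(frameon=False, fontsize=7)

fig.tight_layout()

# Save as vector PDF into an in-memory buffer; pass a file name instead to write to disk
buf = io.BytesIO()
fig.savefig(buf, format="pdf")
```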
Which Python library is specifically useful for creating interactive plots?
- NumPy
- Plotly
- SciPy
- Seaborn
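Plotly is a Python graphing library that produces interactive, publication-quality graphs rendered in the browser. It is well suited to interactive dashboards, data analysis, and visualizations. NumPy and SciPy are numerical libraries rather than plotting libraries, and Seaborn builds on Matplotlib to produce static statistical plots.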
Which key metric of model evaluation is most affected by mishandling missing data?
- Accuracy
- F1 Score
- Precision
- Recall
Mishandling missing data can degrade any evaluation metric, but accuracy is often the most affected. Incorrectly imputed values can lead the model to learn incorrect patterns, resulting in inaccurate predictions.
How can pairwise deletion affect the correlation between variables?
- It can cause overfitting
- It can deflate the correlation
- It can inflate the correlation
- It can lead to underfitting
Pairwise deletion can inflate the correlation between variables. Each correlation is computed from a different subset of cases (those in which both variables are present), so the coefficients are not based on a common sample. This can lead to inconsistencies and overly optimistic estimates of the correlations.
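The difference between pairwise and listwise deletion can be sketched with pandas, whose `DataFrame.corr` excludes missing values pair by pair. The column names and values below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data with missing values in different rows of each column (illustrative only)
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan, 6.0],
    "b": [2.0, np.nan, 6.5, 8.0, 10.0, 11.0],
    "c": [1.0, np.nan, np.nan, 2.0, 5.0, 4.0],
})

# Pairwise deletion: DataFrame.corr drops missing values separately for each pair
pairwise_corr = df.corr()

# Number of rows that actually contribute to each pairwise correlation
present = df.notna().astype(int)
pairwise_n = present.T @ present

# Listwise deletion: keep only rows that are complete in every column
complete = df.dropna()
listwise_corr = complete.corr()

print(pairwise_n)      # sample size differs from pair to pair
print(len(complete))   # one common sample size for every pair
print(pairwise_corr)
print(listwise_corr)
```

Comparing `pairwise_n` with `len(complete)` shows that each pairwise coefficient rests on its own subset of rows, which is why the resulting correlation matrix may not be internally consistent.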
A _____ is a visualization tool that displays pairwise relationships in a dataset.
- bar chart
- histogram
- pairplot
- scatter plot
A pairplot is a visualization tool that displays pairwise relationships in a dataset. It shows all bivariate relationships between combinations of variables in a grid format, making it easy to visualize and compare all relationships simultaneously.
The Central Limit Theorem states that the sum of a large number of independent and identically distributed variables will approximately follow a _____ Distribution, regardless of the shape of the original distribution.
- Binomial
- Normal
- Poisson
- Uniform
The Central Limit Theorem states that the sum of a large number of independent and identically distributed variables with finite variance will approximately follow a normal distribution, regardless of the shape of the original distribution.
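A quick simulation can make this concrete. The sketch below sums draws from an exponential distribution, which is strongly right-skewed; the choice of distribution, the number of terms, and the number of trials are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)

n_terms = 50        # variables summed in each trial
n_trials = 20_000   # number of independent sums

# Exponential(scale=1) has mean 1 and variance 1, and is far from normal
samples = rng.exponential(scale=1.0, size=(n_trials, n_terms))
sums = samples.sum(axis=1)

# Standardize the sums using the known mean and variance of the sum
z = (sums - n_terms * 1.0) / np.sqrt(n_terms * 1.0)


def skewness(values):
    """Sample skewness: third central moment over the cubed standard deviation."""
    centered = values - values.mean()
    return (centered ** 3).mean() / (centered ** 2).mean() ** 1.5


print("skewness of single draws:", skewness(samples[:, 0]))
print("skewness of sums:", skewness(z))
print("mean and std of standardized sums:", z.mean(), z.std())
```

The sums should show much less skew than the individual draws, with a standardized mean near 0 and standard deviation near 1, which is what an approximately normal distribution would give.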
You are using a box plot to analyze a dataset and observe that the upper whisker is much longer than the lower whisker. What could this indicate about your data?
- Data has negative skewness
- Data has positive skewness
- Data is evenly distributed
- Data is normally distributed
If the upper whisker in a box plot is much longer than the lower whisker, it can indicate that the data has positive (right) skewness: values above the median are spread over a wider range than values below it, so the distribution has a long upper tail.
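The whisker lengths can be computed directly to check this. The sketch below assumes the common convention (also Matplotlib's default) that whiskers extend to the most extreme data point within 1.5 times the interquartile range of the box, and uses a log-normal sample as a stand-in for positively skewed data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Log-normal data is positively skewed (illustrative sample)
data = rng.lognormal(mean=0.0, sigma=0.6, size=1000)

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Whiskers reach the most extreme points inside the 1.5 * IQR fences
lower_whisker = data[data >= q1 - 1.5 * iqr].min()
upper_whisker = data[data <= q3 + 1.5 * iqr].max()

lower_len = q1 - lower_whisker
upper_len = upper_whisker - q3

print("lower whisker length:", lower_len)
print("upper whisker length:", upper_len)
print("mean:", data.mean(), "median:", median)
```

For positively skewed data, the upper whisker should come out longer than the lower one and the mean should sit above the median.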
What is the primary advantage of using Plotly over Matplotlib?
- Ability to create basic 2D plots
- Ability to create interactive plots
- High-level interface for drawing attractive graphics
- Statistical plotting
Plotly's main advantage over Matplotlib is that its plots are interactive by default. Users can hover over points for more information, zoom in and out, pan, and rotate 3D plots.
What is the potential impact of outliers on the analysis of a dataset?
- All of these
- Can affect the statistical significance
- Can influence assumptions of the analysis
- Can lead to incorrect conclusions
All of these apply. Outliers can have significant effects on our conclusions and can violate the basic assumptions of our analyses. They can also change whether a result appears statistically significant.
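A small numeric sketch shows how a single extreme value can shift summary statistics. The numbers are made up for illustration.

```python
import numpy as np

values = np.array([10.0, 11.0, 9.0, 10.5, 9.5, 10.2, 9.8])
with_outlier = np.append(values, 60.0)  # one extreme observation added

for label, sample in [("without outlier", values), ("with outlier", with_outlier)]:
    print(
        label,
        "| mean:", round(sample.mean(), 2),
        "| std:", round(sample.std(ddof=1), 2),
        "| median:", round(float(np.median(sample)), 2),
    )
```

The mean and standard deviation change substantially while the median barely moves, which is why conclusions based on means and variances are sensitive to outliers.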
You are conducting a study on the effectiveness of a new drug. Patients rate their pain levels before and after the treatment on a scale of 1-10. What type of data are these ratings?
- Continuous data
- Nominal data
- Ordinal data
- Ratio data
Patients' pain ratings are ordinal data: the levels have a meaningful order (1-10), but the intervals between adjacent levels are not necessarily equal.
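One way to represent such ratings in code is an ordered categorical, which permits comparisons but not arithmetic. The sketch below uses pandas with invented ratings for five hypothetical patients.

```python
import pandas as pd

levels = list(range(1, 11))  # pain scale from 1 to 10

# Ratings before and after treatment (illustrative values)
before = pd.Series(pd.Categorical([7, 4, 8, 3, 6], categories=levels, ordered=True))
after = pd.Series(pd.Categorical([5, 4, 6, 2, 3], categories=levels, ordered=True))

# Order-based operations are meaningful for ordinal data
improved = after < before
print("patients with lower pain after treatment:", int(improved.sum()))
print("highest rating before treatment:", before.max())
print(before.value_counts(sort=False))
```

Treating the ratings as ordered categories keeps the analysis to rank-based operations and avoids implying that, say, the step from 2 to 3 equals the step from 8 to 9.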