What factors should be considered when assessing the aesthetics of a data visualization?

  • The balance, simplicity, clarity, and color scheme
  • The designer's personal taste
  • The latest trends in data visualization
  • The time it took to create the visualization
Aesthetics in data visualization involve several factors: balance (visual weight distributed evenly across the layout), simplicity (no unnecessary elements or complexity), clarity (the message can be read without effort), and the color scheme (which can direct attention, distinguish categories, or encode quantities). Good aesthetics make the data easy to understand and the message memorable.

How many variables can a heatmap typically visualize at once?

  • Any number
  • Four
  • Three
  • Two
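A typical heatmap visualizes three variables at once: two variables define the rows and columns of the grid, and the third is encoded as the color of each cell. (A correlation heatmap can summarize many variables, but each cell still shows a single value for one pair of them.)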
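To make the three roles concrete, here is a minimal sketch that pivots hypothetical (day, hour, visits) records into the row-by-column grid a heatmap colors. The records are made up for illustration; in practice the finished grid would typically be handed to a plotting function such as Matplotlib's `imshow` or Seaborn's `heatmap`.

```python
# Hypothetical records (made up for illustration): (day, hour, visits).
# Variable 1 -> rows, variable 2 -> columns, variable 3 -> cell color.
records = [
    ("Mon", 9, 12), ("Mon", 10, 30), ("Mon", 11, 18),
    ("Tue", 9, 8), ("Tue", 10, 25), ("Tue", 11, 22),
]

# Row labels in order of first appearance; column labels sorted.
rows = list(dict.fromkeys(day for day, _, _ in records))
cols = sorted({hour for _, hour, _ in records})

# Build the grid; cells with no matching record stay None (drawn blank).
grid = [[None] * len(cols) for _ in rows]
for day, hour, visits in records:
    grid[rows.index(day)][cols.index(hour)] = visits

for day, values in zip(rows, grid):
    print(day, values)
```

Each printed value is what the heatmap would map to a color, so no matter how many rows and columns the grid has, it still encodes exactly three variables.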

Which Python library is specifically useful for creating interactive plots?

  • NumPy
  • Plotly
  • SciPy
  • Seaborn
Plotly is a Python graphing library for interactive, publication-quality graphs. It is well suited to interactive dashboards, exploratory data analysis, and web-based visualizations.

Which key metric of model evaluation is most affected by mishandling missing data?

  • Accuracy
  • F1 Score
  • Precision
  • Recall
All metrics could be affected, but the accuracy of the model is often the most affected by mishandling of missing data. Incorrect imputation of missing values can lead to the model learning incorrect patterns, resulting in inaccurate predictions.
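A toy sketch of how the imputation choice alone can move accuracy. The data, the threshold rule `predict`, and the helper `accuracy` are all hypothetical and made up for illustration; the "model" is a fixed rule rather than a trained one, so only the filled-in values change between the two runs.

```python
# Toy, made-up data: one feature with missing values (None) and true labels.
x = [2, 3, 7, 8, None, 9, None, 1]
y = [0, 0, 1, 1, 1, 1, 1, 0]


def predict(value, threshold=5):
    """Hypothetical fixed rule: predict class 1 when the feature is >= threshold."""
    return 1 if value >= threshold else 0


def accuracy(filled):
    """Fraction of predictions that match the true labels."""
    preds = [predict(v) for v in filled]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


observed = [v for v in x if v is not None]
mean_value = sum(observed) / len(observed)  # mean of the observed values only

zero_filled = [0 if v is None else v for v in x]
mean_filled = [mean_value if v is None else v for v in x]

print("zero imputation accuracy:", accuracy(zero_filled))
print("mean imputation accuracy:", accuracy(mean_filled))
```

Mean imputation does better here only because of how the toy data were built. The point is that the same rule on the same rows scores differently depending on how the gaps were filled.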

How can pairwise deletion affect the correlation between variables?

  • It can cause overfitting
  • It can deflate the correlation
  • It can inflate the correlation
  • It can lead to underfitting
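Pairwise deletion might inflate the correlation between variables. Each correlation is computed from a different subset of cases (those that have values for that particular pair), so the correlations are not based on a common sample. The resulting estimates can be biased, and the full correlation matrix can even be internally inconsistent (not positive semidefinite).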
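In pandas, `DataFrame.corr()` excludes missing values pair by pair, which is pairwise deletion, while `dropna()` followed by `corr()` is listwise deletion. The small made-up frame below is a sketch showing that the two approaches use different numbers of rows and can give noticeably different correlations for the same pair.

```python
import numpy as np
import pandas as pd

# Small made-up dataset with scattered missing values.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 5, 4, 5, np.nan],
    "c": [np.nan, 1, 3, 2, 5, 6],
})

# Pairwise deletion: each pair uses every row where *both* columns are present.
pairwise = df.corr()

# Listwise deletion: only rows complete in *every* column are used.
listwise = df.dropna().corr()

# Number of rows behind each pairwise correlation.
present = df.notna().astype(int)
pairwise_n = present.T @ present

print("a-b pairwise:", round(pairwise.loc["a", "b"], 3), "using", pairwise_n.loc["a", "b"], "rows")
print("a-b listwise:", round(listwise.loc["a", "b"], 3), "using", len(df.dropna()), "rows")
```

Whether a given correlation ends up higher or lower under pairwise deletion depends on which rows are dropped; in this particular toy frame the pairwise value happens to be the larger one.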

A _____ is a visualization tool that displays pairwise relationships in a dataset.

  • bar chart
  • histogram
  • pairplot
  • scatter plot
A pairplot is a visualization tool that displays pairwise relationships in a dataset. It arranges the variables in a grid: each off-diagonal cell plots one pair of variables against each other (typically as a scatter plot), and each diagonal cell shows the distribution of a single variable, so all relationships can be compared at a glance.
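Seaborn's `pairplot` function is a common way to draw this grid. The sketch below does not plot anything; it only enumerates the panels such a grid contains for three hypothetical column names, to show why k variables produce a k × k grid with k(k-1)/2 distinct pairs.

```python
from itertools import combinations, product

# Hypothetical column names for a small dataset.
variables = ["height", "weight", "age"]

# Every (row variable, column variable) combination gets one panel.
panels = list(product(variables, repeat=2))
for row_var, col_var in panels:
    if row_var == col_var:
        kind = f"distribution of {row_var}"          # diagonal: one variable
    else:
        kind = f"scatter of {row_var} vs {col_var}"  # off-diagonal: one pair
    print(f"[{row_var:>6} | {col_var:>6}] {kind}")

# Each distinct pair appears twice in the grid (mirrored across the diagonal).
unique_pairs = list(combinations(variables, 2))
print(len(unique_pairs), "unique pairs in", len(panels), "panels")
```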

The Central Limit Theorem states that the sum of a large number of independent and identically distributed random variables will approximately follow a _____ distribution, regardless of the shape of the original distribution.

  • Binomial
  • Normal
  • Poisson
  • Uniform
The Central Limit Theorem states that the sum (or, after rescaling, the mean) of a large number of independent and identically distributed random variables with finite variance will approximately follow a Normal distribution, regardless of the shape of the original distribution.
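A quick simulation sketch using only the standard library: sum 50 draws from the strongly skewed Exponential(1) distribution, repeat 10,000 times, standardize, and check how close the result is to a standard normal. The sample sizes and seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

N_TERMS = 50       # variables summed per trial
N_TRIALS = 10_000  # number of sums drawn

# Exponential(1) is heavily right-skewed (skewness 2), far from normal.
sums = [sum(random.expovariate(1.0) for _ in range(N_TERMS)) for _ in range(N_TRIALS)]

# A sum of n iid Exponential(1) variables has mean n and variance n.
z_scores = [(s - N_TERMS) / math.sqrt(N_TERMS) for s in sums]

# For a standard normal, about 95% of values fall within +/-1.96.
within = sum(abs(z) <= 1.96 for z in z_scores) / N_TRIALS
print(f"fraction within +/-1.96: {within:.3f}")
print(f"mean of z: {statistics.fmean(z_scores):.3f}, stdev of z: {statistics.stdev(z_scores):.3f}")
```

Even though each individual draw is far from normal, the standardized sums land close to the normal benchmarks (mean near 0, standard deviation near 1, about 95% within ±1.96).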

You are using a box plot to analyze a dataset and observe that the upper whisker is much longer than the lower whisker. What could this indicate about your data?

  • Data has negative skewness
  • Data has positive skewness
  • Data is evenly distributed
  • Data is normally distributed
If the upper whisker in a box plot is much longer than the lower whisker, the data likely have positive (right) skewness: values above the upper quartile are spread over a much wider range than values below the lower quartile, producing a long right tail.
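A sketch of how the whisker lengths are computed, assuming the common 1.5 × IQR (Tukey) rule, which is also Matplotlib's default for `boxplot`. The sample is made up and deliberately right-skewed; the quartile interpolation method (`"inclusive"` here) is one of several conventions and can shift the numbers slightly.

```python
import statistics

# Hypothetical, right-skewed sample (made up for illustration).
data = [1, 2, 2, 3, 3, 3, 4, 5, 7, 10]

# Quartiles using the "inclusive" interpolation convention.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Whiskers reach the most extreme data points within 1.5 * IQR of the box.
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
lower_end = min(v for v in data if v >= low_fence)
upper_end = max(v for v in data if v <= high_fence)
outliers = [v for v in data if v < low_fence or v > high_fence]

lower_whisker = q1 - lower_end
upper_whisker = upper_end - q3
print(f"Q1={q1}, median={median}, Q3={q3}")
print(f"lower whisker length={lower_whisker}, upper whisker length={upper_whisker}")
print("outliers:", outliers)
```

The upper whisker comes out longer than the lower one, and the largest value is drawn as a separate outlier point: both are signs of the long right tail.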

What is the primary advantage of using Plotly over Matplotlib?

  • Ability to create basic 2D plots
  • Ability to create interactive plots
  • High-level interface for drawing attractive graphics
  • Statistical plotting
Plotly figures are interactive by default: users can hover over points for more information, zoom in and out, pan, and rotate 3D plots. Matplotlib is geared mainly toward static figures, so this built-in interactivity is Plotly's main advantage.

What is the potential impact of outliers on the analysis of a dataset?

  • All of these
  • Can affect the statistical significance
  • Can influence assumptions of the analysis
  • Can lead to incorrect conclusions
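All of these are possible. Outliers can distort summary statistics such as the mean and standard deviation, violate the assumptions of an analysis (for example, normality), and change the statistical significance of its results, any of which can lead to incorrect conclusions.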
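A small sketch of the distortion, using made-up measurements: adding one extreme value more than doubles the mean and inflates the standard deviation (both of which many significance tests rely on), while the median does not move.

```python
import statistics

# Hypothetical measurements (made up for illustration), then the same data plus one outlier.
clean = [10, 12, 11, 13, 12]
with_outlier = clean + [100]

for label, sample in [("clean", clean), ("with outlier", with_outlier)]:
    print(
        f"{label:>12}: mean={statistics.fmean(sample):.2f}, "
        f"median={statistics.median(sample)}, stdev={statistics.stdev(sample):.2f}"
    )
```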