You are required to visualize the density of data on a single continuous variable. Which type of plot would you use and why?
- Scatter plot
- Line graph
- Kernel Density Plot
- Bar graph
A Kernel Density Plot is the best option to visualize the density of data on a single continuous variable. It estimates the variable's probability density as a smooth curve, so peaks, spread, and skew are visible without depending on a choice of histogram bins.
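To make the idea concrete, here is a minimal sketch of a Gaussian kernel density estimate using only the Python standard library. The sample values, the bandwidth of 0.5, and the helper name `gaussian_kde` are assumptions chosen for illustration; a plotting library would normally compute and draw this curve for you.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centred on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical sample: most values near 2, a few near 6.
samples = [1.8, 2.0, 2.1, 2.2, 2.4, 5.9, 6.1]
density = gaussian_kde(samples, bandwidth=0.5)

# Evaluate on a grid from 0.0 to 8.0; these (x, y) pairs form the smooth curve.
grid = [i * 0.1 for i in range(0, 81)]
curve = [(x, density(x)) for x in grid]
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, so the bandwidth plays a role similar to bin width in a histogram.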
What is the effect of redundant features on a machine learning model?
- All of the above
- They can lead to overfitting
- They can reduce the interpretability of the model
- They can slow down the learning process
Redundant features can lead to overfitting, slow down the learning process, and reduce the interpretability of the model.
You've applied the IQR method on your dataset and found no outliers. However, you suspect there may be some. What could be your next steps?
- All of these
- Adjust the IQR threshold
- Inspect the data visually
- Use a different outlier detection method
When the IQR method fails to detect suspected outliers, you can lower the IQR multiplier (the conventional value is 1.5) so that the fences sit closer to the quartiles, switch to a different detection method, or inspect the data visually. Note that raising the multiplier widens the fences and flags even fewer points.
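A minimal sketch of the IQR rule, assuming the standard library's `statistics.quantiles` (with its default method) for the quartiles. The sample values and the helper name `iqr_outliers` are hypothetical. It shows how the multiplier `k` changes which points are flagged.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data where 19 looks suspicious but sits inside the 1.5*IQR fence.
values = [10, 11, 12, 12, 13, 13, 14, 15, 16, 19]

default_flags = iqr_outliers(values)          # conventional multiplier of 1.5
stricter_flags = iqr_outliers(values, k=1.0)  # narrower fences flag more points
```

With these values the default multiplier flags nothing, while `k=1.0` flags 19.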
What do the 'dots' appearing outside the 'whiskers' of a box plot typically indicate?
- The mean of the data
- The median of the data
- The mode of the data
- The outliers in the data
In a box plot, 'dots' that appear outside the 'whiskers' typically indicate outliers in the data.
A ___________ skewness indicates that the data distribution is skewed to the left.
- Any of these
- Negative
- Positive
- Zero
Negative skewness refers to a distribution where the left tail is longer or fatter than the right tail. The long left tail pulls the mean down, so in such distributions the mean typically falls below the median and the mode.
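As an illustration, the sketch below computes the moment coefficient of skewness (the mean of the cubed z-scores, population form) on a small left-skewed sample. The data and the helper name `skewness` are hypothetical.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: mean of cubed z-scores (population form)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

# Hypothetical left-skewed sample: most values are high, one value sits far to the left.
left_skewed = [1, 6, 8, 9, 9, 10, 10, 10]

skew = skewness(left_skewed)               # negative for a left-skewed sample
mean = statistics.fmean(left_skewed)       # 7.875
median = statistics.median(left_skewed)    # 9.0
mode = statistics.mode(left_skewed)        # 10
```

Here the mean is below the median, which is below the mode, matching the ordering usually seen in left-skewed data.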
For a multimodal distribution, which measure of central tendency may not be very informative?
- Mean
- Median
- Mode
For a multimodal distribution (a distribution with more than one peak), the mean may not be very informative. It averages across the peaks, so it can land in a low-density region between them where few or no observations actually lie, and it then fails to represent any typical value.
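A small sketch with a hypothetical bimodal sample shows the effect, using only the standard library. `statistics.multimode` returns every most-frequent value, while the mean falls in the gap between the two clusters.

```python
import statistics

# Hypothetical bimodal sample: one cluster near 2, another near 10.
values = [1, 2, 2, 2, 3, 9, 10, 10, 10, 11]

mean = statistics.fmean(values)        # 6.0, between the two clusters
modes = statistics.multimode(values)   # [2, 10], one entry per peak

# Distance from the mean to the closest actual observation.
nearest_gap = min(abs(v - mean) for v in values)
```

No observation lies within 3 units of the mean, even though the mean is supposed to describe the "center" of the data.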
How are correlation coefficients affected when transformations are applied to the data?
- Correlation coefficients can change
- Correlation coefficients can decrease
- Correlation coefficients can increase
- Transformations do not affect correlation coefficients
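Correlation coefficients can change when transformations are applied to the data. The exact effect depends on the transformation and the nature of the data. Nonlinear transformations such as a logarithm can linearize a relationship, reduce skewness, or spread out values that are bunched together, all of which can change the Pearson coefficient. A linear rescaling with a positive slope leaves the Pearson coefficient unchanged, and rank-based coefficients such as Spearman's are unchanged by any strictly increasing transformation.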
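The sketch below illustrates this with a hand-written Pearson coefficient. The data are hypothetical: `ys` grows exponentially with `xs`, so a log transform linearizes the relationship, while a linear rescaling leaves the coefficient where it was.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5, 6]
ys = [math.exp(x) for x in xs]  # exponential, not linear, in x

r_raw = pearson(xs, ys)                            # below 1: curve, not a line
r_log = pearson(xs, [math.log(y) for y in ys])     # log makes it linear
r_scaled = pearson(xs, [3 * y + 7 for y in ys])    # linear rescale of y
```

In this example `r_log` is essentially 1, `r_raw` is noticeably lower, and `r_scaled` equals `r_raw` up to floating-point error.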
The ______ of a scatter plot may indicate the presence of outliers in the dataset.
- correlation
- scatter
- slope
- trend line
In a scatter plot, the scattering or spread of data points can help identify outliers. Points that are distant from the main concentration of data can indicate potential outliers.
The process of combining highly correlated variables into one is called _________.
- Data Aggregation
- Principal Component Analysis (PCA)
- Standardization
- Variance Inflation
When dealing with multicollinearity, one approach is to combine the correlated variables into one using a technique such as Principal Component Analysis (PCA). PCA creates new uncorrelated variables that capture the information of the original variables.
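For two variables the first principal component can be worked out in closed form, which the sketch below does with the standard library only. The data (the same heights in centimetres and inches) and the helper name `first_principal_component` are hypothetical, and the variables are only centred, not standardized; real workflows usually standardize first and use a library implementation.

```python
import math

def first_principal_component(xs, ys):
    """Project two variables onto their first principal component.

    Returns (scores, explained_ratio), where explained_ratio is the share
    of total variance captured by the single combined variable.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    xc = [x - mx for x in xs]
    yc = [y - my for y in ys]
    # Entries of the 2x2 sample covariance matrix.
    sxx = sum(a * a for a in xc) / (n - 1)
    syy = sum(b * b for b in yc) / (n - 1)
    sxy = sum(a * b for a, b in zip(xc, yc)) / (n - 1)
    # Angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    scores = [a * math.cos(theta) + b * math.sin(theta) for a, b in zip(xc, yc)]
    # Largest eigenvalue, i.e. the variance along the first component.
    lam1 = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    return scores, lam1 / (sxx + syy)

height_cm = [150, 160, 165, 172, 180, 188]
height_in = [59.1, 63.0, 65.0, 67.7, 70.9, 74.0]
scores, ratio = first_principal_component(height_cm, height_in)
```

Because the two columns carry almost the same information, the single combined variable in `scores` retains nearly all of the total variance.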
A data analyst needs to demonstrate the occurrence of outliers in a dataset using a plot. Which plot type would you recommend and why?
- Bar graph
- Box plot
- Line graph
- Scatter plot
The box plot is ideal for demonstrating outliers in a dataset. The 'whiskers' mark the range of the bulk of the data (conventionally up to 1.5 times the IQR beyond the quartiles), and any data point that falls outside this range is drawn as an individual point, which makes outliers immediately visible.
You create a histogram of a dataset and notice that the frequency count is very high on the far left of the distribution but drops significantly after that. What can be inferred from this?
- Data has a negative skewness
- Data has a positive skewness
- Data is evenly distributed
- Data is normally distributed
If the frequency count in a histogram is very high on the far left and then drops off, most values are small and a long tail extends to the right, which indicates positive skewness.
You have a scatter plot with a strong positive correlation, but a few points are far from the correlation line. What might these points represent?
- Correlated data points
- False positives
- Normal data points
- Outliers
In a scatter plot, points that are far away from the correlation line often represent outliers.
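One way to make "far from the line" concrete is to fit a least-squares line and flag points whose residual is large relative to the spread of all residuals. The sketch below does this with the standard library; the data, the threshold of 2 standard deviations, and the helper name `residual_outliers` are assumptions for illustration.

```python
import statistics

def residual_outliers(xs, ys, threshold=2.0):
    """Fit y = a + b*x by least squares and flag points with large residuals."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx
    # Vertical distance of each point from the fitted line.
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    spread = statistics.pstdev(residuals)
    return [
        (x, y)
        for x, y, r in zip(xs, ys, residuals)
        if abs(r) > threshold * spread
    ]

# Hypothetical data: roughly y = 2x, except for the point at x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.9, 6.2, 8.0, 20.0, 12.1, 13.8, 16.1]

flagged = residual_outliers(xs, ys)
```

With these values only the point (5, 20.0) is flagged. Since an outlier also pulls the fitted line and inflates the residual spread, this simple check can miss points in messier data.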