A business wants to understand how much revenue it will generate in the next quarter based on historical data. Which type of data analysis will you apply?
- All are equally suitable
- CDA
- EDA
- Predictive Modeling
Predictive Modeling would be the most suitable because it leverages historical data to predict future outcomes, which is exactly what the business needs.
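The sketch below shows the idea in its simplest form. It fits a straight-line trend to past quarterly revenue with ordinary least squares and extrapolates one quarter ahead. The revenue figures are hypothetical, and a linear trend is only one of many possible predictive models; real forecasts would usually account for seasonality and uncertainty.

```python
# Minimal sketch: forecast next quarter's revenue by fitting a straight-line
# trend to past quarters with ordinary least squares (standard library only).
# The revenue figures below are hypothetical illustration data.

def fit_linear_trend(values):
    """Return (slope, intercept) of the least-squares line through
    the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def forecast_next(values):
    """Predict the value for the quarter right after the last observed one."""
    slope, intercept = fit_linear_trend(values)
    return intercept + slope * len(values)

quarterly_revenue = [100.0, 110.0, 120.0, 130.0]  # hypothetical, e.g. in $k
print(forecast_next(quarterly_revenue))  # 140.0
```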
Which of the following plot types is best used to visualize a single continuous variable?
- Pie chart
- Scatter plot
- Histogram
- Bar chart
A Histogram is the best option for visualizing a single continuous variable as it can provide a snapshot of data distribution, showing the center, spread and skewness of the dataset.
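To make concrete what a histogram summarizes, the sketch below computes the bin counts behind one: the range of a single continuous variable is split into equal-width bins and the values in each bin are counted. The height data is hypothetical, and the function assumes at least two distinct values (otherwise the bin width would be zero).

```python
# Minimal sketch: the counts a histogram draws for one continuous variable,
# using equal-width bins. The sample data is hypothetical.

def histogram_counts(values, num_bins):
    # Assumes at least two distinct values, so that width > 0.
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Index of the bin containing v; the maximum value goes into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    return counts

heights_cm = [150, 152, 155, 158, 160, 160, 161, 163, 165, 170]
for count in histogram_counts(heights_cm, 4):
    print("#" * count)  # one text "bar" per bin
```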
When the data distribution is perfectly symmetrical, the skewness is _________.
- Any of these
- Negative
- Positive
- Zero
Skewness is a measure of the asymmetry of a distribution. A perfectly symmetrical distribution has skewness of zero.
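One common way to measure skewness is the moment coefficient $g_1 = m_3 / m_2^{3/2}$, where $m_2$ and $m_3$ are the second and third central moments. The sketch below computes this population version (other estimators apply small-sample corrections). For symmetric data the positive and negative cubed deviations cancel, so $g_1$ is exactly zero; the sample values are illustrative.

```python
# Minimal sketch: population (Fisher-Pearson) moment coefficient of skewness,
# g1 = m3 / m2**1.5, where m2 and m3 are the 2nd and 3rd central moments.

def skewness(values):
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

print(skewness([1, 2, 3, 4, 5]))   # symmetric data -> 0.0
print(skewness([1, 1, 1, 2, 10]))  # long right tail -> positive value
```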
If missingness is completely at random and does not depend on any other feature, the data is termed ________.
- All missing data
- MAR
- MCAR
- NMAR
Such data is termed MCAR (Missing Completely at Random): the probability that a value is missing depends neither on the other features nor on the missing value itself.
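A simple way to picture MCAR is to simulate it: every value is dropped with the same fixed probability, regardless of its own value or any other feature. The sketch below assumes a 20% missing rate and synthetic data. Because the dropped values are a random subset, the mean of the remaining values stays close to the mean of the full data.

```python
import random

# Minimal sketch of an MCAR mechanism: every value is dropped with the same
# fixed probability, independently of its own value and of any other feature.
# The 0.2 probability and the synthetic data are illustrative assumptions.

def make_mcar(values, p_missing, seed=0):
    rng = random.Random(seed)
    return [None if rng.random() < p_missing else v for v in values]

data = list(range(10_000))
masked = make_mcar(data, p_missing=0.2)
observed = [v for v in masked if v is not None]

missing_share = 1 - len(observed) / len(data)
print(f"share missing: {missing_share:.3f}")  # close to 0.2
print(f"full mean: {sum(data) / len(data):.1f}, observed mean: {sum(observed) / len(observed):.1f}")
```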
Unlike EDA and CDA, Predictive Modeling requires _______ to train the model for making predictions.
- data cleaning
- data visualization
- labeled data
- unlabeled data
Predictive Modeling, unlike EDA and CDA, requires labeled data for training the model. This is because Predictive Modeling typically relies on supervised learning, which uses labeled examples (inputs paired with their known outcomes) to learn the relationship between the input features and the target variable.
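The sketch below illustrates what "labeled" means: each training example pairs input features with a known target. A 1-nearest-neighbour classifier, chosen here only because it is short, predicts the label of the closest training example; without the labels there would be nothing to predict from. The customer-churn data is hypothetical.

```python
import math

# Minimal sketch: a supervised model learns from labeled examples, i.e.
# (input features, known target) pairs. Here a 1-nearest-neighbour classifier
# returns the label of the closest training example. The data is hypothetical.

labeled_training_data = [
    ((1.0, 20.0), "churn"),  # (months as customer, support tickets) -> label
    ((2.0, 15.0), "churn"),
    ((24.0, 1.0), "stay"),
    ((36.0, 2.0), "stay"),
]

def predict_1nn(training, features):
    _, nearest_label = min(
        training, key=lambda pair: math.dist(pair[0], features)
    )
    return nearest_label

print(predict_1nn(labeled_training_data, (30.0, 3.0)))  # stay
```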
In which kind of scenario is the Poisson Distribution typically used?
- Describing the ages of a population
- Describing the distribution of household incomes
- Modeling the number of times an event occurs in an interval of time or space
- Modeling the probability of success in a fixed number of trials
The Poisson Distribution is typically used to model the number of times an event occurs in a fixed interval of time or space, when events occur independently at a constant average rate.
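The Poisson probability mass function is $P(X = k) = \lambda^k e^{-\lambda} / k!$, where $\lambda$ is the average number of events per interval. The sketch below evaluates it for an assumed rate of 3 events per interval (for example, 3 calls per hour); the rate is illustrative.

```python
import math

# Minimal sketch of the Poisson probability mass function
# P(X = k) = lam**k * exp(-lam) / k!, where lam is the average number of
# events per interval. The rate below (3 per interval) is illustrative.

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0
for k in range(6):
    print(k, round(poisson_pmf(k, lam), 4))
```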
What assumptions do we make when using a scatter plot to visualize bivariate data?
- Both variables are numeric
- The data is linearly correlated
- The data is normally distributed
- There are no outliers in the data
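When using a scatter plot to visualize bivariate data, we assume that both variables are numeric. Scatter plots show the relationship between two continuous or ordinal variables. They do not assume linear correlation, normality, or the absence of outliers; in fact, they are often used to reveal nonlinear patterns and outliers.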
In a quality control process at a manufacturing unit, defects occur rarely and independently. Which data distribution would be an appropriate model for the number of defects?
- Binomial Distribution because it represents the number of successes in a given number of trials
- Normal Distribution because it represents continuous data
- Poisson Distribution because it models the number of events occurring in a fixed interval of time
- Uniform Distribution because all outcomes are equally likely
The Poisson Distribution is most suitable for modeling the number of defects in a manufacturing unit because it models the number of events (defects) occurring in a fixed interval of time or space.
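One way to see why Poisson fits rare, independent defects: if each of $n$ items is defective independently with a small probability $p$, the defect count is Binomial($n$, $p$), and for large $n$ and small $p$ this is well approximated by Poisson($\lambda = np$). The sketch below compares the two for an assumed batch of 1000 items with a 0.2% defect rate; both numbers are illustrative.

```python
import math

# Minimal sketch: for many items, each defective independently with a small
# probability p, Binomial(n, p) is close to Poisson(lam = n * p).
# The values n = 1000 and p = 0.002 are illustrative assumptions.

def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

n, p = 1000, 0.002
lam = n * p  # expected defects per batch (2 here)
for k in range(5):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```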
What potential issues can arise when identifying outliers using a histogram?
- Choice of bin size may influence outlier detection
- Histograms are not affected by outliers
- Histograms cannot show outliers
- Outliers always distort histograms
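The choice of bin size in a histogram can influence outlier detection. Wide bins can merge an outlier into the bins that hold the bulk of the data, so it may not be visually detectable, while very narrow bins can make ordinary noise look like isolated points.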
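The sketch below bins the same illustrative data two ways. With narrow bins, empty bins separate the value 22 from the rest, so it stands out; with wider bins it lands in a bar right next to the main bar and looks like an ordinary tail. The data, starting point, and bin widths are assumptions chosen for the illustration.

```python
# Minimal sketch: identical data binned with two widths. Narrow bins leave
# empty bins around the outlier (22); wide bins hide that gap.
# The data, starting point, and widths are illustrative assumptions.

def bin_counts(values, start, width, num_bins):
    counts = [0] * num_bins
    for v in values:
        idx = min(int((v - start) // width), num_bins - 1)
        counts[idx] += 1
    return counts

data = [10, 11, 11, 12, 12, 12, 13, 13, 14, 22]
print(bin_counts(data, start=10, width=2, num_bins=7))  # [3, 5, 1, 0, 0, 0, 1]
print(bin_counts(data, start=10, width=7, num_bins=2))  # [9, 1]
```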
Is Predictive Modeling usually performed before or after EDA?
- After
- Before
- Not related to EDA
- Simultaneously with EDA
Predictive Modeling is usually performed after EDA. The insights gained during the EDA process, such as understanding the underlying data structure, detecting outliers, and identifying patterns, inform the predictive modeling process.
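As a small example of an EDA step that informs later modeling, the sketch below summarizes a variable and flags outliers with the common 1.5 × IQR rule before any model is fitted. The order-size data is hypothetical, and whether flagged points are dropped, capped, or kept is a separate modeling decision.

```python
import statistics

# Minimal sketch of one EDA step that informs later modeling: summarize the
# data and flag outliers with the 1.5 * IQR rule before fitting anything.
# The data is hypothetical; how to treat flagged points is a modeling choice.

def iqr_outliers(values, k=1.5):
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

order_sizes = [10, 11, 11, 12, 12, 12, 13, 13, 14, 40]
print("mean:", statistics.mean(order_sizes), "median:", statistics.median(order_sizes))
print("outliers:", iqr_outliers(order_sizes))  # [40]
```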