A business wants to estimate how much revenue it will generate in the next quarter based on historical data. Which type of data analysis would you apply?

All are equally suitable
CDA
EDA
Predictive Modeling

Predictive Modeling would be the most suitable because it leverages historical data to predict future outcomes, which is exactly what the business needs.
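
As a rough illustration of using historical data to predict the next quarter, the sketch below fits a straight-line trend by ordinary least squares and extrapolates one step ahead. The revenue figures and the helper name `fit_line` are hypothetical, and a linear trend is only one of many possible predictive models.

```python
# Hypothetical quarterly revenue figures (illustrative values, not real data).
revenue = [100.0, 110.0, 120.0, 130.0]
quarters = list(range(1, len(revenue) + 1))


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(quarters, revenue)
next_quarter = len(revenue) + 1
forecast = intercept + slope * next_quarter
print(f"Forecast for quarter {next_quarter}: {forecast:.1f}")  # 140.0 for this toy data
```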

Which of the following plot types is best used to visualize a single continuous variable?

Pie chart
Scatter plot
Histogram
Bar chart

A Histogram is the best option for visualizing a single continuous variable as it can provide a snapshot of data distribution, showing the center, spread and skewness of the dataset.
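
To make the idea concrete, here is a minimal sketch that bins a single continuous variable and prints a text histogram. The sample values, the bin width, and the helper name `histogram_counts` are assumptions chosen for illustration; a plotting library would normally be used instead.

```python
from collections import Counter

# Hypothetical measurements of one continuous variable (illustrative values).
values = [1.2, 1.9, 2.1, 2.4, 2.5, 2.7, 3.0, 3.3, 3.8, 4.6]

START = 1.0      # left edge of the first bin (assumed)
BIN_WIDTH = 1.0  # width of every bin (assumed)


def histogram_counts(data, bin_width, start):
    """Count values per bin, where bin k covers [start + k*w, start + (k+1)*w)."""
    counts = Counter(int((x - start) // bin_width) for x in data)
    return [counts.get(k, 0) for k in range(max(counts) + 1)]


counts = histogram_counts(values, BIN_WIDTH, START)
for k, c in enumerate(counts):
    low = START + k * BIN_WIDTH
    print(f"[{low:.1f}, {low + BIN_WIDTH:.1f}) {'#' * c}")
```

The tallest bar marks where values concentrate, and the longer run of bars on one side hints at skew.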

Is Predictive Modeling usually performed before or after EDA?

After
Before
Not related to EDA
Simultaneously with EDA

Predictive Modeling is usually performed after EDA. The insights gained during the EDA process, such as understanding the underlying data structure, detecting outliers, and identifying patterns, inform the predictive modeling process.

When the data distribution is perfectly symmetrical, the skewness is _________.

Any of these
Negative
Positive
Zero

Skewness is a measure of the asymmetry of a distribution. A perfectly symmetrical distribution has skewness of zero.
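
The statement can be checked numerically. The sketch below uses the moment-based (population) definition of skewness, the third central moment divided by the second central moment raised to the power 1.5. The two small data sets are hypothetical.

```python
def skewness(data):
    """Moment-based skewness: m3 / m2 ** 1.5, using population moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]      # mirror image around its mean of 3
right_tailed = [1, 2, 3, 4, 15]  # one large value stretches the right tail

print(skewness(symmetric))     # zero: deviations cancel in the cubed sum
print(skewness(right_tailed))  # positive: the long tail is on the right
```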

If missingness is completely at random and does not depend on any other feature, the data is termed as ________.

All missing data
MAR
MCAR
NMAR

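Data is MCAR (Missing Completely at Random) when the probability that a value is missing is unrelated to any feature, observed or unobserved. By contrast, MAR (Missing at Random) means missingness depends only on observed features, and NMAR (Not Missing at Random) means it depends on the unobserved value itself.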
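
A small simulation can show what MCAR looks like in practice. In the sketch below every income value is blanked out with the same probability, regardless of age or of the income itself, so the share of missing values should be about the same in any subgroup. The records, the 20% rate, and the helper names are all hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical records of (age, income); values are illustrative only.
records = [(rng.randint(20, 65), rng.uniform(20_000, 90_000)) for _ in range(10_000)]

MISSING_RATE = 0.2  # assumed probability that any single income value is missing


def make_mcar(rows, rate, rng):
    """Blank out income with the same probability for every row (MCAR)."""
    return [(age, None if rng.random() < rate else income) for age, income in rows]


def missing_share(rows):
    """Fraction of rows whose income is missing."""
    return sum(income is None for _, income in rows) / len(rows)


mcar_rows = make_mcar(records, MISSING_RATE, rng)
young = [r for r in mcar_rows if r[0] < 40]
older = [r for r in mcar_rows if r[0] >= 40]

print(f"missing among under-40s:   {missing_share(young):.3f}")
print(f"missing among 40 and over: {missing_share(older):.3f}")
```

If the two shares differed systematically, missingness would depend on age and the data would no longer be MCAR.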

Unlike EDA and CDA, Predictive Modeling requires _______ to train the model for making predictions.

data cleaning
data visualization
labeled data
unlabeled data

Predictive Modeling, unlike EDA and CDA, requires labeled data to train the model. It is typically a supervised learning task: the model learns the relationship between input features and a known target variable, so each training example must include that target.

In which kind of scenario is the Poisson Distribution typically used?

Describing the ages of a population
Describing the distribution of household incomes
Modeling the number of times an event occurs in an interval of time or space
Modeling the probability of success in a fixed number of trials

The Poisson Distribution is typically used to model the number of times an event occurs in a fixed interval of time or space.
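
For reference, the Poisson probability of observing exactly k events when the average count per interval is lam is exp(-lam) * lam**k / k!. The sketch below evaluates that formula directly; the rate of 3 events per interval is a hypothetical value.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count per interval is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 3.0  # hypothetical average number of events per interval
for k in range(7):
    print(f"P(X = {k}) = {poisson_pmf(k, lam):.4f}")
```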

What assumptions do we make when using a scatter plot to visualize bivariate data?

Both variables are numeric
The data is linearly correlated
The data is normally distributed
There are no outliers in the data

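A scatter plot places one variable on each axis, so the only assumption is that both variables are numeric (continuous, or ordinal values coded as numbers). It does not require the data to be linearly correlated, normally distributed, or free of outliers; scatter plots are often used to check for exactly those features.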

In a quality control process at a manufacturing unit, defects occur rarely and independently. Which data distribution would be an appropriate model for the number of defects?

Binomial Distribution because it represents the number of successes in a given number of trials
Normal Distribution because it represents continuous data
Poisson Distribution because it models the number of events occurring in a fixed interval of time
Uniform Distribution because all outcomes are equally likely

The Poisson Distribution is the most suitable model here because defects are rare, independent events, and the quantity of interest is how many of them occur in a fixed interval of time or space.
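
One way to see why the Poisson model fits rare, independent defects: when each of many items has a small, independent chance of being defective, the binomial count of defects is closely approximated by a Poisson distribution with mean n * p. The sketch below compares the two; the batch size and defect probability are hypothetical.

```python
import math

N_ITEMS = 1000    # hypothetical number of items inspected per batch
P_DEFECT = 0.002  # hypothetical chance that any single item is defective
lam = N_ITEMS * P_DEFECT  # expected defects per batch


def binomial_pmf(k, n, p):
    """Probability of exactly k defects among n independent items."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


for k in range(6):
    b = binomial_pmf(k, N_ITEMS, P_DEFECT)
    p = poisson_pmf(k, lam)
    print(f"k={k}: binomial={b:.5f}  poisson={p:.5f}")
```

The two columns agree to about three decimal places for these values, which is why the simpler Poisson model is commonly used for rare defects.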

What potential issues can arise when identifying outliers using a histogram?

Choice of bin size may influence outlier detection
Histograms are not affected by outliers
Histograms cannot show outliers
Outliers always distort histograms

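The choice of bin size in a histogram can influence outlier detection. With wide bins, an outlier can be absorbed into the same bar as ordinary values and become invisible; with narrower bins, it shows up as an isolated bar separated from the rest by empty bins.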
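
The effect is easy to reproduce. In the sketch below the same hypothetical data set, which contains one unusually large value (19), is binned twice: once with a wide bin and once with narrow bins. The data and the helper name `bin_counts` are assumptions for illustration.

```python
# Hypothetical integer measurements; 19 is the unusual value.
data = [10, 11, 11, 12, 12, 12, 13, 13, 14, 19]


def bin_counts(values, bin_width, start):
    """Count values per bin, where bin k covers [start + k*w, start + (k+1)*w)."""
    n_bins = (max(values) - start) // bin_width + 1
    counts = [0] * n_bins
    for x in values:
        counts[(x - start) // bin_width] += 1
    return counts


wide = bin_counts(data, bin_width=10, start=10)
narrow = bin_counts(data, bin_width=1, start=10)

print("width 10:", wide)    # a single bar: the outlier is hidden
print("width 1: ", narrow)  # a gap of empty bins isolates the outlier
```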
