When is a Poisson distribution used?

  • When each event is dependent on the previous event
  • When the events are independent and occur at a constant rate
  • When the events are normally distributed
  • When the events have only two possible outcomes
A Poisson distribution is used when we are counting the number of times an event happens over a fixed interval of time or space, and the events are independent and occur at a constant average rate. It's often used to model random events such as calls to a call center or arrivals at a website.
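A minimal sketch with NumPy (the call-center rate of 4 calls per hour is a hypothetical value) that simulates Poisson-distributed counts and checks the characteristic property that mean and variance both equal the rate:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical example: a call center receives on average 4 calls per hour.
rate_per_hour = 4

# Simulate the number of calls in each of 10,000 independent hours.
calls = rng.poisson(lam=rate_per_hour, size=10_000)

# For a Poisson distribution, the mean and the variance both equal the rate.
print("sample mean:    ", calls.mean())   # ~4
print("sample variance:", calls.var())    # ~4
```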

How can qualitative data be transformed into quantitative data for analysis?

  • By calculating the mean
  • By coding the responses
  • By conducting a t-test
  • This transformation is not possible
Qualitative data can be transformed into quantitative data for analysis by coding the responses. This is a process where categories or themes identified in the qualitative data are assigned numerical codes. These numerical codes can then be used in statistical analyses. For instance, if you have data on types of pets (dogs, cats, etc.), you can assign a numerical code (1 for dogs, 2 for cats, etc.) to transform this qualitative data into quantitative data.
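A small sketch with pandas (the pet categories and the code values assigned to them are illustrative, not prescribed):

```python
import pandas as pd

# Hypothetical survey responses (qualitative / categorical data).
responses = pd.Series(["dog", "cat", "dog", "fish", "cat", "dog"])

# Assign a numerical code to each category.
codes = {"dog": 1, "cat": 2, "fish": 3}
numeric = responses.map(codes)

print(numeric.tolist())          # [1, 2, 1, 3, 2, 1]
print(numeric.value_counts())    # counts per coded category
```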

What does it mean when we say that a distribution is skewed?

  • All data points are identical
  • It has outliers
  • It is not symmetric about its mean
  • Its mean and median are not equal
When we say that a distribution is skewed, we mean that it is not symmetric about its mean. In a skewed distribution, one tail is longer than the other, so the data are not evenly balanced around the mean, with more of the observations lying on one side of it than the other.
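A quick sketch with SciPy (the data below are simulated, not real) showing how a sample skewness statistic quantifies this asymmetry: it is near 0 for roughly symmetric data and positive when the right tail is longer:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(seed=1)

symmetric = rng.normal(size=10_000)          # roughly symmetric about its mean
right_skewed = rng.exponential(size=10_000)  # long right tail

print("symmetric skewness:   ", round(skew(symmetric), 2))     # ~0
print("right-skewed skewness:", round(skew(right_skewed), 2))  # ~2
```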

What does it mean if the p-value in a Chi-square test is smaller than the significance level?

  • The alternative hypothesis is true
  • The null hypothesis is true
  • The test result is insignificant
  • There is not enough evidence to reject the null hypothesis
If the p-value in a Chi-square test is smaller than the chosen significance level, we reject the null hypothesis in favor of the alternative hypothesis. This indicates a statistically significant association between the variables.
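A minimal sketch using SciPy's chi-square test of independence (the contingency table is invented and the 0.05 significance level is just a common choice):

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table: rows = group, columns = preference.
observed = [[30, 10],
            [20, 40]]

chi2, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05  # significance level
print(f"chi2 = {chi2:.2f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis: the variables appear to be associated.")
else:
    print("Fail to reject the null hypothesis.")
```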

How does multicollinearity affect the coefficients in multiple linear regression?

  • It doesn't affect the coefficients
  • It makes the coefficients less interpretable
  • It makes the coefficients more precise
  • It makes the coefficients negative
Multicollinearity refers to a situation where two or more predictor variables in a multiple regression model are highly correlated. This high correlation can result in unstable coefficient estimates, making them less reliable and harder to interpret.
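A small simulation sketch (NumPy only, with made-up data) illustrating that instability: when two predictors are nearly identical, refitting on different samples produces wildly different individual coefficients, even though their sum stays close to the true effect:

```python
import numpy as np

rng = np.random.default_rng(seed=2)

for trial in range(3):
    n = 100
    x1 = rng.normal(size=n)
    x2 = x1 + rng.normal(scale=0.01, size=n)    # x2 is almost identical to x1
    y = 3 * x1 + rng.normal(scale=1.0, size=n)  # true model uses only x1

    X = np.column_stack([np.ones(n), x1, x2])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(f"trial {trial}: intercept={coefs[0]:.2f}, b1={coefs[1]:.2f}, b2={coefs[2]:.2f}")

# b1 and b2 swing wildly from trial to trial, but b1 + b2 stays close to 3.
```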

What is multicollinearity and how does it affect simple linear regression?

  • It is the correlation between dependent variables and it has no effect on regression
  • It is the correlation between errors and it makes the regression model more accurate
  • It is the correlation between independent variables and it can cause instability in the regression coefficients
  • It is the correlation between residuals and it causes bias in the regression coefficients
Multicollinearity refers to a high correlation among independent variables in a regression model. Strictly speaking, it cannot arise in a simple linear regression with a single predictor; it becomes a concern once the model includes two or more predictors. It does not necessarily reduce the predictive power or reliability of the model as a whole, but it can cause instability in the estimates of the individual regression coefficients, making them difficult to interpret.
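One common way to detect multicollinearity is the variance inflation factor (VIF). A sketch with statsmodels on synthetic data follows; the frequently cited warning threshold of a VIF above roughly 5–10 is a rule of thumb, not a hard rule:

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(seed=3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)  # highly correlated with x1
x3 = rng.normal(size=n)                     # independent of the others

# Design matrix with an intercept column.
X = np.column_stack([np.ones(n), x1, x2, x3])

for i, name in enumerate(["x1", "x2", "x3"], start=1):
    print(name, "VIF:", round(variance_inflation_factor(X, i), 1))
# x1 and x2 get very large VIFs (severe multicollinearity); x3 stays near 1.
```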

The distribution of all possible sample means is known as a __________.

  • Normal Distribution
  • Population Distribution
  • Sampling Distribution
  • Uniform Distribution
A sampling distribution is the probability distribution of a statistic computed from a random sample. Each different sample could (and likely will) yield a different value of that statistic, and the sampling distribution describes how those values are distributed. The distribution of all possible sample means is therefore the sampling distribution of the mean.
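A short simulation sketch (NumPy, with an invented skewed population and an arbitrary sample size of 50) that builds the sampling distribution of the mean empirically:

```python
import numpy as np

rng = np.random.default_rng(seed=4)

# Hypothetical skewed population (exponential with mean 2).
population = rng.exponential(scale=2.0, size=100_000)

# Draw many samples of size 50 and record each sample's mean.
sample_means = np.array(
    [rng.choice(population, size=50).mean() for _ in range(5_000)]
)

print("population mean:       ", population.mean())
print("mean of sample means:  ", sample_means.mean())  # close to the population mean
print("spread of sample means:", sample_means.std())   # ~ population std / sqrt(50)
```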

How is 'K-means' clustering different from 'hierarchical' clustering?

  • Hierarchical clustering creates a hierarchy of clusters, while K-means does not
  • Hierarchical clustering uses centroids, while K-means does not
  • K-means requires the number of clusters to be defined beforehand, while hierarchical clustering does not
  • K-means uses a distance metric to group instances, while hierarchical clustering does not
K-means clustering requires the number of clusters to be defined beforehand, while hierarchical clustering does not. Hierarchical clustering forms a dendrogram from which the user can choose the number of clusters based on the problem requirements.
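A minimal sketch contrasting the two approaches with scikit-learn and SciPy (the toy data and the choice of 3 clusters are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(seed=5)
# Toy data: three loose blobs in 2D.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(30, 2)) for c in (0, 3, 6)])

# K-means: the number of clusters must be chosen up front.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical (agglomerative) clustering: build the full dendrogram first,
# then cut it at whatever number of clusters suits the problem.
Z = linkage(X, method="ward")
hier_labels = fcluster(Z, t=3, criterion="maxclust")

print("k-means cluster sizes:     ", np.bincount(kmeans_labels))
print("hierarchical cluster sizes:", np.bincount(hier_labels)[1:])  # labels start at 1
```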

Under what conditions does a binomial distribution approximate a normal distribution?

  • When the events are not independent
  • When the number of trials is large and the probability of success is not too close to 0 or 1
  • When the number of trials is small
  • When the probability of success changes with each trial
The binomial distribution approaches the normal distribution as the number of trials grows large, provided that the probability of success is not too close to 0 or 1; a common rule of thumb is that both np and n(1 − p) should be sufficiently large (often at least 5 or 10). This result is known as the De Moivre–Laplace theorem.
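A quick numeric sketch with SciPy (n = 100 and p = 0.4 are arbitrary choices) comparing an exact binomial probability with the approximating Normal(np, np(1 − p)):

```python
from scipy.stats import binom, norm

n, p = 100, 0.4                    # large n, p not too close to 0 or 1
mu = n * p                         # mean of the approximating normal
sigma = (n * p * (1 - p)) ** 0.5   # its standard deviation

# P(X <= 45): exact binomial vs. normal approximation
exact = binom.cdf(45, n, p)
approx = norm.cdf(45.5, loc=mu, scale=sigma)  # 0.5 is a continuity correction

print("exact binomial:      ", round(exact, 4))
print("normal approximation:", round(approx, 4))
```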

If events A and B are independent, what is the P(A ∩ B)?

  • P(A) * P(B)
  • P(A) + P(B)
  • P(A) - P(B)
  • P(A) / P(B)
If events A and B are independent, the probability of both events occurring (P(A ∩ B)) is the product of their individual probabilities (P(A) * P(B)). This is a direct result of the Multiplication Rule for independent events.
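For instance, with two independent fair coin flips, P(both heads) = 0.5 * 0.5 = 0.25. A tiny simulation sketch (NumPy) checking that the empirical P(A ∩ B) matches P(A) * P(B):

```python
import numpy as np

rng = np.random.default_rng(seed=6)
trials = 100_000

# A: the first coin lands heads; B: a second, independent coin lands heads.
a = rng.random(trials) < 0.5
b = rng.random(trials) < 0.5

print("P(A) * P(B):        ", a.mean() * b.mean())  # ~0.25
print("P(A and B) observed:", (a & b).mean())       # ~0.25
```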