What does it mean if the Chi-square test for goodness of fit is statistically significant?

The observed data and theoretical distribution are negatively correlated
The observed data and theoretical distribution are positively correlated
The observed data differs significantly from what we would expect if it followed the theoretical distribution
The observed data fits the theoretical distribution perfectly

If the Chi-square test for goodness of fit is statistically significant, this means that the observed data differs significantly from what we would expect if the data followed the theoretical distribution.
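As a rough illustration of the statement above, the sketch below computes the goodness-of-fit statistic by hand. The observed counts and the 1:2:1 theoretical proportions are made-up values chosen for the example, and the closed-form p-value used here holds only because there are exactly 2 degrees of freedom.

```python
import math

# Hypothetical observed counts for three categories (assumed data, n = 100).
observed = [40, 35, 25]
# Assumed theoretical distribution: proportions 1:2:1.
expected_props = [0.25, 0.50, 0.25]

n = sum(observed)
expected = [p * n for p in expected_props]

# Chi-square statistic: sum of (O - E)^2 / E over all categories.
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom = number of categories - 1 = 2.
# For df = 2 only, the upper-tail probability is exactly exp(-x / 2).
df = len(observed) - 1
p_value = math.exp(-chi2 / 2)

significant = p_value < 0.05
print(f"chi2 = {chi2:.2f}, df = {df}, p = {p_value:.4f}, significant = {significant}")
```

With these assumed counts the p-value falls well below 0.05, so the observed data would be judged to differ significantly from the theoretical distribution.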

In multiple regression, model selection aims to choose the most _______ model that best predicts the response variable.

complex
overfit
parsimonious
simple

In multiple regression, model selection aims to choose the most parsimonious model that best predicts the response variable. A parsimonious model achieves the desired level of explanation or prediction with as few predictor variables as possible.
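One common way to weigh fit against the number of predictors is adjusted R-squared, which penalizes each extra predictor. The sketch below uses it only as an example criterion (others, such as AIC or BIC, exist), and the sample size, predictor counts, and R-squared values are hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 50  # assumed sample size

# Hypothetical candidate models: (number of predictors, plain R^2).
small_model = adjusted_r2(r2=0.80, n=n, p=3)
large_model = adjusted_r2(r2=0.81, n=n, p=10)

print(f"3 predictors:  adjusted R^2 = {small_model:.4f}")
print(f"10 predictors: adjusted R^2 = {large_model:.4f}")

# The larger model has a slightly higher plain R^2, but the penalty for
# seven extra predictors leaves it with the lower adjusted R^2.
prefer_small = small_model > large_model
```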

What does a p-value represent in a t-test or Z-test?

All of the above
The probability of observing the sample data if the null hypothesis is true
The probability of rejecting the null hypothesis when it is true
The probability of the sample mean being equal to the population mean

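In a t-test or Z-test, the p-value represents the probability of obtaining a sample statistic as extreme as, or more extreme than, the observed statistic, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true, and it is not the probability of rejecting a true null hypothesis (that is the significance level).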
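To make the definition concrete, here is a minimal sketch of a two-sided one-sample Z-test using only the Python standard library. The null mean, the known population standard deviation, the sample size, and the sample mean are all assumed values for illustration.

```python
import math
from statistics import NormalDist

# Assumed inputs for a one-sample Z-test.
mu0 = 100.0          # mean under the null hypothesis
sigma = 15.0         # population standard deviation, assumed known
n = 100              # sample size
sample_mean = 103.0  # observed sample mean

# Test statistic: distance of the sample mean from mu0 in standard errors.
standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu0) / standard_error

# Two-sided p-value: probability, under the null, of a statistic at least
# as far from zero as the one observed.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p = {p_value:.4f}")
```

Here z is 2.0 and the p-value is roughly 0.046, so the result would be significant at the 0.05 level but not at 0.01.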

A Type I error occurs when we reject the null hypothesis, even though it is _______.

False
Not applicable
Not proven
True

A Type I error occurs when we reject the null hypothesis, even though it is true. This is also known as a "false positive" error.
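A small simulation can show what a false positive looks like in practice. In the sketch below every sample is drawn from a distribution for which the null hypothesis (mean 0) really is true, so every rejection is a Type I error. The sample size, number of trials, and random seed are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

random.seed(0)  # fixed seed so the run is repeatable

alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value

n = 30         # observations per simulated sample
trials = 4000  # number of simulated experiments

false_positives = 0
for _ in range(trials):
    # The null hypothesis is true here: data come from N(0, 1).
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    # Z statistic with known sigma = 1: mean divided by 1 / sqrt(n).
    z = (sum(sample) / n) * math.sqrt(n)
    if abs(z) > z_crit:
        false_positives += 1  # rejected a true null: Type I error

type_i_rate = false_positives / trials
print(f"Observed Type I error rate: {type_i_rate:.3f}")
```

The observed rate should land close to the chosen significance level of 0.05, which is the Type I error rate the test is designed to have.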

What are the two main types of data in statistics?

Categorical and Numerical
Discrete and Continuous
Parametric and Nonparametric
Qualitative and Quantitative

The two main types of data in statistics are Qualitative and Quantitative. Qualitative data, also known as categorical data, represents characteristics or attributes and cannot be mathematically quantified. Quantitative data, on the other hand, is numerical, representing measurements or counts that can be quantified mathematically.

What is the difference between mutually exclusive and independent events?

Mutually exclusive events always happen together; independent events never happen together
Mutually exclusive events can't occur at the same time; independent events don't influence each other
Mutually exclusive events influence each other; independent events can't occur at the same time
There is no difference

Mutually exclusive events cannot occur at the same time: if one occurs, the others cannot. Independent events are those where the occurrence of one event does not change the probability of the other. The two concepts are distinct. In fact, two mutually exclusive events that each have nonzero probability cannot be independent, because knowing that one occurred tells you the other did not.
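The distinction can be checked by direct enumeration. The sketch below uses a single roll of a fair six-sided die as an assumed example; the event names and helper functions are illustrative only.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event: set) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def mutually_exclusive(a: set, b: set) -> bool:
    # The events share no outcome, so P(A and B) = 0.
    return prob(a & b) == 0

def independent(a: set, b: set) -> bool:
    # Independence means P(A and B) = P(A) * P(B).
    return prob(a & b) == prob(a) * prob(b)

even = {2, 4, 6}
one_or_three = {1, 3}
one_or_two = {1, 2}

# "even" and "one_or_three" cannot both happen, and they are dependent.
print(mutually_exclusive(even, one_or_three), independent(even, one_or_three))

# "even" and "one_or_two" can both happen (a roll of 2), and they are independent.
print(mutually_exclusive(even, one_or_two), independent(even, one_or_two))
```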

What is the primary objective of cluster analysis?

To classify variables into different groups
To group similar instances into clusters
To predict the output variable
To visualize high-dimensional data

The primary objective of cluster analysis is to group similar instances (observations, data points, etc.) into clusters.
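As one possible illustration of grouping similar instances, the sketch below runs a bare-bones k-means (Lloyd's algorithm) on six made-up 2-D points. K-means is just one clustering method among many; the data, the starting centroids, and the function name are assumptions for this example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Minimal k-means: assign points to the nearest centroid, then update."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: two visibly separate groups of points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```

Points that are close together end up with the same label, which is the sense in which clustering groups similar instances.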

If the results of an ANOVA test are significant, ________ tests are often used to identify specifically which groups' means are different.

Interaction
Post-hoc
Pre-hoc
Tukey

If the results of an ANOVA test are significant, post-hoc tests are often used to identify specifically which groups' means are different. These tests are performed after the ANOVA and control the inflated Type I error rate that arises when many pairwise comparisons are made. Tukey's test is one example of a post-hoc test.
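The arithmetic behind the multiple-comparisons problem can be sketched as follows. The example assumes four groups and, for simplicity, treats the pairwise tests as independent; it uses the Bonferroni correction only because it is the simplest adjustment to show, not because it is the only or best post-hoc procedure.

```python
from math import comb

alpha = 0.05   # per-test significance level
groups = 4     # assumed number of groups in the ANOVA

# Number of pairwise comparisons among the groups: "groups choose 2".
comparisons = comb(groups, 2)

# Chance of at least one false positive across all comparisons if each
# is tested at alpha with no correction (assuming independent tests).
familywise_error = 1 - (1 - alpha) ** comparisons

# Bonferroni correction: test each comparison at alpha / comparisons.
bonferroni_alpha = alpha / comparisons

print(f"{comparisons} comparisons")
print(f"Uncorrected familywise error rate: {familywise_error:.3f}")
print(f"Bonferroni per-comparison alpha:   {bonferroni_alpha:.4f}")
```

With four groups there are six comparisons, and the uncorrected chance of at least one false positive rises to about 26 percent under these assumptions.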

How does 'DBSCAN' clustering differ from 'K-means' and 'hierarchical' clustering?

DBSCAN can find arbitrarily shaped clusters and is less affected by outliers
DBSCAN creates a hierarchy of clusters
DBSCAN requires the number of clusters to be specified
DBSCAN uses centroid to form the clusters

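DBSCAN (Density-Based Spatial Clustering of Applications with Noise) differs from K-means and hierarchical clustering in that it can find arbitrarily shaped clusters and is less affected by outliers, because points in low-density regions are labeled as noise instead of being forced into a cluster. It does not require the user to set the number of clusters a priori. Instead, the user supplies a neighborhood radius and a minimum number of points, and the number of clusters is inferred from the density of the data.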
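The following is a compact, unoptimized sketch of the DBSCAN procedure in plain Python, meant only to show the idea. The parameter names `eps` and `min_pts`, the helper `region_query`, and the sample points are choices made for this example rather than any particular library's API.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within distance eps of point i (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not dense enough to start a cluster
            continue
        cluster += 1           # point i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue             # already assigned, do not expand again
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
    return labels

# Hypothetical data: two dense groups and one far-away outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

No cluster count is passed in: the two groups are discovered from density alone, and the isolated point is labeled as noise (-1) rather than being attached to a cluster.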

In Bayes' theorem, what is the posterior probability?

The likelihood of the evidence
The probability of an event before evidence is observed
The probability of the evidence given the event
The updated probability of an event after evidence is observed

In Bayes' Theorem, the posterior probability is the updated probability of an event after new evidence has been observed. It is calculated by multiplying the likelihood and the prior probability and then dividing by the probability of the evidence.
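The calculation described above can be sketched with a diagnostic-test example. The prior, the true positive rate, and the false positive rate below are assumed numbers chosen for illustration.

```python
# Assumed inputs for a hypothetical diagnostic test.
prior = 0.01                # P(disease) before seeing the test result
likelihood = 0.95           # P(positive | disease)
false_positive_rate = 0.05  # P(positive | no disease)

# Probability of the evidence: total probability of a positive result.
evidence = likelihood * prior + false_positive_rate * (1 - prior)

# Bayes' theorem: posterior = likelihood * prior / evidence.
posterior = likelihood * prior / evidence

print(f"P(positive) = {evidence:.4f}")
print(f"P(disease | positive) = {posterior:.4f}")
```

Even with a fairly accurate test, the posterior here is only about 16 percent, because the low prior means most positive results come from the much larger group without the disease.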
