What does it mean if the Chi-square test for goodness of fit is statistically significant?
- The observed data and theoretical distribution are negatively correlated
- The observed data and theoretical distribution are positively correlated
- The observed data differs significantly from what we would expect if it followed the theoretical distribution
- The observed data fits the theoretical distribution perfectly
If the Chi-square test for goodness of fit is statistically significant, this means that the observed data differs significantly from what we would expect if the data followed the theoretical distribution.
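As a minimal sketch of that comparison, the snippet below computes the goodness-of-fit statistic for a six-sided die and compares it with the tabulated critical value for 5 degrees of freedom at the 0.05 level. The roll counts and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical counts from 60 rolls of a die (faces 1..6); not real data.
fair_looking = [8, 9, 11, 10, 12, 10]
skewed = [5, 6, 7, 8, 9, 25]

# Tabulated chi-square critical value for df = 6 - 1 = 5 at alpha = 0.05.
CRITICAL_DF5_ALPHA05 = 11.070


def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def is_significant(observed):
    # Under the null hypothesis (a fair die) every face is equally likely,
    # so each expected count is total / number of faces.
    expected = [sum(observed) / len(observed)] * len(observed)
    return chi_square_statistic(observed, expected) > CRITICAL_DF5_ALPHA05


print(is_significant(fair_looking))  # False: statistic is about 1.0
print(is_significant(skewed))        # True: statistic is about 28.0
```

Only the second set of counts departs from the uniform expectation by more than chance alone would plausibly produce.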
In multiple regression, model selection aims to choose the most _______ model that best predicts the response variable.
- complex
- overfit
- parsimonious
- simple
In multiple regression, model selection aims to choose the most parsimonious model that best predicts the response variable. A parsimonious model is one that achieves the desired level of explanation or prediction with as few predictor variables as possible.
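One common way to weigh fit against the number of predictors is adjusted R-squared, which penalizes each extra predictor. It is only one of several criteria (AIC and BIC are others), and the sample size, R-squared values, and predictor counts below are hypothetical.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical comparison on the same 50 observations.
small_model = adjusted_r_squared(r_squared=0.800, n_obs=50, n_predictors=3)
large_model = adjusted_r_squared(r_squared=0.805, n_obs=50, n_predictors=10)

# The larger model has a slightly higher raw R^2, but the penalty for its
# seven extra predictors outweighs that gain.
print(round(small_model, 3), round(large_model, 3))
print("prefer small model:", small_model > large_model)
```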
What does a p-value represent in a t-test or Z-test?
- All of the above
- The probability of observing the sample data if the null hypothesis is true
- The probability of rejecting the null hypothesis when it is true
- The probability of the sample mean being equal to the population mean
In a t-test or Z-test, the p-value represents the probability of obtaining a sample statistic as extreme or more extreme than the observed statistic, assuming the null hypothesis is true.
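For a Z-test, this tail probability can be computed from the standard normal distribution. The sketch below uses only the standard library; the sample mean, null mean, standard deviation, and sample size are hypothetical values.

```python
import math


def z_statistic(sample_mean, null_mean, sigma, n):
    """Standardized distance of the sample mean from the null mean."""
    return (sample_mean - null_mean) / (sigma / math.sqrt(n))


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal Z, assuming the null is true."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical study: known sigma = 15, n = 100, observed mean 103 vs. null 100.
z = z_statistic(sample_mean=103, null_mean=100, sigma=15, n=100)
p = two_sided_p_value(z)
print(z, round(p, 4))
```

With these numbers the statistic is 2.0 and the two-sided p-value is roughly 0.046, so results at least this extreme would be uncommon if the null hypothesis held.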
A Type I error occurs when we reject the null hypothesis, even though it is _______.
- false
- not applicable
- not proven
- true
A Type I error occurs when we reject the null hypothesis, even though it is true. This is also known as a "false positive" error.
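A small simulation can make the false-positive rate concrete. The sketch below repeatedly draws samples for which the null hypothesis really is true and counts how often a two-sided Z-test at the 0.05 level rejects it anyway. The function names, trial count, and sample size are illustrative assumptions.

```python
import math
import random


def z_test_rejects(sample, mu0, sigma, z_crit=1.96):
    """Two-sided Z-test with known sigma; True means 'reject the null'."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return abs(z) > z_crit


def type_i_error_rate(trials=5000, n=30, seed=0):
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # The data are drawn with mean 0, so the null (mu0 = 0) is true
        # and every rejection is a Type I error.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_rejects(sample, mu0=0.0, sigma=1.0):
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(rate)  # expected to land close to the nominal 0.05
```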
What are the two main types of data in statistics?
- Categorical and Numerical
- Discrete and Continuous
- Parametric and Nonparametric
- Qualitative and Quantitative
The two main types of data in statistics are Qualitative and Quantitative. Qualitative data, also known as categorical data, describes characteristics or attributes (for example, eye color or species) and takes labels rather than numeric measurements. Quantitative data, also known as numerical data, consists of measurements or counts (for example, height or number of visits) on which arithmetic is meaningful.
What is the difference between mutually exclusive and independent events?
- Mutually exclusive events always happen together; independent events never happen together
- Mutually exclusive events can't occur at the same time; independent events don't influence each other
- Mutually exclusive events influence each other; independent events can't occur at the same time
- There is no difference
Mutually exclusive events cannot occur at the same time: the occurrence of one event excludes the occurrence of the other(s). Independent events, on the other hand, are those where the occurrence of one event does not affect the probability of the other event(s). The concepts are distinct: two mutually exclusive events that each have nonzero probability are never independent, because knowing that one occurred guarantees that the other did not.
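The two definitions can be checked directly on a fair six-sided die. In the sketch below, events are sets of faces, and the helper names are hypothetical; exact fractions avoid rounding issues.

```python
from fractions import Fraction

OUTCOMES = {1, 2, 3, 4, 5, 6}  # faces of a fair die, all equally likely


def prob(event):
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))


def mutually_exclusive(a, b):
    # The events share no outcome, so they cannot both occur.
    return prob(a & b) == 0


def independent(a, b):
    # P(A and B) equals P(A) * P(B).
    return prob(a & b) == prob(a) * prob(b)


low, middle = {1, 2}, {3, 4}
even = {2, 4, 6}

print(mutually_exclusive(low, middle), independent(low, middle))  # True False
print(mutually_exclusive(even, low), independent(even, low))      # False True
```

The pair `low` and `middle` is mutually exclusive but dependent, while `even` and `low` can occur together (on a 2) yet are independent, since 1/6 = 1/2 × 1/3.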
What is the primary objective of cluster analysis?
- To classify variables into different groups
- To group similar instances into clusters
- To predict the output variable
- To visualize high-dimensional data
The primary objective of cluster analysis is to group similar instances (observations, data points, etc.) into clusters.
If the results of an ANOVA test are significant, ________ tests are often used to identify specifically which groups' means are different.
- Interaction
- Post-hoc
- Pre-hoc
- Tukey
If the results of an ANOVA test are significant, post-hoc tests are often used to identify specifically which groups' means are different. These tests are performed after the ANOVA and control the inflated Type I error rate that arises when many pairwise comparisons are made.
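To see why multiple comparisons need correcting, the sketch below counts the pairwise comparisons among the groups and estimates the chance of at least one false positive if each comparison were run at the 0.05 level. It assumes the comparisons are independent, which is a simplification, and it uses the Bonferroni adjustment as a simple stand-in for a post-hoc procedure; it is not Tukey's method.

```python
from math import comb


def familywise_error_rate(n_groups, alpha=0.05):
    """Chance of at least one Type I error across all pairwise tests,
    assuming (as a simplification) that the tests are independent."""
    n_comparisons = comb(n_groups, 2)
    return 1 - (1 - alpha) ** n_comparisons


def bonferroni_alpha(n_groups, alpha=0.05):
    """Per-comparison threshold that keeps the familywise rate <= alpha."""
    return alpha / comb(n_groups, 2)


# With 4 groups there are 6 pairwise comparisons.
print(comb(4, 2))
print(round(familywise_error_rate(4), 3))  # about 0.265 without correction
print(round(bonferroni_alpha(4), 4))       # about 0.0083 per comparison
```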
How does 'DBSCAN' clustering differ from 'K-means' and 'hierarchical' clustering?
- DBSCAN can find arbitrarily shaped clusters and is less affected by outliers
- DBSCAN creates a hierarchy of clusters
- DBSCAN requires the number of clusters to be specified
- DBSCAN uses centroid to form the clusters
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) differs from K-means and hierarchical clustering in that it can find arbitrarily shaped clusters, and it is less affected by outliers because points in low-density regions are labeled as noise rather than forced into a cluster. It does not require the user to set the number of clusters a priori; instead, the number of clusters follows from the data and two density parameters (a neighborhood radius and a minimum number of points).
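The density-based idea can be sketched in a few lines of plain Python. This is a simplified, quadratic-time illustration rather than a production implementation; the parameter names `eps` and `min_samples` and the sample points are assumptions made for the example.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # j is a core point: expand
                queue.extend(reachable)
        cluster += 1
    return labels


# Two dense squares and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, min_samples=3))
# [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters (two) is never passed in, and the isolated point is reported as noise instead of distorting either cluster.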
In Bayes' theorem, what is the posterior probability?
- The likelihood of the evidence
- The probability of an event before evidence is observed
- The probability of the evidence given the event
- The updated probability of an event after evidence is observed
In Bayes' theorem, the posterior probability is the updated probability of an event after new evidence has been observed. It is calculated by multiplying the likelihood by the prior probability and then dividing by the probability of the evidence: P(A|B) = P(B|A) P(A) / P(B).
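A worked example of that calculation, using a diagnostic test: the prevalence, sensitivity, and false-positive rate below are hypothetical numbers chosen for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(event | evidence) via Bayes' theorem.

    prior               = P(event)
    likelihood          = P(evidence | event)
    false_positive_rate = P(evidence | not event)
    """
    # Probability of the evidence, by the law of total probability.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))  # 0.161
```

Even with a fairly accurate test, the posterior stays near 16% because the prior is so low, which shows how strongly the prior shapes the updated probability.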