In statistics, the entire group of individuals or observations that we want to understand is called the _______.
- distribution
- parameter
- population
- sample
In statistics, a population is the entire group of individuals or observations that we want to understand or draw conclusions about. For example, if you want to know the average height of adult males in the US, the population is all adult males in the US. Because measuring everyone is rarely practical, we usually study a sample, which is a subset drawn from the population.
How does Pearson's Correlation Coefficient handle outliers?
- Automatically removes outliers
- Converts outliers to mean values
- Ignores outliers
- Is highly sensitive to outliers
Pearson's Correlation Coefficient is highly sensitive to outliers. It is computed from each point's deviation from the mean, scaled by the standard deviations, and both the means and the standard deviations can be pulled strongly by extreme values. A single outlier far from the rest of the data can therefore inflate, weaken, or even reverse the sign of the correlation.
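A small illustration with made-up data and a hand-written `pearson_r` helper (hypothetical, not a library function): nine points lying exactly on the line y = 2x have a correlation of 1, and adding a single extreme point pulls the coefficient down to a negative value (about -0.28).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation computed directly from its definition."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Sum of cross-products of deviations from the means (covariance numerator).
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = list(range(1, 10))          # hypothetical x values 1..9
ys = [2 * x for x in xs]         # perfectly linear: y = 2x
r_clean = pearson_r(xs, ys)

# Add one outlier at (10, -50) and recompute.
r_outlier = pearson_r(xs + [10], ys + [-50])

print(f"without outlier: {r_clean:.3f}")
print(f"with outlier:    {r_outlier:.3f}")
```

The single outlier shifts the mean of y and dominates both the cross-product sum and the spread of y, which is why the sign flips.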
What does a p-value represent in a t-test or Z-test?
- All of the above
- The probability of observing the sample data if the null hypothesis is true
- The probability of rejecting the null hypothesis when it is true
- The probability of the sample mean being equal to the population mean
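In a t-test or Z-test, the p-value is the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true. It is not the probability that the null hypothesis is true, and it is not the probability of rejecting a true null hypothesis; that probability is the significance level α, chosen before the test is run.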
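A minimal sketch of a two-sided one-sample Z-test, assuming the population standard deviation is known. The numbers (sample mean 103, hypothesized mean 100, σ = 15, n = 100) are hypothetical. The p-value is the probability, under the null hypothesis, of a z statistic at least as far from 0 as the observed one, in either direction.

```python
from statistics import NormalDist


def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided one-sample Z-test with known population standard deviation."""
    standard_error = sigma / n ** 0.5
    z = (sample_mean - mu0) / standard_error
    # P(|Z| >= |z|) under the standard normal distribution implied by H0.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


z, p = z_test_p_value(sample_mean=103, mu0=100, sigma=15, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here z = 2.0 and p is roughly 0.0455, so the result would be called significant at α = 0.05 but not at α = 0.01.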
A Type I error occurs when we reject the null hypothesis, even though it is _______.
- FALSE
- Not applicable
- Not proven
- TRUE
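A Type I error occurs when we reject the null hypothesis even though it is true. This is also known as a "false positive". When the null hypothesis is true, the probability of making a Type I error equals the significance level α of the test.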
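A small simulation sketch, under assumed settings (normally distributed data with known σ = 1, samples of size 30, a two-sided Z-test at α = 0.05, 10,000 repetitions, fixed seed). The null hypothesis is true by construction, so every rejection is a Type I error, and the fraction of rejections should land close to α.

```python
import random
from statistics import NormalDist


def type_i_error_rate(trials=10_000, n=30, alpha=0.05, seed=0):
    """Estimate how often a true null hypothesis is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value, about 1.96
    false_rejections = 0
    for _ in range(trials):
        # H0 (mean = 0, sd = 1) is true for every simulated sample.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) / (1.0 / n ** 0.5)
        if abs(z) > z_crit:
            false_rejections += 1  # rejecting a true null: a Type I error
    return false_rejections / trials


rate = type_i_error_rate()
print(f"observed Type I error rate: {rate:.3f}")
```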
What are the two main types of data in statistics?
- Categorical and Numerical
- Discrete and Continuous
- Parametric and Nonparametric
- Qualitative and Quantitative
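The two main types of data in statistics are qualitative and quantitative. Qualitative data, also called categorical data, describe characteristics or attributes (for example, eye color) and cannot be meaningfully added or averaged. Quantitative data, also called numerical data, are measurements or counts (for example, height) on which arithmetic is meaningful; they are further divided into discrete and continuous data. Because "categorical" and "numerical" are synonyms for these two types, the "Categorical and Numerical" option describes the same split under different names.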
What is the difference between mutually exclusive and independent events?
- Mutually exclusive events always happen together; independent events never happen together
- Mutually exclusive events can't occur at the same time; independent events don't influence each other
- Mutually exclusive events influence each other; independent events can't occur at the same time
- There is no difference
Mutually exclusive events cannot occur at the same time: the occurrence of one event rules out the occurrence of the other(s). Independent events are events where the occurrence of one does not change the probability of the other(s). The two concepts are distinct; in fact, two mutually exclusive events that each have nonzero probability are always dependent, because learning that one occurred tells you the other did not.
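A worked check using two fair dice (an illustrative choice of events): "the first die shows 1" and "the first die shows 2" are mutually exclusive, so their joint probability is 0 rather than the product 1/36, which makes them dependent. "The first die is even" and "the second die is even" are independent, so their joint probability equals the product of their individual probabilities.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), len(OUTCOMES))


def both(e1, e2):
    return lambda o: e1(o) and e2(o)


# Mutually exclusive: the first die cannot show 1 and 2 at once.
first_is_1 = lambda o: o[0] == 1
first_is_2 = lambda o: o[0] == 2
# Independent: one die's result does not influence the other's.
first_even = lambda o: o[0] % 2 == 0
second_even = lambda o: o[1] % 2 == 0

p_excl = prob(both(first_is_1, first_is_2))
p_excl_product = prob(first_is_1) * prob(first_is_2)
p_indep = prob(both(first_even, second_even))
p_indep_product = prob(first_even) * prob(second_even)

print(p_excl, p_excl_product)    # joint probability vs. product
print(p_indep, p_indep_product)  # equal for independent events
```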
What is the primary objective of cluster analysis?
- To classify variables into different groups
- To group similar instances into clusters
- To predict the output variable
- To visualize high-dimensional data
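The primary objective of cluster analysis is to group similar instances (observations, data points, etc.) into clusters, so that instances in the same cluster are more similar to each other than to instances in other clusters. It is an unsupervised technique: it works without labeled outcomes and does not predict an output variable.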
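To make "grouping similar instances" concrete, here is a minimal sketch using k-means, chosen only as one familiar clustering method. The hand-written `kmeans` function, the six 2D points, and the starting centroids are all illustrative assumptions. Points that are close together end up with the same cluster label, and no output variable is involved.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain k-means: alternate assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments will not change any more
        centroids = new_centroids
    return labels, centroids


points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]  # hypothetical data
labels, centers = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels)
```

Note that k-means needs the number of clusters (here, two starting centroids) up front, which is one of the differences from DBSCAN discussed later.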
If the results of an ANOVA test are significant, ________ tests are often used to identify specifically which groups' means are different.
- Interaction
- Post-hoc
- Pre-hoc
- Tukey
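If the results of an ANOVA test are significant, post-hoc tests are often used to identify specifically which groups' means differ. These tests are run after the ANOVA and keep the overall (familywise) Type I error rate under control, which would otherwise grow with every additional pairwise comparison. Tukey's HSD test is one widely used post-hoc test; Bonferroni-corrected pairwise comparisons are another common approach.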
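A worked calculation of why multiple comparisons need control, assuming four hypothetical groups and treating the pairwise tests as independent (a simplification). Running all 6 pairwise tests at α = 0.05 gives roughly a 26.5% chance of at least one false positive. A Bonferroni correction, used here as one simple example of such control, tests each pair at α / 6 and brings the familywise rate back under 0.05.

```python
from itertools import combinations


def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


groups = ["A", "B", "C", "D"]           # hypothetical groups from an ANOVA
m = len(list(combinations(groups, 2)))  # number of pairwise comparisons
alpha = 0.05

uncorrected = familywise_error_rate(alpha, m)
bonferroni_alpha = alpha / m            # per-comparison level under Bonferroni
corrected = familywise_error_rate(bonferroni_alpha, m)

print(f"{m} comparisons, uncorrected familywise rate: {uncorrected:.3f}")
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.4f}, familywise rate: {corrected:.3f}")
```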
How does 'DBSCAN' clustering differ from 'K-means' and 'hierarchical' clustering?
- DBSCAN can find arbitrarily shaped clusters and is less affected by outliers
- DBSCAN creates a hierarchy of clusters
- DBSCAN requires the number of clusters to be specified
- DBSCAN uses centroid to form the clusters
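DBSCAN (Density-Based Spatial Clustering of Applications with Noise) differs from K-means and hierarchical clustering in that it can find arbitrarily shaped clusters and is less affected by outliers. It builds clusters from dense regions of points and labels points in sparse regions as noise instead of forcing them into a cluster. It does not require the number of clusters to be specified in advance; the number of clusters emerges from the data and two density parameters: a neighborhood radius (often called eps) and the minimum number of points required to form a dense region.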
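A minimal from-scratch sketch of the DBSCAN idea, for illustration only; the function name, the data, and the parameter values (`eps=1.5`, `min_pts=3`, where the count includes the point itself) are assumptions. No number of clusters is passed in: two clusters emerge from the two dense groups, and the isolated point is labeled noise (`-1`) rather than being forced into a cluster.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not yet visited"

    def region_query(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),        # dense group 1
          (10, 10), (10, 11), (11, 10), (11, 11),  # dense group 2
          (50, 50)]                                # isolated outlier
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

For real work, scikit-learn's `sklearn.cluster.DBSCAN` (parameters `eps` and `min_samples`) implements the same algorithm with spatial indexing, which avoids this sketch's quadratic neighbor search.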
In Bayes' theorem, what is the posterior probability?
- The likelihood of the evidence
- The probability of an event before evidence is observed
- The probability of the evidence given the event
- The updated probability of an event after evidence is observed
In Bayes' Theorem, the posterior probability is the updated probability of an event after new evidence has been observed. It is calculated by multiplying the likelihood by the prior probability and dividing by the probability of the evidence: P(A | B) = P(B | A) × P(A) / P(B).
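A worked example with hypothetical numbers: a condition with 1% prevalence (the prior), a test that detects it 99% of the time (the likelihood), and a 5% false-positive rate. The probability of the evidence (a positive test) comes from the law of total probability, and exact fractions show that the posterior probability of having the condition after a positive test is only 1/6.

```python
from fractions import Fraction


def posterior(prior, likelihood, false_positive_rate):
    """P(event | evidence) via Bayes' theorem for a binary event."""
    # P(evidence) = P(evidence | event) P(event) + P(evidence | no event) P(no event)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


p = posterior(
    prior=Fraction(1, 100),               # hypothetical prevalence
    likelihood=Fraction(99, 100),         # hypothetical sensitivity
    false_positive_rate=Fraction(5, 100), # hypothetical false-positive rate
)
print(p)  # 1/6
```

The posterior is much lower than the test's 99% sensitivity because the prior is small: most positive results come from the much larger group without the condition.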