In multiple regression, model selection aims to choose the most _______ model that best predicts the response variable.
- complex
- overfit
- parsimonious
- simple
In multiple regression, model selection aims to choose the most parsimonious model that best predicts the response variable. A parsimonious model achieves the desired level of explanation or prediction with as few predictor variables as possible.
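One common way to compare models of different sizes is adjusted R-squared, which penalizes each extra predictor. The sketch below is only an illustration: the criterion is one of several in use (AIC and BIC are others), and the R-squared values, sample size, and predictor counts are hypothetical numbers chosen for the example.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical fits on the same 50 observations.
small_model = adjusted_r_squared(0.80, n_obs=50, n_predictors=3)
large_model = adjusted_r_squared(0.81, n_obs=50, n_predictors=10)

# The larger model has a slightly higher raw R^2, but after the penalty
# for its seven extra predictors the smaller model scores higher.
print(f"3 predictors:  {small_model:.3f}")
print(f"10 predictors: {large_model:.3f}")
```

Under these assumed numbers, the more parsimonious model would be preferred.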
What do we call an experiment in probability theory?
- A process that produces outcomes
- A statistical analysis
- A test of a hypothesis
- An observation of a random variable
In probability theory, an experiment refers to a process or procedure that produces outcomes. The outcomes depend on chance or randomness. For example, tossing a coin or rolling a die is considered a random experiment because the outcome is not certain but depends on chance.
What does a p-value represent in a t-test or Z-test?
- All of the above
- The probability of observing the sample data if the null hypothesis is true
- The probability of rejecting the null hypothesis when it is true
- The probability of the sample mean being equal to the population mean
In a t-test or Z-test, the p-value represents the probability of obtaining a sample statistic as extreme or more extreme than the observed statistic, assuming the null hypothesis is true.
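As a minimal sketch of this definition, the following computes a two-sided p-value for a one-sample Z-test using only the Python standard library. The function name and the sample numbers (a sample mean of 103, a null mean of 100, a known standard deviation of 15, and 100 observations) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n):
    """Two-sided p-value for H0: population mean == mu0, with sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Probability, under H0, of a statistic at least as extreme as |z|
    # in either tail of the standard normal distribution.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = z_test_p_value(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Here the statistic is two standard errors above the null mean, so the p-value falls just below 0.05.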
A Type I error occurs when we reject the null hypothesis, even though it is _______.
- False
- Not applicable
- Not proven
- True
A Type I error occurs when we reject the null hypothesis, even though it is true. This is also known as a "false positive" error.
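A small simulation can make this concrete. The sketch below repeatedly draws samples from a population where the null hypothesis really is true and counts how often a two-sided Z-test at the 5% level rejects it anyway. The function name, trial count, sample size, and seed are illustrative assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def type_i_error_rate(trials=2000, n=30, alpha=0.05, seed=0):
    """Fraction of trials in which a true null hypothesis is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # H0 is true by construction: the population mean is 0, sigma is 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) / (1.0 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a false positive
    return rejections / trials

rate = type_i_error_rate()
print(f"Observed Type I error rate: {rate:.3f}")
```

The observed rate should land close to the chosen significance level, since the significance level is exactly the probability of a Type I error when the null hypothesis holds.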
What are the two main types of data in statistics?
- Categorical and Numerical
- Discrete and Continuous
- Parametric and Nonparametric
- Qualitative and Quantitative
The two main types of data in statistics are Qualitative and Quantitative. Qualitative data, also known as categorical data, represents characteristics or attributes and cannot be mathematically quantified. Quantitative data, on the other hand, is numerical, representing measurements or counts that can be quantified mathematically.
What is the difference between mutually exclusive and independent events?
- Mutually exclusive events always happen together; independent events never happen together
- Mutually exclusive events can't occur at the same time; independent events don't influence each other
- Mutually exclusive events influence each other; independent events can't occur at the same time
- There is no difference
Mutually exclusive events cannot occur at the same time: the occurrence of one event excludes the occurrence of the other(s). Independent events, on the other hand, are those where the occurrence of one event does not affect the probability of the other event(s). The concepts are distinct, and two mutually exclusive events that each have nonzero probability are never independent, because the occurrence of one makes the probability of the other zero.
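The distinction can be checked on a single roll of a fair six-sided die. In this sketch the event names and helper functions are hypothetical; exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction

SAMPLE_SPACE = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def mutually_exclusive(a, b):
    """True when the two events share no outcome."""
    return len(a & b) == 0

def independent(a, b):
    """True when P(A and B) == P(A) * P(B)."""
    return prob(a & b) == prob(a) * prob(b)

even, odd, at_most_two = {2, 4, 6}, {1, 3, 5}, {1, 2}

# Even and odd cannot both happen, and knowing one rules out the other.
print(mutually_exclusive(even, odd), independent(even, odd))
# "Even" and "at most two" can both happen (a roll of 2), and
# P(both) = 1/6 = 1/2 * 1/3, so they are independent.
print(mutually_exclusive(even, at_most_two), independent(even, at_most_two))
```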
How does 'DBSCAN' clustering differ from 'K-means' and 'hierarchical' clustering?
- DBSCAN can find arbitrarily shaped clusters and is less affected by outliers
- DBSCAN creates a hierarchy of clusters
- DBSCAN requires the number of clusters to be specified
- DBSCAN uses centroid to form the clusters
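DBSCAN (Density-Based Spatial Clustering of Applications with Noise) differs from K-means and hierarchical clustering in that it can find arbitrarily shaped clusters and is less affected by outliers, because points in low-density regions are labeled as noise rather than forced into a cluster. It does not require the user to set the number of clusters a priori; instead, the number of clusters follows from the density of the data, as governed by a neighborhood radius and a minimum number of points per neighborhood.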
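To show where the number of clusters and the noise label come from, here is a minimal pure-Python sketch of the DBSCAN idea. It is a simplified illustration with a brute-force neighbor search, not a library implementation; the function name, the parameter names `eps` and `min_samples`, and the sample points are assumptions made for the example.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_samples):
    """Label each point with a cluster index, or NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force search; the point itself counts as its own neighbor.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # core point: keep growing
                queue.extend(reachable)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]  # the last point is an isolated outlier
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

No cluster count is passed in: the two dense groups each become a cluster, and the isolated point is marked as noise.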
In Bayes' theorem, what is the posterior probability?
- The likelihood of the evidence
- The probability of an event before evidence is observed
- The probability of the evidence given the event
- The updated probability of an event after evidence is observed
In Bayes' theorem, the posterior probability is the updated probability of an event after new evidence has been observed. It is calculated by multiplying the likelihood by the prior probability and then dividing by the probability of the evidence: P(A|B) = P(B|A) P(A) / P(B).
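The calculation can be sketched with a diagnostic-test example. All numbers here are hypothetical: a 1% prior, a 95% chance of a positive result when the condition is present, and a 5% chance of a positive result when it is absent. The probability of the evidence is expanded with the law of total probability.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(event | evidence) = likelihood * prior / P(evidence)."""
    # P(evidence) sums over both cases: event present and event absent.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

p_condition_given_positive = posterior(
    prior=0.01, likelihood=0.95, false_positive_rate=0.05
)
print(f"Posterior: {p_condition_given_positive:.3f}")
```

Even with a fairly accurate test, the posterior stays well below one half here because the prior is so small.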
The range of values around the point estimate that captures the true population parameter at some predetermined confidence level is called a ________ interval.
- Confidence
- Correlation
- Deviation
- Variable
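The range of values around the point estimate that captures the true population parameter at some predetermined confidence level is called a confidence interval. Confidence intervals are used in statistics to indicate the reliability of an estimate. A 95% confidence level, for example, means that about 95% of intervals constructed this way over repeated samples would contain the true parameter.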
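As a rough sketch, the interval for a population mean can be computed as the sample mean plus or minus a critical value times the standard error. The example below assumes a normal critical value, which is a large-sample approximation (for a small sample a t critical value would be more appropriate). The function name and the measurements are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(data, confidence=0.95):
    """Approximate confidence interval for the mean (normal critical value)."""
    n = len(data)
    center = mean(data)  # the point estimate
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * stdev(data) / sqrt(n)  # critical value * standard error
    return center - margin, center + margin

measurements = [9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.1, 9.9]
low, high = mean_confidence_interval(measurements)
low99, high99 = mean_confidence_interval(measurements, confidence=0.99)
print(f"95% interval: ({low:.2f}, {high:.2f})")
print(f"99% interval: ({low99:.2f}, {high99:.2f})")
```

Raising the confidence level widens the interval, since a wider range is needed to capture the parameter more often.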
To prevent overfitting, we can apply a technique called ________ in polynomial regression.
- Aggregation
- Factorization
- Normalization
- Regularization
To prevent overfitting, we can apply a technique called regularization in polynomial regression. Regularization involves adding a penalty term to the loss function during the process of training a model. This penalty term discourages the coefficients of the model from reaching large values, leading to a simpler model that's less likely to overfit.
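The effect of the penalty term can be sketched with L2 (ridge) regularization, one common choice among several. This standard-library example fits a degree-5 polynomial by solving the penalized normal equations; the helper names, the data, and the penalty strength are all hypothetical, and the intercept is left unpenalized by assumption.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w

def fit_ridge_polynomial(xs, ys, degree, lam):
    """Minimize squared error + lam * sum of squared non-intercept weights."""
    rows = [[x ** k for k in range(degree + 1)] for x in xs]
    p = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    for i in range(1, p):  # add the penalty to every term except the intercept
        xtx[i][i] += lam
    return solve_linear_system(xtx, xty)

def penalty(weights):
    return sum(w * w for w in weights[1:])

# Hypothetical noisy data, roughly quadratic.
xs = [-1.0, -0.7, -0.4, -0.1, 0.2, 0.5, 0.8, 1.0]
ys = [1.1, 0.3, 0.25, -0.1, 0.2, 0.1, 0.8, 0.9]

unregularized = fit_ridge_polynomial(xs, ys, degree=5, lam=0.0)
regularized = fit_ridge_polynomial(xs, ys, degree=5, lam=1.0)
print(f"Sum of squared coefficients, lam=0: {penalty(unregularized):.3f}")
print(f"Sum of squared coefficients, lam=1: {penalty(regularized):.3f}")
```

With the penalty switched on, the fitted coefficients shrink toward zero, which yields a smoother curve that follows the noise less closely.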