The normal distribution is also known as the ________ distribution.
- Exponential
- Gaussian
- Poisson
- Uniform
The normal distribution is also known as the Gaussian distribution. It is a continuous probability distribution for a real-valued random variable, fully described by its mean and standard deviation. Its probability density function is a symmetric, bell-shaped curve centered on the mean.
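As a minimal sketch of the bell shape described above, the density can be evaluated directly from its formula and compared with Python's built-in `statistics.NormalDist`. The helper name `normal_pdf` and the chosen evaluation points are illustrative assumptions, not part of the original question.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of a normal (Gaussian) distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


standard = NormalDist(mu=0.0, sigma=1.0)

peak = normal_pdf(0.0)    # highest point of the bell, at the mean
left = normal_pdf(-1.0)   # one standard deviation below the mean
right = normal_pdf(1.0)   # one standard deviation above the mean

# Probability mass within one standard deviation of the mean (roughly 68%).
within_one_sigma = standard.cdf(1.0) - standard.cdf(-1.0)

print(f"peak={peak:.4f} left={left:.4f} right={right:.4f}")
print(f"P(-1 < X < 1) = {within_one_sigma:.4f}")
```

The equal values at -1 and +1 reflect the symmetry of the curve around the mean.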
How do you calculate the probability of the intersection of two independent events?
- P(A ∩ B) = P(A) * P(B)
- P(A ∩ B) = P(A) + P(B)
- P(A ∩ B) = P(A) - P(B)
- P(A ∩ B) = P(A) / P(B)
The probability of the intersection of two independent events is calculated as the product of their individual probabilities. So if A and B are independent, P(A ∩ B) = P(A) * P(B). This is a direct result of the Multiplication Rule for independent events.
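The product rule can be checked by exact enumeration. The sketch below assumes a hypothetical experiment of rolling two fair dice, with event A "the first die is even" and event B "the second die shows 6"; these events are chosen for illustration only.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (first die, second die) outcomes.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event) -> Fraction:
    """Exact probability of an event given as a predicate over outcomes."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))


def event_a(o) -> bool:
    return o[0] % 2 == 0   # first die is even


def event_b(o) -> bool:
    return o[1] == 6       # second die shows 6


p_a = prob(event_a)                                   # 1/2
p_b = prob(event_b)                                   # 1/6
p_a_and_b = prob(lambda o: event_a(o) and event_b(o)) # 1/12

print(p_a, p_b, p_a_and_b, p_a * p_b)
```

Because the two dice do not influence each other, the enumerated probability of the intersection equals the product of the individual probabilities.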
What type of data represents characteristics or attributes?
- Categorical data
- Ordinal data
- Qualitative data
- Quantitative data
Qualitative data represents characteristics or attributes. It is often non-numerical and may include qualities such as textures, colors, smells, tastes, appearance, beauty, etc. This data type is commonly used in fields such as sociology, marketing, and psychology.
How is the strength of correlation between two variables determined?
- By the correlation coefficient
- By the number of data points
- By the slope of the line of best fit
- By the y-intercept of the line of best fit
The strength of the linear correlation between two variables is given by the magnitude of the correlation coefficient, which ranges from -1 to +1. A value close to +1 or -1 indicates a strong positive or negative correlation, respectively, while a value close to 0 indicates a weak or no linear correlation.
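A short sketch of how the coefficient behaves, assuming the Pearson correlation coefficient is meant (the source does not name a specific coefficient). The function name `pearson_r` and the sample data are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]

strong_positive = pearson_r(xs, [2, 4, 6, 8, 10])   # y rises exactly with x
strong_negative = pearson_r(xs, [10, 8, 6, 4, 2])   # y falls exactly with x
no_linear = pearson_r(xs, [4, 1, 0, 1, 4])          # symmetric curve, no linear trend

print(strong_positive, strong_negative, no_linear)
```

The last case is worth noting: the points follow a clear curve, yet the coefficient is zero, because it only measures the linear part of a relationship.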
How does the sample size affect the power of the Kruskal-Wallis Test?
- It depends on the data
- Larger sample sizes decrease power
- Larger sample sizes increase power
- Sample size has no effect on power
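Larger sample sizes increase the power of the Kruskal-Wallis test. Power is the probability that a test detects a true effect, that is, rejects the null hypothesis when it is false. With more observations in each group, sampling variability shrinks and real differences between groups become easier to detect.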
Polynomial regression allows us to model a relationship between the dependent variable and independent variables as a _________.
- High
- Linear equation
- Non-linear equation
- Straight line
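Polynomial regression allows us to model the relationship between the dependent variable and independent variables as a non-linear equation. This is achieved by adding powers of the independent variables (such as x squared or x cubed) as predictors, allowing the model to fit curved data patterns. The model remains linear in its coefficients, so it can still be fitted with ordinary least squares.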
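The idea can be sketched with the standard library alone: build a design matrix whose columns are powers of x, then solve the least-squares normal equations. The helper names `solve_linear_system` and `fit_polynomial`, and the noise-free quadratic data, are assumptions made for illustration; in practice a numerical library would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [b0, b1, ..., b_degree] via the normal equations."""
    # Each row of the design matrix is [1, x, x^2, ..., x^degree].
    design = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1.0 + 2.0 * x + 0.5 * x * x for x in xs]  # hypothetical noise-free quadratic

coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])  # recovers approximately [1.0, 2.0, 0.5]
```

The fitted curve is non-linear in x, but the unknowns b0, b1, b2 enter linearly, which is why a linear solver suffices.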
When a data distribution is skewed, which measure of central tendency is typically the most reliable?
- Mean
- Median
- Mode
The median is usually the most reliable measure of central tendency when a data distribution is skewed. Unlike the mean, which is pulled toward the long tail, the median is largely unaffected by extreme values. Therefore, in a skewed distribution, the median generally gives a better idea of the typical value than the mean.
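A small illustration with made-up numbers: the list below is a hypothetical set of incomes (in thousands) with a single very large value, which makes it right-skewed.

```python
from statistics import mean, median

# Hypothetical right-skewed sample: seven typical values and one extreme value.
incomes = [30, 32, 35, 38, 40, 42, 45, 400]
without_extreme = incomes[:-1]

print("with extreme value:    mean =", mean(incomes), " median =", median(incomes))
print("without extreme value: mean =", round(mean(without_extreme), 2),
      " median =", median(without_extreme))
```

The single extreme value more than doubles the mean (82.75 versus about 37.43), while the median barely moves (39 versus 38).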
What is a 'dendrogram' in hierarchical clustering?
- A diagram showing the change in the number of clusters
- A graph showing the distribution of clusters
- A tree-like diagram that represents the hierarchy of clusters
- The center point of a cluster
A dendrogram is a tree-like diagram that is used in hierarchical clustering to represent the hierarchy of clusters. Each join in the dendrogram represents the merging of two clusters, and the height of the join is the distance between those clusters at the moment they merge.
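The merges and heights that a dendrogram draws can be produced by a few lines of agglomerative clustering. The sketch below assumes single linkage (distance between the closest members of two clusters) and one-dimensional points; the source does not specify a linkage method, and the function name `single_linkage` and the data are illustrative.

```python
def single_linkage(points):
    """Agglomerative clustering of 1-D points.

    Returns the merges in order as (cluster_a, cluster_b, height), where
    height is the single-linkage distance at which the two clusters join.
    """
    clusters = [(p,) for p in points]  # start with every point in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


merges = single_linkage([1.0, 2.0, 6.0, 7.0, 15.0])
for left, right, height in merges:
    print(f"join {left} + {right} at height {height}")
```

Each printed line corresponds to one join in the dendrogram. The nearby pairs merge at low heights, and the isolated point 15.0 joins last, at the greatest height.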
The __________ plot is used to check the linearity and equal variance assumptions of a multiple linear regression.
- Cook's Distance
- Leverage
- Quantile-Quantile
- Residuals vs fitted values
The residuals vs fitted values plot is commonly used in regression diagnostics to check the assumptions of linearity and equal variance (homoscedasticity). The residuals should be scattered randomly around zero, and the spread of the residuals should not change with the fitted values. A curved pattern suggests non-linearity, while a funnel shape suggests unequal variance.
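To show what the plot is built from, the sketch below computes fitted values and residuals for a straight-line fit to deliberately curved data. It is a simplification that assumes a single predictor rather than a multiple regression, and the helper name `fit_line` and the data are illustrative; the (fitted, residual) pairs are what would be plotted.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x for x in xs]  # hypothetical curved relationship: y = x^2

intercept, slope = fit_line(xs, ys)
fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

for f, r in zip(fitted, residuals):
    print(f"fitted={f:6.2f}  residual={r:6.2f}")
```

The residuals come out positive at both ends and negative in the middle. That systematic U shape, rather than random scatter around zero, is the signal that the linearity assumption does not hold.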
In the context of probability distributions, what is a random variable?
- A variable that always takes a constant value
- A variable that does not have a specific value
- A variable that is not influenced by other variables
- A variable whose outcome is based on the result of a random event
A random variable is a variable whose value is determined by the outcome of a random event. It can be either discrete (taking distinct, countable values) or continuous (taking any value within a certain range).
What are the two subtypes of quantitative data?
- Categorical and Ordinal
- Discrete and Continuous
- Interval and Ratio
- Nominal and Categorical
Quantitative data can be classified into two subtypes: discrete and continuous. Discrete data can only take certain distinct values (like the number of children in a family: 0, 1, 2, 3, etc.), while continuous data can take any value within a given range or continuum (like the height or weight of a person).
How does a probability mass function differ from a probability density function?
- A probability mass function is used for continuous random variables, while a probability density function is used for discrete random variables
- A probability mass function is used for discrete random variables, while a probability density function is used for continuous random variables
- The two terms are interchangeable
- There is no difference between a probability mass function and a probability density function
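A probability mass function is used for discrete random variables and gives the probability that a discrete random variable is exactly equal to some value. A probability density function, on the other hand, is used for continuous random variables and gives the density of the variable at a particular value. A density is not itself a probability: the probability that a continuous variable falls in an interval is the area under the density over that interval, and the probability of any single exact value is zero.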
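The difference can be made concrete with a brief sketch. It assumes a binomial distribution (10 fair coin flips) as the discrete example and a narrow normal distribution as the continuous one; both choices, and the helper name `binomial_pmf`, are illustrative.

```python
from math import comb
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Discrete case: each PMF value is a probability, and the values sum to 1.
pmf_values = [binomial_pmf(k, 10, 0.5) for k in range(11)]
p_exactly_5 = pmf_values[5]
total = sum(pmf_values)

# Continuous case: a density can exceed 1; probabilities come from areas.
narrow = NormalDist(mu=0.0, sigma=0.1)
density_at_0 = narrow.pdf(0.0)
prob_interval = narrow.cdf(0.1) - narrow.cdf(-0.1)

print(f"P(X = 5) = {p_exactly_5:.4f}, sum of PMF = {total:.4f}")
print(f"density at 0 = {density_at_0:.4f}, P(-0.1 < X < 0.1) = {prob_interval:.4f}")
```

The density at 0 is close to 4, which would be impossible for a probability, while the area over the interval is a valid probability of roughly 0.68.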