What is the purpose of a Chi-square test for goodness of fit?
- To compare the means of two groups
- To compare the variance of two groups
- To determine the correlation between two variables
- To test if a data set follows a given theoretical distribution
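The Chi-square test for goodness of fit is used to test whether observed data are consistent with a specific theoretical distribution. It compares the observed frequency in each category with the frequency that would be expected under that distribution; the larger the discrepancies, the larger the test statistic and the stronger the evidence against the fit.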
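As a minimal sketch of the comparison described above, the statistic can be computed by hand as the sum of (observed - expected)^2 / expected over all categories. The die-roll counts below are hypothetical illustration data, and the critical value is the usual table entry for 5 degrees of freedom at the 0.05 level.

```python
# Hypothetical counts from 60 rolls of a six-sided die (illustrative data only).
observed = [8, 12, 9, 11, 10, 10]

# Under the theoretical distribution (a fair die), each face is expected
# in 1/6 of the rolls.
total = sum(observed)
expected = [total / 6] * 6

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus one.
degrees_of_freedom = len(observed) - 1

# Critical value for 5 degrees of freedom at the 0.05 significance level,
# taken from a standard chi-square table.
CRITICAL_VALUE = 11.07

# The fit is rejected only if the statistic exceeds the critical value.
reject_fit = chi_square > CRITICAL_VALUE
print(f"chi-square = {chi_square:.2f}, df = {degrees_of_freedom}, reject fit: {reject_fit}")
```

Here the statistic is far below the critical value, so these counts give no reason to doubt that the die is fair.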
________ clustering is a density-based clustering method that can find arbitrarily shaped clusters and is less affected by outliers.
- DBSCAN
- Hierarchical
- K-means
- Spectral
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering method that can find arbitrarily shaped clusters and is less affected by outliers. It groups points that lie in dense regions, growing each cluster outward from points that have enough neighbors within a given radius, and labels points in sparse regions as noise.
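The cluster-growing idea can be sketched in a few lines of plain Python. This is a simplified illustration, not a production implementation: the names `dbscan`, `region_query`, `eps`, and `min_pts` are chosen here for the example, the neighbor search is a brute-force scan, and the sample points are made up.

```python
import math

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Assign a cluster label to each point; NOISE marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # not dense enough to start a cluster
            continue
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep growing
        cluster += 1
    return labels


# Two dense groups and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point never gathers enough neighbors, so it is reported as noise instead of distorting either cluster.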
What does the standard deviation measure in a dataset?
- Central tendency
- Dispersion
- Kurtosis
- Skewness
The standard deviation measures the amount of variation or dispersion in a set of values. A low standard deviation indicates that the values tend to be close to the mean (also called the expected value) of the set, while a high standard deviation indicates that the values are spread out over a wider range.
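A small illustration, using made-up values: the two data sets below share the same mean, yet their standard deviations differ because one is spread over a much wider range. The sketch uses the population standard deviation from Python's `statistics` module.

```python
import statistics

# Two hypothetical data sets with the same mean but different spread.
tight = [9, 10, 11]
spread = [0, 10, 20]

mean_tight = statistics.mean(tight)
mean_spread = statistics.mean(spread)

# Population standard deviation: square root of the mean squared
# deviation from the mean.
sd_tight = statistics.pstdev(tight)
sd_spread = statistics.pstdev(spread)

print(f"means: {mean_tight} and {mean_spread}")
print(f"standard deviations: {sd_tight:.3f} and {sd_spread:.3f}")
```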
How can you determine skewness of a distribution using a box plot?
- By the height of the box
- By the position of the median in the box
- By the width of the box
- It cannot be determined from a box plot
The skewness of a distribution can be judged from a box plot by looking at the position of the median in the box. If the median is not in the center of the box (i.e., the quartiles are not equidistant from the median), the data are skewed. In a vertical box plot, a median closer to the bottom of the box (the first quartile) indicates positive skew, and a median closer to the top (the third quartile) indicates negative skew.
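The same check can be done numerically. The sketch below uses the quartile (Bowley) skewness coefficient, (Q3 + Q1 - 2 * median) / (Q3 - Q1), which is positive when the median sits closer to the first quartile and negative when it sits closer to the third. The function name `quartile_skew` and the data are illustrative assumptions.

```python
import statistics


def quartile_skew(values):
    """Quartile skewness: sign shows which side of the box the median is nearer to.

    Assumes the first and third quartiles differ (otherwise the box has no width).
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    return (q3 + q1 - 2 * median) / (q3 - q1)


# Hypothetical data with a long right tail, and its mirror image.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 5, 8, 15]
left_skewed = [-v for v in right_skewed]

print(f"right-skewed data: {quartile_skew(right_skewed):+.2f}")
print(f"left-skewed data:  {quartile_skew(left_skewed):+.2f}")
```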
The ________ score is a measure of how close each point in one cluster is to the points in the neighboring clusters.
- boundary
- distance
- proximity
- silhouette
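The silhouette score is a measure of how close each point in one cluster is to the points in the neighboring clusters, relative to the points in its own cluster. It ranges from -1 to +1: values near +1 indicate dense, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest that points have been assigned to the wrong cluster.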
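For a single point, the score is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster and b is the mean distance to the points in the nearest other cluster. The following is a minimal sketch of that calculation in plain Python; the function name and the data are illustrative, and it assumes at least two clusters and Euclidean distance.

```python
import math
from statistics import mean


def silhouette_score(points, labels):
    """Mean silhouette over all points (requires at least two clusters)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for single-point clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's distance to itself is 0, so divide by len(own) - 1).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            mean([math.dist(p, q) for q in other])
            for other_label, other in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Hypothetical points forming two obvious groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels match the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(f"good labels: {good:.2f}, bad labels: {bad:.2f}")
```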
What types of scales of measurement are suitable for non-parametric tests?
- Nominal, ordinal, interval, and ratio
- Only interval and ratio
- Only nominal and ordinal
- Only ratio
Non-parametric tests can be used with nominal, ordinal, interval, and ratio scales of measurement. This flexibility is one reason they are sometimes chosen over parametric tests, many of which require interval or ratio data.
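As one example of a non-parametric procedure that needs only ordinal data, the Mann-Whitney U statistic can be computed by comparing every pair of observations across two groups. The sketch below computes only the statistic (no p-value), and the ratings are hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: each pair with a > b counts 1, each tie counts 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical satisfaction ratings on an ordinal 1-5 scale.
group_a = [4, 5, 3, 5, 4]
group_b = [2, 3, 1, 3, 2]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

# The two U values always sum to the number of cross-group pairs.
print(f"U_a = {u_a}, U_b = {u_b}, pairs = {len(group_a) * len(group_b)}")
```

Only the ordering of the ratings matters here; the distances between rating levels are never used, which is why the approach suits ordinal data.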
In a multiple linear regression model, the assumption that the variance of the residuals is the same for all levels of the predictors is known as __________.
- Autocorrelation
- Homoscedasticity
- Linearity
- Multicollinearity
Homoscedasticity refers to the assumption in regression analysis that the variance of the residuals (or "errors") is constant across all levels of the independent variables.
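One informal way to probe this assumption is to fit the model and compare the residual variance at low and high values of a predictor. The sketch below does this for a single predictor with synthetic data whose noise deliberately grows with x, so the assumption is violated by construction. The helper name `fit_line` and the data are assumptions made for illustration; this is not a formal test.

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_mean = statistics.mean(xs)
    y_mean = statistics.mean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Synthetic data: the deviation from the line y = 2x grows with x.
xs = list(range(1, 21))
ys = [2 * x + (0.1 * x if x % 2 == 0 else -0.1 * x) for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Compare residual variance for the lower and upper half of the x values.
half = len(xs) // 2
low_var = statistics.pvariance(residuals[:half])
high_var = statistics.pvariance(residuals[half:])
variance_ratio = high_var / low_var

print(f"residual variance ratio (high x / low x): {variance_ratio:.1f}")
```

A ratio close to 1 would be consistent with homoscedasticity; a ratio well above 1, as here, points to variance that increases with the predictor.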
Simple linear regression is a method used to predict a ________ variable using a ________ variable.
- continuous, discrete
- dependent, independent
- discrete, continuous
- independent, dependent
Simple linear regression is a statistical method that allows us to summarize and study the relationship between two continuous (quantitative) variables. One variable, denoted x, is regarded as the predictor, explanatory, or independent variable. The other variable, denoted y, is regarded as the response, outcome, or dependent variable. The fitted model predicts the dependent variable from the independent variable.
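A minimal sketch of the fit, using the least-squares formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The function name and the data points are made up for illustration.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = fit_simple_linear_regression(x, y)

# Use the fitted line to predict y for a new value of x.
predicted = slope * 6 + intercept
print(f"y = {slope:.1f} * x + {intercept:.1f}; prediction at x = 6: {predicted:.1f}")
```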
Can the probability of an event be a negative number?
- It depends on the event
- No
- Only if the event is impossible
- Yes
The probability of an event cannot be a negative number. By definition, the probability of an event is a number between 0 and 1, inclusive. An impossible event has probability 0, not a negative value.
What is the key characteristic of a symmetric distribution?
- It has a mean of zero
- It has a mode at the peak
- It has no outliers
- It has the same shape on the left and right when split vertically at the center
The key characteristic of a symmetric distribution is that it has the same shape on the left and right when split vertically at the center (i.e., about the mean). This means that the frequencies of corresponding values on either side of the center are equal.
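The equal-frequency property can be checked directly on a probability mass function by comparing it with its own mirror image. The sketch below uses binomial distributions as an assumed example: with success probability 0.5 the distribution is symmetric, while with 0.2 it is not.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def is_symmetric(pmf, tol=1e-12):
    """True if each probability equals its mirror image about the center."""
    return all(abs(left - right) <= tol for left, right in zip(pmf, reversed(pmf)))


n = 10
fair = [binomial_pmf(k, n, 0.5) for k in range(n + 1)]
biased = [binomial_pmf(k, n, 0.2) for k in range(n + 1)]

print("p = 0.5 symmetric:", is_symmetric(fair))
print("p = 0.2 symmetric:", is_symmetric(biased))
```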