What is the difference between descriptive and inferential statistics?

  • Descriptive and inferential statistics are the same
  • Descriptive statistics predict trends; inferential statistics summarize data
  • Descriptive statistics summarize data; inferential statistics make predictions about the population
  • Descriptive statistics summarize data; inferential statistics visualize data
Descriptive statistics provide simple summaries of a sample using measures such as the mean, median, and mode. Inferential statistics, on the other hand, take data from a sample and make inferences about the larger population from which the sample was drawn; this is the process of using data analysis to deduce properties of an underlying probability distribution.
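For example, here is a minimal sketch (using NumPy and SciPy, with made-up exam scores) that contrasts the two: the first part only summarizes the sample itself, while the confidence interval makes an inference about the population mean.

```python
import numpy as np
from scipy import stats

# A hypothetical sample of exam scores (illustrative data only).
sample = np.array([72, 85, 90, 68, 77, 95, 81, 74, 88, 79])

# Descriptive statistics: summarize the sample we actually have.
print("mean:  ", np.mean(sample))
print("median:", np.median(sample))
print("std:   ", np.std(sample, ddof=1))

# Inferential statistics: use the sample to make a claim about the
# population, e.g. a 95% confidence interval for the population mean.
ci = stats.t.interval(0.95, df=len(sample) - 1,
                      loc=np.mean(sample),
                      scale=stats.sem(sample))
print("95% CI for the population mean:", ci)
```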

Non-parametric tests are also known as ________ tests because they make fewer assumptions about the data.

  • assumption-free
  • distribution-free
  • free-assumption
  • free-distribution
Non-parametric tests are also known as distribution-free tests because they make fewer assumptions about the data; specifically, they do not require the data to follow a specific distribution.
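As a hypothetical illustration using SciPy, the Mann-Whitney U test is a common distribution-free alternative to the two-sample t-test: it works on ranks and does not assume normality. The data below is simulated from skewed distributions purely for demonstration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two simulated samples from skewed (non-normal) distributions.
group_a = rng.exponential(scale=1.0, size=30)
group_b = rng.exponential(scale=1.5, size=30)

# Mann-Whitney U compares ranks, so it needs no distributional assumption.
stat, p = stats.mannwhitneyu(group_a, group_b)
print(f"U = {stat:.1f}, p = {p:.4f}")
```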

How can you test the assumption of independence in a Chi-square test for goodness of fit?

  • By calculating the standard deviation of the observations
  • By conducting a separate Chi-square test of independence
  • By conducting a t-test
  • By examining the correlation between observations
To test the assumption of independence in a Chi-square test for goodness of fit, you can conduct a separate Chi-square test of independence. This test compares the observed frequencies in each category with what we would expect if the variables were independent.
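A minimal sketch with SciPy's chi2_contingency, using an invented contingency table: the function returns the expected counts under independence alongside the test statistic and p-value.

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table of observed counts for two categorical
# variables (rows = group, columns = category).
observed = np.array([[30, 10, 20],
                     [20, 25, 15]])

# Compares observed counts with the counts expected if the row and
# column variables were independent.
chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
print("expected counts under independence:\n", expected)
```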

How does skewness affect the relationship between the mean, median, and mode of a distribution?

  • Changes the relationship
  • Increases the standard deviation
  • No effect
  • Reduces the kurtosis
Skewness affects the relationship between the mean, median, and mode. In a positively skewed distribution, the mean is usually greater than the median, which is greater than the mode. In a negatively skewed distribution, the mode is usually greater than the median, which is greater than the mean.
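The following sketch (NumPy/SciPy, simulated log-normal data) illustrates the pattern for positive skew: the mean is pulled toward the long right tail and ends up above the median.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A positively skewed sample (long right tail), e.g. log-normal data.
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=10_000)

print("skewness:", stats.skew(skewed))  # positive
print("mean:    ", np.mean(skewed))     # pulled toward the right tail
print("median:  ", np.median(skewed))   # typically below the mean here
```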

What is the difference between a parameter and a statistic in the field of statistics?

  • A parameter and a statistic are the same thing
  • A parameter is based on a sample; a statistic is based on the population
  • A statistic is a numerical measure; a parameter is a graphical representation
  • A statistic is based on a sample; a parameter is based on the population
In the field of statistics, a parameter is a numerical characteristic of a population, whereas a statistic is a numerical characteristic of a sample. Parameters are often unknown because we cannot examine the entire population. We use statistics, which we compute from sample data, to estimate parameters.
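A small illustration with NumPy, treating a large simulated array as the entire population: its mean is the parameter, while the mean of a random sample drawn from it is the statistic we would use to estimate that parameter.

```python
import numpy as np

rng = np.random.default_rng(2)

# Treat this array as the entire (simulated) population.
population = rng.normal(loc=100, scale=15, size=1_000_000)
mu = population.mean()                     # parameter: population mean

# In practice we only see a sample; its mean is a statistic estimating mu.
sample = rng.choice(population, size=100, replace=False)
x_bar = sample.mean()                      # statistic: sample mean

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x_bar = {x_bar:.2f} (estimate of mu)")
```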

How does adding more predictors to a multiple linear regression model affect its inferences?

  • It always improves the model
  • It always makes the model worse
  • It can lead to overfitting
  • It has no effect on the model
Adding more predictors to a model may increase the R-squared value, making it appear that the model is improving. However, if the additional predictors are not truly associated with the response variable, the model may overfit and perform poorly on new, unseen data.
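A sketch of this effect with scikit-learn on simulated data: appending irrelevant noise predictors raises the training R-squared but tends to lower the R-squared on held-out data, which is the overfitting pattern described above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)

n = 200
X_useful = rng.normal(size=(n, 2))             # 2 genuinely useful predictors
y = 3 * X_useful[:, 0] - 2 * X_useful[:, 1] + rng.normal(size=n)
X_noise = rng.normal(size=(n, 40))             # 40 irrelevant predictors

for X in (X_useful, np.hstack([X_useful, X_noise])):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)
    print(f"{X.shape[1]:2d} predictors | "
          f"train R^2 = {model.score(X_tr, y_tr):.3f} | "
          f"test R^2 = {model.score(X_te, y_te):.3f}")
```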

How does ridge regression help in dealing with multicollinearity?

  • By eliminating the correlated variables.
  • By increasing the sample size.
  • By introducing a penalty term to shrink the coefficients.
  • By transforming the variables.
Ridge regression introduces a regularization (penalty) term into the loss function, which shrinks the coefficients toward zero and mitigates the effect of multicollinearity.
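A minimal comparison using scikit-learn on simulated, nearly collinear predictors: ordinary least squares can produce large, unstable coefficients, while the ridge penalty (alpha is a tuning parameter, set to 1.0 here purely for illustration) shrinks them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(4)

n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.5, size=n)     # only x1 truly matters

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# With severe multicollinearity, OLS coefficients can blow up in opposite
# directions; the ridge penalty shrinks and stabilizes them.
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
```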

Which mathematical concept is at the core of PCA?

  • Differentiation
  • Eigenvalues and Eigenvectors
  • Integration
  • Matrix Multiplication
PCA relies heavily on eigenvalues and eigenvectors. They allow it to determine the axes along which the data has the most variance, and these axes form the new variables (principal components).
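As an illustration with NumPy on simulated correlated data, PCA can be carried out directly by eigendecomposing the covariance matrix: the eigenvectors are the principal axes, and the eigenvalues are the variances captured along them.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated correlated 2-D data.
X = rng.multivariate_normal(mean=[0, 0],
                            cov=[[3.0, 1.5], [1.5, 1.0]],
                            size=500)

# PCA: eigendecompose the covariance matrix of the centered data.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices

# Sort by decreasing eigenvalue; eigenvectors are the principal axes and
# eigenvalues are the variance captured along each axis.
order = np.argsort(eigvals)[::-1]
print("explained variance:", eigvals[order])
print("principal axes (columns):\n", eigvecs[:, order])
```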

___________ refers to the condition where the variance of the errors or residuals is constant across all levels of the explanatory variables.

  • Autocorrelation
  • Heteroscedasticity
  • Homoscedasticity
  • Multicollinearity
Homoscedasticity is the condition in which the variance of the errors or residuals is constant across all levels of the explanatory variables. It is one of the key assumptions of linear regression.
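One common check, sketched here with statsmodels on simulated data, is the Breusch-Pagan test, whose null hypothesis is homoscedasticity; the data below is deliberately generated with error variance that grows with x, so a small p-value is expected.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(6)

n = 300
x = rng.uniform(0, 10, size=n)
# Heteroscedastic errors: the noise grows with x (violates homoscedasticity).
y = 2 * x + rng.normal(scale=0.5 * x, size=n)

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

# Breusch-Pagan tests H0: constant residual variance (homoscedasticity).
lm_stat, lm_p, _, _ = het_breuschpagan(resid, X)
print(f"Breusch-Pagan LM p-value = {lm_p:.4g} "
      "(a small p suggests heteroscedasticity)")
```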

How can a Chi-square test for independence be used in feature selection?

  • It can identify the features that are independent from the target variable
  • It can identify the features that are most correlated with the target variable
  • It can identify the features that have a significant association with the target variable
  • It can identify the features that have the highest variance
A Chi-square test for independence can be used in feature selection by identifying the features that have a significant association with the target variable. Features with a large Chi-square statistic (and small p-value) are likely informative, while features that appear independent of the target are candidates for removal.
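A brief sketch with scikit-learn's SelectKBest and its chi2 scoring function on simulated count data (chi2 requires non-negative features): the feature constructed to depend on the target receives a much larger score than the noise features.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

rng = np.random.default_rng(7)

# Hypothetical non-negative count features (chi2 needs non-negative values).
n = 500
y = rng.integers(0, 2, size=n)              # binary target
relevant = rng.poisson(lam=2 + 3 * y)       # depends on the target
noise = rng.poisson(lam=3, size=(n, 3))     # independent of the target
X = np.column_stack([relevant, noise])

# Score each feature's association with the target; keep the top k.
selector = SelectKBest(score_func=chi2, k=2).fit(X, y)
print("chi2 scores: ", selector.scores_.round(2))
print("p-values:    ", selector.pvalues_.round(4))
print("selected idx:", selector.get_support(indices=True))
```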