How do non-parametric tests treat data points?

  • They analyze only the maximum and minimum data values
  • They analyze ranks rather than actual data values
  • They analyze the median of the data set only
  • They ignore outliers in the data set
Non-parametric tests treat data points by analyzing their ranks rather than their actual values. Because only the ordering of observations matters, these tests are less sensitive to extreme values, which makes them a good choice for skewed data or data with many outliers.
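As a quick illustration (plain Python, no tie handling), replacing an extreme value changes nothing about the ranks a non-parametric test would analyze:

```python
def ranks(data):
    """Return 1-based ranks of the values (ties not handled, for brevity)."""
    sorted_vals = sorted(data)
    return [sorted_vals.index(x) + 1 for x in data]

# The outlier's magnitude is irrelevant: only its position in the ordering matters.
print(ranks([3, 1, 5]))     # [2, 1, 3]
print(ranks([3, 1, 1000]))  # [2, 1, 3] -- same ranks despite the extreme value
```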

What is interval estimation in inferential statistics?

  • The process of calculating the standard deviation of a population
  • The process of determining the mode of a population
  • The process of estimating the mean of a population
  • The process of providing a range of values for an unknown population parameter
Interval estimation in inferential statistics is a method that provides a range of values likely to contain the unknown population parameter. Instead of a single value, it provides an interval of estimates, which makes it more informative than point estimation because it also conveys the uncertainty of the estimate.
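A minimal sketch of a z-based confidence interval for the mean, using only the standard library (for small samples a t-based interval would be more appropriate; the sample data below are illustrative):

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_ci(sample, confidence=0.95):
    """Approximate z-based confidence interval for the population mean."""
    n = len(sample)
    se = stdev(sample) / sqrt(n)                    # standard error of the mean
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # critical value, e.g. ~1.96
    m = mean(sample)
    return m - z * se, m + z * se

low, high = mean_ci([12.1, 11.8, 12.4, 12.0, 11.9, 12.2])
```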

What are the assumptions made while applying ANOVA?

  • Independence, Homogeneity of variance, Non-linearity
  • Linearity, Independence, Equal Variance
  • Normality, Homogeneity of variance, Independence
  • Normality, Linearity, Independence
While applying ANOVA, the following assumptions are made: Normality (data is normally distributed), Homogeneity of variance (variance among the groups is approximately equal), Independence (the observations are independent of each other).
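The F statistic behind one-way ANOVA can be sketched in plain Python (the function name and sample groups are illustrative; a real analysis would also verify the three assumptions above):

```python
from statistics import mean

def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: between-group vs within-group variance."""
    all_vals = [x for g in groups for x in g]
    grand = mean(all_vals)
    k, n = len(groups), len(all_vals)
    # Between-group sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```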

The ________ mean is a type of average, which is calculated by taking the reciprocal of the arithmetic mean of the reciprocals.

  • Arithmetic
  • Geometric
  • Harmonic
The harmonic mean is a measure of central tendency that is much less well known than, for example, the arithmetic mean or the median. It is appropriate for situations when the average of rates is desired. The harmonic mean is calculated by taking the reciprocal of the arithmetic mean of the reciprocals.
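In Python, `statistics.harmonic_mean` implements this directly, and the reciprocal-of-the-mean-of-reciprocals definition can be verified by hand (the speeds are an illustrative example of averaging rates over equal distances):

```python
from statistics import harmonic_mean, mean

speeds = [40, 60]  # e.g. km/h over two equal-length legs of a trip

# Built-in implementation
hm = harmonic_mean(speeds)

# By definition: reciprocal of the arithmetic mean of the reciprocals
hm_manual = 1 / mean(1 / x for x in speeds)

print(hm)  # 48.0, not the arithmetic mean 50.0
```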

If all the values in a dataset are identical, what would be the variance and standard deviation?

  • The variance and standard deviation would be 0
  • The variance and standard deviation would be 1
  • The variance would be 0 but the standard deviation would be 1
  • The variance would be 1 but the standard deviation would be 0
If all the values in a dataset are identical, there is no variation or dispersion in the data. Hence, both the variance and the standard deviation would be zero.
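This is easy to confirm with the standard library:

```python
from statistics import pvariance, pstdev

data = [7.5] * 10  # every value identical, so every deviation from the mean is 0
print(pvariance(data), pstdev(data))  # 0.0 0.0
```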

In what situations can the use of stepwise regression for model selection be problematic?

  • When the true model is non-linear.
  • When there are too few predictor variables.
  • When there are too many predictor variables.
  • When there is no multicollinearity.
Stepwise regression searches over linear models, assuming a linear relationship between the predictors and the response. When the true model is non-linear, it can select a misspecified model, leading to incorrect inferences.

Why is it important to check the assumptions of a multiple linear regression model?

  • To ensure the validity of the model
  • To increase the complexity of the model
  • To increase the number of observations
  • To reduce the R-squared value
Checking the assumptions of a multiple linear regression model (like linearity, independence, normality, and homoscedasticity) is crucial to ensure the validity of the model and its estimates. Violations of these assumptions can lead to biased or inefficient estimates, and inferences made from such models could be misleading.

What is the relationship between the mean and the standard deviation in a normal distribution?

  • The mean is always larger than the standard deviation
  • The mean is the midpoint of the distribution, and the standard deviation measures the spread
  • The standard deviation is always larger than the mean
  • There is no relationship between the mean and the standard deviation
In a normal distribution, the mean is the center of the distribution and represents the "average" value. The standard deviation measures the dispersion around the mean. Roughly 68% of the data falls within one standard deviation of the mean in a normal distribution.
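With `statistics.NormalDist`, the roughly-68% figure can be checked for any mean and standard deviation (the parameters below are arbitrary):

```python
from statistics import NormalDist

d = NormalDist(mu=100, sigma=15)  # arbitrary choice of center and spread

# Probability mass within one standard deviation of the mean
within_one_sd = d.cdf(d.mean + d.stdev) - d.cdf(d.mean - d.stdev)
print(within_one_sd)  # ~0.6827, regardless of mu and sigma
```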

What are the assumptions required for a distribution to be considered a Poisson distribution?

  • The events are dependent on each other
  • The events are occurring at a constant mean rate and independently of the time since the last event
  • The events have more than two possible outcomes
  • The number of trials is fixed
The key assumptions for a Poisson distribution are that the events are happening at a constant mean rate and independently of the time since the last event. This is often used for modeling the number of times an event occurs in a given interval of time or space.
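Under those assumptions, the probability of observing exactly k events in an interval is lam**k * exp(-lam) / k!, which is straightforward to compute (the rate of 3 per interval below is illustrative):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for events arriving independently at constant mean rate lam."""
    return lam ** k * exp(-lam) / factorial(k)

# e.g. probability of exactly 2 arrivals when the mean rate is 3 per interval
p_two = poisson_pmf(2, 3)
```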

The p-value in a hypothesis test is the probability of getting a sample statistic as extreme as the test statistic, given that the _______ hypothesis is true.

  • Alternative
  • Null
  • Original
  • Random
In the context of hypothesis testing, the p-value is the probability of observing a test statistic as extreme as the one calculated, assuming that the null hypothesis is true.
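For a z test, for example, the two-sided p-value is the null probability of a statistic at least as extreme as the one observed:

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p-value for an observed z statistic, assuming the null is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The familiar 5% threshold corresponds to |z| of about 1.96
print(two_sided_p(1.96))  # ~0.05
```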

What is the purpose of hypothesis testing in statistics?

  • To compare the sample mean to the population mean
  • To make inferences about a population based on sample data
  • To understand the distribution of the data
  • To visualize the data
Hypothesis testing is a statistical method used to make decisions from experimental data. It is an inferential technique that lets us assess whether observed results deviate from the null hypothesis merely by chance or reflect a true statistical difference.

In a Chi-square test for independence, small expected frequencies can lead to a ________ Chi-square value.

  • constant
  • larger
  • smaller
  • zero
In a Chi-square test for independence, small expected frequencies can lead to a larger Chi-square value. Because each cell's contribution to the statistic is divided by its expected frequency, small expected counts inflate the statistic and can produce a significant result even when there is no substantial relationship between the variables.
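The inflation is visible directly in the statistic's formula, since each cell contributes (O - E)**2 / E (the counts below are illustrative):

```python
def chi_square_stat(observed, expected):
    """Pearson chi-square statistic; each term divides by the expected count."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# The same absolute deviation of 4 contributes far more when E is small
big_e = chi_square_stat([104], [100])   # 16 / 100 = 0.16
small_e = chi_square_stat([9], [5])     # 16 / 5  = 3.2
```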