________ is a popular method for cluster analysis that partitions the data into non-hierarchical clusters.

  • DBSCAN
  • Hierarchical
  • K-means
  • PCA
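K-means is a popular method for cluster analysis that partitions the data into K non-hierarchical (flat) clusters. The algorithm alternates between two steps: it assigns each data point to the cluster with the nearest centroid, typically by Euclidean distance in feature space, and then recomputes each centroid as the mean of its assigned points. It repeats these steps until the assignments stop changing.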
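
The assign-then-update loop can be sketched in a few lines. This is a minimal illustration, not a production implementation: the function name `kmeans`, the random initialization from data points, and the toy data are all assumptions made for the example.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal K-means sketch: alternate assignment and centroid update."""
    rng = random.Random(seed)
    # Assumed initialization: pick k distinct data points as starting centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments are stable
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups (made-up data).
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
        (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centroids = kmeans(data, k=2)
```

On data this cleanly separated, the first three points should share one label and the last three the other. Which group receives label 0 depends on the random initialization.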

What is the impact of outliers on the skewness of a distribution?

  • Outliers can decrease skewness
  • Outliers can either increase or decrease skewness
  • Outliers can increase skewness
  • Outliers do not impact skewness
Outliers can have a significant impact on the skewness of a distribution. An outlier can increase skewness if it is further from the mean in the direction of the skew. Conversely, an outlier can decrease skewness if it is further from the mean in the direction opposite to the skew. The extent of the impact depends on the value and direction of the outlier relative to the rest of the data.
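
A small numerical check makes both directions concrete. The sketch below assumes the moment-based (population) definition of skewness, and the data values are invented for illustration.

```python
def skewness(values):
    """Moment-based skewness: mean cubed deviation over std cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

base = [1, 2, 2, 3, 3, 4, 9]            # mildly right-skewed toy sample
with_high_outlier = base + [30]         # outlier in the direction of the skew
with_low_outlier = base + [-20]         # outlier opposite to the skew

print(skewness(base))
print(skewness(with_high_outlier))      # larger than the base skewness
print(skewness(with_low_outlier))       # smaller than the base skewness (negative here)
```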

Bayes' theorem is a method for updating ________ probabilities based on new data.

  • conditional
  • joint
  • marginal
  • prior
Bayes' theorem is a principle in probability theory and statistics that describes how to update the probability of a hypothesis (the prior probability) in light of evidence (new data), yielding the posterior probability.
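
As a worked illustration, the update for a yes/no hypothesis H given evidence E is P(H|E) = P(E|H) P(H) / P(E). The sketch below applies it to a hypothetical screening test. The function name, parameter names, and all the rates are assumptions chosen for the example.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) for a binary hypothesis via Bayes' theorem."""
    # Total probability of the evidence: P(E) = P(E|H)P(H) + P(E|not H)P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prior, 95% true-positive rate, 5% false-positive rate.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))  # about 0.161
```

Even with a fairly accurate test, the posterior stays low here because the prior is small.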

How does the concept of geometric mean differ from the arithmetic mean?

  • Geometric mean cannot be used for negative numbers, arithmetic mean can
  • Geometric mean uses addition, arithmetic mean uses multiplication
  • Geometric mean uses multiplication, arithmetic mean uses addition
  • There is no difference
The arithmetic mean is the sum of the values divided by the number of values, while the geometric mean is the nth root of the product of the values (where n is the number of values). The geometric mean is especially useful for averaging growth rates or ratios, and for comparing items whose values span very different ranges.
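
The two definitions can be compared side by side. This is a minimal sketch with made-up inputs. It computes the geometric mean through logarithms, which is equivalent to the nth root of the product for positive values and avoids overflow on long lists.

```python
import math

def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)

def geometric_mean(values):
    """nth root of the product, computed as exp(mean(log(v)))."""
    if any(v <= 0 for v in values):
        raise ValueError("this sketch only handles positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

values = [2.0, 8.0]
print(arithmetic_mean(values))  # 5.0
print(geometric_mean(values))   # approximately 4.0, i.e. sqrt(2 * 8)
```

Python's standard library also provides `statistics.geometric_mean` (Python 3.8+), which can replace the hand-written version.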

What are some real-world implications of kurtosis in a dataset?

  • Datasets with high kurtosis are easier to interpret
  • High kurtosis can indicate a bias in data collection
  • High kurtosis can indicate the presence of outliers
  • Kurtosis does not have real-world implications
In real-world data analysis, kurtosis measures how heavy the tails of a distribution are, so a high value can indicate the presence of outliers. High kurtosis in a dataset may signal increased tail risk. This is particularly relevant in fields like finance, where tail risk can translate into heavier losses than the normal distribution would predict.
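
The link between outliers and kurtosis is easy to see numerically. The sketch below assumes the moment-based excess kurtosis (zero for a normal distribution) and uses simulated data with two invented extreme values appended.

```python
import random

def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (zero for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # roughly normal sample
contaminated = clean + [12.0, -12.0]                # two made-up extreme values

print(excess_kurtosis(clean))         # close to 0
print(excess_kurtosis(contaminated))  # much larger
```

Two extreme points out of about a thousand are enough to move the statistic sharply, which is why high kurtosis is a common prompt to inspect the tails.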

What does the Wilcoxon Signed Rank Test compare in paired samples?

  • Means
  • Medians
  • Modes
  • Variance
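The Wilcoxon Signed Rank Test compares the medians in paired samples. More precisely, it ranks the absolute values of the paired differences and tests whether those differences are distributed symmetrically around a median of zero, which makes it a non-parametric alternative to the paired t-test.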
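
To show what the test actually computes, here is a minimal sketch of the signed-rank statistic only (no p-value). The function name and the paired measurements are invented for illustration. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used.

```python
def signed_rank_statistic(before, after):
    """Return (W_plus, W_minus): rank sums of positive and negative differences."""
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(after, before) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

before = [10, 12, 9, 14, 11]   # hypothetical paired measurements
after = [12, 15, 8, 18, 16]
w_plus, w_minus = signed_rank_statistic(before, after)
print(w_plus, w_minus)  # 14.0 1.0
```

A large imbalance between the two rank sums suggests the paired differences are not centered on zero.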

What is the difference between correlation and causation?

  • Causation implies correlation
  • Correlation and causation are independent of each other
  • Correlation implies causation
  • Correlation means there is no causation
While correlation simply indicates a statistical association between two variables, causation goes a step further and means that a change in one variable actually produces a change in the other. It's important to remember that correlation does not imply causation. However, if there is causation, there is likely to be correlation.

The correlation coefficient is denoted by the letter __.

  • C
  • P
  • R
  • S
The correlation coefficient is often denoted by the letter 'R'. In the case of Pearson's correlation, it's specifically denoted as 'r'. It measures the strength and direction of the linear relationship between two variables and ranges from -1 to 1.
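
For reference, Pearson's r is the covariance of the two variables divided by the product of their standard deviations. The sketch below is a minimal version with made-up data, and the function name `pearson_r` is chosen for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

xs = [1, 2, 3, 4]
print(pearson_r(xs, [2, 4, 6, 8]))   # perfectly increasing line: r is 1 up to rounding
print(pearson_r(xs, [8, 6, 4, 2]))   # perfectly decreasing line: r is -1 up to rounding
```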

________ data is data that can be organized or ranked in a specific order.

  • Continuous
  • Discrete
  • Nominal
  • Ordinal
Ordinal data is a type of categorical data that can be organized or ranked in a specific order. For example, customer satisfaction ratings (satisfied, neutral, dissatisfied) can be organized from most to least satisfied.

How do you interpret the coefficients of interaction terms in a regression model?

  • The interaction coefficient indicates the effect of one variable at a specific level of the other variable
  • The interaction coefficient indicates the joint effect of the variables, independent of their individual effects
  • The interaction coefficient is a measure of the correlation between the variables
  • The interaction coefficient represents the average effect of two variables
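The interaction coefficient in a regression model indicates how the effect of one independent variable on the dependent variable depends on the level of another independent variable. To find the effect of one variable at a specific level of the other, combine its own coefficient with the interaction coefficient multiplied by that level. A non-zero interaction coefficient therefore signifies that the effect of one variable changes with the value of the other, which is the interaction effect between the two variables.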
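
A tiny numerical sketch can make this concrete. It assumes a model of the form y = b0 + b1*x + b2*z + b3*x*z. The coefficient values are hypothetical and are not fitted to any data.

```python
def predict(x, z, b0=1.0, b1=2.0, b2=0.5, b3=1.5):
    """Hypothetical fitted model with an interaction term b3 * x * z."""
    return b0 + b1 * x + b2 * z + b3 * x * z

def effect_of_x(z):
    """Change in the prediction for a one-unit increase in x, holding z fixed."""
    return predict(1, z) - predict(0, z)

print(effect_of_x(0))  # 2.0, which is b1 alone
print(effect_of_x(1))  # 3.5, which is b1 + b3
print(effect_of_x(1) - effect_of_x(0))  # 1.5, the interaction coefficient b3
```

The effect of x at level z works out to b1 + b3*z, so the interaction coefficient is the amount by which that effect shifts for each one-unit increase in z.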