What are the key properties of a Bernoulli distribution?

It can only take positive integer values
It has a bell-shaped curve
It has a single trial with two possible outcomes
It models a series of independent trials

A Bernoulli distribution is a discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q=1-p. It models a single trial with two possible outcomes, often labelled 'success' and 'failure'.
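
As a rough illustration of the definition above, the following sketch simulates single Bernoulli trials and compares the empirical mean with p. The helper names (`bernoulli_sample`, `bernoulli_variance`), the seed, and the choice p = 0.3 are assumptions made for this example, not part of the original question.

```python
import random

def bernoulli_sample(p, rng):
    """Return 1 ('success') with probability p, otherwise 0 ('failure')."""
    return 1 if rng.random() < p else 0

def bernoulli_variance(p):
    """Variance of a Bernoulli(p) variable: p * q, where q = 1 - p."""
    return p * (1 - p)

rng = random.Random(0)   # fixed seed so the run is repeatable
p = 0.3                  # hypothetical success probability
draws = [bernoulli_sample(p, rng) for _ in range(10_000)]

# The mean of many independent draws should be close to p.
empirical_mean = sum(draws) / len(draws)
print(f"empirical mean = {empirical_mean:.3f}, theoretical mean = {p}")
print(f"theoretical variance = {bernoulli_variance(p):.2f}")
```

Each call to `bernoulli_sample` is one trial; repeating it only serves to check the probabilities empirically.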

One common feature of non-parametric methods is the use of ________ rather than raw data points.

averages
frequencies
medians
ranks

One common feature of non-parametric methods is the use of ranks rather than raw data points. Working with ranks makes these methods more robust to outliers and means they do not require the data to follow a specific distribution.
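
A minimal sketch of why ranks blunt the effect of outliers: replacing one value with an extreme one leaves the ranks unchanged. The function `average_ranks` and the sample values are hypothetical, written for this example; tied values receive the average of the positions they occupy, which is one common convention.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

data = [3.1, 2.4, 5.0, 4.2]
with_outlier = [3.1, 2.4, 500.0, 4.2]  # same data, one extreme value

print(average_ranks(data))
print(average_ranks(with_outlier))
```

Both lists produce the same ranks, so any statistic computed from the ranks is unaffected by how extreme the largest value is.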

The ________ of a box plot are used to indicate variability outside the upper and lower quartiles.

Bars
Outliers
Tails
Whiskers

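The whiskers of a box plot indicate the variability of the data outside the upper and lower quartiles. Depending on the convention, they extend either to the minimum and maximum data values or to the most extreme data points that lie within 1.5 times the interquartile range of the quartiles; points beyond the whiskers are plotted individually as outliers.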
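
The 1.5 times interquartile range convention can be sketched as follows. The function name `whiskers_and_outliers` and the data are made up for this example, and the quartiles use the "inclusive" method of Python's `statistics.quantiles`; plotting libraries may compute quartiles slightly differently.

```python
import statistics

def whiskers_and_outliers(data):
    """Return (lower whisker, upper whisker, outliers) under the 1.5 * IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return min(inside), max(inside), outliers

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # hypothetical data with one extreme value
low, high, outliers = whiskers_and_outliers(values)
print(f"whiskers: {low} to {high}, outliers: {outliers}")
```

Here the upper whisker stops at 9 rather than 50, because 50 lies beyond the upper fence and is reported as an outlier.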

The first Principal Component is the direction in the dataset that captures the ______ variance in the data.

least
median
most
random

The first Principal Component is the direction (or vector) in the multidimensional space along which the data varies the most, so it captures the most variance in the data.
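
To make the idea concrete, here is a small sketch for two-dimensional data that finds the first principal component as the dominant eigenvector of the covariance matrix, using power iteration. The function names and the five data points are assumptions for this example; in practice a linear algebra library would normally be used instead.

```python
import math

def covariance_matrix_2d(points):
    """Sample covariance matrix of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def first_principal_component(cov, iterations=100):
    """Approximate the dominant eigenvector of a 2x2 matrix by power iteration."""
    v = [1.0, 1.0]
    for _ in range(iterations):
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    return v

def variance_along(cov, d):
    """Variance of the data projected onto the unit direction d."""
    return (d[0] * (cov[0][0] * d[0] + cov[0][1] * d[1])
            + d[1] * (cov[1][0] * d[0] + cov[1][1] * d[1]))

# Hypothetical points lying close to the line y = 2x.
points = [(-2, -4.1), (-1, -1.9), (0, 0.1), (1, 2.0), (2, 3.9)]
cov = covariance_matrix_2d(points)
pc1 = first_principal_component(cov)
orthogonal = [-pc1[1], pc1[0]]

print("first principal component:", pc1)
print("variance along it:", variance_along(cov, pc1))
print("variance along the orthogonal direction:", variance_along(cov, orthogonal))
```

The direction found is roughly proportional to (1, 2), and almost all of the variance lies along it.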

In statistics, the entire group of individuals or observations that we want to understand is called the _______.

distribution
parameter
population
sample

In statistics, a population is the entire group of individuals or observations that we want to understand or draw conclusions about. It is the total set of observations that can be made. For example, if you want to know the average height of adult males in the US, the population would be all adult males in the US.

How does Pearson's Correlation Coefficient handle outliers?

Automatically removes outliers
Converts outliers to mean values
Ignores outliers
Is highly sensitive to outliers

Pearson's Correlation Coefficient is highly sensitive to outliers. It is computed from the means and standard deviations of both variables, and these quantities can be strongly influenced by extreme values. Even a single outlier can substantially change the coefficient, and in some cases reverse its sign.
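
The effect can be seen in a short sketch that computes the coefficient directly from its definition. The function `pearson_r` and the data are illustrative assumptions: ten perfectly linear points, then the same points plus one extreme observation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = list(range(1, 11))
ys = [2 * x for x in xs]            # perfectly linear, so r is 1

xs_outlier = xs + [11]
ys_outlier = ys + [-50]             # one hypothetical extreme observation

r_clean = pearson_r(xs, ys)
r_outlier = pearson_r(xs_outlier, ys_outlier)
print(f"without outlier: r = {r_clean:.3f}")
print(f"with one outlier: r = {r_outlier:.3f}")
```

A single added point is enough to turn a perfect positive correlation into a weak negative one for this data.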

What does it mean if the Chi-square test for goodness of fit is statistically significant?

The observed data and theoretical distribution are negatively correlated
The observed data and theoretical distribution are positively correlated
The observed data differs significantly from what we would expect if it followed the theoretical distribution
The observed data fits the theoretical distribution perfectly

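If the Chi-square test for goodness of fit is statistically significant, the observed data differs from what we would expect under the theoretical distribution by more than chance alone would plausibly explain at the chosen significance level. In other words, we reject the hypothesis that the data follows the theoretical distribution.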
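
As a worked sketch, the statistic sums (observed - expected)^2 / expected over all categories and is compared with a critical value. The die-roll counts below are invented for illustration, and the critical value 11.070 is the standard table value for 5 degrees of freedom at a 0.05 significance level, hard-coded here as an assumption rather than computed.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Table value for df = 5 (six die faces minus one) at alpha = 0.05.
CRITICAL_VALUE_DF5_ALPHA_05 = 11.070

expected = [10] * 6                       # a fair die rolled 60 times
observed_skewed = [5, 8, 9, 8, 10, 20]    # hypothetical counts favouring six
observed_fair = [9, 11, 10, 10, 12, 8]    # hypothetical counts close to fair

stat_skewed = chi_square_statistic(observed_skewed, expected)
stat_fair = chi_square_statistic(observed_fair, expected)

significant_skewed = stat_skewed > CRITICAL_VALUE_DF5_ALPHA_05
significant_fair = stat_fair > CRITICAL_VALUE_DF5_ALPHA_05

print(f"skewed die: statistic = {stat_skewed:.2f}, significant = {significant_skewed}")
print(f"fair-looking die: statistic = {stat_fair:.2f}, significant = {significant_fair}")
```

Only the skewed counts exceed the critical value, so only in that case would the hypothesis of a fair die be rejected.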

In multiple regression, model selection aims to choose the most _______ model that best predicts the response variable.

complex
overfit
parsimonious
simple

In multiple regression, model selection aims to choose the most parsimonious model that best predicts the response variable. A parsimonious model is a model that accomplishes the desired level of explanation or prediction with as few predictor variables as possible.
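
One common way to weigh fit against the number of predictors is adjusted R-squared, shown here as one possible criterion among several (AIC and BIC are others). The sample size and R-squared values below are hypothetical numbers chosen for the example, not results from any real model.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_observations - 1) / (n_observations - n_predictors - 1)

n = 50  # hypothetical number of observations

# Model A: 3 predictors; Model B: 10 predictors with a marginally higher R^2.
adj_a = adjusted_r_squared(0.800, n, 3)
adj_b = adjusted_r_squared(0.805, n, 10)

print(f"Model A (3 predictors): adjusted R^2 = {adj_a:.3f}")
print(f"Model B (10 predictors): adjusted R^2 = {adj_b:.3f}")
```

Although Model B has the higher raw R-squared, the penalty for its seven extra predictors leaves Model A with the higher adjusted value, which is the parsimonious choice under this criterion.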

What do we call an experiment in probability theory?

A process that produces outcomes
A statistical analysis
A test of a hypothesis
An observation of a random variable

In probability theory, an experiment refers to a process or procedure that produces outcomes. The outcomes depend on chance or randomness. For example, tossing a coin or rolling a die is considered a random experiment because the outcome is not certain but depends on chance.

The Central Limit Theorem allows us to make inferences about the ________ based on sample data.

Data Distribution
Hypothesis
Population
Sample

The Central Limit Theorem allows us to make inferences about the population based on sample data. It states that, for a sufficiently large sample size, the sampling distribution of the sample mean is approximately normal and centered on the population mean, regardless of the shape of the population distribution (provided its variance is finite). This lets us estimate population parameters and quantify the uncertainty of those estimates from sample data.
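
A small simulation can illustrate the theorem. The sketch below draws repeated samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1) and looks at how the sample means behave. The sample size, number of samples, and seed are arbitrary choices for this example.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable
SAMPLE_SIZE = 30
NUM_SAMPLES = 2000

def sample_mean(rng, size):
    """Mean of one sample drawn from an exponential population (mean 1, sd 1)."""
    return statistics.fmean(rng.expovariate(1.0) for _ in range(size))

sample_means = [sample_mean(rng, SAMPLE_SIZE) for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)
# Theory: the sample means have standard deviation sigma / sqrt(n).
predicted_spread = 1.0 / SAMPLE_SIZE ** 0.5

print(f"mean of sample means: {mean_of_means:.3f} (population mean is 1.0)")
print(f"spread of sample means: {spread_of_means:.3f} (predicted {predicted_spread:.3f})")
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to sigma divided by the square root of n.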
