Bayes' theorem combines our prior knowledge about an event with evidence from data to provide a ________ probability.
- joint
- marginal
- posterior
- prior
The theorem combines our prior knowledge (the prior probability) and evidence (the likelihood) to provide a new, updated probability of an event (the posterior probability).
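As a rough illustration of the update described above, the sketch below applies Bayes' theorem to a binary hypothesis. The function name `posterior` and the numbers (a 1% prior, a 95% likelihood, a 5% false-positive rate) are made up for the example and are not taken from the question.

```python
def posterior(prior, likelihood, false_positive_rate):
    """Return P(H | E) for a binary hypothesis H given evidence E."""
    # Total probability of the evidence:
    # P(E) = P(E | H) * P(H) + P(E | not H) * P(not H)
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    # Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)
    return likelihood * prior / evidence

# Hypothetical inputs: the prior is updated by the evidence into a posterior.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(f"posterior = {p:.3f}")
```

Even with a fairly accurate test, the posterior here stays well below the likelihood because the prior is small.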
An event that cannot possibly occur has a probability of ________.
- -1
- 0
- 0.5
- 1
An event that cannot possibly occur is said to be impossible and has a probability of 0. This is in line with the definition of probability as a measure that takes values between 0 and 1, inclusive.
What is the Central Limit Theorem and how does it relate to point and interval estimation?
- It implies that every data set is symmetrically distributed, which affects the reliability of point and interval estimations
- It suggests that all data has a central tendency and this affects the point and interval estimations
- It suggests that as sample size increases, the distribution of sample means approaches a normal distribution, which affects how we estimate population parameters
- It suggests that every large enough dataset is normally distributed, which is the foundation of point and interval estimations
The Central Limit Theorem states that, for independent observations drawn from a population with finite variance, the distribution of the sample mean approaches a normal distribution as the sample size grows, regardless of the shape of the population distribution. This allows us to make inferences about population parameters using the sample mean (a point estimate) and its standard error (which sets the width of an interval estimate).
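A small simulation can make this concrete. The sketch below assumes a right-skewed Exponential(1) population and compares the distribution of sample means for n = 2 and n = 50. The helper names, the seed, and the number of trials are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(n, trials=2000):
    # Each trial draws n values from an Exponential(1) population
    # (mean 1, standard deviation 1) and records their average.
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

def skewness(xs):
    # Moment coefficient of skewness: m3 / m2**1.5 (0 for a symmetric shape).
    mu = statistics.fmean(xs)
    m2 = statistics.fmean([(x - mu) ** 2 for x in xs])
    m3 = statistics.fmean([(x - mu) ** 3 for x in xs])
    return m3 / m2 ** 1.5

means_small = sample_means(2)
means_large = sample_means(50)

sd_small = statistics.stdev(means_small)
sd_large = statistics.stdev(means_large)
skew_small = skewness(means_small)
skew_large = skewness(means_large)

print(f"n=2:  sd = {sd_small:.3f}, skewness = {skew_small:.2f}")
print(f"n=50: sd = {sd_large:.3f}, skewness = {skew_large:.2f}")
print(f"theoretical standard error for n=50: {1 / 50 ** 0.5:.3f}")
```

With the larger sample size, the sample means cluster more tightly around the population mean and their distribution is noticeably less skewed, which is the behaviour the theorem describes.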
What happens to the width of a confidence interval as the confidence level increases?
- It decreases
- It fluctuates unpredictably
- It increases
- It stays the same
The width of a confidence interval increases as the confidence level increases. A higher confidence level means that you want to be more sure that you are capturing the true population parameter, which requires a wider interval.
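To see the effect numerically, here is a minimal sketch that assumes a z-interval for a mean with known standard deviation. The values `sigma=10` and `n=100` are invented for the example.

```python
from statistics import NormalDist

def z_interval_width(sigma, n, confidence):
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    # Width = 2 * margin of error = 2 * z * sigma / sqrt(n).
    return 2 * z * sigma / n ** 0.5

# Same data assumptions, three different confidence levels.
widths = {
    level: z_interval_width(sigma=10, n=100, confidence=level)
    for level in (0.90, 0.95, 0.99)
}
for level, width in widths.items():
    print(f"{level:.0%} interval width: {width:.2f}")
```

Only the critical value changes between the three cases, so the interval widens purely because more confidence is demanded.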
The presence of a pattern in the residuals of a multiple linear regression model can indicate violations of the ________ assumption.
- homoscedasticity
- independence
- linearity
- normality
The presence of a pattern in the residuals of a multiple linear regression model can indicate a violation of the independence assumption. This assumption requires that the residuals (the differences between the observed and predicted values of the dependent variable) are independent of one another. A systematic pattern, such as long runs of same-signed residuals when they are plotted in observation order, suggests that the residuals are correlated, in which case standard errors and significance tests may be unreliable. Note that a curved or funnel-shaped pattern in a plot of residuals against fitted values points instead to problems with linearity or homoscedasticity.
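One common numerical check for serial dependence in residuals is the Durbin-Watson statistic. The sketch below computes it by hand on two made-up residual sequences, which are illustrative and do not come from a fitted model.

```python
def durbin_watson(residuals):
    # DW = sum of squared successive differences / sum of squared residuals.
    # Values near 2 suggest no first-order autocorrelation; values toward 0
    # suggest positive autocorrelation and values toward 4 suggest negative.
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2
        for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den

trending = [-3, -2, -1, 0, 1, 2, 3]      # residuals drift steadily upward
alternating = [1, -1, 1, -1, 1, -1]      # residuals flip sign every step

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(f"trending:    DW = {dw_trending:.2f}")
print(f"alternating: DW = {dw_alternating:.2f}")
```

Both sequences show a clear pattern, and both statistics fall far from 2, one on each side.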
What are the common techniques used for model selection in multiple regression?
- Chi-square test
- F-test
- Forward selection, backward elimination, and stepwise regression
- T-test
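Forward selection, backward elimination, and stepwise regression are commonly used for model selection in multiple regression. Forward selection starts with no predictors and adds them one at a time, backward elimination starts with all predictors and removes them one at a time, and stepwise regression combines the two by allowing predictors to be added or removed at each step.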
How is the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy used in factor analysis?
- It is used to assess the appropriateness of factor analysis
- It is used to determine the number of factors to retain
- It is used to test the assumption of homoscedasticity
- It is used to test the assumption of normality
The Kaiser-Meyer-Olkin (KMO) measure indicates how suitable the data are for factor analysis. It can be computed for each observed variable and for the complete set of variables. KMO values range from 0 to 1. Values close to 1 indicate that the partial correlations are small relative to the correlations, so factor analysis is likely to be appropriate. A value near 0 indicates that the sum of partial correlations is large relative to the sum of correlations, implying a diffuse pattern of correlations, so factor analysis is likely to be inappropriate.
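For reference, the overall KMO statistic is usually written as follows, where r_ij denotes the correlation between variables i and j and p_ij denotes their partial correlation after controlling for all other variables. The symbol names are chosen here for illustration.

```latex
\mathrm{KMO} =
  \frac{\sum_{i \neq j} r_{ij}^{2}}
       {\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} p_{ij}^{2}}
```

When the partial correlations are small, the second term in the denominator shrinks and the ratio approaches 1. When they are large relative to the correlations, the ratio falls toward 0.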
How does the sample size affect the width of the confidence interval?
- Larger sample size makes the interval narrower
- Larger sample size makes the interval wider
- Sample size has no effect on the interval
Larger sample sizes reduce the standard error, so the confidence interval becomes narrower. This means that with larger samples, our estimates are more precise.
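The relationship can be sketched with a 95% z-interval for a mean, assuming a known standard deviation. The value `sigma=10` and the three sample sizes are hypothetical.

```python
from statistics import NormalDist

# Critical value for a two-sided 95% interval (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)

def width_for_n(sigma, n):
    # The standard error of the mean shrinks with the square root of n.
    standard_error = sigma / n ** 0.5
    return 2 * Z_95 * standard_error

widths_by_n = {n: width_for_n(sigma=10, n=n) for n in (25, 100, 400)}
for n, width in widths_by_n.items():
    print(f"n={n:>3}: 95% interval width = {width:.2f}")
```

Because the standard error scales with 1/sqrt(n), quadrupling the sample size halves the width of the interval.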
The Sign Test ignores the ________ of the differences between paired observations.
- direction
- distribution
- magnitude
The Sign Test ignores the magnitude of the differences between paired observations, and only considers the sign of the differences.
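A minimal sketch of a two-sided exact sign test shows why magnitude does not matter. The function name and the paired data are invented for illustration, and ties are dropped, which is one common convention.

```python
from math import comb

def sign_test(before, after):
    # Keep only the paired differences that are non-zero (ties are dropped).
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    positives = sum(d > 0 for d in diffs)
    # Two-sided exact binomial p-value under H0: P(positive sign) = 0.5.
    k = min(positives, n - positives)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return positives, n, min(1.0, 2 * tail)

before = [10, 12, 9, 14, 11, 13, 8, 15]
after = [12, 15, 9, 16, 10, 17, 11, 18]
# Same signs, but every difference is ten times larger.
after_scaled = [b + 10 * (a - b) for b, a in zip(before, after)]

result = sign_test(before, after)
result_scaled = sign_test(before, after_scaled)
print(result)
print(result_scaled)
```

Scaling every difference by ten leaves the result unchanged, because only the count of positive and negative signs enters the calculation.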
How do outliers affect the skewness of a dataset?
- Depends on the direction of the outliers
- They decrease skewness
- They do not affect skewness
- They increase skewness
Outliers can strongly affect the skewness of a dataset, and the direction of the effect depends on where the outlier lies. An outlier far above the rest of the data pulls the skewness in the positive direction, while an outlier far below the rest of the data pulls it in the negative direction.
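The direction of the effect is easy to check with a small example. The sketch below uses the moment coefficient of skewness without bias correction, and the data values are made up.

```python
from statistics import fmean

def skewness(xs):
    # Moment coefficient of skewness: m3 / m2**1.5,
    # where m2 and m3 are the second and third central moments.
    mu = fmean(xs)
    m2 = fmean([(x - mu) ** 2 for x in xs])
    m3 = fmean([(x - mu) ** 3 for x in xs])
    return m3 / m2 ** 1.5

base = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # symmetric around 5
with_high = base + [100]             # one outlier far above the rest
with_low = base + [-100]             # one outlier far below the rest

skew_base = skewness(base)
skew_high = skewness(with_high)
skew_low = skewness(with_low)
print(f"base: {skew_base:.2f}, high outlier: {skew_high:.2f}, low outlier: {skew_low:.2f}")
```

The symmetric data have zero skewness, and a single extreme value is enough to push the statistic clearly positive or clearly negative.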
What type of data can be further classified as discrete and continuous?
- Categorical data
- Nominal data
- Qualitative data
- Quantitative data
Quantitative data can be further classified as discrete and continuous. Discrete data takes countable, distinct values (often whole numbers), such as the number of students in a class. Continuous data can take any value within a given range, such as the weight of a person.
Scenario: An e-commerce website requires a fast and scalable solution for managing product catalog information. How could a Key-Value Store be utilized in this scenario, and what benefits would it offer?
- Implementing complex queries for product information
- Normalizing the database schema
- Storing product details and metadata as key-value pairs
- Utilizing joins between multiple tables
A Key-Value Store can be used by storing product details as key-value pairs, where the key is the product identifier, and the value is a serialized form of the product details. This allows for fast and scalable retrieval of product information without the need for complex joins or normalization.
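The idea can be sketched with an in-memory dictionary standing in for a real key-value store. The class name `ProductCatalog`, the `product:` key prefix, and the sample product are all hypothetical, and JSON is just one possible serialization format.

```python
import json

class ProductCatalog:
    """Toy key-value catalog; a dict stands in for a real key-value store."""

    def __init__(self):
        self._store = {}

    def put(self, product_id, details):
        # Key: the product identifier. Value: the details serialized to JSON.
        self._store[f"product:{product_id}"] = json.dumps(details)

    def get(self, product_id):
        # A single lookup by key; no joins or multi-table queries are needed.
        raw = self._store.get(f"product:{product_id}")
        return json.loads(raw) if raw is not None else None

catalog = ProductCatalog()
catalog.put("sku-123", {"name": "Kettle", "price": 29.99, "tags": ["kitchen"]})
print(catalog.get("sku-123"))
print(catalog.get("sku-999"))
```

The trade-off is that lookups by anything other than the key, such as searching by tag or price range, are not supported directly and would need a separate index or a different kind of store.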