Describe the relationship between the Logit function, Odds Ratio, and the likelihood function in Logistic Regression.

  • The Logit function is used for multi-class, Odds Ratio for binary, likelihood for regression
  • The Logit function maps probabilities to log-odds, Odds Ratio quantifies effect on odds, likelihood function is used for estimation
  • The Logit function maps probabilities to odds, Odds Ratio quantifies effect on odds, likelihood function maximizes probabilities
  • They are unrelated
In Logistic Regression, the Logit function maps probabilities to log-odds, the Odds Ratio quantifies the effect of predictors on odds, and the likelihood function is used to estimate the model parameters by maximizing the likelihood of observing the given data.
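The three concepts can be seen side by side in a minimal sketch, assuming scikit-learn and a hypothetical one-feature toy dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy binary data: one predictor, labels separable around X = 3.5
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

p = model.predict_proba(X)[:, 1]            # predicted probabilities
logit = np.log(p / (1 - p))                 # Logit: maps probabilities to log-odds
odds_ratio = np.exp(model.coef_[0][0])      # multiplicative change in odds per unit of X

# The (log-)likelihood that fitting maximizes, up to regularization:
log_likelihood = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
```

Note that the logit of the fitted probabilities equals the model's linear predictor, which is exactly why coefficients are interpreted on the log-odds scale.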

Explain how Ridge and Lasso handle multicollinearity among the features.

  • Both eliminate correlated features
  • Both keep correlated features
  • Ridge eliminates correlated features; Lasso keeps them
  • Ridge keeps correlated features; Lasso eliminates them
Ridge regularization keeps all correlated features but shrinks their coefficients, typically spreading weight across them; Lasso can effectively eliminate some by driving their coefficients exactly to zero.
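The contrast shows up directly in the fitted coefficients. A minimal sketch, assuming scikit-learn, a hypothetical pair of near-duplicate features, and illustrative alpha values:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)   # nearly identical to x1 (multicollinear)
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)   # tends to split weight across both features
lasso = Lasso(alpha=0.1).fit(X, y)   # tends to zero one of the duplicates out
```

With near-duplicate columns, Ridge typically assigns each roughly half the true coefficient, while Lasso's L1 penalty picks one and sets the other to zero.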

What are some common applications for each of the four types of Machine Learning: Supervised, Unsupervised, Semi-Supervised, and Reinforcement?

  • Specific to finance
  • Specific to healthcare
  • Specific to manufacturing
  • Varies based on the problem domain
The applications of these four types of Machine Learning vary with the problem domain rather than being confined to any single industry. For example, Supervised Learning is commonly used for spam filtering, Unsupervised Learning for customer segmentation, Semi-Supervised Learning for classifying large document collections with few labels, and Reinforcement Learning for game playing and robotics.

What is the difference between simple linear regression and multiple linear regression?

  • Number of dependent variables
  • Number of equations
  • Number of independent variables
  • Number of observations
Simple linear regression involves one independent variable to predict a dependent variable, whereas multiple linear regression uses two or more independent variables for prediction. The inclusion of more variables in multiple linear regression allows for more complex models and can lead to a better understanding of the relationships between variables.
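The difference is purely in the number of predictor columns passed to the model. A minimal sketch, assuming scikit-learn and hypothetical data generated from two independent predictors:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
y = 2 * x1 + 3 * x2 + rng.normal(scale=0.1, size=100)  # true relationship uses both

# Simple: one independent variable
simple = LinearRegression().fit(x1.reshape(-1, 1), y)
# Multiple: two independent variables
multiple = LinearRegression().fit(np.column_stack([x1, x2]), y)
```

Because the outcome truly depends on both predictors, the multiple regression explains far more variance than the simple one on this data.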

The performance of an LDA model can be evaluated using ___________, which considers both within-class and between-class variances.

  • accuracy metrics
  • error rate
  • feature selection
  • principal components
"Accuracy metrics" that consider both within-class and between-class variances can be used to evaluate the performance of an LDA model. It gives a comprehensive view of how well the model has separated the classes.

In K-Means clustering, the algorithm iteratively assigns each data point to the nearest _______, recalculating the centroids until convergence.

  • Centroid
  • Cluster
  • Data Point
  • Distance Metric
In K-Means, the algorithm assigns each data point to the nearest centroid and recalculates the centroids until convergence.
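The assign/recalculate loop can be sketched in a few lines of NumPy. This is a minimal Lloyd's-algorithm sketch under assumed toy data, not a production implementation (scikit-learn's `KMeans` handles initialization and edge cases far more carefully):

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each data point to the nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recalculate each centroid as the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break  # convergence: centroids stopped moving
        centroids = new
    return labels, centroids

# Hypothetical toy data: two tight blobs around (0, 0) and (5, 5)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (10, 2)), rng.normal(5.0, 0.1, (10, 2))])
labels, centroids = kmeans(X, 2)
```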

In KNN, how does an increase in the value of K generally affect the bias and variance of the model?

  • Decreases bias, increases variance
  • Decreases both bias and variance
  • Increases bias, decreases variance
  • Increases both bias and variance
Increasing the value of K generally increases bias and decreases variance in the KNN model.
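The variance side of this trade-off is easy to see on training data. A minimal sketch, assuming scikit-learn, hypothetical label-noised toy data, and illustrative K values of 1 and 25:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
# Flip ~20% of labels to simulate noise
flip = rng.random(200) < 0.2
y = np.where(flip, 1 - y, y)

train_acc = {}
for k in (1, 25):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    train_acc[k] = knn.score(X, y)  # accuracy on the training set itself
```

With K=1 each point's nearest neighbor is itself, so training accuracy is perfect (low bias, high variance: the noise is memorized); with K=25 the smoother majority vote no longer fits the flipped labels (higher bias, lower variance).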

You've trained a model with a small training set and a large testing set. What challenges might you encounter, and how could they be addressed?

  • Both Overfitting and Underfitting
  • Data is perfectly balanced
  • Overfitting
  • Underfitting
A small training set might lead to overfitting, where the model memorizes noise in the training data; alternatively, if the model is too simple or the data too sparse to reveal the pattern, it might underfit. Cross-validation, bootstrapping, or augmenting the training set with additional relevant data can help the model generalize.
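Cross-validation makes the most of a small training set by rotating which portion is held out. A minimal sketch, assuming scikit-learn and a hypothetical 30-sample dataset:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))        # hypothetical small training set
y = (X[:, 0] > 0).astype(int)       # simple separable target for illustration

# 5-fold CV: each sample is used for validation exactly once
scores = cross_val_score(LogisticRegression(), X, y, cv=5)
```

The spread of the fold scores also signals variance: widely varying scores across folds suggest the model is sensitive to which few samples it trains on.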

In Supervised Learning, _________ and ___________ are the two main types of problems.

  • Classification; Clustering
  • Classification; Regression
  • Regression; Clustering
  • Regression; Ensemble Learning
In Supervised Learning, the two main types of problems are Classification and Regression. Classification is about categorizing data into predefined classes, while Regression is predicting a continuous outcome.
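The distinction is visible in the target type the model is trained on. A minimal sketch, assuming scikit-learn decision trees and hypothetical four-point toy data:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0], [1], [2], [3]]

# Classification: the target is a discrete class label
clf = DecisionTreeClassifier().fit(X, ["a", "a", "b", "b"])

# Regression: the target is a continuous value
reg = DecisionTreeRegressor().fit(X, [0.0, 1.0, 2.0, 3.0])
```

The classifier predicts one of the predefined classes, while the regressor predicts a number on a continuous scale.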

You've built a multiple linear regression model and found that two or more predictors are highly correlated. What problems might this cause, and how can you solve them?

  • High bias, Address by increasing the model complexity
  • High variance, Address by using Lasso regression
  • Overfitting, Address by removing correlated features or using Ridge regression
  • Underfitting, Address by adding more features
Multicollinearity, where predictors are highly correlated, inflates the variance of the coefficient estimates, making them unstable and hard to interpret, and can contribute to overfitting. It can be addressed by removing correlated features or by using Ridge regression, which penalizes large coefficients and stabilizes the estimates.
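A common way to detect the problem is the variance inflation factor (VIF). Below is a minimal hand-rolled sketch (the `vif` helper is illustrative, not a library function), assuming hypothetical data where one pair of columns is nearly duplicated:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X, j):
    """Variance inflation factor of column j: 1 / (1 - R^2), where R^2
    comes from regressing column j on the remaining columns."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
a = rng.normal(size=100)
b = a + rng.normal(scale=0.05, size=100)   # highly correlated with a
c = rng.normal(size=100)                   # independent predictor
X = np.column_stack([a, b, c])
```

A common rule of thumb flags VIF values above 5 or 10 as problematic; the near-duplicate columns here blow past that, while the independent column stays near 1.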