The K-Means clustering algorithm iteratively updates the _________ to minimize the sum of squared distances within each cluster.
- Centroids
- Distance metric
- Learning rate
- Number of clusters
The K-Means algorithm alternates between two steps: assigning each point to its nearest centroid, then recomputing each centroid as the mean of its assigned points. Each iteration reduces the sum of squared distances from points to their assigned centroids, forming cohesive clusters.
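A minimal NumPy sketch of that loop (the initialization strategy and convergence check here are illustrative choices, and it assumes no cluster ever goes empty):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Minimal K-Means: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct points from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # Converged: another pass would not move the centroids.
        centroids = new_centroids
    return centroids, labels
```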
The R-Squared value can be artificially inflated by adding more predictors, but the ________ helps mitigate this issue.
- Adjusted R-Squared
- MAE
- MSE
- RMSE
The R-Squared value can be artificially increased by adding irrelevant predictors. Adjusted R-Squared mitigates this by accounting for the number of predictors, penalizing models for unnecessary complexity. It gives a fairer measure of model fit and discourages inflating the score simply by adding more predictors.
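For reference, the standard Adjusted R-Squared formula, where $n$ is the number of observations and $p$ the number of predictors:

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

The factor $\frac{n-1}{n-p-1}$ grows as $p$ increases, so a new predictor only raises Adjusted R-Squared if it improves $R^2$ by more than the penalty it incurs.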
How does LDA differ from Principal Component Analysis (PCA)?
- LDA and PCA have the same goal and method
- LDA focuses on unsupervised learning while PCA focuses on supervised learning
- LDA is concerned with maximizing between-class variance, while PCA focuses on maximizing total variance
- LDA uses Eigenvalues, while PCA uses Eigenvectors
LDA aims to maximize between-class variance and minimize within-class variance for classification, while PCA maximizes total variance without considering class labels. PCA is used mainly for dimensionality reduction and does not account for class separation.
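A short scikit-learn sketch of the contrast (the Iris dataset and two components are illustrative choices): PCA is fit on the features alone, while LDA also consumes the class labels.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA: unsupervised -- fit on the features alone, maximizing total variance.
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised -- needs the labels, maximizing between-class separation.
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)
```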
What are the assumptions that must be met in Simple Linear Regression?
- Homogeneity, Variability, Linearity
- Independence, Homoscedasticity, Linearity, Normality
- Linearity, Categorization, Independence
- Linearity, Quadratic, Exponential
The assumptions in Simple Linear Regression include Independence (of errors), Homoscedasticity (equal variance), Linearity, and Normality (of errors).
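These assumptions are typically checked on the residuals of the fitted model. A minimal diagnostic sketch, assuming statsmodels and scipy are available (the synthetic data is illustrative):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 1, size=200)  # Illustrative linear data.

model = sm.OLS(y, sm.add_constant(x)).fit()

# Normality of errors: Shapiro-Wilk test on the residuals.
print("Shapiro-Wilk p-value:", stats.shapiro(model.resid).pvalue)
# Independence of errors: Durbin-Watson near 2 suggests no autocorrelation.
print("Durbin-Watson:", durbin_watson(model.resid))
# Homoscedasticity and linearity are usually checked visually by plotting
# model.resid against model.fittedvalues and looking for fanning or curvature.
```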
Explain the difference between parametric and non-parametric models.
- The ability to update parameters during training
- The flexibility in form
- The number of features used
- The use of hyperparameters
Parametric models assume a specific form for the function they're approximating, such as a linear relationship, and have a fixed number of parameters. Non-parametric models make fewer assumptions about the function's form, often resulting in more flexibility but also requiring more data.
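A small scikit-learn contrast (the models and synthetic data are illustrative choices): linear regression commits to a fixed functional form with two parameters, while k-nearest neighbors lets the data dictate the shape.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X.ravel()) + rng.normal(0, 0.1, size=200)  # Nonlinear target.

# Parametric: assumes y = a*x + b, so only two numbers to learn.
linear = LinearRegression().fit(X, y)

# Non-parametric: no fixed form; predictions come from nearby training points.
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

print("Linear R^2:", linear.score(X, y))  # Underfits the sine curve.
print("KNN R^2:   ", knn.score(X, y))     # Tracks it more flexibly.
```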
What is the bias-variance tradeoff in Machine Learning?
- A tradeoff between supervised and unsupervised learning
- A tradeoff between the complexity and the size of a model
- A tradeoff between the learning rate and the number of epochs
- A tradeoff between underfitting and overfitting
The bias-variance tradeoff refers to the balance between underfitting (high bias, low variance) and overfitting (low bias, high variance). A model with high bias oversimplifies the problem, while one with high variance fits the noise in the training data and generalizes poorly.
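For squared-error loss, this is captured by the standard decomposition of expected test error:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \mathrm{Bias}\big[\hat{f}(x)\big]^2
  + \mathrm{Var}\big[\hat{f}(x)\big]
  + \sigma^2
```

where $\sigma^2$ is irreducible noise. Increasing model complexity typically lowers the bias term but raises the variance term, so total error is minimized at an intermediate complexity.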
How does the Logit function transform the probability in Logistic Regression?
- Maps odds to log-odds
- Maps odds to probability
- Maps probability to log-odds
- Maps probability to odds
The Logit function in Logistic Regression takes a probability and maps it to log-odds. It's the inverse of the Sigmoid function used to model probabilities.
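Concretely, for a probability $p \in (0, 1)$, the odds are $\frac{p}{1-p}$ and:

```latex
\mathrm{logit}(p) = \ln\!\left(\frac{p}{1 - p}\right),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}} = \mathrm{logit}^{-1}(z)
```

The Logit stretches probabilities from $(0, 1)$ onto the whole real line, which is what lets Logistic Regression model log-odds as a linear function of the features.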
In unsupervised learning, the model learns to find patterns and structures from _________ data, where no specific output values are provided.
- Balanced
- Labelled
- Sparse
- Unlabelled
In unsupervised learning, the model learns from unlabelled data, finding hidden patterns and structures without specific output values or guidance.
What is the main difference between supervised and unsupervised learning?
- Application
- Complexity
- Data size
- Use of labeled data
The main difference is the use of labeled data. Supervised Learning uses labeled data, while Unsupervised Learning does not.
Explain the concept of k-fold Cross-Validation. What does "k" signify?
- Number of equally-sized folds the data is divided into
- Number of features in the dataset
- Number of iterations in training
- Number of layers in a deep learning model
In k-fold Cross-Validation, "k" signifies the number of equally-sized folds the data is divided into. The model is trained on (k-1) folds and validated on the remaining fold, repeating this process k times. The average performance across all k trials gives a more reliable estimate of the model's capability than a single train/test split.
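A minimal scikit-learn sketch (the classifier, dataset, and k = 5 are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# k = 5: train on 4 folds, validate on the held-out fold, 5 times over.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print("Per-fold accuracy:", scores)
print("Mean accuracy:", scores.mean())
```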