How does the Logit function transform the probability in Logistic Regression?

  • Maps odds to log-odds
  • Maps odds to probability
  • Maps probability to log-odds
  • Maps probability to odds
The Logit function in Logistic Regression maps a probability p to its log-odds: logit(p) = ln(p / (1 - p)). It is the inverse of the Sigmoid function, which maps log-odds back to probabilities.
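
A minimal Python sketch (standard library only) illustrating this inverse relationship; the example probability of 0.8 is arbitrary:

```python
import math

def sigmoid(z):
    # Maps log-odds z to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # Maps a probability p in (0, 1) to log-odds
    return math.log(p / (1.0 - p))

p = 0.8
z = logit(p)          # ln(0.8 / 0.2) ≈ 1.386
print(z, sigmoid(z))  # sigmoid(z) recovers 0.8, confirming they are inverses
```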

In unsupervised learning, the model learns to find patterns and structures from _________ data, where no specific output values are provided.

  • Balanced
  • Labelled
  • Sparse
  • Unlabelled
In unsupervised learning, the model learns from unlabelled data, finding hidden patterns and structures without specific output values or guidance.
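
As a small illustration (assuming scikit-learn is available; the data and cluster count are invented for the example), note that the model is fit on X alone, with no target labels:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabelled data: feature vectors only, no target column y
X = np.array([[1.0, 2.0], [1.2, 1.9], [8.0, 8.1], [8.2, 7.9]])

# The model discovers structure (two groups) without being told any outputs
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)  # e.g. [0 0 1 1] — cluster assignments inferred from X alone
```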

A machine learning model is suffering from high computational costs and overfitting. How could dimensionality reduction be implemented to solve these problems?

  • Add more features
  • Apply PCA or LDA, depending on the data type
  • Increase the model's complexity
  • Reduce the dataset size
Applying dimensionality reduction techniques like PCA (unsupervised) or LDA (supervised, which uses class labels) can significantly reduce the feature space and computational cost without discarding the most important information. A smaller feature space also helps address overfitting by simplifying the model, making it less likely to capture noise in the data. Increasing model complexity or adding more features would exacerbate both problems, and reducing the dataset size may lead to loss of information.
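
A hedged sketch of the idea, assuming scikit-learn; the data is synthetic and the shapes and component counts are arbitrary:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))    # 200 samples, 50 features
y = rng.integers(0, 2, size=200)  # class labels, required only by LDA

# PCA (unsupervised): keep the 10 directions of greatest total variance
X_pca = PCA(n_components=10).fit_transform(X)

# LDA (supervised): keep directions that best separate the classes
# (at most n_classes - 1 components, so 1 here)
X_lda = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)   # (200, 10) (200, 1)
```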

How does the choice of Epsilon affect the clustering results in DBSCAN?

  • It affects the minimum points in a cluster
  • It changes the clustering speed
  • It determines the radius of the neighborhood around a point
  • It modifies the clustering algorithm's underlying formula
The choice of Epsilon in DBSCAN sets the radius of the neighborhood around each data point: two points are neighbors if they lie within Epsilon of each other. Adjusting this value controls how close points must be to form a cluster, which affects the clustering's granularity, shape, and size, and it interacts with MinPts, the minimum number of neighbors required for a core point. It is a crucial parameter to tune for achieving the desired clustering results.
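
A quick sketch of the effect, assuming scikit-learn; the blob locations and eps values are chosen purely for illustration:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two tight blobs plus one distant outlier
X = np.vstack([rng.normal(0, 0.2, (20, 2)),
               rng.normal(5, 0.2, (20, 2)),
               [[10.0, 10.0]]])

for eps in (0.3, 8.0):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    # With a small eps the blobs stay separate and the outlier is noise (-1);
    # with a large enough eps everything merges into one coarse cluster
    print(eps, np.unique(labels))
```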

The K-Means clustering algorithm iteratively updates the _________ to minimize the sum of squared distances within each cluster.

  • Centroids
  • Distance metric
  • Learning rate
  • Number of clusters
The K-Means algorithm works by iteratively updating the centroids, minimizing the sum of squared distances from each point to its assigned centroid, thus forming cohesive clusters.
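
A bare-bones NumPy sketch of those two alternating steps (assignment and centroid update); the initialization and stopping rule are simplified, and empty clusters are not handled:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points,
        # which minimizes the within-cluster sum of squared distances
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [8.5, 8.5]])
centroids, labels = kmeans(X, k=2)
print(labels)  # e.g. [0 0 1 1]
```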

The R-Squared value can be artificially inflated by adding more predictors, but the ________ helps mitigate this issue.

  • Adjusted R-Squared
  • MAE
  • MSE
  • RMSE
The R-Squared value can be artificially increased by adding predictors, even irrelevant ones, because R² never decreases when a variable is added. Adjusted R-Squared mitigates this by penalizing for the number of predictors: Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. It provides a more balanced evaluation of the model's fit and helps avoid inflating the apparent fit by adding more predictors.
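
A small worked example of the penalty (the R² values, n, and p here are made up):

```python
def adjusted_r2(r2, n, p):
    # n = number of observations, p = number of predictors
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Adding 15 weak predictors nudges R^2 up but lowers Adjusted R^2
print(adjusted_r2(0.80, n=100, p=5))   # ≈ 0.789
print(adjusted_r2(0.81, n=100, p=20))  # ≈ 0.762 — penalized for extra predictors
```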

How does LDA differ from Principal Component Analysis (PCA)?

  • LDA and PCA have the same goal and method
  • LDA focuses on unsupervised learning while PCA focuses on supervised learning
  • LDA is concerned with maximizing between-class variance, while PCA focuses on maximizing total variance
  • LDA uses Eigenvalues, while PCA uses Eigenvectors
LDA aims to maximize between-class variance and minimize within-class variance, which makes it well suited to classification, while PCA maximizes total variance without considering class labels. LDA is therefore supervised, whereas PCA is an unsupervised technique used mainly for dimensionality reduction and does not consider class separation.
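
A sketch of the contrast, assuming scikit-learn; the two-class geometry is deliberately constructed so the two methods pick different directions:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
# Two classes separated along x, but with far more variance along y
X0 = rng.normal([0.0, 0.0], [0.5, 5.0], size=(200, 2))
X1 = rng.normal([3.0, 0.0], [0.5, 5.0], size=(200, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

# PCA picks the direction of maximum total variance: the y-axis here
pca_dir = PCA(n_components=1).fit(X).components_[0]
print(np.round(pca_dir, 2))  # ≈ [0, ±1]

# LDA picks the direction that best separates the classes: the x-axis here
lda = LinearDiscriminantAnalysis().fit(X, y)
lda_dir = lda.scalings_[:, 0]
print(np.round(lda_dir / np.linalg.norm(lda_dir), 2))  # ≈ [±1, 0]
```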

What are the assumptions that must be met in Simple Linear Regression?

  • Homogeneity, Variability, Linearity
  • Independence, Homoscedasticity, Linearity, Normality
  • Linearity, Categorization, Independence
  • Linearity, Quadratic, Exponential
The assumptions in Simple Linear Regression are Independence (of errors), Homoscedasticity (errors have equal variance across the range of X), Linearity (of the relationship between X and y), and Normality (of errors).
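
One way to sanity-check some of these assumptions is to inspect the residuals of a fitted model. A rough sketch assuming NumPy and SciPy, on simulated data that satisfies the assumptions by construction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 100)  # linear, with normal errors

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Normality of errors: Shapiro-Wilk test on the residuals
print("Shapiro p-value:", stats.shapiro(residuals).pvalue)  # large p => no evidence against normality

# Homoscedasticity (rough check): residual spread should not grow with x
lo, hi = residuals[x < 5], residuals[x >= 5]
print("spread at low/high x:", lo.std(), hi.std())  # similar => roughly equal variance
```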

What is the concept of Polynomial Regression?

  • Linear equation with multiple variables
  • Linear equation with one variable
  • Non-linear equation using polynomial features
  • Non-linear equation with one variable
Polynomial Regression is a form of regression analysis where the relationship between the independent variable and the dependent variable is modeled as an nth-degree polynomial. Because the model remains linear in its coefficients, it can still be fit with ordinary least squares; the polynomial terms simply allow it to capture more complex, non-linear relationships.
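
A short sketch using scikit-learn's pipeline, on made-up quadratic data; the degree and coefficients are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, (100, 1))
y = 0.5 * x[:, 0] ** 2 - x[:, 0] + rng.normal(0, 0.2, 100)  # quadratic relationship

# Degree-2 polynomial features turn a linear model into polynomial regression
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)
print(model.predict([[2.0]]))  # ≈ 0.5 * 4 - 2 = 0.0
```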

You trained a model that performs exceptionally well on the training data but poorly on the test data. What could be the issue, and how would you address it?

  • Increase complexity
  • Increase dataset size
  • Overfitting, add regularization
  • Reduce complexity
The issue is likely overfitting: the model has learned the training data too well, including its noise and anomalies. Adding regularization, such as an L1 (Lasso) or L2 (Ridge) penalty, constrains the model's weights and helps it generalize better to unseen data.
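
A minimal demonstration of the symptom and the fix, assuming scikit-learn; the data shape and the alpha value are arbitrary choices for the example:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Few samples, many features: a classic overfitting setup
X = rng.normal(size=(40, 30))
y = X[:, 0] + rng.normal(0, 0.5, 40)  # only the first feature matters

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [("plain OLS", LinearRegression()),
                    ("ridge (L2)", Ridge(alpha=10.0))]:
    model.fit(X_tr, y_tr)
    # The overfit model scores high on training data but poorly on test data;
    # the L2 penalty shrinks the coefficients and narrows that gap
    print(name, model.score(X_tr, y_tr), model.score(X_te, y_te))
```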