What are some common performance metrics used in evaluating classification models?

  • Clustering Coefficient, Density
  • Eigenvalues, Eigenvectors
  • Mean Squared Error, R-squared
  • Precision, Recall, F1 Score
Common performance metrics for classification include Precision (positive predictive value), Recall (sensitivity), and F1 Score (harmonic mean of precision and recall). These metrics help to assess the model's ability to correctly classify positive cases.
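
As a quick illustration, the snippet below computes these three metrics with scikit-learn; the label arrays are purely hypothetical:

```python
# Hypothetical ground-truth labels and predictions, for illustration only.
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # actual classes
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1:       ", f1_score(y_true, y_pred))         # harmonic mean of the two
```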

Imagine you need to classify documents but have only a few labeled examples. How would you leverage semi-supervised learning in this scenario?

  • Combine trial and error approaches
  • Use clustering exclusively
  • Utilize both labeled and unlabeled data
  • Utilize only the labeled data
In this scenario, Semi-Supervised Learning would leverage both the limited labeled examples and the abundant unlabeled data to create an effective classification model.
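
One way to sketch this is with scikit-learn's SelfTrainingClassifier; the documents below are made up, only two carry labels, and -1 marks the unlabeled ones:

```python
# Minimal self-training sketch on hypothetical documents.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

docs = ["invoice due", "meeting notes", "payment overdue",
        "team standup", "quarterly invoice", "project update"]
labels = np.array([0, 1, -1, -1, -1, -1])   # -1 = unlabeled document

X = TfidfVectorizer().fit_transform(docs)

# The base classifier is fit on the labeled docs, then iteratively
# pseudo-labels confident unlabeled docs and retrains on both.
model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, labels)
print(model.predict(X))
```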

Can you explain the impact of different centroid initialization methods on the K-Means clustering results?

  • Alters convergence and final cluster formation
  • Has no impact
  • Increases accuracy but reduces speed
  • Increases the number of clusters
Different initialization methods in K-Means can alter the convergence rate and final cluster formation. Poor initialization may lead to suboptimal clustering or slow convergence.
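
The effect is easy to observe by running scikit-learn's KMeans with different init settings on synthetic data; the exact numbers depend on the random seed and are illustrative only:

```python
# Compare random initialization with k-means++ on synthetic blobs.
# Lower inertia indicates a tighter final clustering.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for init in ("random", "k-means++"):
    km = KMeans(n_clusters=4, init=init, n_init=1, random_state=0).fit(X)
    print(f"{init:10s} iterations={km.n_iter_:2d} inertia={km.inertia_:.1f}")
```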

A city is facing issues with traffic congestion and wants to use Machine Learning to manage traffic flow. What kind of data and algorithms would you suggest?

  • Drug Development, Weather Data
  • Image Recognition, Financial Data
  • Recommender Systems, Text Data
  • Time-Series Analysis, Traffic Data
Time-Series Analysis and Traffic Data, including real-time traffic conditions, vehicle counts, and traffic camera feeds, can be used to predict congestion patterns and optimize traffic flow using algorithms like ARIMA or LSTM.
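
As a sketch of the time-series side, the snippet below fits a statsmodels ARIMA model to a synthetic hourly vehicle-count series and forecasts the next day; the series, the order (2, 1, 2), and the horizon are all assumptions made for illustration:

```python
# Hypothetical hourly traffic counts with a daily cycle plus noise.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = pd.date_range("2024-01-01", periods=240, freq="h")   # 10 days, hourly
counts = 200 + 80 * np.sin(np.arange(240) * 2 * np.pi / 24) \
         + np.random.default_rng(0).normal(0, 10, 240)
series = pd.Series(counts, index=rng)

model = ARIMA(series, order=(2, 1, 2)).fit()
print(model.forecast(steps=24))   # predicted counts for the next 24 hours
```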

How is the Adjusted R-Squared value computed, and why is it often preferred over R-Squared?

  • Adjusted R-Squared adds a penalty for more predictors; preferred for its robustness to outliers
  • Adjusted R-Squared considers bias; preferred for simplicity
  • Adjusted R-Squared includes a penalty for more predictors; preferred for its consideration of model complexity
  • Adjusted R-Squared includes mean error; preferred for interpretability
Adjusted R-Squared is computed by adding a penalty for the number of predictors: Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. It is often preferred over R-Squared when a model has many predictors because it accounts for model complexity: R-Squared can only increase as variables are added, whereas Adjusted R-Squared rises only when a new predictor improves the fit by more than chance would.
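
A worked example of the adjustment, with n, p, and R-Squared chosen arbitrarily:

```python
# Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1); the values are made up.
n, p, r2 = 100, 5, 0.82          # samples, predictors, ordinary R-squared

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))          # 0.8104 -- slightly lower, penalising the 5 predictors
```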

You have a dataset with hundreds of features, some of which are redundant. How would you approach reducing the dimensionality?

  • Apply PCA
  • Normalize the data
  • Remove all redundant features manually
  • Use only the first few features
Applying Principal Component Analysis (PCA) would be the most efficient way to reduce dimensionality in this scenario. PCA transforms the data into a new set of uncorrelated features, effectively capturing the most important variance in fewer dimensions, and thus removing redundancy. Manually removing redundant features may not be practical with hundreds of features, and other options do not directly address dimensionality reduction.
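
A minimal sketch with scikit-learn, assuming a synthetic 300-feature matrix whose columns are largely redundant (driven by only 10 underlying factors); the 95% variance threshold is a common but arbitrary choice:

```python
# PCA-based dimensionality reduction on synthetic, redundant features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 10))                      # 10 underlying factors
X = latent @ rng.normal(size=(10, 300)) \
    + 0.1 * rng.normal(size=(1000, 300))                  # 300 largely redundant features

X_scaled = StandardScaler().fit_transform(X)              # PCA is sensitive to scale
pca = PCA(n_components=0.95)                              # keep 95% of the variance
X_reduced = pca.fit_transform(X_scaled)
print(X_reduced.shape)                                    # roughly (1000, 10)
```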

Which learning paradigm does not require labeled data and finds hidden patterns in the data?

  • Reinforcement Learning
  • Semi-supervised Learning
  • Supervised Learning
  • Unsupervised Learning
Unsupervised Learning does not require labeled data and works by finding hidden patterns and structures in the data.

If a Polynomial Regression model is suspected of overfitting, you can perform _________ to validate the model's performance across different subsets of the data.

  • accuracy testing
  • cross-validation
  • noise filtering
  • stability testing
Cross-validation can be used to validate the model's performance across different subsets of the data and can help detect overfitting.
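
A short sketch with scikit-learn: the pipeline below cross-validates two polynomial degrees on a noisy quadratic; the data and degrees are made up, and comparing the mean scores shows whether the extra flexibility helps or hurts on held-out data:

```python
# 5-fold cross-validation of polynomial regression at two degrees.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = 0.5 * X[:, 0] ** 2 - X[:, 0] + rng.normal(0, 1, 30)   # noisy quadratic

for degree in (2, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"degree {degree:2d}: mean CV R^2 = {scores.mean():.3f}")
```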

How does the Logit function transform the probability in Logistic Regression?

  • Maps odds to log-odds
  • Maps odds to probability
  • Maps probability to log-odds
  • Maps probability to odds
The Logit function in Logistic Regression takes a probability p and maps it to log-odds: logit(p) = ln(p / (1 − p)). It is the inverse of the Sigmoid function, which maps log-odds back to probabilities.
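
A tiny numeric check of the transform and its inverse (the probability 0.8 is arbitrary):

```python
# Logit maps a probability to log-odds; sigmoid maps it back.
import numpy as np

def logit(p):
    return np.log(p / (1 - p))        # probability -> log-odds

def sigmoid(z):
    return 1 / (1 + np.exp(-z))       # log-odds -> probability

p = 0.8
z = logit(p)
print(z)            # ~1.386 (log-odds)
print(sigmoid(z))   # 0.8, recovering the original probability
```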

In unsupervised learning, the model learns to find patterns and structures from _________ data, where no specific output values are provided.

  • Balanced
  • Labelled
  • Sparse
  • Unlabelled
In unsupervised learning, the model learns from unlabeled data, finding hidden patterns and structures without specific output values or guidance.