What is classification and how does it differ from regression?

  • Predicting a category, differs by number of variables
  • Predicting a category, differs by output type
  • Predicting a number, differs by algorithm
  • Predicting a number, differs by input type
Classification aims to predict a categorical outcome, such as 'yes' or 'no', whereas regression predicts a continuous numerical value, such as a price or weight. While both are predictive modeling techniques, the key difference is in the type of output they produce. This makes classification suitable for discrete decisions, while regression is used for forecasting continuous quantities.
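The difference is easiest to see in a model's output. Below is a minimal sketch, assuming scikit-learn is available; the toy data is invented purely for illustration. The same kind of model is fit once as a classifier (it returns a label) and once as a regressor (it returns a number).

```python
# Minimal sketch, assuming scikit-learn; the toy data is made up for illustration.
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[1], [2], [3], [4]]  # one input feature

# Classification: the target is a category.
clf = DecisionTreeClassifier().fit(X, ["no", "no", "yes", "yes"])
print(clf.predict([[2.5]]))   # -> a label, e.g. ['no']

# Regression: the target is a continuous number.
reg = DecisionTreeRegressor().fit(X, [10.0, 20.0, 30.0, 40.0])
print(reg.predict([[2.5]]))   # -> a numeric estimate, e.g. [20.]
```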

What is the main principle behind the K-Nearest Neighbors algorithm?

  • Calculating correlations
  • Finding nearest points
  • Grouping similar objects
  • Minimizing error
The main principle of KNN is to classify a new object by finding its K closest points in the training data, typically measured with a distance metric such as Euclidean distance, and assigning it the most common class among those neighbors.
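The idea is simple enough to sketch by hand. The points, labels, and value of K below are invented for demonstration; in practice a library implementation such as scikit-learn's KNeighborsClassifier would be used.

```python
# Hand-rolled KNN sketch; the points, labels, and k are invented for demonstration.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # train is a list of (point, label) pairs; distances are Euclidean.
    by_distance = sorted(train, key=lambda pl: math.dist(pl[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_predict(train, (2, 2), k=3))   # -> 'A': the three nearest points are all A
```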

In a situation where you have a large dataset with only a small portion of labeled data, which learning paradigm would be most appropriate and why?

  • Reinforcement Learning
  • Semi-Supervised Learning
  • Supervised Learning
  • Unsupervised Learning
Semi-Supervised Learning combines the small labeled portion with the large unlabeled portion, using the labeled examples to guide learning on the rest, which makes it the natural choice when labeling all the data would be too expensive or slow.
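One common approach is self-training, sketched below assuming scikit-learn; the 90% label-masking rate and the SVC base model are arbitrary choices for illustration. Scikit-learn's convention is to mark unlabeled samples with -1.

```python
# Self-training sketch, assuming scikit-learn; masking rate and base model are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)
y_partial = y.copy()
rng = np.random.default_rng(0)
y_partial[rng.random(len(y)) < 0.9] = -1      # pretend 90% of labels are missing

# Fit on the labeled 10%, then iteratively pseudo-label confident predictions.
model = SelfTrainingClassifier(SVC(probability=True, random_state=0))
model.fit(X, y_partial)
print(model.score(X, y))                       # accuracy against the full true labels
```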

You're working with a dataset where different features are on wildly different scales. How can dimensionality reduction techniques like PCA be adapted to this scenario?

  • Apply PCA without any preprocessing
  • Ignore the scales
  • Scale the features before applying PCA
  • Use only large-scale features
When features are on very different scales, scaling them before applying PCA is crucial. PCA chooses directions of maximum variance, so a feature with a large numeric range would otherwise dominate the principal components regardless of how informative it is; standardizing the features lets each one contribute on equal footing. Ignoring the scales, applying PCA without preprocessing, or keeping only the large-scale features would bias the result.
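A minimal sketch of that preprocessing step, assuming scikit-learn; the wine dataset is used only because its features happen to span very different numeric ranges.

```python
# Standardize features (mean 0, variance 1) before PCA so no single scale dominates.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)     # some features are in the thousands, others near 1

scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=2))
X_reduced = scaled_pca.fit_transform(X)
print(X_reduced.shape)                # -> (178, 2)
```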

What is Machine Learning and why is it important?

  • A brand of computer
  • A field of AI that learns from experience
  • A study of computers
  • A type of computer virus
Machine Learning is a subset of artificial intelligence that focuses on the development of algorithms and statistical models that enable computers to perform specific tasks without explicit instructions. It's important because it allows systems to learn from data, adapt, and improve over time, making it essential in fields like healthcare, finance, transportation, and more.

You are working on a real-world problem that requires clustering, but the Elbow Method doesn't show a clear elbow point. What might be the underlying issues, and how could you proceed?

  • Data doesn't have well-separated clusters; Consider other methods like Silhouette
  • Increase the number of data points
  • Reduce the number of features
  • Use a different clustering algorithm entirely
When the Elbow Method doesn't show a clear elbow point, it may indicate that the data doesn't form well-separated clusters. In that case, the Silhouette Method, which scores how similar each point is to its own cluster compared with the nearest other cluster, gives a quantitative way to compare candidate values of K and choose the best one.
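A minimal sketch of that fallback, assuming scikit-learn; the blob data and the range of K values are invented for illustration.

```python
# Score each candidate k with the silhouette coefficient and pick the highest.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)    # toy data

for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))            # higher is better
```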

Explain the concept of regularization in Machine Learning. What are some common techniques?

  • Increasing complexity, Gradient Boosting
  • Increasing complexity, L1/L2
  • Reducing complexity, Gradient Descent
  • Reducing complexity, L1/L2
Regularization is a technique to reduce overfitting by adding a penalty term to the loss function that discourages large coefficients. Common techniques include L1 (lasso) regularization, which can shrink some coefficients to exactly zero and thus performs implicit feature selection, and L2 (ridge) regularization, which shrinks all coefficients toward zero without eliminating them.
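A minimal sketch of that contrast, assuming scikit-learn; the synthetic data and the alpha values are illustrative choices.

```python
# Lasso (L1) and Ridge (L2) both penalize coefficient size; alpha sets the strength.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)       # L1: can drive coefficients to exactly 0
ridge = Ridge(alpha=1.0).fit(X, y)       # L2: shrinks coefficients toward 0

print(sum(c == 0 for c in lasso.coef_))  # count of exactly-zero (pruned) coefficients
print(sum(c == 0 for c in ridge.coef_))  # typically 0: ridge keeps every feature
```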

In a dataset with fluctuating values, you've applied Polynomial Regression, and the model seems to fit even the noise. What are the potential risks, and how could you mitigate them?

  • Add more noise
  • Ignore the noise
  • Reduce model complexity through lower degree or regularization
  • Use a linear model
The risk is overfitting the noise, which will harm the model's generalization ability. Reducing the polynomial degree or using regularization techniques can mitigate this by constraining the model's complexity.
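Both mitigations can be compared directly, as in the sketch below, assuming scikit-learn; the noisy data, polynomial degrees, and alpha are invented for illustration.

```python
# Compare an unpenalized high-degree fit with the two mitigations via cross-validation.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (40, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.3, 40)    # signal plus noise

models = {
    "degree-15, no penalty": make_pipeline(
        PolynomialFeatures(degree=15), LinearRegression()),
    "degree-3 (lower degree)": make_pipeline(
        PolynomialFeatures(degree=3), LinearRegression()),
    "degree-15 + Ridge (L2)": make_pipeline(
        PolynomialFeatures(degree=15), StandardScaler(), Ridge(alpha=1.0)),
}
for name, model in models.items():
    # Cross-validated R^2; the unpenalized high-degree fit tends to score worst
    # because it also fits the noise in each training fold.
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```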

How does Deep Learning model complexity typically compare to traditional Machine Learning models, and what are the implications of this?

  • Less complex and easier to train
  • Less complex and requires less data
  • More complex and easier to interpret
  • More complex and requires more data and computation
Deep Learning models are typically more complex, requiring more data and computational resources, which can make training and tuning more challenging.

What term in Machine Learning refers to the input variables that the model uses to make predictions?

  • Features
  • Labels
  • Predictors
  • Targets
In Machine Learning, "Features" refer to the input variables that the model uses to make predictions. They are characteristics or attributes used to describe data instances.