Why might it be important to consider interaction effects in a Multiple Linear Regression model?
- It captures complex relationships
- It increases accuracy independently
- It reduces bias
- It simplifies the model
Considering interaction effects is essential to capture complex relationships between variables that might not be apparent when considering each variable separately.
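As a toy illustration (synthetic data, not from the source), an interaction effect is simply an extra product column in the regression design matrix; ordinary least squares then recovers its coefficient alongside the main effects:

```python
import numpy as np

# Synthetic illustration: the response depends on x1, x2, and their product
rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 1.0, 200)
x2 = rng.uniform(0.0, 1.0, 200)
y = 2.0 * x1 + 3.0 * x2 + 4.0 * x1 * x2

# The interaction effect is modeled as an extra product column in the design matrix
X = np.column_stack([x1, x2, x1 * x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # recovers approximately [2, 3, 4]
```

A model with only the `x1` and `x2` columns could not represent the product term, no matter how the two main-effect coefficients were chosen.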
Which Machine Learning task underpins sentiment analysis in customer feedback systems?
- Drug Discovery
- Image Recognition
- Inventory Management
- Text Classification
Sentiment analysis in customer feedback systems often involves text classification techniques. Machine learning algorithms like SVM, Naïve Bayes, or deep learning models can categorize customer comments into positive, negative, or neutral sentiment.
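A minimal pure-Python sketch of the Naïve Bayes approach mentioned above, using a made-up four-document training set and add-one (Laplace) smoothing; in practice a library such as scikit-learn would be used:

```python
from collections import Counter
import math

# Toy labeled feedback (illustrative only)
train = [
    ("great product love it", "positive"),
    ("excellent service very happy", "positive"),
    ("terrible experience very bad", "negative"),
    ("awful quality hate it", "negative"),
]

# Per-class word frequencies and class frequencies
word_counts = {"positive": Counter(), "negative": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    scores = {}
    for label in class_counts:
        # log prior + sum of smoothed log likelihoods
        score = math.log(class_counts[label] / len(train))
        total = sum(word_counts[label].values())
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("love the excellent quality"))  # → positive
```

The add-one smoothing keeps unseen words (like "the" here) from zeroing out a class probability.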
When applying the K-Nearest Neighbors algorithm, scaling the features is essential because it ensures that each feature contributes __________ to the distance computation.
- differently
- equally
- maximally
- minimally
Scaling features in KNN ensures that each feature contributes equally to the distance computation, preventing features with larger scales from dominating.
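To see why, consider a hypothetical pair of points with an income feature (dollars) and an age feature (years); the feature ranges used for min-max scaling below are assumed values for illustration:

```python
import math

# Hypothetical two-feature points: [income in dollars, age in years]
a = [50000.0, 25.0]
b = [51000.0, 60.0]

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Unscaled: the income difference (1000) swamps the age difference (35)
d_raw = euclidean(a, b)

# Min-max scale each feature to [0, 1] using assumed feature ranges
def minmax(p, lows, highs):
    return [(x - lo) / (hi - lo) for x, lo, hi in zip(p, lows, highs)]

lows, highs = [20000.0, 18.0], [120000.0, 80.0]
d_scaled = euclidean(minmax(a, lows, highs), minmax(b, lows, highs))
print(d_raw, d_scaled)  # after scaling, both features contribute comparably
```

Without scaling, the distance is essentially the income difference alone; after scaling, the large age difference dominates, which better reflects how dissimilar the two points actually are.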
When multicollinearity is present in a dataset, it can make the coefficients of the variables ___________ and hard to interpret.
- insignificant
- reliable
- stable
- unstable
Multicollinearity can make the coefficients of the variables unstable and sensitive to small changes in the data. This makes the interpretation of individual coefficients unreliable and the model difficult to interpret.
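A small NumPy sketch (synthetic data) makes the instability visible: two nearly identical predictor columns yield a huge condition number, and while the sum of their coefficients is pinned down by the data, the individual values are not:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=1e-3, size=100)  # nearly a copy of x1
y = x1 + x2 + rng.normal(scale=0.1, size=100)

X = np.column_stack([x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)                # individual coefficients: unstable, hard to interpret
print(coef.sum())          # their sum is well determined (close to 2)
print(np.linalg.cond(X))   # a very large condition number flags multicollinearity
```

Tiny perturbations of the data can swing the individual coefficients wildly while leaving the fitted predictions almost unchanged, which is exactly why interpreting them is unreliable.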
A core point in DBSCAN is a point that has at least MinPts points within _________ distance from itself.
- Epsilon
- border point
- cluster
- noise point
A core point in DBSCAN has at least MinPts points within Epsilon distance of itself. Epsilon defines the radius of the neighborhood around the point; if that neighborhood contains enough points (MinPts or more), the point is considered a core point.
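The definition can be sketched directly; `eps` and `min_pts` here are assumed toy values, and the point counts as its own neighbor, as in common implementations:

```python
import math

# Sketch of DBSCAN's core-point test (toy data; eps and min_pts are assumed values)
def is_core_point(point, points, eps, min_pts):
    # Count neighbors within eps; the point itself is included in the count
    neighbors = [q for q in points if math.dist(point, q) <= eps]
    return len(neighbors) >= min_pts

data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(is_core_point((0.0, 0.0), data, eps=0.5, min_pts=4))  # dense region → True
print(is_core_point((5.0, 5.0), data, eps=0.5, min_pts=4))  # isolated → False
```

Points that are not core points but fall inside a core point's neighborhood become border points; points in neither category are treated as noise.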
You've developed a Polynomial Regression model with a high-degree polynomial, and it's performing exceptionally well on the training data but poorly on the test data. What might be the issue, and how would you address it?
- Add more features
- Increase the degree
- Reduce the degree or apply regularization
- Use a different algorithm entirely
The issue is likely overfitting caused by the high-degree polynomial. Reducing the degree or applying a regularization technique such as Ridge or Lasso reduces the model's complexity and improves generalization to unseen data.
You are facing an overfitting problem in a linear model. How would you use Ridge, Lasso, or ElasticNet to address this issue?
- Decrease regularization strength
- Increase regularization strength
- Remove all regularization
Increasing the regularization strength constrains the model's complexity and reduces variance, which helps prevent overfitting. Ridge (L2), Lasso (L1), and ElasticNet (a combination of both) each expose such a strength parameter.
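For Ridge this is easy to see from its closed-form solution on synthetic data: as the strength `alpha` grows, the coefficient vector shrinks toward zero, taming the wild coefficients of an overfit model:

```python
import numpy as np

# Synthetic data: only the first two of five features actually matter
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=50)

def ridge(X, y, alpha):
    # Closed-form Ridge solution: (X'X + alpha * I)^{-1} X'y
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

for alpha in (0.0, 1.0, 100.0):
    w = ridge(X, y, alpha)
    print(alpha, np.linalg.norm(w))  # the coefficient norm shrinks as alpha grows
```

Lasso's L1 penalty shrinks in the same spirit but can drive some coefficients exactly to zero, and ElasticNet blends the two penalties.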
Can you list some applications of Machine Learning?
- Finance, Cooking
- Games, Cooking
- Games, Healthcare
- Healthcare, Finance, Marketing
Machine Learning is applied in various domains such as healthcare (for predicting diseases, personalizing treatments), finance (for fraud detection, risk management), marketing (for customer segmentation, targeted advertising), and more. Its versatility has made it an essential tool in modern technology.
You notice that a Decision Tree is providing inconsistent results on different runs. How might you investigate and correct the underlying issue, possibly involving entropy, Gini Index, or pruning techniques?
- Analyze the randomness in splitting and apply consistent pruning techniques
- Change to a different algorithm
- Ignore inconsistent results
- Increase tree depth
Inconsistent results may stem from the randomness in splitting the data. Analyzing this aspect and applying consistent pruning techniques can help create more stable, reproducible results. Attention to the splitting criteria, such as entropy or Gini Index, can further refine the model's behavior.
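The splitting criteria themselves are deterministic, as a sketch of both impurity measures on toy binary labels shows; in a library setting, fixing the random seed (e.g. a `random_state`-style parameter) is the usual way to make runs reproducible:

```python
import math

# Impurity measures used to score Decision Tree splits (binary 0/1 labels)
def gini(labels):
    p = sum(labels) / len(labels)
    return 1.0 - p**2 - (1.0 - p)**2

def entropy(labels):
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

pure, mixed = [1, 1, 1, 1], [1, 1, 0, 0]
print(gini(pure), entropy(pure))    # 0.0 0.0  (pure node: nothing to gain)
print(gini(mixed), entropy(mixed))  # 0.5 1.0  (maximally impure node)
```

Because both criteria are deterministic functions of the labels, run-to-run variation must come from randomness elsewhere (tie-breaking among equally good splits, random feature subsampling, or shuffled data), which is where seeding and consistent pruning help.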
What is the primary goal of clustering algorithms?
- To classify labeled data
- To find patterns and group similar data together
- To predict outcomes
- To solve reinforcement learning problems
The primary goal of clustering algorithms is to find patterns in the data and group similar data points together without using any labeled responses.
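A minimal k-means-style sketch on made-up toy data shows the idea: points are grouped by proximity alone, with no labels involved (the initialization here is deliberately naive):

```python
# Minimal k-means sketch on two obvious groups (illustrative toy data)
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers = [points[0], points[3]]  # naive initialization: one point from each group

def closest(p, centers):
    # Index of the nearest center by squared Euclidean distance
    return min(range(len(centers)),
               key=lambda i: (p[0] - centers[i][0])**2 + (p[1] - centers[i][1])**2)

for _ in range(10):  # alternate assignment and centroid-update steps
    groups = {0: [], 1: []}
    for p in points:
        groups[closest(p, centers)].append(p)
    centers = [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups.values()]

labels = [closest(p, centers) for p in points]
print(labels)  # similar points end up sharing a label, e.g. [0, 0, 0, 1, 1, 1]
```

No labeled responses were used anywhere: the grouping emerges purely from the distances between points, which is the essence of clustering.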