What is the primary purpose of using Logistic Regression?
- Clustering data
- Finding correlations
- Predicting binary outcomes
- Predicting continuous outcomes
Logistic Regression is mainly used to predict binary outcomes (e.g., yes/no, true/false). It models the probability that the dependent variable belongs to a particular category.
What is Gradient Boosting, and how does it work?
- Gradient Boosting always uses a Random Forest
- Gradient Boosting builds trees sequentially, correcting errors using gradients
- Gradient Boosting is a bagging method
- Gradient Boosting reduces model complexity
Gradient Boosting is a boosting method that builds decision trees sequentially. Each tree tries to correct the errors of the previous one by using gradients (direction of the steepest ascent) to minimize the loss function. This leads to a powerful model with improved accuracy.
How do ensemble methods like Random Forest and Gradient Boosting help in improving the model's performance?
- By focusing on one strong model
- By increasing overfitting
- By leveraging multiple models to achieve better accuracy and robustness
- By reducing computational complexity
Ensemble methods like Random Forest and Gradient Boosting combine the strengths of multiple models to achieve better accuracy and robustness. By leveraging a diverse set of models, they often outperform single models, especially on complex tasks, and reduce the risks of overfitting.
Why might it be important to consider interaction effects in a Multiple Linear Regression model?
- It captures complex relationships
- It increases accuracy independently
- It reduces bias
- It simplifies the model
Considering interaction effects is essential to capture complex relationships between variables that might not be apparent when considering each variable separately.
Describe how Machine Learning algorithms are implemented in sentiment analysis and customer feedback systems.
- Drug Discovery
- Image Recognition
- Inventory Management
- Text Classification
Sentiment analysis in customer feedback systems often involves text classification techniques. Machine learning algorithms like SVM, Naïve Bayes, or deep learning models can categorize customer comments into positive, negative, or neutral sentiment.
When applying the K-Nearest Neighbors algorithm, scaling the features is essential because it ensures that each feature contributes __________ to the distance computation.
- differently
- equally
- maximally
- minimally
Scaling features in KNN ensures that each feature contributes equally to the distance computation, preventing features with larger scales from dominating.
What is the Confusion Matrix, and what information does it provide about a classification model?
- A matrix representing classification errors
- A matrix representing feature importance
- A matrix representing model's coefficients
- A matrix representing model's hyperparameters
The Confusion Matrix is a table that describes the performance of a classification model by categorizing predictions into True Positives, False Positives, True Negatives, and False Negatives. It gives detailed insight into where the model is making mistakes.
A set of input variables and corresponding target values used to evaluate a model's performance is referred to as a _________ set.
- evaluation
- testing
- training
- validation
A "testing" set consists of input variables and corresponding target values used to assess a machine learning model's performance on unseen data, allowing for a more robust evaluation.
The assumption that the relationship between the independent and dependent variable is linear in Simple Linear Regression is called the assumption of _________.
- Homoscedasticity
- Independence
- Linearity
- Normality
The assumption of linearity ensures that the relationship between the independent and dependent variable is linear, which is fundamental to Simple Linear Regression.
What type of learning algorithm utilizes labeled data to make predictions?
- Reinforcement Learning
- Semi-supervised Learning
- Supervised Learning
- Unsupervised Learning
Supervised Learning uses labeled data, where the output is known, to train the algorithm and make predictions.