How can you assess the accuracy and reliability of a regression model's predictions?
- Through classification metrics
- Through clustering metrics
- Through regression metrics like RMSE, R-Squared, MAE
- Through text analysis
The accuracy and reliability of a regression model's predictions can be assessed through regression metrics such as Root Mean Squared Error (RMSE), R-Squared, and Mean Absolute Error (MAE). RMSE and MAE quantify the typical magnitude of the prediction errors (with RMSE penalizing large errors more heavily), while R-Squared measures the proportion of variance in the target that the model explains.
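As an illustration, these metrics can be computed directly from predictions and actual values. This is a minimal NumPy sketch, not a production implementation (libraries such as scikit-learn provide the same metrics ready-made):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, and R-squared for a set of predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    errors = y_true - y_pred
    rmse = np.sqrt(np.mean(errors ** 2))              # penalizes large errors more
    mae = np.mean(np.abs(errors))                     # average absolute deviation
    ss_res = np.sum(errors ** 2)                      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    r2 = 1 - ss_res / ss_tot                          # fraction of variance explained
    return rmse, mae, r2

rmse, mae, r2 = regression_metrics([3.0, 5.0, 7.0], [2.5, 5.0, 7.5])
```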
Differentiate between feature selection and feature extraction in the context of dimensionality reduction.
- Both are the same
- Depends on the data
- Feature selection picks, extraction transforms
- Feature selection transforms, extraction picks
Feature selection involves picking a subset of the original features, whereas feature extraction involves transforming the original features into a new set. Feature extraction usually leads to new features that are combinations of the original ones, while feature selection maintains the original features but reduces their number.
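The difference can be seen side by side in a small NumPy sketch (the variance-based selector and SVD-based PCA here are illustrative choices, not the only options):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Feature SELECTION: keep the k original columns with the highest variance.
# The result is a subset of the original, still-interpretable features.
k = 2
variances = X.var(axis=0)
selected = X[:, np.argsort(variances)[-k:]]       # shape (100, 2): original columns

# Feature EXTRACTION: build k NEW features as linear combinations of all
# columns (here, PCA computed via SVD on the centered data).
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
extracted = Xc @ Vt[:k].T                         # shape (100, 2): transformed axes
```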
Your task is to detect fraudulent activities in financial transactions. What would be the considerations in choosing between AI, Machine Learning, or Deep Learning for this task?
- AI, for its expert systems
- Deep Learning, for its complex pattern recognition
- Machine Learning, for its ability to learn from historical data
Machine Learning can be trained on historical data to detect patterns indicative of fraudulent activities, making it a suitable choice for this task.
In what situations would ElasticNet be preferred over Ridge or Lasso?
- When all features are equally important
- When features are uncorrelated
- When model complexity is not a concern
- When multicollinearity is high
ElasticNet is preferred when multicollinearity is high and you want to balance between Ridge and Lasso: it combines Ridge's L2 penalty, which stabilizes the estimates of correlated coefficients, with Lasso's L1 penalty, which can shrink some coefficients exactly to zero.
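The behavior on correlated features can be sketched with a minimal proximal-gradient solver for the ElasticNet objective (an illustrative implementation only; in practice scikit-learn's ElasticNet would be used, and the step size and iteration count here are hand-picked for this toy data):

```python
import numpy as np

def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, lr=0.01, n_iter=2000):
    """Proximal-gradient sketch of the ElasticNet objective:
    (1/2n)||y - Xw||^2 + alpha*(l1_ratio*||w||_1 + (1-l1_ratio)/2*||w||^2)."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        # Gradient of the smooth part (squared loss + L2 penalty)
        grad = -X.T @ (y - X @ w) / n + alpha * (1 - l1_ratio) * w
        w = w - lr * grad
        # Soft-thresholding: the proximal step for the L1 penalty
        w = np.sign(w) * np.maximum(np.abs(w) - lr * alpha * l1_ratio, 0.0)
    return w

# Two nearly duplicate features: Lasso tends to pick one of them arbitrarily;
# ElasticNet's grouping effect shrinks both toward a shared, stable estimate.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)             # near-copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=200)
w = elastic_net(X, y)                             # w[0] and w[1] end up close together
```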
The ________ component in PCA explains the highest amount of variance within the data.
- first
- last
- median
- random
The "first" principal component in PCA explains the highest amount of variance within the data. It is aligned with the direction of the maximum spread of the data and forms the most substantial part of the dataset's structure.
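This ordering can be verified numerically. A small NumPy sketch (PCA via SVD on centered data; the explained-variance ratios come out sorted in descending order, so the first component carries the largest share):

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated 2-D data: most of the spread lies along a single direction.
x = rng.normal(size=300)
X = np.column_stack([x, 2 * x + 0.3 * rng.normal(size=300)])

Xc = X - X.mean(axis=0)                           # PCA requires centered data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)                   # variance ratio per component
# explained[0] is the largest share: the first component captures the most variance.
```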
What is classification and how does it differ from regression?
- Predicting a category, differs by number of variables
- Predicting a category, differs by output type
- Predicting a number, differs by algorithm
- Predicting a number, differs by input type
Classification aims to predict a categorical outcome, such as 'yes' or 'no', whereas regression predicts a continuous numerical value, such as a price or weight. While both are predictive modeling techniques, the key difference is in the type of output they produce. This makes classification suitable for discrete decisions, while regression is used for forecasting continuous quantities.
What is the main principle behind the K-Nearest Neighbors algorithm?
- Calculating correlations
- Finding nearest points
- Grouping similar objects
- Minimizing error
The main principle of KNN is to classify a new object by assigning it to the most common class among its K nearest neighbors.
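The voting principle fits in a few lines. A minimal sketch using Euclidean distance (the data and k=3 below are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distance to each point
    nearest = np.argsort(dists)[:k]                   # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]                 # most common class wins

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = ["A", "A", "B", "B", "B"]
label = knn_predict(X_train, y_train, np.array([1.0, 0.95]), k=3)  # → "B"
```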
In a situation where you have a large dataset with only a small portion of labeled data, which learning paradigm would be most appropriate and why?
- Reinforcement Learning
- Semi-Supervised Learning
- Supervised Learning
- Unsupervised Learning
Semi-Supervised Learning combines both labeled and unlabeled data, making it appropriate for scenarios with limited labeled data.
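One simple semi-supervised strategy is self-training: pseudo-label the unlabeled points you are most confident about, then repeat. The sketch below uses nearest-neighbor distance as the confidence measure, which is an illustrative choice (in practice a tool such as scikit-learn's SelfTrainingClassifier would wrap a real base model):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated clusters; only ONE labeled example per cluster.
# -1 marks "unlabeled".
X = np.vstack([rng.normal(0, 0.3, size=(50, 2)), rng.normal(3, 0.3, size=(50, 2))])
y = np.full(100, -1)
y[0], y[50] = 0, 1                                # the only labels we start with

# Self-training sketch: repeatedly give the unlabeled point CLOSEST to any
# labeled point that neighbor's label, so labels spread through each cluster.
while (y == -1).any():
    labeled = np.where(y != -1)[0]
    unlabeled = np.where(y == -1)[0]
    d = np.linalg.norm(X[unlabeled][:, None, :] - X[labeled][None, :, :], axis=2)
    i, j = np.unravel_index(np.argmin(d), d.shape)
    y[unlabeled[i]] = y[labeled[j]]               # most confident pseudo-label first
```

Because the cluster structure of the *unlabeled* data guides the label propagation, two labeled points are enough to label all one hundred.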
Bagging stands for Bootstrap __________, which involves creating subsets of the original dataset and training individual models on them.
- Adjustment
- Aggregation
- Algorithm
- Alignment
Bagging, or Bootstrap Aggregation, involves creating subsets of the original dataset through bootstrapping and training individual models on these subsets, which are then combined to make the final prediction.
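The two steps, bootstrap resampling and aggregation by vote, can be sketched as follows. The 1-nearest-neighbor base model here is an illustrative stand-in (bagging is most often used with decision trees, as in Random Forests):

```python
import numpy as np
from collections import Counter

def bagging_predict(X, y, x_new, n_models=25, seed=0):
    """Bootstrap Aggregation sketch: train one 1-nearest-neighbor 'model'
    per bootstrap resample, then combine the models by majority vote."""
    rng = np.random.default_rng(seed)
    n = len(X)
    votes = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)          # bootstrap: sample WITH replacement
        Xb, yb = X[idx], [y[i] for i in idx]
        nearest = np.argmin(np.linalg.norm(Xb - x_new, axis=1))
        votes.append(yb[nearest])                 # this model's prediction
    return Counter(votes).most_common(1)[0][0]    # aggregate by majority vote

X = np.array([[0.0], [0.2], [0.9], [1.1]])
y = ["low", "low", "high", "high"]
pred = bagging_predict(X, y, np.array([1.0]))     # → "high"
```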
How can you test for multicollinearity in Multiple Linear Regression, and why is it important?
- By Checking Accuracy, Improves Prediction
- By Checking Residuals, Reduces Overfitting
- By Checking Variance Inflation Factor (VIF), Prevents Unstable Estimates
- By Examining Correlations between Variables, Prevents Confounding Effects
Multicollinearity can be detected by checking the Variance Inflation Factor (VIF) of each predictor: a VIF well above 1 (commonly, above 5 or 10) signals that the predictor is highly correlated with the others. Detecting it is important because multicollinearity inflates the standard errors of the coefficient estimates, leading to unstable estimates and making individual coefficients difficult to interpret.
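VIF has a direct definition: regress each predictor on all the others and compute 1 / (1 - R²). A minimal NumPy sketch (statsmodels provides a ready-made `variance_inflation_factor`; the least-squares version below is for illustration):

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor per column: regress the column on the
    remaining columns (with intercept) and return 1 / (1 - R^2)."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])     # design matrix with intercept
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)                              # independent of a
c = a + 0.1 * rng.normal(size=200)                    # nearly collinear with a
vifs = vif(np.column_stack([a, b, c]))
# vifs for a and c come out large (multicollinearity); vif for b stays near 1.
```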