The percentage of total variance explained by a principal component in PCA can be calculated by dividing the Eigenvalue of that component by the ________.
- magnitude of Eigenvectors
- number of Eigenvectors
- number of components
- sum of all Eigenvalues
The percentage of total variance explained by a principal component is calculated by dividing its Eigenvalue by the "sum of all Eigenvalues." This ratio gives the proportion of the dataset's total variance that is captured by that specific component.
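A minimal sketch of this calculation, assuming a small illustrative NumPy array as the dataset: the explained-variance percentages come directly from the eigenvalues of the covariance matrix.

```python
import numpy as np

# Toy dataset: rows are samples, columns are features (illustrative values only).
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# Eigen-decomposition of the covariance matrix of the centered data.
cov = np.cov(X - X.mean(axis=0), rowvar=False)
eigenvalues, _ = np.linalg.eigh(cov)

# Percentage of variance explained = eigenvalue / sum of all eigenvalues.
explained = eigenvalues / eigenvalues.sum() * 100
print(explained)  # one percentage per principal component
```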
In ElasticNet regularization, the mixing parameter 'alpha' balances the effects of ________ and ________.
- L1, L2
- L1, L3
- L2, L3
The 'alpha' parameter in ElasticNet regularization balances the effects of L1 and L2 penalties, providing a compromise between Ridge and Lasso.
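A minimal scikit-learn sketch on synthetic data. Note that scikit-learn names the L1/L2 mixing parameter `l1_ratio`, while its `alpha` argument sets the overall penalty strength; the idea of blending the two penalties is the same.

```python
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression

# Synthetic regression data for illustration only.
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# l1_ratio mixes the penalties: 1.0 -> pure Lasso (L1), 0.0 -> pure Ridge (L2).
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(X, y)
print(model.coef_)  # some coefficients may be shrunk exactly to zero by the L1 part
```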
The process of fine-tuning a Machine Learning model by changing its settings or _________ is vital for achieving optimal performance.
- Algorithms
- Features
- Hyperparameters
- Targets
Hyperparameters are the settings or parameters of a machine learning model that are defined prior to training and are fine-tuned to optimize performance.
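As an illustrative sketch (using scikit-learn's GridSearchCV on a synthetic dataset), hyperparameters such as tree depth are fixed before training and tuned by searching over a grid rather than learned from the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hyperparameters are chosen before training and tuned via cross-validated search.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```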
The _________ hyperplane in SVM maximizes the margin between the support vectors of different classes.
- Decision
- Fixed
- Optimal
- Random
The optimal hyperplane in SVM is the one that maximizes the margin between support vectors of different classes.
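A brief sketch with scikit-learn's linear SVC on synthetic, well-separated data; the fitted coefficients define the maximum-margin hyperplane and the support vectors lie on its margin.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs so a maximum-margin hyperplane exists.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# w.x + b = 0 is the optimal hyperplane; support vectors define the margin.
print("w:", clf.coef_, "b:", clf.intercept_)
print("number of support vectors:", clf.support_vectors_.shape[0])
```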
Hierarchical Clustering can be either agglomerative, where clusters are built from the bottom up, or divisive, where clusters are split from the top down. The most common method used is _________.
- Agglomerative
- Complete Linkage
- Divisive
- Single Linkage
The agglomerative method is the most commonly used approach in Hierarchical Clustering. It builds clusters from the bottom up, starting with individual data points and merging them into progressively larger clusters. This method produces a dendrogram, which can be analyzed to choose the optimal number of clusters and to understand the hierarchical relationships within the data.
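A short sketch using SciPy's agglomerative linkage on synthetic points; the resulting linkage matrix is what a dendrogram plot would be drawn from, and it can be cut into flat clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Small synthetic dataset for illustration.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (10, 2)), rng.normal(5, 0.5, (10, 2))])

# Bottom-up (agglomerative) merging using Ward linkage.
Z = linkage(X, method="ward")

# Cut the hierarchy into 2 flat clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```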
How does the Kernel Trick help in SVM?
- Enhances data visualization
- Reduces data size
- Speeds up computation
- Transforms data into higher dimension
The Kernel Trick in SVM implicitly maps the data into a higher-dimensional space by computing inner products through a kernel function rather than explicit coordinates, so that classes that are not linearly separable in the original space can become linearly separable.
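A small illustrative example: on concentric-circle data that no straight line can separate, an RBF kernel lets the SVM find a separating boundary without ever computing the higher-dimensional coordinates explicitly.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Data that is not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)   # kernel trick: implicit high-dimensional mapping

print("linear accuracy:", linear.score(X, y))
print("rbf accuracy:   ", rbf.score(X, y))
```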
Why is the choice of distance metric significant in the K-Nearest Neighbors (KNN) algorithm?
- It affects clustering efficiency
- It defines the complexity of the model
- It determines the similarity measure
- It influences feature selection
The choice of distance metric in KNN determines how similarity between instances is measured, which directly affects which neighbors are selected and therefore the predictions.
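As a quick sketch, changing the `metric` argument of scikit-learn's KNeighborsClassifier changes which points count as nearest neighbors; the data here is synthetic and for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The same k with different distance metrics can select different neighbors.
for metric in ["euclidean", "manhattan", "chebyshev"]:
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric).fit(X_train, y_train)
    print(metric, knn.score(X_test, y_test))
```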
What is an interaction effect in Multiple Linear Regression?
- A combined effect of two variables
- Linear relationship between variables
- Model optimization
- Removing irrelevant features
An interaction effect occurs when the effect of one predictor on the dependent variable depends on the level of another predictor. It is typically captured by adding a product term (e.g., x1 * x2) to the model, representing their combined effect.
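A minimal sketch of an interaction term on synthetic data (the variable names are hypothetical): the product x1 * x2 is added as an extra regressor, so the slope of x1 varies with the level of x2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)

# True model includes an interaction: the effect of x1 depends on x2.
y = 2 * x1 + 3 * x2 + 4 * x1 * x2 + rng.normal(scale=0.1, size=200)

# Include the product term as an additional feature.
X = np.column_stack([x1, x2, x1 * x2])
model = LinearRegression().fit(X, y)
print(model.coef_)  # third coefficient estimates the interaction effect (~4)
```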
Differentiate between feature selection and feature extraction in the context of dimensionality reduction.
- Both are the same
- Depends on the data
- Feature selection picks, extraction transforms
- Feature selection transforms, extraction picks
Feature selection involves picking a subset of the original features, whereas feature extraction involves transforming the original features into a new set. Feature extraction usually leads to new features that are combinations of the original ones, while feature selection maintains the original features but reduces their number.
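A side-by-side sketch on synthetic data: SelectKBest keeps a subset of the original columns unchanged (selection), while PCA builds new columns as combinations of them (extraction).

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=200, n_features=10, n_informative=4, random_state=0)

# Feature selection: picks 4 of the original 10 features, leaving them unchanged.
X_selected = SelectKBest(f_classif, k=4).fit_transform(X, y)

# Feature extraction: transforms all 10 features into 4 new components.
X_extracted = PCA(n_components=4).fit_transform(X)

print(X_selected.shape, X_extracted.shape)  # both (200, 4), but with different meanings
```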
Your task is to detect fraudulent activities in financial transactions. What would be the considerations in choosing between AI, Machine Learning, or Deep Learning for this task?
- AI, for its expert systems
- Deep Learning, for its complex pattern recognition
- Machine Learning, for its ability to learn from historical data
Machine Learning can be trained on historical data to detect patterns indicative of fraudulent activities, making it a suitable choice for this task.
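A hedged sketch of the Machine Learning route: a classifier trained on labeled historical transactions, where fraud is the rare class. The features and data here are entirely synthetic placeholders for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Stand-in for historical transactions: imbalanced labels (fraud is rare).
X, y = make_classification(n_samples=2000, n_features=12, weights=[0.97, 0.03], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" compensates for the rarity of the fraud class.
clf = RandomForestClassifier(class_weight="balanced", random_state=0).fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```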