In the context of PCA, what do the principal components represent?

Clustered Data
Error in Data
Features of Data
Variance of Data Explained

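Principal components are the orthogonal directions in the data space along which the variance of the data is maximized. The first component captures the largest share of the variance, and each later component captures the largest share that remains. Keeping only the first few components reduces the dimensionality of the data while retaining most of its variance.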
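
As a rough illustration of the "variance explained" idea, the Python sketch below finds the first principal component of a small 2D dataset by power iteration on the covariance matrix, then reports the fraction of total variance it explains. The function names and sample points are made up for this example. In practice a library eigendecomposition or SVD routine would normally be used instead.

```python
import math

def covariance_matrix(data):
    """Sample covariance matrix of a list of equal-length rows."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_principal_component(data, iterations=200):
    """Return (unit direction, fraction of total variance explained)."""
    cov = covariance_matrix(data)
    d = len(cov)
    v = [1.0] * d  # arbitrary non-zero starting vector
    for _ in range(iterations):
        # Power iteration: multiply by the covariance matrix, then renormalize
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient = variance of the data along v
    variance_along_v = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, variance_along_v / total_variance

points = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]
direction, explained = first_principal_component(points)
print(direction, explained)  # direction lies close to the y = x diagonal
```

Because these points lie almost on a line, a single component explains nearly all of the variance, so projecting onto it reduces 2D to 1D with little loss.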

What metric would be more appropriate to use when the classes in a classification problem are imbalanced?

Accuracy
F1 Score
Mean Absolute Error
Root Mean Square Error

When the classes are imbalanced, the F1 score is a more appropriate metric than accuracy. It is the harmonic mean of precision and recall, so a model cannot score well simply by predicting the majority class, whereas its accuracy can still look high when one class is far more prevalent than the other.
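
A small worked example with hypothetical labels (95 negatives, 5 positives) shows how accuracy can mislead: a model that always predicts the majority class reaches 95% accuracy but has an F1 score of 0. The helper functions below are written out by hand for illustration rather than taken from a library.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

y_true = [0] * 95 + [1] * 5            # hypothetical imbalanced labels

always_negative = [0] * 100            # ignores the minority class entirely
print(accuracy(y_true, always_negative), f1_score(y_true, always_negative))  # 0.95 0.0

# Finds 4 of the 5 positives at the cost of 2 false alarms
detector = [0] * 93 + [1] * 2 + [1] * 4 + [0]
print(accuracy(y_true, detector), f1_score(y_true, detector))
```

The two models differ by only two points of accuracy (0.95 vs 0.97), while F1 (0 vs 8/11) separates a useless model from a useful one.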

The drive to make machine learning models more transparent and understandable is often termed the quest for model ________.

Interpretability
Complexity
Unpredictability
Accuracy

Model interpretability is the degree to which a human can understand how a model arrives at its predictions. Improving it makes models more transparent, which builds trust and gives insight into their behavior.

Why is it crucial for machine learning models, especially in critical applications like healthcare or finance, to be interpretable?

Trust and Accountability
Improved Training Data
Increased Model Complexity
Speed of Prediction

Interpretability is crucial for establishing trust and accountability. In critical areas like healthcare or finance, understanding the model's decision process is essential to ensure safe and ethical use.

Unlike PCA, which finds orthogonal (uncorrelated) components, ICA assumes that the components are ________.

Independent
Correlated
Uncorrelated
Randomly Distributed

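ICA (Independent Component Analysis) assumes that the underlying source components are statistically independent (and non-Gaussian). Independence is a stronger condition than the uncorrelatedness PCA provides: uncorrelated variables can still be dependent. Also, unlike PCA's components, the directions ICA recovers in the original data space need not be orthogonal.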
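
The gap between "uncorrelated" and "independent" can be checked numerically. In this Python sketch (illustrative data, not an ICA implementation), y = x * x has zero correlation with x, which is all that PCA-style decorrelation looks at, yet y is completely determined by x, so the two variables are clearly not independent.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]        # y is a deterministic function of x

print(correlation(xs, ys))      # 0.0: uncorrelated

# ...but not independent: knowing x changes what we know about y
p_y4 = sum(y == 4 for y in ys) / len(ys)                                     # P(y = 4) = 0.4
p_y4_given_x2 = sum(y == 4 for x, y in zip(xs, ys) if x == 2) / xs.count(2)  # P(y = 4 | x = 2) = 1.0
print(p_y4, p_y4_given_x2)
```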

In which learning approach does the model learn to make decisions by receiving rewards or penalties for its actions?

Reinforcement Learning
Semi-Supervised Learning
Supervised Learning
Unsupervised Learning

Reinforcement Learning involves learning through trial and error: an agent takes actions in an environment, receives rewards for good actions and penalties for bad ones, and gradually learns a policy that maximizes its cumulative reward. It is commonly used in areas such as game playing and robotics.
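
A minimal sketch of this reward-driven loop is tabular Q-learning on a toy "corridor" environment, assumed here purely for illustration: the agent starts in state 0, can step left or right, and receives a reward of 1 only when it reaches state 4. The environment, hyperparameters, and seed are arbitrary choices, not a standard benchmark.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal
ACTIONS = (-1, +1)    # move left or move right

def step(state, action):
    """Toy environment: walls at both ends, reward 1 on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                # exploit, breaking ties at random
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted value of the best next action
            target = reward if done else reward + gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q

q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}
print(policy)  # greedy action learned for each non-terminal state
```

No one tells the agent that "right" is correct; it infers this from the delayed reward, which is what separates reinforcement learning from supervised learning.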

A researcher is working with a large dataset of patient medical records with numerous features. They want to visualize the data in 2D to spot any potential patterns or groupings but without necessarily clustering the data. Which technique would they most likely employ?

Principal Component Analysis
t-Distributed Stochastic Neighbor Embedding (t-SNE)
K-Means Clustering
DBSCAN

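The researcher would most likely employ t-Distributed Stochastic Neighbor Embedding (t-SNE). t-SNE is a nonlinear dimensionality reduction technique designed for visualizing high-dimensional data in 2D or 3D. It tries to keep points that are close in the original space close in the embedding, so patterns and groupings become visible without any cluster labels being assigned. PCA can also project data to 2D, but as a linear method it can miss such nonlinear structure.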
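
The sketch below shows the two similarity distributions t-SNE compares, under a simplifying assumption: it uses one shared Gaussian width sigma for the high-dimensional affinities, whereas real t-SNE tunes a per-point sigma from a perplexity setting and symmetrizes the result. It computes Gaussian affinities P for the original points, heavy-tailed Student-t affinities Q for a candidate 2D embedding, and the KL divergence that t-SNE reduces by gradient descent (the optimization loop itself is omitted). The data and both embeddings are made up.

```python
import math

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gaussian_affinities(points, sigma=1.0):
    """High-dimensional similarities p_ij (simplified: one shared sigma)."""
    n = len(points)
    w = [[0.0 if i == j else math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[x / total for x in row] for row in w]

def student_t_affinities(embedding):
    """Low-dimensional similarities q_ij using the Student-t kernel 1 / (1 + d^2)."""
    n = len(embedding)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[x / total for x in row] for row in w]

def kl_divergence(p, q):
    """KL(P || Q): the cost t-SNE minimizes by moving the embedded points."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q) for pij, qij in zip(prow, qrow) if pij > 0)

high_dim = [[0, 0, 0], [0.1, 0, 0], [5, 5, 5], [5.1, 5, 5]]  # two tight groups
good = [[0, 0], [0.1, 0], [5, 5], [5.1, 5]]                  # keeps neighbours together
bad = [[0, 0], [5, 5], [0.1, 0], [5.1, 5]]                   # splits each group apart
p = gaussian_affinities(high_dim)
kl_good = kl_divergence(p, student_t_affinities(good))
kl_bad = kl_divergence(p, student_t_affinities(bad))
print(kl_good, kl_bad)  # the neighbour-preserving layout has the lower cost
```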

You are given a dataset of customer reviews but without any labels indicating sentiment. You want to group similar reviews together. Which type of learning approach will you employ?

Reinforcement Learning
Semi-supervised Learning
Supervised Learning
Unsupervised Learning

In this scenario, you will use unsupervised learning. Unsupervised learning is employed when you have unlabelled data and aim to discover patterns or group similar data points without prior guidance.
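
For illustration, the sketch below groups four made-up reviews with a bare-bones k-means over bag-of-words count vectors. No sentiment labels are used anywhere. The initial centroids are picked by hand (reviews 0 and 2) to keep the result deterministic; a real pipeline would more likely use TF-IDF features and randomized or k-means++ initialization.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bag_of_words(texts):
    """One count vector per text over a shared, sorted vocabulary."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    return [[t.lower().split().count(w) for w in vocab] for t in texts]

def kmeans(vectors, initial_centroids, iterations=10):
    centroids = [list(c) for c in initial_centroids]
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid
        labels = [min(range(len(centroids)), key=lambda k: sq_dist(v, centroids[k]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

reviews = [
    "great product love it",
    "love it great quality",
    "terrible product broke quickly",
    "broke after one day terrible",
]
vectors = bag_of_words(reviews)
labels = kmeans(vectors, [vectors[0], vectors[2]])
print(labels)  # reviews sharing vocabulary end up in the same group
```

The groups emerge purely from word overlap; interpreting one group as "positive" and the other as "negative" is left to a human afterwards.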

You're working with a large dataset of facial images. You want to reduce the dimensionality of the images while preserving their primary features for facial recognition. Which neural network structure would you employ?

Autoencoder
Convolutional Neural Network
Recurrent Neural Network
Generative Adversarial Network

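Autoencoders reduce the dimensionality of data by learning to compress each input into a low-dimensional code (the encoder) from which the input can be reconstructed (the decoder), so the code must retain the input's essential features. That learned code can serve as a compact feature representation for tasks such as facial recognition.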
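
A toy linear autoencoder makes the compress-then-reconstruct idea concrete. The sketch below uses made-up 2D points lying on the line y = 2x, squeezes each point through a single-number code, and learns encoder and decoder weights by gradient descent on the mean squared reconstruction error. Real face-image autoencoders are nonlinear and usually convolutional; this only illustrates the training objective, and the learning rate and step count are arbitrary.

```python
def reconstruction_error(data, enc, dec):
    """Mean squared error between each input and its reconstruction."""
    total = 0.0
    for x in data:
        code = enc[0] * x[0] + enc[1] * x[1]
        total += sum((dec[k] * code - x[k]) ** 2 for k in range(2))
    return total / len(data)

def train_linear_autoencoder(data, lr=0.02, steps=2000):
    """2-D input -> 1-D code -> 2-D reconstruction, trained by gradient descent."""
    enc = [0.1, 0.1]   # encoder weights: code = enc . x
    dec = [0.1, 0.1]   # decoder weights: x_hat = dec * code
    n = len(data)
    for _ in range(steps):
        grad_enc = [0.0, 0.0]
        grad_dec = [0.0, 0.0]
        for x in data:
            code = enc[0] * x[0] + enc[1] * x[1]
            residual = [dec[k] * code - x[k] for k in range(2)]
            back = sum(residual[k] * dec[k] for k in range(2))  # error passed back through the decoder
            for k in range(2):
                grad_dec[k] += 2 * residual[k] * code / n
                grad_enc[k] += 2 * back * x[k] / n
        enc = [enc[k] - lr * grad_enc[k] for k in range(2)]
        dec = [dec[k] - lr * grad_dec[k] for k in range(2)]
    return enc, dec

data = [[t, 2 * t] for t in (-1.0, -0.5, 0.0, 0.5, 1.0)]
before = reconstruction_error(data, [0.1, 0.1], [0.1, 0.1])
enc, dec = train_linear_autoencoder(data)
after = reconstruction_error(data, enc, dec)
print(before, after)  # error drops once the 1-number code captures the line's direction
```

Because the data is essentially one-dimensional, a single code value is enough to reconstruct each point, which is the same idea, at a much smaller scale, as compressing a face image into a short feature vector.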

A spam filter is being designed to classify emails. The model needs to consider the presence of certain words in the email (e.g., "sale," "discount") and their likelihood to indicate spam. Which classifier is more suited for this kind of problem?

K-Means Clustering
Naive Bayes
Random Forest
Support Vector Machine (SVM)

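Naive Bayes is effective for text classification tasks such as spam filtering. It estimates how likely each word (e.g., "sale," "discount") is to appear in spam and non-spam emails and, assuming the words are conditionally independent given the class, combines these per-word likelihoods with the class priors to pick the more probable class.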
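
A minimal multinomial Naive Bayes spam filter, trained on six invented emails, shows how per-word likelihoods and class priors are combined in log space. It counts word occurrences (a Bernoulli variant would model word presence or absence instead) and uses add-one smoothing; the training set, labels, and function names are all hypothetical.

```python
import math
from collections import Counter

def train_naive_bayes(emails):
    """emails: list of (text, label) pairs -> (log priors, per-class word counts, vocabulary)."""
    class_counts = Counter(label for _, label in emails)
    word_counts = {label: Counter() for label in class_counts}
    for text, label in emails:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    log_priors = {c: math.log(n / len(emails)) for c, n in class_counts.items()}
    return log_priors, word_counts, vocab

def classify(text, log_priors, word_counts, vocab):
    scores = {}
    for label, log_prior in log_priors.items():
        total = sum(word_counts[label].values())
        score = log_prior
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                # Add-one smoothing keeps an unseen class-word pair from zeroing the product
                score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

training = [
    ("huge sale discount today", "spam"),
    ("exclusive discount offer click now", "spam"),
    ("sale ends soon buy now", "spam"),
    ("meeting agenda for tomorrow", "ham"),
    ("can we reschedule the meeting", "ham"),
    ("lunch tomorrow with the team", "ham"),
]
model = train_naive_bayes(training)
print(classify("big discount sale", *model))      # words seen mostly in spam
print(classify("team meeting tomorrow", *model))  # words seen mostly in ham
```

Summing log-probabilities instead of multiplying raw probabilities avoids numerical underflow on longer emails.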
