Which type of learning is typically employed when there's neither complete supervision nor complete absence of supervision, but a mix where an agent learns to act in an environment?

  • Reinforcement Learning
  • Self-supervised Learning
  • Semi-supervised Learning
  • Unsupervised Learning
Semi-supervised Learning fits this scenario. It combines labeled and unlabeled data to train a model. In situations where you have some labeled data but not enough for full supervision, or when labeling is expensive, semi-supervised learning is a practical choice.

While linear regression is concerned with estimating the mean of the dependent variable, logistic regression estimates the probability that the dependent variable belongs to a particular ________.

  • Category
  • Class
  • Cluster
  • Group
Logistic regression estimates the probability that the dependent variable belongs to a particular class or category. Unlike linear regression, which predicts continuous values, logistic regression is used for classification problems.

CNNs are particularly effective for image data due to their ability to preserve the ________ structure of the data.

  • Spatial
  • Color
  • Temporal
  • Frequency
CNNs are effective for image data due to their ability to preserve the spatial structure of the data, which is crucial for detecting patterns in pixels' proximity.

While supervised learning requires explicit labels, ________ learning operates on data without explicit instructions.

  • Deep
  • Reinforcement
  • Semi-supervised
  • Unsupervised
In machine learning, unsupervised learning operates on data without explicit labels or instructions. Supervised learning, on the other hand, relies heavily on labeled data.

When NLP systems try to understand the context of words in medical documents to extract meaningful information, they are leveraging a technique called ________.

  • Named Entity Recognition
  • Sentiment Analysis
  • Document Summarization
  • Word Embeddings
Named Entity Recognition is a technique in NLP used to identify and classify entities in medical documents, such as drugs, diseases, or patient names.

A company wants to segment its customers based on their purchasing behavior. They have a fair idea that there are around 5 distinct segments but want to confirm this. Which clustering algorithm might they start with?

  • K-Means Clustering
  • Agglomerative Hierarchical Clustering
  • Mean-Shift Clustering
  • Spectral Clustering
The company might start with K-Means Clustering to confirm their idea of five distinct segments. K-Means is often used for partitioning data into a pre-specified number of clusters and can be a good choice when you have a rough idea of the number of clusters.

In the k-NN algorithm, as the value of k increases, the decision boundary becomes __________.

  • Linear
  • More complex
  • More simplified
  • Non-existent
As the value of k in k-NN increases, the decision boundary becomes more simplified because it is based on fewer neighboring data points.

When dealing with high-dimensional data, which of the two algorithms (k-NN or Naive Bayes) is likely to be more efficient in terms of computational time?

  • Both Equally Efficient
  • Naive Bayes
  • Neither is Efficient
  • k-NN
Naive Bayes is typically more efficient in high-dimensional data due to its simple probabilistic calculations, while k-NN can suffer from the "curse of dimensionality."

________ learning is often used for discovering hidden patterns in data.

  • Reinforcement
  • Semi-supervised
  • Supervised
  • Unsupervised
Unsupervised learning is a machine learning approach where algorithms are used to identify patterns in data without explicit guidance. It is commonly employed for data exploration and pattern discovery.

The term "exploitation" in reinforcement learning refers to which of the following?

  • Utilizing the best-known actions
  • Trying new, unexplored actions
  • Maximizing exploration
  • Modifying the environment
Exploitation involves utilizing the best-known actions to maximize rewards based on current knowledge, minimizing risk and uncertainty.

A medical research company is working on image data, where they want to classify microscopic images into cancerous and non-cancerous categories. The boundary between these categories is not linear. Which algorithm would be a strong candidate for this problem?

  • Convolutional Neural Network (CNN)
  • Logistic Regression
  • Naive Bayes Classifier
  • Principal Component Analysis
Convolutional Neural Networks (CNNs) are excellent for image classification tasks, especially when dealing with non-linear boundaries. They use convolutional layers to extract features from images, making them suitable for tasks like cancerous/non-cancerous image classification.

Which regression method assumes a linear relationship between the independent and dependent variables?

  • Decision Tree Regression
  • Logistic Regression
  • Polynomial Regression
  • Ridge Regression
Polynomial Regression assumes a linear relationship between the independent and dependent variables. It models relationships as polynomial functions, and other regression methods may assume different relationships.