Which of the following is not typically a layer in a CNN?

  • Convolutional Layer
  • Fully Connected Layer
  • Recurrent Layer
  • Pooling Layer
Recurrent Layers are not typically used in Convolutional Neural Networks. They are more common in Recurrent Neural Networks (RNNs) and are used for sequential data processing, unlike CNNs, which are designed for grid-like data.

The operation in CNNs that combines the outputs of neuron clusters and produces a single output for the cluster is known as _______.

  • Activation Function
  • Pooling
  • Convolutions
  • Fully Connected
In CNNs, the operation that combines the outputs of neuron clusters and produces a single output for the cluster is called "Pooling." Pooling reduces the spatial dimensions of the feature maps, making them smaller and more computationally efficient while retaining important features.
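
As a rough illustration of the idea above, the sketch below applies non-overlapping 2x2 max pooling to a small feature map. The function name `max_pool_2x2` and the sample values are assumptions made for this example, and max pooling is only one pooling variant (average pooling is another).

```python
def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling on a 2D list with even height and width."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            # Each output cell is the largest value in one 2x2 cluster.
            max(
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Hypothetical 4x4 feature map, chosen only for illustration.
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 2],
    [7, 2, 9, 4],
    [0, 5, 3, 8],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[6, 2], [7, 9]]
```

The 4x4 input shrinks to 2x2, and each output value summarizes one cluster of four inputs.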

A healthcare organization stores patient records in a database. Each record contains structured fields like name, age, and diagnosis. Additionally, there are scanned documents and notes from doctors. Which term best describes the type of data in this healthcare database?

  • Structured data
  • Semi-structured data
  • Unstructured data
  • Big data
The healthcare database mixes structured fields (name, age, diagnosis) with scanned documents and doctors' notes, which on their own have no fixed schema. Taken as a whole, the data is only partially structured, so "semi-structured data" is the best fit among the options: the documents and notes can be tagged or indexed against the structured fields for better retrieval.

When a model performs well on training data but poorly on unseen data, what issue might it be facing?

  • Overfitting
  • Underfitting
  • Data leakage
  • Bias-variance tradeoff
The model is likely overfitting. Overfitting occurs when the model fits the training data too closely, including its noise, which gives excellent performance on the training set but poor generalization to unseen data. It is the high-variance side of the bias-variance tradeoff. Common remedies include regularization and training on more data.
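
The gap between training and unseen-data performance can be sketched with a toy example. Everything here is an assumption made for illustration: the 1D data, the two flipped labels that stand in for noise, the 1-nearest-neighbour model that memorizes the training set, and the hand-picked threshold rule used as a simpler comparison.

```python
def true_label(x):
    # Hypothetical ground truth: class 1 when x is at least 5.
    return 1 if x >= 5 else 0


# Training set: integers 0..9 with two deliberately flipped (noisy) labels.
train_x = list(range(10))
train_y = [true_label(x) for x in train_x]
train_y[2] = 1
train_y[7] = 0

# Unseen test points with clean labels.
test_x = [x + 0.4 for x in range(10)]
test_y = [true_label(x) for x in test_x]


def one_nn_predict(x):
    """Memorizing model: copy the label of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def threshold_predict(x):
    """Simpler model: a single hand-picked threshold that ignores the noise."""
    return 1 if x >= 5 else 0


def accuracy(predict, xs, ys):
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


for name, model in [("1-NN", one_nn_predict), ("threshold", threshold_predict)]:
    print(name, accuracy(model, train_x, train_y), accuracy(model, test_x, test_y))
# 1-NN 1.0 0.8
# threshold 0.8 1.0
```

The memorizing model scores perfectly on the noisy training labels but worse on unseen points, while the simpler rule shows the opposite pattern.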

While working with a dataset about car sales, you discover that the "Brand" column has many brands with very low frequency. To avoid having too many sparse categories, which technique can you apply to the "Brand" column?

  • One-Hot Encoding
  • Label Encoding
  • Brand grouping based on frequency
  • Principal Component Analysis (PCA)
To handle low-frequency categories in the "Brand" column, you can group brands by frequency, for example by merging every brand below a chosen count into a single "Other" category. This reduces the number of sparse categories and can improve model performance. One-hot encoding applied directly would create one mostly-zero column per rare brand, and label encoding assigns arbitrary integers without reducing the number of categories, so neither addresses the sparsity by itself. PCA is a dimensionality-reduction technique for numeric features, not a way to handle categorical variables.
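
A minimal sketch of frequency-based grouping, using only the standard library. The helper name `group_rare`, the `min_count` cutoff, the "Other" label, and the sample brands are all assumptions made for this example.

```python
from collections import Counter


def group_rare(values, min_count, other_label="Other"):
    """Replace every value seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Hypothetical "Brand" column.
brands = ["Toyota", "Ford", "Toyota", "Lotus", "Ford", "Toyota", "Saab", "Ford"]

grouped = group_rare(brands, min_count=2)
print(grouped)
# ['Toyota', 'Ford', 'Toyota', 'Other', 'Ford', 'Toyota', 'Other', 'Ford']
```

After grouping, the column has three categories instead of four, and the grouped column can then be one-hot encoded without a separate sparse column per rare brand.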

A neural network without any hidden layers is typically referred to as a _______.

  • Deep Neural Network
  • Shallow Neural Network
  • Multilayer
  • Perceptron
A neural network without any hidden layers is often referred to as a "Perceptron" (more precisely, a single-layer perceptron). It consists of only the input and output layers and is the simplest form of a neural network.
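
To make this concrete, here is a small sketch of a perceptron with two inputs, one output, and no hidden layer, trained with the classic perceptron update rule on the logical AND function. The function names, the learning rate, and the number of epochs are assumptions chosen for the example.

```python
def predict(weights, bias, inputs):
    """Step activation over a weighted sum: inputs connect directly to the output."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            # Perceptron rule: nudge weights only when the prediction is wrong.
            error = target - predict(weights, bias, inputs)
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Logical AND is linearly separable, so a single-layer perceptron can learn it.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_gate]
print(predictions)  # [0, 0, 0, 1]
```

A function that is not linearly separable, such as XOR, cannot be learned this way, which is one reason hidden layers are added.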

_________ is a popular open-source framework used for real-time processing and analytics of large streams of data.

  • Hadoop
  • Spark
  • Hive
  • Kafka
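Apache Spark is a widely used open-source framework for large-scale data processing, and its streaming module (Structured Streaming) supports near-real-time analytics on large streams of data. It also provides tools for batch processing, SQL, and machine learning, which makes it a popular choice in big data and data science. Of the other options, Hadoop and Hive are geared toward batch workloads, and Kafka is primarily a platform for transporting event streams rather than analyzing them.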

A common task in supervised learning where the output variable is categorical, such as 'spam' or 'not spam', is called _______.

  • Classification
  • Regression
  • Clustering
  • Association
The correct term is "Classification." In a classification task, the goal is to predict a categorical output variable from input features. Common examples include labeling emails as 'spam' or 'not spam' (binary classification) or assigning objects to one of several categories (multi-class classification). Because classification models assign inputs to predefined categories, they require labeled examples, which is what makes classification a supervised learning task.

When considering the Data Science Life Cycle, which step involves assessing the performance of your model and ensuring it meets the project's objectives?

  • Data Collection
  • Data Preprocessing
  • Model Building and Training
  • Model Evaluation and Deployment
Model Evaluation and Deployment is the phase where you assess the performance of your model and ensure it meets the project's objectives. During this step, you use metrics appropriate to the task, computed on held-out data, to evaluate how well the model performs and to decide whether it is ready for deployment. This phase is crucial for ensuring that the data-driven solution is effective and meets the desired outcomes.
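
As a sketch of what such an assessment might look like for a binary classifier, the example below computes accuracy, precision, and recall and compares them with project targets. The labels, the predictions, and the target values are hypothetical and chosen only for illustration.

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Accuracy, precision, and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical held-out labels and model predictions.
y_true = ["spam", "spam", "spam"] + ["not spam"] * 5
y_pred = ["spam", "spam", "not spam", "spam"] + ["not spam"] * 4

metrics = evaluate(y_true, y_pred)

# Hypothetical project objectives: the minimum acceptable value per metric.
targets = {"precision": 0.6, "recall": 0.6}
meets_targets = all(metrics[name] >= minimum for name, minimum in targets.items())

print(metrics["accuracy"], meets_targets)  # 0.75 True
```

Which metrics and thresholds matter depends on the project; a spam filter, for instance, may weigh false positives more heavily than false negatives.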

One of the challenges with Gradient Boosting is its sensitivity to _______ parameters, which can affect the model's performance.

  • Hyperparameters
  • Feature selection
  • Model architecture
  • Data preprocessing
Gradient Boosting is sensitive to hyperparameters such as the learning rate, the tree depth, and the number of estimators. These interact with one another (a lower learning rate usually needs more estimators, for example), so they must be tuned carefully to achieve good model performance. Hyperparameter tuning is a critical step in using gradient boosting effectively.
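
The effect of these settings can be seen in a toy, from-scratch sketch of gradient boosting for regression with squared loss. This is not any library's implementation: the depth-1 trees (stumps), the data, and the function names are assumptions for illustration, and the parameter names `n_estimators` and `learning_rate` are borrowed from common usage.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump (least squares) on 1D inputs."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean


def boost(xs, ys, n_estimators, learning_rate):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training data: y = x^2 on a small grid.
xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]

# Training error for a few hyperparameter combinations.
for learning_rate in (0.1, 0.5):
    for n_estimators in (5, 50):
        model = boost(xs, ys, n_estimators, learning_rate)
        print(learning_rate, n_estimators, round(mse(model, xs, ys), 4))
```

The printed values are training errors only. In practice the settings would be compared on held-out data, since adding estimators keeps lowering training error even after the model starts to overfit.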