When evaluating models for a multi-class classification problem, which method computes the average metric score for each class, considering the other classes as the negative class?
- Micro-averaging
- Macro-averaging
- Weighted averaging
- Mini-batch averaging
Macro-averaging computes the metric score for each class separately, treating all other classes as the "negative" class, and then takes the unweighted mean of those per-class scores. Because every class contributes equally regardless of its size, macro-averaging is particularly useful in imbalanced multi-class classification problems where you want small classes to count as much as large ones.
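As a quick illustration, here is a minimal scikit-learn sketch (with made-up toy labels) contrasting macro- and micro-averaged F1 on an imbalanced label set:

```python
from sklearn.metrics import f1_score

y_true = [0, 0, 0, 0, 1, 1, 2, 2]   # imbalanced: class 0 dominates
y_pred = [0, 0, 0, 1, 1, 1, 0, 2]

# Macro: compute F1 per class (one-vs-rest), then take the unweighted mean,
# so the small classes 1 and 2 count as much as the large class 0.
print(f1_score(y_true, y_pred, average="macro"))

# Micro: pool all individual decisions first, so the large class dominates.
print(f1_score(y_true, y_pred, average="micro"))
```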
Which technique considers the spread of data points around the median to identify outliers?
- Box Plot
- Z-Score (Standardization)
- One-Hot Encoding
- K-Means Clustering
The Box Plot, also known as a box-and-whisker plot, considers the spread of data points around the median and identifies outliers based on the interquartile range (IQR): points that fall outside the whiskers are flagged as outliers. The Z-Score measures deviation from the mean rather than the median, One-Hot Encoding encodes categorical variables, and K-Means Clustering is a clustering technique, not an outlier-detection method.
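The snippet below is a minimal numpy sketch of the 1.5 × IQR rule that box-plot whiskers conventionally use, applied to toy data:

```python
import numpy as np

data = np.array([2, 3, 4, 5, 5, 6, 7, 8, 30])  # 30 looks like an outlier

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Whiskers conventionally extend 1.5 * IQR beyond the quartiles;
# anything outside that range is flagged as an outlier.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(data[(data < lower) | (data > upper)])  # [30]
```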
In Big Data processing, _______ operations filter and sort data, while _______ operations perform aggregations and transformations.
- Map, Reduce
- Filter, Join
- Shuffle, Merge
- Merge, Filter
In Big Data processing (the MapReduce model), the first blank should be filled with "Map" and the second with "Reduce." Map operations perform filtering and sorting of the input data, while Reduce operations perform aggregations and transformations, such as summing or counting the mapped output.
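A minimal pure-Python sketch of the two phases, using the canonical word-count example (plain dictionaries stand in for Hadoop's distributed machinery):

```python
records = ["apple", "banana", "apple", "cherry", "banana", "apple"]

# Map phase: emit (key, value) pairs and sort them by key --
# the "filter and sort" step.
mapped = sorted((word, 1) for word in records)

# Reduce phase: aggregate the values for each key -- the "aggregation" step.
counts = {}
for word, one in mapped:
    counts[word] = counts.get(word, 0) + one
print(counts)  # {'apple': 3, 'banana': 2, 'cherry': 1}
```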
Which activation function can alleviate the vanishing gradient problem to some extent?
- Sigmoid
- ReLU (Rectified Linear Unit)
- Tanh (Hyperbolic Tangent)
- Leaky ReLU
The ReLU activation function is known for mitigating the vanishing gradient problem, a common issue when training deep networks. Sigmoid and Tanh saturate for large-magnitude inputs, producing near-zero gradients, whereas ReLU has a constant gradient of 1 for positive inputs, allowing gradients to flow more freely during backpropagation. (Leaky ReLU shares this property and additionally keeps a small gradient for negative inputs, avoiding "dead" units; ReLU is the conventional answer here.)
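A small numpy sketch (with sigmoid written out by hand) that compares the gradients directly:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 1.0, 10.0])

# Sigmoid's gradient s(x) * (1 - s(x)) collapses toward 0 for large |x|.
print(sigmoid(x) * (1 - sigmoid(x)))  # ~[4.5e-05, 0.197, 0.197, 4.5e-05]

# ReLU's gradient is exactly 1 wherever the unit is active.
print((x > 0).astype(float))          # [0. 0. 1. 1.]

# Leaky ReLU keeps a small slope (here 0.01) even for negative inputs.
print(np.where(x > 0, 1.0, 0.01))     # [0.01 0.01 1.   1.  ]
```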
In Tableau, you can connect to various data sources and create a unified view known as a _______.
- Dashboard
- Workbook
- Storyboard
- Data source
In Tableau, a "Workbook" is where you can connect to various data sources, design visualizations, and create a unified view of your data. It serves as a container for creating and organizing your data visualizations and analyses.
In L2 regularization, the penalty is proportional to the _______ of the magnitude of the coefficients.
- Square
- Absolute
- Exponential
- Logarithmic
In L2 regularization (Ridge), the penalty is proportional to the square of the magnitude of the coefficients. This regularization technique adds a penalty term to the loss function based on the sum of squared coefficients, which helps prevent overfitting by discouraging large coefficients.
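For concreteness, here is a minimal numpy sketch of the ridge objective on toy data (the function name and the λ value are illustrative):

```python
import numpy as np

def ridge_loss(w, X, y, lam):
    """Squared-error loss plus the L2 penalty: lam * sum of squared coefficients."""
    residuals = X @ w - y
    return np.sum(residuals ** 2) + lam * np.sum(w ** 2)

# Toy data: the penalty grows with the *square* of each coefficient,
# so doubling a coefficient quadruples its contribution to the penalty.
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
print(ridge_loss(w, X, y, lam=1.0))
```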
Which statistical concept measures how much individual data points vary from the mean of the dataset?
- Standard Deviation
- Median Absolute Deviation (MAD)
- Mean Deviation
- Z-Score
Standard Deviation is a measure of the spread or variability of data points around the mean. It quantifies how much individual data points deviate from the average, making it a crucial concept in understanding data variability and distribution.
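A minimal numpy sketch computing it by hand on a toy dataset, alongside numpy's built-in:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# By hand: the square root of the mean squared deviation from the mean.
mean = data.mean()
std_manual = np.sqrt(np.mean((data - mean) ** 2))

print(std_manual)  # 2.0
print(data.std())  # 2.0 -- numpy's (population) standard deviation agrees
```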
What is the main function of Hadoop's MapReduce?
- Data storage and retrieval
- Data visualization
- Data cleaning and preparation
- Distributed data processing
The main function of Hadoop's MapReduce is "Distributed data processing." MapReduce is a programming model and processing technique used to process and analyze large datasets in a distributed and parallel manner.
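As a toy simulation of the model, the sketch below parallelizes the map phase with local worker processes (real Hadoop distributes these tasks across cluster nodes; the function names are illustrative):

```python
from collections import Counter
from multiprocessing import Pool

def map_chunk(lines):
    """Map task: emit word counts for one input split."""
    return Counter(word for line in lines for word in line.split())

def reduce_counts(partials):
    """Reduce task: merge the partial counts from all mappers."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

if __name__ == "__main__":
    splits = [["big data big"], ["data processing"], ["big processing"]]
    with Pool(3) as pool:
        partials = pool.map(map_chunk, splits)
    print(reduce_counts(partials))
    # Counter({'big': 3, 'data': 2, 'processing': 2})
```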
Which ensemble method adjusts weights for misclassified instances in iterative training?
- Bagging
- Gradient Boosting
- Random Forest
- K-Means Clustering
Boosting methods train models sequentially, with each new model concentrating on the instances the previous models got wrong. Strictly speaking, explicit re-weighting of misclassified instances is how AdaBoost works; Gradient Boosting generalizes the idea by fitting each new model to the residual errors (negative gradients) of the current ensemble. Among the listed options, Gradient Boosting is the boosting method: Bagging and Random Forest train their models independently, and K-Means Clustering is not an ensemble method at all.
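A minimal scikit-learn sketch on synthetic data (all parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each of the 100 shallow trees is fit to the residual errors of the
# ensemble so far, so later trees focus on the still-misfit examples.
model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                   max_depth=2, random_state=0)
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```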
You are a data engineer tasked with setting up a real-time data processing system for a large e-commerce platform. The goal is to analyze user behavior in real-time to provide instant recommendations. Which technology would be most appropriate for this task?
- Apache Hadoop
- Apache Kafka
- Apache Spark
- MySQL
Apache Spark is the most suitable choice for real-time data processing and analytics. Its in-memory processing and streaming APIs allow for low-latency analysis, making it well suited to generating instant recommendations from user behavior. Apache Kafka is a distributed event-streaming platform that ingests and transports the data; it is commonly paired with Spark rather than performing the analytics itself. Hadoop MapReduce is batch-oriented and MySQL is a relational database, so neither is optimized for real-time processing.
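A minimal PySpark Structured Streaming sketch, adapted from the standard word-count pattern in the Spark documentation (the host, port, and app name are placeholders; a real pipeline would typically read from Kafka and feed a recommender):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("ClickstreamCounts").getOrCreate()

# Read a live text stream of user events, one event per line.
events = (spark.readStream.format("socket")
          .option("host", "localhost").option("port", 9999).load())

# Aggregate events per item as they arrive, entirely in memory.
counts = (events.select(explode(split(events.value, " ")).alias("item"))
          .groupBy("item").count())

# Continuously print the updated counts to the console.
query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```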