In Cassandra, data retrieval is fast because it uses a _______ based data model.

Relational
Document-oriented
Columnar
Key-Value

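Cassandra uses a wide-column (column-family) data model, which this quiz labels "Columnar". Rows are grouped into partitions by a partition key, so a read that supplies the key goes directly to the nodes that hold that partition. This makes storage and retrieval efficient and suits applications with high read and write workloads, such as time-series data or analytics.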

The range of a dataset is calculated by taking the difference between the maximum and the _______ value.

Minimum
Median
Mean
Mode

The range of a dataset is calculated by subtracting the minimum value from the maximum value. It measures the spread of the data from the smallest to the largest value, so "Minimum" is the correct answer.
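As a quick illustration of the formula, here is a minimal sketch in Python. The function name `value_range` and the sample data are made up for this example.

```python
def value_range(values):
    """Return max(values) - min(values) for a non-empty sequence."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)


scores = [12, 7, 3, 18, 9]  # illustrative data, not from the quiz
print(value_range(scores))  # 18 - 3 = 15
```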

What is the main challenge addressed by the transformer architecture in NLP?

Handling sequential data effectively
Capturing long-range dependencies
Image classification
Speech recognition

The main challenge addressed by the transformer architecture is capturing long-range dependencies in sequential data. Transformers use self-attention mechanisms to understand the relationship between distant words in a sentence, making them effective for various NLP tasks like machine translation and text summarization.
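The self-attention idea can be sketched in a few lines of plain Python. This is a simplified illustration, not a full transformer layer: it assumes queries, keys, and values are all the raw token vectors (no learned projection matrices, no multiple heads), and the function names and toy vectors are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (simplifying assumption)."""
    d = len(x[0])
    outputs, all_weights = [], []
    for q in x:
        # Score the query against every position, however far away it is.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        all_weights.append(w)
        # Each output is a weighted average of all value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return outputs, all_weights


# Toy sequence: positions 0 and 2 hold the same vector, position 1 differs.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
print(weights[0])
```

In this toy input, position 0 attends to position 2 exactly as strongly as to itself, even though another token sits between them. Distance in the sequence does not enter the score, which is why attention can link far-apart words.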

Which type of data is typically stored in relational databases with defined rows and columns?

Unstructured data
Tabular data
Hierarchical data
NoSQL data store

Relational databases are designed for storing structured data in tables with well-defined rows and columns, so "Tabular data" is the correct answer. This format allows for efficient storage and querying. Unstructured data, on the other hand, lacks a predefined structure.

In SQL, how can you prevent SQL injection in your queries?

Use stored procedures
Encrypt the database
Use Object-Relational Mapping (ORM)
Sanitize and parameterize inputs

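To prevent SQL injection, you should sanitize and parameterize user inputs in your queries. Parameterized queries (prepared statements) pass user input to the database as data, separate from the SQL text, so it cannot be executed as a SQL command; input validation adds a second layer of defense. Stored procedures and ORMs help only when they use parameters internally, and encrypting the database does not stop injection at all.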
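A minimal sketch of a parameterized query, using Python's standard-library `sqlite3` module with an in-memory database. The `users` table, the `find_user` helper, and the sample input are hypothetical and exist only for this example.

```python
import sqlite3

# In-memory database with a small, made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])


def find_user(conn, name):
    # The ? placeholder binds `name` as a value, so it is never parsed as SQL.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


malicious = "alice' OR '1'='1"
print(find_user(conn, "alice"))    # [(1, 'alice')]
print(find_user(conn, malicious))  # []
```

Had the query been built by string concatenation, the second input would have turned the `WHERE` clause into a condition that matches every row. With a bound parameter it is compared as a literal name and matches nothing.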

In NoSQL databases, the absence of a fixed schema means that databases are _______.

Structured
Relational
Schemaless
Document-oriented

NoSQL databases are schemaless, which means they do not require a fixed schema for data storage. This flexibility allows for the storage of various types of data without predefined structure constraints.
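To make "schemaless" concrete, here is a toy sketch that uses plain Python dictionaries in place of a real document store. The `collection` list and the `insert` and `find` helpers are hypothetical and do not correspond to any particular database's API.

```python
# A toy in-memory "collection": each record is a dict with its own set of fields.
collection = []


def insert(doc):
    """Store a copy of the record; no schema is checked."""
    collection.append(dict(doc))


def find(field, value):
    """Return records whose `field` equals `value`; records lacking the field are skipped."""
    return [d for d in collection if d.get(field) == value]


# Two records with different fields live side by side.
insert({"id": 1, "name": "Ada", "email": "ada@example.com"})
insert({"id": 2, "name": "Lin", "tags": ["admin"], "last_login": "2024-01-05"})

print(find("name", "Lin"))
```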

When scaling features, which method is less influenced by outliers?

Standardization (Z-score scaling)
Min-Max Scaling
Robust Scaling
Log Transformation

Robust Scaling is less influenced by outliers because it centers the data on the median and scales it by the interquartile range (IQR), rather than using the mean and standard deviation (as standardization does) or the minimum and maximum (as Min-Max scaling does). This makes it a suitable choice when dealing with datasets that contain outliers.
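A minimal sketch of the calculation, using only Python's standard `statistics` module. The `robust_scale` function and the sample data are made up for this example, and the quartiles use the "inclusive" method, which is one of several common conventions.

```python
import statistics


def robust_scale(values):
    """Scale each value as (x - median) / IQR."""
    median = statistics.median(values)
    # n=4 returns the three quartile cut points: Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the data cannot be scaled this way")
    return [(x - median) / iqr for x in values]


data = [1, 2, 3, 4, 100]  # 100 is an outlier
scaled = robust_scale(data)
print(scaled)  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```

The median (3) and IQR (4 - 2 = 2) are unaffected by how extreme the last value is, so the first four values are scaled the same way whether the outlier is 100 or 1000.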

The process of adjusting the weights in a neural network based on the error rate is known as _______.

Backpropagation
Data Preprocessing
Hyperparameter Tuning
Reinforcement Learning

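Backpropagation is the process used to adjust the weights of a neural network so that the error between predicted and actual values is minimized. It applies the chain rule to propagate the error backward from the output layer, computing the gradient of the loss with respect to each weight; an optimizer such as gradient descent then uses those gradients to update the weights. It is the fundamental training algorithm for neural networks.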
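The gradient calculation and weight update can be sketched for a deliberately tiny network with one hidden unit and scalar weights. All values here (initial weights, learning rate, training example, step count) are arbitrary choices for illustration.

```python
import math


def forward(w1, w2, x):
    """Tiny network: h = tanh(w1 * x), y = w2 * h."""
    h = math.tanh(w1 * x)
    return h, w2 * h


def loss(y, target):
    return 0.5 * (y - target) ** 2


w1, w2 = 0.5, 0.5      # arbitrary starting weights
x, target = 1.0, 0.5   # a single made-up training example
lr = 0.1               # learning rate

_, y0 = forward(w1, w2, x)
initial_loss = loss(y0, target)

for _ in range(200):
    h, y = forward(w1, w2, x)
    # Backward pass: apply the chain rule from the output back to each weight.
    dL_dy = y - target
    dL_dw2 = dL_dy * h
    dL_dh = dL_dy * w2
    dL_dz = dL_dh * (1 - h ** 2)  # derivative of tanh
    dL_dw1 = dL_dz * x
    # Gradient descent update.
    w1 -= lr * dL_dw1
    w2 -= lr * dL_dw2

_, y_final = forward(w1, w2, x)
final_loss = loss(y_final, target)
print(initial_loss, final_loss)
```

The gradient for `w1` reuses `dL_dh`, which was computed from the layer after it. That reuse of downstream gradients is what "propagating the error backward" refers to.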

In the context of Big Data, which system is designed to provide high availability and fault tolerance by replicating data blocks across multiple nodes?

Hadoop Distributed File System (HDFS)
Apache Kafka
Apache Spark
NoSQL databases

The Hadoop Distributed File System (HDFS) is designed for high availability and fault tolerance. It achieves this by replicating each data block across multiple nodes in the cluster (three copies by default), so the data remains readable when a node fails. This is a fundamental feature of Hadoop's file system.

A self-driving car company has millions of images labeled with either "pedestrian" or "no pedestrian". They want the car to automatically detect pedestrians. Which type of learning and algorithm would be optimal for this task?

Supervised Learning with Convolutional Neural Networks
Unsupervised Learning with Apriori Algorithm
Reinforcement Learning with Monte Carlo Methods
Semi-Supervised Learning with DBSCAN

Supervised Learning with Convolutional Neural Networks (CNNs) is the optimal choice for image classification tasks like pedestrian detection. CNNs are designed for such tasks, while the other options are not suitable for image classification. Apriori is used for association rule mining, reinforcement learning for decision-making, and DBSCAN for clustering.
