In Cassandra, data retrieval is fast because it uses a _______ based data model.
- Relational
- Document-oriented
- Columnar
- Key-Value
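Cassandra uses a columnar (more precisely, wide-column) data model. Rows are grouped into partitions by a partition key, and within each partition they are stored sorted by clustering columns, so a query that targets a single partition can be answered without scanning the whole table. This makes Cassandra well suited to applications with high read and write workloads, such as time-series data.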
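As a rough illustration of why partition-keyed reads are fast, here is a toy in-memory model of the wide-column layout. The class `ToyWideColumnTable` and its methods are hypothetical and are not Cassandra's API; in real Cassandra the same layout is declared in CQL through a table's partition key and clustering columns.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyWideColumnTable:
    """Hypothetical toy model of a wide-column layout (NOT Cassandra's API).

    Rows are grouped into partitions by a partition key; inside a partition,
    rows are kept sorted by a clustering key.
    """

    def __init__(self):
        self._keys = defaultdict(list)  # partition key -> sorted clustering keys
        self._rows = defaultdict(list)  # partition key -> column dicts, parallel to _keys

    def insert(self, partition_key, clustering_key, columns):
        keys = self._keys[partition_key]
        rows = self._rows[partition_key]
        i = bisect_left(keys, clustering_key)
        if i < len(keys) and keys[i] == clustering_key:
            rows[i] = columns  # same primary key: overwrite (upsert)
        else:
            keys.insert(i, clustering_key)
            rows.insert(i, columns)

    def range_query(self, partition_key, start, end):
        """Rows in one partition with start <= clustering key <= end."""
        # Only the requested partition is touched; binary search finds the slice.
        keys = self._keys.get(partition_key, [])
        rows = self._rows.get(partition_key, [])
        lo, hi = bisect_left(keys, start), bisect_right(keys, end)
        return list(zip(keys[lo:hi], rows[lo:hi]))


table = ToyWideColumnTable()
table.insert("sensor-1", "2024-01-01T00:00", {"temp": 20.5})
table.insert("sensor-1", "2024-01-01T00:10", {"temp": 21.0})
table.insert("sensor-2", "2024-01-01T00:05", {"temp": 18.2})
print(table.range_query("sensor-1", "2024-01-01T00:00", "2024-01-01T00:05"))
# [('2024-01-01T00:00', {'temp': 20.5})]
```

The range query never looks at `sensor-2`'s partition, which is the property that keeps single-partition reads cheap as the table grows.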
The range of a dataset is calculated by taking the difference between the maximum and the _______ value.
- Minimum
- Median
- Mean
- Mode
The range of a dataset is the maximum value minus the minimum value, so the correct answer is Minimum. It measures the spread of the data from the smallest to the largest value.
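A minimal Python sketch of the formula. The helper name `data_range` is hypothetical (chosen to avoid shadowing Python's built-in `range`).

```python
def data_range(values):
    """Range = maximum - minimum, in the same units as the data."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)


print(data_range([4, 8, 15, 16, 23, 42]))   # 38
print(data_range([4, 8, 15, 16, 23, 420]))  # 416
```

Because the range depends only on the two extreme values, a single outlier (the second call) can change it drastically.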
What is the main challenge addressed by the transformer architecture in NLP?
- Handling sequential data effectively
- Capturing long-range dependencies
- Image classification
- Speech recognition
The main challenge the transformer architecture addresses is capturing long-range dependencies in sequential data. Its self-attention mechanism lets every token attend directly to every other token in a single step, so the relationship between two distant words does not have to be carried through many intermediate steps, as it does in a recurrent network. This makes transformers effective for NLP tasks such as machine translation and text summarization.
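A minimal pure-Python sketch of scaled dot-product self-attention, assuming queries, keys, and values are all the raw token vectors. A real transformer first multiplies the inputs by learned projection matrices and uses multiple heads; both are omitted here for brevity.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with queries = keys = values = x.

    Simplification: learned projections W_Q, W_K, W_V are omitted.
    Returns (outputs, weights), where weights[i][j] is how much token i attends to token j.
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Score this token against every token, however far away it is.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted sum of all value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
outputs, weights = self_attention(tokens)
# Token 0 attends to the distant token 3 exactly as strongly as to itself.
print(weights[0])
```

The distance between positions 0 and 3 plays no role in the score; only the content of the vectors does, which is why long-range links are as easy to form as short ones.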
Which type of data is typically stored in relational databases with defined rows and columns?
- Unstructured data
- Tabular data
- Hierarchical data
- NoSQL data store
Relational databases store tabular (structured) data: tables with well-defined rows and columns, where every row follows the same schema. This fixed structure allows efficient storage and querying with SQL. Unstructured data, by contrast, lacks a predefined structure, and "NoSQL data store" names a kind of database rather than a kind of data.
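A small sketch using Python's built-in `sqlite3` module with an in-memory database; the `employees` table and its columns are hypothetical. It shows the schema fixing the columns up front and the database rejecting a row that does not fit it.

```python
import sqlite3

# In-memory database; the table definition fixes the columns and constraints.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL, dept TEXT)")
conn.executemany(
    "INSERT INTO employees (name, dept) VALUES (?, ?)",
    [("Ada", "Engineering"), ("Grace", "Research")],
)
rows = conn.execute("SELECT id, name, dept FROM employees ORDER BY id").fetchall()
print(rows)  # [(1, 'Ada', 'Engineering'), (2, 'Grace', 'Research')]

try:
    # Violates the NOT NULL constraint on `name`, so SQLite rejects the row.
    conn.execute("INSERT INTO employees (dept) VALUES ('Sales')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```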
In SQL, how can you prevent SQL injection in your queries?
- Use stored procedures
- Encrypt the database
- Use Object-Relational Mapping (ORM)
- Sanitize and parameterize inputs
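To prevent SQL injection, use parameterized queries (prepared statements) and validate user input. With parameterization, user-supplied values are bound to placeholders instead of being spliced into the SQL text, so they are always treated as data and can never be executed as SQL commands; input validation adds a second layer of defense. Stored procedures and ORMs help only when they use parameters internally, and they can still be vulnerable if they build dynamic SQL by string concatenation. Encrypting the database does nothing to stop injection.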
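A contrast between string concatenation and a parameterized query, sketched with Python's built-in `sqlite3` module (which uses `?` placeholders). The `users` table and the malicious input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "a-secret"), ("bob", "b-secret")])

malicious = "nobody' OR '1'='1"

# UNSAFE: concatenation lets the input rewrite the WHERE clause,
# producing: WHERE name = 'nobody' OR '1'='1'  (true for every row).
unsafe_sql = "SELECT name FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFE: the ? placeholder binds the input as a single string value, never as SQL.
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (malicious,)).fetchall()

print(len(unsafe_rows), len(safe_rows))  # 2 0
```

The unsafe query leaks every row, while the parameterized one simply looks for a user literally named `nobody' OR '1'='1` and finds none.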
In NoSQL databases, the absence of a fixed schema means that databases are _______.
- Structured
- Relational
- Schemaless
- Document-oriented
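NoSQL databases are often described as schemaless: document and key-value stores in particular do not require a fixed schema, so records in the same collection can have different fields. This flexibility allows data of varying shape to be stored without predefined structural constraints, although the application must then handle missing or inconsistent fields when it reads the data. (Some NoSQL systems, such as Cassandra, do define table schemas.)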
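A plain-Python sketch of what "schemaless" means in practice, using a list of dicts as a stand-in for a document collection. No database driver is involved, and the collection and field names are hypothetical.

```python
# Hypothetical document "collection": no schema is declared up front,
# so each document may carry a different set of fields.
collection = [
    {"_id": 1, "name": "Ana", "email": "ana@example.com"},
    {"_id": 2, "name": "Ben", "languages": ["Python", "SQL"]},
    {"_id": 3, "name": "Cy", "address": {"city": "Lisbon"}},
]

# A new kind of field can appear later without any schema migration.
collection.append({"_id": 4, "name": "Dee", "tags": ["new"], "email": "dee@example.com"})

# Readers must tolerate missing fields (often called schema-on-read).
with_email = [doc["name"] for doc in collection if "email" in doc]
print(with_email)  # ['Ana', 'Dee']
```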
When scaling features, which method is less influenced by outliers?
- Standardization (Z-score scaling)
- Min-Max Scaling
- Robust Scaling
- Log Transformation
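Robust Scaling is less influenced by outliers because it centers the data on the median and divides by the interquartile range (IQR, the 75th percentile minus the 25th percentile), rather than using the mean and standard deviation (as standardization does) or the minimum and maximum (as Min-Max scaling does). Since a few extreme values barely move the median and IQR, it is a suitable choice for datasets that contain outliers.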
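A minimal sketch of robust scaling, (x - median) / IQR, using only the standard library (`statistics.quantiles`, Python 3.8+). The `method="inclusive"` setting uses linear interpolation between data points, matching NumPy's default percentile; other quartile conventions give slightly different numbers. In practice, scikit-learn's `RobustScaler` implements the same idea.

```python
import statistics


def robust_scale(values):
    """Scale each value as (x - median) / IQR, where IQR = Q3 - Q1."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; robust scaling is undefined")
    return [(x - median) / iqr for x in values]


data = [1, 2, 3, 4, 5, 100]
print(robust_scale(data))  # [-1.0, -0.6, -0.2, 0.2, 0.6, 38.6]

# Making the outlier ten times larger leaves the inliers' scaled values unchanged.
print(robust_scale([1, 2, 3, 4, 5, 1000])[:5])  # [-1.0, -0.6, -0.2, 0.2, 0.6]
```

With Min-Max scaling, by contrast, growing the outlier from 100 to 1000 would compress the five inliers even closer to zero.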
The process of adjusting the weights in a neural network based on the error rate is known as _______.
- Backpropagation
- Data Preprocessing
- Hyperparameter Tuning
- Reinforcement Learning
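Backpropagation is the algorithm used to adjust a neural network's weights to reduce the error (loss) between predicted and actual values. It applies the chain rule backward through the network, from the output layer toward the input, to compute the gradient of the loss with respect to every weight; an optimizer such as gradient descent then uses these gradients to update the weights. It is the fundamental training procedure for neural networks.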
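A hand-worked sketch of backpropagation on a deliberately tiny network: one input, one tanh hidden unit, one linear output, and squared-error loss. The weights, learning rate, and training pair are arbitrary illustrative values, and the gradient-descent update is the simplest possible optimizer.

```python
import math


def forward(x, w1, w2):
    h = math.tanh(w1 * x)  # hidden activation
    y = w2 * h             # linear output
    return h, y


def train_step(x, t, w1, w2, lr=0.1):
    """Forward pass, backward pass via the chain rule, then a gradient-descent update."""
    h, y = forward(x, w1, w2)
    loss = (y - t) ** 2
    # Backward pass, from the loss toward the input:
    dL_dy = 2 * (y - t)
    dL_dw2 = dL_dy * h                 # because y = w2 * h
    dL_dh = dL_dy * w2
    dL_dw1 = dL_dh * (1 - h ** 2) * x  # d tanh(u)/du = 1 - tanh(u)^2, with u = w1 * x
    return w1 - lr * dL_dw1, w2 - lr * dL_dw2, loss


w1, w2 = 0.5, 0.5
x, t = 1.0, 0.8
losses = []
for _ in range(50):
    w1, w2, loss = train_step(x, t, w1, w2)
    losses.append(loss)
print(losses[0], losses[-1])  # the loss shrinks as the weights are updated
```

A quick sanity check for any hand-derived backward pass is to compare it against a finite-difference estimate of the same gradient.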
In the context of Big Data, which system is designed to provide high availability and fault tolerance by replicating data blocks across multiple nodes?
- Hadoop Distributed File System (HDFS)
- Apache Kafka
- Apache Spark
- NoSQL databases
The Hadoop Distributed File System (HDFS) achieves fault tolerance and high data availability by splitting files into large blocks and replicating each block across multiple DataNodes (three copies by default). If a node fails, clients read the block from another replica, and the NameNode schedules re-replication to restore the configured replica count. Block replication is a core feature of HDFS.
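A toy simulation of why replication gives fault tolerance. It assumes a simple round-robin placement on five hypothetical DataNodes; real HDFS uses a rack-aware placement policy chosen by the NameNode, and the replication factor (default 3) is set by the `dfs.replication` property. The helper functions are illustrative, not part of any Hadoop API.

```python
import itertools


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + i) % len(nodes)] for i in range(replication)]
        for b in range(num_blocks)
    }


def readable_blocks(placement, failed):
    """Blocks that still have at least one replica on a live node."""
    return {b for b, holders in placement.items() if any(n not in failed for n in holders)}


nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
placement = place_replicas(num_blocks=8, nodes=nodes, replication=3)

# With 3 replicas per block, every possible pair of node failures leaves all blocks readable.
ok = all(
    len(readable_blocks(placement, set(failed))) == 8
    for failed in itertools.combinations(nodes, 2)
)
print(ok)  # True
```

Losing all three nodes that hold a given block (for example `dn1`, `dn2`, `dn3` for block 0) would make that block unavailable, which is why the NameNode re-replicates promptly after a failure.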
A self-driving car company has millions of images labeled with either "pedestrian" or "no pedestrian". They want the car to automatically detect pedestrians. Which type of learning and algorithm would be optimal for this task?
- Supervised Learning with Convolutional Neural Networks
- Unsupervised Learning with Apriori Algorithm
- Reinforcement Learning with Monte Carlo Methods
- Semi-Supervised Learning with DBSCAN
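Supervised Learning with Convolutional Neural Networks (CNNs) is the best fit. The images are already labeled, which calls for supervised learning, and CNNs learn local spatial features such as edges, textures, and shapes through convolutional filters, which makes them the standard choice for image classification tasks like pedestrian detection. The other options do not fit: Apriori mines association rules from transaction data, Monte Carlo methods in reinforcement learning estimate values from reward signals in sequential decision-making, and DBSCAN is a density-based clustering algorithm that does not use the labels.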
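A pure-Python sketch of the convolution operation at the heart of a CNN layer (technically cross-correlation, which is what deep learning libraries such as PyTorch's `torch.nn.Conv2d` compute). The edge-detecting kernel here is hand-picked for illustration; a real pedestrian classifier would stack many learned kernels with nonlinearities, pooling, and a final classifier trained on the labeled images.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over the image and sum products.

    The output (a feature map) is large where the local patch matches the kernel's pattern.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Vertical-edge detector (hand-picked here; a CNN learns its kernels from labeled data).
kernel = [[1, 0, -1],
          [1, 0, -1],
          [1, 0, -1]]
# Bright region on the left, dark on the right: one vertical edge in the middle.
image = [[9, 9, 9, 0, 0, 0],
         [9, 9, 9, 0, 0, 0],
         [9, 9, 9, 0, 0, 0]]
print(conv2d(image, kernel))  # [[0, 27, 27, 0]]
```

The response is zero over uniform regions and peaks only where the window straddles the edge, the kind of local feature that deeper layers combine into shapes such as a pedestrian's outline.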