A _______ tree is a data structure that allows fast search, insert, delete, and nearest-neighbor operations.
- AVL
- B-Tree
- Heap
- Trie
A B-Tree is a self-balancing tree data structure that keeps its keys in sorted order, allowing efficient search, insert, delete, and nearest-neighbor (predecessor/successor) operations in O(log n) time. It is commonly used in databases and file systems because its balanced, shallow structure guarantees consistent performance even on large datasets.
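The lookup idea can be sketched in a few lines: each node holds a sorted list of keys, and search binary-searches within the node before descending to the matching child. This is an illustrative sketch (the node layout and names are invented for the example, not a production B-Tree):

```python
import bisect

class BTreeNode:
    """A node holding sorted keys and, for internal nodes, child pointers."""
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted list of keys
        self.children = children    # None for leaf nodes

def btree_search(node, key):
    """Return True if key is in the subtree rooted at node."""
    i = bisect.bisect_left(node.keys, key)   # binary search within the node
    if i < len(node.keys) and node.keys[i] == key:
        return True
    if node.children is None:                # reached a leaf: key is absent
        return False
    return btree_search(node.children[i], key)

# A hand-built two-level B-Tree over the keys 10..70
root = BTreeNode(
    [30, 60],
    [BTreeNode([10, 20]), BTreeNode([40, 50]), BTreeNode([70])],
)
```

Real B-Tree implementations also split and merge nodes on insert and delete to stay balanced; this sketch covers only the search path.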
For real-time data processing, ETL uses ________ to handle streaming data.
- Apache Kafka
- Hadoop
- MongoDB
- SQL Server
Apache Kafka is commonly used in ETL processes for real-time data processing. It is a distributed event streaming platform that excels at handling high-throughput, fault-tolerant, and scalable data streams, making it suitable for managing streaming data in ETL pipelines.
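A real Kafka pipeline needs a running broker, so the consume-transform-load pattern is mimicked here with plain Python generators; the record fields and function names are invented for illustration only:

```python
def consume(events):
    """Stand-in for a Kafka consumer: yields records one at a time."""
    yield from events

def transform(stream):
    """The 'T' in ETL: normalize each record as it arrives, lazily."""
    for record in stream:
        yield {"user": record["user"].lower(), "amount": round(record["amount"], 2)}

# Simulated incoming event stream
raw = [{"user": "Alice", "amount": 10.25}, {"user": "BOB", "amount": 3.5}]

# 'Load' step: materialize the transformed stream
loaded = list(transform(consume(raw)))
```

The key property shared with Kafka-based ETL is that records flow through one at a time rather than being batched, so the pipeline never holds the full dataset in memory.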
How does a NoSQL database differ from a traditional SQL database?
- NoSQL databases are limited to a single data model.
- NoSQL databases are schema-less, allowing for flexible and dynamic data models.
- SQL databases are only suitable for small-scale applications.
- SQL databases use a key-value pair storage mechanism.
NoSQL databases provide a flexible and dynamic data model, allowing for schema-less data storage. This contrasts with traditional SQL databases, which follow a structured, tabular format with a fixed schema.
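The schema-less idea can be shown with document-style records, modeled here as Python dicts (the field names are invented for the example):

```python
# Two records in the same NoSQL-style collection, each with its own shape;
# a SQL table would require every row to share one fixed schema.
collection = [
    {"_id": 1, "name": "widget", "price": 9.99},
    {"_id": 2, "name": "gadget", "tags": ["new", "sale"], "stock": {"warehouse_a": 3}},
]

# Queries must tolerate missing fields, e.g. via dict.get():
tagged = [doc["name"] for doc in collection if doc.get("tags")]
```

The flexibility cuts both ways: the application, rather than the database, becomes responsible for handling records whose fields differ.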
In a binary search algorithm, what is the time complexity for searching an element in a sorted array of n elements?
- O(1)
- O(log n)
- O(n)
- O(n^2)
The time complexity of binary search is O(log n): each comparison halves the search interval, so a sorted array of n elements needs at most about log2(n) comparisons. This makes it far more efficient than a linear search, which is O(n).
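The halving behavior described above is visible directly in a standard implementation:

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent. O(log n)."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2        # halve the interval each iteration
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1            # discard the lower half
        else:
            hi = mid - 1            # discard the upper half
    return -1
```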
When using Pandas, how do you check the first five rows of a DataFrame?
- head(5)
- first(5)
- top(5)
- show(5)
To check the first five rows of a DataFrame in Pandas, you use the head(5) method, which returns the first N rows of the DataFrame. Since head() defaults to N = 5, calling it with no argument produces the same result. The other options are not valid DataFrame methods in Pandas.
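A minimal demonstration (the column name is arbitrary):

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})

first_five = df.head(5)   # explicit argument
also_five = df.head()     # n defaults to 5, so this is equivalent
```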
Which metric is commonly used to evaluate the accuracy of a predictive model in classification tasks?
- Accuracy
- Mean Squared Error
- Precision
- R-squared
Accuracy is the standard metric for evaluating a predictive model in classification tasks: the ratio of correctly predicted instances to total instances, giving a general measure of the model's correctness. On imbalanced datasets it can be misleading, which is why precision, recall, and F1 score are also used depending on the requirements of the task. Mean Squared Error and R-squared are regression metrics, not classification metrics.
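The ratio described above takes one line to compute:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

For example, three correct predictions out of four gives an accuracy of 0.75.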
When creating a financial forecast model in Excel, what techniques would be crucial for accurate predictions and data integrity?
- Auditing Tools
- Data Validation
- Scenario Manager
- Sensitivity Analysis
All four techniques play a role in an accurate financial forecast model. Scenario Manager lets you define alternative sets of assumptions and compare their outcomes; Sensitivity Analysis shows how the forecast responds to changes in key inputs; Data Validation constrains cell entries to prevent bad data from entering the model; and Auditing Tools, such as formula tracing, help verify that calculations are correct, preserving data integrity.
_______ is a technique in unsupervised learning used to reduce the dimensionality of data.
- Decision Trees
- K-Means Clustering
- Principal Component Analysis (PCA)
- Support Vector Machines (SVM)
Principal Component Analysis (PCA) is a technique in unsupervised learning used to reduce the dimensionality of data by transforming it into a set of linearly uncorrelated variables known as principal components.
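The transformation can be sketched with NumPy via eigendecomposition of the covariance matrix; this is a minimal illustration of the idea, not a substitute for a library implementation such as scikit-learn's:

```python
import numpy as np

def pca(X, n_components):
    """Project X (samples x features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(X_centered, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]               # sort descending by variance
    components = eigvecs[:, order[:n_components]]   # top-variance directions
    return X_centered @ components                  # linearly uncorrelated scores

X = np.random.default_rng(0).normal(size=(100, 3))
reduced = pca(X, 2)   # 3 features reduced to 2 principal components
```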
In the context of dashboard design, what is the significance of the 'data-ink ratio'?
- It calculates the ratio of data points to the size of the dashboard, optimizing space utilization.
- It evaluates the ratio of data points to the ink color used, emphasizing the importance of color coding.
- It measures the ratio of data points to the total number of points on a chart, ensuring data accuracy.
- It represents the ratio of data to the total ink used in a visualization, emphasizing the importance of minimizing non-data ink.
The 'data-ink ratio' represents the proportion of ink in a visualization that conveys meaningful information. It emphasizes the importance of maximizing the ink used to represent data while minimizing non-data ink, promoting clarity and efficiency in dashboard design.
When the following is executed: data = [1, 2, 3, 4, 5]; filtered = filter(lambda x: x % 2 == 0, data); print(list(filtered)), what is the output?
- [1, 2, 3, 4, 5]
- [1, 3, 5]
- [2, 4]
- [4]
The filter function with the lambda expression filters out the even numbers from data, resulting in the output [2, 4].
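The snippet from the question, runnable as-is, alongside the equivalent list comprehension:

```python
data = [1, 2, 3, 4, 5]

# filter() returns a lazy iterator; list() materializes it.
evens = list(filter(lambda x: x % 2 == 0, data))

# An equivalent, often more idiomatic, list comprehension:
evens_lc = [x for x in data if x % 2 == 0]
```

Both produce [2, 4]; the comprehension is usually preferred in Python when the predicate is simple.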
How does 'snowflake schema' in a data warehouse improve upon the star schema?
- It adds more complexity to the data model.
- It eliminates the need for dimension tables.
- It increases the number of redundant fields in dimension tables.
- It normalizes dimension tables, reducing redundancy and improving data integrity.
The 'snowflake schema' improves upon the star schema by normalizing dimension tables into multiple related tables, reducing redundancy and improving data integrity. The trade-off is that queries require more joins, but storing and maintaining dimension data in the warehouse becomes more efficient.
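The normalization can be demonstrated with SQLite; the table and column names are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# In a star schema, dim_product would repeat the category name per product;
# in a snowflake schema, the category is normalized into its own table.
cur.executescript("""
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, name TEXT,
                               category_id INTEGER REFERENCES dim_category);
    CREATE TABLE fact_sales   (product_id INTEGER, amount REAL);

    INSERT INTO dim_category VALUES (1, 'electronics');
    INSERT INTO dim_product  VALUES (10, 'phone', 1), (11, 'laptop', 1);
    INSERT INTO fact_sales   VALUES (10, 500.0), (11, 900.0);
""")

# The extra join through the normalized dimension is the snowflake trade-off:
cur.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product  p ON f.product_id = p.product_id
    JOIN dim_category c ON p.category_id = c.category_id
    GROUP BY c.name
""")
result = cur.fetchall()
```

Renaming the 'electronics' category now means updating one row in dim_category rather than every product row that mentions it.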
The process of transforming a complex query into a simpler query without changing the query result is known as SQL ________.
- Query Minimization
- Query Optimization
- Query Refactoring
- Query Simplification
SQL Query Optimization transforms a complex query into an equivalent, more efficient form without altering the query result. The goal is performance: the optimizer (or the query author) eliminates redundant work, such as unnecessary subqueries or repeated predicates, while returning exactly the same rows.
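The "same result, simpler form" property can be checked directly with SQLite; the table and queries are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE orders (id INTEGER, total REAL);
    INSERT INTO orders VALUES (1, 50.0), (2, 120.0), (3, 200.0);
""")

# A needlessly nested query with a redundant repeated predicate...
complex_q = ("SELECT id FROM (SELECT * FROM orders WHERE total > 100) "
             "WHERE total > 100")

# ...and the equivalent simplified form an optimizer might produce:
simple_q = "SELECT id FROM orders WHERE total > 100"

rows_complex = cur.execute(complex_q).fetchall()
rows_simple = cur.execute(simple_q).fetchall()
```

Both queries return the same rows, which is precisely the invariant optimization must preserve.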