In data warehousing, what does ETL stand for?

  • Efficient Transactional Logic
  • Export, Transform, Load
  • Extract, Transfer, Load
  • Extract, Transform, Load
ETL stands for Extract, Transform, Load: the process of extracting data from source systems, transforming it into a format suitable for analysis, and loading it into a data warehouse for reporting and analysis.
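
A minimal sketch of the three stages, assuming a hypothetical CSV source, made-up column names, and a local SQLite file standing in for the warehouse:

    import pandas as pd
    import sqlite3

    # Extract: pull raw records from a source system (a CSV file here).
    raw = pd.read_csv("sales_source.csv")

    # Transform: clean and reshape into the warehouse's expected format.
    raw["order_date"] = pd.to_datetime(raw["order_date"])
    clean = raw.dropna(subset=["customer_id"])

    # Load: append the transformed rows to a warehouse table.
    conn = sqlite3.connect("warehouse.db")
    clean.to_sql("sales_fact", conn, if_exists="append", index=False)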

In time series analysis, how is the term 'stationarity' best described?

  • The ability of a time series to move in a straight line
  • The predictability of future values in a time series
  • The presence of external factors affecting a time series
  • The statistical properties of a time series remaining constant over time
Stationarity refers to the statistical properties of a time series (such as its mean and variance) remaining constant over time. Achieving stationarity, often through transformations such as differencing, is important for accurate modeling and forecasting in time series analysis.
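
One common way to check stationarity in Python is the Augmented Dickey-Fuller test from statsmodels; the file and column names below are placeholders:

    import pandas as pd
    from statsmodels.tsa.stattools import adfuller

    series = pd.read_csv("sales.csv")["revenue"]

    # ADF test: the null hypothesis is that the series is non-stationary.
    stat, pvalue = adfuller(series.dropna())[:2]
    print(f"ADF statistic={stat:.3f}, p-value={pvalue:.3f}")

    # If the test fails to reject, first-differencing is a common remedy.
    if pvalue > 0.05:
        series = series.diff().dropna()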

When you need to create a lagged feature in a time series dataset in Pandas, which function would you use?

  • delay()
  • diff()
  • lag()
  • shift()
The shift() function in Pandas is used to create lagged features in a time series dataset. It shifts the values of a column by a specified number of periods, allowing you to create lagged versions of the original data for time series analysis.
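
A small illustration with made-up sales figures:

    import pandas as pd

    df = pd.DataFrame({"sales": [100, 120, 130, 125]})

    # shift(1) moves values down one period, creating a one-step lag;
    # the first row becomes NaN because it has no predecessor.
    df["sales_lag1"] = df["sales"].shift(1)
    print(df)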

A _______ tree is a data structure that allows fast search, insert, delete, and nearest-neighbor operations.

  • AVL
  • B-Tree
  • Heap
  • Trie
A B-Tree is a self-balancing tree data structure that supports efficient search, insert, and delete operations, and because it keeps keys in sorted order, it can also answer nearest-key (predecessor/successor) queries quickly. It is commonly used in databases and file systems, where its balanced structure guarantees consistent performance.
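
A full B-Tree implementation is long, but the sorted-key property behind its nearest-key lookups can be sketched with Python's bisect module (the keys here are arbitrary examples, not a real B-Tree node):

    import bisect

    keys = [3, 9, 14, 27, 42]   # keys kept in sorted order, as inside B-Tree nodes

    def nearest_key(keys, target):
        """Binary-search for the stored key closest to target."""
        i = bisect.bisect_left(keys, target)
        candidates = keys[max(0, i - 1):i + 1]
        return min(candidates, key=lambda k: abs(k - target))

    print(nearest_key(keys, 20))   # -> 14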

For real-time data processing, ETL uses ________ to handle streaming data.

  • Apache Kafka
  • Hadoop
  • MongoDB
  • SQL Server
Apache Kafka is commonly used in ETL processes for real-time data processing. It is a distributed event streaming platform that excels at handling high-throughput, fault-tolerant, and scalable data streams, making it suitable for managing streaming data in ETL pipelines.
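
A sketch of both ends of a streaming ETL step using the kafka-python client; the broker address and topic name are assumptions:

    import json
    from kafka import KafkaProducer, KafkaConsumer

    # Producer side: publish raw events to a topic.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send("raw-events", {"user_id": 1, "action": "click"})
    producer.flush()

    # Consumer side: read events as they arrive, then transform and load.
    consumer = KafkaConsumer(
        "raw-events",
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        event = message.value
        # ... transform the event and load it into the warehouse ...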

How does a NoSQL database differ from a traditional SQL database?

  • NoSQL databases are limited to a single data model.
  • NoSQL databases are schema-less, allowing for flexible and dynamic data models.
  • SQL databases are only suitable for small-scale applications.
  • SQL databases use a key-value pair storage mechanism.
NoSQL databases provide a flexible and dynamic data model, typically storing data without a fixed, predeclared schema. This contrasts with traditional SQL databases, which follow a structured, tabular format with a fixed schema.
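
The schema flexibility is easy to see with a document store such as MongoDB; this sketch assumes a local instance and uses invented field names:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    users = client["app_db"]["users"]

    # Two documents with different shapes live in the same collection --
    # no schema is declared up front, unlike a SQL table.
    users.insert_one({"name": "Ada", "email": "ada@example.com"})
    users.insert_one({"name": "Grace", "roles": ["admin"], "last_login": "2024-01-01"})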

In a binary search algorithm, what is the time complexity for searching an element in a sorted array of n elements?

  • O(1)
  • O(log n)
  • O(n)
  • O(n^2)
The time complexity of a binary search algorithm is O(log n), as it repeatedly divides the search interval in half, resulting in a logarithmic time complexity. This makes it more efficient than linear search algorithms (O(n)).
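
The halving is easy to see in a direct implementation:

    def binary_search(arr, target):
        """Return the index of target in sorted arr, or -1 if absent."""
        lo, hi = 0, len(arr) - 1
        while lo <= hi:
            mid = (lo + hi) // 2      # halve the search interval each step
            if arr[mid] == target:
                return mid
            elif arr[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    print(binary_search([2, 5, 8, 12, 23], 12))   # -> 3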

When using Pandas, how do you check the first five rows of a DataFrame?

  • first(5)
  • head(5)
  • show(5)
  • top(5)
To check the first five rows of a DataFrame in Pandas, you use the head(5) method. It returns the first n rows of the DataFrame; since n defaults to 5, calling head() with no argument works as well. The other options are not Pandas methods for viewing the leading rows of a DataFrame.
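
For example, with a throwaway DataFrame:

    import pandas as pd

    df = pd.DataFrame({"x": range(10)})
    print(df.head(5))   # first five rows; head() alone also defaults to 5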

Which metric is commonly used to evaluate the accuracy of a predictive model in classification tasks?

  • Accuracy
  • Mean Squared Error
  • Precision
  • R-squared
Accuracy is a common metric used to evaluate the performance of a predictive model in classification tasks. It represents the ratio of correctly predicted instances to the total instances and provides a general measure of the model's correctness. Other metrics, such as precision, recall, and F1 score, are also used depending on the specific requirements of the task.
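
With scikit-learn, computing accuracy reduces to one call; the labels below are invented:

    from sklearn.metrics import accuracy_score

    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 1]

    # accuracy = correct predictions / total predictions
    print(accuracy_score(y_true, y_pred))   # -> 0.8333 (5 of 6 correct)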

When creating a financial forecast model in Excel, what techniques would be crucial for accurate predictions and data integrity?

  • Auditing Tools
  • Data Validation
  • Scenario Manager
  • Sensitivity Analysis
All four techniques contribute: Scenario Manager is crucial for building and comparing alternative scenarios in a financial forecast model, Sensitivity Analysis shows how outputs respond to changes in key inputs, and Data Validation and Auditing Tools help maintain data integrity and accuracy.
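
Outside the Excel UI, the same Data Validation guard can also be scripted; a minimal sketch with openpyxl, assuming a hypothetical growth-rate input range:

    from openpyxl import Workbook
    from openpyxl.worksheet.datavalidation import DataValidation

    wb = Workbook()
    ws = wb.active
    ws["A1"] = "growth_rate"

    # Restrict forecast inputs to a plausible range to protect data integrity.
    dv = DataValidation(type="decimal", operator="between",
                        formula1="0", formula2="1")
    ws.add_data_validation(dv)
    dv.add("B1:B20")
    wb.save("forecast.xlsx")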

_______ is a technique in unsupervised learning used to reduce the dimensionality of data.

  • Decision Trees
  • K-Means Clustering
  • Principal Component Analysis (PCA)
  • Support Vector Machines (SVM)
Principal Component Analysis (PCA) is a technique in unsupervised learning used to reduce the dimensionality of data by transforming it into a set of linearly uncorrelated variables known as principal components.
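
A minimal scikit-learn sketch on random data (the shapes are arbitrary):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))          # 100 samples, 5 features

    # Project onto the 2 components that capture the most variance.
    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)
    print(X_reduced.shape)                 # -> (100, 2)
    print(pca.explained_variance_ratio_)   # variance captured per component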

In the context of dashboard design, what is the significance of the 'data-ink ratio'?

  • It calculates the ratio of data points to the size of the dashboard, optimizing space utilization.
  • It evaluates the ratio of data points to the ink color used, emphasizing the importance of color coding.
  • It measures the ratio of data points to the total number of points on a chart, ensuring data accuracy.
  • It represents the ratio of data to the total ink used in a visualization, emphasizing the importance of minimizing non-data ink.
The 'data-ink ratio' represents the proportion of ink in a visualization that conveys meaningful information. It emphasizes the importance of maximizing the ink used to represent data while minimizing non-data ink, promoting clarity and efficiency in dashboard design.
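
In code, raising the data-ink ratio mostly means deleting decoration; a matplotlib sketch with made-up quarterly figures:

    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(["Q1", "Q2", "Q3", "Q4"], [120, 150, 90, 180])

    # Strip non-data ink: frame lines and tick marks carry no information here.
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)
    ax.tick_params(left=False)
    ax.grid(False)
    plt.show()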