In a scenario where an organization is facing data breaches, which data governance policy should be prioritized for improvement?

  • Access Control
  • Data Classification
  • Data Encryption
  • Data Retention
In the context of data breaches, improving data encryption policies is crucial. Encryption makes sensitive data unreadable without the decryption key, so the data stays protected even if an unauthorized user gains access to the storage or intercepts it in transit. Strengthening this policy limits the damage a breach can cause.

Which HTTP method is most commonly used to retrieve data from a RESTful API?

  • DELETE
  • GET
  • POST
  • PUT
The most common HTTP method used to retrieve data from a RESTful API is GET. The GET method is used to request data from a specified resource and is considered a safe and idempotent operation in the context of RESTful architecture.
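
As a minimal sketch of what a GET request looks like in practice, the snippet below builds one with Python's standard library without sending it. The endpoint `https://api.example.com/users` and the `page` and `limit` parameters are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get_request(base_url, params):
    """Build (but do not send) a GET request with URL-encoded query parameters."""
    query = urlencode(params)
    return Request(f"{base_url}?{query}", method="GET")


# Hypothetical endpoint and parameters, used only for illustration.
req = build_get_request("https://api.example.com/users", {"page": 2, "limit": 10})
print(req.get_method(), req.full_url)
```

Because GET carries its parameters in the URL and has no request body, repeating the same request should not change server state, which is what makes it safe and idempotent.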

What is kurtosis in a data set, and how does it inform about the data distribution?

  • Kurtosis measures the "tailedness" of a distribution
  • Kurtosis measures the central tendency of a distribution
  • Kurtosis measures the shape of a distribution's peak
  • Kurtosis measures the spread of a distribution
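Kurtosis measures the "tailedness" of a distribution, that is, how much weight sits in its tails. High kurtosis indicates heavier tails and a greater presence of outliers than a normal distribution; low kurtosis indicates lighter tails. Although it is often described in terms of peak sharpness, kurtosis is driven mainly by the tails and is not a reliable measure of the peak.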
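
To make the idea concrete, here is a small sketch that computes excess kurtosis (the fourth standardized moment minus 3, so a normal distribution scores about 0) using population moments. The two sample lists are made-up data for illustration.

```python
def excess_kurtosis(values):
    """Return population excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


# Made-up data: evenly spread values vs. values with two extreme outliers.
light_tails = [1, 2, 3, 4, 5, 6, 7, 8, 9]
heavy_tails = [5, 5, 5, 5, 5, 5, 5, -20, 30]

print(excess_kurtosis(light_tails))  # negative: lighter tails than a normal
print(excess_kurtosis(heavy_tails))  # positive: outliers dominate the tails
```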

The _______ project allows SQL-like queries to be executed on various data sources including Hadoop and NoSQL databases.

  • Apache Drill
  • Apache Hive
  • Apache Impala
  • Apache Presto
Presto (listed here as "Apache Presto", although it is not an Apache Software Foundation project) is a distributed SQL query engine designed for executing SQL-like queries on various data sources, including Hadoop and NoSQL databases. It enables users to query and analyze data across different data sources through connectors. Apache Drill fits the same description, since it is built for schema-free SQL over Hadoop, NoSQL, and cloud storage, while Apache Hive and Apache Impala are more closely tied to the Hadoop ecosystem.

In a banking context, how can predictive analytics be used to detect potential fraudulent transactions?

  • Anomaly Detection
  • Clustering
  • Decision Trees
  • Linear Regression
Anomaly Detection is an effective method for detecting potential fraudulent transactions in a banking context. This approach identifies deviations from normal patterns, helping to flag transactions that exhibit unusual behavior. Clustering, Linear Regression, and Decision Trees are valuable for other types of predictions but may not be as effective in capturing the anomalous patterns associated with fraud.
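
One very simple form of anomaly detection is a z-score rule: flag any value that lies more than a chosen number of standard deviations from the mean. The sketch below assumes that rule, a threshold of 2.0, and made-up transaction amounts; real fraud systems use far richer features and models.

```python
from statistics import mean, pstdev


def flag_anomalies(amounts, threshold=2.0):
    """Return amounts whose z-score magnitude exceeds the threshold."""
    mu = mean(amounts)
    sigma = pstdev(amounts)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [a for a in amounts if abs(a - mu) / sigma > threshold]


# Made-up transaction amounts; one is far outside the usual range.
transactions = [42.0, 38.5, 45.0, 40.0, 39.5, 41.0, 43.5, 950.0]
flagged = flag_anomalies(transactions)
print(flagged)
```

A limitation of this rule is that a large outlier inflates the standard deviation it is measured against, so median-based variants are often preferred on small samples.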

Data mining often involves sorting data into different groups. What is this process called?

  • Anomaly detection
  • Classification
  • Clustering
  • Regression
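The process of sorting data into different groups based on similarities is called clustering. The groups are discovered from the data itself rather than defined in advance, which distinguishes clustering from classification, where records are assigned to predefined, labeled categories. This technique helps in identifying patterns and relationships within the data, allowing for better analysis and decision-making.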
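
As an illustration of grouping by similarity, here is a minimal one-dimensional k-means sketch. K-means is only one of many clustering algorithms; the data points and the starting centroids are assumptions chosen for the example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Cluster 1-D points by repeatedly assigning each to its nearest centroid."""
    groups = [[] for _ in centroids]
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            # Index of the centroid closest to this point.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return groups, centroids


# Made-up data with two obvious groups; starting centroids are arbitrary guesses.
groups, centroids = kmeans_1d([1.0, 1.5, 2.0, 9.0, 9.5, 10.0], [0.0, 5.0])
print(groups)
print(centroids)
```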

What is the time complexity of the Floyd-Warshall algorithm used for finding shortest paths in a weighted graph?

  • O(E log V)
  • O(V log V)
  • O(V^2)
  • O(V^3)
The time complexity of the Floyd-Warshall algorithm is O(V^3), where V is the number of vertices in the graph. This algorithm efficiently computes the shortest paths between all pairs of vertices in a weighted graph, making it suitable for dense graphs.
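
The O(V^3) bound comes directly from the three nested loops over the vertices, as the following sketch shows. The edge-list input format and the small example graph are assumptions made for illustration.

```python
import math


def floyd_warshall(n, edges):
    """Return the all-pairs shortest-path matrix for a directed weighted graph.

    n is the number of vertices (0..n-1); edges is a list of (u, v, weight).
    """
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)  # keep the cheapest parallel edge

    # Three nested loops over V vertices give the O(V^3) running time.
    for k in range(n):          # allowed intermediate vertex
        for i in range(n):      # source
            for j in range(n):  # destination
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


# Made-up graph: the direct edge 0 -> 1 costs 4, but 0 -> 2 -> 1 costs 3.
dist = floyd_warshall(4, [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)])
print(dist[0][1])  # 3
print(dist[0][3])  # 8
```

Unreachable pairs keep the value `math.inf`. This sketch does not detect negative cycles.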

What visualization technique is most appropriate for multi-dimensional data analysis?

  • Box Plot
  • Parallel Coordinates
  • Radar Chart
  • Scatter Plot
Parallel Coordinates is a powerful visualization technique for multi-dimensional data analysis. It allows simultaneous visualization of multiple dimensions, making it easier to identify patterns and relationships in complex datasets. Scatter plots are typically used for two-dimensional data, while radar charts and box plots have different applications.

In Excel, which feature allows you to view different parts of the same worksheet at the same time?

  • Arrange
  • Merge
  • Split
  • View Side by Side
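The Split feature in Excel divides the worksheet window into separately scrollable panes, so you can view different parts of the same worksheet simultaneously. This is useful for comparing or editing data in distant sections of a large worksheet. "View Side by Side", by contrast, places two workbook windows next to each other for comparison.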

What is the benefit of using branches in Git?

  • Allows for parallel development, isolation of features, and experimentation without affecting the main codebase.
  • Branches are used for code documentation.
  • Enhances the performance of Git commands.
  • Facilitates automatic code deployment.
Using branches in Git allows developers to work on separate features or bug fixes independently, keeping changes isolated until they are ready to be merged. This parallel development ensures that the main codebase remains stable, and different features can be worked on simultaneously.

What is the purpose of the GROUP BY clause in an SQL query?

  • It is used to aggregate data based on specified columns, grouping the results.
  • It is used to filter records based on a specified condition.
  • It is used to join multiple tables in a query.
  • It is used to sort records in ascending or descending order.
The GROUP BY clause is used to aggregate data based on specified columns. It groups the results and allows for the application of aggregate functions like COUNT, SUM, AVG, etc., on each group separately.
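
A short sketch using Python's built-in `sqlite3` module shows GROUP BY with several aggregate functions. The `orders` table, its columns, and the rows are hypothetical, created in memory only for this example.

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 20.0), ("bob", 7.5), ("carol", 3.0)],
)

# One output row per customer; each aggregate is computed within its group.
rows = conn.execute(
    """
    SELECT customer, COUNT(*), SUM(amount), AVG(amount)
    FROM orders
    GROUP BY customer
    ORDER BY customer
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
```

Five input rows collapse to three output rows, one per distinct value of `customer`.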

For large-scale data sets, _______ techniques are applied to manage and interpret the data efficiently.

  • Clustering
  • Normalization
  • Sampling
  • Stratification
Sampling techniques are applied to large-scale data sets to manage and interpret the data efficiently. By analyzing a subset of the data, meaningful insights can be derived without the need to process the entire dataset.
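
As a rough sketch of the idea, the snippet below estimates the mean of a large population from a simple random sample. The population, the sample size of 1,000, and the fixed seed are assumptions chosen to keep the example small and repeatable.

```python
import random

# Made-up "large" data set: the integers 0..999,999.
population = range(1_000_000)
true_mean = sum(population) / len(population)

# Simple random sample without replacement; the seed is fixed for repeatability.
rng = random.Random(0)
sample = rng.sample(population, 1000)
sample_mean = sum(sample) / len(sample)

print(true_mean)
print(sample_mean)
```

The sample mean approximates the true mean while touching only 0.1% of the records. Larger samples narrow the expected error, at the cost of more processing.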