Which backup method only captures the changes since the last full backup?

  • Differential Backup
  • Full Backup
  • Incremental Backup
  • Snap Backup
The backup method that captures only the changes made since the last full backup is a "Differential Backup." Each differential backup copies everything changed since the most recent full backup, so a restore needs only the full backup plus the latest differential. An incremental backup, by contrast, captures changes since the most recent backup of any kind, whether full or incremental, which conserves the most storage space and time per run.
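As a rough illustration, here is a minimal Python sketch of the two selection rules, using file modification times as a stand-in for the change tracking a real backup tool would perform (the /data path and the timestamps are hypothetical):

```python
import os
import time

def files_changed_since(root, cutoff):
    """Return paths under `root` modified after the `cutoff` timestamp."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > cutoff:
                changed.append(path)
    return changed

# Differential: always compare against the last FULL backup.
last_full = time.time() - 7 * 24 * 3600    # e.g. full backup a week ago
differential_set = files_changed_since("/data", last_full)

# Incremental: compare against the most recent backup of ANY kind.
last_backup = time.time() - 24 * 3600      # e.g. incremental last night
incremental_set = files_changed_since("/data", last_backup)
```

Note that the differential set keeps growing until the next full backup, while each incremental set stays small but must be restored as a chain.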

In terms of data warehousing, why might a cold backup be preferable to a hot backup?

  • Cold backups are faster to restore
  • Cold backups capture all changes in real-time
  • Cold backups do not disrupt normal operations
  • Cold backups require less storage space
In data warehousing, a cold backup is preferable when a backup must not interfere with normal operations. A cold backup is taken while the system is offline, so no transactions can modify the data mid-backup, which guarantees a consistent copy. Because data warehouses are typically loaded in scheduled batches rather than updated continuously, the backup can be run in an existing maintenance window, making it ideal for preserving data integrity without disrupting users.

Which tool or method is commonly used for monitoring the health and performance of a data warehouse?

  • Data Compression
  • Data Encryption
  • Data Obfuscation
  • Data Profiling
Data profiling is a common method for monitoring the health and performance of a data warehouse. It helps assess data quality, identify anomalies, and verify that data conforms to expected standards, all of which are essential for maintaining the integrity of the warehouse.
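As a sketch of what profiling looks like in practice, the following Python snippet computes a basic per-column profile with pandas; the `orders` table is a made-up example:

```python
import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Basic per-column profile: types, nulls, distinct counts, and ranges."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_count": df.isna().sum(),
        "null_pct": (df.isna().mean() * 100).round(2),
        "distinct": df.nunique(),
        "min": df.min(numeric_only=True),
        "max": df.max(numeric_only=True),
    })

orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "amount": [99.5, 120.0, None, 87.25],
    "region": ["EU", "EU", "US", "US"],
})
print(profile(orders))
```

A real profiling run would add distribution statistics, pattern checks, and trend comparisons against previous runs.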

A multinational corporation wants to ensure that its data warehouses in various regions can operate independently yet can be consolidated when needed. Which data warehousing approach would be most suitable?

  • Centralized Data Warehouse
  • Data Lake
  • Data Mart
  • Federated Data Warehouse
A federated data warehouse approach allows data warehouses in different regions to operate independently while still supporting consolidation. Each regional warehouse retains autonomy over its data, schema, and operations, and a federation layer combines their contents on demand for global analysis.
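To make the idea concrete, here is a toy Python sketch in which two in-memory SQLite databases stand in for autonomous regional warehouses, and a small federation layer fans a query out and consolidates the results (the region names and schema are invented for illustration):

```python
import sqlite3

# Two autonomous "regional warehouses" (in-memory SQLite stands in for each).
regions = {}
for name, rows in {"emea": [("2024-01-01", 500.0)],
                   "apac": [("2024-01-01", 320.0), ("2024-01-02", 410.0)]}.items():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (day TEXT, revenue REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    regions[name] = con

def federated_query(sql):
    """Run the same query against every region, tagging rows with their source."""
    results = []
    for name, con in regions.items():
        for row in con.execute(sql):
            results.append((name, *row))
    return results

print(federated_query("SELECT day, SUM(revenue) FROM sales GROUP BY day"))
```

A production federation layer would also reconcile schema differences and push aggregation down to each region; this sketch assumes identical schemas.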

During the recovery of a data warehouse, what is the process of applying logs called?

  • Data Aggregation
  • Data Loading
  • Data Mining
  • Data Rollback
During the recovery of a data warehouse, the process of applying logs to restore the database to a consistent state is known as data loading; in database terms, this replay step is often called rolling forward. The transaction logs are reapplied, in order, on top of the restored backup to recreate the state of the database at the failure or recovery point.
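The roll-forward idea can be shown with a deliberately simplified Python sketch: restore the snapshot, then reapply logged changes in order (the account balances and log format are invented for illustration; real redo logs record low-level changes with sequence numbers):

```python
snapshot = {"balance_a": 100, "balance_b": 50}   # state at backup time

transaction_log = [
    ("set", "balance_a", 80),    # changes committed after the backup
    ("set", "balance_b", 70),
    ("set", "balance_a", 95),
]

def roll_forward(state, log):
    """Replay logged changes, in order, on top of a restored snapshot."""
    state = dict(state)
    for op, key, value in log:
        if op == "set":
            state[key] = value
    return state

print(roll_forward(snapshot, transaction_log))
# {'balance_a': 95, 'balance_b': 70}: the state at the recovery point
```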

In a three-tier data warehouse architecture, what is typically contained in the middle tier?

  • Data Access and Query
  • Data Presentation
  • Data Storage
  • Data Transformation
In a three-tier data warehouse architecture, the middle tier typically contains the data access and query layer, usually implemented as an OLAP server. It sits between the bottom tier (the warehouse database, where data is stored) and the top tier (front-end presentation and reporting tools), translating analytical queries from users into operations on the stored data.
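Here is a minimal Python sketch of the three tiers, with SQLite standing in for the warehouse database and a plain function standing in for the OLAP/access layer (the `sales` schema is invented for illustration):

```python
import sqlite3

# Bottom tier: the warehouse database (SQLite stands in here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, product TEXT, revenue REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("EU", "widget", 120.0), ("EU", "gadget", 80.0), ("US", "widget", 200.0),
])

# Middle tier: a data access and query layer exposing analytic operations
# without leaking storage details to the front end.
def revenue_by(dimension):
    # Interpolation is for illustration only; never build SQL from untrusted input.
    sql = f"SELECT {dimension}, SUM(revenue) FROM sales GROUP BY {dimension}"
    return db.execute(sql).fetchall()

# Top tier: presentation simply renders what the middle tier returns.
for value, total in revenue_by("region"):
    print(f"{value}: {total:.2f}")
```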

You're performing a regression analysis on a dataset, and you notice that small changes in the data lead to significantly different parameter estimates. What could be the potential cause for this?

  • Data leakage
  • Low variance
  • Multicollinearity
  • Underfitting
This instability of parameter estimates is a typical symptom of multicollinearity. When predictors are highly correlated, the model cannot cleanly attribute the response to any one of them, so small changes in the data can produce very different parameter estimates.
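The effect is easy to reproduce. In this Python sketch (synthetic data), two nearly identical predictors are fit by ordinary least squares; refitting after dropping a handful of rows changes the individual coefficients dramatically, even though their sum stays near the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # x2 is almost a copy of x1
y = 3 * x1 + rng.normal(scale=0.5, size=n)

def fit(x1, x2, y):
    X = np.column_stack([np.ones_like(x1), x1, x2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta   # [intercept, coef_x1, coef_x2]

print("full data:     ", fit(x1, x2, y))
print("5 rows dropped:", fit(x1[5:], x2[5:], y[5:]))
```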

What does a Pearson's correlation coefficient value of 0 indicate?

  • No relationship
  • Perfect negative relationship
  • Perfect positive relationship
  • Strong relationship
A Pearson's correlation coefficient of 0 indicates no linear relationship between the two variables. Pearson's correlation measures only linear association, so a value of 0 does not rule out a strong nonlinear relationship.
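A quick Python demonstration of why "no relationship" here really means "no linear relationship": y = x² is a perfect (but nonlinear) function of x, yet Pearson's r comes out near zero:

```python
import numpy as np

x = np.linspace(-1, 1, 201)
y = x ** 2                      # perfect, but purely nonlinear, dependence

r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r = {r:.3f}")   # ~0.000: no linear relationship detected
```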

In a dataset, the type of data that can have an infinite number of possible values within a selected range is called _____ data.

  • Continuous
  • Discrete
  • Nominal
  • Ordinal
Continuous data can take any value within a range and can be subdivided infinitely.

What would be a potential problem when treating discrete data as continuous?

  • It can improve the accuracy of a machine learning model
  • It can lead to inaccurate conclusions due to incorrect statistical analyses
  • It can make the data cleaning process easier
  • It can simplify the data visualization process
Treating discrete data as continuous can lead to inaccurate conclusions due to incorrect statistical analyses. For example, it can affect the choice of statistical tests or machine learning models, leading to potential misinterpretation of the data.

Kendall's Tau is commonly used for which type of data?

  • Continuous data
  • Interval data
  • Nominal data
  • Ordinal data
Kendall's Tau is commonly used for ordinal data. It measures the ordinal association between two measured quantities. Like Spearman's correlation, it's based on ranks and is a suitable measure of association for ordinal data.
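As a sketch, Kendall's Tau can be computed with SciPy; the reviewer ratings below are made-up ordinal data on a 1-to-5 scale:

```python
from scipy.stats import kendalltau

# Hypothetical ordinal ratings from two reviewers (1 = poor ... 5 = excellent).
reviewer_a = [1, 2, 2, 3, 4, 5, 5]
reviewer_b = [1, 1, 2, 3, 5, 4, 5]

tau, p_value = kendalltau(reviewer_a, reviewer_b)
print(f"tau = {tau:.3f}, p = {p_value:.3f}")
```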

What are the implications of a low standard deviation in a data set?

  • The data values are close to the mean
  • The data values are spread out widely from the mean
  • The data values are uniformly distributed
  • The data values have many outliers
A "Low Standard Deviation" implies that the data values are "Close to the mean". In other words, most data points are close to the average data point.