A ________ is a unique identifier for each row in a table and is often used to establish relationships between tables in a relational database.

  • Candidate Key
  • Composite Key
  • Foreign Key
  • Primary Key
A Primary Key is a unique identifier for each row in a table: no two rows may share the same key value, and the key cannot be NULL. Other tables reference it through foreign keys, which is how relationships between tables are established in a relational database.
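A minimal sketch using Python's built-in sqlite3 module shows both roles of a primary key: the database rejects a duplicate key value, and a foreign key in another table can only point at a key that exists. The `customers`/`orders` schema is hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical schema: customers (parent) and orders (child).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount REAL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")

def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# Duplicate primary key value: rejected by the uniqueness guarantee.
dup_pk = try_insert("INSERT INTO customers VALUES (1, 'Grace')")
# Order pointing at a customer that does not exist: rejected by the foreign key.
orphan = try_insert("INSERT INTO orders VALUES (101, 99, 10.0)")

# The primary key is what the join uses to relate the two tables.
rows = conn.execute("""
    SELECT c.name, o.order_id FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
""").fetchall()
print(dup_pk)
print(orphan)
print(rows)
```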

Scenario: A new regulation requires your organization to implement stricter data governance policies. How would you incorporate these policies into your data modeling best practices?

  • Define data ownership and access controls for sensitive data
  • Establish data retention and archival policies to comply with regulatory requirements
  • Implement metadata management practices to track data lineage and usage
  • Integrate data governance controls into the modeling process to ensure compliance with regulations
Incorporating stricter data governance policies into data modeling means building governance controls into the modeling process itself so that models comply with the regulation: defining data ownership and access controls, managing metadata to track lineage and usage, and applying the required retention and archival policies.

What is the difference between a clustered index and a non-clustered index in an RDBMS?

  • Contains only key columns and row locators
  • Determines the order of data rows on disk
  • Has a higher level of fragmentation compared to clustered
  • Stores a separate copy of the table data
A clustered index in an RDBMS determines the order in which a table's data rows are stored: the leaf level of the index holds the rows themselves, sorted by the index key, so a table can have only one clustered index. A non-clustered index is a separate structure that stores a copy of the index key columns plus row locators pointing back to the data rows; it provides a sorted access path without changing how the rows themselves are stored, and a table can have several of them.
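The difference can be sketched as a toy in-memory model in Python. This is not any real engine's storage format: the "clustered" list holds the full rows sorted by `id`, while the "non-clustered" list holds only `(name, locator)` pairs. Here the locator is simply the row's position in the clustered list; real engines may use a physical row ID or the clustering key instead.

```python
from bisect import bisect_left

# Toy rows: (id, name, city). Names and values are illustrative only.
rows = [(3, "Cy", "Oslo"), (1, "Ada", "Lima"), (2, "Bo", "Rome")]

# Clustered index on id: the table data itself is kept in key order,
# which is why a table can have only one clustered index.
clustered = sorted(rows, key=lambda r: r[0])
clustered_keys = [r[0] for r in clustered]

# Non-clustered index on name: a separate sorted structure holding only the
# key column plus a row locator (here, the row's position in clustered storage).
nonclustered = sorted((r[1], pos) for pos, r in enumerate(clustered))
nc_keys = [k for k, _ in nonclustered]

def lookup_by_id(key):
    """Clustered lookup: the matching leaf entry is the data row itself."""
    i = bisect_left(clustered_keys, key)
    if i < len(clustered_keys) and clustered_keys[i] == key:
        return clustered[i]
    return None

def lookup_by_name(name):
    """Non-clustered lookup: find the key, then follow the locator to the row."""
    i = bisect_left(nc_keys, name)
    if i < len(nc_keys) and nc_keys[i] == name:
        _, locator = nonclustered[i]
        return clustered[locator]  # the extra step a clustered lookup avoids
    return None

print(lookup_by_id(2))
print(lookup_by_name("Cy"))
```

The extra locator hop in `lookup_by_name` is the cost a non-clustered index pays for not dictating the storage order of the rows.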

What does the term "index seek" refer to in the context of database indexing?

  • Identifying duplicate entries in the index
  • Scanning the entire table for matching records
  • Searching for specific data using an index structure
  • Sorting the data in the index in ascending order
"Index seek" in the context of database indexing refers to searching for specific data by navigating the index structure (typically a B-tree) directly to the matching keys instead of scanning every row, which makes retrieval efficient when the search criteria are selective.
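One way to see a seek versus a scan is SQLite's `EXPLAIN QUERY PLAN`, sketched below. SQLite uses the words SEARCH and SCAN rather than "seek" and "scan", and the exact wording of the plan text varies between SQLite versions. The `employees` table and `idx_employees_email` index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, email TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employees (email, dept) VALUES (?, ?)",
    [(f"user{i}@example.com", f"dept{i % 5}") for i in range(1000)],
)
# Hypothetical index on email only; dept is deliberately left unindexed.
conn.execute("CREATE INDEX idx_employees_email ON employees(email)")

def plan(sql, params=()):
    """Return the detail text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# Filter on the indexed column: the planner can seek through the index.
seek_plan = plan("SELECT id FROM employees WHERE email = ?", ("user42@example.com",))
# Filter on an unindexed column: every row has to be examined.
scan_plan = plan("SELECT id FROM employees WHERE dept = ?", ("dept3",))
print(seek_plan)
print(scan_plan)
```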

In which scenarios would you recommend denormalizing a database?

  • When data integrity is of utmost importance
  • When storage space is limited
  • When there are few tables with simple relationships
  • When there are performance bottlenecks in query processing
Denormalizing a database is recommended when there are performance bottlenecks in query processing, typically in read-heavy workloads where frequent joins are costly. Storing some data redundantly reduces the need for joins and simplifies retrieval, at the cost of extra storage and more complex updates to keep the redundant copies consistent.
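The trade-off can be sketched with sqlite3: a reporting query answered from a normalized pair of tables needs a join, while a denormalized copy (the hypothetical `orders_wide` table) carries the customer name on every order row and needs none. The final two updates show the maintenance cost: a rename must be applied to every redundant copy.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized design: each customer name is stored once, in customers.
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 2, 5.0), (12, 1, 7.5);
""")

# Denormalized read model (hypothetical): the name is copied into each order row.
conn.executescript("""
    CREATE TABLE orders_wide AS
    SELECT o.order_id, o.customer_id, c.name AS customer_name, o.amount
    FROM orders o JOIN customers c USING (customer_id);
""")

normalized = conn.execute("""
    SELECT c.name, SUM(o.amount) FROM orders o
    JOIN customers c USING (customer_id)
    GROUP BY c.name ORDER BY c.name
""").fetchall()

# Same report, no join needed.
denormalized = conn.execute("""
    SELECT customer_name, SUM(amount) FROM orders_wide
    GROUP BY customer_name ORDER BY customer_name
""").fetchall()

print(normalized == denormalized)

# The cost: a rename must now touch every redundant copy, or the copies drift.
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE customer_id = 1")
conn.execute("UPDATE orders_wide SET customer_name = 'Ada L.' WHERE customer_id = 1")
```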

What type of diagrams can be created using data modeling tools like ERWin or Visio?

  • Entity-Relationship Diagrams (ERDs)
  • Flowcharts
  • Pie Charts
  • Scatter Plots
Data modeling tools like ERWin or Visio can be used to create Entity-Relationship Diagrams (ERDs). ERDs depict the relationships between different entities in a database schema, helping data engineers understand the structure of their databases and how data entities relate to each other.

What is a fact table in the context of data warehousing?

  • A table that contains numerical or quantitative data
  • A table that defines relationships between dimensions
  • A table that stores descriptive attributes
  • A table that stores historical data
In data warehousing, a fact table typically contains quantitative data related to business processes or events, forming the core of analytical queries. It often contains foreign keys to related dimension tables.
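A small star schema in sqlite3 illustrates this: `fact_sales` holds numeric measures plus foreign keys, while the dimension tables hold descriptive attributes used to slice the measures. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: foreign keys to the dimensions plus quantitative measures.
    CREATE TABLE fact_sales (
        date_key INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        quantity INTEGER,
        revenue REAL
    );

    INSERT INTO dim_product VALUES
        (1, 'Widget', 'Hardware'), (2, 'Gizmo', 'Hardware'), (3, 'Manual', 'Books');
    INSERT INTO dim_date VALUES (20240101, 2024, 1), (20240201, 2024, 2);
    INSERT INTO fact_sales VALUES
        (20240101, 1, 3, 30.0),
        (20240101, 3, 1, 15.0),
        (20240201, 2, 2, 50.0),
        (20240201, 1, 1, 10.0);
""")

# Typical analytical query: aggregate fact measures, grouped by dimension attributes.
by_category_month = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue) AS revenue
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(by_category_month)
```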

Scenario: Your company is migrating data from an on-premises data warehouse to a cloud-based platform. Describe how you would approach the data transformation process to ensure a seamless transition.

  • Change Data Capture (CDC)
  • Data Encryption
  • Parallel Processing
  • Schema-on-Read
Change Data Capture (CDC) would be used to capture inserts, updates, and deletes on the on-premises data warehouse and replicate them to the cloud-based platform in near real time, keeping the two systems consistent while the source stays in use and minimizing downtime during the migration.
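Below is a minimal sketch of trigger-based CDC, with two in-memory sqlite3 databases standing in for the on-premises source and the cloud target. Triggers record every change in a hypothetical `change_log` table, and `replicate` (also hypothetical) applies changes past a high-water mark. Production CDC tools more commonly read the database's transaction log instead of using triggers; this shows only the capture-and-replay idea.

```python
import sqlite3

# Source: stands in for the on-premises warehouse table (hypothetical schema).
src = sqlite3.connect(":memory:")
src.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    -- Change log filled by triggers; seq gives the order of changes.
    CREATE TABLE change_log (seq INTEGER PRIMARY KEY AUTOINCREMENT, op TEXT, id INTEGER, name TEXT);
    CREATE TRIGGER cdc_ins AFTER INSERT ON customers
        BEGIN INSERT INTO change_log (op, id, name) VALUES ('I', NEW.id, NEW.name); END;
    CREATE TRIGGER cdc_upd AFTER UPDATE ON customers
        BEGIN INSERT INTO change_log (op, id, name) VALUES ('U', NEW.id, NEW.name); END;
    CREATE TRIGGER cdc_del AFTER DELETE ON customers
        BEGIN INSERT INTO change_log (op, id, name) VALUES ('D', OLD.id, NULL); END;
""")

# Target: stands in for the cloud platform.
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

def replicate(last_seq):
    """Apply changes with seq > last_seq to the target; return the new high-water mark."""
    changes = src.execute(
        "SELECT seq, op, id, name FROM change_log WHERE seq > ? ORDER BY seq", (last_seq,)
    ).fetchall()
    for seq, op, row_id, name in changes:
        if op == "D":
            dst.execute("DELETE FROM customers WHERE id = ?", (row_id,))
        else:  # inserts and updates are both applied as upserts
            dst.execute("INSERT OR REPLACE INTO customers (id, name) VALUES (?, ?)", (row_id, name))
        last_seq = seq
    dst.commit()
    return last_seq

# The source keeps changing while replication catches up incrementally.
src.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bo')")
watermark = replicate(0)
src.execute("UPDATE customers SET name = 'Bo R.' WHERE id = 2")
src.execute("DELETE FROM customers WHERE id = 1")
watermark = replicate(watermark)

print(dst.execute("SELECT * FROM customers ORDER BY id").fetchall())
```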

A(n) ________ entity in an ERD depends on another entity for its existence and cannot be uniquely identified by its attributes alone.

  • Derived
  • Linked
  • Strong
  • Weak
A weak entity in an Entity-Relationship Diagram (ERD) is one that depends on another entity (its owner, or parent, entity) for its existence and cannot be uniquely identified by its own attributes alone. It is identified by combining its own partial key with the key of its owner.
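In a relational schema, a weak entity is often implemented as a table whose primary key combines the owner's key with a partial key. The sqlite3 sketch below uses a hypothetical `order_lines` table: `line_no` is unique only within one order, and `ON DELETE CASCADE` models the existence dependency on `orders`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for FK checks and cascades
conn.executescript("""
    -- Strong (owner) entity: identified by its own key.
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, placed_on TEXT);

    -- Weak entity: line_no is only a partial key, so the full key
    -- borrows order_id from the owner.
    CREATE TABLE order_lines (
        order_id INTEGER NOT NULL REFERENCES orders(order_id) ON DELETE CASCADE,
        line_no  INTEGER NOT NULL,
        sku      TEXT,
        PRIMARY KEY (order_id, line_no)
    );

    INSERT INTO orders VALUES (1, '2024-01-05'), (2, '2024-01-06');
    -- line_no 1 appears twice, which is allowed because the owners differ.
    INSERT INTO order_lines VALUES (1, 1, 'A-100'), (1, 2, 'B-200'), (2, 1, 'A-100');
""")

# Existence dependency: deleting the owner removes its dependent lines.
conn.execute("DELETE FROM orders WHERE order_id = 1")
remaining = conn.execute("SELECT order_id, line_no FROM order_lines").fetchall()
print(remaining)
```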

What is the difference between a producer and a consumer in Kafka?

  • Consumers publish messages to Kafka topics
  • Consumers subscribe to Kafka topics
  • Producers consume messages from Kafka topics
  • Producers publish messages to Kafka topics
In Kafka, producers publish messages to Kafka topics, while consumers subscribe to these topics to consume messages. Producers are responsible for generating data, while consumers process and use that data.
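The division of roles can be illustrated with a toy in-memory model. This is not the Kafka client API: `ToyBroker`, `ToyProducer`, and `ToyConsumer` are hypothetical names, and real Kafka adds partitions, consumer groups, committed offsets, and durable storage. The sketch shows only that producers append to a topic's log, while each consumer reads from its own offset without removing messages for other consumers.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a broker: each topic is an append-only list (a log)."""
    def __init__(self):
        self.topics = defaultdict(list)

class ToyProducer:
    """Publishes (writes) messages to a topic; it never reads them back."""
    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, value):
        log = self.broker.topics[topic]
        log.append(value)
        return len(log) - 1  # offset assigned to the new message

class ToyConsumer:
    """Subscribes to one topic and reads forward from its own offset."""
    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0  # position of the next message to read

    def poll(self):
        log = self.broker.topics[self.topic]
        batch = log[self.offset:]
        self.offset = len(log)
        return batch

broker = ToyBroker()
producer = ToyProducer(broker)
billing = ToyConsumer(broker, "orders")
audit = ToyConsumer(broker, "orders")

producer.send("orders", {"order_id": 1})
producer.send("orders", {"order_id": 2})
first = billing.poll()       # both messages
producer.send("orders", {"order_id": 3})
second = billing.poll()      # only the newest message
everything = audit.poll()    # an independent consumer still sees all three
print(first, second, everything)
```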