A(n) ________ entity in an ERD depends on another entity for its existence and cannot be uniquely identified by its attributes alone.
- Derived
- Linked
- Strong
- Weak
A weak entity in an Entity-Relationship Diagram (ERD) is one that depends on another entity (its owner, or parent entity) for its existence and cannot be uniquely identified by its own attributes alone; instead, it is identified by combining its partial key (discriminator) with the primary key of its owner.
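As a minimal sketch of how this plays out in a schema (table and column names here are hypothetical, using SQLite via Python's standard library), the weak entity's primary key is composite, borrowing the owner's key:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Strong (owner) entity: identified by its own key.
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY
    );
    -- Weak entity: line_number alone is not unique; each row is
    -- identified by (order_id, line_number), borrowing the owner's key.
    CREATE TABLE order_line (
        order_id    INTEGER NOT NULL REFERENCES orders(order_id),
        line_number INTEGER NOT NULL,
        quantity    INTEGER,
        PRIMARY KEY (order_id, line_number)
    );
""")
```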
Scenario: Your company is migrating data from an on-premises data warehouse to a cloud-based platform. Describe how you would approach the data transformation process to ensure a seamless transition.
- Change Data Capture (CDC)
- Data Encryption
- Parallel Processing
- Schema-on-Read
Change Data Capture (CDC) captures changes made in the on-premises data warehouse and replicates them to the cloud-based platform in near real time, keeping the two systems consistent and minimizing downtime during the migration.
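Production CDC is usually log-based (reading the source's transaction log, as tools like Debezium do), but a polling-style sketch conveys the idea. The table and column names below are hypothetical, and SQLite stands in for both systems:

```python
import sqlite3

def pull_changes(source: sqlite3.Connection, last_seen: int):
    """Fetch rows modified since the last watermark (polling-style CDC)."""
    rows = source.execute(
        "SELECT id, payload, version FROM src_table WHERE version > ?",
        (last_seen,),
    ).fetchall()
    new_watermark = max((r[2] for r in rows), default=last_seen)
    return rows, new_watermark

def apply_changes(target: sqlite3.Connection, rows):
    # Idempotent upsert into the cloud-side replica.
    target.executemany(
        "INSERT INTO dst_table (id, payload, version) VALUES (?, ?, ?) "
        "ON CONFLICT(id) DO UPDATE SET payload = excluded.payload, "
        "version = excluded.version",
        rows,
    )
    target.commit()
```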
What is a fact table in the context of data warehousing?
- A table that contains numerical or quantitative data
- A table that defines relationships between dimensions
- A table that stores descriptive attributes
- A table that stores historical data
In data warehousing, a fact table contains the quantitative measures (facts) of a business process or event, such as sales amounts or order counts, and forms the core of analytical queries. It also carries foreign keys that link each measurement to its descriptive dimension tables.
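A minimal star-schema sketch (hypothetical table and column names): the fact table's columns are numeric measures plus foreign keys into the dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables hold descriptive attributes.
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);

    -- The fact table holds numeric measures plus foreign keys
    -- pointing at the dimensions.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        units_sold  INTEGER,
        revenue     REAL
    );
""")
```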
What type of diagrams can be created using data modeling tools like ERWin or Visio?
- Entity-Relationship Diagrams (ERDs)
- Flowcharts
- Pie Charts
- Scatter Plots
Data modeling tools like ERWin or Visio can be used to create Entity-Relationship Diagrams (ERDs), which depict the entities in a database schema and the relationships between them, helping data engineers understand how their data is structured.
In which scenarios would you recommend denormalizing a database?
- When data integrity is of utmost importance
- When storage space is limited
- When there are few tables with simple relationships
- When there are performance bottlenecks in query processing
Denormalizing a database is recommended when query processing hits performance bottlenecks. By reducing the number of joins needed to answer common queries, denormalization trades some storage redundancy (and extra work on updates) for faster reads.
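A small sketch of the trade-off (hypothetical names): the reporting table copies the customer name into each order row so reads skip the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        total       REAL
    );

    -- Denormalized reporting table: customer name is copied in, so
    -- queries avoid the join at the cost of redundant storage and
    -- extra work to keep the copy in sync when a customer changes.
    CREATE TABLE orders_report AS
    SELECT o.order_id, o.total, c.name AS customer_name
    FROM orders o JOIN customers c USING (customer_id);
""")
```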
What does the term "index seek" refer to in the context of database indexing?
- Identifying duplicate entries in the index
- Scanning the entire table for matching records
- Searching for specific data using an index structure
- Sorting the data in the index in ascending order
"Index seek" in the context of database indexing refers to the process of searching for specific data by utilizing the index structure, resulting in efficient data retrieval based on the search criteria.
What is the difference between a clustered index and a non-clustered index in an RDBMS?
- Contains only key columns and row locators
- Determines the order of data rows on disk
- Has a higher level of fragmentation compared to clustered
- Stores a separate copy of the table data
A clustered index in an RDBMS determines the order of data rows on disk, physically rearranging the data rows according to the index key. In contrast, a non-clustered index stores a separate copy of the index key columns along with row locators, maintaining a logical ordering of data without altering the physical order.
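True clustered indexes belong to engines like SQL Server, but as a rough SQLite analogue for illustration (hypothetical names): a `WITHOUT ROWID` table stores its rows in primary-key order, while `CREATE INDEX` builds a separate non-clustered structure that points back at the rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Clustered-like: a WITHOUT ROWID table keeps its rows in a B-tree
    -- ordered by the primary key, so the table data is the index.
    CREATE TABLE events (
        event_id INTEGER PRIMARY KEY,
        detail   TEXT
    ) WITHOUT ROWID;

    -- Non-clustered-like: a secondary index is a separate structure
    -- holding the indexed column plus a locator back to each row.
    CREATE INDEX idx_events_detail ON events(detail);
""")
```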
Scenario: A new regulation requires your organization to implement stricter data governance policies. How would you incorporate these policies into your data modeling best practices?
- Define data ownership and access controls for sensitive data
- Establish data retention and archival policies to comply with regulatory requirements
- Implement metadata management practices to track data lineage and usage
- Integrate data governance controls into the modeling process to ensure compliance with regulations
Incorporating stricter data governance policies into data modeling means integrating governance controls directly into the modeling process: defining data ownership and access controls for sensitive data, implementing metadata management to track lineage and usage, and establishing retention and archival policies that satisfy the regulation.
A ________ is a predefined set of rules used to identify and correct errors in incoming data during the loading process.
- Data pipeline
- Data schema
- Data validation rule
- Data warehouse
A data validation rule is a predefined set of rules used to identify and correct errors in incoming data during the loading process. These rules ensure data integrity and consistency in the target system.
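A minimal sketch of such rules applied during a load (rule names and row fields are hypothetical); rows that fail are quarantined with the names of the rules they violate rather than loaded:

```python
# Hypothetical validation rules: each pairs a name with a predicate.
RULES = [
    ("non_empty_id",    lambda row: bool(row.get("id"))),
    ("positive_amount", lambda row: row.get("amount", 0) > 0),
]

def violations(row: dict) -> list[str]:
    """Return the names of every rule the row fails."""
    return [name for name, check in RULES if not check(row)]

rows = [{"id": "a1", "amount": 10.0}, {"id": "", "amount": -5}]
clean = [r for r in rows if not violations(r)]
rejected = [(r, violations(r)) for r in rows if violations(r)]
print(clean)     # passes every rule: loaded into the target
print(rejected)  # quarantined with the rules each row violated
```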
Which technology is commonly used for real-time data processing?
- Apache Kafka
- Hadoop
- MongoDB
- PostgreSQL
Apache Kafka is a widely used technology for real-time data processing. It is a distributed streaming platform that enables applications to publish, subscribe to, store, and process streams of records in real time. Kafka's architecture provides fault tolerance, scalability, and high throughput, making it well suited to real-time data pipelines and stream-processing applications across various industries.
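A minimal producer/consumer sketch, assuming the `kafka-python` client is installed and a broker is reachable at `localhost:9092`; the topic name `events` is hypothetical.

```python
# Assumes `pip install kafka-python` and a broker at localhost:9092.
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("events", b'{"user": 42, "action": "click"}')
producer.flush()  # block until the record is acknowledged

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the oldest record
    consumer_timeout_ms=5000,      # stop iterating if the topic goes quiet
)
for message in consumer:
    print(message.value)  # each record's value arrives as raw bytes
```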
In the ETL process, data is extracted from multiple sources such as ________.
- APIs
- All of the above
- Databases
- Spreadsheets
In the ETL (Extract, Transform, Load) process, data can be extracted from various sources such as databases, APIs (Application Programming Interfaces), spreadsheets, and more.
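A standard-library-only sketch of the extract step for two of these source types; file paths, table, and column names are hypothetical, and an API source would typically be fetched over HTTP and parsed from JSON before joining the other extracts downstream.

```python
import csv
import sqlite3

def extract_from_csv(path: str) -> list[dict]:
    # Each spreadsheet row becomes a dict keyed by the header row.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def extract_from_db(conn: sqlite3.Connection) -> list[dict]:
    # Pull every row from a (hypothetical) source table as dicts.
    conn.row_factory = sqlite3.Row
    return [dict(r) for r in conn.execute("SELECT * FROM src_table")]
```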
The ________ index is a type of index that organizes data in the order of the index key and physically reorders the rows in the table accordingly.
- Clustered
- Composite
- Non-clustered
- Unique
The clustered index is a type of index that organizes data in the order of the index key. It physically reorders the rows in the table according to the index key, which can improve performance for certain types of queries.