_______ is a technique in unsupervised learning used to reduce the dimensionality of data.
- Decision Trees
- K-Means Clustering
- Principal Component Analysis (PCA)
- Support Vector Machines (SVM)
Principal Component Analysis (PCA) is a technique in unsupervised learning used to reduce the dimensionality of data by transforming it into a set of linearly uncorrelated variables known as principal components.
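As a rough illustration of the idea, the sketch below finds the first principal component of a tiny dataset and projects each point onto it, turning two columns into one. It is a minimal plain-Python sketch, not a production implementation: the function name `first_principal_component`, the sample `points`, and the use of power iteration are all assumptions made for this example, and it assumes the data has nonzero variance. In practice a library such as scikit-learn would normally be used instead.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    Hypothetical helper for illustration: uses power iteration on the
    sample covariance matrix and assumes the data is not constant.
    """
    n = len(rows)
    d = len(rows[0])

    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]

    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

    # Power iteration converges to the eigenvector with the largest eigenvalue.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Project each centered point onto the direction: one value per point.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Sample points lying exactly on the line y = 2x (made up for the example).
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(points)
```

Here `direction` points along the line the data follows, and `scores` is the one-dimensional representation of the four two-dimensional points.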
In Excel, what is the difference between relative and absolute cell references?
- Absolute references are used for text data, and relative references are used for numeric data.
- Absolute references change automatically when a formula is copied, while relative references stay the same.
- Relative references adjust when a formula is copied to another cell, while absolute references remain constant.
- Relative references are only used in complex formulas.
The key difference is that relative references (for example, A1) adjust when a formula is copied to another cell, whereas absolute references (for example, $A$1) remain constant. This distinction matters whenever a copied formula must keep pointing at one fixed cell, such as a tax rate or a grand total.
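The behavior can be imitated outside Excel. The sketch below is a simplified model, not Excel's actual implementation: the helper `fill_down` and its regular expression are assumptions made for this example, it only handles copying a formula down by some number of rows, and it would mishandle function names that end in digits (such as LOG10).

```python
import re

# Matches simple cell references such as A2, $B$1, A$1, or $A2.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def fill_down(formula, rows):
    """Imitate copying a formula down by `rows` rows (hypothetical helper)."""

    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        # A "$" before the row number pins the row; otherwise it moves.
        new_row = row if row_abs else str(int(row) + rows)
        return f"{col_abs}{col}{row_abs}{new_row}"

    return REF.sub(shift, formula)


copied = fill_down("=A2*$B$1", 1)
print(copied)  # =A3*$B$1
```

The relative reference A2 becomes A3, while the absolute reference $B$1 is unchanged.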
In database design, the process of organizing data to reduce redundancy is referred to as _______.
- Aggregation
- Denormalization
- Indexing
- Normalization
In database design, the process of organizing data to reduce redundancy is referred to as normalization. It splits data into related tables so that each fact is stored only once, which minimizes duplication and removes undesirable dependencies, leading to a more consistent and structured database design.
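A small sketch of what this looks like in practice, using Python's built-in sqlite3 module. The table names, columns, and sample rows are made up for the example. Customer details live in one table and orders refer to them by key, so a change of city is made in exactly one place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Customer facts are stored once, not repeated on every order row.
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT,
        city TEXT
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        item        TEXT
    );
    INSERT INTO customers VALUES (1, 'Ada', 'London');
    INSERT INTO orders VALUES (1, 1, 'keyboard'), (2, 1, 'monitor');
    """
)

# One update corrects the city for every order that belongs to this customer.
conn.execute("UPDATE customers SET city = 'Paris' WHERE id = 1")

rows = conn.execute(
    """
    SELECT o.item, c.city
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    ORDER BY o.id
    """
).fetchall()
print(rows)  # [('keyboard', 'Paris'), ('monitor', 'Paris')]
```

In a single unnormalized table the city would be repeated on each order row, and an update that missed one row would leave the data inconsistent.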
_______ thinking is a method used to explore complex problems by viewing them from different perspectives.
- Analytical
- Creative
- Critical
- Design
Design thinking is a method used to explore complex problems by viewing them from different perspectives. It emphasizes creativity and innovation in problem-solving. Critical thinking is important but doesn't necessarily focus on different perspectives in the same way as design thinking. Analytical thinking is more about breaking down problems logically, and creative thinking is more about generating novel ideas.
When integrating real-time data into a dashboard, what is a key factor to ensure data accuracy and timeliness?
- Data complexity
- Data latency
- Data storage
- Data volume
Data latency is a critical factor when integrating real-time data into a dashboard. It refers to the delay between the occurrence of an event and its reflection in the dashboard. Keeping latency low helps ensure that the dashboard shows the current state of the data rather than a stale snapshot.
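Latency as defined above is simply a difference between two timestamps. The sketch below measures it and flags stale data. The names `data_latency`, `is_fresh`, and the 5-second `MAX_LATENCY` budget are hypothetical choices for this example, not values taken from any particular dashboard tool.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical freshness budget; a real dashboard would choose its own.
MAX_LATENCY = timedelta(seconds=5)


def data_latency(event_time, displayed_time):
    """Delay between when an event happened and when the dashboard showed it."""
    return displayed_time - event_time


def is_fresh(event_time, displayed_time, max_latency=MAX_LATENCY):
    """True if the displayed value is within the allowed latency budget."""
    return data_latency(event_time, displayed_time) <= max_latency


# Made-up timestamps: the event appears on the dashboard 2 seconds later.
event = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
shown = event + timedelta(seconds=2)

print(data_latency(event, shown).total_seconds())  # 2.0
print(is_fresh(event, shown))                      # True
```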
In DBMS, what does ACID stand for in the context of transactions?
- Access, Control, Integration, Distribution
- Accuracy, Cohesion, Inheritance, Dependency
- Association, Collaboration, Inheritance, Division
- Atomicity, Consistency, Isolation, Durability
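ACID stands for Atomicity, Consistency, Isolation, and Durability. Atomicity means a transaction either completes fully or has no effect; consistency means it moves the database from one valid state to another; isolation means concurrent transactions do not interfere with one another; durability means committed changes survive failures. Together, these properties ensure that transactions are processed reliably.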
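Atomicity and consistency can be seen in a small sketch with Python's built-in sqlite3 module. The `accounts` table, the `transfer` and `balances` helpers, and the sample balances are assumptions made for this example. A transfer that would overdraw an account violates a CHECK constraint, and the whole transaction is rolled back, including the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok = transfer(conn, "alice", "bob", 500)  # alice only has 100
print(ok, balances(conn))  # False {'alice': 100, 'bob': 50}
```

The credit to bob is undone along with the failed debit, so no partial transfer is ever visible.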
In dplyr, to perform operations on multiple columns at once, the _______ function is used.
- across()
- group_by()
- mutate()
- summarize()
The across() function in dplyr applies the same operation to multiple columns at once. It is used inside verbs such as mutate() and summarize() to select the columns and specify the function to apply to them.
What advanced metric is used to assess the long-term value of a customer to a business?
- Cost per Acquisition (CPA)
- Customer Lifetime Value (CLV)
- Net Promoter Score (NPS)
- Return on Investment (ROI)
Customer Lifetime Value (CLV) is a key metric in assessing the long-term value of a customer to a business. It represents the total revenue a business expects to earn from a customer throughout their entire relationship. The other options measure different things: ROI compares the return on an investment with its cost, NPS gauges how likely customers are to recommend the business, and CPA is the cost of acquiring a customer.
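One common simplified way to estimate CLV multiplies average purchase value, purchase frequency, and expected customer lifespan. The sketch below uses that simplification. It is an assumption of this example, not a formula given in the question, and real models often also account for margins, churn, and discounting. The function name `simple_clv` and the input figures are made up.

```python
def simple_clv(avg_purchase_value, purchases_per_year, years_retained):
    """Simplified CLV estimate: expected total revenue from one customer."""
    return avg_purchase_value * purchases_per_year * years_retained


# Made-up figures: 50.0 per purchase, 4 purchases a year, retained for 3 years.
clv = simple_clv(50.0, 4, 3)
print(clv)  # 600.0
```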
What is the impact of big data technologies on data-driven decision making?
- Enhanced scalability and processing speed
- Increased data security concerns
- Limited applicability to small datasets
- Reduced need for data analysis
Big data technologies, with enhanced scalability and processing speed, enable organizations to process and analyze vast amounts of data quickly. This facilitates more informed and timely data-driven decision making.
In a scenario where a business needs to perform complex data analyses with minimal upfront investment, which cloud service would be most appropriate?
- AWS Glue
- Amazon Redshift
- Azure Data Lake Analytics
- Google BigQuery
Google BigQuery would be most appropriate. It is a serverless, highly scalable data warehouse, so there is no infrastructure to provision, and its on-demand pricing charges for the queries that are actually run. This allows complex data analyses with minimal upfront investment.