A healthcare organization stores patient records in a database. Each record contains structured fields like name, age, and diagnosis. Additionally, there are scanned documents and notes from doctors. Which term best describes the type of data in this healthcare database?

Structured data
Semi-structured data
Unstructured data
Big data

The healthcare database mixes structured data (name, age, diagnosis) with scanned documents and doctors' notes, which are unstructured in their raw form. Once such documents are tagged or indexed with metadata for retrieval, they carry partial structure, so of the options given, semi-structured data best describes the collection as a whole.

When a model performs well on training data but poorly on unseen data, what issue might it be facing?

Overfitting
Underfitting
Data leakage
Bias-variance tradeoff

The model is likely overfitting. Overfitting occurs when a model fits the training data too closely, noise included, so it performs very well on the training set but generalizes poorly to unseen data. It is the high-variance side of the bias-variance tradeoff. Common remedies include regularization and collecting more training data.
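As a rough illustration of this gap between training and test performance, the sketch below uses a 1-nearest-neighbor predictor, which simply memorizes its training points. The data (y = 2x plus noise), the helper names, and the sample sizes are all assumptions made up for the example.

```python
import random

def nearest_neighbor_predict(train, x):
    """Predict y for x by copying the label of the closest training point (1-NN)."""
    _, nearest_y = min(train, key=lambda point: abs(point[0] - x))
    return nearest_y

def mean_squared_error(train, samples):
    """Average squared error of the 1-NN predictor on the given samples."""
    errors = [(nearest_neighbor_predict(train, x) - y) ** 2 for x, y in samples]
    return sum(errors) / len(errors)

def make_samples(n, rng):
    # Hypothetical data: y = 2x plus Gaussian noise with standard deviation 1
    return [(x, 2 * x + rng.gauss(0, 1.0))
            for x in (rng.uniform(0, 10) for _ in range(n))]

rng = random.Random(0)
train = make_samples(30, rng)
test = make_samples(30, rng)

train_mse = mean_squared_error(train, train)  # memorized points, noise included
test_mse = mean_squared_error(train, test)    # unseen points expose the noise
print(f"train MSE: {train_mse:.3f}, test MSE: {test_mse:.3f}")
```

Because each training point is its own nearest neighbor, the training error is zero, while the error on unseen points is not. That mismatch is the signature of overfitting.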

Which type of database is ideal for handling hierarchical data and provides better scalability, MongoDB or MySQL?

MongoDB
MySQL
Both MongoDB and MySQL
Neither MongoDB nor MySQL

MongoDB is a NoSQL document database well suited to hierarchical data. It stores records as BSON (Binary JSON) documents, which can nest objects and arrays directly, and it is designed to scale horizontally through sharding. This makes it a good choice for applications that need flexibility and scalability when dealing with complex or loosely structured data.

A company uses an AI model for recruitment, and it's observed that the model is selecting more male candidates than female candidates for a tech role, even when both genders have similar qualifications. What ethical concern does this scenario highlight?

Data bias in AI
Lack of transparency in AI
Data security and privacy issues in AI
Ethical AI governance and accountability

This scenario highlights the ethical concern of "Data bias in AI." The AI model's biased selection towards male candidates indicates that the training data may be biased, leading to unfair and discriminatory outcomes. Addressing data bias is essential to ensure fairness and diversity in AI-driven recruitment.

Which algorithm would you use when you have a mix of input features (both categorical and continuous) and you need to ensure interpretability of the model?

Random Forest
Support Vector Machines (SVM)
Neural Networks
Naive Bayes Classifier

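Of the options listed, Random Forest is the most suitable choice for mixed input features when interpretability matters. It combines many decision trees, works with both categorical and continuous inputs, and exposes feature importance scores that show which inputs drive its predictions. A single decision tree is easier to read than a forest, but it is not among the choices.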

In a relational database, what is used to ensure data integrity across multiple tables?

Primary Key
Foreign Key
Index
Trigger

A Foreign Key is used in a relational database to ensure data integrity by creating a link between tables. It enforces referential integrity, ensuring that values in one table match existing values in another. Primary Keys uniquely identify records within a table rather than maintaining integrity across tables. Indexes speed up lookups, and Triggers run custom logic when data changes; neither is the standard mechanism for referential integrity.
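A minimal sketch of referential integrity in practice, using Python's built-in `sqlite3` module with an in-memory database. The `patients` and `visits` tables and their columns are hypothetical names chosen for the example. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Parent table: the primary key uniquely identifies each patient
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Child table: patient_id must match an existing patients.id
conn.execute("""
    CREATE TABLE visits (
        id INTEGER PRIMARY KEY,
        patient_id INTEGER NOT NULL REFERENCES patients(id),
        reason TEXT
    )
""")

conn.execute("INSERT INTO patients (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO visits (patient_id, reason) VALUES (1, 'checkup')")  # valid link

try:
    # No patient with id 99 exists, so this row would be an orphan
    conn.execute("INSERT INTO visits (patient_id, reason) VALUES (99, 'checkup')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

visit_count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print("orphan row rejected:", orphan_rejected, "| visits stored:", visit_count)
```

The second insert fails because the foreign key constraint has no matching parent row, so only the valid visit is stored.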

The _______ is a measure of the relationship between two variables and ranges between -1 and 1.

P-value
Correlation coefficient
Standard error
Regression

The measure of the relationship between two variables, ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation), is known as the "correlation coefficient." It quantifies the strength and direction of the linear relationship between variables.
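For a concrete feel of the two extremes, here is a small sketch that computes the Pearson correlation coefficient by hand. The function name `pearson_r` and the sample values are assumptions for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

hours = [1, 2, 3, 4, 5]
rising = [2 * h + 1 for h in hours]    # increases linearly with hours
falling = [10 - 3 * h for h in hours]  # decreases linearly with hours

r_rising = pearson_r(hours, rising)
r_falling = pearson_r(hours, falling)
print(r_rising, r_falling)
```

A perfectly increasing linear relationship gives a coefficient of 1, and a perfectly decreasing one gives -1. Real data usually falls somewhere in between.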

How do federated learning approaches differ from traditional machine learning in terms of data handling?

Federated learning doesn't use data
Federated learning relies on centralized data storage
Federated learning trains models on decentralized data
Traditional machine learning trains models on a single dataset

Federated learning trains machine learning models on decentralized data sources without transferring the raw data to a central server. Each participant trains locally and shares only model updates, which are then aggregated into a global model, so the approach helps preserve privacy and avoids moving large datasets. In contrast, traditional machine learning typically trains models on a single, centralized dataset, which may raise data privacy concerns.
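The idea can be sketched with a toy version of federated averaging. This is a simplified illustration, not a production protocol: the one-parameter model y = w * x, the two client datasets, the learning rate, and the function names are all assumptions.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Run gradient descent on one client's private data; only the weight leaves the client."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_average(client_weights, client_sizes):
    """Server step: average client weights, weighted by each client's sample count."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_weights, client_sizes)) / total

# Hypothetical private datasets following y = 3x, one list per client
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]

global_w = 0.0
for _ in range(50):
    # Each client starts from the current global weight and trains locally
    updates = [local_update(global_w, data) for data in clients]
    # The server sees only the updated weights, never the (x, y) pairs
    global_w = federated_average(updates, [len(data) for data in clients])

print(round(global_w, 3))
```

The server never touches the client data lists directly, yet the global weight converges toward the slope shared by both datasets.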

A common task in supervised learning where the output variable is categorical, such as 'spam' or 'not spam', is called _______.

Classification
Regression
Clustering
Association

The correct term is "Classification." In a classification task, the goal is to predict a categorical output variable from input features. Common examples include labeling emails as 'spam' or 'not spam' (binary classification) or assigning objects to one of several categories (multi-class classification). Because the model learns from labeled examples and assigns inputs to predefined categories, classification is a core task in supervised learning.

When considering the Data Science Life Cycle, which step involves assessing the performance of your model and ensuring it meets the project's objectives?

Data Collection
Data Preprocessing
Model Building and Training
Model Evaluation and Deployment

Model Evaluation and Deployment is the phase where you assess the performance of your model and ensure it meets the project's objectives. During this step, you use metrics such as accuracy, precision, and recall, measured on data held out from training, to evaluate how well the model performs and decide whether it is ready for deployment. This phase is crucial for confirming that the data-driven solution is effective and delivers the desired outcomes.
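As a small sketch of what such an evaluation might look like for a binary classifier, the code below computes accuracy, precision, and recall and checks them against a target. The labels, predictions, and the 0.7 recall threshold are hypothetical values, not taken from any real project.

```python
def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical held-out labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = evaluate(y_true, y_pred)
# Assumed project objective: recall of at least 0.7 before deployment
ready_for_deployment = metrics["recall"] >= 0.7
print(metrics, ready_for_deployment)
```

Which metric matters most, and what threshold counts as good enough, depends on the project's objectives, so the check above is only one possible acceptance rule.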
