The operation in CNNs that combines the outputs of neuron clusters and produces a single output for the cluster is known as _______.
- Activation Function
- Pooling
- Convolutions
- Fully Connected
In CNNs, the operation that combines the outputs of neuron clusters and produces a single output for the cluster is called "Pooling." Pooling reduces the spatial dimensions of the feature maps, making them smaller and more computationally efficient while retaining important features.
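As a minimal sketch of the idea, the snippet below applies non-overlapping 2x2 max pooling to a small feature map using plain Python lists. The function name `max_pool_2d` and the sample values are illustrative assumptions; real frameworks provide their own pooling layers.

```python
def max_pool_2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        pooled_row = []
        for c in range(0, cols - size + 1, size):
            # Collect every value in the current window (one "cluster" of neurons).
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [1, 8, 3, 4],
]

pooled = max_pool_2d(feature_map)
print(pooled)  # [[6, 4], [8, 9]]
```

The 4x4 input shrinks to 2x2, and each output value is the strongest activation in its window.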
A healthcare organization stores patient records in a database. Each record contains structured fields like name, age, and diagnosis. Additionally, there are scanned documents and notes from doctors. Which term best describes the type of data in this healthcare database?
- Structured data
- Semi-structured data
- Unstructured data
- Big data
The healthcare database mixes structured fields (name, age, diagnosis) with scanned documents and doctors' notes, which have little fixed structure on their own. A collection that combines both is best described as semi-structured data: it does not fit a rigid schema throughout, but parts of it can be tagged or indexed for better retrieval.
When a model performs well on training data but poorly on unseen data, what issue might it be facing?
- Overfitting
- Underfitting
- Data leakage
- Bias-variance tradeoff
The model is likely overfitting. Overfitting occurs when the model fits the training data too closely, noise included, so it scores very well on the training set but generalizes poorly to unseen data. It is the high-variance side of the bias-variance tradeoff. Common remedies include regularization, simplifying the model, and collecting more training data.
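The gap between training and test performance can be illustrated with a contrived toy example. In this sketch, a "memorizer" (a 1-nearest-neighbor lookup) reproduces every training label, including two deliberately noisy ones, while a simple threshold rule ignores the noise. The data, the threshold of 5, and all function names are assumptions chosen for illustration.

```python
# Toy data: the underlying pattern is "label 1 when x >= 5".
# Two training labels are deliberately noisy (x=3 and x=7).
train = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
test = [(0.5, 0), (2.6, 0), (3.4, 0), (4.4, 0),
        (5.5, 1), (6.6, 1), (7.4, 1), (9.5, 1)]


def memorizer(x):
    """1-nearest-neighbor: copy the label of the closest training point."""
    _, nearest_label = min(train, key=lambda point: abs(point[0] - x))
    return nearest_label


def simple_rule(x):
    """A deliberately simple model: a single threshold (assumed, not learned)."""
    return 1 if x >= 5 else 0


def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)


memorizer_train = accuracy(memorizer, train)
memorizer_test = accuracy(memorizer, test)
rule_train = accuracy(simple_rule, train)
rule_test = accuracy(simple_rule, test)

print(memorizer_train, memorizer_test)  # 1.0 0.5
print(rule_train, rule_test)            # 0.75 1.0
```

The memorizer is perfect on the training set because it has also learned the noise, and that is exactly what hurts it on unseen points.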
Which type of database is ideal for handling hierarchical data and provides better scalability, MongoDB or MySQL?
- MongoDB
- MySQL
- Both MongoDB and MySQL
- Neither MongoDB nor MySQL
MongoDB is a NoSQL document database that is well suited to hierarchical data and scales horizontally through sharding. It stores documents in BSON (Binary JSON), so nested fields and arrays can live inside a single record, which makes it a good choice for applications that need flexibility and scalability when dealing with complex data structures.
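To make "hierarchical data" concrete, the sketch below uses a plain Python dict as a stand-in for a JSON-like document. It does not use a MongoDB driver, and the field names and values are hypothetical.

```python
import json

# A stand-in for one document: nested objects and arrays sit in a single record.
patient = {
    "name": "Ada",
    "contact": {"city": "Pune", "phones": ["555-0100", "555-0101"]},
    "visits": [
        {"date": "2024-01-05", "diagnosis": "flu"},
        {"date": "2024-03-12", "diagnosis": "sprain"},
    ],
}

# Nested values are reachable directly, with no join across tables.
city = patient["contact"]["city"]
diagnoses = [visit["diagnosis"] for visit in patient["visits"]]

# The whole hierarchy serializes to JSON and back unchanged.
round_trip = json.loads(json.dumps(patient))

print(city)       # Pune
print(diagnoses)  # ['flu', 'sprain']
```

In a relational design the same data would typically be split across several tables (patients, phones, visits) and reassembled with joins.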
A company uses an AI model for recruitment, and it's observed that the model is selecting more male candidates than female candidates for a tech role, even when both genders have similar qualifications. What ethical concern does this scenario highlight?
- Data bias in AI
- Lack of transparency in AI
- Data security and privacy issues in AI
- Ethical AI governance and accountability
This scenario highlights the ethical concern of "Data bias in AI." The AI model's biased selection towards male candidates indicates that the training data may be biased, leading to unfair and discriminatory outcomes. Addressing data bias is essential to ensure fairness and diversity in AI-driven recruitment.
Which algorithm would you use when you have a mix of input features (both categorical and continuous) and you need to ensure interpretability of the model?
- Random Forest
- Support Vector Machines (SVM)
- Neural Networks
- Naive Bayes Classifier
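Among these options, Random Forest is a suitable choice for mixed input features when interpretability matters. It combines many decision trees, which can split on both categorical and continuous features, and it exposes feature importance scores that show which inputs drive its predictions. A single decision tree is easier to read than the full ensemble, but Random Forest is still generally easier to interpret than SVMs or neural networks.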
In a relational database, what is used to ensure data integrity across multiple tables?
- Primary Key
- Foreign Key
- Index
- Trigger
A Foreign Key is used in a relational database to ensure data integrity by creating a link between tables. It enforces referential integrity, ensuring that values in one table match values in another. Primary Keys are used to uniquely identify records in a table, not to maintain integrity across tables. Indexes and Triggers serve different purposes.
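Referential integrity can be sketched with Python's built-in `sqlite3` module. The table and column names below are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other database engines usually enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"  # the foreign key
    " item TEXT)"
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (customer_id, item) VALUES (1, 'book')")  # valid parent row

try:
    # Customer 99 does not exist, so this row would break referential integrity.
    conn.execute("INSERT INTO orders (customer_id, item) VALUES (99, 'pen')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, order_count)  # True 1
```

The database itself rejects the orphaned order, so the application cannot accidentally store a reference to a customer that does not exist.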
The _______ is a measure of the relationship between two variables and ranges between -1 and 1.
- P-value
- Correlation coefficient
- Standard error
- Regression
The measure of the relationship between two variables, ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation), is known as the "correlation coefficient." It quantifies the strength and direction of the linear relationship between variables.
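The following sketch computes the Pearson correlation coefficient from its definition (covariance divided by the product of the standard deviations). The function name `pearson_r` and the sample data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])  # y rises in step with x
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])  # y falls in step with x
r_none = pearson_r(xs, [1, -1, 0, -1, 1])     # no linear trend

print(r_positive)  # 1.0
print(r_negative)  # -1.0
print(r_none)      # 0.0
```

The third case shows why the coefficient describes only the linear relationship: a result of 0 means no linear trend, not necessarily that the variables are unrelated.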
_________ is a popular open-source framework used for real-time processing and analytics of large streams of data.
- Hadoop
- Spark
- Hive
- Kafka
Apache Spark is a widely used open-source framework for processing and analyzing large volumes of data, including near-real-time streams through its Spark Streaming and Structured Streaming components. It also provides tools for batch processing, SQL queries, and machine learning, making it a popular choice in big data and data science.
A common task in supervised learning where the output variable is categorical, such as 'spam' or 'not spam', is called _______.
- Classification
- Regression
- Clustering
- Association
The correct term is "Classification." In classification, a supervised learning task, the goal is to predict a categorical output variable from input features. Common examples include labeling emails as 'spam' or 'not spam' (binary classification) and assigning objects to one of several categories (multi-class classification). Classification models assign inputs to predefined categories, which makes classification one of the core tasks in supervised learning.
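As a minimal sketch of binary classification, the snippet below trains a nearest-centroid classifier on a tiny made-up dataset. The two features (number of links, number of exclamation marks), the training examples, and the function names are all assumptions for illustration; this is one simple technique among many, not a recommended spam filter.

```python
def centroid(points):
    """Component-wise mean of a list of feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(training_data):
    """Learn one centroid per label from (features, label) pairs."""
    by_label = {}
    for features, label in training_data:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(points) for label, points in by_label.items()}


def predict(centroids, features):
    """Assign the label whose centroid is closest to the input."""
    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


# Hypothetical features: (number of links, number of exclamation marks).
training_data = [
    ((5, 7), "spam"), ((6, 9), "spam"), ((4, 8), "spam"),
    ((0, 1), "not spam"), ((1, 0), "not spam"), ((1, 2), "not spam"),
]

model = fit(training_data)
first = predict(model, (4, 6))
second = predict(model, (1, 1))
print(first)   # spam
print(second)  # not spam
```

The output is always one of the predefined labels, which is what distinguishes classification from regression, where the output is a continuous value.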