In a task involving the classification of hand-written digits, the model is failing to capture intricate patterns in the data. Adding more layers seems to exacerbate the problem due to a certain issue in training deep networks. What is this issue likely called?
- Overfitting
- Vanishing Gradient
- Underfitting
- Exploding Gradient
The issue is called the "Vanishing Gradient" problem. During backpropagation the gradient is multiplied by each layer's local derivative, and with many layers, especially saturating activations such as sigmoid or tanh, those repeated multiplications drive the gradient toward zero. The early layers then barely update, so the network fails to learn intricate patterns in the data, and adding more layers only makes it worse.
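To see the effect concretely, here is a minimal NumPy sketch (illustrative values only, not the digit-classification model itself) that backpropagates through a stack of sigmoid layers and prints the shrinking gradient norm at each layer:

```python
import numpy as np

# Illustrative only: stack sigmoid layers and watch the backpropagated
# gradient norm shrink layer by layer.
rng = np.random.default_rng(0)
n_layers, width = 10, 64
weights = [rng.normal(0.0, 0.5, (width, width)) for _ in range(n_layers)]

# Forward pass, caching each layer's activations.
a = rng.normal(size=width)
activations = [a]
for W in weights:
    a = 1.0 / (1.0 + np.exp(-(W @ a)))  # sigmoid
    activations.append(a)

# Backward pass from an arbitrary upstream gradient of ones.
grad = np.ones(width)
for layer in range(n_layers - 1, -1, -1):
    s = activations[layer + 1]
    grad = weights[layer].T @ (grad * s * (1.0 - s))  # sigmoid' = s * (1 - s)
    print(f"layer {layer}: gradient norm = {np.linalg.norm(grad):.2e}")
```

Because the sigmoid derivative never exceeds 0.25, each backward step tends to shrink the gradient, and the printed norms fall off rapidly toward the earliest layers.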
Which of the following stages in the ETL process is responsible for cleaning and validating the data to ensure quality?
- Extraction
- Transformation
- Loading
The "Transformation" stage in the ETL (Extract, Transform, Load) process is responsible for cleaning, validating, and transforming data to ensure its quality. This phase involves data cleaning, data type conversion, and other operations to make the data suitable for analysis and reporting.
When handling outliers in a dataset with skewed distributions, which measure of central tendency is preferred for imputation?
- Mean
- Median
- Mode
- Geometric Mean
For skewed distributions, the median is the preferred measure of central tendency for imputation. Unlike the mean, the median is robust to extreme values, so imputing with it preserves the integrity of the dataset even when outliers are present.
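A short pandas sketch with made-up income figures shows why: the outlier drags the mean far from the bulk of the data while the median stays put:

```python
import pandas as pd

# Sketch: a right-skewed income column with one extreme outlier and a gap.
income = pd.Series([32_000, 35_000, 38_000, 41_000, None, 2_500_000])

print("mean:  ", income.mean())    # 529,200 -- dragged upward by the outlier
print("median:", income.median())  # 38,000 -- stays with the bulk of the data

# Impute the missing value with the robust statistic.
imputed = income.fillna(income.median())
print(imputed)
```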
Which role in Data Science primarily focuses on collecting, storing, and processing large datasets efficiently?
- Data Scientist
- Data Engineer
- Data Analyst
- Machine Learning Engineer
Data Engineers are responsible for the efficient collection, storage, and processing of large datasets. They build and maintain the data infrastructure, such as pipelines, databases, and warehouses, that Data Scientists and Analysts rely on to work with data effectively.
When a dataset has values ranging from 0 to 1000 in one column and 0 to 1 in another column, which transformation can be used to scale them to a similar range?
- Normalization
- Log Transformation
- Standardization
- Min-Max Scaling
Min-Max Scaling rescales each feature to a fixed range, typically 0 to 1, using x' = (x - min) / (max - min). It ensures that variables measured on very different scales, such as 0 to 1000 and 0 to 1, have a similar impact on the analysis.
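A minimal sketch using scikit-learn's `MinMaxScaler` on made-up data with the two ranges from the question:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Sketch: one feature spans 0 to 1000, the other 0 to 1.
X = np.array([[   0.0, 0.2],
              [ 250.0, 0.5],
              [ 500.0, 0.1],
              [1000.0, 1.0]])

# Min-Max Scaling applies x' = (x - min) / (max - min) per column.
X_scaled = MinMaxScaler().fit_transform(X)  # default feature_range=(0, 1)
print(X_scaled)  # both columns now lie in [0, 1]
```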
For datasets with multiple features, EDA often involves dimensionality reduction techniques like PCA to visualize data in two or three _______.
- Planes
- Points
- Dimensions
- Directions
Exploratory Data Analysis (EDA) often employs dimensionality reduction techniques like Principal Component Analysis (PCA) to visualize data in lower-dimensional spaces (2 or 3 dimensions) for better understanding, hence the term "dimensions."
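For instance, a brief scikit-learn sketch that projects the built-in 64-dimensional digits dataset down to two dimensions for plotting:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Sketch: reduce scikit-learn's 64-dimensional digits dataset to two
# principal components so each image becomes a point on a plane.
X, y = load_digits(return_X_y=True)          # X has shape (1797, 64)
X_2d = PCA(n_components=2).fit_transform(X)  # shape (1797, 2)
print(X_2d.shape)
```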
The operation in CNNs that combines the outputs of neuron clusters and produces a single output for the cluster is known as _______.
- Activation Function
- Pooling
- Convolution
- Fully Connected
In CNNs, the operation that combines the outputs of neuron clusters and produces a single output for the cluster is called "Pooling." Pooling reduces the spatial dimensions of the feature maps, making them smaller and more computationally efficient while retaining important features.
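A minimal NumPy sketch of 2x2 max pooling (one common pooling variant) on a made-up single-channel feature map:

```python
import numpy as np

# Sketch: 2x2 max pooling with stride 2 on a single-channel feature map.
def max_pool_2x2(fm: np.ndarray) -> np.ndarray:
    h, w = fm.shape
    # Group pixels into non-overlapping 2x2 clusters, keep each cluster's max.
    return fm[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 1, 5, 2],
               [2, 2, 3, 4]])
print(max_pool_2x2(fm))  # [[4 2]
                         #  [2 5]]
```

Each 2x2 cluster of neurons collapses to a single output, halving the height and width of the feature map.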
A healthcare organization stores patient records in a database. Each record contains structured fields like name, age, and diagnosis. Additionally, there are scanned documents and notes from doctors. Which term best describes the type of data in this healthcare database?
- Structured data
- Semi-structured data
- Unstructured data
- Big data
The healthcare database combines structured data (name, age, diagnosis) with unstructured content (scanned documents and free-text doctor's notes). A collection that mixes a defined schema with loosely organized content in this way is best described as semi-structured: it has partial structure, and elements such as documents can be tagged or indexed for better retrieval.
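As a rough illustration, a single record from such a database might look like the following hypothetical dictionary, where fixed typed fields sit alongside free text and document references:

```python
# Hypothetical patient record: structured fields plus loosely
# structured content (free-text notes, a tagged scanned document).
record = {
    "name": "Jane Doe",           # structured: fixed field, typed value
    "age": 47,                    # structured
    "diagnosis": "hypertension",  # structured
    "doctor_notes": "Patient reports headaches; follow up in 2 weeks.",
    "scan": {"file": "scan_0042.pdf", "tags": ["x-ray", "chest"]},
}
```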
When a model performs well on training data but poorly on unseen data, what issue might it be facing?
- Overfitting
- Underfitting
- Data leakage
- Bias-variance tradeoff
The model is likely overfitting. Overfitting occurs when a model learns the training data too well, including its noise, resulting in excellent performance on the training set but poor generalization to unseen data; in terms of the bias-variance tradeoff, it is a high-variance problem. Common remedies include regularization, dropout, early stopping, and collecting more training data.
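A short scikit-learn sketch on synthetic data illustrates the symptom and one remedy: the train/test accuracy gap of an unconstrained decision tree narrows once the tree is regularized:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Sketch: an unconstrained decision tree can memorize the training set,
# so a large train/test accuracy gap is the telltale sign of overfitting.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
print("unconstrained  train:", deep.score(X_tr, y_tr),
      "test:", deep.score(X_te, y_te))

# Regularizing the model (here, limiting tree depth) narrows the gap.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("max_depth=3    train:", shallow.score(X_tr, y_tr),
      "test:", shallow.score(X_te, y_te))
```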
Which type of database is ideal for handling hierarchical data and provides better scalability, MongoDB or MySQL?
- MongoDB
- MySQL
- Both MongoDB and MySQL
- Neither MongoDB nor MySQL
MongoDB is a NoSQL document database that is well suited to hierarchical data and scales horizontally more readily than MySQL. It stores records as BSON (Binary JSON) documents, so nested structures live in a single document rather than being split across joined tables, which makes it a good choice for applications that need flexibility and scalability with complex data structures.
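A brief `pymongo` sketch shows how a hierarchical record fits naturally into a single MongoDB document; the connection URI, database, and collection names are assumptions for illustration:

```python
from pymongo import MongoClient

# Sketch, assuming a MongoDB server on the default localhost port; the
# database and collection names here are hypothetical.
client = MongoClient("mongodb://localhost:27017")
patients = client["hospital"]["patients"]

# One document holds the whole hierarchy; in MySQL the nested visits and
# prescriptions would typically be separate tables joined by foreign keys.
patients.insert_one({
    "name": "Jane Doe",
    "age": 47,
    "visits": [
        {"date": "2023-03-01", "diagnosis": "hypertension",
         "prescriptions": [{"drug": "lisinopril", "dose_mg": 10}]},
    ],
})
print(patients.find_one({"name": "Jane Doe"}))
```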