While training a deep neural network, you notice that the gradients are becoming extremely small, making the weights of the initial layers change very slowly. What might be the primary cause of this issue?
- Overfitting
- Vanishing gradients due to the use of saturating activation functions
- Underfitting due to a small learning rate
- Excessive learning rate causing divergence
The primary cause of extremely small gradients in deep neural networks is the vanishing gradient problem, often caused by saturating activation functions like sigmoid or tanh. As gradients propagate backward through many layers, they are repeatedly multiplied by derivatives smaller than one and shrink toward zero, so the early layers learn very slowly. Proper weight initialization and non-saturating activation functions like ReLU can help mitigate this issue.
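The shrinkage can be seen numerically. A minimal sketch: the sigmoid derivative never exceeds 0.25, so multiplying it across 20 layers (the best case, with every pre-activation at zero) leaves almost no gradient signal, while ReLU's unit derivative preserves it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25 when z = 0

# Simulate the gradient signal flowing backward through many layers:
# at each layer it is multiplied by the activation's derivative.
layers = 20
grad = 1.0
for _ in range(layers):
    grad *= sigmoid_grad(0.0)  # 0.25, the best case for sigmoid

print(grad)  # 0.25 ** 20 -- vanishingly small

# ReLU's derivative is 1 for positive inputs, so the signal survives.
relu_grad = 1.0 ** layers
print(relu_grad)  # 1.0
```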
What is a common technique to prevent overfitting in linear regression models?
- Increasing the model complexity
- Reducing the number of features
- Regularization
- Using a smaller training dataset
Regularization is a common technique used to prevent overfitting in linear regression models. It adds a penalty term to the linear regression's cost function to discourage overly complex models. Regularization techniques include L1 (Lasso) and L2 (Ridge) regularization.
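A minimal sketch of L2 (Ridge) regularization using only NumPy: the closed-form solution adds `lam * I` to the normal equations, which shrinks the fitted coefficients toward zero compared with ordinary least squares. The data here is synthetic, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=50)

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares:
    w = (X^T X + lam * I)^{-1} X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

w_ols = ridge_fit(X, y, lam=0.0)     # ordinary least squares
w_ridge = ridge_fit(X, y, lam=10.0)  # penalty shrinks the weights

# The regularized coefficient vector has a smaller norm.
print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
```

The same idea is available off the shelf as `sklearn.linear_model.Ridge` (L2) and `Lasso` (L1).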
In which type of data do you often encounter a mix of structured tables and unstructured text?
- Structured Data
- Semi-Structured Data
- Unstructured Data
- Multivariate Data
Semi-structured data often contains a mix of structured tables and unstructured text. It's a flexible data format that can combine organized data elements with more free-form content, making it suitable for a wide range of data types and use cases, such as web data and NoSQL databases.
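A small illustration with a hypothetical JSON record (JSON being a typical semi-structured format): the numeric and list fields are directly queryable like table columns, while the free-text field would need text processing to analyze.

```python
import json

# Hypothetical product-review record: structured fields ("id", "rating",
# "tags") sit alongside unstructured free text ("review_text").
record = json.loads("""
{
  "id": 42,
  "rating": 4.5,
  "tags": ["electronics", "audio"],
  "review_text": "Great sound quality, though the battery life could be better."
}
""")

print(record["rating"])            # structured: directly queryable
print(len(record["review_text"]))  # unstructured: needs NLP to analyze
```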
In transfer learning, a model trained on a large dataset is used as a starting point, and the knowledge gained is transferred to a new, _______ task.
- Completely unrelated
- Identical
- Similar
- Smaller-scale
In transfer learning, a model trained on a large dataset is used as a starting point, and the knowledge gained is transferred to a new, similar task. This leverages the pre-trained model's knowledge to improve performance on the new task, particularly when the tasks are related.
In Data Science, when dealing with large datasets that do not fit into memory, the Python library _______ can be a useful tool for efficient computations.
- NumPy
- Pandas
- Dask
- SciPy
When working with large datasets that do not fit into memory, the Python library Dask is a useful tool for efficient computations. Dask provides parallel and distributed computing capabilities, enabling data scientists to handle larger-than-memory datasets using familiar Python tools.
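A minimal sketch of the chunked, lazy-evaluation model (assuming Dask is installed): the array below is never materialized whole; operations build a task graph, and `compute()` executes it chunk by chunk, in parallel.

```python
import dask.array as da

# A 100-million-element array defined lazily in 1M-element chunks; Dask
# only materializes one chunk at a time, so the full array never needs
# to fit in RAM.
x = da.ones((10_000, 10_000), chunks=(1_000, 1_000))

# .sum() builds a task graph; compute() runs it across the chunks.
total = x.sum().compute()
print(total)  # 100000000.0
```

`dask.dataframe` offers the same model with a pandas-like API for tabular data.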
Which layer type in a neural network is primarily responsible for feature extraction and spatial hierarchy?
- Input Layer
- Convolutional Layer
- Fully Connected Layer
- Recurrent Layer
Convolutional Layers in neural networks are responsible for feature extraction and learning spatial hierarchies, making them crucial in tasks such as image recognition. Each learned filter slides across the input and produces a feature map; stacking convolutional layers builds a hierarchy from low-level features such as edges up to textures and object parts.
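A minimal sketch of the core operation in plain NumPy (no padding, stride 1, single channel): a hand-coded vertical-edge filter applied to an image with a sharp dark-to-bright boundary responds strongly exactly at the edge, which is feature extraction in miniature.

```python
import numpy as np

def conv2d(image, kernel):
    """Minimal 'valid' 2D cross-correlation, the core operation of a
    convolutional layer (no padding, stride 1, single channel)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Image with a dark left half and bright right half.
image = np.zeros((5, 6))
image[:, 3:] = 1.0

# Sobel-style vertical-edge filter.
vertical_edge = np.array([[-1, 0, 1],
                          [-2, 0, 2],
                          [-1, 0, 1]])

feature_map = conv2d(image, vertical_edge)
print(feature_map)  # strong response only where the edge sits
```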
In time-series data, creating lag features involves using previous time steps as new _______.
- Predictors
- Observations
- Predictions
- Variables
In time-series analysis, creating lag features means using values from previous time steps as new predictors. This incorporates historical information into the model, which can be valuable for forecasting future values in time-series data.
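A minimal sketch with pandas on a hypothetical daily sales series: `shift(k)` turns the value from k steps in the past into a new predictor column, and the first rows (which have no history) are dropped before model fitting.

```python
import pandas as pd

# Hypothetical daily sales series.
ts = pd.DataFrame(
    {"sales": [10, 12, 13, 15, 14, 16]},
    index=pd.date_range("2024-01-01", periods=6, freq="D"),
)

# shift(k) makes the value from k steps back a new predictor column.
for k in (1, 2):
    ts[f"sales_lag{k}"] = ts["sales"].shift(k)

# The earliest rows have no history, so their lag features are NaN;
# drop them before fitting a model.
supervised = ts.dropna()
print(supervised)
```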
Which CNN architecture is known for its residual connections and improved training performance?
- LeNet
- VGGNet
- AlexNet
- ResNet
Residual Networks (ResNets) are known for their residual connections, which allow for easier training of very deep networks. ResNets have become a standard in deep learning due to their ability to mitigate the vanishing gradient problem, enabling the training of much deeper architectures.
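The key idea can be sketched in a few lines of NumPy (a fully connected stand-in for the convolutional blocks ResNet actually uses): the output is `relu(F(x) + x)`, and the `+ x` identity shortcut gives gradients a direct path backward through the block.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Sketch of a residual block: output = relu(F(x) + x), where F is
    two weight layers. The identity shortcut ('+ x') lets gradients
    flow straight through, easing training of very deep stacks."""
    fx = relu(x @ w1) @ w2
    return relu(fx + x)  # shortcut added before the final activation

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8))

# With zero weights, F(x) = 0 and the block reduces to relu(x): a deep
# stack of such blocks can start out close to an identity mapping,
# which is part of why ResNets train so well.
out = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
print(np.allclose(out, relu(x)))  # True
```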
In the context of outlier detection, what is the commonly used plot to visually detect outliers in a single variable?
- Box Plot
- Scatter Plot
- Histogram
- Line Chart
A Box Plot is a commonly used visualization for detecting outliers in a single variable. It summarizes the distribution with quartiles, and data points beyond the whiskers (typically 1.5 × IQR from the quartiles) are flagged as potential outliers.
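The same rule a box plot draws can be computed directly. A minimal sketch on made-up data: points beyond 1.5 × IQR from the quartiles fall outside the whiskers and are flagged.

```python
import numpy as np

data = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 102, 12, 14, 13])

# The IQR rule behind the box plot's whiskers.
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = data[(data < lower) | (data > upper)]
print(outliers)  # the value 102 falls outside the whiskers
```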
Which step in the Data Science Life Cycle is concerned with cleaning the data and handling missing values?
- Data Exploration
- Data Collection
- Data Preprocessing
- Data Visualization
Data Preprocessing is the step in the Data Science Life Cycle that involves cleaning the data, handling missing values, and preparing it for analysis. This step is crucial for ensuring the quality and reliability of the data used in subsequent analysis.
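A minimal preprocessing sketch with pandas on a small, hypothetical dataset: dropping exact duplicates, then imputing missing numeric values with the median and missing categorical values with the mode. These are illustrative choices, not the only valid ones.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with typical quality problems: a missing age,
# a missing city, and a duplicated row.
raw = pd.DataFrame({
    "age": [25, np.nan, 31, 25, 120],
    "city": ["NYC", "LA", None, "NYC", "LA"],
})

clean = raw.drop_duplicates().copy()               # remove exact duplicates
clean["age"] = clean["age"].fillna(clean["age"].median())   # median imputation
clean["city"] = clean["city"].fillna(clean["city"].mode()[0])  # mode imputation

print(clean)  # no duplicates, no missing values
```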