Which NLP model captures the context of words by representing them as vectors?
- Word2Vec
- Regular Expressions
- Decision Trees
- Linear Regression
Word2Vec is a widely used NLP model that captures word context by representing words as dense vectors in a continuous space. Because each word's embedding is learned from the words that surround it, the vectors preserve semantic relationships, making Word2Vec a powerful tool for many NLP tasks. The other options do not learn vector representations: regular expressions are rule-based pattern matchers, while decision trees and linear regression are general-purpose machine learning models, not word-representation techniques.
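As a concrete illustration, here is a minimal Word2Vec sketch using the gensim library (assuming gensim 4.x, where the embedding dimension is set with `vector_size`; the toy corpus and hyperparameters are illustrative, not tuned):

```python
# Minimal Word2Vec sketch (assumes gensim 4.x is installed).
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# Each word becomes a 50-dimensional vector; `window` controls how much
# surrounding context is used during training.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["cat"][:5])                 # first few vector components
print(model.wv.similarity("cat", "dog"))   # cosine similarity of embeddings
```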
The technique where spatial transformations are applied to input images to boost the performance and versatility of models is called _______ in computer vision.
- Edge Detection
- Data Augmentation
- Optical Flow
- Feature Extraction
Data augmentation involves applying spatial transformations to input images, such as rotation, flipping, or cropping, to increase the diversity of the training data. This technique enhances model generalization and performance.
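A typical augmentation pipeline can be sketched with torchvision transforms (assuming torchvision is installed; the specific transforms and parameters here are an example, not a prescription):

```python
# Illustrative data-augmentation pipeline for image inputs.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),   # random mirror
    transforms.RandomRotation(degrees=15),    # small random rotation
    transforms.RandomResizedCrop(size=224),   # random crop, then resize
    transforms.ToTensor(),
])
# Applying train_transform to each PIL image yields a different spatial
# variant every epoch, increasing the effective diversity of the data.
```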
In the context of recommender systems, what is the primary challenge addressed by matrix factorization techniques?
- Cold start problem
- Sparsity problem
- Scalability problem
- User diversity problem
Matrix factorization techniques primarily address the sparsity problem in recommender systems. User-item interaction data is typically very sparse, since most users rate only a small fraction of items, and matrix factorization predicts the missing entries by decomposing the observed matrix into low-rank user and item latent-factor matrices. This mitigates the sparsity challenge.
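The idea can be shown in a minimal NumPy sketch of gradient-descent matrix factorization on a toy rating matrix (values and hyperparameters are illustrative; 0 marks a missing rating and is excluded from the loss):

```python
import numpy as np

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
k, lr, reg = 2, 0.01, 0.02
rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(R.shape[0], k))  # user latent factors
Q = rng.normal(scale=0.1, size=(R.shape[1], k))  # item latent factors

for _ in range(2000):
    for u, i in zip(*R.nonzero()):               # observed entries only
        err = R[u, i] - P[u] @ Q[i]
        P[u] += lr * (err * Q[i] - reg * P[u])
        Q[i] += lr * (err * P[u] - reg * Q[i])

print(np.round(P @ Q.T, 2))  # dense predictions fill the missing cells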
One of the most popular algorithms used in collaborative filtering for recommender systems is _______.
- Apriori Algorithm
- K-Means Algorithm
- Singular Value Decomposition
- Naive Bayes Algorithm
One of the most popular algorithms used in collaborative filtering for recommender systems is Singular Value Decomposition (SVD). SVD is a matrix factorization technique that can be used to make recommendations based on user-item interactions.
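A low-rank SVD reconstruction can be sketched with NumPy on a small dense example (in practice, tools such as scikit-learn's TruncatedSVD or the Surprise package handle missing entries more carefully; this is an illustration, not a production recommender):

```python
import numpy as np

R = np.array([[5, 3, 4, 1],
              [4, 3, 4, 1],
              [1, 1, 2, 5],
              [2, 1, 5, 4]], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                   # keep the top-2 latent factors
R_hat = U[:, :k] * s[:k] @ Vt[:k, :]    # rank-k approximation

print(np.round(R_hat, 2))  # smoothed scores usable for ranking items
```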
You're tasked with performing real-time analysis on streaming data. Which programming language or tool would be most suited for this task due to its performance capabilities and extensive libraries?
- Python
- R
- Java
- Apache Spark
For real-time analysis of streaming data, Apache Spark is the strongest fit among the options. Its Structured Streaming API provides high-throughput, fault-tolerant stream processing and an extensive library ecosystem, making it well suited to handling and analyzing large volumes of data with low latency.
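A minimal PySpark Structured Streaming sketch looks like this (assuming a local Spark installation and a text source on localhost:9999, e.g. started with `nc -lk 9999`; the host and port are placeholders for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("StreamingWordCount").getOrCreate()

# Read a stream of text lines from a socket source.
lines = (spark.readStream
              .format("socket")
              .option("host", "localhost")
              .option("port", 9999)
              .load())

# Split lines into words and count them incrementally.
counts = (lines.select(F.explode(F.split(lines.value, " ")).alias("word"))
               .groupBy("word").count())

query = (counts.writeStream
               .outputMode("complete")   # re-emit full counts each batch
               .format("console")
               .start())
query.awaitTermination()
```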
In a normal distribution, approximately 95% of the data falls within _______ standard deviations of the mean.
- One
- Two
- Three
- Four
In a normal distribution, approximately 95% of the data falls within two standard deviations of the mean (more precisely, within 1.96 standard deviations). This is the middle figure of the Empirical Rule, also called the 68-95-99.7 rule, which gives the percentage of data within one, two, and three standard deviations of the mean.
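The rule is easy to verify numerically with SciPy's normal CDF:

```python
# Quick numerical check of the Empirical Rule.
from scipy.stats import norm

for k in (1, 2, 3):
    prob = norm.cdf(k) - norm.cdf(-k)  # mass within k standard deviations
    print(f"within {k} SD: {prob:.4f}")
# Prints roughly 0.6827, 0.9545, 0.9973 -- the 68-95-99.7 rule.
```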
The process of taking a pre-trained model and continuing to train its weights on new data for a specific task is known as _______ in transfer learning.
- Fine-tuning
- Warm-starting
- Model augmentation
- Zero initialization
Fine-tuning in transfer learning starts from a pre-trained model's learned weights, not just its architecture, and continues training on new data, adjusting some or all of the parameters to suit the specific task. It's a common technique for adapting large pre-trained models to custom tasks with relatively little data.
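A short fine-tuning sketch with torchvision (assuming torchvision 0.13+ for the `weights` argument; the 10-class head and the freezing choice are illustrative):

```python
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 with pre-trained ImageNet weights.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Optionally freeze the pre-trained backbone so only the new head trains.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for a hypothetical 10-class task;
# training then updates these (and any unfrozen backbone) weights.
model.fc = nn.Linear(model.fc.in_features, 10)
```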
What does the ROC in AUC-ROC stand for?
- Receiver Operating Characteristic
- Rate of Change
- Region of Convergence
- Receiver Output Curve
AUC-ROC stands for Area Under the Receiver Operating Characteristic curve. The ROC curve plots the true positive rate against the false positive rate across classification thresholds, showing how well a model distinguishes the positive from the negative class. The AUC (Area Under the Curve) summarizes this into a single number, with higher values indicating better discrimination.
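Both quantities are available in scikit-learn (the toy labels and scores below are made up for illustration):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.3]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points on the curve
print("FPR:", fpr, "TPR:", tpr)
print("AUC:", roc_auc_score(y_true, y_score))  # 1.0 = perfect, 0.5 = random
```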
In complex ETL processes, _________ can be used to ensure data quality and accuracy throughout the pipeline.
- Data modeling
- Data lineage
- Data profiling
- Data visualization
In complex ETL (Extract, Transform, Load) processes, "Data lineage" is crucial for ensuring data quality and accuracy. Data lineage helps track the origin and transformation of data, ensuring that the data remains reliable and traceable throughout the pipeline.
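As a toy illustration of the idea (the structure here is hypothetical, not the API of any specific lineage tool), each transform step can record what it did so that output rows remain traceable back through the pipeline:

```python
lineage = []

def tracked(step_name):
    """Decorator that logs row counts in and out of each pipeline step."""
    def decorator(fn):
        def wrapper(data):
            result = fn(data)
            lineage.append({"step": step_name,
                            "rows_in": len(data),
                            "rows_out": len(result)})
            return result
        return wrapper
    return decorator

@tracked("drop_nulls")
def drop_nulls(rows):
    return [r for r in rows if r.get("amount") is not None]

clean = drop_nulls([{"amount": 10}, {"amount": None}, {"amount": 7}])
print(lineage)  # [{'step': 'drop_nulls', 'rows_in': 3, 'rows_out': 2}]
```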
When normalizing a database in SQL, separating data into two tables and creating a new primary and foreign key relationship is part of the _______ normal form.
- First
- Second
- Third
- Fourth
When normalizing a database, separating data into two tables and creating a new primary and foreign key relationship is characteristic of reaching Second Normal Form (2NF). 2NF requires that the table already be in 1NF and that every non-key attribute depend on the entire primary key; attributes that depend on only part of a composite key are moved into a separate table linked by a foreign key. This is an essential step toward a fully normalized schema.
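A 2NF decomposition can be sketched with Python's built-in sqlite3 module (the table and column names are hypothetical: `customer_name` depended only on `customer_id`, part of the composite key, so it moves to its own table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Before 2NF (conceptually): orders(customer_id, product_id,
    --                                   customer_name, quantity)
    CREATE TABLE customers (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT
    );
    CREATE TABLE orders (
        customer_id INTEGER REFERENCES customers(customer_id),
        product_id  INTEGER,
        quantity    INTEGER,
        PRIMARY KEY (customer_id, product_id)
    );
""")
```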
What is one major drawback of using the sigmoid activation function in deep networks?
- Prone to vanishing gradient
- Limited to binary classification
- Efficiently handles negative values
- Non-smooth gradient behavior
One major drawback of the sigmoid activation function in deep networks is its susceptibility to the vanishing gradient problem. The function saturates for large positive or negative inputs, so its gradient becomes very small there, and multiplying many such small gradients through successive layers slows or stalls learning in deep networks.
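This is easy to see numerically: the sigmoid's derivative is s(x) * (1 - s(x)), which peaks at 0.25 and decays toward zero for large |x|:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for x in (0.0, 2.0, 5.0, 10.0):
    s = sigmoid(x)
    print(f"x={x:5.1f}  gradient={s * (1 - s):.6f}")
# The gradient at x=10 is ~4.5e-05; backpropagation through many layers
# multiplies such factors together, shrinking the signal toward zero.
```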
Which activation function is commonly used in the output layer of a binary classification neural network?
- ReLU (Rectified Linear Activation)
- Sigmoid Activation
- Tanh (Hyperbolic Tangent) Activation
- Softmax Activation
The Sigmoid activation function is commonly used in the output layer of a binary classification neural network because it maps the network's output to a probability between 0 and 1. ReLU and tanh are more commonly used in hidden layers, while softmax is the usual choice for multi-class output layers.
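A minimal binary-classifier head in PyTorch shows the pattern (the architecture is illustrative; in practice, nn.BCEWithLogitsLoss applied to a raw logit is numerically more stable than an explicit Sigmoid layer):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),       # hidden layer: ReLU is typical here
    nn.Linear(8, 1),
    nn.Sigmoid(),    # output layer: probability of the positive class
)

x = torch.randn(3, 4)       # batch of 3 samples, 4 features each
print(model(x).squeeze(1))  # three values in (0, 1)
```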