You are working on a project where you need to predict the next word in a sentence. Which type of neural network architecture would be most suitable for this task?

Convolutional Neural Network (CNN)
Recurrent Neural Network (RNN)
Long Short-Term Memory (LSTM) Network
Generative Adversarial Network (GAN)

Predicting the next word in a sentence is a sequential data problem, which makes recurrent architectures a natural fit. An LSTM is a type of recurrent neural network whose gating mechanism lets it retain information across many time steps, so it captures long-term dependencies that a plain RNN tends to lose. That makes the LSTM the most suitable of the listed options.

In the realm of Data Science, the library _______ in Python is widely used for data manipulation and cleaning.

TensorFlow
Pandas
Matplotlib
Scikit-learn

Pandas is a popular Python library for data manipulation and cleaning. It provides data structures and functions for working with structured data, which makes it a valuable tool in data science and the correct answer here.

The method where data values are shifted and rescaled to range between 0 and 1 is called _______.

Data Normalization
Data Imputation
Data Resampling
Data Transformation

The method of shifting and rescaling data values to range between 0 and 1 is known as "data normalization" (more specifically, min-max normalization): each value x is mapped to (x - min) / (max - min). It is commonly used in machine learning to put all features on the same scale, so that features with large numeric ranges do not dominate the others.
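
As a minimal sketch of the rescaling described above, assuming a plain list of numbers; the function name `min_max_normalize` and the sample values are illustrative only:

```python
def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        # Returning zeros is a convention chosen for this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [10, 20, 30, 50]  # made-up sample feature
scaled = min_max_normalize(ages)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```

In practice the minimum and maximum are usually computed on the training data only and then reused for new data.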

The _______ typically works closely with business stakeholders to understand their requirements and translate them into data-driven insights.

Data Scientist
Data Analyst
Data Engineer
Business Analyst

Data Scientists often work closely with business stakeholders to understand their requirements and translate them into data-driven insights. They use statistical and analytical techniques to derive insights that support decision-making.

In deep learning, the technique used to skip one or more layers by connecting non-adjacent layers is called _______.

Dropout
Batch Normalization
Skip Connections
Pooling

This technique is called "Skip Connections" (also known as residual connections). A skip connection feeds a layer's input directly to a later layer, bypassing the layers in between. This gives information and gradients a shorter path through the network, which makes very deep neural networks easier to train.
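
The idea can be sketched in plain Python, without a deep learning framework. The helper names (`dense`, `relu`, `residual_block`) and the choice of a single dense layer as the bypassed transformation are assumptions made for illustration:

```python
def relu(xs):
    """Element-wise ReLU activation."""
    return [max(0.0, x) for x in xs]


def dense(xs, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, xs)) + b for row, b in zip(weights, bias)]


def residual_block(xs, weights, bias):
    """Compute relu(F(x) + x), where F is one dense layer."""
    transformed = dense(xs, weights, bias)
    # The skip connection: add the block's input back onto its output.
    return relu([t + x for t, x in zip(transformed, xs)])


x = [1.0, -2.0]
zero_weights = [[0.0, 0.0], [0.0, 0.0]]
zero_bias = [0.0, 0.0]

# Even when the layer itself contributes nothing, the input still passes through.
out = residual_block(x, zero_weights, zero_bias)
print(out)  # [1.0, 0.0]
```

Because the input is added element-wise, this sketch requires the layer's output to have the same size as its input.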

Which of the following tools is typically used to manage and query relational databases in Data Science?

Excel
Hadoop
SQL (Structured Query Language)
Tableau

SQL (Structured Query Language) is a standard tool used for managing and querying relational databases. Data scientists frequently use SQL to extract, manipulate, and analyze data from these databases, making it an essential skill for working with structured data.
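
A small illustration of querying a relational table, using Python's built-in `sqlite3` module and an in-memory database. The `houses` table, its columns, and the rows are made up for this example:

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE houses (id INTEGER PRIMARY KEY, city TEXT, price REAL)")
conn.executemany(
    "INSERT INTO houses (city, price) VALUES (?, ?)",
    [("Austin", 300000.0), ("Austin", 500000.0), ("Boston", 700000.0)],
)

# Aggregate with SQL: average price per city.
rows = conn.execute(
    "SELECT city, AVG(price) FROM houses GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('Austin', 400000.0), ('Boston', 700000.0)]
conn.close()
```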

You're working on a real estate dataset where the price of the house is significantly influenced by its age and square footage. To capture this combined effect, what type of new feature could you create?

Interaction feature
Categorical feature with age groups
Time-series feature
Ordinal feature

To capture the combined effect of age and square footage on house price, you can create an interaction feature. This feature combines the two variables, typically by multiplying them, so the model can represent how they jointly affect the target variable. Interaction features are especially valuable in linear regression models, which otherwise treat each feature's effect as independent of the others.
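
For illustration, an interaction feature could be built by multiplying the two columns. The field names (`age_years`, `sqft`, `age_x_sqft`) and the sample rows are hypothetical:

```python
houses = [
    {"age_years": 5, "sqft": 2000},
    {"age_years": 40, "sqft": 2000},
    {"age_years": 5, "sqft": 900},
]

for house in houses:
    # The product lets a model weigh square footage differently by age.
    house["age_x_sqft"] = house["age_years"] * house["sqft"]

print([h["age_x_sqft"] for h in houses])  # [10000, 80000, 4500]
```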

In a traditional relational database, the data stored in a tabular format is often referred to as _______ data.

Structured Data
Unstructured Data
Semi-Structured Data
Raw Data

In a traditional relational database, the data is structured and organized in tables with a predefined schema. It's commonly referred to as "Structured Data" because it adheres to a strict structure and schema.

In which type of learning does the model discover patterns or structures without any prior labeling of data?

Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Reinforcement Learning

Unsupervised Learning is the type where the model discovers patterns or structures without prior data labeling. Common tasks in this category include clustering and dimensionality reduction, helping find hidden insights in data without any guidance.
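
Clustering can be sketched with a tiny one-dimensional k-means written in plain Python. No labels are supplied; the groups emerge from the data alone. The function name, starting centers, and data points are assumptions chosen for the example:

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its points; keep it if the cluster is empty.
        centers = [sum(c) / len(c) if c else old for c, old in zip(clusters, centers)]
    return centers, clusters


points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)   # [1.5, 10.0]
print(clusters)  # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```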

For time-series data, which variation of gradient boosting might be more appropriate?

XGBoost
AdaBoost
LightGBM
Random Forest

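Time-series data often has specific characteristics, such as seasonality and trends. Gradient-boosted trees do not model temporal order on their own, so these patterns are usually exposed to the model through engineered features such as lagged values, rolling statistics, and calendar fields. Among the options, LightGBM is a strong choice for this setup: it is a gradient boosting framework that trains quickly on large feature tables and handles categorical features, such as day of week, efficiently.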
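
One common way to prepare a series for a tree-based model is to turn past values into lag features. The sketch below shows only that preparation step in plain Python and does not call LightGBM itself; the function name `make_lag_rows` and the sales figures are hypothetical:

```python
def make_lag_rows(series, n_lags):
    """Pair each value with the n_lags values that precede it."""
    rows = []
    for t in range(n_lags, len(series)):
        features = series[t - n_lags:t]  # the previous n_lags observations
        target = series[t]               # the value to predict
        rows.append((features, target))
    return rows


sales = [10, 12, 14, 13, 15]
rows = make_lag_rows(sales, n_lags=2)
print(rows)  # [([10, 12], 14), ([12, 14], 13), ([14, 13], 15)]
```

Each row's features contain only values from before the target's time step, which avoids leaking future information into training.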
