What type of data is Spearman's correlation most suitable for?

Categorical data
Continuous, normally distributed data
Nominal data
Ordinal data

Spearman's correlation is most suitable for ordinal data. It assesses how well the relationship between two variables can be described by a monotonic function. Because it is computed on ranks rather than raw values, it works with ordinal data, where the order of the values matters but the size of the differences between them does not.
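
As a rough illustration of the rank-based idea, the sketch below computes Spearman's rho as the Pearson correlation of the ranks and compares it with plain Pearson on a monotonic but non-linear relationship. The helper names `ranks` and `spearman` and the sample values are made up for this example, and the ranking helper assumes there are no tied values.

```python
import numpy as np

def ranks(values):
    """Return 1-based ranks of the values (assumes no ties)."""
    order = np.argsort(values)
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(1, len(values) + 1)
    return result

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = x ** 3  # increases with x, but not along a straight line

pearson_r = np.corrcoef(x, y)[0, 1]
spearman_rho = spearman(x, y)

print(f"Pearson r:    {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
```

Since only the ordering enters the calculation, ordinal categories encoded as integers (for example 1 = low, 2 = medium, 3 = high) could be passed in the same way, provided ties are handled with a proper tie-aware ranking.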

Regularization techniques such as Lasso (L1) can indirectly perform feature selection by assigning a _______ coefficient to irrelevant features.

Negative
Non-zero
Positive
Zero

Lasso (L1 regularization) can indirectly perform feature selection by assigning a zero coefficient to irrelevant features. Its penalty on the absolute values of the coefficients, added to the loss function, can shrink some coefficients to exactly zero, effectively removing those features from the model. Ridge (L2 regularization) penalizes squared coefficients and shrinks them towards zero, but generally does not make them exactly zero.
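
The difference between the two penalties is easiest to see on synthetic data. The following is a minimal sketch using scikit-learn's `Lasso` and `Ridge` estimators; the data, the random seed, and the `alpha` values are arbitrary choices for illustration, not recommended settings.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n_samples = 200

# Five features; only the first two actually drive the target.
X = rng.normal(size=(n_samples, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n_samples)

lasso = Lasso(alpha=0.2).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)  # L2 penalty

print("Lasso coefficients:", np.round(lasso.coef_, 3))
print("Ridge coefficients:", np.round(ridge.coef_, 3))
print("Feature indices zeroed out by Lasso:", np.flatnonzero(lasso.coef_ == 0))
```

On data like this, Lasso is expected to set the three irrelevant coefficients to exactly zero, while Ridge leaves them small but non-zero.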

What is a correlation matrix and what is its primary purpose in Exploratory Data Analysis?

A tabular representation of the correlation between variables
A representation of missing values in the data
A representation of the data distribution
A visual representation of data clusters

A correlation matrix is a table of the correlation coefficients between pairs of variables. Each cell shows the correlation between two variables. Its primary use in EDA is to show the strength and direction of the linear relationships between variables.
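
A small sketch of how such a table might be produced with pandas is shown below. The column names and the synthetic data are invented for the example; `DataFrame.corr()` computes Pearson correlations by default.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
height = rng.normal(170, 10, size=500)

df = pd.DataFrame({
    "height_cm": height,
    # Constructed to depend linearly on height, plus some scatter.
    "weight_kg": 0.9 * height - 80 + rng.normal(0, 5, size=500),
    # Unrelated to the other two columns.
    "noise": rng.normal(size=500),
})

corr = df.corr()  # Pearson correlation for every pair of columns
print(corr.round(2))
```

The diagonal is always 1 (each variable against itself) and the matrix is symmetric, so only one triangle needs to be read.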

How can histograms be used to detect outliers?

Outliers are represented by bars that are far away from others
Outliers are represented by the shortest bars
Outliers are represented by the tallest bars
Outliers cannot be detected with histograms

In a histogram, outliers often appear as isolated bars that are separated from the main body of the distribution by a stretch of empty bins.
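
To make the "separated bars" idea concrete, here is a minimal sketch that bins some synthetic values with `numpy.histogram`, prints a text version of the histogram, and measures the longest run of empty bins between occupied ones. The data and the two injected outliers are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
# 300 typical values around 50, plus two injected outliers.
values = np.concatenate([rng.normal(50, 5, size=300), [120.0, 125.0]])

counts, edges = np.histogram(values, bins=20)

# Text rendering of the histogram, one row per bin.
for count, left, right in zip(counts, edges[:-1], edges[1:]):
    print(f"{left:7.1f} to {right:7.1f} | {'#' * int(count)}")

# Occupied bins that sit far from the others show up as a long empty run.
occupied = np.flatnonzero(counts)
largest_gap = int(np.diff(occupied).max()) - 1
print("Longest run of empty bins between bars:", largest_gap)
```

A long empty run followed by one or two short bars is the visual signature described above; the choice of bin count affects how clearly it shows.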

You have a dataset in which a few outliers are caused by measurement errors. Which method would be appropriate for handling these outliers?

Binning
Removal
Transformation

Outliers caused by measurement errors carry no meaningful information and can mislead the analysis, so removal is appropriate in this case.
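
One common way to carry out the removal is the 1.5 × IQR rule, sketched below. The readings are hypothetical body temperatures in °C with two values that look like data-entry mistakes; the IQR rule is only one possible criterion, and a known valid range for the instrument would work as well.

```python
import numpy as np

# Hypothetical temperature readings in °C; 3.7 and 370.0 look like typos.
readings = np.array([36.6, 36.8, 37.1, 3.7, 36.9, 37.0, 370.0, 36.7, 36.5, 37.2])

q1, q3 = np.percentile(readings, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

keep = (readings >= lower) & (readings <= upper)
cleaned = readings[keep]
removed = readings[~keep]

print("Kept:   ", cleaned)
print("Removed:", removed)
```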

You have found that your dataset has a high degree of multicollinearity. What steps would you consider to rectify this issue?

Add more data points
Increase the model bias
Increase the model complexity
Use Principal Component Analysis (PCA)

One way to rectify multicollinearity is to use Principal Component Analysis (PCA). PCA transforms the original variables into a new set of uncorrelated variables, thereby removing multicollinearity.
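
The decorrelating effect can be checked numerically. The sketch below builds three synthetic features, two of them nearly collinear, and computes principal component scores with a plain NumPy SVD; the data and variable names are assumptions for illustration, and a library implementation such as scikit-learn's `PCA` could be used instead.

```python
import numpy as np

rng = np.random.default_rng(7)
x1 = rng.normal(size=300)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=300)  # nearly collinear with x1
x3 = rng.normal(size=300)
X = np.column_stack([x1, x2, x3])

# Standardize, then take the principal directions from the SVD.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
scores = Xs @ Vt.T  # data expressed in principal component coordinates

feature_corr = np.corrcoef(X, rowvar=False)
score_corr = np.corrcoef(scores, rowvar=False)
explained = S**2 / np.sum(S**2)

print("Correlation of original features:\n", np.round(feature_corr, 3))
print("Correlation of PCA scores:\n", np.round(score_corr, 3))
print("Share of variance per component:", np.round(explained, 3))
```

The off-diagonal correlations of the scores are zero up to rounding error. In practice the components with a negligible share of variance are usually dropped before modelling.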

Which of the following best describes qualitative data?

Data that can be categorized
Data that can be ordered
Data that can take any value
Data that is numerical in nature

Qualitative data refers to non-numerical information that can be categorized based on traits and characteristics. It captures information that cannot be simply expressed in numbers.

In the context of EDA, what does the concept of "data wrangling" entail?

Calculating descriptive statistics for the dataset
Cleaning, transforming, and reshaping raw data
Training and validating a machine learning model
Visualizing the data using charts and graphs

In the context of EDA, "data wrangling" involves cleaning, transforming, and reshaping raw data. This could include dealing with missing or inconsistent data, transforming variables, or restructuring data frames for easier analysis.
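
The three activities can be sketched in a few lines of pandas. The table below is a made-up example with a duplicated row, stray whitespace, numbers stored as text, and a wide layout; the column names are hypothetical.

```python
import pandas as pd

raw = pd.DataFrame({
    "name": [" Alice", "Bob ", "Bob ", "Carol"],
    "age": ["34", "29", "29", None],
    "score_2023": [88.0, 75.0, 75.0, 91.0],
    "score_2024": [90.0, None, None, 95.0],
})

tidy = (
    raw.drop_duplicates()                       # cleaning: remove the repeated row
    .assign(
        name=lambda d: d["name"].str.strip(),   # cleaning: trim whitespace
        age=lambda d: pd.to_numeric(d["age"]),  # transforming: text to numbers
    )
    # reshaping: one row per (person, year) instead of one column per year
    .melt(id_vars=["name", "age"], var_name="year", value_name="score")
    .assign(year=lambda d: d["year"].str.replace("score_", "").astype(int))
    .dropna(subset=["score"])                   # cleaning: drop missing scores
    .reset_index(drop=True)
)

print(tidy)
```

Whether missing values should be dropped, as here, or imputed depends on the analysis; dropping is used only to keep the example short.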

Which library would you typically use for creating 3D plots in Python?

Matplotlib
Pandas
Plotly
Seaborn

Matplotlib includes the 'mplot3d' toolkit, which is used for creating 3D plots. It provides three-dimensional axes and plotting functions for surfaces, scatter plots, wireframes, and other 3D charts.
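
A minimal sketch of a 3D surface plot with mplot3d follows. The surface function, the figure size, and the output file name `surface.png` are arbitrary choices for the example, and the non-interactive "Agg" backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Grid of (x, y) points and a height value for each of them.
x = np.linspace(-3, 3, 60)
y = np.linspace(-3, 3, 60)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))

fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection="3d")  # 3D axes provided by mplot3d
surface = ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
fig.colorbar(surface, shrink=0.6)
fig.savefig("surface.png", dpi=100)
```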

You have a dataset that follows a Uniform Distribution. You are asked to transform this data so it follows a Normal Distribution. How would you approach this task?

By adding a constant to each value in the dataset
By applying the Central Limit Theorem
By normalizing the dataset using min-max normalization
By squaring each value in the dataset

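Uniformly distributed data can be brought close to a Normal Distribution by applying the Central Limit Theorem, which states that the sum (or mean) of a large number of independent and identically distributed variables with finite variance tends towards a Normal Distribution, whatever the shape of the original distribution. In practice this means working with sums or means of groups of uniform values rather than with the individual values.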
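
The effect can be checked with a short simulation. In this sketch each sample is the mean of 30 independent Uniform(0, 1) draws; the group size, the number of samples, and the seed are arbitrary assumptions. The standardized means are then compared with the familiar normal proportions of roughly 68% within one standard deviation and 95% within two.

```python
import numpy as np

rng = np.random.default_rng(123)

# 10,000 samples, each the mean of 30 independent Uniform(0, 1) draws.
draws = rng.uniform(0.0, 1.0, size=(10_000, 30))
sample_means = draws.mean(axis=1)

# Theory: Uniform(0, 1) has mean 0.5 and variance 1/12,
# so the mean of 30 draws has standard deviation sqrt(1/12) / sqrt(30).
expected_std = np.sqrt(1 / 12) / np.sqrt(30)
z = (sample_means - 0.5) / expected_std

within_1sd = np.mean(np.abs(z) < 1)
within_2sd = np.mean(np.abs(z) < 2)

print(f"Mean of sample means: {sample_means.mean():.4f}")
print(f"Std of sample means:  {sample_means.std():.4f} (theory {expected_std:.4f})")
print(f"Share within 1 sd:    {within_1sd:.3f}")
print(f"Share within 2 sd:    {within_2sd:.3f}")
```

Note that the individual uniform values are not changed by this procedure; it is the aggregated quantity (the sum or mean) that is approximately normal.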
