This book is based on my five-day course, which I had the pleasure of teaching in the following Spanish cities: A Coruña, Algeciras, Alicante, Bilbao, Cáceres, Granada, Huesca, Jaén, Madrid, Málaga, Murcia, Sevilla, Valencia, Valladolid, and Zaragoza. I would like to take this opportunity to say a big thank you to all of my students: ¡un gran placer! (a great pleasure!)
This book uses the Python programming language, largely in conjunction with the scikit-learn machine learning library and pandas for data manipulation. All example notebooks use the Jupyter Notebook environment.
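As a minimal sketch of that workflow (the dataframe and column names below are made up for illustration, not taken from the book's notebooks):

```python
# Toy end-to-end example: a pandas dataframe in, a scikit-learn model out.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data, purely for demonstration
df = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "target":  [2.1, 3.9, 6.2, 8.1, 9.8]})

# Split into training and test sets, fit a model, and predict
X_train, X_test, y_train, y_test = train_test_split(
    df[["feature"]], df["target"], test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print(model.predict(X_test))
```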
Conditions of Use
This book is licensed under a Creative Commons Attribution (CC BY) license. You can download the ebook The Orange Book of Machine Learning for free.
- Title: The Orange Book of Machine Learning
- Subtitle: The essentials of making predictions using supervised regression and classification for tabular data
- Publisher: Leanpub
- Author(s): Carl McBride Ellis
- Published: 2024-07-05
- Edition: 1
- Format: eBook (PDF, EPUB, MOBI)
- Pages: 135
- Language: English
- License: CC BY
- Book Homepage: Free eBook, Errata, Code, Solutions, etc.
Contents

1 Introduction
   1.1 The X and the y
   1.2 Interpolation and curve fitting
   1.3 Errors and residuals
   1.4 Sources of uncertainty: aleatoric and epistemic
   1.5 Confidence and prediction intervals
   1.6 Explainability and interpretability
2 Statistics
   2.1 Centrality: Mean, median, and mode
   2.2 Dispersion: Variance, MAD, and quartiles
      2.2.1 Quantiles, quartiles and the interquartile range (IQR)
   2.3 Gaussian distribution: additive
      2.3.1 Tests for normality
   2.4 Chebyshev's inequality
   2.5 Galton distribution: multiplicative
   2.6 Skewness and kurtosis
3 Exploratory data analysis (EDA)
   3.1 Data quality
   3.2 Getting to know your dataframe
      3.2.1 The curse of dimensionality
      3.2.2 Descriptive statistics
   3.3 Anscombe's quartet
   3.4 Box, violin and raincloud plots
   3.5 Outliers, inliers and extreme values
   3.6 Correlation coefficients
      3.6.1 Mutual information (MI)
   3.7 Scatter plot
   3.8 Histograms and eCDF
      3.8.1 Kolmogorov-Smirnov test
   3.9 Pairplots (or not)
4 Data cleaning
   4.1 Missing values: NULL and NaN
      4.1.1 Visualization of NaN with missingno
      4.1.2 MCAR, MAR, and MNAR
      4.1.3 Global fill
      4.1.4 Global delete
      4.1.5 Average value imputation
      4.1.6 Multiple imputation
      4.1.7 Do nothing!
      4.1.8 Binary indicator column
   4.2 Outliers and inliers
      4.2.1 Outliers
      4.2.2 Inliers: Isolation forest
   4.3 Duplicated rows
   4.4 Boolean columns
   4.5 Zero variance columns
   4.6 Feature scaling: standardization and normalization
   4.7 Categorical features
      4.7.1 Ordinal
      4.7.2 Nominal
5 Cross-validation
   5.1 Train test split
   5.2 Cross-validation
   5.3 Nested cross-validation
   5.4 Data leakage
   5.5 Covariate shift and concept drift
6 Regression
   6.1 Regression baseline model
   6.2 Univariate linear regression
   6.3 Calculating θ₁ and θ₀
      6.3.1 Ordinary least squares
      6.3.2 Normal equation
      6.3.3 Scikit-learn LinearRegression
   6.4 Assumptions of linear regression
   6.5 Polynomial regression
   6.6 Extrapolation
      6.6.1 Convex hull
   6.7 Explainability
   6.8 The loss and cost functions
      6.8.1 Gradient descent
   6.9 Metrics
      6.9.1 Root mean square error (RMSE)
      6.9.2 Mean absolute error (MAE)
      6.9.3 The R² metric
   6.10 Decision tree regressor
      6.10.1 Hyperparameter: max_depth
   6.11 Overfitting
      6.11.1 Parametric models: regularization
      6.11.2 Tree models: min_samples_leaf
   6.12 Quantile regression
      6.12.1 Pinball loss function
   6.13 Conformal prediction intervals
      6.13.1 Conformalized quantile regression (CQR)
      6.13.2 Locally-weighted conformal regression
      6.13.3 Prediction interval metric: Winkler interval score
   6.14 Summary
7 Classification
   7.1 Logistic regression
      7.1.1 Explainability
   7.2 Log-loss function
   7.3 Decision tree classifier
   7.4 Classification baseline model
   7.5 Classification metrics
      7.5.1 Strictly proper scoring rules
      7.5.2 Accuracy score
      7.5.3 Confusion matrix
      7.5.4 Precision and recall
      7.5.5 Decision threshold
      7.5.6 AUC ROC
   7.6 Imbalanced classification
      7.6.1 What to do about imbalanced data?
   7.7 Overfitting
   7.8 No free lunch theorem
   7.9 Classifier calibration
      7.9.1 Reliability diagrams
      7.9.2 Venn-ABERS calibration
   7.10 Multiclass classification
      7.10.1 Multiclass metrics
8 Ensemble estimators
   8.1 Random Forest
      8.1.1 Bootstrapping: row subsampling with replacement
      8.1.2 Feature subsampling
      8.1.3 Results
   8.2 Weak learners and boosting
      8.2.1 AdaBoost (Adaptive Boosting)
   8.3 Gradient boosted decision trees (GBDT)
      8.3.1 Extrapolation
   8.4 Convex combination of model predictions (CCMP)
   8.5 Stacking
9 Hyperparameter optimization
10 Feature engineering and selection
   10.1 Feature engineering
      10.1.1 Interaction and cross features
      10.1.2 Bucketing of continuous features
      10.1.3 Power transforms: Yeo-Johnson
      10.1.4 User defined transform
      10.1.5 External secondary features
   10.2 Feature selection
      10.2.1 Correlation
      10.2.2 Permutation importance
      10.2.3 Stepwise regression
      10.2.4 LASSO
      10.2.5 Boruta trick
      10.2.6 Native feature importance plots
   10.3 Principal component analysis (PCA)
11 Why no neural networks/deep learning?
      11.0.1 Single neuron regressor
      11.0.2 Single neuron classifier
Essential reading