Mathematics for Machine Learning

The fundamental mathematical tools needed to understand machine learning include linear algebra, analytic geometry, matrix decompositions, vector calculus, optimization, probability, and statistics. These topics are traditionally taught in disparate courses, making it hard for data science or computer science students, or professionals, to learn the mathematics efficiently. This self-contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models, and support vector machines. For students and others with a mathematical background, these derivations provide a starting point for reading machine learning texts. For those learning the mathematics for the first time, the methods help build intuition and practical experience with applying mathematical concepts. Every chapter includes worked examples and exercises to test understanding. Programming tutorials are offered on the book's website.

Conditions of Use

This book is licensed under a Creative Commons License (CC BY-NC-SA). You can download the ebook Mathematics for Machine Learning for free.

Title: Mathematics for Machine Learning
Publisher: Cambridge University Press
Author(s): Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong
Published: 2020-04-23
Edition: 1
Format: eBook (pdf, epub, mobi)
Pages: 390
Language: English
ISBN-10: 110845514X
ISBN-13: 9781108455145
License: CC BY-NC-SA
Book Homepage: Free eBook, Errata, Code, Solutions, etc.

Foreword
Part I Mathematical Foundations
	1 Introduction and Motivation
		1.1 Finding Words for Intuitions
		1.2 Two Ways to Read This Book
		1.3 Exercises and Feedback
	2 Linear Algebra
		2.1 Systems of Linear Equations
		2.2 Matrices
		2.3 Solving Systems of Linear Equations
		2.4 Vector Spaces
		2.5 Linear Independence
		2.6 Basis and Rank
		2.7 Linear Mappings
		2.8 Affine Spaces
		2.9 Further Reading
		Exercises
	3 Analytic Geometry
		3.1 Norms
		3.2 Inner Products
		3.3 Lengths and Distances
		3.4 Angles and Orthogonality
		3.5 Orthonormal Basis
		3.6 Orthogonal Complement
		3.7 Inner Product of Functions
		3.8 Orthogonal Projections
		3.9 Rotations
		3.10 Further Reading
		Exercises
	4 Matrix Decompositions
		4.1 Determinant and Trace
		4.2 Eigenvalues and Eigenvectors
		4.3 Cholesky Decomposition
		4.4 Eigendecomposition and Diagonalization
		4.5 Singular Value Decomposition
		4.6 Matrix Approximation
		4.7 Matrix Phylogeny
		4.8 Further Reading
		Exercises
	5 Vector Calculus
		5.1 Differentiation of Univariate Functions
		5.2 Partial Differentiation and Gradients
		5.3 Gradients of Vector-Valued Functions
		5.4 Gradients of Matrices
		5.5 Useful Identities for Computing Gradients
		5.6 Backpropagation and Automatic Differentiation
		5.7 Higher-Order Derivatives
		5.8 Linearization and Multivariate Taylor Series
		5.9 Further Reading
		Exercises
	6 Probability and Distributions
		6.1 Construction of a Probability Space
		6.2 Discrete and Continuous Probabilities
		6.3 Sum Rule, Product Rule, and Bayes' Theorem
		6.4 Summary Statistics and Independence
		6.5 Gaussian Distribution
		6.6 Conjugacy and the Exponential Family
		6.7 Change of Variables/Inverse Transform
		6.8 Further Reading
		Exercises
	7 Continuous Optimization
		7.1 Optimization Using Gradient Descent
		7.2 Constrained Optimization and Lagrange Multipliers
		7.3 Convex Optimization
		7.4 Further Reading
		Exercises
Part II Central Machine Learning Problems
	8 When Models Meet Data
		8.1 Data, Models, and Learning
		8.2 Empirical Risk Minimization
		8.3 Parameter Estimation
		8.4 Probabilistic Modeling and Inference
		8.5 Directed Graphical Models
		8.6 Model Selection
	9 Linear Regression
		9.1 Problem Formulation
		9.2 Parameter Estimation
		9.3 Bayesian Linear Regression
		9.4 Maximum Likelihood as Orthogonal Projection
		9.5 Further Reading
	10 Dimensionality Reduction with Principal Component Analysis
		10.1 Problem Setting
		10.2 Maximum Variance Perspective
		10.3 Projection Perspective
		10.4 Eigenvector Computation and Low-Rank Approximations
		10.5 PCA in High Dimensions
		10.6 Key Steps of PCA in Practice
		10.7 Latent Variable Perspective
		10.8 Further Reading
	11 Density Estimation with Gaussian Mixture Models
		11.1 Gaussian Mixture Model
		11.2 Parameter Learning via Maximum Likelihood
		11.3 EM Algorithm
		11.4 Latent-Variable Perspective
		11.5 Further Reading
	12 Classification with Support Vector Machines
		12.1 Separating Hyperplanes
		12.2 Primal Support Vector Machine
		12.3 Dual Support Vector Machine
		12.4 Kernels
		12.5 Numerical Solution
		12.6 Further Reading
References
Index