The fundamental mathematical tools needed to understand machine learning include linear algebra, analytic geometry, matrix decompositions, vector calculus, optimization, probability and statistics. These topics are traditionally taught in disparate courses, making it hard for data science or computer science students, or professionals, to learn the mathematics efficiently. This self-contained textbook bridges the gap between mathematical and machine learning texts, introducing the mathematical concepts with a minimum of prerequisites. It uses these concepts to derive four central machine learning methods: linear regression, principal component analysis, Gaussian mixture models and support vector machines. For students and others with a mathematical background, these derivations provide a starting point for reading machine learning texts. For those learning the mathematics for the first time, the methods help build intuition and practical experience with applying mathematical concepts. Every chapter includes worked examples and exercises to test understanding. Programming tutorials are offered on the book's website.
Conditions of Use
This book is licensed under a Creative Commons License (CC BY-NC-SA). You can download the ebook Mathematics for Machine Learning for free.
- Title: Mathematics for Machine Learning
- Publisher: Cambridge University Press
- Author(s): A. Aldo Faisal, Cheng Soon Ong, Marc Peter Deisenroth
- Published: 2020-04-23
- Edition: 1
- Format: eBook (pdf, epub, mobi)
- Pages: 390
- Language: English
- ISBN-10: 110845514X
- ISBN-13: 9781108455145
- License: CC BY-NC-SA
- Book Homepage: Free eBook, Errata, Code, Solutions, etc.
Foreword
Part I Mathematical Foundations
1 Introduction and Motivation
- 1.1 Finding Words for Intuitions
- 1.2 Two Ways to Read This Book
- 1.3 Exercises and Feedback
2 Linear Algebra
- 2.1 Systems of Linear Equations
- 2.2 Matrices
- 2.3 Solving Systems of Linear Equations
- 2.4 Vector Spaces
- 2.5 Linear Independence
- 2.6 Basis and Rank
- 2.7 Linear Mappings
- 2.8 Affine Spaces
- 2.9 Further Reading
- Exercises
3 Analytic Geometry
- 3.1 Norms
- 3.2 Inner Products
- 3.3 Lengths and Distances
- 3.4 Angles and Orthogonality
- 3.5 Orthonormal Basis
- 3.6 Orthogonal Complement
- 3.7 Inner Product of Functions
- 3.8 Orthogonal Projections
- 3.9 Rotations
- 3.10 Further Reading
- Exercises
4 Matrix Decompositions
- 4.1 Determinant and Trace
- 4.2 Eigenvalues and Eigenvectors
- 4.3 Cholesky Decomposition
- 4.4 Eigendecomposition and Diagonalization
- 4.5 Singular Value Decomposition
- 4.6 Matrix Approximation
- 4.7 Matrix Phylogeny
- 4.8 Further Reading
- Exercises
5 Vector Calculus
- 5.1 Differentiation of Univariate Functions
- 5.2 Partial Differentiation and Gradients
- 5.3 Gradients of Vector-Valued Functions
- 5.4 Gradients of Matrices
- 5.5 Useful Identities for Computing Gradients
- 5.6 Backpropagation and Automatic Differentiation
- 5.7 Higher-Order Derivatives
- 5.8 Linearization and Multivariate Taylor Series
- 5.9 Further Reading
- Exercises
6 Probability and Distributions
- 6.1 Construction of a Probability Space
- 6.2 Discrete and Continuous Probabilities
- 6.3 Sum Rule, Product Rule, and Bayes' Theorem
- 6.4 Summary Statistics and Independence
- 6.5 Gaussian Distribution
- 6.6 Conjugacy and the Exponential Family
- 6.7 Change of Variables/Inverse Transform
- 6.8 Further Reading
- Exercises
7 Continuous Optimization
- 7.1 Optimization Using Gradient Descent
- 7.2 Constrained Optimization and Lagrange Multipliers
- 7.3 Convex Optimization
- 7.4 Further Reading
- Exercises
Part II Central Machine Learning Problems
8 When Models Meet Data
- 8.1 Data, Models, and Learning
- 8.2 Empirical Risk Minimization
- 8.3 Parameter Estimation
- 8.4 Probabilistic Modeling and Inference
- 8.5 Directed Graphical Models
- 8.6 Model Selection
9 Linear Regression
- 9.1 Problem Formulation
- 9.2 Parameter Estimation
- 9.3 Bayesian Linear Regression
- 9.4 Maximum Likelihood as Orthogonal Projection
- 9.5 Further Reading
10 Dimensionality Reduction with Principal Component Analysis
- 10.1 Problem Setting
- 10.2 Maximum Variance Perspective
- 10.3 Projection Perspective
- 10.4 Eigenvector Computation and Low-Rank Approximations
- 10.5 PCA in High Dimensions
- 10.6 Key Steps of PCA in Practice
- 10.7 Latent Variable Perspective
- 10.8 Further Reading
11 Density Estimation with Gaussian Mixture Models
- 11.1 Gaussian Mixture Model
- 11.2 Parameter Learning via Maximum Likelihood
- 11.3 EM Algorithm
- 11.4 Latent-Variable Perspective
- 11.5 Further Reading
12 Classification with Support Vector Machines
- 12.1 Separating Hyperplanes
- 12.2 Primal Support Vector Machine
- 12.3 Dual Support Vector Machine
- 12.4 Kernels
- 12.5 Numerical Solution
- 12.6 Further Reading
References
Index