This introductory textbook in undergraduate probability emphasizes the inseparability between data (computing) and probability (theory) in our time. It examines the motivation, intuition, and implication of the probabilistic tools used in science and engineering:
- Motivation: In the ocean of mathematical definitions, theorems, and equations, why should we spend our time on this particular topic but not another?
- Intuition: When going through the derivations, is there a geometric interpretation or physics beyond those equations?
- Implication: After we have learned a topic, what new problems can we solve?
Conditions of Use
This book is licensed under a Creative Commons License (CC BY-NC-SA). You can download the ebook Introduction to Probability for Data Science for free.
- Title: Introduction to Probability for Data Science
- Publisher: Michigan Publishing Services
- Author(s): Stanley Chan
- Published: 2021-11-05
- Edition: 1
- Format: eBook (pdf, epub, mobi)
- Pages: 704
- Language: English
- ISBN-10: 1607857464
- ISBN-13: 9781607857464
- License: CC BY-NC-SA
- Book Homepage: Free eBook, Errata, Code, Solutions, etc.
Table of Contents
- Mathematical Background
  - Infinite Series: Geometric Series; Binomial Series
  - Approximation: Taylor approximation; Exponential series; Logarithmic approximation
  - Integration: Odd and even functions; Fundamental Theorem of Calculus
  - Linear Algebra: Why do we need linear algebra in data science?; Everything you need to know about linear algebra; Inner products and norms; Matrix calculus
  - Basic Combinatorics: Birthday paradox; Permutation; Combination
  - Summary
  - Reference
  - Problems
- Probability
  - Set Theory: Why study set theory?; Basic concepts of a set; Subsets; Empty set and universal set; Union; Intersection; Complement and difference; Disjoint and partition; Set operations; Closing remarks about set theory
  - Probability Space: Sample space; Event space F; Probability law P; Measure zero sets; Summary of the probability space
  - Axioms of Probability: Why these three probability axioms?; Axioms through the lens of measure; Corollaries derived from the axioms
  - Conditional Probability: Definition of conditional probability; Independence; Bayes' theorem and the law of total probability; The Three Prisoners problem
  - Summary
  - References
  - Problems
- Discrete Random Variables
  - Random Variables: A motivating example; Definition of a random variable; Probability measure on random variables
  - Probability Mass Function: Definition of probability mass function; PMF and probability measure; Normalization property; PMF versus histogram; Estimating histograms from real data
  - Cumulative Distribution Functions (Discrete): Definition of the cumulative distribution function; Properties of the CDF; Converting between PMF and CDF
  - Expectation: Definition of expectation; Existence of expectation; Properties of expectation; Moments and variance
  - Common Discrete Random Variables: Bernoulli random variable; Binomial random variable; Geometric random variable; Poisson random variable
  - Summary
  - References
  - Problems
- Continuous Random Variables
  - Probability Density Function: Some intuitions about probability density functions; More in-depth discussion about PDFs; Connecting with the PMF
  - Expectation, Moment, and Variance: Definition and properties; Existence of expectation; Moment and variance
  - Cumulative Distribution Function: CDF for continuous random variables; Properties of CDF; Retrieving PDF from CDF; CDF: Unifying discrete and continuous random variables
  - Median, Mode, and Mean: Median; Mode; Mean
  - Uniform and Exponential Random Variables: Uniform random variables; Exponential random variables; Origin of exponential random variables; Applications of exponential random variables
  - Gaussian Random Variables: Definition of a Gaussian random variable; Standard Gaussian; Skewness and kurtosis; Origin of Gaussian random variables
  - Functions of Random Variables: General principle; Examples
  - Generating Random Numbers: General principle; Examples
  - Summary
  - Reference
  - Problems
- Joint Distributions
  - Joint PMF and Joint PDF: Probability measure in 2D; Discrete random variables; Continuous random variables; Normalization; Marginal PMF and marginal PDF; Independent random variables; Joint CDF
  - Joint Expectation: Definition and interpretation; Covariance and correlation coefficient; Independence and correlation; Computing correlation from data
  - Conditional PMF and PDF: Conditional PMF; Conditional PDF
  - Conditional Expectation: Definition; The law of total expectation
  - Sum of Two Random Variables: Intuition through convolution; Main result; Sum of common distributions
  - Random Vectors and Covariance Matrices: PDF of random vectors; Expectation of random vectors; Covariance matrix; Multidimensional Gaussian
  - Transformation of Multidimensional Gaussians: Linear transformation of mean and covariance; Eigenvalues and eigenvectors; Covariance matrices are always positive semi-definite; Gaussian whitening
  - Principal-Component Analysis: The main idea: Eigendecomposition; The eigenface problem; What cannot be analyzed by PCA?
  - Summary
  - References
  - Problems
- Sample Statistics
  - Moment-Generating and Characteristic Functions: Moment-generating function; Sum of independent variables via MGF; Characteristic functions
  - Probability Inequalities: Union bound; The Cauchy-Schwarz inequality; Jensen's inequality; Markov's inequality; Chebyshev's inequality; Chernoff's bound; Comparing Chernoff and Chebyshev; Hoeffding's inequality
  - Law of Large Numbers: Sample average; Weak law of large numbers (WLLN); Convergence in probability; Can we prove WLLN using Chernoff's bound?; Does the weak law of large numbers always hold?; Strong law of large numbers; Almost sure convergence; Proof of the strong law of large numbers
  - Central Limit Theorem: Convergence in distribution; Central Limit Theorem; Examples; Limitation of the Central Limit Theorem
  - Summary
  - References
  - Problems
- Regression
  - Principles of Regression: Intuition: How to fit a straight line?; Solving the linear regression problem; Extension: Beyond a straight line; Overdetermined and underdetermined systems; Robust linear regression
  - Overfitting: Overview of overfitting; Analysis of the linear case; Interpreting the linear analysis results
  - Bias and Variance Trade-Off: Decomposing the testing error; Analysis of the bias; Variance; Bias and variance on the learning curve
  - Regularization: Ridge regularization; LASSO regularization
  - Summary
  - References
  - Problems
- Estimation
  - Maximum-Likelihood Estimation: Likelihood function; Maximum-likelihood estimate; Application 1: Social network analysis; Application 2: Reconstructing images; More examples of ML estimation; Regression versus ML estimation
  - Properties of ML Estimates: Estimators; Unbiased estimators; Consistent estimators; Invariance principle
  - Maximum A Posteriori Estimation: The trio of likelihood, prior, and posterior; Understanding the priors; MAP formulation and solution; Analyzing the MAP solution; Analysis of the posterior distribution; Conjugate prior; Linking MAP with regression
  - Minimum Mean-Square Estimation: Positioning the minimum mean-square estimation; Mean squared error; MMSE estimate = conditional expectation; MMSE estimator for multidimensional Gaussian; Linking MMSE and neural networks
  - Summary
  - References
  - Problems
- Confidence and Hypothesis
  - Confidence Interval: The randomness of an estimator; Understanding confidence intervals; Constructing a confidence interval; Properties of the confidence interval; Student's t-distribution; Comparing Student's t-distribution and Gaussian
  - Bootstrapping: A brute force approach; Bootstrapping
  - Hypothesis Testing: What is a hypothesis?; Critical-value test; p-value test; Z-test and T-test
  - Neyman-Pearson Test: Null and alternative distributions; Type 1 and type 2 errors; Neyman-Pearson decision
  - ROC and Precision-Recall Curve: Receiver Operating Characteristic (ROC); Comparing ROC curves; The ROC curve in practice; The Precision-Recall (PR) curve
  - Summary
  - Reference
  - Problems
- Random Processes
  - Basic Concepts: Everything you need to know about a random process; Statistical and temporal perspectives
  - Mean and Correlation Functions: Mean function; Autocorrelation function; Independent processes
  - Wide-Sense Stationary Processes: Definition of a WSS process; Properties of R_X(τ); Physical interpretation of R_X(τ)
  - Power Spectral Density: Basic concepts; Origin of the power spectral density
  - WSS Process through LTI Systems: Review of linear time-invariant systems; Mean and autocorrelation through LTI Systems; Power spectral density through LTI systems; Cross-correlation through LTI Systems
  - Optimal Linear Filter: Discrete-time random processes; Problem formulation; Yule-Walker equation; Linear prediction; Wiener filter
  - Summary
  - Appendix: The Mean-Square Ergodic Theorem
  - References
  - Problems
- Appendix