Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities in 70 countries, including Stanford, MIT, Harvard, and Cambridge.
Deep learning has revolutionized pattern recognition, introducing tools that power a wide range of technologies in such diverse fields as computer vision, natural language processing, and automatic speech recognition. Applying deep learning requires that you simultaneously understand how to cast a problem, the basic mathematics of modeling, the algorithms for fitting models to data, and the engineering techniques needed to implement it all. This book is a comprehensive resource that makes deep learning approachable, while still providing sufficient technical depth to enable engineers, scientists, and students to use deep learning in their own work. No previous background in machine learning or deep learning is required: every concept is explained from scratch, and the appendix provides a refresher on the mathematics needed. Runnable code is featured throughout, allowing you to develop your own intuition by putting key ideas into practice.
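To give a flavor of the hands-on style described above, here is a minimal sketch (not taken from the book; the synthetic data, hyperparameter values, and variable names are illustrative) of the kind of first exercise the book walks through: fitting a linear regression from scratch with minibatch stochastic gradient descent.

```python
import random

random.seed(0)

# Synthetic data: y = 2*x1 - 3.4*x2 + 4.2 + noise (illustrative values)
w_true, b_true = [2.0, -3.4], 4.2
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(1000)]
y = [w_true[0] * x1 + w_true[1] * x2 + b_true + random.gauss(0, 0.01)
     for x1, x2 in X]

w, b = [0.0, 0.0], 0.0      # parameters to learn
lr, batch_size = 0.1, 32    # learning rate and minibatch size

for epoch in range(10):
    order = list(range(len(X)))
    random.shuffle(order)                       # fresh minibatches each epoch
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        # Accumulate the gradient of the mean squared loss over the minibatch
        gw, gb = [0.0, 0.0], 0.0
        for i in batch:
            err = w[0] * X[i][0] + w[1] * X[i][1] + b - y[i]
            gw[0] += err * X[i][0]
            gw[1] += err * X[i][1]
            gb += err
        n = len(batch)
        w[0] -= lr * gw[0] / n                  # gradient step
        w[1] -= lr * gw[1] / n
        b -= lr * gb / n

print(w, b)  # should end up close to w_true and b_true
```

The book's own versions of this exercise use framework tensors and automatic differentiation (it provides PyTorch, TensorFlow, JAX, and MXNet implementations) rather than hand-coded gradients as above.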
Conditions of Use
This book is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA) license; the ebook Dive into Deep Learning can be downloaded for free.
- Title: Dive into Deep Learning
- Publisher: Cambridge University Press
- Authors: Alexander J. Smola, Aston Zhang, Mu Li, Zachary C. Lipton
- Published: 2024-02-01
- Edition: 1
- Format: eBook (PDF, EPUB, MOBI)
- Pages: 574
- Language: English
- ISBN-10: 1009389432
- ISBN-13: 9781009389433
- License: CC BY-NC-SA
- Book Homepage: Free eBook, Errata, Code, Solutions, etc.
Table of Contents

- Preface
- Installation
- Notation
- Introduction: A Motivating Example Key Components Data Models Objective Functions Optimization Algorithms Kinds of Machine Learning Problems Supervised Learning Regression Classification Tagging Search Recommender Systems Sequence Learning Unsupervised and Self-Supervised Learning Interacting with an Environment Reinforcement Learning Roots The Road to Deep Learning Success Stories The Essence of Deep Learning Summary Exercises
- Preliminaries: Data Manipulation Getting Started Indexing and Slicing Operations Broadcasting Saving Memory Conversion to Other Python Objects Summary Exercises Data Preprocessing Reading the Dataset Data Preparation Conversion to the Tensor Format Discussion Exercises Linear Algebra Scalars Vectors Matrices Tensors Basic Properties of Tensor Arithmetic Reduction Non-Reduction Sum Dot Products Matrix–Vector Products Matrix–Matrix Multiplication Norms Discussion Exercises Calculus Derivatives and Differentiation Visualization Utilities Partial Derivatives and Gradients Chain Rule Discussion Exercises Automatic Differentiation A Simple Function Backward for Non-Scalar Variables Detaching Computation Gradients and Python Control Flow Discussion Exercises Probability and Statistics A Simple Example: Tossing Coins A More Formal Treatment Random Variables Multiple Random Variables An Example Expectations Discussion Exercises Documentation Functions and Classes in a Module Specific Functions and Classes
- Linear Neural Networks for Regression: Linear Regression Basics Model Loss Function Analytic Solution Minibatch Stochastic Gradient Descent Predictions Vectorization for Speed The Normal Distribution and Squared Loss Linear Regression as a Neural Network Biology Summary Exercises Object-Oriented Design for Implementation Utilities Models Data Training Summary Exercises Synthetic Regression Data Generating the Dataset Reading the Dataset Concise Implementation of the Data Loader Summary Exercises Linear Regression Implementation from Scratch Defining the Model Defining the Loss Function Defining the Optimization Algorithm Training Summary Exercises Concise Implementation of Linear Regression Defining the Model Defining the Loss Function Defining the Optimization Algorithm Training Summary Exercises Generalization Training Error and Generalization Error Model Complexity Underfitting or Overfitting? Polynomial Curve Fitting Dataset Size Model Selection Cross-Validation Summary Exercises Weight Decay Norms and Weight Decay High-Dimensional Linear Regression Implementation from Scratch Defining ℓ2 Norm Penalty Defining the Model Training without Regularization Using Weight Decay Concise Implementation Summary Exercises
- Linear Neural Networks for Classification: Softmax Regression Classification Linear Model The Softmax Vectorization Loss Function Log-Likelihood Softmax and Cross-Entropy Loss Information Theory Basics Entropy Surprisal Cross-Entropy Revisited Summary and Discussion Exercises The Image Classification Dataset Loading the Dataset Reading a Minibatch Visualization Summary Exercises The Base Classification Model The Classifier Class Accuracy Summary Exercises Softmax Regression Implementation from Scratch The Softmax The Model The Cross-Entropy Loss Training Prediction Summary Exercises Concise Implementation of Softmax Regression Defining the Model Softmax Revisited Training Summary Exercises Generalization in Classification The Test Set Test Set Reuse Statistical Learning Theory Summary Exercises Environment and Distribution Shift Types of Distribution Shift Covariate Shift Label Shift Concept Shift Examples of Distribution Shift Medical Diagnostics Self-Driving Cars Nonstationary Distributions More Anecdotes Correction of Distribution Shift Empirical Risk and Risk Covariate Shift Correction Label Shift Correction Concept Shift Correction A Taxonomy of Learning Problems Batch Learning Online Learning Bandits Control Reinforcement Learning Considering the Environment Fairness, Accountability, and Transparency in Machine Learning Summary Exercises
- Multilayer Perceptrons: Multilayer Perceptrons Hidden Layers Limitations of Linear Models Incorporating Hidden Layers From Linear to Nonlinear Universal Approximators Activation Functions ReLU Function Sigmoid Function Tanh Function Summary and Discussion Exercises Implementation of Multilayer Perceptrons Implementation from Scratch Initializing Model Parameters Model Training Concise Implementation Model Training Summary Exercises Forward Propagation, Backward Propagation, and Computational Graphs Forward Propagation Computational Graph of Forward Propagation Backpropagation Training Neural Networks Summary Exercises Numerical Stability and Initialization Vanishing and Exploding Gradients Vanishing Gradients Exploding Gradients Breaking the Symmetry Parameter Initialization Default Initialization Xavier Initialization Beyond Summary Exercises Generalization in Deep Learning Revisiting Overfitting and Regularization Inspiration from Nonparametrics Early Stopping Classical Regularization Methods for Deep Networks Summary Exercises Dropout Dropout in Practice Implementation from Scratch Defining the Model Training Concise Implementation Summary Exercises Predicting House Prices on Kaggle Downloading Data Kaggle Accessing and Reading the Dataset Data Preprocessing Error Measure K-Fold Cross-Validation Model Selection Submitting Predictions on Kaggle Summary and Discussion Exercises
- Builders' Guide: Layers and Modules A Custom Module The Sequential Module Executing Code in the Forward Propagation Method Summary Exercises Parameter Management Parameter Access Targeted Parameters All Parameters at Once Tied Parameters Summary Exercises Parameter Initialization Built-in Initialization Custom Initialization Summary Exercises Lazy Initialization Summary Exercises Custom Layers Layers without Parameters Layers with Parameters Summary Exercises File I/O Loading and Saving Tensors Loading and Saving Model Parameters Summary Exercises GPUs Computing Devices Tensors and GPUs Storage on the GPU Copying Side Notes Neural Networks and GPUs Summary Exercises
- Convolutional Neural Networks: From Fully Connected Layers to Convolutions Invariance Constraining the MLP Translation Invariance Locality Convolutions Channels Summary and Discussion Exercises Convolutions for Images The Cross-Correlation Operation Convolutional Layers Object Edge Detection in Images Learning a Kernel Cross-Correlation and Convolution Feature Map and Receptive Field Summary Exercises Padding and Stride Padding Stride Summary and Discussion Exercises Multiple Input and Multiple Output Channels Multiple Input Channels Multiple Output Channels 1×1 Convolutional Layer Discussion Exercises Pooling Maximum Pooling and Average Pooling Padding and Stride Multiple Channels Summary Exercises Convolutional Neural Networks (LeNet) LeNet Training Summary Exercises
- Modern Convolutional Neural Networks: Deep Convolutional Neural Networks (AlexNet) Representation Learning Missing Ingredient: Data Missing Ingredient: Hardware AlexNet Architecture Activation Functions Capacity Control and Preprocessing Training Discussion Exercises Networks Using Blocks (VGG) VGG Blocks VGG Network Training Summary Exercises Network in Network (NiN) NiN Blocks NiN Model Training Summary Exercises Multi-Branch Networks (GoogLeNet) Inception Blocks GoogLeNet Model Training Discussion Exercises Batch Normalization Training Deep Networks Batch Normalization Layers Fully Connected Layers Convolutional Layers Layer Normalization Batch Normalization During Prediction Implementation from Scratch LeNet with Batch Normalization Concise Implementation Discussion Exercises Residual Networks (ResNet) and ResNeXt Function Classes Residual Blocks ResNet Model Training ResNeXt Summary and Discussion Exercises Densely Connected Networks (DenseNet) From ResNet to DenseNet Dense Blocks Transition Layers DenseNet Model Training Summary and Discussion Exercises Designing Convolution Network Architectures The AnyNet Design Space Distributions and Parameters of Design Spaces RegNet Training Discussion Exercises
- Recurrent Neural Networks: Working with Sequences Autoregressive Models Sequence Models Markov Models The Order of Decoding Training Prediction Summary Exercises Converting Raw Text into Sequence Data Reading the Dataset Tokenization Vocabulary Putting It All Together Exploratory Language Statistics Summary Exercises Language Models Learning Language Models Markov Models and n-grams Word Frequency Laplace Smoothing Perplexity Partitioning Sequences Summary and Discussion Exercises Recurrent Neural Networks Neural Networks without Hidden States Recurrent Neural Networks with Hidden States RNN-Based Character-Level Language Models Summary Exercises Recurrent Neural Network Implementation from Scratch RNN Model RNN-Based Language Model One-Hot Encoding Transforming RNN Outputs Gradient Clipping Training Decoding Summary Exercises Concise Implementation of Recurrent Neural Networks Defining the Model Training and Predicting Summary Exercises Backpropagation Through Time Analysis of Gradients in RNNs Full Computation Truncating Time Steps Randomized Truncation Comparing Strategies Backpropagation Through Time in Detail Summary Exercises
- Modern Recurrent Neural Networks: Long Short-Term Memory (LSTM) Gated Memory Cell Gated Hidden State Input Gate, Forget Gate, and Output Gate Input Node Memory Cell Internal State Hidden State Implementation from Scratch Initializing Model Parameters Training and Prediction Concise Implementation Summary Exercises Gated Recurrent Units (GRU) Reset Gate and Update Gate Candidate Hidden State Hidden State Implementation from Scratch Initializing Model Parameters Defining the Model Training Concise Implementation Summary Exercises Deep Recurrent Neural Networks Implementation from Scratch Concise Implementation Summary Exercises Bidirectional Recurrent Neural Networks Implementation from Scratch Concise Implementation Summary Exercises Machine Translation and the Dataset Downloading and Preprocessing the Dataset Tokenization Loading Sequences of Fixed Length Reading the Dataset Summary Exercises The Encoder–Decoder Architecture Encoder Decoder Putting the Encoder and Decoder Together Summary Exercises Sequence-to-Sequence Learning for Machine Translation Teacher Forcing Encoder Decoder Encoder–Decoder for Sequence-to-Sequence Learning Loss Function with Masking Training Prediction Evaluation of Predicted Sequences Summary Exercises Beam Search Greedy Search Exhaustive Search Beam Search Summary Exercises
- Attention Mechanisms and Transformers: Queries, Keys, and Values Visualization Summary Exercises Attention Pooling by Similarity Kernels and Data Attention Pooling via Nadaraya–Watson Regression Adapting Attention Pooling Summary Exercises Attention Scoring Functions Dot Product Attention Convenience Functions Masked Softmax Operation Batch Matrix Multiplication Scaled Dot Product Attention Additive Attention Summary Exercises The Bahdanau Attention Mechanism Model Defining the Decoder with Attention Training Summary Exercises Multi-Head Attention Model Implementation Summary Exercises Self-Attention and Positional Encoding Self-Attention Comparing CNNs, RNNs, and Self-Attention Positional Encoding Absolute Positional Information Relative Positional Information Summary Exercises The Transformer Architecture Model Positionwise Feed-Forward Networks Residual Connection and Layer Normalization Encoder Decoder Training Summary Exercises Transformers for Vision Model Patch Embedding Vision Transformer Encoder Putting It All Together Training Summary and Discussion Exercises Large-Scale Pretraining with Transformers Encoder-Only Pretraining BERT Fine-Tuning BERT Encoder–Decoder Pretraining T5 Fine-Tuning T5 Decoder-Only GPT and GPT-2 GPT-3 and Beyond Scalability Large Language Models Summary and Discussion Exercises
- Optimization Algorithms: Optimization and Deep Learning Goal of Optimization Optimization Challenges in Deep Learning Local Minima Saddle Points Vanishing Gradients Summary Exercises Convexity Definitions Convex Sets Convex Functions Jensen's Inequality Properties Local Minima Are Global Minima Below Sets of Convex Functions Are Convex Convexity and Second Derivatives Constraints Lagrangian Penalties Projections Summary Exercises Gradient Descent One-Dimensional Gradient Descent Learning Rate Local Minima Multivariate Gradient Descent Adaptive Methods Newton's Method Convergence Analysis Preconditioning Gradient Descent with Line Search Summary Exercises Stochastic Gradient Descent Stochastic Gradient Updates Dynamic Learning Rate Convergence Analysis for Convex Objectives Stochastic Gradients and Finite Samples Summary Exercises Minibatch Stochastic Gradient Descent Vectorization and Caches Minibatches Reading the Dataset Implementation from Scratch Concise Implementation Summary Exercises Momentum Basics Leaky Averages An Ill-conditioned Problem The Momentum Method Effective Sample Weight Practical Experiments Implementation from Scratch Concise Implementation Theoretical Analysis Quadratic Convex Functions Scalar Functions Summary Exercises Adagrad Sparse Features and Learning Rates Preconditioning The Algorithm Implementation from Scratch Concise Implementation Summary Exercises RMSProp The Algorithm Implementation from Scratch Concise Implementation Summary Exercises Adadelta The Algorithm Implementation Summary Exercises Adam The Algorithm Implementation Yogi Summary Exercises Learning Rate Scheduling Toy Problem Schedulers Policies Factor Scheduler Multi Factor Scheduler Cosine Scheduler Warmup Summary Exercises
- Computational Performance: Compilers and Interpreters Symbolic Programming Hybrid Programming Hybridizing the Sequential Class Acceleration by Hybridization Serialization Summary Exercises Asynchronous Computation Asynchrony via Backend Barriers and Blockers Improving Computation Summary Exercises Automatic Parallelism Parallel Computation on GPUs Parallel Computation and Communication Summary Exercises Hardware Computers Memory Storage Hard Disk Drives Solid State Drives Cloud Storage CPUs Microarchitecture Vectorization Cache GPUs and Other Accelerators Networks and Buses More Latency Numbers Summary Exercises Training on Multiple GPUs Splitting the Problem Data Parallelism A Toy Network Data Synchronization Distributing Data Training Summary Exercises Concise Implementation for Multiple GPUs A Toy Network Network Initialization Training Summary Exercises Parameter Servers Data-Parallel Training Ring Synchronization Multi-Machine Training Key–Value Stores Summary Exercises
- Computer Vision: Image Augmentation Common Image Augmentation Methods Flipping and Cropping Changing Colors Combining Multiple Image Augmentation Methods Training with Image Augmentation Multi-GPU Training Summary Exercises Fine-Tuning Steps Hot Dog Recognition Reading the Dataset Defining and Initializing the Model Fine-Tuning the Model Summary Exercises Object Detection and Bounding Boxes Bounding Boxes Summary Exercises Anchor Boxes Generating Multiple Anchor Boxes Intersection over Union (IoU) Labeling Anchor Boxes in Training Data Assigning Ground-Truth Bounding Boxes to Anchor Boxes Labeling Classes and Offsets An Example Predicting Bounding Boxes with Non-Maximum Suppression Summary Exercises Multiscale Object Detection Multiscale Anchor Boxes Multiscale Detection Summary Exercises The Object Detection Dataset Downloading the Dataset Reading the Dataset Demonstration Summary Exercises Single Shot Multibox Detection Model Class Prediction Layer Bounding Box Prediction Layer Concatenating Predictions for Multiple Scales Downsampling Block Base Network Block The Complete Model Training Reading the Dataset and Initializing the Model Defining Loss and Evaluation Functions Training the Model Prediction Summary Exercises Region-based CNNs (R-CNNs) R-CNNs Fast R-CNN Faster R-CNN Mask R-CNN Summary Exercises Semantic Segmentation and the Dataset Image Segmentation and Instance Segmentation The Pascal VOC2012 Semantic Segmentation Dataset Data Preprocessing Custom Semantic Segmentation Dataset Class Reading the Dataset Putting It All Together Summary Exercises Transposed Convolution Basic Operation Padding, Strides, and Multiple Channels Connection to Matrix Transposition Summary Exercises Fully Convolutional Networks The Model Initializing Transposed Convolutional Layers Reading the Dataset Training Prediction Summary Exercises Neural Style Transfer Method Reading the Content and Style Images Preprocessing and Postprocessing Extracting Features Defining the Loss Function Content Loss Style Loss Total Variation Loss Loss Function Initializing the Synthesized Image Training Summary Exercises Image Classification (CIFAR-10) on Kaggle Obtaining and Organizing the Dataset Downloading the Dataset Organizing the Dataset Image Augmentation Reading the Dataset Defining the Model Defining the Training Function Training and Validating the Model Classifying the Testing Set and Submitting Results on Kaggle Summary Exercises Dog Breed Identification (ImageNet Dogs) on Kaggle Obtaining and Organizing the Dataset Downloading the Dataset Organizing the Dataset Image Augmentation Reading the Dataset Fine-Tuning a Pretrained Model Defining the Training Function Training and Validating the Model Classifying the Testing Set and Submitting Results on Kaggle Summary Exercises
- Natural Language Processing: Pretraining: Word Embedding (word2vec) One-Hot Vectors Are a Bad Choice Self-Supervised word2vec The Skip-Gram Model Training The Continuous Bag of Words (CBOW) Model Training Summary Exercises Approximate Training Negative Sampling Hierarchical Softmax Summary Exercises The Dataset for Pretraining Word Embeddings Reading the Dataset Subsampling Extracting Center Words and Context Words Negative Sampling Loading Training Examples in Minibatches Putting It All Together Summary Exercises Pretraining word2vec The Skip-Gram Model Embedding Layer Defining the Forward Propagation Training Binary Cross-Entropy Loss Initializing Model Parameters Defining the Training Loop Applying Word Embeddings Summary Exercises Word Embedding with Global Vectors (GloVe) Skip-Gram with Global Corpus Statistics The GloVe Model Interpreting GloVe from the Ratio of Co-occurrence Probabilities Summary Exercises Subword Embedding The fastText Model Byte Pair Encoding Summary Exercises Word Similarity and Analogy Loading Pretrained Word Vectors Applying Pretrained Word Vectors Word Similarity Word Analogy Summary Exercises Bidirectional Encoder Representations from Transformers (BERT) From Context-Independent to Context-Sensitive From Task-Specific to Task-Agnostic BERT: Combining the Best of Both Worlds Input Representation Pretraining Tasks Masked Language Modeling Next Sentence Prediction Putting It All Together Summary Exercises The Dataset for Pretraining BERT Defining Helper Functions for Pretraining Tasks Generating the Next Sentence Prediction Task Generating the Masked Language Modeling Task Transforming Text into the Pretraining Dataset Summary Exercises Pretraining BERT Pretraining BERT Representing Text with BERT Summary Exercises
- Natural Language Processing: Applications: Sentiment Analysis and the Dataset Reading the Dataset Preprocessing the Dataset Creating Data Iterators Putting It All Together Summary Exercises Sentiment Analysis: Using Recurrent Neural Networks Representing Single Text with RNNs Loading Pretrained Word Vectors Training and Evaluating the Model Summary Exercises Sentiment Analysis: Using Convolutional Neural Networks One-Dimensional Convolutions Max-Over-Time Pooling The textCNN Model Defining the Model Loading Pretrained Word Vectors Training and Evaluating the Model Summary Exercises Natural Language Inference and the Dataset Natural Language Inference The Stanford Natural Language Inference (SNLI) Dataset Reading the Dataset Defining a Class for Loading the Dataset Putting It All Together Summary Exercises Natural Language Inference: Using Attention The Model Attending Comparing Aggregating Putting It All Together Training and Evaluating the Model Reading the Dataset Creating the Model Training and Evaluating the Model Using the Model Summary Exercises Fine-Tuning BERT for Sequence-Level and Token-Level Applications Single Text Classification Text Pair Classification or Regression Text Tagging Question Answering Summary Exercises Natural Language Inference: Fine-Tuning BERT Loading Pretrained BERT The Dataset for Fine-Tuning BERT Fine-Tuning BERT Summary Exercises
- Reinforcement Learning: Markov Decision Process (MDP) Definition of an MDP Return and Discount Factor Discussion of the Markov Assumption Summary Exercises Value Iteration Stochastic Policy Value Function Action-Value Function Optimal Stochastic Policy Principle of Dynamic Programming Value Iteration Policy Evaluation Implementation of Value Iteration Summary Exercises Q-Learning The Q-Learning Algorithm An Optimization Problem Underlying Q-Learning Exploration in Q-Learning The “Self-correcting” Property of Q-Learning Implementation of Q-Learning Summary Exercises
- Gaussian Processes: Introduction to Gaussian Processes Summary Exercises Gaussian Process Priors Definition A Simple Gaussian Process From Weight Space to Function Space The Radial Basis Function (RBF) Kernel The Neural Network Kernel Summary Exercises Gaussian Process Inference Posterior Inference for Regression Equations for Making Predictions and Learning Kernel Hyperparameters in GP Regression Interpreting Equations for Learning and Predictions Worked Example from Scratch Making Life Easy with GPyTorch Summary Exercises
- Hyperparameter Optimization: What Is Hyperparameter Optimization? The Optimization Problem The Objective Function The Configuration Space Random Search Summary Exercises Hyperparameter Optimization API Searcher Scheduler Tuner Bookkeeping the Performance of HPO Algorithms Example: Optimizing the Hyperparameters of a Convolutional Neural Network Comparing HPO Algorithms Summary Exercises Asynchronous Random Search Objective Function Asynchronous Scheduler Visualize the Asynchronous Optimization Process Summary Exercises Multi-Fidelity Hyperparameter Optimization Successive Halving Summary Asynchronous Successive Halving Objective Function Asynchronous Scheduler Visualize the Optimization Process Summary
- Generative Adversarial Networks: Generative Adversarial Networks Generate Some “Real” Data Generator Discriminator Training Summary Exercises Deep Convolutional Generative Adversarial Networks The Pokemon Dataset The Generator Discriminator Training Summary Exercises
- Recommender Systems: Overview of Recommender Systems Collaborative Filtering Explicit Feedback and Implicit Feedback Recommendation Tasks Summary Exercises
- Appendix A: Mathematics for Deep Learning: Geometry and Linear Algebraic Operations Geometry of Vectors Dot Products and Angles Cosine Similarity Hyperplanes Geometry of Linear Transformations Linear Dependence Rank Invertibility Numerical Issues Determinant Tensors and Common Linear Algebra Operations Common Examples from Linear Algebra Expressing in Code Summary Exercises Eigendecompositions Finding Eigenvalues An Example Decomposing Matrices Operations on Eigendecompositions Eigendecompositions of Symmetric Matrices Gershgorin Circle Theorem A Useful Application: The Growth of Iterated Maps Eigenvectors as Long Term Behavior Behavior on Random Data Relating Back to Eigenvectors An Observation Fixing the Normalization Discussion Summary Exercises Single Variable Calculus Differential Calculus Rules of Calculus Common Derivatives Derivative Rules Linear Approximation Higher Order Derivatives Taylor Series Summary Exercises Multivariable Calculus Higher-Dimensional Differentiation Geometry of Gradients and Gradient Descent A Note on Mathematical Optimization Multivariate Chain Rule The Backpropagation Algorithm Hessians A Little Matrix Calculus Summary Exercises Integral Calculus Geometric Interpretation The Fundamental Theorem of Calculus Change of Variables A Comment on Sign Conventions Multiple Integrals Change of Variables in Multiple Integrals Summary Exercises Random Variables Continuous Random Variables From Discrete to Continuous Probability Density Functions Cumulative Distribution Functions Means Variances Standard Deviations Means and Variances in the Continuum Joint Density Functions Marginal Distributions Covariance Correlation Summary Exercises Maximum Likelihood The Maximum Likelihood Principle A Concrete Example Numerical Optimization and the Negative Log-Likelihood Maximum Likelihood for Continuous Variables Summary Exercises Distributions Bernoulli Discrete Uniform Continuous Uniform Binomial Poisson Gaussian Exponential Family Summary Exercises Naive Bayes Optical Character Recognition The Probabilistic Model for Classification The Naive Bayes Classifier Training Summary Exercises Statistics Evaluating and Comparing Estimators Mean Squared Error Statistical Bias Variance and Standard Deviation The Bias-Variance Trade-off Evaluating Estimators in Code Conducting Hypothesis Tests Statistical Significance Statistical Power Test Statistic p-value One-sided Test and Two-sided Test General Steps of Hypothesis Testing Constructing Confidence Intervals Definition Interpretation A Gaussian Example Summary Exercises Information Theory Information Self-information Entropy Motivating Entropy Definition Interpretations Properties of Entropy Mutual Information Joint Entropy Conditional Entropy Mutual Information Properties of Mutual Information Pointwise Mutual Information Applications of Mutual Information Kullback–Leibler Divergence Definition KL Divergence Properties Example Cross-Entropy Formal Definition Properties Cross-Entropy as an Objective Function of Multi-class Classification Summary Exercises
- Appendix B: Tools for Deep Learning: Using Jupyter Notebooks Editing and Running the Code Locally Advanced Options Markdown Files in Jupyter Running Jupyter Notebooks on a Remote Server Timing Summary Exercises Using Amazon SageMaker Signing Up Creating a SageMaker Instance Running and Stopping an Instance Updating Notebooks Summary Exercises Using AWS EC2 Instances Creating and Running an EC2 Instance Presetting Location Increasing Limits Launching an Instance Connecting to the Instance Installing CUDA Installing Libraries for Running the Code Running the Jupyter Notebook Remotely Closing Unused Instances Summary Exercises Using Google Colab Summary Exercises Selecting Servers and GPUs Selecting Servers Selecting GPUs Summary Contributing to This Book Submitting Minor Changes Proposing Major Changes Submitting Major Changes Installing Git Logging in to GitHub Cloning the Repository Editing and Pushing Submitting Pull Requests Summary Exercises Utility Functions and Classes The d2l API Document Classes Functions
- References