Inside Deep Learning - Open Tech Book

Journey through the theory and practice of modern deep learning, and apply innovative techniques to solve everyday data problems.

In Inside Deep Learning, you will learn how to:

Implement deep learning with PyTorch
Select the right deep learning components
Train and evaluate a deep learning model
Fine-tune deep learning models to maximize performance
Understand deep learning terminology
Adapt existing PyTorch code to solve new problems

Inside Deep Learning is an accessible guide to implementing deep learning with the PyTorch framework. It demystifies complex deep learning concepts and teaches you to understand the vocabulary of deep learning so you can keep pace in a rapidly evolving field. No detail is skipped—you’ll dive into math, theory, and practical applications. Everything is clearly explained in plain English.

About the technology

Deep learning doesn’t have to be a black box! Knowing how your models and algorithms actually work gives you greater control over your results. And you don’t have to be a mathematics expert or a senior data scientist to grasp what’s going on inside a deep learning system. This book gives you the practical insight you need to understand and explain your work with confidence.

About the book

Inside Deep Learning illuminates the inner workings of deep learning algorithms in a way that even machine learning novices can understand. You’ll explore deep learning concepts and tools through plain language explanations, annotated code, and dozens of instantly useful PyTorch examples. Each type of neural network is clearly presented without complex math, and every solution in this book can run using readily available GPU hardware!

Title: Inside Deep Learning
Subtitle: Math, Algorithms, Models
Publisher: Manning
Author(s): Edward Raff
Published: 2022-06-27
Edition: 1
Format: eBook (PDF, EPUB, MOBI)
Pages: 425
Language: English
ISBN-10: 1617298638
ISBN-13: 9781617298639
License: Read online for free
Book Homepage: Free eBook, Errata, Code, Solutions, etc.

inside front cover
Inside Deep Learning
Copyright
dedication
contents
front matter
    Foreword
    Preface
    Acknowledgments
    About this book
        Who should read this book?
        How this book is organized: A road map
        About the mathematical notations
        About the exercises
        About Google Colab
        About the code
        liveBook discussion forum
        Other online resources
    About the author
    About the cover
Part 1. Foundational methods
1 The mechanics of learning
    1.1 Getting started with Colab
    1.2 The world as tensors
        1.2.1 PyTorch GPU acceleration
    1.3 Automatic differentiation
        1.3.1 Using derivatives to minimize losses
        1.3.2 Calculating a derivative with automatic differentiation
        1.3.3 Putting it together: Minimizing a function with derivatives
    1.4 Optimizing parameters
    1.5 Loading dataset objects
        1.5.1 Creating a training and testing split
    Exercises
    Summary
2 Fully connected networks
    2.1 Neural networks as optimization
        2.1.1 Notation of training a neural network
        2.1.2 Building a linear regression model
        2.1.3 The training loop
        2.1.4 Defining a dataset
        2.1.5 Defining the model
        2.1.6 Defining the loss function
        2.1.7 Putting it together: Training a linear regression model on the data
    2.2 Building our first neural network
        2.2.1 Notation for a fully connected network
        2.2.2 A fully connected network in PyTorch
        2.2.3 Adding nonlinearities
    2.3 Classification problems
        2.3.1 Classification toy problem
        2.3.2 Classification loss function
        2.3.3 Training a classification network
    2.4 Better training code
        2.4.1 Custom metrics
        2.4.2 Training and testing passes
        2.4.3 Saving checkpoints
        2.4.4 Putting it all together: A better model training function
    2.5 Training in batches
    Exercises
    Summary
3 Convolutional neural networks
    3.1 Spatial structural prior beliefs
        3.1.1 Loading MNIST with PyTorch
    3.2 What are convolutions?
        3.2.1 1D convolutions
        3.2.2 2D convolutions
        3.2.3 Padding
        3.2.4 Weight sharing
    3.3 How convolutions benefit image processing
    3.4 Putting it into practice: Our first CNN
        3.4.1 Making a convolutional layer with multiple filters
        3.4.2 Using multiple filters per layer
        3.4.3 Mixing convolutional layers with linear layers via flattening
        3.4.4 PyTorch code for our first CNN
    3.5 Adding pooling to mitigate object movement
        3.5.1 CNNs with max pooling
    3.6 Data augmentation
    Exercises
    Summary
4 Recurrent neural networks
    4.1 Recurrent neural networks as weight sharing
        4.1.1 Weight sharing for a fully connected network
        4.1.2 Weight sharing over time
    4.2 RNNs in PyTorch
        4.2.1 A simple sequence classification problem
        4.2.2 Embedding layers
        4.2.3 Making predictions using the last time step
    4.3 Improving training time with packing
        4.3.1 Pad and pack
        4.3.2 Packable embedding layer
        4.3.3 Training a batched RNN
        4.3.4 Simultaneous packed and unpacked inputs
    4.4 More complex RNNs
        4.4.1 Multiple layers
        4.4.2 Bidirectional RNNs
    Exercises
    Summary
5 Modern training techniques
    5.1 Gradient descent in two parts
        5.1.1 Adding a learning rate schedule
        5.1.2 Adding an optimizer
        5.1.3 Implementing optimizers and schedulers
    5.2 Learning rate schedules
        5.2.1 Exponential decay: Smoothing erratic training
        5.2.2 Step drop adjustment: Better smoothing
        5.2.3 Cosine annealing: Greater accuracy but less stability
        5.2.4 Validation plateau: Data-based adjustments
        5.2.5 Comparing the schedules
    5.3 Making better use of gradients
        5.3.1 SGD with momentum: Adapting to gradient consistency
        5.3.2 Adam: Adding variance to momentum
        5.3.3 Gradient clipping: Avoiding exploding gradients
    5.4 Hyperparameter optimization with Optuna
        5.4.1 Optuna
        5.4.2 Optuna with PyTorch
        5.4.3 Pruning trials with Optuna
    Exercises
    Summary
6 Common design building blocks
    6.1 Better activation functions
        6.1.1 Vanishing gradients
        6.1.2 Rectified linear units (ReLUs): Avoiding vanishing gradients
        6.1.3 Training with LeakyReLU activations
    6.2 Normalization layers: Magically better convergence
        6.2.1 Where do normalization layers go?
        6.2.2 Batch normalization
        6.2.3 Training with batch normalization
        6.2.4 Layer normalization
        6.2.5 Training with layer normalization
        6.2.6 Which normalization layer to use?
        6.2.7 A peculiarity of layer normalization
    6.3 Skip connections: A network design pattern
        6.3.1 Implementing fully connected skips
        6.3.2 Implementing convolutional skips
    6.4 1 × 1 Convolutions: Sharing and reshaping information in channels
        6.4.1 Training with 1 × 1 convolutions
    6.5 Residual connections
        6.5.1 Residual blocks
        6.5.2 Implementing residual blocks
        6.5.3 Residual bottlenecks
        6.5.4 Implementing residual bottlenecks
    6.6 Long short-term memory RNNs
        6.6.1 RNNs: A fast review
        6.6.2 LSTMs and the gating mechanism
        6.6.3 Training an LSTM
    Exercises
    Summary
Part 2. Building advanced networks
7 Autoencoding and self-supervision
    7.1 How autoencoding works
        7.1.1 Principal component analysis is a bottleneck autoencoder
        7.1.2 Implementing PCA
        7.1.3 Implementing PCA with PyTorch
        7.1.4 Visualizing PCA results
        7.1.5 A simple nonlinear PCA
    7.2 Designing autoencoding neural networks
        7.2.1 Implementing an autoencoder
        7.2.2 Visualizing autoencoder results
    7.3 Bigger autoencoders
        7.3.1 Robustness to noise
    7.4 Denoising autoencoders
        7.4.1 Denoising with Gaussian noise
    7.5 Autoregressive models for time series and sequences
        7.5.1 Implementing the char-RNN autoregressive text model
        7.5.2 Autoregressive models are generative models
        7.5.3 Changing samples with temperature
        7.5.4 Faster sampling
    Exercises
    Summary
8 Object detection
    8.1 Image segmentation
        8.1.1 Nuclei detection: Loading the data
        8.1.2 Representing the image segmentation problem in PyTorch
        8.1.3 Building our first image segmentation network
    8.2 Transposed convolutions for expanding image size
        8.2.1 Implementing a network with transposed convolutions
    8.3 U-Net: Looking at fine and coarse details
        8.3.1 Implementing U-Net
    8.4 Object detection with bounding boxes
        8.4.1 Faster R-CNN
        8.4.2 Using Faster R-CNN in PyTorch
        8.4.3 Suppressing overlapping boxes
    8.5 Using the pretrained Faster R-CNN
    Exercises
    Summary
9 Generative adversarial networks
    9.1 Understanding generative adversarial networks
        9.1.1 The loss computations
        9.1.2 The GAN games
        9.1.3 Implementing our first GAN
    9.2 Mode collapse
    9.3 Wasserstein GAN: Mitigating mode collapse
        9.3.1 WGAN discriminator loss
        9.3.2 WGAN generator loss
        9.3.3 Implementing WGAN
    9.4 Convolutional GAN
        9.4.1 Designing a convolutional generator
        9.4.2 Designing a convolutional discriminator
    9.5 Conditional GAN
        9.5.1 Implementing a conditional GAN
        9.5.2 Training a conditional GAN
        9.5.3 Controlling the generation with conditional GANs
    9.6 Walking the latent space of GANs
        9.6.1 Getting models from the Hub
        9.6.2 Interpolating GAN output
        9.6.3 Labeling latent dimensions
    9.7 Ethics in deep learning
    Exercises
    Summary
10 Attention mechanisms
    10.1 Attention mechanisms learn relative input importance
        10.1.1 Training our baseline model
        10.1.2 Attention mechanism mechanics
        10.1.3 Implementing a simple attention mechanism
    10.2 Adding some context
        10.2.1 Dot score
        10.2.2 General score
        10.2.3 Additive attention
        10.2.4 Computing attention weights
    10.3 Putting it all together: A complete attention mechanism with context
    Exercises
    Summary
11 Sequence-to-sequence
    11.1 Sequence-to-sequence as a kind of denoising autoencoder
        11.1.1 Adding attention creates Seq2Seq
    11.2 Machine translation and the data loader
        11.2.1 Loading a small English-French dataset
    11.3 Inputs to Seq2Seq
        11.3.1 Autoregressive approach
        11.3.2 Teacher-forcing approach
        11.3.3 Teacher forcing vs. an autoregressive approach
    11.4 Seq2Seq with attention
        11.4.1 Implementing Seq2Seq
        11.4.2 Training and evaluation
    Exercises
    Summary
12 Network design alternatives to RNNs
    12.1 TorchText: Tools for text problems
        12.1.1 Installing TorchText
        12.1.2 Loading datasets in TorchText
        12.1.3 Defining a baseline model
    12.2 Averaging embeddings over time
        12.2.1 Weighted average over time with attention
    12.3 Pooling over time and 1D CNNs
    12.4 Positional embeddings add sequence information to any model
        12.4.1 Implementing a positional encoding module
        12.4.2 Defining positional encoding models
    12.5 Transformers: Big models for big data
        12.5.1 Multiheaded attention
        12.5.2 Transformer blocks
    Exercises
    Summary
13 Transfer learning
    13.1 Transferring model parameters
        13.1.1 Preparing an image dataset
    13.2 Transfer learning and training with CNNs
        13.2.1 Adjusting pretrained networks
        13.2.2 Preprocessing for pretrained ResNet
        13.2.3 Training with warm starts
        13.2.4 Training with frozen weights
    13.3 Learning with fewer labels
    13.4 Pretraining with text
        13.4.1 Transformers with the Hugging Face library
        13.4.2 Freezing weights with no-grad
    Exercises
    Summary
14 Advanced building blocks
    14.1 Problems with pooling
        14.1.1 Aliasing compromises translation invariance
        14.1.2 Anti-aliasing by blurring
        14.1.3 Applying anti-aliased pooling
    14.2 Improved residual blocks
        14.2.1 Effective depth
        14.2.2 Implementing ReZero
    14.3 MixUp training reduces overfitting
        14.3.1 Picking the mix rate
        14.3.2 Implementing MixUp
    Exercises
    Summary
Appendix. Setting up Colab
    A.1 Creating a Colab session
        Adding a GPU
        Testing your GPU
Index
inside back cover