Journey through the theory and practice of modern deep learning, and apply innovative techniques to solve everyday data problems.
In Inside Deep Learning, you will learn how to:
- Implement deep learning with PyTorch
- Select the right deep learning components
- Train and evaluate a deep learning model
- Fine-tune deep learning models to maximize performance
- Understand deep learning terminology
- Adapt existing PyTorch code to solve new problems
Inside Deep Learning is an accessible guide to implementing deep learning with the PyTorch framework. It demystifies complex deep learning concepts and teaches you to understand the vocabulary of deep learning so you can keep pace in a rapidly evolving field. No detail is skipped—you’ll dive into math, theory, and practical applications. Everything is clearly explained in plain English.
About the technology
Deep learning doesn’t have to be a black box! Knowing how your models and algorithms actually work gives you greater control over your results. And you don’t have to be a mathematics expert or a senior data scientist to grasp what’s going on inside a deep learning system. This book gives you the practical insight you need to understand and explain your work with confidence.
About the book
Inside Deep Learning illuminates the inner workings of deep learning algorithms in a way that even machine learning novices can understand. You’ll explore deep learning concepts and tools through plain language explanations, annotated code, and dozens of instantly useful PyTorch examples. Each type of neural network is clearly presented without complex math, and every solution in this book can run using readily available GPU hardware!
- Title: Inside Deep Learning
- Subtitle: Math, Algorithms, Models
- Publisher: Manning
- Author(s): Edward Raff
- Published: 2022-06-27
- Edition: 1
- Format: eBook (pdf, epub, mobi)
- Pages: 425
- Language: English
- ISBN-10: 1617298638
- ISBN-13: 9781617298639
- License: Read online for free
- Book Homepage: Free eBook, Errata, Code, Solutions, etc.
inside front cover
Inside Deep Learning
Copyright
dedication
contents
front matter
- Foreword
- Preface
- Acknowledgments
- About this book
  - Who should read this book?
  - How this book is organized: A road map
  - About the mathematical notations
  - About the exercises
  - About Google Colab
  - About the code
  - liveBook discussion forum
  - Other online resources
- About the author
- About the cover
Part 1. Foundational methods
1 The mechanics of learning
- 1.1 Getting started with Colab
- 1.2 The world as tensors
  - 1.2.1 PyTorch GPU acceleration
- 1.3 Automatic differentiation
  - 1.3.1 Using derivatives to minimize losses
  - 1.3.2 Calculating a derivative with automatic differentiation
  - 1.3.3 Putting it together: Minimizing a function with derivatives
- 1.4 Optimizing parameters
- 1.5 Loading dataset objects
  - 1.5.1 Creating a training and testing split
- Exercises
- Summary
2 Fully connected networks
- 2.1 Neural networks as optimization
  - 2.1.1 Notation of training a neural network
  - 2.1.2 Building a linear regression model
  - 2.1.3 The training loop
  - 2.1.4 Defining a dataset
  - 2.1.5 Defining the model
  - 2.1.6 Defining the loss function
  - 2.1.7 Putting it together: Training a linear regression model on the data
- 2.2 Building our first neural network
  - 2.2.1 Notation for a fully connected network
  - 2.2.2 A fully connected network in PyTorch
  - 2.2.3 Adding nonlinearities
- 2.3 Classification problems
  - 2.3.1 Classification toy problem
  - 2.3.2 Classification loss function
  - 2.3.3 Training a classification network
- 2.4 Better training code
  - 2.4.1 Custom metrics
  - 2.4.2 Training and testing passes
  - 2.4.3 Saving checkpoints
  - 2.4.4 Putting it all together: A better model training function
- 2.5 Training in batches
- Exercises
- Summary
3 Convolutional neural networks
- 3.1 Spatial structural prior beliefs
  - 3.1.1 Loading MNIST with PyTorch
- 3.2 What are convolutions?
  - 3.2.1 1D convolutions
  - 3.2.2 2D convolutions
  - 3.2.3 Padding
  - 3.2.4 Weight sharing
- 3.3 How convolutions benefit image processing
- 3.4 Putting it into practice: Our first CNN
  - 3.4.1 Making a convolutional layer with multiple filters
  - 3.4.2 Using multiple filters per layer
  - 3.4.3 Mixing convolutional layers with linear layers via flattening
  - 3.4.4 PyTorch code for our first CNN
- 3.5 Adding pooling to mitigate object movement
  - 3.5.1 CNNs with max pooling
- 3.6 Data augmentation
- Exercises
- Summary
4 Recurrent neural networks
- 4.1 Recurrent neural networks as weight sharing
  - 4.1.1 Weight sharing for a fully connected network
  - 4.1.2 Weight sharing over time
- 4.2 RNNs in PyTorch
  - 4.2.1 A simple sequence classification problem
  - 4.2.2 Embedding layers
  - 4.2.3 Making predictions using the last time step
- 4.3 Improving training time with packing
  - 4.3.1 Pad and pack
  - 4.3.2 Packable embedding layer
  - 4.3.3 Training a batched RNN
  - 4.3.4 Simultaneous packed and unpacked inputs
- 4.4 More complex RNNs
  - 4.4.1 Multiple layers
  - 4.4.2 Bidirectional RNNs
- Exercises
- Summary
5 Modern training techniques
- 5.1 Gradient descent in two parts
  - 5.1.1 Adding a learning rate schedule
  - 5.1.2 Adding an optimizer
  - 5.1.3 Implementing optimizers and schedulers
- 5.2 Learning rate schedules
  - 5.2.1 Exponential decay: Smoothing erratic training
  - 5.2.2 Step drop adjustment: Better smoothing
  - 5.2.3 Cosine annealing: Greater accuracy but less stability
  - 5.2.4 Validation plateau: Data-based adjustments
  - 5.2.5 Comparing the schedules
- 5.3 Making better use of gradients
  - 5.3.1 SGD with momentum: Adapting to gradient consistency
  - 5.3.2 Adam: Adding variance to momentum
  - 5.3.3 Gradient clipping: Avoiding exploding gradients
- 5.4 Hyperparameter optimization with Optuna
  - 5.4.1 Optuna
  - 5.4.2 Optuna with PyTorch
  - 5.4.3 Pruning trials with Optuna
- Exercises
- Summary
6 Common design building blocks
- 6.1 Better activation functions
  - 6.1.1 Vanishing gradients
  - 6.1.2 Rectified linear units (ReLUs): Avoiding vanishing gradients
  - 6.1.3 Training with LeakyReLU activations
- 6.2 Normalization layers: Magically better convergence
  - 6.2.1 Where do normalization layers go?
  - 6.2.2 Batch normalization
  - 6.2.3 Training with batch normalization
  - 6.2.4 Layer normalization
  - 6.2.5 Training with layer normalization
  - 6.2.6 Which normalization layer to use?
  - 6.2.7 A peculiarity of layer normalization
- 6.3 Skip connections: A network design pattern
  - 6.3.1 Implementing fully connected skips
  - 6.3.2 Implementing convolutional skips
- 6.4 1 × 1 Convolutions: Sharing and reshaping information in channels
  - 6.4.1 Training with 1 × 1 convolutions
- 6.5 Residual connections
  - 6.5.1 Residual blocks
  - 6.5.2 Implementing residual blocks
  - 6.5.3 Residual bottlenecks
  - 6.5.4 Implementing residual bottlenecks
- 6.6 Long short-term memory RNNs
  - 6.6.1 RNNs: A fast review
  - 6.6.2 LSTMs and the gating mechanism
  - 6.6.3 Training an LSTM
- Exercises
- Summary
Part 2. Building advanced networks
7 Autoencoding and self-supervision
- 7.1 How autoencoding works
  - 7.1.1 Principal component analysis is a bottleneck autoencoder
  - 7.1.2 Implementing PCA
  - 7.1.3 Implementing PCA with PyTorch
  - 7.1.4 Visualizing PCA results
  - 7.1.5 A simple nonlinear PCA
- 7.2 Designing autoencoding neural networks
  - 7.2.1 Implementing an autoencoder
  - 7.2.2 Visualizing autoencoder results
- 7.3 Bigger autoencoders
  - 7.3.1 Robustness to noise
- 7.4 Denoising autoencoders
  - 7.4.1 Denoising with Gaussian noise
- 7.5 Autoregressive models for time series and sequences
  - 7.5.1 Implementing the char-RNN autoregressive text model
  - 7.5.2 Autoregressive models are generative models
  - 7.5.3 Changing samples with temperature
  - 7.5.4 Faster sampling
- Exercises
- Summary
8 Object detection
- 8.1 Image segmentation
  - 8.1.1 Nuclei detection: Loading the data
  - 8.1.2 Representing the image segmentation problem in PyTorch
  - 8.1.3 Building our first image segmentation network
- 8.2 Transposed convolutions for expanding image size
  - 8.2.1 Implementing a network with transposed convolutions
- 8.3 U-Net: Looking at fine and coarse details
  - 8.3.1 Implementing U-Net
- 8.4 Object detection with bounding boxes
  - 8.4.1 Faster R-CNN
  - 8.4.2 Using Faster R-CNN in PyTorch
  - 8.4.3 Suppressing overlapping boxes
- 8.5 Using the pretrained Faster R-CNN
- Exercises
- Summary
9 Generative adversarial networks
- 9.1 Understanding generative adversarial networks
  - 9.1.1 The loss computations
  - 9.1.2 The GAN games
  - 9.1.3 Implementing our first GAN
- 9.2 Mode collapse
- 9.3 Wasserstein GAN: Mitigating mode collapse
  - 9.3.1 WGAN discriminator loss
  - 9.3.2 WGAN generator loss
  - 9.3.3 Implementing WGAN
- 9.4 Convolutional GAN
  - 9.4.1 Designing a convolutional generator
  - 9.4.2 Designing a convolutional discriminator
- 9.5 Conditional GAN
  - 9.5.1 Implementing a conditional GAN
  - 9.5.2 Training a conditional GAN
  - 9.5.3 Controlling the generation with conditional GANs
- 9.6 Walking the latent space of GANs
  - 9.6.1 Getting models from the Hub
  - 9.6.2 Interpolating GAN output
  - 9.6.3 Labeling latent dimensions
- 9.7 Ethics in deep learning
- Exercises
- Summary
10 Attention mechanisms
- 10.1 Attention mechanisms learn relative input importance
  - 10.1.1 Training our baseline model
  - 10.1.2 Attention mechanism mechanics
  - 10.1.3 Implementing a simple attention mechanism
- 10.2 Adding some context
  - 10.2.1 Dot score
  - 10.2.2 General score
  - 10.2.3 Additive attention
  - 10.2.4 Computing attention weights
- 10.3 Putting it all together: A complete attention mechanism with context
- Exercises
- Summary
11 Sequence-to-sequence
- 11.1 Sequence-to-sequence as a kind of denoising autoencoder
  - 11.1.1 Adding attention creates Seq2Seq
- 11.2 Machine translation and the data loader
  - 11.2.1 Loading a small English-French dataset
- 11.3 Inputs to Seq2Seq
  - 11.3.1 Autoregressive approach
  - 11.3.2 Teacher-forcing approach
  - 11.3.3 Teacher forcing vs. an autoregressive approach
- 11.4 Seq2Seq with attention
  - 11.4.1 Implementing Seq2Seq
  - 11.4.2 Training and evaluation
- Exercises
- Summary
12 Network design alternatives to RNNs
- 12.1 TorchText: Tools for text problems
  - 12.1.1 Installing TorchText
  - 12.1.2 Loading datasets in TorchText
  - 12.1.3 Defining a baseline model
- 12.2 Averaging embeddings over time
  - 12.2.1 Weighted average over time with attention
- 12.3 Pooling over time and 1D CNNs
- 12.4 Positional embeddings add sequence information to any model
  - 12.4.1 Implementing a positional encoding module
  - 12.4.2 Defining positional encoding models
- 12.5 Transformers: Big models for big data
  - 12.5.1 Multiheaded attention
  - 12.5.2 Transformer blocks
- Exercises
- Summary
13 Transfer learning
- 13.1 Transferring model parameters
  - 13.1.1 Preparing an image dataset
- 13.2 Transfer learning and training with CNNs
  - 13.2.1 Adjusting pretrained networks
  - 13.2.2 Preprocessing for pretrained ResNet
  - 13.2.3 Training with warm starts
  - 13.2.4 Training with frozen weights
- 13.3 Learning with fewer labels
- 13.4 Pretraining with text
  - 13.4.1 Transformers with the Hugging Face library
  - 13.4.2 Freezing weights with no-grad
- Exercises
- Summary
14 Advanced building blocks
- 14.1 Problems with pooling
  - 14.1.1 Aliasing compromises translation invariance
  - 14.1.2 Anti-aliasing by blurring
  - 14.1.3 Applying anti-aliased pooling
- 14.2 Improved residual blocks
  - 14.2.1 Effective depth
  - 14.2.2 Implementing ReZero
- 14.3 MixUp training reduces overfitting
  - 14.3.1 Picking the mix rate
  - 14.3.2 Implementing MixUp
- Exercises
- Summary
Appendix. Setting up Colab
- A.1 Creating a Colab session
  - Adding a GPU
  - Testing your GPU
Index
inside back cover