This book is a short introduction to deep learning for readers with a STEM background. It aims to provide the background necessary to understand landmark AI models for image generation and language understanding.
This is version 1.2, updated on May 19, 2024.
It is distributed under a non-commercial Creative Commons license and has been downloaded 600,000 times in a bit more than a year.
Updates
V1.2 (May 19, 2024)
- Chapter 8. New chapter on low-resource methods (prompt engineering, quantization, low-rank adapters, model merging).
- Miscellaneous. Changed "meta parameter" to "hyper parameter".
- Section 3.6. Added a sub-section about fine-tuning.
- Section 4.8. Added a note about the quadratic cost of the attention operator.
- The missing bits. Added a note about the O(T) cost of standard RNNs vs. the O(log T) cost of methods that leverage parallel scan.
V1.1.1 (Sep 20, 2023)
- Section 4.2. Added a paragraph about the equivariance of convolution layers.
- Section 5.3. Fixed the description of the original Transformer, and modified Figures 5.6, 5.7, 5.8, and 5.9 accordingly.
V1.1 (Sep 8, 2023)
- Miscellaneous. Fixed minor typos and phrasings.
- Section 1.3. Reformulated the text to clarify that overfitting is not specifically related to noise, but to any properties specific to the training set, as is the case in Figure 1.2.
- Section 3.2. Clarified the phrasing and changed Figure 3.1.
- Section 3.4. Fixed the indexing of the mappings in the example of a composition.
- Section 3.7. Fixed the label "1TWh" in Figure 3.7, which should be "1GWh".
- Section 4.5. Added a figure to illustrate the functioning of 2D dropout.
- Section 4.6. Changed Figure 4.8 so that, in the top part illustrating the re-scaling/translation after normalization, the highlighted sub-blocks correspond to groups of activations that are re-scaled/translated with the same factor/bias.
- Section 6.6. Restricted Figure 6.4 to three sub-images to make the text more legible.
- Section 7.1. Added two paragraphs to introduce the notion of Reinforcement Learning from Human Feedback.
- The missing bits. Removed the fine-tuning sub-section, since most of it was moved to Section 7.1.
Conditions of Use
This book is licensed under a Creative Commons License (CC BY-NC-SA). You can download the ebook The Little Book of Deep Learning for free.
- Title: The Little Book of Deep Learning
- Publisher: LuLu.com
- Author(s): François Fleuret
- Published: 2024-05-19
- Edition: 1
- Format: eBook (PDF, EPUB, MOBI)
- Pages: 177
- Language: English
- ISBN-13: 9781447678618
- License: CC BY-NC-SA
- Book Homepage: Free eBook, Errata, Code, Solutions, etc.
Contents
- List of figures
- Foreword
- Foundations
  - Machine Learning
    - Learning from data
    - Basis function regression
    - Under and overfitting
    - Categories of models
  - Efficient Computation
    - GPUs, TPUs, and batches
    - Tensors
  - Training
    - Losses
    - Autoregressive models
    - Gradient descent
    - Backpropagation
    - The value of depth
    - Training protocols
    - The benefits of scale
- Deep Models
  - Model Components
    - The notion of layer
    - Linear layers
    - Activation functions
    - Pooling
    - Dropout
    - Normalizing layers
    - Skip connections
    - Attention layers
    - Token embedding
    - Positional encoding
  - Architectures
    - Multi-Layer Perceptrons
    - Convolutional networks
    - Attention models
- Applications
  - Prediction
    - Image denoising
    - Image classification
    - Object detection
    - Semantic segmentation
    - Speech recognition
    - Text-image representations
    - Reinforcement learning
  - Synthesis
    - Text generation
    - Image generation
  - The Compute Schism
    - Prompt Engineering
    - Quantization
    - Adapters
    - Model merging
- The missing bits
- Bibliography
- Index
Related Books