This book is a short introduction to deep learning for readers with a STEM background. It aims to provide the background necessary to understand landmark AI models for image generation and language understanding.
This is version 1.2, updated on May 19, 2024.
It is distributed under a non-commercial Creative Commons license and has been downloaded 600,000 times in a bit more than a year.
Updates
V1.2 (May 19, 2024)
- Chapter 8. New chapter on low-resource methods (prompt engineering, quantization, low-rank adapters, model merging).
- Miscellaneous. Changed "meta parameter" to "hyper parameter".
- Section 3.6. Added a sub-section about fine-tuning.
- Section 4.8. Added a note about the quadratic cost of the attention operator.
- The missing bits. Added a note about the O(T) cost of standard RNNs vs. the O(log T) cost of methods that leverage parallel scan.
V1.1.1 (Sep 20, 2023)
- Section 4.2. Added a paragraph about the equivariance of convolution layers.
- Section 5.3. Fixed the description of the original Transformer, and modified Figures 5.6, 5.7, 5.8, and 5.9 accordingly.
V1.1 (Sep 8, 2023)
- Miscellaneous. Fixed minor typos and phrasings.
- Section 1.3. Reformulated the text to clarify that overfitting is not specifically related to noise, but to any properties specific to the training set, as is the case in Figure 1.2.
- Section 3.2. Clarified the phrasing and changed Figure 3.1.
- Section 3.4. Fixed the indexing of the mappings in the example of a composition.
- Section 3.7. Fixed the label "1TWh" in Figure 3.7, which should be "1GWh".
- Section 4.5. Added a figure to illustrate the functioning of 2D dropout.
- Section 4.6. Changed Figure 4.8 so that, in the top part illustrating the re-scaling/translation after normalization, the highlighted sub-blocks correspond to groups of activations that are re-scaled/translated with the same factor/bias.
- Section 6.6. Restricted Figure 6.4 to three sub-images to make the text more legible.
- Section 7.1. Added two paragraphs to introduce the notion of Reinforcement Learning from Human Feedback.
- The missing bits. Removed the fine-tuning sub-section, since most of it was moved to Section 7.1.
Conditions of Use
This book is licensed under a Creative Commons License (CC BY-NC-SA). You can download the ebook The Little Book of Deep Learning for free.
- Title: The Little Book of Deep Learning
- Publisher: LuLu.com
- Author(s): François Fleuret
- Published: 2024-05-19
- Edition: 1
- Format: eBook (PDF, EPUB, MOBI)
- Pages: 177
- Language: English
- ISBN-13: 9781447678618
- License: CC BY-NC-SA
- Book Homepage: Free eBook, Errata, Code, Solutions, etc.
Contents
- List of figures
- Foreword
- Foundations
  - Machine Learning
    - Learning from data
    - Basis function regression
    - Under and overfitting
    - Categories of models
  - Efficient Computation
    - GPUs, TPUs, and batches
    - Tensors
  - Training
    - Losses
    - Autoregressive models
    - Gradient descent
    - Backpropagation
    - The value of depth
    - Training protocols
    - The benefits of scale
- Deep Models
  - Model Components
    - The notion of layer
    - Linear layers
    - Activation functions
    - Pooling
    - Dropout
    - Normalizing layers
    - Skip connections
    - Attention layers
    - Token embedding
    - Positional encoding
  - Architectures
    - Multi-Layer Perceptrons
    - Convolutional networks
    - Attention models
- Applications
  - Prediction
    - Image denoising
    - Image classification
    - Object detection
    - Semantic segmentation
    - Speech recognition
    - Text-image representations
    - Reinforcement learning
  - Synthesis
    - Text generation
    - Image generation
  - The Compute Schism
    - Prompt Engineering
    - Quantization
    - Adapters
    - Model merging
- The missing bits
- Bibliography
- Index
Related Books