Stability and Speedup
Training a neural network takes a lot of time and patience. In practice you usually want to iterate quickly to find a good approach to a particular problem, but if a single iteration takes hours (or even days), you might not be able to solve your task at all. And when you train large models on a cluster of GPUs, every hour of training saved also translates directly into monetary savings.
In this chapter we will look at different techniques that either speed up training significantly or make your models less prone to instability during training.
Be aware that many of the techniques introduced in this chapter are not strictly necessary to solve MNIST, yet they will be essential down the line once we start dealing with convolutional neural networks or transformer models.