Overfitting

Let us assume that we face data with a single feature and contemplate what model we should fit to it.

[Figure: scatter plot of the data; Feature on the x-axis, Target on the y-axis]

There seems to be a linear relationship between the feature and the target, even though there is quite a bit of noise. So we might decide to use linear regression for our task.
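Fitting a line by ordinary least squares can be sketched in a few lines of numpy. The data-generating coefficients below (slope 0.8, noise scale 0.05) are made up for illustration; the point is only that the recovered slope and intercept land close to the ones that generated the data.

```python
import numpy as np

# Hypothetical noisy linear data: target = 0.8 * feature + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 0.8 * x + rng.normal(0, 0.05, 50)

# Ordinary least squares: stack a bias column and solve for [slope, intercept]
X = np.column_stack([x, np.ones_like(x)])
slope, intercept = np.linalg.lstsq(X, y, rcond=None)[0]

print(slope, intercept)  # close to 0.8 and 0.0, up to noise
```

Despite the noise, the estimated parameters stay close to the true ones, which is exactly why linear regression looks like a reasonable choice here.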

[Figure: the data with the fitted linear regression line; Feature on the x-axis, Target on the y-axis]

The result looks quite reasonable, but we could also train a neural network. In that case we would end up with a model that fits the data very closely.

[Figure: the data with the neural network fit, which follows the individual noisy points closely; Feature on the x-axis, Target on the y-axis]

While linear regression can only model linear relationships, a neural network can theoretically assume any imaginable form. When we try to reduce the mean squared error, the neural network will inevitably assume a form that fits the data as closely as possible. Given enough neurons and layers, we might end up fitting the data perfectly. Yet when you look at both models above, you might conclude that the simple linear model is actually a much better fit. The keyword we are looking for is generalization.

Info

A model that generalizes well is good at modelling new, unforeseen data.

When the two models are faced with new data, the neural network will underperform, while the simpler linear regression model will do just fine. In other words, the linear model generalizes much better in this example.
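We can make this concrete by comparing training and test error. As a minimal sketch, a high-degree polynomial stands in for the flexible neural network (both can bend to fit every training point); the data, seeds, and degrees below are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy linear data, split into a small training set and a test set
x_train = rng.uniform(0, 1, 10)
y_train = 0.8 * x_train + rng.normal(0, 0.1, 10)
x_test = rng.uniform(0, 1, 10)
y_test = 0.8 * x_test + rng.normal(0, 0.1, 10)

def mse(y, y_hat):
    return float(np.mean((y - y_hat) ** 2))

# Simple model: a straight line (degree-1 polynomial)
lin = np.polyfit(x_train, y_train, deg=1)
# Flexible model: degree-9 polynomial, a stand-in for a large neural network,
# which can pass through every training point
flex = np.polyfit(x_train, y_train, deg=9)

lin_train = mse(y_train, np.polyval(lin, x_train))
lin_test = mse(y_test, np.polyval(lin, x_test))
flex_train = mse(y_train, np.polyval(flex, x_train))
flex_test = mse(y_test, np.polyval(flex, x_test))

# The flexible model drives the training error toward zero,
# but its error on new data is far larger than its training error
print(lin_train, lin_test)
print(flex_train, flex_test)
```

The flexible model "wins" on the training set, yet that victory is hollow: it has fit the noise, and the gap between its training and test error is the signature of overfitting.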

[Figure: both models evaluated on new data; Feature on the x-axis, Target on the y-axis]

This problem, where the model fits too closely to the training data and does not generalize well to previously unforeseen data, is called overfitting. Overfitting is a common problem in deep learning, and we are going to spend this entire chapter discussing different remedies.

As you might imagine, underfitting can also be a potential problem in deep learning.

Info

A model that does not have enough expressiveness or parameters to fit the data will underfit it.

In the example below the data is produced by a quadratic function, yet we attempt to fit it with linear regression. No matter how hard we try, we are going to underperform, even on the training data.

[Figure: data from a quadratic function with an underfitting linear regression line; Feature on the x-axis, Target on the y-axis]

Solving underfitting is usually a much easier problem, because it is often sufficient to increase the number of parameters in the model. We could, for example, train a neural network that fits the data better. Increasing the number of layers and/or the number of neurons usually solves the problem of underfitting.
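The underfitting case can be sketched the same way. The quadratic data below is invented for illustration; instead of a full neural network, we simply add a squared feature to the linear model, which is one minimal way of giving the model the expressiveness it was missing.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data from a quadratic function: a straight line
# cannot capture the curvature, no matter how it is trained
x = rng.uniform(0, 10, 100)
y = 0.2 * (x - 5) ** 2 - 3 + rng.normal(0, 0.2, 100)

def fit_mse(X):
    # Least-squares fit on the design matrix X and the resulting training MSE
    w = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.mean((X @ w - y) ** 2))

# Linear model underfits: only the raw feature and a bias term
linear_mse = fit_mse(np.column_stack([x, np.ones_like(x)]))
# Adding a squared feature gives the model enough expressiveness
quad_mse = fit_mse(np.column_stack([x ** 2, x, np.ones_like(x)]))

# The linear model's training error stays large; the more
# expressive model gets down to roughly the noise level
print(linear_mse, quad_mse)
```

Note that the linear model fails already on the training data, which is how underfitting differs from overfitting: the model is too weak to capture the pattern at all, so adding capacity directly helps.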

[Figure: the quadratic data fitted by a neural network; Feature on the x-axis, Target on the y-axis]