Sequence Modelling
Most of the data that humans encounter and learn from in everyday life is sequential. The text we read, the language we hear and the visual input we process are all sequential. A lot of structured data, like stock prices and weather records, also tends to be sequential.
Info
Sequential data is any type of data whose order matters: the data points must be arranged in a particular order, and each point typically correlates with the points that came before it.
The feature that separates sequential data from other types of data is the importance of its order. When we were classifying images of cats and dogs, our algorithm did not require the images to be sorted in any particular way. Sequential data, on the other hand, needs to be processed in a strictly sequential way.
Look at the sentence below, for example. You have probably seen this sentence before and it should make sense to you. If you interact with the example, the sequence will shuffle.
Does this new sequence still make sense to you? Probably not. Why should a neural network be able to work with a randomly shuffled sequence then?
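The shuffling experiment above can be reproduced in a few lines of plain Python; the example sentence below is our own placeholder, not the one from the interactive demo:

```python
import random

# The same tokens in their original order and in a shuffled order.
# The words are identical; only the order changes, yet the meaning is lost.
sentence = "the quick brown fox jumps over the lazy dog".split()

shuffled = sentence.copy()
random.seed(1)  # fixed seed so the shuffle is reproducible
random.shuffle(shuffled)

print(" ".join(sentence))
print(" ".join(shuffled))
```

Both lines contain exactly the same words, but only the first one reads as a sentence.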
In this chapter we are going to focus on sequence modelling, a family of techniques that are very well suited to sequential data. In particular, we will focus on so-called autoregressive models.
Info
An autoregressive model uses past values of the sequence to predict the next value in the sequence.
We could use an autoregressive model for example to predict the fifth word in a sentence, given the previous four words.
Mathematically we can express this idea as the probability of a value in a sequence, given the previous values: P(x_t \mid x_{t-1}, x_{t-2}, \dots, x_1).
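To make the idea concrete, here is a minimal sketch of a classical autoregressive model fitted with NumPy. The synthetic noisy sine wave is our own stand-in for real sequential data such as weather measurements, and the choice of order 4 mirrors the four-word example above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic sequence: a noisy sine wave standing in for real sequential data.
t = np.arange(200)
x = np.sin(0.1 * t) + 0.05 * rng.standard_normal(t.size)

# Autoregressive model of order 4: predict x_t from (x_{t-1}, ..., x_{t-4}).
order = 4

# Design matrix of lagged values: column k holds lag k+1.
X = np.column_stack(
    [x[order - k - 1 : len(x) - k - 1] for k in range(order)]
)
y = x[order:]  # targets: each value paired with its previous `order` values

# Fit the AR coefficients with ordinary least squares.
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict the next value from the last `order` observations
# (most recent value first, matching the column order above).
last = x[-order:][::-1]
next_value = last @ coeffs
print(next_value)
```

A neural autoregressive model replaces the linear combination `last @ coeffs` with a neural network, but the principle is the same: past values in, a prediction for the next value out.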
The feed-forward neural networks that we have worked with so far do not take the order of the data into account. This chapter will therefore introduce a new type of neural network that is very well suited for sequential data: the recurrent neural network. To give you a broader overview of the field, we will additionally cover the WaveNet architecture, a 1D convolutional neural network with dilated convolutions. The most powerful sequence models, called Transformers, will be covered in the next chapter.