Deep Learning

If deep learning is a subset of machine learning, we need to ask ourselves the following question: what makes a machine learning algorithm a deep learning algorithm?

Info

Neural networks, a "deep" architecture, and representation learning are all traits of deep learning.

Neural Networks

Deep learning is based exclusively on artificial neural networks. Machine learning algorithms that do not use neural networks therefore do not qualify as deep learning.

In machine learning, an artificial neuron is just a computational unit. This unit receives some inputs (e.g. the features of a house) and predicts an output (e.g. the price of the house) using its model.

The model that transforms features into label predictions is learned from data. In that sense a neuron is no different from any other machine learning algorithm. In fact, the model of a neuron is extremely simple: for the most part it involves nothing but addition and multiplication. Why then would we use artificial neurons to build systems that should be capable of image recognition, text generation, and other fairly complex tasks? Because we can stack neurons and thus create a network of artificial neurons.
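To make this concrete, here is a minimal sketch of a single artificial neuron in plain Python. The weights and bias values below are made up for illustration; in practice they are learned from data.

```python
def neuron(inputs, weights, bias):
    """A single artificial neuron: a weighted sum of the inputs plus a bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Example: predict a house price from two features (size, number of rooms).
features = [100.0, 3.0]      # e.g. size in square meters, number of rooms
weights = [9000.0, 20000.0]  # hypothetical learned weights
bias = 50000.0               # hypothetical learned bias

print(neuron(features, weights, bias))  # 1010000.0
```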

Info

An artificial neural network is a set of interconnected neurons, where the output of one neuron is used as an input to the next.

We must not forget that each neuron has its own small model under the hood. This allows each neuron to learn a solution to a different subproblem. This approach is called divide and conquer: the whole task is divided into small, solvable chunks, and the solutions to those chunks constitute the solution to the larger task. The beauty of neural networks lies in their ability to use "divide and conquer" automatically, without being explicitly told what the subproblems are.

Traditional neural networks are structured in a layered architecture. Each neuron takes the outputs of all neurons in the previous layer as its inputs. Similarly, the single output of each neuron is used as an input for every neuron of the next layer. Yet even though the inputs for all neurons in the same layer are the same, the outputs are different, because each neuron uses a different model internally. This type of network is called a fully connected neural network.

The very first layer in a neural network is called the input layer. The input layer does not involve any calculations. It holds the features of the dataset and is exclusively used as the input to the neurons in the next layer. The last layer is called the output layer. The intermediary layers are called hidden layers.
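The following sketch wires such a fully connected network together in plain Python: every neuron in a layer receives all outputs of the previous layer. The random weights are placeholders standing in for values a learning algorithm would find.

```python
import random

def neuron(inputs, weights, bias):
    # Each neuron computes a weighted sum of all inputs from the previous layer.
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def fully_connected_layer(inputs, layer_weights, layer_biases):
    # Every neuron in the layer sees the same inputs but has its own weights,
    # so the outputs differ from neuron to neuron.
    return [neuron(inputs, w, b) for w, b in zip(layer_weights, layer_biases)]

# A tiny network: 2 input features -> 3 hidden neurons -> 1 output neuron.
random.seed(0)
hidden_w = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
hidden_b = [0.0, 0.0, 0.0]
output_w = [[random.uniform(-1, 1) for _ in range(3)]]
output_b = [0.0]

features = [0.5, -0.2]  # the input layer simply holds the features
hidden = fully_connected_layer(features, hidden_w, hidden_b)
output = fully_connected_layer(hidden, output_w, output_b)
print(output)
```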

Deep Architecture

The term deep learning implies a deep architecture, which means that we expect a neural network to consist of at least two hidden layers. Most modern neural networks have vastly more layers, but historically it was extremely hard to train deep neural networks. More hidden layers did not automatically improve the performance of a neural network. On the contrary, more layers usually decreased the performance, because the learning algorithm broke down once the distance that information needs to travel between neurons increased beyond a certain threshold. Luckily, researchers found ways to deal with huge neural networks consisting of several hundred or even a thousand hidden layers, but you should not forget that the success of deep neural networks is a relatively new phenomenon.
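In a modern framework such as PyTorch, stacking additional hidden layers is a one-line change per layer. This is a minimal sketch of a network with three hidden layers; the layer sizes are arbitrary choices for illustration.

```python
import torch.nn as nn

# A "deep" fully connected network: more than two hidden layers.
model = nn.Sequential(
    nn.Linear(2, 64),   # input features -> first hidden layer
    nn.ReLU(),
    nn.Linear(64, 64),  # second hidden layer
    nn.ReLU(),
    nn.Linear(64, 64),  # third hidden layer
    nn.ReLU(),
    nn.Linear(64, 1),   # output layer
)
print(model)
```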

Representation Learning

Traditional machine learning relies heavily on feature engineering.

Info

Feature engineering is the process of generating new features from less relevant features using human domain knowledge. This process tends to improve the performance of traditional machine learning algorithms.

Let us consider a regression task, where we try to predict the price of a house based on its location and size. While the location seems to have an impact on the price of a house, the raw representation of the location is relatively cryptic.

| Location | Size | Price   |
|----------|------|---------|
| 100, 100 | 100  | 1000000 |
| 120, 100 | 120  | 1200000 |
| 10, 30   | 90   | 200000  |
| 20, 25   | 45   | 110000  |

A human expert might know that these coordinates indicate how far from the city center a house lies, and that a larger distance implies a lower price. Based on those considerations, the expert might decide to classify the coordinates into a few categories that are more useful for a machine learning algorithm, which should lead to better prediction quality.

| Location (Constructed) | Size | Price   |
|------------------------|------|---------|
| City Center            | 100  | 1000000 |
| City Center            | 120  | 1200000 |
| City Outskirts         | 90   | 200000  |
| City Outskirts         | 45   | 110000  |
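To illustrate what such manual feature engineering might look like in code, here is a hedged sketch. We assume, purely for illustration, that the city center lies at coordinates (100, 100) and that a Euclidean distance below some cutoff counts as "City Center".

```python
import math

CITY_CENTER = (100, 100)  # assumed location of the city center (illustrative)
CUTOFF = 30.0             # assumed distance threshold (illustrative)

def engineer_location(x, y):
    """Turn raw coordinates into a categorical feature using domain knowledge."""
    distance = math.dist((x, y), CITY_CENTER)
    return "City Center" if distance < CUTOFF else "City Outskirts"

raw_locations = [(100, 100), (120, 100), (10, 30), (20, 25)]
for x, y in raw_locations:
    print((x, y), "->", engineer_location(x, y))
# (100, 100) -> City Center
# (120, 100) -> City Center
# (10, 30)   -> City Outskirts
# (20, 25)   -> City Outskirts
```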

Deep learning, on the other hand, does not require any manual feature engineering. Due to their hierarchical (layered) nature, deep neural networks are able to learn useful representations (hidden features) of the input variables on their own, provided we have a large enough dataset. We can, for example, imagine that the first layers are responsible for learning those representations (e.g. city center), while the later layers are responsible for calculating the targets (e.g. price).

Use Cases for Deep Learning

Deep learning has overtaken other machine learning methods in almost all domains. Computer vision, speech recognition, and many other tasks practically require deep learning, but there are some prerequisites that need to be met if we want to apply it.

For one, deep learning needs massive amounts of data if we want to achieve decent results. Modern image recognition systems are trained on millions of images, and text translation systems are often trained on corpora as large as the whole of Wikipedia.

Incorporating this amount of data into the training process requires modern graphics cards, which can be extremely costly. Moreover, the electricity bill will cost you a small (or a big) fortune, because these models are usually trained for several days at a time.

The good news is that we can use smaller datasets to learn how deep learning works. We might not produce state-of-the-art results, but the knowledge should be transferable. Additionally, we will discuss free or cheap computational resources in a different chapter, which will allow us to train decent models even without access to a local graphics card.