Neural Network

Let's start this chapter with the obvious question: what is a neural network?

Info

A neural network structures individual neurons into a hierarchy of layers and combines them into a single model by feeding the outputs of each layer as inputs into the next layer.

If the above definition does not make any sense to you, below is a more intuitive explanation.

There are many different activation functions out there, but for now we will assume that we are dealing with the sigmoid activation function. That means that a neuron is essentially a separate logistic regression unit with individual weights and a bias. The neuron below, for example, takes the features $x_1, \dots, x_n$ as inputs, multiplies them with the individual weights $w_1, \dots, w_n$, adds the bias $b$ and applies the sigmoid activation function $\sigma$, producing the output $a = \sigma\left(\sum_{j=1}^{n} w_j x_j + b\right)$.

[Figure: the features are fed into a single neuron, which produces one output.]
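A single neuron can be sketched in a few lines of Python. The sketch below assumes NumPy; the feature, weight and bias values are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # weighted sum of the inputs plus a bias, passed through the sigmoid
    return sigmoid(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])   # example features
w = np.array([0.1, 0.4, -0.2])   # example weights
b = 0.3                          # example bias
out = neuron(x, w, b)
print(out)                       # a single value between 0 and 1
```

This is exactly the logistic regression computation from the previous chapter, just viewed as one building block of a larger model.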

In a neural network the same features are used to produce several different neurons. These neurons use different weights and biases and therefore produce different outputs. Such a collection of neurons is called a layer.

[Figure: the same features are fed into several neurons; together these neurons form a layer.]
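A whole layer can be computed in one step by collecting the weights of all neurons into a matrix, one row per neuron. The sizes below (3 features, 4 neurons) are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer(x, W, b):
    # each row of W holds the weights of one neuron,
    # so W @ x + b computes all weighted sums at once
    return sigmoid(W @ x + b)

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])   # 3 input features
W = rng.normal(size=(4, 3))      # 4 neurons, each with 3 weights
b = np.zeros(4)                  # one bias per neuron
out = layer(x, W, b)
print(out)                       # 4 outputs, one per neuron
```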

We can stack several layers one after the other to produce a neural network. Each subsequent layer uses the outputs of the previous layer as its inputs instead of the original input features. The outputs of these intermediate neurons are therefore often called hidden features.

[Figure: the features feed into Hidden Layer 1, its outputs feed into Hidden Layer 2 and its outputs feed into the Output Layer.]
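Stacking layers then amounts to a simple loop: the output of one layer becomes the input of the next. The layer sizes below (3 features, two hidden layers of 4 and 3 neurons, a single output neuron) are again illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    # feed the output of each layer as input into the next one
    for W, b in params:
        x = sigmoid(W @ x + b)
    return x

rng = np.random.default_rng(0)
params = [
    (rng.normal(size=(4, 3)), np.zeros(4)),  # hidden layer 1: 3 -> 4
    (rng.normal(size=(3, 4)), np.zeros(3)),  # hidden layer 2: 4 -> 3
    (rng.normal(size=(1, 3)), np.zeros(1)),  # output layer:   3 -> 1
]
x = np.array([0.5, -1.0, 2.0])
out = forward(x, params)
print(out)  # a single output between 0 and 1
```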

The output neuron(s) are then fed into the loss function, for example the cross-entropy loss if we are dealing with a classification problem.
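For a single output neuron and a binary label, the cross-entropy loss looks like this. The probabilities in the example calls are made up to show the behaviour.

```python
import numpy as np

def binary_cross_entropy(y_hat, y):
    # y is the true label (0 or 1), y_hat the network's output in (0, 1)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

print(binary_cross_entropy(0.9, 1))  # small loss: confident and correct
print(binary_cross_entropy(0.9, 0))  # large loss: confident and wrong
```

The loss is small when the network assigns a high probability to the correct class and grows quickly when it is confidently wrong.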

We can train neural networks to classify images, generate text or play computer games. No matter what task we are trying to accomplish or how the neural network is structured, training always follows the same steps that we used in linear and logistic regression.

In the forward pass the features are processed layer by layer and neuron by neuron to finally determine the loss of the neural network and to construct a computational graph.

[Figure: the forward pass, in which the features flow through the hidden layer and the output layer into the loss.]
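The forward pass ties the previous pieces together: the features flow through the layers and the final output is plugged into the loss. As before, the layer sizes and input values are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(x, y, params):
    # process the features layer by layer, neuron by neuron ...
    for W, b in params:
        x = sigmoid(W @ x + b)
    # ... and feed the final output into the cross-entropy loss
    y_hat = x[0]
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

rng = np.random.default_rng(0)
params = [
    (rng.normal(size=(4, 3)), np.zeros(4)),  # hidden layer: 3 -> 4
    (rng.normal(size=(1, 4)), np.zeros(1)),  # output layer: 4 -> 1
]
loss = forward_pass(np.array([0.5, -1.0, 2.0]), 1.0, params)
print(loss)  # a single non-negative number
```

An automatic differentiation framework would record each of these operations in a computational graph during this pass; here we only compute the values.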

In the backward pass we use the backpropagation algorithm to calculate the gradients for all weights and biases.

[Figure: the backward pass, in which gradients flow from the loss back through the output layer and the hidden layer.]
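For a small network the backward pass can even be written out by hand. The sketch below assumes a single hidden layer, sigmoid activations everywhere and the binary cross-entropy loss; the shapes and values are illustrative, not a definitive implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backward_pass(x, y, W1, b1, W2, b2):
    # forward pass: keep the intermediate activations of the graph
    a1 = sigmoid(W1 @ x + b1)               # hidden layer
    a2 = sigmoid(W2 @ a1 + b2)              # output layer
    # backward pass: apply the chain rule layer by layer
    dz2 = a2 - y                            # sigmoid + cross-entropy combined
    dW2, db2 = np.outer(dz2, a1), dz2
    dz1 = (W2.T @ dz2) * a1 * (1 - a1)      # sigmoid'(z) = a * (1 - a)
    dW1, db1 = np.outer(dz1, x), dz1
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
x, y = np.array([0.5, -1.0, 2.0]), 1.0
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
dW1, db1, dW2, db2 = backward_pass(x, y, W1, b1, W2, b2)
```

Each gradient has the same shape as the parameter it belongs to, so a gradient descent step can update every weight and bias in place.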

Conceptually the whole learning process is not much different from what we saw in the previous chapters. The computational graph is larger and broader, but the ideas are the same.