Nonlinear Problems
We have arrived at a point in our studies where we can start to understand neural networks, but there are several questions we should ask ourselves before we move on to the technicalities. Let's start with the most obvious one.
Info
Why do we need neural networks when we can solve regression tasks using linear regression and classification tasks using logistic regression?
In the example below we have a classification problem with two classes and two features. By visually inspecting the dataset, we can quickly separate the two classes by imagining a circle between them.
If you go back to the logistic regression lecture, you will remember that logistic regression produces a linear decision boundary[1]. While this might be sufficient for some problems, in our case we would misclassify approximately half of the data. Logistic regression can only produce a linear decision boundary and can therefore only solve linear problems, while the data below clearly depicts a nonlinear problem.
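We can make this concrete with a small NumPy sketch (not part of the lecture's code): we generate a hypothetical circular dataset, with one class inside a ring and the other outside, and fit plain logistic regression by gradient descent. Because no straight line can separate the two classes, the accuracy stays close to chance level.

```python
import numpy as np

# Hypothetical circular dataset: class 0 inside a circle, class 1 on an outer ring.
rng = np.random.default_rng(0)
n = 500
angles = rng.uniform(0, 2 * np.pi, n)
radii = np.where(np.arange(n) < n // 2, rng.uniform(0, 1, n), rng.uniform(2, 3, n))
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = (np.arange(n) >= n // 2).astype(float)  # 0 = inner class, 1 = outer class

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Plain logistic regression on the raw features, trained by gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = sigmoid(X @ w + b)
    w -= 0.1 * (X.T @ (p - y) / n)
    b -= 0.1 * np.mean(p - y)

acc = np.mean((sigmoid(X @ w + b) > 0.5) == (y == 1))
print(f"accuracy: {acc:.2f}")  # near chance level: no line separates the circle
```

The model is not broken; its hypothesis class simply cannot represent a circular boundary, which is exactly the limitation discussed above.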
A neural network, on the other hand, can in theory generate an adequate decision boundary for nonlinear problems.
Most interesting problems in machine learning are nonlinear. Computer vision, for example, is highly nonlinear. Linear and logistic regression are therefore not sufficient, and we have to utilize artificial neural networks.
From our discussion above the next question follows naturally.
Info
What components and properties should a neural network exhibit to solve nonlinear problems?
A neural network must utilize nonlinear activation functions in order to solve nonlinear problems. If, for example, we used the identity function as our activation function, then no matter how many layers our neural network had, we would only be able to solve linear problems. The sigmoid activation function \sigma(z) = \dfrac{1}{1+e^{-z}} is nonlinear and is going to be used as an example in this lecture. That being said, there are many more nonlinear activation functions, which often provide much better properties than the sigmoid activation. Additional activation functions are going to be discussed in a separate lecture.
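A quick numeric illustration (a NumPy sketch with hypothetical random weights, not code from the lecture) of why the identity activation collapses depth: two stacked linear layers compute exactly the same function as a single linear layer, while inserting a sigmoid between them breaks that equivalence.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(1)
x = rng.normal(size=(2,))          # an arbitrary input point
W1 = rng.normal(size=(4, 2))       # first "layer" weights
W2 = rng.normal(size=(1, 4))       # second "layer" weights

# With the identity activation, depth buys nothing:
deep_linear = W2 @ (W1 @ x)        # two stacked linear layers
single_linear = (W2 @ W1) @ x      # one equivalent linear layer
print(np.allclose(deep_linear, single_linear))  # True

# A sigmoid between the layers breaks the collapse:
deep_nonlinear = W2 @ sigmoid(W1 @ x)
print(np.allclose(deep_nonlinear, single_linear))  # False
```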
As you have probably already guessed, a nonlinear activation function by itself is not sufficient to solve nonlinear problems. Logistic regression, for example, produces a linear decision boundary even though it is based on the sigmoid activation function.
Warning
To deal with nonlinear problems we need a neural network with at least one hidden layer.
The architecture below, with two inputs, one hidden layer of four neurons, and the sigmoid activation function, will be utilized to learn to solve the circular problem above.
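The forward pass of such a 2-4-1 architecture can be sketched in a few lines of NumPy. The weights here are randomly initialized and untrained (hypothetical values for illustration), so the output is just some probability between 0 and 1, not yet a useful prediction.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(42)

# 2 inputs -> hidden layer with 4 neurons -> 1 output, sigmoid throughout.
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)   # hidden layer parameters
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer parameters

def forward(x):
    h = sigmoid(W1 @ x + b1)     # hidden activations, shape (4,)
    return sigmoid(W2 @ h + b2)  # class probability, shape (1,)

p = forward(np.array([0.5, -1.0]))
print(p)  # a single value in (0, 1)
```

Training these weights, so that the network actually carves out the circular decision boundary, is the subject of the upcoming sections.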
How many hidden layers you eventually use and how many neurons each layer contains is up to you, but many problems will require a particular architecture to be solved efficiently.