Machine Learning

ML Definition

Let's start this section at the very beginning and define the term machine learning.

Info

"Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed[1] ".

The above definition is inspired by Arthur Samuel, one of the pioneers in the area of artificial intelligence, who coined the term machine learning. While this definition is commonly used, it is not the one that we will rely on. Throughout the deep learning block we will rely on a much simpler, more programming-oriented definition of machine learning.

Info

Machine learning is a programming paradigm.

Let's take some time and figure out what that definition actually means.

In simplified terms we can say that the task of the programmer is to write a function that generates the desired outputs based on the inputs to the function.

Figure: Inputs → Function → Outputs

For example, a programmer might be assigned the task of writing a spam filter, where the function classifies an email as spam or ham based on the contents of the email, the email address, the email subject and some additional metadata. It does not matter whether the programmer uses the traditional programming paradigm or machine learning, the result of the task is essentially the same: a function that takes those inputs and produces an email classification as the output. The big difference between the classical and the machine learning programming paradigm is the way that this function is derived.

When programmers apply the traditional programming paradigm to create a spam filter, they study the problem at hand and look at the inputs of the function. They could, for example, recognize that the words money, rich and quick are common in spam emails and write a first draft of the function using a programming language like JavaScript or C++. If the output of the function corresponds to the expectations of the programmers, the job is done. If not, the programmers keep improving the code of the function until its outputs are satisfactory. For example, the programmers might be satisfied once the function is able to classify spam emails with an accuracy of 95%.
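
To make this concrete, below is a minimal sketch of such a hand-coded filter; the keyword list and the threshold are made-up values for illustration, not a recommendation for a real spam filter.

```python
# A minimal sketch of the classical approach: the spam logic is written by hand.
# The keyword list and the threshold are made-up values for illustration.
SPAM_KEYWORDS = ["money", "rich", "quick"]

def classify_email(subject: str, body: str) -> str:
    """Classify an email as 'spam' or 'ham' using hand-coded rules."""
    text = (subject + " " + body).lower()
    hits = sum(keyword in text for keyword in SPAM_KEYWORDS)
    # If enough suspicious words appear, flag the email as spam.
    return "spam" if hits >= 2 else "ham"

print(classify_email("Get rich quick", "Send money now"))               # spam
print(classify_email("Tax Report", "Please find the report attached"))  # ham
```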

The machine learning paradigm is a different approach. While both paradigms produce a function, in machine learning we commonly use the word model instead of function. The programmer still needs to specify some parameters of the model explicitly, but the logic of the function is configured in an automated procedure called model training. For that purpose the programmer needs access to a dataset that contains the inputs to the function and the corresponding desired outputs.

| ID | Address | Subject | Expected Output |
| --- | --- | --- | --- |
| 1 | nigerian.prince@ng-gov.ng | Help Me | spam |
| 2 | marta.smith@gmail.com | Tax Report | ham |
| ... | ... | ... | ... |
| 1000 | no-reply@info.o2.com | New Contract | ham |

The model takes in the inputs from the dataset (address and subject) and predicts the outputs (spam or ham). Using the difference between the actual outputs and the outputs produced by the model, the logic of the model is adjusted automatically. That procedure keeps repeating in a loop until some metric is met. As the performance of the model is generally expected to improve over time, we also tend to call this procedure learning.
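
A minimal sketch of such a training loop is shown below, assuming NumPy is available and a toy linear model with a single adjustable weight; the data, the learning rate and the stopping criterion are made up for illustration.

```python
import numpy as np

# Toy dataset: inputs and the desired outputs (here the hidden rule is y = 2 * x).
X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

w = 0.0                # the "logic" of the model, adjusted automatically
learning_rate = 0.01   # how strongly the model reacts to its mistakes

for step in range(1000):
    predictions = w * X                    # the model produces outputs
    error = predictions - y                # difference to the desired outputs
    loss = np.mean(error ** 2)             # the metric we want to drive down
    if loss < 1e-6:                        # stop once the metric is met
        break
    gradient = 2 * np.mean(error * X)      # direction of the adjustment
    w -= learning_rate * gradient          # automatic adjustment of the model

print(f"learned weight: {w:.3f}")          # close to 2.0
```

A neural network follows the same recipe, just with many more adjustable parameters and a more complicated function.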

Info

In classical programming and machine learning we try to solve a problem by generating computer functions. In classical programming the programmer codes the logic of that function explicitly. In machine learning the programmer provides the dataset and chooses the algorithm and model parameters that are used to learn the function.

One final question remains: "When do we use machine learning and when do we use classical programming?" Machine learning is usually used when the complexity of the program would get out of hand if we implemented the logic manually. A program that is able to recognize digits is almost impossible to implement by hand. How would you, for example, implement a program that is able to differentiate between an 8 and a 9? This is an especially hard problem when the digits appear at arbitrary positions instead of being centered in the image. The same problem can be solved in a relatively straightforward way using neural networks, provided we have the necessary data.
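
As a rough illustration of the data-driven approach, the sketch below trains a small neural network on the 8x8 digit images that ship with scikit-learn; the network size and training settings are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Load the small built-in dataset of 8x8 digit images and split it.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0
)

# A small neural network learns the mapping from pixels to digits from data,
# instead of us coding the rules for an 8 or a 9 by hand.
model = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```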

ML Categories

Machine learning is often divided into specific categories. Those classifications are ubiquitous nowadays, so knowing at least some basic terminology is a must.

Supervised Learning

As the name supervised learning suggests, there is a human supervisor who labels the input data with the corresponding correct output. The different inputs are called features, while the outputs are called labels or targets.

Let us for example assume that we want to estimate the price of a house based on the location and the size of the house. In that case the location and the size are the features, while the price is the target.

| Location | Size | Price |
| --- | --- | --- |
| London | 100 | 1000000 |
| Berlin | 30 | 80000000 |
| ... | ... | ... |
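
Before training, such a table is typically turned into a feature matrix and a target vector. The sketch below shows one possible representation in Python; the numeric encoding of the location is an arbitrary choice made for illustration.

```python
# The location needs to be encoded as a number first; this particular
# encoding is an arbitrary choice made for illustration.
location_encoding = {"London": 0, "Berlin": 1}

features = [
    [location_encoding["London"], 100],   # one feature vector per house
    [location_encoding["Berlin"], 30],
]
targets = [1000000, 80000000]             # one price (target) per house
```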

The two common tasks that we try to solve with supervised learning are regression and classification.

In a classification task there is a finite number of classes that the machine learning algorithm needs to determine based on the features. The usual example that is brought up in the machine learning literature is a spam filter. Based on the header and content of the email the algorithm needs to decide whether the email is ham or spam.

In a regression task on the other hand the algorithm produces a continuous number. Predicting the price of the house based on the features of the house is a regression task.

Figure: houses with continuous price targets such as $200,000 or $1,000,000.

In machine learning literature it is preferable to use the term label when we deal with classification tasks and the term target when we deal with regression tasks, but some authors might use those terms interchangeably.
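
The sketch below contrasts the two tasks on made-up toy data, assuming scikit-learn is available; the choice of features and models is purely illustrative.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Regression: the target is a continuous number (a house price).
X_houses = [[30], [50], [100], [120]]            # size of the house
y_prices = [200000, 350000, 700000, 850000]      # price as the target
regressor = LinearRegression().fit(X_houses, y_prices)
print(regressor.predict([[80]]))                 # some price in between

# Classification: the label is one of a finite set of classes (spam or ham).
X_emails = [[0], [1], [2], [3]]                  # e.g. number of suspicious words
y_labels = ["ham", "ham", "spam", "spam"]        # class as the label
classifier = LogisticRegression().fit(X_emails, y_labels)
print(classifier.predict([[4]]))                 # ['spam']
```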

Unsupervised Learning

In unsupervised learning the dataset contains only features and no labels. The overall task is to find some hidden structure in the data. We could, for example, take the houses from the house pricing dataset and cluster them according to their properties, such as size and price. Similar houses would be allocated to the same cluster, while dissimilar houses should end up in different clusters.

In the example below we divide the houses into two clusters based on size and price.

Figure: scatter plot of the houses by size and price, separated into two clusters.
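
A minimal sketch of such a clustering step, assuming scikit-learn is available; the houses and their sizes and prices are made up.

```python
from sklearn.cluster import KMeans

# Only the features (size and price) are used, there are no labels.
houses = [
    [40, 200000],
    [45, 250000],
    [120, 900000],
    [130, 950000],
]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(houses)
print(kmeans.labels_)   # e.g. [1 1 0 0]: similar houses share a cluster
```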

Semi-supervised Learning

Labeling your dataset is costly; therefore companies and researchers try to get away with labeling as few samples as possible. In semi-supervised learning only a fraction of the data has labels, while the rest is unlabeled. The labeled data is used to train the algorithm and to label the remaining data. After that step the whole dataset can be used for training.
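
One simple way to implement this idea is pseudo-labeling, sketched below with scikit-learn on made-up data: train on the labeled subset, let the model label the rest and retrain on everything.

```python
from sklearn.linear_model import LogisticRegression

# A handful of labeled emails and some unlabeled ones (made-up numbers,
# e.g. the count of suspicious words per email).
X_labeled = [[0], [1], [4], [5]]
y_labeled = ["ham", "ham", "spam", "spam"]
X_unlabeled = [[2], [3], [6]]

# 1. Train on the small labeled subset.
model = LogisticRegression().fit(X_labeled, y_labeled)

# 2. Use the model to label the remaining data.
pseudo_labels = model.predict(X_unlabeled)

# 3. Retrain on the combined dataset.
X_all = X_labeled + X_unlabeled
y_all = y_labeled + list(pseudo_labels)
final_model = LogisticRegression().fit(X_all, y_all)
```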

Self-supervised Learning

Self-supervised learning can be seen as supervised learning, where the labels are not determined by a human supervisor, but are derived directly from the features of the data.

Let us look for example at the sentence below.

What is your name

We could design a natural language processing task by masking a part of the sentence.

What is your ...

The algorithm needs to learn to predict the masked word, which essentially becomes the label in our task. We feed the model millions of such examples during the training process. Over time the model gets better and better at this task, which indicates some knowledge about the structure of the English language. That model, and by extension its knowledge of the language, can eventually be used in a supervised learning task like sentiment analysis, where you have only a limited amount of data.
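
The sketch below shows how such input-label pairs can be derived directly from raw text; the helper function is a made-up illustration, not how large language models are actually trained.

```python
# Derive (input, label) pairs directly from a sentence: the masked word
# becomes the label, no human annotation is needed.
def make_masked_examples(sentence: str):
    words = sentence.split()
    examples = []
    for i, word in enumerate(words):
        masked = words[:i] + ["..."] + words[i + 1:]
        examples.append((" ".join(masked), word))   # (input, label)
    return examples

for inputs, label in make_masked_examples("What is your name"):
    print(inputs, "->", label)
# ... is your name -> What
# What is your ... -> name   (and so on for the other positions)
```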

Self-supervised learning is not limited to natural language processing. While the original ideas were developed for text, more recently self-supervised learning has also been applied successfully to computer vision.

Reinforcement Learning

Reinforcement learning deals with sequential decision making, where an agent interacts with an environment and receives rewards based on its actions.

The cartpole is probably the most well-known reinforcement learning task. The agent needs to learn to balance the pole by moving the cart left or right. For each step in which the agent succeeds, it gets a reward of 1. If the pole falls below a certain angle or the cart moves outside the screen, the agent fails and doesn't get any more rewards.
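
A minimal sketch of the agent-environment loop, assuming the gymnasium library is installed; the agent below acts randomly instead of learning, so it only illustrates how observations, actions and rewards flow.

```python
import gymnasium as gym

# The agent below acts randomly; a real agent would learn from the rewards.
env = gym.make("CartPole-v1")
observation, info = env.reset(seed=0)

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()      # 0: push cart left, 1: push right
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward                  # +1 for every step the pole stays up
    done = terminated or truncated          # pole fell over or cart left the screen

print(f"episode return: {total_reward}")
env.close()
```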

References

  1. Samuel, A. L. Some Studies in Machine Learning Using the Game of Checkers. IBM Journal of Research and Development, Vol. 44, pp. 206-226 (1959).