Computer Vision

Computer vision encompasses a large variety of tasks, but in a nutshell we could say that computer vision tasks try to emulate animal vision. The computer receives images or videos as input and is expected to process them in a useful manner.

Broadly speaking, many computer vision tasks can be subdivided into three major categories: image classification, object detection and semantic segmentation. Let's use a very simple dummy example in order to get an intuition for what those categories are: an image of a circle.

The goal of an image classification task would be to assign the correct label to an image. The model returns the probabilities for different labels and the label with the highest probability wins out. This is what we have been doing with the MNIST dataset so far.
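To make the "highest probability wins" idea concrete, here is a minimal sketch in plain Python. The label names and raw scores are made up for illustration; in practice the scores would come from a trained model. We assume the common setup where the scores are turned into probabilities with a softmax.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels):
    """Return the label with the highest probability and all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs


# Hypothetical labels and scores, chosen only for illustration.
labels = ["circle", "square", "triangle"]
logits = [2.0, 0.5, -1.0]

predicted, probs = classify(logits, labels)
print(predicted, [round(p, 3) for p in probs])
```

The model itself is left out on purpose: the only point here is the last step, where the label with the largest probability is selected.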

Object detection models locate specific objects in an image and draw a bounding box around each of them.
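The sketch below shows what a bounding box looks like for our circle example. It is not a learned detector: we simply generate a small binary image of a circle and compute the tightest box around the non-zero pixels. The helper names and the (x_min, y_min, x_max, y_max) box format are assumptions made for this illustration.

```python
def make_circle_image(size, cx, cy, radius):
    """Create a size x size binary image: 1 inside the circle, 0 outside."""
    return [
        [1 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0
         for x in range(size)]
        for y in range(size)
    ]


def bounding_box(image):
    """Return (x_min, y_min, x_max, y_max) of all non-zero pixels, or None."""
    xs = [x for row in image for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(image) if any(row)]
    if not xs:
        return None  # the image contains no object
    return min(xs), min(ys), max(xs), max(ys)


image = make_circle_image(size=16, cx=8, cy=7, radius=4)
box = bounding_box(image)
print(box)
```

A real object detection model has to predict such coordinates from raw pixels, usually together with a class label and a confidence score for each box.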

Semantic segmentation models assign a category to each pixel in the image. Below, for example, we segment the image into two categories: the circle and the background.
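As a toy version of that idea, the following sketch produces one category per pixel. A simple brightness threshold stands in for the trained model here, which is an assumption made purely for illustration. The category ids, the pixel intensities and the helper names are hypothetical as well.

```python
BACKGROUND, CIRCLE = 0, 1  # hypothetical category ids


def make_gray_circle(size, cx, cy, radius):
    """Create a grayscale image: bright circle (0.9) on a dark background (0.1)."""
    return [
        [0.9 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0.1
         for x in range(size)]
        for y in range(size)
    ]


def segment(image, threshold=0.5):
    """Assign a category to every pixel, so the mask has the shape of the image."""
    return [
        [CIRCLE if value > threshold else BACKGROUND for value in row]
        for row in image
    ]


image = make_gray_circle(size=8, cx=4, cy=4, radius=2)
mask = segment(image)

# Print the mask row by row to see the segmented circle.
for row in mask:
    print("".join(str(category) for category in row))
```

Notice that the output has exactly the same height and width as the input. This is the main difference to classification, where we get a single label for the whole image.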

The tasks mentioned above differ in some regards, but they usually all rely on so-called convolutional neural networks. We are going to cover those in the upcoming section.