PyTorch Tensors

Virtually all modern deep learning libraries are built on a fundamental mathematical object called a tensor. We will use this object throughout the remaining chapters of this block, whether we implement something as simple as linear regression or a state-of-the-art deep learning architecture.

import torch

According to the PyTorch documentation, torch.Tensor is a multi-dimensional matrix containing elements of a single data type. The function torch.tensor() is the most straightforward way to create a tensor. Below, for example, we create a tensor with 2 rows and 3 columns.

tensor = torch.tensor([[0, 1, 2], [3, 4, 5]]) 
print(tensor)

tensor([[0, 1, 2],
        [3, 4, 5]])

The function has several arguments that allow us to control the properties of the tensor: torch.tensor(data, dtype=None, device=None, requires_grad=False).

The data argument is the only required parameter. With this argument we provide an array-like structure, such as a list, a tuple or a NumPy ndarray, from which the tensor is constructed.

tensor = torch.tensor(data=[[0, 1, 2], [3, 4, 5]])
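As a small illustration of the array-like inputs mentioned above, the sketch below builds the same tensor from a list, a tuple and a NumPy array. The variable names are ours, and the NumPy dtype is fixed to int64 explicitly so that the result does not depend on the platform default.

```python
import numpy as np
import torch

# The same 2x3 data supplied as three different array-like structures
from_list = torch.tensor([[0, 1, 2], [3, 4, 5]])
from_tuple = torch.tensor(((0, 1, 2), (3, 4, 5)))
from_array = torch.tensor(np.array([[0, 1, 2], [3, 4, 5]], dtype=np.int64))

# All three tensors have the same shape, dtype and values
print(torch.equal(from_list, from_tuple))
print(torch.equal(from_list, from_array))
```

Both comparisons should print True, because torch.tensor() copies the data it receives regardless of the container it came in.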

The dtype argument determines the data type of the tensor. This means that we have to think in advance about what kind of data a tensor is supposed to contain. If we do not specify the type explicitly, dtype is going to be torch.int64 if all of the inputs are integers, and torch.float32 if even one of the inputs is a float. Most neural network weights and biases are going to be torch.float32, so for the time being those two data types are sufficient to get us started. When the need arises, we will cover more data types.

tensor = torch.tensor([[0, 1, 2], [3, 4, 5]], dtype=torch.float32) 
print(tensor.dtype)

torch.float32
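The inference rule described above can be checked directly. The following is a minimal sketch, assuming PyTorch's default floating point type has not been changed from torch.float32.

```python
import torch

# Only integers: the dtype is inferred as torch.int64
ints = torch.tensor([1, 2, 3])
# A single float is enough to make the whole tensor torch.float32
mixed = torch.tensor([1, 2.5, 3])

print(ints.dtype)
print(mixed.dtype)
```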

Tensors can live on different devices, like the CPU, the GPU or a TPU, and the device argument allows us to create a tensor on a particular device. If we do not specify a device, the tensor is created on the CPU by default. For the most part we will be interested in moving a tensor to the GPU to get better parallelisation. With the CUDA backend used here, that requires an Nvidia graphics card. We can test whether a usable graphics card is present by running torch.cuda.is_available(). If the function returns True, we are good to go.

# cuda:0 represents the first nvidia device
# theoretically you could have several graphics cards
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
tensor = torch.tensor([[0, 1, 2], [3, 4, 5]], device=device)

The last argument, requires_grad, determines whether PyTorch tracks the operations performed on the tensor so that gradients can be computed for it, which is what gradient descent needs. This will be covered in more detail in future tutorials.
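As a brief preview only, the sketch below shows what the flag enables. The function y = sum(x^2) is an arbitrary example of ours, chosen because its gradient 2x is easy to verify by hand.

```python
import torch

# Gradients can only be tracked for floating point tensors
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# y = x_1^2 + x_2^2 + x_3^2, so dy/dx_i = 2 * x_i
y = (x ** 2).sum()
y.backward()

# x.grad now holds the gradient of y with respect to x: 2., 4., 6.
print(x.grad)
```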

There are many more functions to create a tensor. The function torch.from_numpy() turns a NumPy ndarray into a PyTorch tensor, torch.zeros() returns a tensor filled with zeros and torch.ones() returns a tensor filled with ones. We will see more of those functions as we go along. It makes no sense to cover all of them without any context.
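The three functions mentioned above could be used as in the following sketch. One detail worth knowing is that torch.from_numpy() shares memory with the source array instead of copying it, so a change to the array is visible in the tensor. The variable names are illustrative.

```python
import numpy as np
import torch

array = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)

# from_numpy does not copy: tensor and array share the same memory
shared = torch.from_numpy(array)
array[0, 0] = 10.0
print(shared[0, 0].item())  # 10.0, the change in the array is visible

# Tensors with 2 rows and 3 columns, filled with zeros and ones
zeros = torch.zeros(2, 3)
ones = torch.ones(2, 3)
print(zeros.shape, ones.shape)
```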

If we need to change the parameters of an already initialized Tensor, we can do the adjustments in a later step, primarily using the to method of the Tensor class. The to method does not overwrite the original Tensor, but returns an adjusted one.

tensor = torch.tensor([[0, 1, 2], [3, 4, 5]])
print(f'Original Tensor: dtype={tensor.dtype}, device={tensor.device}, requires_grad={tensor.requires_grad}')
tensor = tensor.to(torch.float32)
print(f'Adjusted dtype: dtype={tensor.dtype}, device={tensor.device}, requires_grad={tensor.requires_grad}')
tensor = tensor.to(device)
print(f'Adjusted device: dtype={tensor.dtype}, device={tensor.device}, requires_grad={tensor.requires_grad}')
tensor.requires_grad = True
print(f'Adjusted requires_grad: dtype={tensor.dtype}, device={tensor.device}, requires_grad={tensor.requires_grad}')

Original Tensor: dtype=torch.int64, device=cpu, requires_grad=False
Adjusted dtype: dtype=torch.float32, device=cpu, requires_grad=False
Adjusted device: dtype=torch.float32, device=cuda:0, requires_grad=False
Adjusted requires_grad: dtype=torch.float32, device=cuda:0, requires_grad=True
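To make the non-overwriting behaviour of the to method concrete, the sketch below keeps the original and the converted tensor under separate (illustrative) names. It also shows that to simply returns the tensor itself when no conversion is needed.

```python
import torch

original = torch.tensor([[0, 1, 2], [3, 4, 5]])
converted = original.to(torch.float32)

# The original tensor keeps its dtype, the returned tensor has the new one
print(original.dtype)
print(converted.dtype)

# If dtype and device already match, no new tensor is created
same = converted.to(torch.float32)
print(same is converted)
```

This is why the examples above reassign the result, as in tensor = tensor.to(device). Calling tensor.to(device) without using the return value has no lasting effect.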

In practice we are often interested in the shape of a particular tensor. We can use my_tensor.size() or my_tensor.shape to find out the dimensions of the tensor.

print(tensor.size())
print(tensor.shape)

torch.Size([2, 3])
torch.Size([2, 3])
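Since torch.Size behaves like a tuple, the individual dimensions can be unpacked or indexed. A short sketch, with variable names of our choosing:

```python
import torch

tensor = torch.tensor([[0, 1, 2], [3, 4, 5]])

# Unpack the shape like a regular tuple
rows, cols = tensor.shape
print(rows, cols)

# Query a single dimension, the number of dimensions and the element count
print(tensor.size(0))
print(tensor.ndim)
print(tensor.numel())
```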

PyTorch, like other frameworks that work with tensors, is extremely efficient when it comes to matrix operations. These operations are done in parallel and can be transferred to the GPU if you have a CUDA-compatible graphics card. Essentially all of deep learning is based on matrix operations, so let's spend some time learning how to invoke matrix operations using Tensor objects.

We will use two tensors, $\mathbf{A}$ and $\mathbf{B}$, to demonstrate basic mathematical operations.

A = torch.ones(size=(2, 2), dtype=torch.float32)
B = torch.tensor([[1, 2],[3, 4]], dtype=torch.float32)

We can add, subtract, multiply and divide those matrices using the basic arithmetic operators +, -, * and /. All those operations work elementwise, so the * operator does not perform the matrix multiplication that involves dot products, but multiplies each element of one matrix with the element at the same position in the other.

print(A + B)
print(A - B)
print(A * B)
print(A / B)

tensor([[2., 3.],
        [4., 5.]])
tensor([[ 0., -1.],
        [-2., -3.]])
tensor([[1., 2.],
        [3., 4.]])
tensor([[1.0000, 0.5000],
        [0.3333, 0.2500]])

We can achieve the same results using the explicit methods: Tensor.add(), Tensor.subtract(), Tensor.multiply(), Tensor.divide().

print(A.add(B))
print(A.subtract(B))
print(A.multiply(B))
print(A.divide(B))
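A quick way to confirm that the operators and the explicit methods agree is to compare the results with torch.equal, as in this small sketch:

```python
import torch

A = torch.ones(size=(2, 2), dtype=torch.float32)
B = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

# Each pair computes the same elementwise result
print(torch.equal(A + B, A.add(B)))
print(torch.equal(A - B, A.subtract(B)))
print(torch.equal(A * B, A.multiply(B)))
print(torch.equal(A / B, A.divide(B)))
```

All four lines should print True.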

While the above methods do not change the original tensors, each of the methods has a corresponding method that changes the tensor in place. These methods always end with a _: add_(), subtract_(), multiply_(), divide_().

test = torch.tensor([[1, 2], [4, 4]], dtype=torch.float32)
test.add_(A)
# the test tensor was changed
print(test)

tensor([[2., 3.],
        [5., 5.]])
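The difference between the two variants is easiest to see side by side. In the sketch below, before is a copy we make purely for comparison.

```python
import torch

A = torch.ones(size=(2, 2), dtype=torch.float32)
test = torch.tensor([[1, 2], [4, 4]], dtype=torch.float32)
before = test.clone()  # independent copy, kept only for comparison

# Out-of-place: the result is a new tensor, test is untouched
result = test.add(A)
print(torch.equal(test, before))   # True

# In-place: test itself is modified
test.add_(A)
print(torch.equal(test, before))   # False
print(torch.equal(test, result))   # True
```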

Probably one of the most important matrix operations in all of deep learning is the product of two matrices, $\mathbf{A \cdot B}$. For that purpose we can use the matmul method.

# Equivalent to torch.matmul(A, B)
A.matmul(B)

tensor([[4., 6.],
        [4., 6.]])

Alternatively we can use @ as a convenient way to use matrix multiplication. This is essentially just a shorthand notation for torch.matmul.

# Equivalent to torch.matmul(A, B)
A @ B
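To contrast matrix multiplication with the elementwise * operator, the sketch below applies both to the same tensors. It also shows the usual shape rule with two additional, purely illustrative tensors X and Y: a (2, 3) matrix times a (3, 4) matrix gives a (2, 4) matrix.

```python
import torch

A = torch.ones(size=(2, 2), dtype=torch.float32)
B = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

# Elementwise product: each entry of A times the matching entry of B
print(A * B)   # values [[1, 2], [3, 4]]

# Matrix product: rows of A dotted with columns of B
print(A @ B)   # values [[4, 6], [4, 6]]
print(torch.equal(A @ B, torch.matmul(A, B)))

# The inner dimensions must match: (2, 3) @ (3, 4) -> (2, 4)
X = torch.ones(2, 3)
Y = torch.ones(3, 4)
print((X @ Y).shape)
```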

A final concept that we would like to mention is the concept of dimensions in PyTorch. Often we would like to calculate some summary statistics (like a sum or a mean) for a Tensor object. But we would like those to be calculated for a particular dimension. We can explicitly set the dimension by defining the dim parameter.

t = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(t.sum())
print(t.sum(dim=0))
print(t.sum(dim=1))

tensor(21)
tensor([5, 7, 9])
tensor([ 6, 15])

The very first sum that we calculate in the example above does not take any dimension into consideration and just calculates the sum over the whole tensor. In the second example we calculate the sum over dimension 0, the row dimension. That means that for each of the available columns we calculate the sum by moving down the rows, which leaves one value per column. When we calculate the sum over dimension 1, the column dimension, we go over each row and calculate the sum by moving through the columns, which leaves one value per row.
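The same dim logic applies to other summary statistics. The sketch below uses mean, which needs a floating point tensor, and the keepdim argument, which keeps the reduced dimension with size 1 instead of dropping it.

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32)

# Mean over the rows: one value per column (2.5, 3.5, 4.5)
col_means = t.mean(dim=0)
# Mean over the columns: one value per row (2.0, 5.0)
row_means = t.mean(dim=1)
print(col_means)
print(row_means)

# The reduced dimension disappears by default ...
print(t.sum(dim=1).shape)
# ... unless keepdim=True, which keeps it with size 1
print(t.sum(dim=1, keepdim=True).shape)
```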