PyTorch Tensors
Virtually all modern deep learning libraries are built around a fundamental mathematical object called a tensor. We will use this object throughout the remaining chapters of this block, whether we implement something as simple as linear regression or a state-of-the-art deep learning architecture.
import torch
According to the PyTorch documentation, torch.Tensor is a multi-dimensional matrix containing elements of a single data type. The function torch.tensor() is the most straightforward way to create a tensor. Below, for example, we create a tensor with 2 rows and 3 columns.
tensor = torch.tensor([[0, 1, 2], [3, 4, 5]])
print(tensor)
tensor([[0, 1, 2], [3, 4, 5]])
The function takes several arguments that allow us to control the properties of the tensor: torch.tensor(data, dtype=None, device=None, requires_grad=False).
The data argument is the only required parameter. With this argument we provide an array-like structure, such as a list, a tuple or a NumPy ndarray, from which the tensor is constructed.
tensor = torch.tensor(data=[[0, 1, 2], [3, 4, 5]])
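As a small sketch of the array-like inputs mentioned above, the same values can be passed as a nested list, a nested tuple or a NumPy array. The example assumes NumPy is installed, and the variable names are only illustrative.

```python
import numpy as np
import torch

values = [[0, 1, 2], [3, 4, 5]]

# The same data supplied in three different array-like containers.
from_list = torch.tensor(values)
from_tuple = torch.tensor(tuple(tuple(row) for row in values))
# The NumPy dtype is fixed explicitly so that it matches on every platform.
from_array = torch.tensor(np.array(values, dtype=np.int64))

# All three tensors hold the same values and have the same shape.
print(torch.equal(from_list, from_tuple))
print(torch.equal(from_list, from_array))
print(from_list.shape)
```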
The dtype argument determines the data type of the tensor's elements. This means that we have to think in advance about what kind of data a tensor is supposed to contain. If we do not specify the type explicitly, dtype is inferred from the data: it is torch.int64 if all inputs are integers and torch.float32 (the default floating point type) if even one of the inputs is a float. Most neural network weights and biases are torch.float32, so for the time being those two data types are sufficient to get us started. When the need arises, we will cover more data types.
tensor = torch.tensor([[0, 1, 2], [3, 4, 5]], dtype=torch.float32)
print(tensor.dtype)
torch.float32
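The inference rule described above can be checked with a short sketch. It assumes that the default floating point type has not been changed with torch.set_default_dtype(); the variable names are illustrative.

```python
import torch

# Only integers: the inferred dtype is a 64-bit integer.
ints = torch.tensor([0, 1, 2])
# A single float is enough to make the whole tensor floating point.
mixed = torch.tensor([0, 1, 2.5])

print(ints.dtype)   # torch.int64
print(mixed.dtype)  # torch.float32 with the default settings
```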
Tensors can live on different devices, such as the CPU, a GPU or a TPU, and the device argument allows us to create a tensor on a particular device. If we do not specify a device, the tensor is created on the CPU by default. For the most part we will be interested in moving a tensor to the GPU to get better parallelisation. For the CUDA backend used here we need an Nvidia graphics card. We can test whether a usable graphics card is available by running torch.cuda.is_available(). If the function returns True, we are good to go.
# cuda:0 represents the first nvidia device
# theoretically you could have several graphics cards
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
tensor = torch.tensor([[0, 1, 2], [3, 4, 5]], device=device)
The last argument, requires_grad, determines whether PyTorch should track operations on the tensor so that gradients with respect to it can be computed, which is what gradient descent relies on. This will be covered in more detail in future tutorials.
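As a brief preview, and only as a minimal sketch, the effect of requires_grad can be seen on a single scalar. The function y = x ** 2 is an arbitrary illustrative choice, not something taken from later chapters.

```python
import torch

# Gradient tracking is only supported for floating point (and complex) tensors.
x = torch.tensor(3.0, requires_grad=True)

# y = x^2, so dy/dx = 2x.
y = x ** 2
y.backward()

print(x.grad)  # tensor(6.)
```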
There are many more functions that create a tensor. The function torch.from_numpy() turns a NumPy ndarray into a PyTorch tensor, torch.zeros() returns a tensor filled with zeros and torch.ones() returns a tensor filled with ones. We will see more of these functions as we go along; it makes little sense to cover all of them without any context.
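The three functions named above can be tried out in a few lines. This is a sketch with illustrative shapes and names; it assumes NumPy is installed. One detail worth knowing is that torch.from_numpy() shares memory with the source array instead of copying it.

```python
import numpy as np
import torch

array = np.array([[1.0, 2.0], [3.0, 4.0]])

# The tensor keeps NumPy's dtype (float64 here) and shares the array's memory.
from_numpy = torch.from_numpy(array)
zeros = torch.zeros(2, 3)
ones = torch.ones(2, 3)

# Because memory is shared, a change to the array is visible in the tensor.
array[0, 0] = 10.0
print(from_numpy[0, 0].item())  # 10.0
print(zeros.shape, ones.shape)
```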
If we need to change the properties of an already initialized tensor, we can make the adjustments in a later step, primarily using the to method of the Tensor class. The to method does not overwrite the original tensor, but returns an adjusted one.
tensor = torch.tensor([[0, 1, 2], [3, 4, 5]])
print(f'Original Tensor: dtype={tensor.dtype}, device={tensor.device}, requires_grad={tensor.requires_grad}')
tensor = tensor.to(torch.float32)
print(f'Adjusted dtype: dtype={tensor.dtype}, device={tensor.device}, requires_grad={tensor.requires_grad}')
tensor = tensor.to(device)
print(f'Adjusted device: dtype={tensor.dtype}, device={tensor.device}, requires_grad={tensor.requires_grad}')
tensor.requires_grad = True
print(f'Adjusted requires_grad: dtype={tensor.dtype}, device={tensor.device}, requires_grad={tensor.requires_grad}')
Original Tensor: dtype=torch.int64, device=cpu, requires_grad=False
Adjusted dtype: dtype=torch.float32, device=cpu, requires_grad=False
Adjusted device: dtype=torch.float32, device=cuda:0, requires_grad=False
Adjusted requires_grad: dtype=torch.float32, device=cuda:0, requires_grad=True
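The example above reassigns the result to the same variable, which hides the fact that the original tensor is left untouched. A minimal sketch with two separate, illustratively named variables makes this visible.

```python
import torch

original = torch.tensor([1, 2, 3])
# to() returns a converted tensor; it does not modify `original`.
converted = original.to(torch.float32)

print(original.dtype)   # torch.int64
print(converted.dtype)  # torch.float32
```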
In practice we are often interested in the shape of a particular tensor. We can use my_tensor.size() or my_tensor.shape to find out the dimensions of the tensor.
print(tensor.size())
print(tensor.shape)
torch.Size([2, 3])
torch.Size([2, 3])
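Since torch.Size behaves like a tuple, the individual dimensions can be unpacked or indexed directly. The following sketch uses an illustrative 2 by 3 tensor.

```python
import torch

t = torch.zeros(2, 3)

# torch.Size can be unpacked like an ordinary tuple.
rows, cols = t.shape
print(rows, cols)  # 2 3

# Number of dimensions and total number of elements.
print(t.ndim)      # 2
print(t.numel())   # 6
```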
PyTorch, like other frameworks that work with tensors, is extremely efficient when it comes to matrix operations. These operations are done in parallel and can be transferred to the GPU if you have a CUDA-compatible graphics card. Essentially all of deep learning is based on matrix operations, so let's spend some time learning how we can invoke matrix operations using Tensor objects.
We will use two tensors, $\mathbf{A}$ and $\mathbf{B}$, to demonstrate basic mathematical operations.
A = torch.ones(size=(2, 2), dtype=torch.float32)
B = torch.tensor([[1, 2],[3, 4]], dtype=torch.float32)
We can add, subtract, multiply and divide those matrices using basic mathematical operators like +, -, * and /. All those operations work elementwise, so when you multiply two matrices with * you do not get the matrix product that involves dot products, but elementwise multiplication.
print(A + B)
print(A - B)
print(A * B)
print(A / B)
tensor([[2., 3.], [4., 5.]])
tensor([[ 0., -1.], [-2., -3.]])
tensor([[1., 2.], [3., 4.]])
tensor([[1.0000, 0.5000], [0.3333, 0.2500]])
We can achieve the same results using the explicit methods: Tensor.add(), Tensor.subtract(), Tensor.multiply(), Tensor.divide().
print(A.add(B))
print(A.subtract(B))
print(A.multiply(B))
print(A.divide(B))
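That operators and methods agree can be verified with torch.equal(). The sketch below recreates A and B so that it runs on its own; the checks dictionary is just an illustrative way to collect the comparisons.

```python
import torch

A = torch.ones(size=(2, 2), dtype=torch.float32)
B = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

# Compare each operator with its explicit method counterpart.
checks = {
    "add": torch.equal(A + B, A.add(B)),
    "subtract": torch.equal(A - B, A.subtract(B)),
    "multiply": torch.equal(A * B, A.multiply(B)),
    "divide": torch.equal(A / B, A.divide(B)),
}
print(checks)
```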
While the above methods do not change the original tensors, each of them has a corresponding method that changes the tensor in place. These methods always end with an underscore: add_(), subtract_(), multiply_(), divide_().
test = torch.tensor([[1, 2], [4, 4]], dtype=torch.float32)
test.add_(A)
# the test tensor was changed
print(test)
tensor([[2., 3.], [5., 5.]])
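The difference between the two families of methods can be made explicit by comparing a tensor with a copy taken beforehand. This is a minimal sketch with illustrative names.

```python
import torch

t = torch.tensor([1.0, 2.0])
before = t.clone()

# Out-of-place: a new tensor is returned and t keeps its values.
out = t.add(1.0)
print(torch.equal(t, before))  # True

# In-place: t itself is modified and now matches the out-of-place result.
t.add_(1.0)
print(torch.equal(t, out))     # True
```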
Probably one of the most important operations in all of deep learning is the product of two matrices, $\mathbf{A} \cdot \mathbf{B}$. For that purpose we can use the matmul method.
# Equivalent to torch.matmul(A, B)
A.matmul(B)
tensor([[4., 6.], [4., 6.]])
Alternatively we can use the @ operator as a convenient way to perform matrix multiplication. This is essentially just shorthand notation for torch.matmul.
# Equivalent to torch.matmul(A, B)
A @ B
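To contrast the matrix product with elementwise multiplication, the following sketch compares both on the same A and B. The tensors X and Y are illustrative additions that show the usual shape rule for two-dimensional inputs: an (n, k) matrix times a (k, m) matrix yields an (n, m) matrix.

```python
import torch

A = torch.ones(size=(2, 2), dtype=torch.float32)
B = torch.tensor([[1, 2], [3, 4]], dtype=torch.float32)

# @ and torch.matmul give the same result ...
print(torch.equal(A @ B, torch.matmul(A, B)))  # True
# ... which differs from the elementwise product.
print(torch.equal(A @ B, A * B))               # False

# The inner dimensions must match: (2, 3) @ (3, 4) -> (2, 4).
X = torch.ones(2, 3)
Y = torch.ones(3, 4)
print((X @ Y).shape)  # torch.Size([2, 4])
```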
A final concept that we would like to mention is that of dimensions in PyTorch. Often we would like to calculate a summary statistic (like a sum or a mean) for a Tensor object, but only along a particular dimension. We can select that dimension explicitly with the dim parameter.
t = torch.tensor([[1, 2, 3], [4, 5, 6]])
print(t.sum())
print(t.sum(dim=0))
print(t.sum(dim=1))
tensor(21)
tensor([5, 7, 9])
tensor([ 6, 15])
The very first sum in the example above does not take any dimension into consideration and simply calculates the sum over the whole tensor. In the second example we calculate the sum over dimension 0, the row dimension. That means that for each column we calculate the sum by moving down the rows. When we calculate the sum over dimension 1, the column dimension, we go over each row and calculate the sum by moving through the columns.
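One way to remember the rule is that the dimension passed as dim is the one that disappears from the result. The sketch below shows the resulting shapes. The keepdim argument and the conversion to float before taking the mean are additions for illustration; mean is not defined for integer tensors.

```python
import torch

t = torch.tensor([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# Reducing over dim=0 removes the row dimension: one value per column.
print(t.sum(dim=0).shape)                # torch.Size([3])
# Reducing over dim=1 removes the column dimension: one value per row.
print(t.sum(dim=1).shape)                # torch.Size([2])
# keepdim=True keeps the reduced dimension with size 1.
print(t.sum(dim=1, keepdim=True).shape)  # torch.Size([2, 1])

# The mean requires a floating point tensor.
col_means = t.to(torch.float32).mean(dim=0)
print(col_means)
```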