Mathematical Notation
Under the hood, machine learning is based on math. The math is relatively simple: you do not need more than basic linear algebra, calculus and probability theory, and you can improve these skills as you progress through this block.
The complexity and confusion often arise not from the math itself, but from the mathematical notation. Unfortunately, this notation is not always consistent from paper to paper, from book to book or from author to author, so you need to make sure you understand what the authors intend.
Generally speaking, we will use the notation that reduces the friction between the math and its computational implementation. In other words, once you understand the math you should be able to understand the code almost automatically, and vice versa.
We are not going to reinvent the wheel. Instead, we rely on notation that is already established in books and courses. Mostly we follow the notation used by Sebastian Raschka in his books[1] and courses. His book is on our self-study recommendation list, so the notation should look familiar when you read it.
To clarify the mathematical notation we are going to use the example of a dataset that contains the features of a house and the corresponding target (price).
Distance to City Centre | Size | Price |
---|---|---|
20 | 100 | 2000000 |
30 | 30 | 800000 |
40 | 40 | 700000 |
30 | 60 | 1700000 |
We start with the representation of the feature dataset $\mathbf{X} \in \mathbb{R}^{n \times m}$. The dataset is an $n \times m$ matrix with $n$ samples (rows) and $m$ features (columns).
$$
\mathbf{X} =
\begin{bmatrix}
x^{(1)}_1 & x^{(1)}_2 & x^{(1)}_3 & \cdots & x^{(1)}_m \\
x^{(2)}_1 & x^{(2)}_2 & x^{(2)}_3 & \cdots & x^{(2)}_m \\
x^{(3)}_1 & x^{(3)}_2 & x^{(3)}_3 & \cdots & x^{(3)}_m \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x^{(n)}_1 & x^{(n)}_2 & x^{(n)}_3 & \cdots & x^{(n)}_m
\end{bmatrix}
$$

The superscript $(i)$ indexes the individual samples in the dataset and the subscript $j$ indexes the features. For example, $x^{(1)}_5$ refers to the 5th feature of the first sample in the dataset.
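As a rough illustration of this indexing convention, the sketch below stores the feature columns of the house table in an array. The use of NumPy is our assumption (the text does not prescribe a library), and the helper name `feature` is hypothetical. Because Python counts from 0, the entry $x^{(i)}_j$ is found at `X[i - 1, j - 1]`.

```python
import numpy as np

# Feature values taken from the house table: one row per sample,
# columns are (distance to city centre, size).
X = np.array([
    [20, 100],
    [30, 30],
    [40, 40],
    [30, 60],
])

n, m = X.shape  # n samples (rows), m features (columns)


def feature(X, i, j):
    """Return x^{(i)}_j using the 1-based indices of the math notation."""
    return X[i - 1, j - 1]


print(n, m)              # 4 2
print(feature(X, 1, 2))  # 100 (second feature of the first sample)
```

The off-by-one shift between the math and the code is worth keeping in mind whenever a formula is translated into an array lookup.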
If we apply this notation to the house price dataset, we get a $4 \times 2$ feature matrix.
$$
\mathbf{X} =
\begin{bmatrix}
x^{(1)}_1 & x^{(1)}_2 \\
x^{(2)}_1 & x^{(2)}_2 \\
x^{(3)}_1 & x^{(3)}_2 \\
x^{(4)}_1 & x^{(4)}_2
\end{bmatrix}
=
\begin{bmatrix}
20 & 100 \\
30 & 30 \\
40 & 40 \\
30 & 60
\end{bmatrix}
$$

Often we want to show calculations for a single sample in the dataset. For that purpose we use row vectors, represented by a bold, lowercase letter $\mathbf{x}$.
$$
\mathbf{x} = \begin{bmatrix} x_1 & x_2 & x_3 & \cdots & x_m \end{bmatrix}
$$

The labels are represented by a column vector $\mathbf{y}$ of size $n$.
$$
\mathbf{y} =
\begin{bmatrix}
y^{(1)} \\
y^{(2)} \\
y^{(3)} \\
\vdots \\
y^{(n)}
\end{bmatrix}
$$

In the house dataset the price is the label, represented by the vector $\mathbf{y}$ of dimension $4$.
$$
\mathbf{y} =
\begin{bmatrix}
y^{(1)} \\
y^{(2)} \\
y^{(3)} \\
y^{(4)}
\end{bmatrix}
=
\begin{bmatrix}
2000000 \\
800000 \\
700000 \\
1700000
\end{bmatrix}
$$

The weight vector that is used to scale the input vector $\mathbf{x}$ is represented by the bold letter $\mathbf{w}$.
$$
\mathbf{w} = \begin{bmatrix} w_1 & w_2 & w_3 & \cdots & w_m \end{bmatrix}
$$

Info
We do not expect you to learn the above notation by heart. It is meant as a reference that you can return to whenever necessary. Nor do we claim that the notation is complete: it is the foundation common to all machine learning topics, and each section will extend it to suit its own needs.
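To tie the symbols together, here is a minimal sketch that expresses $\mathbf{X}$, $\mathbf{y}$, a single sample $\mathbf{x}$ and a weight vector $\mathbf{w}$ as arrays. Several things here are assumptions on our part: the use of NumPy, the weight values (made up for illustration, not learned from data), and the reading of "scaling" as the weighted sum $\sum_{j} w_j x_j$.

```python
import numpy as np

# Feature matrix X (n = 4 samples, m = 2 features) and label vector y,
# both taken from the house table.
X = np.array([[20, 100], [30, 30], [40, 40], [30, 60]], dtype=float)
y = np.array([2_000_000, 800_000, 700_000, 1_700_000], dtype=float)

# A single sample x^{(1)}: the first row of X, with m entries.
x = X[0]

# Hypothetical weights, one per feature (illustrative values only).
w = np.array([-10_000.0, 20_000.0])

# Assumed interpretation of "scaling": the weighted sum of the features,
# here 20 * (-10000) + 100 * 20000.
weighted_sum = x @ w

print(y.shape)       # (4,)
print(weighted_sum)  # 1800000.0
```

Note that a one-dimensional NumPy array does not distinguish between row and column vectors. If the distinction matters for a calculation, the array can be reshaped explicitly, for example with `y.reshape(-1, 1)` for a column vector.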