#007 Linear Algebra – Change of basis
Highlight: So far, we have seen that a vector can be represented using different basis vectors. In this post, we will learn how to go from our standard coordinate system \(\left ( x,y \right ) \) to some other basis. We will also see why this change of basis can be very useful; for now, let’s just say that it is frequently applied in many signal processing and machine learning methods. So, stay with us and let’s roll!
Tutorial Overview:
Different basis vector representation
Let’s think about our vector, represented in the standard \(\left (x, y \right ) \) plane. So far, we have learned that it is represented with a pair of coordinates. In this example, our coordinates are \(3 \) and \(2 \). The first coordinate, \(3 \), tells us to scale the unit vector \(\hat{i} \) along the \(x \) axis by a factor of \(3 \). The second coordinate tells us to scale the upward-pointing unit vector \(\hat{j} \) by \(2 \). Adding these two scaled vectors, the pair \(\begin{bmatrix}3\\2\end{bmatrix} \) gives us the resulting vector that we will analyze.
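Written out, this decomposition is the following linear combination, where \(\hat{i} \) and \(\hat{j} \) are the standard unit vectors along the \(x \) and \(y \) axes:

\[
\begin{bmatrix}3\\2\end{bmatrix} = 3\hat{i} + 2\hat{j} = 3\begin{bmatrix}1\\0\end{bmatrix} + 2\begin{bmatrix}0\\1\end{bmatrix}
\]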
Now, the most important question: can we represent this vector in a different way?
Are the \(\left ( x,y \right ) \) axes the only axes that allow us to represent it?
We will see that many other pairs of vectors can be used as basis vectors.
Imagine now that we have a new pair of vectors \(\vec{b}_{1} \) and \(\vec{b}_{2} \) as shown in the image.
We say that these are different basis vectors, and we will use them to represent our vector of interest. Hence, we need to find a way to decompose our original vector in the coordinate system defined by the basis vectors \(\vec{b}_{1} \) and \(\vec{b}_{2} \).
Then, our vector will be a linear combination of the vectors \(\vec{b}_{1} \) and \(\vec{b}_{2} \). Once we figure out how much we need to scale \(\vec{b}_{1} \) and \(\vec{b}_{2} \), we will have expressed our original vector as a linear combination of them.
Next, you might ask yourself: this works for the one vector that we are analyzing, but can we do the same for any other vector?
Well, we have already talked about the fact that if we take all linear combinations of \(\vec{b}_{1} \) and \(\vec{b}_{2} \), that is, if we multiply them by every possible pair of scalars and add the results, we get their span. The ideal scenario happens when the span of the new vectors is the entire infinite 2-D plane, the same plane as the one defined by the coordinate system \(\left ( x,y \right ) \). Then, we would be able to reconstruct any vector in that plane. This is the case when \(\vec{b}_{1} \) and \(\vec{b}_{2} \) are linearly independent. If the two vectors are linearly dependent … weeelll… their span collapses to a line (or, if both are zero, to a single point), so only the vectors lying on that line can be reconstructed.
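For two vectors in the plane, linear independence can be tested with the determinant of the \(2\times 2 \) matrix whose columns are those two vectors: a non-zero determinant means that their span is the whole plane. Below is a minimal sketch in plain Python. The helpers `det2` and `spans_plane` are hypothetical names chosen for illustration, the basis \(\vec{b}_{1}=\begin{bmatrix}2\\1\end{bmatrix} \), \(\vec{b}_{2}=\begin{bmatrix}-1\\1\end{bmatrix} \) is the one used later in this post, and the pair \(\begin{bmatrix}2\\1\end{bmatrix} \), \(\begin{bmatrix}4\\2\end{bmatrix} \) is an illustrative dependent example.

```python
def det2(u, v):
    """Determinant of the 2x2 matrix whose columns are u and v."""
    return u[0] * v[1] - v[0] * u[1]


def spans_plane(u, v):
    """Two 2-D vectors span the whole plane exactly when they are
    linearly independent, i.e. when det([u v]) is non-zero."""
    return det2(u, v) != 0


b1 = (2, 1)
b2 = (-1, 1)
print(spans_plane(b1, b2))          # True: det = 2*1 - (-1)*1 = 3
print(spans_plane((2, 1), (4, 2)))  # False: (4, 2) = 2 * (2, 1), so the span is only a line
```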
Let’s have a look at one numerical example. In our new coordinate system, we need to scale \(\vec{b}_{1} \) by the scalar \(5/3 \) and \(\vec{b}_{2} \) by the scalar \(1/3 \) in order to reconstruct our original vector.
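The scalars \(5/3 \) and \(1/3 \) can be checked with a short sketch. The post gives the coordinates of the basis vectors only in the next section, so here we assume the same basis, \(\vec{b}_{1}=\begin{bmatrix}2\\1\end{bmatrix} \) and \(\vec{b}_{2}=\begin{bmatrix}-1\\1\end{bmatrix} \); with it, the numbers do work out. The function `coords_in_basis` is a hypothetical helper that solves the system \(c_1\vec{b}_{1}+c_2\vec{b}_{2}=\vec{v} \) with Cramer’s rule, using `fractions.Fraction` to keep the results exact.

```python
from fractions import Fraction


def coords_in_basis(v, b1, b2):
    """Find scalars (c1, c2) with c1*b1 + c2*b2 == v, using Cramer's rule.
    Assumes b1 and b2 are linearly independent (non-zero determinant)."""
    det = b1[0] * b2[1] - b2[0] * b1[1]
    if det == 0:
        raise ValueError("b1 and b2 are linearly dependent; they do not span the plane")
    c1 = Fraction(v[0] * b2[1] - b2[0] * v[1], det)
    c2 = Fraction(b1[0] * v[1] - v[0] * b1[1], det)
    return c1, c2


b1, b2 = (2, 1), (-1, 1)  # assumed basis, the same one used in the next section
c1, c2 = coords_in_basis((3, 2), b1, b2)
print(c1, c2)  # 5/3 1/3

# Check: c1*b1 + c2*b2 reproduces the original vector
print(c1 * b1[0] + c2 * b2[0], c1 * b1[1] + c2 * b2[1])  # 3 2
```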
Let’s have a look at another example.
Here, as shown in the picture, \(\vec{b}_{1} \) is scaled by the scalar \(-1 \) and \(\vec{b}_{2} \) is scaled by the scalar \(2 \). This gives us the following (yellow) vector. Its coordinates in our new coordinate system are \(\begin{bmatrix}-1\\2\end{bmatrix} \).
Note how the overall grid that we used to have in the \(\left ( x,y \right ) \) coordinate system is rotated and transformed. However, the meaning and interpretation of the coordinates remain the same. The first coordinate tells us to scale \(\vec{b}_{1} \) by \(-1 \), so we move in the direction opposite to \(\vec{b}_{1} \), and the second coordinate tells us to scale \(\vec{b}_{2} \) by \(2 \), so we move twice its length in the direction in which \(\vec{b}_{2} \) points.
So, one important concept when we speak about transforming between coordinate systems is that we can view the change as a simple linear transformation of our original coordinate system. In this example, that transformation produces the new basis vectors \(\vec{b}_{1} \) and \(\vec{b}_{2} \). Once again, the grid lines serve the same purpose as they did in the regular \(\left ( x,y \right ) \) plane.
In addition, we should not forget that the origin must be the same for all coordinate systems. We can see this in the image below: the origin remains the same. This means that any coordinate system we create has to obey the rule that its origin is located at the same point. The \(\begin{bmatrix}0\\0\end{bmatrix} \) vector is the vector we obtain when we multiply any vector by the scalar \(0 \). Therefore, it has to be the same for all coordinate systems that we are going to analyze within the scope of linear algebra.
How to translate between different coordinate systems?
In the following image, we can see an alternative basis for a coordinate system, given by the basis vectors \(\vec{b}_{1} \) and \(\vec{b}_{2} \). In our original \(\left ( x,y \right ) \) coordinate system, \(\vec{b}_{1} \) has the coordinates \(2 \) along \(x \) and \(1 \) along \(y \), whereas \(\vec{b}_{2} \) has the coordinates \(\begin{bmatrix}-1\\1\end{bmatrix} \). In addition, an arbitrary vector (shown in yellow) has the coordinates \(\begin{bmatrix}-4\\1\end{bmatrix} \) in the original \(\left ( x,y \right ) \) plane. In the alternative coordinate system, on the other hand, it is represented by the coordinates \(\begin{bmatrix}-1\\2\end{bmatrix} \): the first coordinate is \(-1 \) because that is how much we have to scale \(\vec{b}_{1} \), and the second is \(2 \) because that is how much we have to scale \(\vec{b}_{2} \).
So, the question is: how can we define a function that takes us from the coordinates \(\begin{bmatrix}-1\\2\end{bmatrix} \) in the alternative basis to the coordinates \(\begin{bmatrix}-4\\1\end{bmatrix} \) in the \(\left ( x,y \right ) \) system, and vice versa?
Let’s now focus on how to do this. If a vector is represented in the alternative coordinate system by the coordinates \(\begin{bmatrix}-1\\2\end{bmatrix} \), the next step is to express the basis vectors themselves in \(\left ( x,y \right ) \) coordinates.
In other words, we note that \(\vec{b}_{1} \) is \(\begin{bmatrix}2\\1\end{bmatrix} \) and \(\vec{b}_{2} \) is \(\begin{bmatrix}-1\\1\end{bmatrix} \). Now, we just multiply each basis vector by its corresponding coordinate and add the results. What we get are the coordinates of the vector in our original \(\left ( x,y \right ) \) coordinate system from which we started. This is shown in the following calculations and depicted in the image below.
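Spelled out with the numbers from this example, scaling each basis vector by its coordinate and adding the results gives:

\[
-1\begin{bmatrix}2\\1\end{bmatrix} + 2\begin{bmatrix}-1\\1\end{bmatrix}
= \begin{bmatrix}-2\\-1\end{bmatrix} + \begin{bmatrix}-2\\2\end{bmatrix}
= \begin{bmatrix}-4\\1\end{bmatrix}
\]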
One interesting thing is to view these transformation steps as a matrix-vector multiplication: the matrix \(\begin{bmatrix}2&-1\\1&1\end{bmatrix} \) multiplies the vector \(\begin{bmatrix}-1\\2\end{bmatrix} \) and gives \(\begin{bmatrix}-4\\1\end{bmatrix} \). The columns of this \(2\times 2 \) matrix are simply the basis vectors represented in the original coordinate system: for \(\vec{b}_{1} \) we have \(\begin{bmatrix}2\\1\end{bmatrix} \) and for \(\vec{b}_{2} \) we have \(\begin{bmatrix}-1\\1\end{bmatrix} \).
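This matrix-vector product can be sketched in a few lines of plain Python. The helpers `basis_matrix` and `mat_vec` are hypothetical names written for this illustration (a library such as NumPy would do the same job with its `@` operator). The point is that placing \(\vec{b}_{1} \) and \(\vec{b}_{2} \) as columns turns "scale each basis vector and add" into a single multiplication.

```python
def basis_matrix(b1, b2):
    """Build the 2x2 matrix (as a list of rows) whose columns are b1 and b2."""
    return [[b1[0], b2[0]],
            [b1[1], b2[1]]]


def mat_vec(A, x):
    """Multiply a 2x2 matrix (list of rows) by a 2-vector."""
    return tuple(row[0] * x[0] + row[1] * x[1] for row in A)


A = basis_matrix((2, 1), (-1, 1))  # [[2, -1], [1, 1]]
# Coordinates in the basis (b1, b2) -> coordinates in (x, y)
print(mat_vec(A, (-1, 2)))  # (-4, 1)
```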
This matrix-vector multiplication gives us an idea of how we can connect the two coordinate systems. If we start from the original \(\left ( x,y \right ) \) system and apply this linear transformation, the basis vectors \(\hat{i} \) and \(\hat{j} \) land on \(\begin{bmatrix}2\\1\end{bmatrix} \) and \(\begin{bmatrix}-1\\1\end{bmatrix} \), that is, on \(\vec{b}_{1} \) and \(\vec{b}_{2} \). Any other vector is transformed (mapped) as well: the vector with coordinates \(\begin{bmatrix}-1\\2\end{bmatrix} \) in the original system is mapped by the same linear transformation onto the yellow vector, which has the coordinates \(\begin{bmatrix}-1\\2\end{bmatrix} \) in the alternative coordinate system \(\left ( \vec{b}_{1},\vec{b}_{2} \right ) \) and \(\begin{bmatrix}-4\\1\end{bmatrix} \) in \(\left ( x,y \right ) \). In other words, a vector from \(\left ( x,y \right ) \) is literally transformed into the output vector: the transformed vector keeps the same coordinates, but now in the new coordinate system. This is something that we have already covered in our previous post.
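To make the transformation view concrete, the sketch below applies the same matrix to \(\hat{i} \), \(\hat{j} \) and the origin. It is plain Python, and `mat_vec` is a hypothetical helper written for illustration. It shows that the columns of the matrix are exactly where \(\hat{i} \) and \(\hat{j} \) land, and that the zero vector stays at the origin, as it must for any linear transformation.

```python
def mat_vec(A, x):
    """Multiply a 2x2 matrix (given as a list of rows) by a 2-vector."""
    return tuple(row[0] * x[0] + row[1] * x[1] for row in A)


A = [[2, -1],
     [1, 1]]  # columns: b1 = (2, 1) and b2 = (-1, 1)

i_hat, j_hat = (1, 0), (0, 1)
print(mat_vec(A, i_hat))    # (2, 1)  -> i-hat lands on b1
print(mat_vec(A, j_hat))    # (-1, 1) -> j-hat lands on b2
print(mat_vec(A, (0, 0)))   # (0, 0)  -> the origin stays fixed

# Linearity: (-1, 2) = -1*i_hat + 2*j_hat lands on -1*b1 + 2*b2
print(mat_vec(A, (-1, 2)))  # (-4, 1)
```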
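Another question that we can now ask is: how do we go from some arbitrary coordinate system back to our original coordinate system? For this, we need the so-called inverse matrix. To undo the transformation, that is, to map the grid of the alternative coordinate system back onto the original \(\left ( x,y \right ) \) grid, we apply the inverse transform, which is given by the inverse of the original \(2\times 2 \) transformation matrix. In terms of coordinates, this inverse matrix takes a vector’s \(\left ( x,y \right ) \) coordinates, such as \(\begin{bmatrix}-4\\1\end{bmatrix} \), and returns its coordinates in the alternative basis, \(\begin{bmatrix}-1\\2\end{bmatrix} \).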
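Here is a minimal sketch of the inverse step in plain Python, using `fractions.Fraction` so that the results stay exact. The helpers `inverse_2x2` and `mat_vec` are hypothetical names written for this example. The sketch uses the closed-form inverse of a \(2\times 2 \) matrix, \(\begin{bmatrix}a&b\\c&d\end{bmatrix}^{-1}=\frac{1}{ad-bc}\begin{bmatrix}d&-b\\-c&a\end{bmatrix} \), applied to the matrix whose columns are \(\vec{b}_{1} \) and \(\vec{b}_{2} \).

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] via the closed form (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular: its columns do not form a basis")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def mat_vec(A, x):
    """Multiply a 2x2 matrix (list of rows) by a 2-vector."""
    return tuple(row[0] * x[0] + row[1] * x[1] for row in A)


A = [[2, -1],
     [1, 1]]  # columns are b1 = (2, 1) and b2 = (-1, 1)
A_inv = inverse_2x2(A)

# From (x, y) coordinates to coordinates in the basis (b1, b2)
print(*mat_vec(A_inv, (-4, 1)))  # -1 2
print(*mat_vec(A_inv, (3, 2)))   # 5/3 1/3
```

Both results agree with the post: \(\begin{bmatrix}-4\\1\end{bmatrix} \) corresponds to \(\begin{bmatrix}-1\\2\end{bmatrix} \), and our first vector \(\begin{bmatrix}3\\2\end{bmatrix} \) corresponds to the scalars \(5/3 \) and \(1/3 \).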
Summary
Changing basis is frequently used in linear algebra and machine learning. Quite often, these transformations can be difficult for practitioners to fully understand, as the necessary linear algebra concepts are quickly forgotten. Hence, this post should help you fully understand concepts such as singular value decomposition and principal component analysis in the future. In the next post, we will talk about a very, very important concept: eigenvectors and eigenvalues.