#014 3D Face Modeling – Understanding primitives and transformations for image formation (Version A)
Highlights: Hello and welcome to our new post. In this lecture, we will continue gaining a good understanding of the image formation process. In particular, we are going to talk about how we can project a 3D scene onto a 2D image. We will introduce basic primitives and transformations in conventional but also in so-called homogeneous coordinates which play a fundamental role in describing objects and projections in 3D computer vision. Once we have an understanding of this process we can start developing mathematical models. After that, we will perform some of these mathematical operations in Python.
In this post, we are going to review the YouTube video “Computer Vision – Lecture 2.1 (Image Formation: Primitives and Transformations)”[1]. So let’s begin!
Primitives and Transformations
Geometric primitives are the basic building blocks used to describe 3D shapes. Here, we will particularly focus on points, lines, and planes which are the most fundamental geometric building blocks. Furthermore, we will discuss the most basic transformations both in 2D and 3D space.
2D points
Let’s start with 2D points that can be written in conventional or so-called inhomogeneous coordinates as a vector where we have two real-valued scalar numbers that describe the two coordinates. For instance, the two coordinates in the image domain.
$$ \mathbf{x}=\left(\begin{array}{l} x \\y\end{array}\right) \in \mathbb{R}^2 $$
Now, we’re going to introduce an equivalent representation in so-called homogeneous coordinates. Here, we extend the dimensionality of the vector space by one. For 2D points, that means we work with 3D vectors, but this space has a special meaning: we remove the zero vector and call the result projective space. This is a very important space that we’re going to work with a lot during this part of the post.
$$ \tilde{\mathbf{x}}=\left(\begin{array}{c} \tilde{x} \\\tilde{y} \\\tilde{w}\end{array}\right) \in \mathbb{P}^2 $$
Where \(\mathbb{P}^2=\mathbb{R}^3 \backslash\{(0,0,0)\} \) is called projective space.
As you can see in the equation above, we have augmented this two-dimensional vector with a third coordinate that is called \(\tilde{w} \). We have also introduced the tilde symbol. We use this symbol to explicitly denote that this particular coordinate is to be interpreted as a homogeneous coordinate and to distinguish it from a conventional inhomogeneous coordinate. So, whenever we see a symbol that carries a tilde on top, we know that it is interpreted as a homogeneous coordinate.
Homogeneous vectors that differ only by scale are considered equivalent. That’s why this projective space is effectively a 2D space: we define an equivalence class of all vectors that are related through multiplication by a non-zero scalar. For example, assuming that there’s a vector \(\vec{a} = (1,1,1) \), then the vector \(\vec{b} = (2,2,2) \) is considered equivalent because \(\vec{b} = 2\vec{a} \). In other words, homogeneous vectors are defined only up to scale.
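To make the "defined only up to scale" idea concrete, here is a minimal sketch. The helper name `is_equivalent` and the tolerance are our own assumptions, not part of the lecture. It relies on the fact that two non-zero 3-vectors are parallel exactly when their cross product is the zero vector.

```python
import numpy as np

def is_equivalent(u, v, tol=1e-9):
    """Return True if the homogeneous 3-vectors u and v differ only by a scale factor."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Two non-zero 3-vectors are parallel exactly when their cross product vanishes.
    return bool(np.linalg.norm(np.cross(u, v)) < tol)

a = np.array([1, 1, 1])
b = np.array([2, 2, 2])
c = np.array([2, 2, 1])

print(is_equivalent(a, b))  # a and b represent the same 2D point
print(is_equivalent(a, c))  # a and c represent different 2D points
```

The absolute tolerance is a simplification that is good enough for small, well-scaled examples like this one.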
So, why do we introduce such a strange construction? Well, we’re going to answer that question in detail later in the post. For now, remember that this is very beneficial because it allows us to:
- Express points at infinity
- Express intersections of parallel lines
- Express transformations very easily as concatenations of multiple transformations
- Mathematically formalize perspective projections that are fundamental for 3D computer vision as linear operations
Now, let’s see how we can convert between an inhomogeneous vector and a homogeneous vector. By convention, this is done by simply appending the number one. So, if the last element of the vector is one, then the first two coordinates are the inhomogeneous vector that corresponds to this homogeneous vector. Let’s have a look.
$$ \tilde{\mathbf{x}}=\left(\begin{array}{c} \tilde{x} \\\tilde{y}\\\tilde{w}\end{array}\right)=\left(\begin{array}{l}x \\y \\1\end{array}\right)=\left(\begin{array}{l}\mathbf{x} \\1\end{array}\right)=\overline{\mathbf{x}} $$
So, here we have an inhomogeneous vector \(\mathbf{x} \). We append the number one to this two-dimensional vector to obtain a three-dimensional vector. Now, this \(\tilde{\mathbf{x}} \) is a homogeneous vector where the first two coordinates are the inhomogeneous part.
Remember that we call all vectors that have the number one as the last element augmented vectors. These vectors are denoted with another symbol, \(\overline{\mathbf{x}} \). This means that we have a homogeneous 3D vector that is one particular element of this equivalence class, namely the element that has a one as the last entry. Note that for every point with \(\tilde{w} \neq 0 \) there’s only one such element.
So, this is how we define the relationship between the inhomogeneous vector and the homogeneous vector.
This also allows us to convert in the opposite direction. As you might have already realized, to do that we simply have to divide by the last element of the homogeneous vector \(\tilde{w} \). Because, if we do so, the last element of the vector turns into a one. So, we can read off the inhomogeneous vector from the first elements of the vector. Let’s have a look at the following equation.
$$ \overline{\mathbf{x}}=\left(\begin{array}{l}\mathbf{x} \\1\end{array}\right)=\left(\begin{array}{l}x\\y \\1\end{array}\right)=\frac{1}{\tilde{w}} \tilde{\mathbf{x}}=\frac{1}{\tilde{w}}\left(\begin{array}{c}\tilde{x} \\\tilde{y} \\\tilde{w}\end{array}\right)=\left(\begin{array}{c}\tilde{x} / \tilde{w} \\\tilde{y} / \tilde{w} \\1\end{array}\right) $$
So, we see the relationship between the homogeneous, the inhomogeneous, and the augmented vector, all in one equation.
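The conversions in the equation above can be sketched in a few lines of NumPy. The function names `to_augmented` and `to_inhomogeneous` are hypothetical helpers chosen for this illustration. Raising an error when \(\tilde{w}=0 \) is also our own choice, since such a vector has no inhomogeneous counterpart.

```python
import numpy as np

def to_augmented(x):
    """Append a 1 to an inhomogeneous vector: x -> x_bar."""
    x = np.asarray(x, dtype=float)
    return np.append(x, 1.0)

def to_inhomogeneous(x_tilde):
    """Divide by the last element and drop it: x_tilde -> x."""
    x_tilde = np.asarray(x_tilde, dtype=float)
    w = x_tilde[-1]
    if np.isclose(w, 0.0):
        raise ValueError("w = 0: this vector has no inhomogeneous representation")
    return x_tilde[:-1] / w

x = np.array([3.0, 4.0])
x_bar = to_augmented(x)            # augmented vector (3, 4, 1)
x_tilde = 2.5 * x_bar              # another member of the same equivalence class
print(x_bar)
print(to_inhomogeneous(x_tilde))   # recovers the original point (3, 4)
```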
There’s one homogeneous vector that has a special meaning and that’s the homogeneous vector where the last element \(\tilde{w} \) is equal to zero. Such points are called ideal points or points at infinity. These points can’t be represented with inhomogeneous coordinates because if we have \(\tilde{w} =0 \), we would have to divide by zero which would lead to infinity. However, this allows us to very conveniently express points that are located at infinity even without having to use the infinity symbol. To do that we can simply have a homogeneous vector where the last element is equal to zero and the point that corresponds to that homogeneous vector is a point that lies at infinity.
This can be illustrated in the following way.
Here’s a visual illustration of the relationship between homogeneous, inhomogeneous, and augmented vectors. We can see the homogeneous coordinate system painted in orange, and we have the homogeneous vector that’s represented in that coordinate system. Also, we have this \(x \), \(y \) plane here that is in the homogeneous coordinate system but translated to the location where \(w=1 \).
Now, if we divide this homogeneous vector by \(\tilde{w} \), we obtain the point located at the intersection of two objects: the line that connects the 3D point \(\tilde{\mathbf{x}} \) with the origin of the homogeneous coordinate system, and the plane \(\tilde{w}=1 \) that defines the inhomogeneous coordinates. At that point, we have the augmented vector whose last element is one.
Notice that this projection closely resembles the perspective projection process that we have discussed in this post.
2D lines
2D lines can also be expressed using homogeneous coordinates. For that, we can use the following formula.
$$ \tilde{\mathbf{l}}=(a, b, c)^{\top} $$
So, here we have a line \(\tilde{\mathbf{l}} \) which is equal to the 3D vector \((a, b, c)^{\top} \). If we take the inner product of that 3-vector with an augmented vector \(\overline{\mathbf{x}} \) and set it to zero, we obtain the following expression:
$$ \left\{\overline{\mathbf{x}} \mid \tilde{\mathbf{l}}^{\top} \overline{\mathbf{x}}=0\right\} \quad \Leftrightarrow \quad\{x, y \mid a x+b y+c=0\} $$
We can recognize the expression above as the line equation: all points that satisfy this constraint are located on the line. So, we can write the line in the familiar form \(ax+by+c=0 \), but we can also simply write it as an inner product between the homogeneous line vector and the homogeneous point, which in this case is an augmented vector.
We can also normalize \(\tilde{\mathbf{l}} \) so that \(\tilde{\mathbf{l}}=\left(n_x, n_y, d\right)^{\top}=(\mathbf{n}, d)^{\top} \) with \(\|\mathbf{n}\|_2=1 \). In this case, \(\mathbf{n} \) is the normal vector perpendicular to the line, and \(d \) is its distance to the origin.
An exception is the line at infinity \(\tilde{l}_{\infty}=(0,0,1)^{\top} \) which passes through all ideal points.
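As a small sketch of this normalization, the snippet below scales a line vector so that its normal has unit length. The helper `normalize_line` and the example line \(3x+4y-10=0 \) are assumptions chosen for illustration. With a unit normal, the inner product with an augmented point gives the signed distance of that point to the line.

```python
import numpy as np

def normalize_line(l):
    """Scale l = (a, b, c) so that the normal (a, b) has unit length."""
    l = np.asarray(l, dtype=float)
    norm = np.linalg.norm(l[:2])
    if np.isclose(norm, 0.0):
        # (a, b) = (0, 0) is the line at infinity, which cannot be normalized
        raise ValueError("the line at infinity cannot be normalized")
    return l / norm

l = np.array([3.0, 4.0, -10.0])          # the line 3x + 4y - 10 = 0
l_n = normalize_line(l)
n, d = l_n[:2], l_n[2]
print(n, d)                               # unit normal and offset d

origin_bar = np.array([0.0, 0.0, 1.0])
print(l_n @ origin_bar)                   # signed distance of the origin to the line
```

Note that \(d \) comes out signed here; its absolute value is the distance between the line and the origin.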
To describe some of the properties of the line, we will introduce the following cross-product expressed as the product of a skew-symmetric matrix and a vector
$$ \mathbf{a} \times \mathbf{b}=[\mathbf{a}]_{\times} \mathbf{b}=\left[\begin{array}{ccc} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0\end{array}\right]\left(\begin{array}{l}b_1 \\b_2 \\b_3 \end{array}\right)=\left(\begin{array}{l}a_2 b_3-a_3 b_2 \\a_3 b_1-a_1 b_3 \\a_1 b_2-a_2 b_1 \end{array}\right) $$
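The matrix form of the cross product is easy to check numerically. The following sketch builds \([\mathbf{a}]_{\times} \) exactly as written in the formula above and compares it with NumPy's built-in `np.cross`. The helper name `skew` and the test vectors are our own choices.

```python
import numpy as np

def skew(a):
    """Return the 3x3 skew-symmetric matrix [a]_x such that [a]_x @ b equals a x b."""
    a1, a2, a3 = a
    return np.array([[0.0, -a3,  a2],
                     [ a3, 0.0, -a1],
                     [-a2,  a1, 0.0]])

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(skew(a) @ b)       # cross product via the skew-symmetric matrix
print(np.cross(a, b))    # cross product via NumPy
```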
In homogeneous coordinates, the intersection of two lines is given by the following equation:
$$ \tilde{\mathbf{x}}=\tilde{\mathbf{l}}_1 \times \tilde{\mathbf{l}}_2 $$
Similarly, the line joining two points can be written as:
$$ \tilde{\mathbf{l}}=\overline{\mathbf{x}}_1 \times \overline{\mathbf{x}}_2 $$
Let’s have a look at the following illustration. Here we have two lines. One line is characterized by the equation \(y=1 \), and another line is characterized by the equation \(x=2 \). The line vector for the first line is equal to \((0,1,-1) \) because if we multiply this with the augmented vector we obtain \(y-1=0 \) or \(y =1 \). Similarly, the line vector for the second line is \((1,0,-2) \) because we want to express \(x = 2 \). Now, we can take these two line vectors and compute the cross product which can be also written as this product of this skew-symmetric matrix, and the second line vector. This skew-symmetric matrix is simply specified as this cross-product matrix that you can see in the image below.
Next, if we multiply these two terms together and divide the result by its last element, we obtain the augmented vector \((2,1,1) \). If we compare our result with the illustration above, we can see that this is indeed the correct intersection of these two lines.
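The same two cross-product formulas also cover the other cases mentioned earlier: the line joining two points, and the intersection of parallel lines. The sketch below uses example points and lines that we picked ourselves. It shows that two parallel lines meet in an ideal point, whose last element is zero, and that this point lies on the line at infinity.

```python
import numpy as np

# Line through the points (0, 0) and (1, 1): join of two augmented vectors.
x1 = np.array([0.0, 0.0, 1.0])
x2 = np.array([1.0, 1.0, 1.0])
line = np.cross(x1, x2)
print(line)                    # line vector (a, b, c)
print(line @ x1, line @ x2)    # both points satisfy l^T x = 0

# Two parallel lines, y = 1 and y = 2, intersect in an ideal point.
l_a = np.array([0.0, 1.0, -1.0])
l_b = np.array([0.0, 1.0, -2.0])
ideal = np.cross(l_a, l_b)
print(ideal)                   # last element is 0: a point at infinity

# Every ideal point lies on the line at infinity (0, 0, 1).
l_inf = np.array([0.0, 0.0, 1.0])
print(l_inf @ ideal)
```

The first two entries of the ideal point give the common direction of the two parallel lines, here the \(x \) axis.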
2D Conics
We can also represent more complex algebraic objects using homogeneous equations. For instance, if we go from the linear equations that we had discussed so far to, for example, quadratic equations we can express conic sections. Let’s have a look at the following example.
So, here we can see that conic sections can be expressed using a quadratic equation. In other words, the solution to this quadratic expression is a conic section, which is the intersection of a cone with a plane. Depending on how we orient that plane, which is defined by the \(Q \) matrix, we get a circle, an ellipse, a parabola, or a hyperbola in this 2D space.
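As a hedged illustration, we assume the usual homogeneous form of a conic, \(\overline{\mathbf{x}}^{\top} \mathbf{Q}\, \overline{\mathbf{x}}=0 \), and pick the unit circle as the example. Both the concrete matrix and the helper name `conic_value` are our own choices. A point lies on the conic exactly when the quadratic form evaluates to zero.

```python
import numpy as np

# Assumed example: the unit circle x^2 + y^2 - 1 = 0 written as x_bar^T Q x_bar = 0.
Q = np.diag([1.0, 1.0, -1.0])

def conic_value(Q, x, y):
    """Evaluate the quadratic form x_bar^T Q x_bar for the 2D point (x, y)."""
    x_bar = np.array([x, y, 1.0])
    return x_bar @ Q @ x_bar

print(conic_value(Q, 1.0, 0.0))   # zero: the point lies on the circle
print(conic_value(Q, 0.0, 0.0))   # negative: the point is inside the circle
print(conic_value(Q, 2.0, 0.0))   # positive: the point is outside the circle
```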
Now let’s move on from 2D points to 3D points.
3D points
3D points can also be written in inhomogeneous coordinates as:
$$ \mathbf{x}=\left(\begin{array}{l} x \\y \\z\end{array}\right) \in \mathbb{R}^3 $$
or in homogeneous coordinates as:
$$ \tilde{\mathbf{x}}=\left(\begin{array}{c} \tilde{x} \\\tilde{y} \\\tilde{z} \\\tilde{w}\end{array}\right) \in \mathbb{P}^3 $$
with projective space \(\mathbb{P}^3=\mathbb{R}^4 \backslash\{(0,0,0,0)\} \)
3D planes
3D planes can also be represented in homogeneous coordinates as \(\tilde{\mathbf{m}}=(a, b, c, d)^{\top} \):
$$ \left\{\overline{\mathbf{x}} \mid \tilde{\mathbf{m}}^{\top} \overline{\mathbf{x}}=0\right\} \quad \Leftrightarrow \quad\{x, y, z \mid a x+b y+c z+d=0\} $$
Again, we can normalize \(\tilde{\mathbf{m}} \) so that \(\tilde{\mathbf{m}}=\left(n_x, n_y, n_z, d\right)^{\top}=(\mathbf{n}, d)^{\top} \) with \(\|\mathbf{n}\|_2=1 \). In this case, \(\mathbf{n} \) is the normal perpendicular to the plane and \(d \) is its distance to the origin.
An exception is the plane at infinity \(\tilde{\mathbf{m}}=(0,0,0,1)^{\top} \) which passes through all ideal points (= points at infinity) for which \(\tilde{w}=0 \).
Now, let’s have a look at the following illustration.
As you can see, this illustration is similar to the one we showed earlier in the post for 2D points, only now we have one additional dimension. So what we have here is the coordinate system and a plane that is at distance \(d \) from the origin of the coordinate system, provided that the normal \(\mathbf{n} \) is normalized to unit length. Then \(\mathbf{n} \) corresponds to the normal of that plane.
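The plane case works just like the 2D line case. Below is a minimal sketch with a hypothetical helper `normalize_plane` and an example plane \(z=2 \) that we chose ourselves. After normalization, the inner product with an augmented 3D point gives the signed distance of that point to the plane.

```python
import numpy as np

def normalize_plane(m):
    """Scale m = (a, b, c, d) so that the normal (a, b, c) has unit length."""
    m = np.asarray(m, dtype=float)
    norm = np.linalg.norm(m[:3])
    if np.isclose(norm, 0.0):
        # (a, b, c) = (0, 0, 0) is the plane at infinity, which cannot be normalized
        raise ValueError("the plane at infinity cannot be normalized")
    return m / norm

m = normalize_plane([0.0, 0.0, 2.0, -4.0])     # the plane 2z - 4 = 0, i.e. z = 2
point_bar = np.array([5.0, -3.0, 5.0, 1.0])    # augmented 3D point (5, -3, 5)
signed_distance = m @ point_bar

print(m)                  # unit normal (0, 0, 1) and offset d
print(signed_distance)    # the point lies 3 units above the plane
```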
3D lines
3D lines are less elegant than either 2D lines or 3D planes. One possible representation is to express points on a line as a linear combination of two points \(\mathbf{p} \) and \(\mathbf{q} \) on the line.
$$ \{\mathbf{x} \mid \mathbf{x}=(1-\lambda) \mathbf{p}+\lambda \mathbf{q} \wedge \lambda \in \mathbb{R}\} $$
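The linear-combination form of a 3D line translates directly into code. In this sketch, the points `p` and `q` and the helper name `point_on_line` are arbitrary choices for illustration. The values \(\lambda=0 \) and \(\lambda=1 \) return the two defining points, and values outside \([0,1] \) extend the line beyond them.

```python
import numpy as np

def point_on_line(p, q, lam):
    """Return the point (1 - lam) * p + lam * q on the line through p and q."""
    return (1.0 - lam) * p + lam * q

p = np.array([0.0, 0.0, 0.0])
q = np.array([2.0, 4.0, 6.0])

for lam in (0.0, 0.5, 1.0, 2.0):
    print(lam, point_on_line(p, q, lam))
```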
Now, let’s move on to the transformations.
2D Transformations
In this part, we’re going to discuss 2D and 3D transformations. Let’s start with 2D transformations. The simplest 2D transformation is a translation.
Translation
Assuming that we have a square, here we’re simply translating that square to another location in that two-dimensional space.
Such a translation is simply given by adding a two-dimensional vector \(t \) in inhomogeneous coordinates to all the points that we’d like to translate.
$$ \mathbf{x}^{\prime}=\mathbf{x}+\mathbf{t} \quad \Leftrightarrow \quad \overline{\mathbf{x}}^{\prime}=\left[\begin{array}{cc} \mathbf{I} & \mathbf{t} \\ \mathbf{0}^{\top} & 1 \end{array}\right] \overline{\mathbf{x}} $$
Alternatively, we can write this as the homogeneous expression on the right side, where we have a \(3\times3 \) matrix written in \(2\times2 \) block form. Multiplying the augmented vector \(\overline{\mathbf{x}} \) by this matrix applies the identity matrix \(\mathbf{I} \) to \(\mathbf{x} \) and adds the translation \(\mathbf{t} \), which gives the new location \(\overline{\mathbf{x}}^{\prime} \).
Using homogeneous representations allows us to chain/invert transformations. It is true not only for the translations but also for the other transformations that we will discuss in a minute.
Also, it is important to note that augmented vectors \(\overline{\mathbf{x}} \) can always be replaced by general homogeneous vectors \(\tilde{\mathbf{x}} \).
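Here is a short sketch of why the homogeneous form is convenient. Chaining two translations is a single matrix product, undoing one is a matrix inverse, and a general homogeneous vector leads to the same point after normalization. The helper `translation` and all numeric values are assumptions made for this example.

```python
import numpy as np

def translation(tx, ty):
    """3x3 homogeneous matrix that translates 2D points by (tx, ty)."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

T1 = translation(1.0, 2.0)
T2 = translation(-3.0, 0.5)
x_bar = np.array([4.0, 5.0, 1.0])

chained = T2 @ T1                            # apply T1 first, then T2
print(chained @ x_bar)                       # (4 + 1 - 3, 5 + 2 + 0.5, 1)

restored = np.linalg.inv(T1) @ (T1 @ x_bar)  # the inverse undoes the translation
print(restored)

x_tilde = 3.0 * x_bar                        # same point, different scale
y_tilde = T1 @ x_tilde
print(y_tilde / y_tilde[-1])                 # same result as T1 @ x_bar
```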
Euclidean Transformation
The next transformation that we are going to cover is the so-called Euclidean transformation. It combines a translation and a rotation. Therefore, it has three degrees of freedom in 2D space (two for the translation and one for the rotation angle), and it can be expressed in inhomogeneous coordinates as well as in homogeneous coordinates. Let’s have a look at the following equation.
$$ \mathbf{x}^{\prime}=\mathbf{R} \mathbf{x}+\mathbf{t} \quad \Leftrightarrow \quad \overline{\mathbf{x}}^{\prime}=\left[\begin{array}{cc} \mathbf{R} & \mathbf{t} \\ \mathbf{0}^{\top} & 1\end{array}\right] \overline{\mathbf{x}} $$
Here, instead of the identity submatrix in the upper-left block of our \(3\times3 \) matrix, we have the rotation matrix \(\mathbf{R} \). Note that \(\mathbf{R} \in SO(2) \) is an orthonormal rotation matrix with \(\mathbf{R R}^{\top}=\mathbf{I} \) and \(\det(\mathbf{R})=1 \).
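The two properties of \(\mathbf{R} \), and the fact that a Euclidean transformation keeps distances unchanged, can be verified numerically. This is only a sketch. The helper `euclidean`, the angle, and the translation are arbitrary values we picked.

```python
import numpy as np

def euclidean(theta, tx, ty):
    """3x3 homogeneous matrix: rotation by theta followed by translation (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  tx],
                     [s,   c,  ty],
                     [0.0, 0.0, 1.0]])

E = euclidean(np.pi / 6, 2.0, -1.0)
R = E[:2, :2]

print(np.allclose(R @ R.T, np.eye(2)))       # R is orthonormal
print(np.isclose(np.linalg.det(R), 1.0))     # det(R) = 1

# The distance between two points is the same before and after the transformation.
p = np.array([0.0, 0.0, 1.0])
q = np.array([3.0, 4.0, 1.0])
dist_before = np.linalg.norm((p - q)[:2])
dist_after = np.linalg.norm((E @ p - E @ q)[:2])
print(dist_before, dist_after)
```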
Similarity Transformation
The next transformation that we are going to cover is the similarity transformation. It is a 2D translation plus 2D rotation plus scale. So it has four degrees of freedom.
$$ \mathbf{x}^{\prime}=s\mathbf{R} \mathbf{x}+\mathbf{t} \quad \Leftrightarrow \quad \overline{\mathbf{x}}^{\prime}=\left[\begin{array}{cc} s\mathbf{R} & \mathbf{t} \\ \mathbf{0}^{\top} & 1\end{array}\right] \overline{\mathbf{x}} $$
As you can see it’s exactly the same expression as before except that now we have a scale \(s \) that’s multiplied with the rotation matrix \(R \).
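To see what the extra scale factor does, the sketch below applies a similarity transformation to two perpendicular unit edges. The helper `similarity` and the parameter values are hypothetical. Lengths are multiplied by \(s \), while the right angle between the edges is kept.

```python
import numpy as np

def similarity(s, theta, tx, ty):
    """3x3 homogeneous matrix: scale by s, rotate by theta, translate by (tx, ty)."""
    c, si = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * si, tx],
                     [s * si, s * c,  ty],
                     [0.0,    0.0,    1.0]])

S = similarity(2.0, np.pi / 4, 1.0, 1.0)

p = np.array([0.0, 0.0, 1.0])
q = np.array([1.0, 0.0, 1.0])   # unit edge along x
r = np.array([0.0, 1.0, 1.0])   # unit edge along y

u = (S @ q - S @ p)[:2]
v = (S @ r - S @ p)[:2]
print(np.linalg.norm(u), np.linalg.norm(v))   # both edges now have length s
print(np.isclose(u @ v, 0.0))                 # the edges are still perpendicular
```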
Affine Transformation
The next transformation is the affine transformation. Here we have six degrees of freedom. Again, we have exactly the same expression as before except that the rotation matrix is replaced with an arbitrary \(2\times2 \) matrix \(\mathbf{A} \).
$$ \mathbf{x}^{\prime}=\mathbf{A} \mathbf{x}+\mathbf{t} \quad \Leftrightarrow \quad \overline{\mathbf{x}}^{\prime}=\left[\begin{array}{cc} \mathbf{A} & \mathbf{t} \\ \mathbf{0}^{\top} & 1\end{array}\right] \overline{\mathbf{x}} $$
As you can see in the image above, here we are not preserving angles anymore. However, parallel lines always remain parallel to each other after the transformation.
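Both statements can be checked on the corners of a square. In this sketch the affine matrix is filled with arbitrary example values, and `cross2` is a small helper of our own that returns zero when two 2D vectors are parallel.

```python
import numpy as np

# Affine matrix with arbitrarily chosen entries (no translation, for simplicity).
A = np.array([[1.5,  1.0, 0.0],
              [0.5, -1.5, 0.0],
              [0.0,  0.0, 1.0]])

# Corners of a square as columns of augmented vectors.
X = np.array([[-1, -1, 1], [1, -1, 1], [1, 1, 1], [-1, 1, 1]], dtype=float).T
Y = A @ X

def cross2(u, v):
    """2D cross product (determinant); zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

bottom = Y[:2, 1] - Y[:2, 0]   # image of the bottom edge
top = Y[:2, 2] - Y[:2, 3]      # image of the top edge
right = Y[:2, 2] - Y[:2, 1]    # image of the right edge

print(cross2(bottom, top))     # zero: opposite edges are still parallel
print(bottom @ right)          # non-zero: the right angle is lost
```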
Perspective transformation (homography)
Finally, the most general transformation that we can define with such a linear homogeneous equation is the so-called perspective transformation or homography. Now, each corner of the square can move to a different location independently. Four corners with two coordinates each give eight degrees of freedom, so we can also specify a homography by the coordinates that the four corners of the square are mapped to.
The homography is represented by a \(3\times3 \) homogeneous matrix \(\tilde{\mathbf{H}} \), which is defined only up to scale:
$$ \tilde{\mathbf{x}}^{\prime}=\tilde{\mathbf{H}} \tilde{\mathbf{x}} $$
Now, the equation above is in terms of a general homogeneous vector. Again, we can obtain the augmented vector by dividing the homogeneous vector by its last element \(\tilde{w} \).
$$ \overline{\mathbf{x}}=\frac{1}{\tilde{w}} \tilde{\mathbf{x}} $$
As you can see, perspective transformations do not maintain parallel lines anymore. They only preserve straight lines. So, if a line was straight in the original object before the transformation, then after the transformation it is guaranteed to also be a straight line.
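The following sketch illustrates these properties with a hypothetical homography whose entries are arbitrary illustration values. The helper names `apply_homography` and `cross2` are also our own. Three collinear points stay collinear, the parallel edges of a square do not stay parallel, and multiplying \(\tilde{\mathbf{H}} \) by a scalar does not change the result.

```python
import numpy as np

# Hypothetical homography; the entries are arbitrary illustration values.
H = np.array([[1.0, 0.2,  0.5],
              [0.1, 1.0, -0.3],
              [0.3, 0.1,  1.0]])

def apply_homography(H, pts):
    """Map 2 x N inhomogeneous points through H and divide by the last element."""
    pts_h = np.vstack([pts, np.ones(pts.shape[1])])
    out = H @ pts_h
    return out[:2] / out[2]

def cross2(u, v):
    """2D cross product (determinant); zero when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

# Three collinear points on the line y = x stay collinear.
line_pts = np.array([[0.0, 1.0, 2.0],
                     [0.0, 1.0, 2.0]])
mapped = apply_homography(H, line_pts)
collinear = np.isclose(np.linalg.det(np.vstack([mapped, np.ones(3)])), 0.0)
print(collinear)

# The bottom and top edges of a square are no longer parallel.
square = np.array([[-1.0, 1.0, 1.0, -1.0],
                   [-1.0, -1.0, 1.0, 1.0]])
Y = apply_homography(H, square)
print(cross2(Y[:, 1] - Y[:, 0], Y[:, 2] - Y[:, 3]))   # non-zero

# H is defined only up to scale: 5 * H gives the same mapping.
print(np.allclose(apply_homography(5.0 * H, square), Y))
```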
Now, let’s see how we can perform some of these operations in Python.
Operations with homogeneous coordinates in Python
Now, it is time for coding. We will calculate the dot and the cross product of two points, find the line that connects these two points, and apply the linear transformations that we covered in the previous paragraph.
First, let’s import the necessary libraries. For this exercise, we will use only the numpy and matplotlib libraries.
import numpy as np
import matplotlib.pyplot as plt
Next, let’s define two arbitrary points with homogeneous coordinates.
a = np.array([1,1,1])
b = np.array([2,2,1])
Now, we will calculate the inner and cross-product of these two points.
#dot product
np.dot(a,b)
Output:
5
As we already learned in this post, the cross-product between two points will give us the line joining two points.
#cross product
np.cross(a, b)
Output:
array([-1, 1, 0])
Now, let’s perform the experiment that we already showed earlier in the post in Python.
Here we have two lines. One line is characterized by the equation \(y=1 \), and another line is characterized by the equation \(x=2 \). Our goal is to calculate the intersection of these two lines.
So, first, let’s define our two lines.
# the first line equation is y=1
l1 = np.array([0, 1, -1])
# the second line equation is x=2
l2 = np.array([1, 0, -2])
We can calculate the intersection point between these two lines in two ways. The first way is to simply calculate the cross product between two lines and we can do that in the following way.
point = np.cross(l1, l2)
print(point)
Output:
[-2 -1 -1]
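Notice that we have obtained coordinates of \((-2,-1,-1) \) instead of \((2,1,1) \). Don’t be confused by this result because these two vectors are equivalent in the homogeneous coordinate system. To get the inhomogeneous coordinates, we can divide them by the third coordinate and we will obtain the correct result.
The second way to calculate the intersection is to use the skew-symmetric (cross-product) matrix shown in the illustration above. Note that the matrix below is the cross-product matrix of \((0,-1,1) \), which is the first line vector multiplied by \(-1 \) and therefore describes the same line \(y=1 \).
# cross-product matrix of (0, -1, 1), i.e. of -l1
skew_l1 = np.array([[0, -1, -1], [1, 0, 0], [1, 0, 0]])
point = np.matmul(skew_l1, l2)
print(point)
Output:
[2 1 1]
As you can see, both ways give the same point: \((2,1,1) \) and \((-2,-1,-1) \) differ only by the scale factor \(-1 \).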
Linear transformations in Python
Now, let’s perform simple 2D transformations in Python, such as translation, Euclidean, similarity, and affine transformations.
First, let’s define 4 points that form a square, which we will then transform.
# define 4 points that form a square
X = np.array( [ [-1,-1,1], [1,-1,1] , [1,1,1], [-1,1,1]] ).T
print(X)
Output:
[[-1  1  1 -1]
 [-1 -1  1  1]
 [ 1  1  1  1]]
Here, we will plot our 4 points.
plt.scatter(X[0,:], X[1,:])
plt.axis('equal')
Now, it is time to transform our 4 points. Let’s start with the simplest transformation which is a translation.
Translation:
To perform the translation, we need to define the translation matrix M. For that, we specify the translation in the \(x \) and \(y \) directions by creating two variables, x_t and y_t, and store them in the translation vector t. Then, we just multiply that matrix with our 4 points using the function np.matmul().
# translation vector
x_t= 1
y_t= 1
t = np.array([x_t, y_t])
# 2D Translation via matrix multiplication
M = np.array( [[1., 0., t[0]],[0.,1.,t[1]],[0.,0.,1.]] )
X_new = np.matmul(M, X)
print(X_new)
Output:
[[0. 2. 2. 0.]
 [0. 0. 2. 2.]
 [1. 1. 1. 1.]]
plt.scatter(X_new[0,:], X_new[1,:], c='r')
plt.axis('equal')
As you can see, the original points are now shifted by one unit in the \(x \) direction, as well as in the \(y \) direction.
Rotation:
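For rotation we will apply the same process, only now we need to create the rotation matrix R. For that, we need to specify the angle theta. Note that in the code below, the last column of R also holds the translation t from the previous step, so the points are rotated and then translated (a Euclidean transformation). The pure rotation matrix is defined with the following formula.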
$$ R=\left[\begin{array}{ccc} \cos \theta & -\sin \theta & 0 \\\sin \theta & \cos \theta & 0 \\0 & 0 & 1\end{array}\right] $$
theta = np.pi / 4
# 2D Translation + Rotation
R = np.array ( [[np.cos(theta), -np.sin(theta), t[0]],[np.sin(theta), np.cos(theta), t[1]],[0,0,1]] )
Same as before, we will multiply our matrix R with the 4 points we created earlier and plot the results.
X_new = np.matmul(R, X)
print(X_new)
Output (these four values are the axis limits returned by plt.axis('equal'), not the printed matrix):
(-1.1707106781186547, 2.5849242404917496, -1.1707106781186547, 2.5849242404917496)
plt.scatter(X_new[0,:], X_new[1,:], c='r')
plt.axis('equal')
Similarity transformation:
For the similarity transformation, we will just define the scaling factor s and multiply this factor with the rotation matrix R.
# scale, just multiply the R with s
theta = np.pi / 4
S = np.array ( [[np.cos(theta), -np.sin(theta) ], [np.sin(theta), np.cos(theta)]])
s = 2
RS = s * S
R[:2,:2] = RS
X_new = np.matmul(R,X)
plt.scatter(X_new[0,:], X_new[1,:],c='r')
plt.scatter(X[0,:], X[1,:])
plt.axis('equal')
As you can see, our original points are now rotated, scaled, and translated.
Affine transformation:
The final transformation that we are going to apply in Python is the affine transformation. Again, the process is very similar to the previous one, only now our affine matrix will contain arbitrary values. Let’s see how we can do this.
# affine projection
M = np.zeros((3,3))
M[0,0] = 1.5
M[0,1] = 1.
M[1,0] = 0.5
M[1,1] = -1.5
M[2,2] = 1.
X_new = np.matmul(M, X)
plt.scatter(X_new[0,:], X_new[1,:],c='r')
plt.scatter(X[0,:], X[1,:])
plt.axis('equal')
As explained earlier in the post, here we are not preserving angles anymore as we did in the translation, rotation, and similarity transformations. However, parallel lines still remain parallel to each other.
Summary
This was quite an interesting post and also an important one. Here, we have learned how to project an object from a 3D scene onto a 2D plane. We discussed the math behind this process introducing the concept of homogeneous coordinates which play a fundamental role in describing objects and projections. We also talked about linear transformations such as translation, rotation, similarity, and affine transformations, and we explained how to perform all of these mathematical operations in Python using only the NumPy library.
References:
[1] – Computer Vision – Lecture 2.1 (Image Formation: Primitives and Transformations) – Youtube channel of the machine learning groups at the University of Tübingen.