#013 3D Face Modeling – Creating a 3D model of a cube from one image

Highlights: Hello, and welcome. In this post, we’re going to talk about how we can create a 3D model of a cube with only one image as input. We will achieve this the same way as we explained previously for the 3D face model. So let us begin!

1. Creating 3D cubes

The first step is to create different 3D cubes to mimic a database of 3D scans. We will create \(n\) average cubes (equal side lengths, just different scaling between the cubes), \(n\) long cubes (stretched), and \(n\) short cubes (squished).

Let’s start by importing the necessary libraries.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
import cv2

We also need to define some parameters that will later be used in the code.

n = 100 # Number of 3D cubes to be generated
RED = (0, 0, 255) # RED COLOR
scale = 50        # Scale the cube on the cv2 canvas
BLACK = (0, 0, 0) # BLACK COLOR

WIDTH = 1920      # Width of the canvas
HEIGHT = 1080     # Height of the canvas
circle_pos = [WIDTH//2, HEIGHT//2] # Center of canvas

First, we create the average cubes. These cubes all have equal side lengths, and each cube is scaled by a single random factor. This can be achieved by creating a numpy array with 8 elements that represent the 8 corners of the cube, each with 3 coordinates, \(x, y, z\). These coordinates are then multiplied by a scaling factor drawn randomly from a uniform distribution (a number from 2 to 5).

# Create average cubes | Equal side lengths
average_cubes = np.zeros((n, 8, 3))
for i in range(n):
    delta_shape = np.random.uniform(2, 5)
    points = np.array([ [-1, -1, 1],
                        [1, -1, 1],
                        [1, 1, 1],
                        [-1, 1, 1],
                        [-1, -1, -1],
                        [1, -1, -1],
                        [1, 1, -1],
                        [-1, 1, -1]], dtype=np.float64)
    points *= delta_shape
    average_cubes[i] = points

Now, we move on to the second type of cube, the long cube, which appears stretched. For these cubes, the \(z\) coordinate will be bigger and the two other coordinates will have smaller values.

# Create long cubes | Stretched out
long_cubes = np.zeros((n, 8, 3))
for i in range(n):
    delta_shape = np.random.uniform(2, 5)
    points = np.array([ [-0.5, -0.5, 2],
                        [0.5, -0.5, 2],
                        [0.5, 0.5, 2],
                        [-0.5, 0.5, 2],
                        [-0.5, -0.5, -2],
                        [0.5, -0.5, -2],
                        [0.5, 0.5, -2],
                        [-0.5, 0.5, -2]], dtype=np.float64)

    points *= delta_shape
    long_cubes[i] = points

And lastly, the third type is the short or squished cube. For these, the \(z\) coordinate will have a smaller value than the two other coordinates.

# Create short cubes | Squished
short_cubes = np.zeros((n, 8, 3))
for i in range(n):
    delta_shape = np.random.uniform(2, 5)
    points = np.array([ [-1, -1, 0.5],
                        [1, -1, 0.5],
                        [1, 1, 0.5],
                        [-1, 1, 0.5],
                        [-1, -1, -0.5],
                        [1, -1, -0.5],
                        [1, 1, -0.5],
                        [-1, 1, -0.5]], dtype=np.float64)

    points *= delta_shape
    short_cubes[i] = points

After we have generated all our cubes, we will flatten each cube into a row of 24 values (8 corners \(\times\) 3 coordinates) and stack all of them into a data matrix of shape \(3n \times 24\).

# Stack them to form a single data matrix of shape 3n x 24
X = np.concatenate([average_cubes.reshape(n, -1),
                    long_cubes.reshape(n, -1),
                    short_cubes.reshape(n, -1)], 0)

We create a PCA object that will be used to perform dimensionality reduction. We will fit it onto the X dataset that we just obtained.

# Create a PCA object and fit on the data
pca = PCA(n_components=24)
pca.fit(X)

To decide how many components we should keep from the PCA, we will plot the explained_variance_ratio_, which shows how much of the variance is preserved for a given number of components.

# Understanding how many components we should keep
plt.figure()
plt.bar(range(1, len(pca.explained_variance_ratio_)+1),
        pca.explained_variance_ratio_)
plt.ylabel('Explained variance')
plt.xlabel('Components')
plt.plot( range(1, len(pca.explained_variance_ratio_)+1),
            np.cumsum(pca.explained_variance_ratio_),
            c='red',
            label="Cumulative Explained Variance")

plt.legend(loc='upper left')
plt.show()

We will also visualize these PCA components to see their shape and how they are able to reconstruct 99% of the cubes.

# Understanding the PCA components
fig = plt.figure(figsize=(6, 6))
ax = fig.add_subplot(projection='3d')
for i in range(2):
    comp_x = pca.components_[i, 0:24:3]
    comp_y = pca.components_[i, 1:24:3]
    comp_z = pca.components_[i, 2:24:3]
    ax.scatter(comp_x, comp_y, comp_z, s=100, label=f"{i}-th component")

ax.set_xlim([-1, 1])
ax.set_ylim([-1, 1])
ax.set_zlim([-1, 1])

ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_zlabel("Z")
ax.legend()
plt.show()

Looking at the explained variance plot above, we can see that with only 2 components we can reconstruct 99% of the data.

Looking at the plot above we can see that the first component is stretched in the \(z\) axis and squished in the \(x\) axis. The second component is squashed in the \(z\) axis making it more stretched in the \(x, y\) axes. This is just showing us that the PCA has learned our dataset and is able to reconstruct it with only these two components.
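As a quick numerical check (a small addition, not in the original post), we can print the cumulative explained variance of the first two components:

# Cumulative explained variance of the first two PCA components
print(np.cumsum(pca.explained_variance_ratio_)[:2])

The second value should come out close to 0.99, confirming what the plot shows.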

You may be wondering now, what do we need this for if we want to create a new 3D model from an image that was never in the \(n\) dataset previously created?

We will need to project our 3D cube into a 2D image and find distinct points on the image, the corners, which will be the ground truth. To create the 3D model we need to find 3 parameters: \(\alpha_1, \alpha_2, \theta\). The two parameters \(\alpha_1\) and \(\alpha_2\) are multiplied with our principal components, which gives us a 3D model. Using the parameter \(\theta\), we rotate that 3D model. The 3D model is then projected onto the 2D image and corners are detected again to check whether the ground truth corners overlap with the newly projected corners. For a detailed explanation of PCA, check out this post.
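Written compactly, in the notation used throughout this post, the model and the loss we will be fitting are:

\[
S(\alpha_1, \alpha_2) = \alpha_1 c_1 + \alpha_2 c_2, \qquad v_i = P \, R_z(\theta) \, S_i, \qquad E_{landmarks} = \sum_i \lVert v_i^{gt} - v_i \rVert^2
\]

where \(c_1\) and \(c_2\) are the first two PCA components reshaped to \(8 \times 3\), \(R_z(\theta)\) is the rotation matrix around the \(z\) axis, \(P\) is the orthographic projection matrix, and \(v_i^{gt}\) are the ground truth corners.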

Let us go step by step. First, we define a simple projection matrix. Also, for later visualization purposes, we will define a list in which we will store where each of the corners is projected on the 2D image.

projection_matrix = np.matrix([
    [1, 0, 0],
    [0, 1, 0]
])

projected_points = [
    [0, 0] for _ in range(8)  # one 2D point per cube corner
]

def connect_points(screen, i, j, points):
    # Helper that draws an edge between two projected corners
    cv2.line(screen, (points[i][0], points[i][1]),
             (points[j][0], points[j][1]), BLACK, 1)
    return screen

Usually, 3D datasets that are available online come with a certain degree of rotation. Because we are generating our own dataset, we will rotate the cubes by a random angle \(\theta\). Later in the code, we will use the gradient descent algorithm to optimize the parameter theta for each cube so that the cube is reconstructed as well as possible.

angle_x = angle_y = 0.0
angle_z = np.random.uniform(-0.5, 0.5)

rotation_z = np.matrix([
    [np.cos(angle_z), -np.sin(angle_z), 0],
    [np.sin(angle_z),  np.cos(angle_z), 0],
    [0,                 0,        1],
])

rotation_y = np.matrix([
    [np.cos(angle_y), 0, np.sin(angle_y)],
    [0,         1,        0],
    [-np.sin(angle_y), 0, np.cos(angle_y)],
])

rotation_x = np.matrix([
    [1,       0,                0],
    [0, np.cos(angle_x), -np.sin(angle_x)],
    [0, np.sin(angle_x),  np.cos(angle_x)],
])

Next, we create a blank canvas for the cube, rotate the cube, and project it onto a 2D image. For that, we use cv2.fillPoly and cv2.polylines to color the cube’s sides and draw their outlines for easier corner detection. Lastly, we use the cv2.goodFeaturesToTrack algorithm to detect the four corners of our cube. We only detect four because the experiment views the top side of the cube, hence only four corners and one side of the cube are visible.

front_side = [4, 5, 6, 7]
top_side = [4, 0, 1, 5]
right_side = [5, 6, 2, 1]
screen = np.ones((HEIGHT, WIDTH, 3), dtype=np.uint8) * 255

i = 0
points = X[0].reshape(8, 3)
for point in points:
    rotated3d = np.matmul(rotation_z, point.reshape((3, 1)))
    rotated3d = np.matmul(rotation_y, rotated3d)
    rotated3d = np.matmul(rotation_x, rotated3d)

    projected2d = np.matmul(projection_matrix, rotated3d)

    x = int(projected2d[0][0] * scale) + circle_pos[0]
    y = int(projected2d[1][0] * scale) + circle_pos[1]

    projected_points[i] = [x, y]
    i += 1

cv2.fillPoly(screen, [np.array(projected_points)[front_side]], (0, 0, 255))
cv2.fillPoly(screen, [np.array(projected_points)[top_side]], (240, 240, 240))
cv2.fillPoly(screen, [np.array(projected_points)[right_side]], (0, 255, 0))

cv2.polylines(screen, [np.array(projected_points)[front_side]], True, (0, 0, 0))
cv2.polylines(screen, [np.array(projected_points)[top_side]], True, (0, 0, 0))
cv2.polylines(screen, [np.array(projected_points)[right_side]], True, (0, 0, 0))

gray = cv2.cvtColor(screen, cv2.COLOR_BGR2GRAY)

corners = cv2.goodFeaturesToTrack(gray, 4, 0.02, 20)
corners = corners.astype(int)  # np.int0 is deprecated in recent NumPy versions

for i in corners:
    x, y = i.ravel()
    cv2.circle(screen, (x, y), 40, (0, 0, 0), -1)

This was done for one cube; now we want to go over all the cubes and perform random rotations. We will add a for loop that iterates through all of our cubes and performs all the steps from above.

out = cv2.VideoWriter("projection_and_rotation.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), 1, (WIDTH, HEIGHT))

all_projected_points = []
all_corners = []
for cube in range(n*3):
    angle_x = angle_y = 0.0
    angle_z = np.random.uniform(-0.5, 0.5)

    rotation_z = np.matrix([
        [np.cos(angle_z), -np.sin(angle_z), 0],
        [np.sin(angle_z),  np.cos(angle_z), 0],
        [0,                 0,        1],
    ])

    rotation_y = np.matrix([
        [np.cos(angle_y), 0, np.sin(angle_y)],
        [0,         1,        0],
        [-np.sin(angle_y), 0, np.cos(angle_y)],
    ])

    rotation_x = np.matrix([
        [1,       0,                0],
        [0, np.cos(angle_x), -np.sin(angle_x)],
        [0, np.sin(angle_x),  np.cos(angle_x)],
    ])
    screen = np.ones((HEIGHT, WIDTH, 3), dtype=np.uint8) * 255

    i = 0
    points = X[cube].reshape(8, 3)  # use the current cube, not just the first one
    for point in points:
        rotated3d = np.matmul(rotation_z, point.reshape((3, 1)))
        rotated3d = np.matmul(rotation_y, rotated3d)
        rotated3d = np.matmul(rotation_x, rotated3d)

        projected2d = np.matmul(projection_matrix, rotated3d)

        x = int(projected2d[0][0] * scale) + circle_pos[0]
        y = int(projected2d[1][0] * scale) + circle_pos[1]

        projected_points[i] = [x, y]
        i += 1

    cv2.fillPoly(screen, [np.array(projected_points)[front_side]], (0, 0, 255))
    cv2.fillPoly(screen, [np.array(projected_points)[top_side]], (240, 240, 240))
    cv2.fillPoly(screen, [np.array(projected_points)[right_side]], (0, 255, 0))

    cv2.polylines(screen, [np.array(projected_points)[front_side]], True, (0, 0, 0))
    cv2.polylines(screen, [np.array(projected_points)[top_side]], True, (0, 0, 0))
    cv2.polylines(screen, [np.array(projected_points)[right_side]], True, (0, 0, 0))

    gray = cv2.cvtColor(screen, cv2.COLOR_BGR2GRAY)

    corners = cv2.goodFeaturesToTrack(gray, 4, 0.02, 20)
    corners = corners.astype(int)  # np.int0 is deprecated in recent NumPy versions

    for i in corners:
        x, y = i.ravel()
        cv2.circle(screen, (x, y), 40, (0, 0, 0), -1)

    all_projected_points.append(list(projected_points))  # copy; the list object is reused each iteration
    all_corners.append(corners)
    out.write(screen)
out.release()

Below are several frames from the generated video, showing our cubes together with the detected corners.

As mentioned before, we want to optimize and calculate three parameters, so let us define them now. To calculate them, we will use gradient descent with a learning rate of \(1e-3\) (i.e. \(0.001\)).

alph_1 = np.random.rand(1)
alph_2 = np.random.rand(1)
theta = np.random.rand(1)

lr = 1e-3

pc_components = pca.components_
gt_v = all_corners[0].reshape(4, 2)

As the ground truth, we will use the four corners detected on the original image, and we will compare them with the four corners detected on the cube projected from the predicted 3D model.
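One subtlety worth noting (this snippet is an addition, and sort_by_angle is a hypothetical helper, not part of the original code): goodFeaturesToTrack returns corners in an arbitrary order, while the projected corners follow the fixed vertex order of front_side. A minimal sketch to bring both sets into a consistent order is to sort each by the angle around its centroid:

def sort_by_angle(pts):
    # Order 2D points counter-clockwise around their centroid
    pts = np.asarray(pts, dtype=np.float64)
    center = pts.mean(axis=0)
    angles = np.arctan2(pts[:, 1] - center[1], pts[:, 0] - center[0])
    return pts[np.argsort(angles)]

# gt_v = sort_by_angle(gt_v)
# pr_v = sort_by_angle(pr_v)  # apply the same ordering before computing the loss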

To create a 3D cube from just an image, we multiply the random alphas we previously created with the first two PCA components, PC1 and PC2, that were calculated on our samples. Normally, gradient descent is performed over several iterations until the algorithm reaches a local minimum; we will first go block by block and then place everything inside a for loop to simulate the iterations.

transformed_3d = (alph_1 * pc_components[0].reshape(8, 3)) + (alph_2 * pc_components[1].reshape(8, 3))

We also create the three rotation matrices that we used previously to rotate our cubes. Since we only want a rotation around the \(z\) axis, the theta variable appears only in the \(z\) rotation matrix.

Note that we want to find values for alph_1, alph_2, and theta such that the resulting 3D models match each of our cubes.

rotation_z = np.matrix([
                [np.cos(theta), -np.sin(theta), 0],
                [np.sin(theta),  np.cos(theta), 0],
                [0,                 0,          1],
            ])

rotation_y = np.matrix([
                [np.cos(0),  0, np.sin(0)],
                [0,          1,         0],
                [-np.sin(0), 0, np.cos(0)],
            ])

rotation_x = np.matrix([
                [1,       0,            0],
                [0, np.cos(0), -np.sin(0)],
                [0, np.sin(0),  np.cos(0)],
            ])

We iterate through all 8 points obtained by combining the alpha_1 and alpha_2 parameters with the PCA components, multiplying each 3D corner with the rotation matrices. If you remember, we created our 2D projected cubes by multiplying the rotated 3D cubes with the projection matrix. For visualization purposes, we perform the same steps here, multiplying by the scale and adding the center offset (to make our 2D cubes bigger and center them on the canvas).

pr_v = np.zeros((8, 2))
for i, point in enumerate(transformed_3d):
    rotated_z = np.matmul(rotation_z, point.reshape((3, 1))) 
    rotated_y = np.matmul(rotation_y, rotated_z) 
    rotated_x = np.matmul(rotation_x, rotated_y) 
    projected2d = np.matmul(projection_matrix, rotated_x)
    x = int(projected2d[0][0] * scale) + circle_pos[0]
    y = int(projected2d[1][0] * scale) + circle_pos[1]
    
    pr_v[i] = [x, y]

Out of these 8 coordinates we just projected into 2D, we take the corners that correspond to the frontal view we had in the beginning (the image where we detected the corners).

front_side = [4, 5, 6, 7]
pr_v = pr_v[front_side].astype(np.int64)
transformed_3d_front = transformed_3d[front_side]

Now comes the interesting part: the gradient descent. We calculate the loss and then perform the backpropagation, taking partial derivatives with respect to \(\theta\), \(\alpha_1\), and \(\alpha_2\).
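To make the derivatives in the code below easier to follow, here is the derivation (an addition, written in the post's notation). For a 3D corner with coordinates \((X, Y, Z)\), the projected coordinates are \(pr_x = \cos\theta \, X - \sin\theta \, Y\) and \(pr_y = \sin\theta \, X + \cos\theta \, Y\), up to the constant scale factor and centering offset, which only rescale the gradient. The per-corner loss is \(E_{landmarks} = (gt_x - pr_x)^2 + (gt_y - pr_y)^2\), so by the chain rule, for any parameter \(p \in \{\theta, \alpha_1, \alpha_2\}\):

\[
\frac{\partial E_{landmarks}}{\partial p} = -2 \, (gt_x - pr_x) \, \frac{\partial pr_x}{\partial p} - 2 \, (gt_y - pr_y) \, \frac{\partial pr_y}{\partial p}
\]

Since each 3D corner is \(\alpha_1 c_1 + \alpha_2 c_2\), its derivative with respect to \(\alpha_1\) is simply the corresponding entry of \(c_1\), which is what pc_component0 provides in the code.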


dif_thetas = []
dif_alphas1 = []
dif_alphas2 = []
for point in range(len(gt_v)):
    gt_x = gt_v[point][0]
    gt_y = gt_v[point][1]
    pr_x = pr_v[point][0]
    pr_y = pr_v[point][1]
    transformed_3d_v = transformed_3d_front[point]
    pc_component0 = pc_components[0].reshape(8, 3)[point]
    pc_component1 = pc_components[1].reshape(8, 3)[point]
    E_landmarks = np.square(gt_x - pr_x) + np.square(gt_y - pr_y)  # per-corner loss (kept for monitoring)

    dif_theta = 2 * (gt_x - pr_x) * \
                (-1) * \
                (-np.sin(theta) * transformed_3d_v[0] - np.cos(theta) * transformed_3d_v[1]) + \
                2 * (gt_y - pr_y) * \
                (-1) * \
                (np.cos(theta) * transformed_3d_v[0] + np.sin(theta) * transformed_3d_v[1])

    dif_alpha_1 = 2 * (gt_x - pr_x) * \
                (np.cos(theta) * pc_component0[0] - np.sin(theta) * pc_component0[1]) * \
                (-1) + \
                2 * (gt_y - pr_y) * \
                (np.sin(theta) * pc_component0[0] + np.cos(theta) * pc_component0[1]) * \
                (-1)

    dif_alpha_2 = 2 * (gt_x - pr_x) * \
                (np.cos(theta) * pc_component1[0] - np.sin(theta) * pc_component1[1]) * \
                (-1) + \
                2 * (gt_y - pr_y) * \
                (np.sin(theta) * pc_component1[0] + np.cos(theta) * pc_component1[1]) * \
                (-1)

    dif_thetas.append(dif_theta)
    dif_alphas1.append(dif_alpha_1)
    dif_alphas2.append(dif_alpha_2)
    
theta  -= lr * np.mean(dif_thetas)
alph_1 -= lr * np.mean(dif_alphas1)
alph_2 -= lr * np.mean(dif_alphas2)

This is what the whole gradient descent step and cube-fitting process look like. But this was just one iteration; for our model to learn, we need to run several hundred iterations.

We will create a for loop and place the computations inside of it. For visualization purposes, we also save the values of theta, alpha_1, and alpha_2 in lists. The whole flow can be seen in the code block below.

thetas = []
alphas1 = []
alphas2 = []
gt_v = all_corners[0].reshape(4, 2)  # ground truth corners, as defined earlier
for _ in range(500):
    transformed_3d = (
        alph_1 * pc_components[0].reshape(8, 3)) + (alph_2 * pc_components[1].reshape(8, 3))
    rotation_z = np.matrix([
        [np.cos(theta), -np.sin(theta), 0],
        [np.sin(theta),  np.cos(theta), 0],
        [0,                 0,          1],
    ])

    rotation_y = np.matrix([
        [np.cos(0),  0, np.sin(0)],
        [0,          1,         0],
        [-np.sin(0), 0, np.cos(0)],
    ])

    rotation_x = np.matrix([
        [1,       0,            0],
        [0, np.cos(0), -np.sin(0)],
        [0, np.sin(0),  np.cos(0)],
    ])

    pr_v = np.zeros((8, 2))
    for i, point in enumerate(transformed_3d):
        rotated_z = np.matmul(rotation_z, point.reshape((3, 1)))
        rotated_y = np.matmul(rotation_y, rotated_z)
        rotated_x = np.matmul(rotation_x, rotated_y)
        projected2d = np.matmul(projection_matrix, rotated_x)
        x = int(projected2d[0][0] * scale) + circle_pos[0]
        y = int(projected2d[1][0] * scale) + circle_pos[1]

        pr_v[i] = [x, y]

    front_side = [4, 5, 6, 7]
    pr_v = pr_v[front_side].astype(np.int64)
    transformed_3d_front = transformed_3d[front_side]

    dif_thetas = []
    dif_alphas1 = []
    dif_alphas2 = []
    for point in range(len(gt_v)):
        gt_x = gt_v[point][0]
        gt_y = gt_v[point][1]
        pr_x = pr_v[point][0]
        pr_y = pr_v[point][1]
        transformed_3d_v = transformed_3d_front[point]
        pc_component0 = pc_components[0].reshape(8, 3)[point]
        pc_component1 = pc_components[1].reshape(8, 3)[point]
        E_landmarks = np.square(gt_x - pr_x) + np.square(gt_y - pr_y)  # per-corner loss (kept for monitoring)

        dif_theta = 2 * (gt_x - pr_x) * \
            (-1) * \
            (-np.sin(theta) * transformed_3d_v[0] - np.cos(theta) * transformed_3d_v[1]) + \
            2 * (gt_y - pr_y) * \
            (-1) * \
            (np.cos(theta) *
             transformed_3d_v[0] + np.sin(theta) * transformed_3d_v[1])

        dif_alpha_1 = 2 * (gt_x - pr_x) * \
            (np.cos(theta) * pc_component0[0] - np.sin(theta) * pc_component0[1]) * \
            (-1) + \
            2 * (gt_y - pr_y) * \
            (np.sin(theta) * pc_component0[0] + np.cos(theta) * pc_component0[1]) * \
            (-1)

        dif_alpha_2 = 2 * (gt_x - pr_x) * \
            (np.cos(theta) * pc_component1[0] - np.sin(theta) * pc_component1[1]) * \
            (-1) + \
            2 * (gt_y - pr_y) * \
            (np.sin(theta) * pc_component1[0] + np.cos(theta) * pc_component1[1]) * \
            (-1)

        dif_thetas.append(dif_theta)
        dif_alphas1.append(dif_alpha_1)
        dif_alphas2.append(dif_alpha_2)

    theta  -= lr * np.mean(dif_thetas)
    alph_1 -= lr * np.mean(dif_alphas1)
    alph_2 -= lr * np.mean(dif_alphas2)

    thetas.append(theta[0].copy())
    alphas1.append(alph_1[0].copy())
    alphas2.append(alph_2[0].copy())

We will plot the theta, alpha_1, and alpha_2 parameters to see how our model learned over the iterations.

fig, axs = plt.subplots(1, 3, figsize=(10, 5))

axs[0].set_title("Theta")
axs[0].plot(thetas)

axs[1].set_title("Alpha 1")
axs[1].plot(alphas1)

axs[2].set_title("Alpha 2")
axs[2].plot(alphas2)
plt.show()

When checking the original value for theta, the value we got from the random sampling was around -0.15, and we can see that the learned theta parameter ended up extremely close to that value. Going back to the beginning, we can also check the alpha parameters that we get when we perform pca.transform():

alphas_gt = pca.transform(X[0].reshape(1, -1))
print(alphas_gt[0, :2])  # first two PCA coefficients

Output:

-1.0883
4.96056

As you can see, the values that we get from this match the values that we got from the gradient descent, meaning that we have successfully created a 3D model of the cube from just a 2D image.
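As a final sketch (an addition, not part of the original post), we can turn two PCA coefficients back into an explicit 3D model. Note that scikit-learn's pca.transform subtracts pca.mean_ before projecting, so we add the mean back when reconstructing:

# Rebuild the flattened cube from its first two PCA coefficients
coeffs = pca.transform(X[0].reshape(1, -1))[0]
recon = pca.mean_ + coeffs[0] * pca.components_[0] + coeffs[1] * pca.components_[1]
recon_cube = recon.reshape(8, 3)  # back to 8 corners x 3 coordinates
print("max reconstruction error:", np.abs(recon - X[0]).max())

Because the first two components explain about 99% of the variance, this error should be small.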

Let us go over all the steps that we have previously done, or even better, let us look at the block diagram below.

The first step was to create 3D cubes, of which 3 types were made. After that step, we flattened the cubes to get a \(3n \times 24\) data matrix onto which PCA was applied. PCA reduces the dimensions, and we saw that with only 2 components almost any of those 3 types of cubes can be reconstructed. Once the components are obtained, we rotate our cubes by a random value of \(\theta\) and project each rotated cube onto 2D. We used goodFeaturesToTrack from OpenCV to detect corners on the cubes; these are our ground truth data. Now comes the exciting part, the training process. Three random scalar parameters are initialized, \(\alpha_1, \alpha_2, \theta\), which we want to optimize. Multiplying the alpha parameters with the PCA components and summing the results, a 3D model of a cube is obtained. We rotate that cube by the parameter theta and project it into 2D. We know where each of the vertices of the cube is projected in 2D, and we take those that are visible in the image. The next step is comparing the corners we just obtained, the predicted corners, with the ground truth ones that we detected previously; we call this loss \(E_{landmarks}\) and perform gradient descent, calculating the partial derivatives and updating the parameters.

The goal of this is to get a 3D model that fits the original 3D cube. For example, instead of cubes, let’s look at these as faces. We have a 3D scan of a face; we project it into 2D and detect the landmarks on the face. We create a new 3D face using the PCA components and the alpha parameters, and rotate it by the parameter theta. Projecting this new face into 2D and detecting its landmarks, we obtain new landmarks that we compare with the ground truth ones. By optimizing the parameters for this face, we can model any position, any emotion, and any facial movement.

Summary

In this post, we have seen how we can fit a 3D model to an image using just that image as input. We used PCA to obtain the principal components and, using gradient descent, found a way to recover a 3D model from a 2D image. The cubes from this example can be viewed as face scans, as we explained in a previous post.