#004 3D Face Modeling – 3D Scanning & Motion Capture: Parametric Face Models

Highlights: Hello and welcome. In the last few years, deepfake videos have become very popular on social media. In this post, we will learn the basic theory behind deepfakes, and more specifically, how we can build a 3D face model and capture motion in the human face.

In this post, we are going to review the YouTube video “3D Scanning & Motion Capture: 8. Parametric Face Models”[1]. So let’s begin!

Let’s have a look at the image below. Here, we can see a 3D human face model. Our goal is to describe possible shapes and motions in the human face. Let’s see how we can do that.

In the following image, we can see a lot of vertices in the mesh. The idea is that we want to reduce the number of free parameters that we have here. The way we choose these parameters is essential: they should describe the possible shapes, motions, and possibly the appearance of faces with a small number of parameters. This is done via dimensionality reduction for the most part.

So, we’re going to have a dimensionality reduction that will allow us to control this mesh by changing only a small subset of parameters. Basically, what we want to do is to find a realistic distribution over faces. Instead of having a lot of degrees of freedom, we want to reduce the number of degrees of freedom by building a parametric model.
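As a rough illustration of the idea, here is a minimal sketch (all names, dimensions, and the random basis are placeholders, not from the lecture): a mesh with thousands of vertices lives in a very high-dimensional space, but we control it with a small parameter vector and a learned basis.

```python
import numpy as np

N, k = 5000, 80                          # vertices and free parameters (hypothetical)
mean_face = np.zeros(3 * N)              # average face, flattened (x, y, z per vertex)
basis, _ = np.linalg.qr(np.random.randn(3 * N, k))   # placeholder orthonormal basis

def decode(params):                      # few parameters -> full vertex vector
    return mean_face + basis @ params

def encode(face):                        # full vertex vector -> few parameters
    return basis.T @ (face - mean_face)

face = decode(np.random.randn(k))        # 15,000 vertex coordinates from 80 numbers
params = encode(face)                    # and back to the 80 parameters
```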

Parametric Face Model: Shape Identity

So, parametric face models capture the distribution of human faces. We want to learn a lower-dimensional approximation that represents this distribution with a relatively small set of parameters. In order to get there, we need a database that consists of a large number of different human faces. Here, we can list the different characteristics of this database:

  • The database consists of neutral images – there’s no expression on the face
  • The database consists of images of different people
  • Every face has a different size and shape
  • Every face has different surface appearances

So, there are two types of variations that we’re going to capture here:

  • different shapes of human faces
  • different appearances of the human faces

One thing we can do is to take all faces in our database, and compute an average face. So, what is the average face?

Finding an average face

The average face is based on the samples that we have in this database. So, how can we compute this average face? First, we have to figure out how to look at each of the vertices in the mesh, and we also have to figure out how to look at the surface appearance.

Once we find an average face, we would like to encode the deviation from that average face. So, we want to have a parameterization that tells us how this average face can change.
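A minimal sketch of this averaging step, assuming every neutral scan is flattened into one long vertex vector and all meshes share the same topology (data and names are placeholders):

```python
import numpy as np

# faces: shape (num_people, 3 * N) - each row is one person's neutral-scan vertices
faces = np.random.rand(200, 3 * 5000)      # placeholder database

average_face = faces.mean(axis=0)          # the "average face" of the database
deviations = faces - average_face          # per-person deviation we want to encode
```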

Changing shape

Here, we have a certain standard deviation that changes the face in the following way:

  • The positive standard deviation – the face will appear a little bit larger (the male face)
  • The negative standard deviation – the face will appear a little bit smaller (the female face)

So, we first calculate an average face and then go into these two directions to change the shape.
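An illustrative sketch of going in these two directions from the average face along one learned shape direction (here the leading principal direction of the deviations, essentially the PCA idea previewed at the end of the post; all data is placeholder):

```python
import numpy as np

faces = np.random.rand(200, 3 * 5000)              # placeholder neutral faces
average_face = faces.mean(axis=0)
deviations = faces - average_face

# leading direction of variation and its (approximate) standard deviation
U, S, Vt = np.linalg.svd(deviations, full_matrices=False)
direction = Vt[0]
sigma = S[0] / np.sqrt(len(faces))

larger_face  = average_face + sigma * direction    # "+1 std dev" shape
smaller_face = average_face - sigma * direction    # "-1 std dev" shape
```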

Notice that the surface appearance here is always the same. So, this axis in the image above only changes the geometry. That means we have the same expression, the same appearance, and different geometries underneath. So, in this case, we can change the identity, but we cannot change the appearance. However, we also want to change the appearance; this is the second property of such a face model.

Changing appearance

So, the shape defines who you are. And in this case, we also have the albedo encoded. Let’s take a look at the image below.

Here we can see that the skin tone gets a little darker in one direction, and in the other direction, the skin tone gets a little bit lighter. Note that here we don’t have just one scalar that controls appearance. You have to imagine that this is a higher-dimensional vector that controls the skin tone and appearance in a multi-dimensional fashion.

So, to summarize, we essentially have the two axes that we can see in the image above. The vertical axis is for the shape, and the horizontal axis is for the albedo. Together, these axes define the identity, and each of them is a higher-dimensional vector.

Adding expression

Now, we would love to have a third axis here. Since it is difficult to draw the third axis in one graph, we just have to imagine that we’re going to add this third axis that represents the expressions.

In the image above we can see a single person who is recorded with different expressions. And this is what our face expression database is going to give us.

So, now we have different faces of the same person: the identity is the same, but the expressions differ. You have to imagine that we have another axis that controls the facial expressions.

Now let’s examine some examples of how to reconstruct a face model based on a given parameterization.

Parametric Face Model

Rigid pose

In the following image, we can see an example of the parametric face model.

As you can see, this face model here is being animated. On the right-hand side, you see several sliders that we can use to control different parameters. These are the parameters from our model: shape, albedo, expression, and illumination. We haven’t talked about the last one yet; we will explain it later in the post.

What we can do here is to take all of these parameters and stack them in a single vector \(P \). These are the parameters of our parametric model. The simplest part is not any of the sliders above; it is simply the rigid pose. The idea is that we can rotate and translate the parametric model according to the rigid pose. We can write down how many parameters are used for this part of the model: for a rigid transformation, we have three parameters for the rotation and three parameters for the translation.

$$ P = 6 $$
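A small sketch of this rigid part; the axis-angle encoding of the three rotation parameters is an assumption, Euler angles would also give three parameters.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def apply_rigid_pose(vertices, pose):
    """vertices: (N, 3) mesh vertices, pose: (6,) = [rx, ry, rz, tx, ty, tz]."""
    R = Rotation.from_rotvec(pose[:3]).as_matrix()   # 3 rotation parameters (axis-angle, assumed)
    t = pose[3:]                                     # 3 translation parameters
    return vertices @ R.T + t

vertices = np.random.rand(5000, 3)                   # placeholder face mesh
posed = apply_rigid_pose(vertices, np.array([0.0, 0.1, 0.0, 0.0, 0.0, 0.5]))
```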

For the other parts, we are going to add more of these parameters by simply modifying the sliders.

Shape identity

Now, what we want to do is to change the shape identity. For that, we will use the \(\alpha \) parameters. We can see that the shape of the model changes according to the sliders. The first slider basically scales the face up and down, and the other sliders have different effects on the shape. As a combination of these sliders, we can effectively change the shape of the face.

There is a large number of sliders here, but for this specific model, we will use an 80-dimensional vector to represent the shape.

$$ P = 6+80 $$
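A minimal sketch of this shape-identity term, assuming a linear basis learned from the face database (the random basis and all names here are placeholders):

```python
import numpy as np

N = 5000                                  # number of mesh vertices (hypothetical)
mean_geometry = np.zeros(3 * N)           # average neutral face, flattened
E_id = np.random.randn(3 * N, 80)         # placeholder shape-identity basis

alpha = np.zeros(80)                      # the 80 shape parameters ("sliders")
alpha[0] = 1.5                            # e.g. the first slider scales the face

geometry = mean_geometry + E_id @ alpha   # new identity, still neutral expression
```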

Material reflectance

Now, we can do the same thing with the albedo. Again, we have sliders and we can change the albedo using the same method.

And similar to before, this \(\beta \) vector is also 80-dimensional.

$$ P = 6+80+80 $$

Next, let’s see how we can change the expressions.

Expression parameters

Here, we have an expression vector. To change the expressions, we apply the same method as before. You can see that the different sliders have different effects on the expression of this person. At the same time, we are not controlling the albedo and not changing the shape at all. The core idea is that the identity of the person stays exactly the same while we control the expressions. In this case, we have 76 parameters.

$$ P = 6+80+80+76 $$
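Putting these pieces together, a common way to write such a linear face model (my notation; the basis matrices \(E_{id} \), \(E_{exp} \), \(E_{alb} \) and mean vectors are assumptions consistent with the sliders above) is:

$$ M_{geo}(\alpha, \delta) = \bar{M}_{geo} + E_{id}\,\alpha + E_{exp}\,\delta, \qquad M_{alb}(\beta) = \bar{M}_{alb} + E_{alb}\,\beta $$

with \(\alpha \in \mathbb{R}^{80} \), \(\beta \in \mathbb{R}^{80} \), and \(\delta \in \mathbb{R}^{76} \).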

In order to get a realistic appearance of these models, you also want to be able to shade them. And for the shading, you need light.

Lighting Parameters

In order to control the light, we can use a few sliders. So, we can change the surrounding lighting to generate lighting effects and make the face appear in different environments. In this case, the lighting parameters are 27-dimensional.

$$ P = 6+80+80+76+27 $$

So, adding it all up, we have 269 parameters in total that control the parametric face model and allow us to control the shape, the albedo, the expressions, and finally, the illumination. Now, let’s explain these parameters in more detail.
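As a sanity check on these counts, here is a small sketch of stacking everything into the single vector \(P \) (the variable names are mine):

```python
import numpy as np

pose  = np.zeros(6)      # rigid pose: 3 rotation + 3 translation
alpha = np.zeros(80)     # shape identity
beta  = np.zeros(80)     # albedo / material
delta = np.zeros(76)     # expression
gamma = np.zeros(27)     # illumination (spherical harmonics, explained below)

P = np.concatenate([pose, alpha, beta, delta, gamma])
assert P.shape == (269,)  # 6 + 80 + 80 + 76 + 27
```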

Let’s start with what the illumination parameters look like, because that’s something we haven’t really talked about yet in this post. It’s actually an important concept in both computer vision and computer graphics. In order to understand how we can model the illumination using these 27 parameters, you have to understand what illumination models are doing.

Illumination models

So the idea here is that you have:

  • 3D face
  • light source
  • camera

The idea is that, based on the lighting, you apply some shading, and then the face looks something like this.

Now, let’s talk about one specific lighting model in a little bit more detail.

The first idea is environment maps, which are a straightforward concept.

Let’s take a look at the following example. This panorama image is an environment map that represents the incident lighting at a certain point that is being received from the surrounding scene. So that’s the scene lighting. What we can do here is to take this incident lighting, and parameterize it accordingly. There are different ways to do this, and the most common ones are:

  • Cube maps
  • Sphere maps

These are concepts that are typically used in computer graphics to parameterize access to the environment maps and to map them onto the respective meshes for the final rendering.

The assumption here is that the lighting is distant, effectively infinitely far away. There’s no self-shadowing, and there’s no scattering.

Furthermore, note that we have zero interaction with the scene. So, if we take this example, we can’t interact with it, we can’t go behind it. We are just assuming everything is infinitely far away.

So, this is relatively straightforward and can be applied to real images: you can take a panorama image and map it to an environment map. The core idea is always the same. You have the observer in the center, everything else is infinitely far away, and you receive the incident light from all directions. However, when you want to parameterize these environment maps with only a few numbers, you need a set of basis functions.

And this is exactly the idea behind spherical harmonics.

So the idea is to use an orthogonal basis that is defined over the sphere. You can think of it as a parameterization of environment maps, but it is a low-frequency approximation. So, you will not see any detail or sharp changes in lighting; it is a very smooth approximation. The idea is that you simply find a reasonable parameterization over the sphere and select a set of basis functions over the sphere that you can linearly combine:

  • \(B \) – the basis functions you are combining
  • \(\gamma \) – the coefficients that are multiplied by the basis functions

So, what are these spherical harmonics doing? Well, they essentially parameterize the lighting over the sphere as a linear combination of these basis functions.
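In symbols (my notation, following the two bullets above), the incident lighting from a direction \(\omega \) is approximated as:

$$ E(\omega) \approx \sum_{b} \gamma_b \, B_b(\omega) $$

where the \(B_b \) are the spherical harmonic basis functions and the \(\gamma_b \) are the coefficients that become part of our parameter vector.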

The assumption here is that the surfaces are going to be Lambertian. So, the question is how we can use this spherical harmonics parameterization of the sphere in order to model Lambertian lighting.

Well, Lambertian lighting means that the reflected light does not depend on the viewpoint; it depends only on the surface normal and the incident lighting. The incident lighting is modeled by these spherical harmonics, and the normal is given by the respective mesh or geometry.

So, the following graph represents Lambertian lighting.

Here, we have a surface point and its normal. Lambertian lighting means that the incident light is reflected uniformly in all directions. So, no matter from where we look at it, this surface point will always have the same brightness. The opposite of this would be a specular reflection.

We are also going to assume distant, smooth lighting, similar to the environment maps. So, we assume everything is infinitely far away, and the lighting is smooth. The lighting at a surface point is then simply a computation based on the basis functions evaluated at the normal, in combination with the \(\gamma \) coefficients, which are part of our parametric model.
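A hedged sketch of this computation: evaluate the first nine real spherical harmonic basis functions at each vertex normal and linearly combine them with the \(\gamma \) coefficients, nine per color channel (3 × 9 = 27, matching the lighting parameters above). The normalization constants are the standard real-SH values; clamping the irradiance to non-negative values is my assumption, not something stated in the lecture.

```python
import numpy as np

def sh_basis(n):
    """First three bands (9 functions) of the real spherical harmonics at unit normal n."""
    x, y, z = n
    return np.array([
        0.2820948,                      # Y_0^0
        0.4886025 * y,                  # Y_1^-1
        0.4886025 * z,                  # Y_1^0
        0.4886025 * x,                  # Y_1^1
        1.0925484 * x * y,              # Y_2^-2
        1.0925484 * y * z,              # Y_2^-1
        0.3153916 * (3 * z * z - 1),    # Y_2^0
        1.0925484 * x * z,              # Y_2^1
        0.5462742 * (x * x - y * y),    # Y_2^2
    ])

def shade(normals, albedo, gamma):
    """normals: (N, 3) unit normals, albedo: (N, 3) RGB, gamma: (3, 9) SH coefficients."""
    B = np.stack([sh_basis(n) for n in normals])       # (N, 9) basis values per vertex
    irradiance = B @ gamma.T                           # (N, 3) per-channel incident light
    return albedo * np.clip(irradiance, 0.0, None)     # Lambertian: view-independent (clamp assumed)

normals = np.array([[0.0, 0.0, 1.0]])                  # one vertex facing the camera
albedo = np.array([[0.8, 0.6, 0.5]])                   # its skin color
gamma = np.zeros((3, 9)); gamma[:, 0] = 1.0            # ambient-only lighting as an example
color = shade(normals, albedo, gamma)
```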

Finally, in order to model the lighting, we change these coefficients with the sliders.

Summary

So, this is all for this post. Today, we learned how to build a 3D face model and capture motion in the human face. In the next post, we are going to talk about PCA, a dimensionality reduction technique that is used to build parametric face models.

References:

[1] – 3D Scanning & Motion Capture: 8. Parametric Face Models – Prof. Matthias Nießner