#012 3D Face Modeling – Pinhole Camera and Perspective Projection

Highlights: Hello, and welcome. In this post, we’re going to talk about how to form an image of a 3D scene on a 2D plane. We will start with the concept of a pinhole camera, which is one of the most important concepts in computer vision. It is the simplest type of camera that you can imagine, and it performs perspective projection.

In this post, we are going to review the YouTube video “Pinhole and Perspective Projection | Image Formation”[1]. So let’s begin!

Image Formation

To learn how to project an image from a 3D scene onto a 2D plane, we need to understand the geometric and photometric relation between the scene and its image. For the geometric relation, we want to understand where a single point in the scene ends up in the image. For the photometric relation, we want to understand what the brightness and appearance of that point will be in the image.

So, let’s have a look at the following example.

Here we can see a house on the right side which represents a 3D scene, and on the left side, we can see an image plane (screen). Now, the first question that we want to answer is the following. Is an image of the house being formed on the screen? If we consider any point on the screen, it does receive light from a lot of points on the house. However, we will not see a clear image. Instead, we will get a blurred image.

So, the question is, how can we create a clear, crisp image of the house on the screen? Well, the simplest way to do this is by using a pinhole. A pinhole is an opaque sheet with a tiny hole in it, and it’s placed between the scene and the image plane as shown in the following image.

Here, we can also see the optical axis. It is the axis that is perpendicular to the image plane, shown in the image above as a dotted line.

Now, we can select a single point $P_{0} $ on the house and draw a single ray that travels from that point, through the pinhole, to the image plane, where it projects onto the point $P_{i} $. So now, every point of the scene projects onto a single point in the image.

Next, our goal is to understand the relationship between $P_{0} $, and $P_{i} $. For that, we need a 3D coordinate frame placed at the pinhole, with the $z $ -axis pointing along the optical axis.

The next term that we need to explain is the distance between the pinhole and the image plane. It is called the effective focal length $f $.

Now, we can write the point $P_{0} $ using the vector $r_{0} $ with the coordinates $x_{0} $, $y_{0} $, and $z_{0} $. The point $P_{i} $ can be denoted as $r_{i} $ with the coordinates on the image plane $x_{i} $, $y_{i} $, and $f $. Note that instead of the $z $ coordinate here we have our effective focal length $f $.

$$ \vec{r}_{0} = (x_{0}, y_{0}, z_{0}) $$

$$ \vec{r}_{i} = (x_{i}, y_{i}, f) $$

So, irrespective of where a point lies in the 3D scene, the $z $ component of its image is always going to be equal to $f $, because the image always lies on the image plane.

Now, if we consider these two triangles shown in the image above, we can see that they are similar triangles. Similar triangles have the same shape but different sizes, and their corresponding angles are equal. If we take a closer look at the image above, we can see that the angle at the intersection of the optical axis and the image plane is equal to 90 degrees, as is its corresponding angle at the projection of the point $P_{0} $ onto the optical axis. Similarly, the two angles that are located at the pinhole are also identical. Therefore, we can conclude that these two triangles are in fact similar triangles. So, we can write the following equation:

$$ \frac{\vec{r}_{i}}{f}=\frac{\vec{r}_{0}}{z_{0}} $$

Here, $z_{0} $ is the $z $ component of the 3D point. In other words, it’s the depth of the point in 3D.

Next, we can break down our vectors $\vec{r}_{0} $ and $\vec{r}_{i} $ into their components:

$$ \frac{x_i}{f}=\frac{x_0}{z_0}, \frac{y_i}{f}=\frac{y_0}{z_0} $$

So, these are the equations of perspective projection.
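As a minimal sketch, the two equations above can be written as a small Python function. The name `project_point` and the numeric values are illustrative assumptions, not part of the original video, and $f $ is treated here simply as a positive distance.

```python
def project_point(point, f):
    """Perspective projection of a 3D point (x0, y0, z0) onto the image plane.

    Implements x_i = f * x0 / z0 and y_i = f * y0 / z0.
    """
    x0, y0, z0 = point
    if z0 == 0:
        # A point in the plane of the pinhole has no finite projection.
        raise ValueError("z0 must be non-zero")
    return (f * x0 / z0, f * y0 / z0)


# Hypothetical values: a point 2 m to the side, 1 m up and 10 m away,
# seen by a camera with an effective focal length of 0.05 m.
x_i, y_i = project_point((2.0, 1.0, 10.0), f=0.05)
print(x_i, y_i)
```

Doubling the depth `z0` while keeping `x0` and `y0` fixed halves both image coordinates, which is the behaviour the rest of this post builds on.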

Properties of perspective projection

Now, let’s take a look at some of the properties of perspective projection. The first one we will talk about is the perspective projection of a line in 3D. In the following image, we can see this line and a pinhole.

We already know that a line and a point (the pinhole) define a plane in 3D. Here, all the rays of light that travel from the line to the pinhole lie on this plane. Furthermore, these rays stay on the same plane as they continue from the pinhole towards the image plane. Therefore, the image of this 3D line must lie at the intersection of this plane and the image plane. In other words, the image of a line in 3D has to be a line in 2D.

That is the reason why we find that in photographs, straight lines in the scene will map to straight lines in the photograph.
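This property can be checked numerically with a short sketch: we sample three points on a 3D line, project them, and test whether the projected points are collinear. The helper names and all numbers below are illustrative assumptions.

```python
def project_point(point, f):
    """Perspective projection: x_i = f * x0 / z0, y_i = f * y0 / z0."""
    x0, y0, z0 = point
    return (f * x0 / z0, f * y0 / z0)


def is_collinear(p, q, r, tol=1e-12):
    """Three 2D points are collinear if the cross product of (q - p) and (r - p) is zero."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return abs(cross) < tol


# Hypothetical 3D line: start point plus t times a direction vector.
start = (1.0, 2.0, 5.0)
direction = (0.5, -0.3, 2.0)
f = 0.05

points_3d = [tuple(s + t * d for s, d in zip(start, direction)) for t in (0.0, 1.0, 2.5)]
points_2d = [project_point(p, f) for p in points_3d]

line_is_preserved = is_collinear(*points_2d)
print(line_is_preserved)
```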

Next, we want to talk about image magnification.

Image magnification

Let’s say that we have an object of a certain size at a certain distance. Our question is, what is going to be its size in the image? So, for this, we’re going to use a little segment $A_{0} $, $B_{0} $ that you can see in the image above. This segment has a length $d_{0} $. Here, $A_{0} $ has the following coordinates:

$$ A_0\left(x_0, y_0, z_0\right) $$

The point $B_{0} $ has the same coordinates, displaced by $\delta x_{0} $ in $x $ and by $\delta y_{0} $ in $y $. Its depth $z_{0} $ is unchanged.

$$ B_0\left(x_0+\delta x_0, y_0+\delta y_0, z_0\right) $$

So, this segment lies on a plane in the scene that is parallel to the image plane. And of course, it produces another segment on the image $A_{i} $, $B_{i} $. We want to understand what the length $d_{i} $ of the segment $A_{i} $, $B_{i} $ is going to be, due to a segment of length $d_{0} $ in the scene.

The ratio of the length of the segment in the image to the length of the segment in the scene is called magnification. This can be written in terms of the displacements, $\delta x_0 $, $\delta y_0 $, and $\delta x_i $, $\delta y_i $ as follows:

Magnification:

$$ |m|=\frac{\left\|\vec{\mathbf{d}}_i\right\|}{\left\|\vec{\mathbf{d}}_0\right\|}=\frac{\sqrt{\delta x_i^2+\delta y_i^2}}{\sqrt{\delta x_0^2+\delta y_0^2}} $$

Next, we can simplify the equation above. To do that, we will substitute for the displacements in the image. What we’re going to do is apply perspective projection to the points $A_{0} $ and $B_{0} $. In doing so, we are going to get four equations, one for the $x $ coordinate and one for the $y $ coordinate of each of the two points, $A $ and $B $.

From perspective projection:

Point $A $:

$$ \frac{x_i}{f}=\frac{x_0}{z_0} \text { and } \frac{y_i}{f}=\frac{y_0}{z_0} $$

Point $B $:

$$ \frac{x_i+\delta x_i}{f}=\frac{x_0+\delta x_0}{z_{0}} \text { and } \frac{y_i+\delta y_i}{f}=\frac{y_0+\delta y_0}{z_{0}} $$

Using these four expressions, we end up with a very simple expression for the relationship between the displacements in the image and the displacements in the scene.

$$ \frac{\delta x_i}{f}=\frac{\delta x_0}{z_0} \text { and } \frac{\delta y_i}{f}=\frac{\delta y_0}{z_0} $$
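To see where this result comes from, we can subtract the equations for point $A $ from the equations for point $B $. The intermediate step, written out here for the $x $ coordinate (the $y $ coordinate works the same way), is:

$$ \frac{x_i+\delta x_i}{f}-\frac{x_i}{f}=\frac{x_0+\delta x_0}{z_0}-\frac{x_0}{z_0} \quad \Rightarrow \quad \frac{\delta x_i}{f}=\frac{\delta x_0}{z_0} $$

Note that this subtraction only works so cleanly because both points share the same depth $z_0 $.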

Now, we can plug this equation back into the equation for magnification. It turns out that we will get a very simple expression for magnification.

Magnification:

$$ |m|=\frac{\left\|\vec{\mathbf{d}}_i\right\|}{\left\|\vec{\mathbf{d}}_0\right\|}=\frac{\sqrt{\delta x_i^2+\delta y_i^2}}{\sqrt{\delta x_0^2+\delta y_0^2}} = \left|\frac{f}{z_0}\right| $$

This equation states that the absolute value of the magnification $m $ is equal to the absolute value of the effective focal length $f $ divided by $z_0 $, which is the depth of the object in the scene.

$$ m=\frac{f}{z_0} $$

Note that we have $z_0 $ in the denominator. This is a really interesting result. It means that the magnification of an object in an image is inversely proportional to its distance from the camera. The sign of $m $ indicates whether the image is upright or inverted. In the case of a pinhole camera, the image is inverted, and therefore $m $ is negative.
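The following sketch checks the magnification formula numerically: it projects a segment that is parallel to the image plane and compares the ratio of the two lengths with $|f / z_0| $. The function names and values are illustrative assumptions, and $f $ is treated as a positive magnitude, so the sign of $m $ (the inversion of the image) is ignored here.

```python
import math


def project_point(point, f):
    """Perspective projection: x_i = f * x0 / z0, y_i = f * y0 / z0."""
    x0, y0, z0 = point
    return (f * x0 / z0, f * y0 / z0)


def magnification(f, z0):
    """Magnification m = f / z0 for an object at depth z0."""
    return f / z0


f = 0.05                 # hypothetical effective focal length in metres
a0 = (1.0, 2.0, 10.0)    # point A_0
b0 = (1.3, 2.4, 10.0)    # point B_0: same depth, displaced by (0.3, 0.4)

d0 = math.dist(a0, b0)                                       # length in the scene
d_i = math.dist(project_point(a0, f), project_point(b0, f))  # length in the image

ratio = d_i / d0
print(ratio, abs(magnification(f, a0[2])))
```

Both printed numbers should agree up to floating-point rounding, since the segment lies at a single depth.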

Now let’s take a look at some of the manifestations of image magnification.

Manifestations of image magnification

Let’s take a look at the following image.

Here we can see two parallel train tracks. We know that the tracks must be parallel in 3D, otherwise the train is going to have a problem. However, in the image, they appear to be intersecting or meeting at infinity. In other words, as we go further and further away, in terms of depth, the two lines get closer and closer. The reason for this is magnification that is inversely proportional to the distance.
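A small sketch can make the converging tracks concrete. We place two rails a fixed distance apart and measure how far apart they appear in the image at increasing depths. The gauge, camera height, focal length and depths are all hypothetical values chosen for illustration.

```python
def project_point(point, f):
    """Perspective projection: x_i = f * x0 / z0, y_i = f * y0 / z0."""
    x0, y0, z0 = point
    return (f * x0 / z0, f * y0 / z0)


f = 0.05          # hypothetical effective focal length in metres
gauge = 1.435     # hypothetical distance between the two rails in metres
height = -1.5     # rails lie 1.5 m below the optical axis (assumption)

depths = [5.0, 10.0, 50.0, 500.0]
separations = []
for z in depths:
    left = project_point((-gauge / 2, height, z), f)
    right = project_point((gauge / 2, height, z), f)
    # Separation of the two rails in the image at this depth.
    separations.append(right[0] - left[0])

for z, s in zip(depths, separations):
    print(f"depth {z:6.1f} m -> image separation {s:.6f} m")
```

The separation in the image equals `f * gauge / z`, so it shrinks towards zero as the depth grows, even though the rails never get closer in 3D.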

Now, let’s talk about some additional magnification properties. One property is that $m $ can be assumed to be constant if the range of scene depth $\Delta z $ is much smaller than the average scene depth $\tilde{z} $. This means that the magnification of an object can be assumed to be constant if the depth range of the object is small compared to its distance from the camera. You can see what happens when this assumption breaks down in selfies. In selfies, your nose tends to look much larger than the rest of your face. The reason is that the camera is actually quite close to your face, and the nose is closer to the camera than your ears or your eyes. Therefore, the nose is magnified a lot more than the rest of your face.
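To get a feeling for the numbers, here is a rough sketch that compares the magnification of the nearest and the farthest part of a face. Since $m = f / z_0 $, the ratio of the two magnifications is simply the ratio of the depths, and $f $ cancels out. The distances are hypothetical values, not measurements.

```python
def magnification_ratio(z_near, z_far):
    """Ratio m_near / m_far = (f / z_near) / (f / z_far) = z_far / z_near."""
    return z_far / z_near


# Hypothetical selfie: nose 0.30 m from the camera, ears 0.40 m.
selfie = magnification_ratio(0.30, 0.40)

# Hypothetical portrait from 3 m away: same 0.10 m depth range.
portrait = magnification_ratio(3.00, 3.10)

print(f"selfie:   nose magnified {selfie:.2f}x relative to the ears")
print(f"portrait: nose magnified {portrait:.2f}x relative to the ears")
```

In the selfie the depth range is a large fraction of the distance, so the magnification varies noticeably across the face. In the portrait the same depth range is small compared to the distance, and the magnification is nearly constant.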

Once we have magnification, we can think about the ratio of the area of an object in the image and its area in the scene. This ratio would be equal to $m^{2} $. This is shown in the image below.
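The area relation can be checked with a short sketch, assuming the object is flat and lies in a plane parallel to the image plane. We project the corners of a square and compare the two areas using the shoelace formula. The helper names and values are illustrative assumptions.

```python
def project_point(point, f):
    """Perspective projection: x_i = f * x0 / z0, y_i = f * y0 / z0."""
    x0, y0, z0 = point
    return (f * x0 / z0, f * y0 / z0)


def polygon_area(vertices):
    """Area of a simple 2D polygon from its ordered vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


f = 0.05     # hypothetical effective focal length in metres
z0 = 10.0    # depth of the square

# A 2 m x 2 m square in a plane parallel to the image plane.
square = [(0.0, 0.0, z0), (2.0, 0.0, z0), (2.0, 2.0, z0), (0.0, 2.0, z0)]

area_scene = polygon_area([(x, y) for x, y, _ in square])
area_image = polygon_area([project_point(p, f) for p in square])

m = f / z0
print(area_image / area_scene, m ** 2)
```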

Now, let’s talk about another interesting manifestation of perspective projection called the vanishing point.

The vanishing point

In the following image, we can see a picture of a tunnel. We can assume it is a straight tunnel.

As you can see, this tunnel has a lot of lines in 3D (the white lane lines, the two edges of the yellow bands, etc). We can assume that all of these lines are parallel in 3D, and yet, all of them seem to be emerging from a single point in the image. That point is called the vanishing point. So, if we have a set of parallel lines in 3D, they seem to be converging at a single point in the image.

Now, the question that we need to answer is where the vanishing point is located in the image. This location depends on the orientation of these parallel lines in 3D.

How to calculate the vanishing point?

So, how do we figure out, given a set of parallel lines in 3D, where the vanishing point is going to end up for that set of lines? Let’s have a look at the image below.

Here, we can see two parallel lines. Our goal is to calculate the vanishing point corresponding to these two lines. So, how can we do this? Well, remember, all parallel lines in 3D share the same vanishing point. So, what we need to do is construct a line that is parallel to these two lines and passes through the pinhole. Wherever that line pierces the image plane is the vanishing point corresponding to this set of parallel lines in 3D.

This process is very straightforward, given that we already learned about perspective projection. First, we define the direction of the set of parallel lines in 3D. That’s given by the vector $\hat{l}\left(l_x, l_y, l_z\right) $. Then we create a point $P\left(l_x, l_y, l_z\right) $ that lies in this direction from the pinhole of the camera.

Now, we simply project that point into the image using the perspective projection equations that we already know, and we get the coordinates of the vanishing point.

$$ \left(x_{v p}, y_{v p}\right)=\left(f \frac{l_x}{l_z}, f \frac{l_y}{l_z}\right) $$
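Here is a minimal sketch of this calculation. It computes the vanishing point from the direction vector and then checks that points very far along two different parallel lines project close to it. The function names, the handling of the case $l_z = 0 $ (lines parallel to the image plane, which have no finite vanishing point) and all numbers are assumptions made for illustration.

```python
def project_point(point, f):
    """Perspective projection: x_i = f * x0 / z0, y_i = f * y0 / z0."""
    x0, y0, z0 = point
    return (f * x0 / z0, f * y0 / z0)


def vanishing_point(direction, f):
    """Vanishing point (f * lx / lz, f * ly / lz) of lines with the given direction."""
    lx, ly, lz = direction
    if lz == 0:
        # Lines parallel to the image plane do not converge to a finite point.
        return None
    return (f * lx / lz, f * ly / lz)


def point_on_line(start, direction, t):
    """Point start + t * direction on a 3D line."""
    return tuple(s + t * d for s, d in zip(start, direction))


f = 0.05
direction = (1.0, 0.0, 4.0)   # hypothetical direction shared by both lines

vp = vanishing_point(direction, f)

# Two parallel lines with different starting points, sampled very far away.
far_a = project_point(point_on_line((-1.0, -1.5, 2.0), direction, 1e6), f)
far_b = project_point(point_on_line((1.0, -1.5, 2.0), direction, 1e6), f)

print(vp, far_a, far_b)
```

Note that the starting points of the two lines do not appear in `vanishing_point` at all. Only the direction matters, which is why all parallel lines share the same vanishing point.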

The vanishing point is a concept that artists have used extensively. In the following image, we can see a very famous painting “The Music Lesson”, by Dutch artist Johannes Vermeer. In this scene here, we can see a lot of parallel lines. However, let’s take a look at this set of blue lines.

We can see that this set of parallel lines ends up converging at this vanishing point. The artist actually placed what he considers to be the most important object or activity exactly at that place. He wants to draw your attention to that activity.

Here’s another interesting concept, which is called false perspective. In the following image, we can see the Galleria Spada by Francesco Borromini.

So, what we can see here is an archway or a hallway. At the end of the hallway, we can see the sculpture. Now, when we stand in front of the hallway, we get the impression that the sculpture is roughly 150 feet away from us. It turns out that this sculpture is only 30 feet away. This very compelling illusion arises because the pillars of the archway actually get smaller with distance from the observer. This creates an effect of false perspective, which leads us to believe that the object at the end of the hallway is much further away than it actually is.

So, we talked about the pinhole and how it can be used to create images. However, one thing we did not mention yet is the size of the pinhole. So, let’s talk about what the size of the pinhole should be.

The size of a pinhole

The pinhole is a tiny little hole. It seems to make sense that the smaller the hole is, the thinner and better defined the ray going through it will be. However, that is not quite true. Let’s have a look at the image below.

Here, we can see a set of images taken with different pinhole sizes. We start with the largest size of 2 millimeters. Because this is a fairly large pinhole, we’re going to get a lot of blurring in the image. That is because each point in the scene does not map to a single point in the image. Instead, we end up getting a bundle of rays going through, which creates a distribution of light in a circular region.

As we start making the pinhole smaller and smaller, eventually we will end up with a very sharp image. But here’s the interesting thing. If we make the pinhole even smaller, the image starts getting blurrier again. So why does this happen? This is because of an effect in wave optics called diffraction.

It turns out that if we have any opening and a light wave that passes through it, the light wave is going to bend at the edges of the opening, as you can see in the image below.

Here, the smaller the opening gets, the more the bending dominates over the light that passes straight through. Therefore, if the pinhole gets too small, we will get severe bending. So then the question is, what is the ideal size of the pinhole? Well, we can calculate this. It turns out that the diameter $d $ of the pinhole should be roughly 2 times the square root of the product of the effective focal length $f $ and the wavelength of light $\lambda $:

$$ d \approx 2 \sqrt{f \lambda} $$

For visible light, the wavelength lies somewhere between 400 nanometers and 700 nanometers.

So, let’s just pick the average of 550 nanometers. We can plug that number and the effective focal length into the expression above to calculate the ideal pinhole diameter. When we use the ideal pinhole size, we can get some stunning images.
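As a quick worked example, the rule of thumb described above (a diameter of roughly 2 times the square root of $f \lambda $) can be evaluated in a few lines. The function name and the focal length of 50 millimeters are hypothetical choices for illustration. Only the wavelength of 550 nanometers comes from the text.

```python
import math


def ideal_pinhole_diameter(f, wavelength):
    """Approximate ideal pinhole diameter: 2 * sqrt(f * wavelength), all in metres."""
    return 2.0 * math.sqrt(f * wavelength)


f = 0.05                # hypothetical effective focal length: 50 mm
wavelength = 550e-9     # average visible wavelength: 550 nm

d = ideal_pinhole_diameter(f, wavelength)
print(f"ideal pinhole diameter: {d * 1e3:.2f} mm")
```

For these values the result is about a third of a millimeter. Because of the square root, making the camera four times longer only doubles the ideal pinhole diameter.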

In the following example, we can see an image of the Flatiron Building in New York.

What is interesting here is that this image is pretty much in focus everywhere. So, well-designed pinhole cameras tend to create these all-focused images. The image was captured with a camera that has an effective focal length of 73 millimeters and a pinhole with a diameter of 0.2 millimeters. However, the price we pay is the exposure time, which is equal to 12 seconds. Since a pinhole is so small, it lets very little light through. As a result, the exposure times tend to be much longer, so that enough photons arrive at the detector that we’re using to capture the image.

We can imagine that for almost any real application of computer vision, waiting for 12 seconds to capture a single frame is just not going to work. That’s why we use lenses, but that is the topic for the next post.

Summary

That was all for this post. We have learned about the concept of a pinhole camera and how parallel lines in the real world behave under perspective projection. We have also introduced a new concept, the vanishing point, and we explained why the pinhole needs to be small, but not too small, to create a sharp image.

References:

[1] – Pinhole and Perspective Projection | Image Formation – Shree Nayar, T. C. Chang Professor of Computer Science in the School of Engineering at Columbia University.
