CamCal 011 Fundamental Matrix
Highlights: In this post we will learn about the fundamental matrix, and we will continue our series about stereo vision. In the last post we concluded that, given enough corresponding points, we should be able to figure out the constraints for the epipolar lines. For this, we will need to calculate the fundamental matrix.
Tutorial Overview:
Intro
In previous posts we developed the relationship between two images obtained with two calibrated cameras, where we actually knew the rotation and translation parameters between them. In particular, we defined the essential matrix, which relates corresponding points between two calibrated cameras. But what should we do when we don’t have calibrated cameras? Let’s think about it in a different way. If we have two images and enough corresponding pairs of points, it feels like we should be able to figure out the epipolar lines and what the correspondence should be for all the other points. In other words, we can estimate where each point can be on its epipolar line according to its depth. So, we will work out the mathematics of this relationship between two uncalibrated cameras and their images.
1. Weak Calibration
Let’s assume that we have got two projective cameras, but we don’t know anything about their focal lengths or their offsets. Also, let’s assume that there is no radial distortion and no non-uniform stretch. We are going to call this setup weak calibration. The main idea is that we are going to estimate the epipolar geometry from a set of corresponding points between uncalibrated camera images. To make this approach more robust with respect to noise we will use a large set of corresponding points. Let’s get started.
From our previous posts you will remember that we have a projection mapping that goes from world coordinates, through the extrinsics and then the intrinsics, to a projective representation of the image coordinates.
$$ \begin{bmatrix}wx_{im}\\ wy_{im}\\w\end{bmatrix}= K_{int}\Phi _{ext}\begin{bmatrix}X_{w}\\Y_{w}\\Z_{w}\\1\end{bmatrix} $$
So, we are going to use a \(3\times 4 \) version of \(\Phi \) here, since \(K \) will be \(3\times 3 \). The left-hand side of \(\Phi \) is the rotation matrix and the right-hand column is the transformed translation. You will notice the slightly unusual form written as \(-{R_{1}}^{T}\textrm{T} \); all that says is that if we know the translation in one frame, we may have to express it in another frame, depending on exactly how we set things up. That is \(\Phi _{ext} \), our extrinsic parameter matrix.
$$ \Phi_{ext}= \begin{bmatrix}r_{11} & r_{12} & r_{13} & {-R_{1}}^{T}\textrm{T}\\ r_{21}& r_{22} & r_{23} &{-R_{2}}^{T}\textrm{T} \\ r_{31} & r_{32} & r_{33} & {-R_{3}}^{T}\textrm{T}\end{bmatrix} $$
$$ K_{int}= \begin{bmatrix}-f/s_{x} & 0 &O_{x} \\ 0& -f/s_{y} &O_{y} \\ 0& 0& 1\end{bmatrix} $$
So \( K_{int} \) is our intrinsic calibration matrix, and it contains the focal length and an \(x \) scale and a \(y \) scale. Of course, as we said before, these are multiplied together, so we really only need two numbers: \(-f/s_{x} \) and \(-f/s_{y} \). They always appear multiplied like that, but effectively we have a scale in both directions, plus the offsets. What we don’t have is skew; that zero value in the matrix makes the math a little bit easier. As we have said before, for modern cameras skew just isn’t an issue.
We are going to write this in the following way:
$$ p_{im}= K_{int}\Phi _{ext}P_{w} $$
The homogeneous point in the image, \(p_{im}\), is the world point \(P_{w}\) multiplied by the extrinsic matrix \(\Phi_{ext}\) and then by the intrinsic matrix \(K_{int}\). The location of the 3D point in the camera frame is \(\Phi _{ext}P_{w} \), which is just the world point put through the extrinsics; we are going to call it \(p_{c}\). Remember that the extrinsic matrix maps us from some arbitrary world frame into the 3D frame of the camera. So we can write down that \(K_{int} \) maps a point in the camera coordinate system to the homogeneous point in the image.
$$ p_{im}= K_{int}p_{c} $$
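The whole projection pipeline above can be sketched in a few lines of NumPy. All the camera numbers below are made-up, hypothetical values chosen only to make the mapping concrete:

```python
import numpy as np

# Hypothetical intrinsics: focal length f, pixel scales s_x, s_y, offsets O_x, O_y
f, sx, sy, ox, oy = 0.05, 1e-4, 1e-4, 320.0, 240.0
K = np.array([[-f / sx, 0.0,      ox],
              [0.0,     -f / sy,  oy],
              [0.0,     0.0,      1.0]])   # note the zero skew entry

# Hypothetical extrinsics: rotation R and translation T, arranged as [R | -R T]
R = np.eye(3)                              # no rotation, for simplicity
T = np.array([0.1, 0.0, 0.0])              # camera shifted along x
Phi = np.hstack([R, (-R @ T).reshape(3, 1)])   # the 3x4 extrinsic matrix

# Project a homogeneous world point: p_im = K Phi P_w
P_w = np.array([0.3, 0.2, 2.0, 1.0])
p_im = K @ Phi @ P_w                       # [w*x_im, w*y_im, w]
x_im, y_im = p_im[0] / p_im[2], p_im[1] / p_im[2]   # divide out w to get pixels
```

Dividing by the third (homogeneous) component at the end is exactly the perspective divide implied by the \(w \) in the equation above.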
2. Uncalibrated Case
That was the calibrated case, when we had \(\Phi\) and \(K \). But we said our cameras are uncalibrated. So let’s suppose we have a given camera with \(p_{im}= K_{int}p_{c} \). Now, \(K_{int} \) is an invertible matrix, so we can invert it, and that’s what is shown here:
For a given camera:
$$ p_{im}= K_{int}p_{c} $$
And since it is invertible:
$$ p_{c }= K_{int}^{-1}p_{im} $$
So we can go from the image back to a point in the camera frame. Based on where we are in the image, we can tell along which ray in the world the point lies. Point \(p_{c } \) is not a regular 3D point: it is a homogeneous point that lies anywhere along that ray. The homogeneous coordinates tell us that if we have the point in the image and the intrinsics, we can determine the ray in space, which is the same as a homogeneous coordinate in the image. That is why we can go from an image point to a ray.
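A minimal sketch of this inversion, again with hypothetical intrinsics: inverting \(K \) turns a pixel into a ray direction, and every point along that ray projects back to the same pixel.

```python
import numpy as np

# Hypothetical intrinsics of the same form as K_int above (no skew)
K = np.array([[-500.0, 0.0,   320.0],
              [0.0,   -500.0, 240.0],
              [0.0,    0.0,     1.0]])

p_im = np.array([270.0, 190.0, 1.0])   # a homogeneous image point
p_c = np.linalg.inv(K) @ p_im          # direction of the ray in the camera frame

# Every point lambda * p_c along the ray projects to the same pixel:
p_back = K @ (3.7 * p_c)               # 3.7 is an arbitrary depth
assert np.allclose(p_back / p_back[2], p_im)
```

The assertion at the end is just the statement from the text: the depth along the ray is exactly the information the image point does not carry.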
Now we can take that equation and apply it when we have two cameras, typically called the left one and the right one, or just the original and the prime. Here, we write left and right for convenience:
$$ p_{c,left }= K_{int,left}^{-1} p_{im,left} $$
$$ p_{c,right }= K_{int,right}^{-1} p_{im,right} $$
Now, we have a different intrinsic calibration matrix per camera. All we need is that an intrinsic calibration matrix exists for each camera; since the cameras are uncalibrated, we don’t know the values of these matrices, we just assume they exist.
If we want to express the location of the point in the right frame in terms of the ray, we write it as \(p_{c,right }\); similarly, \(p_{c,left }\) for the point in the left frame. If these cameras were calibrated, we would know something about the relationship between a point in the right image and a point in the left image: the essential matrix relates the ray to the point in the right frame to the ray to the point in the left frame.
$$ {p_{c,right }}^{T} \textrm{E} p_{c,left}= 0 $$
However, we don’t know what the essential matrix is, because we have uncalibrated cameras; we only know that one exists. So, assuming it exists, we can rewrite this equation:
$$ \left ( K_{int,right}^{-1}p_{im,right} \right )^{T}E\left ( K_{int,left}^{-1}p_{im,left} \right )= 0 $$
3. Matrix Multiplication
Now, we should talk about matrix multiplication. We can modify the last equation and regroup the terms in brackets using the associative property:
$$ p_{im,right}^{T}\left ( {K_{int,right}^{-1}}^{T}EK_{int,left}^{-1} \right )p_{im,left} = 0 $$
After that, we can combine the whole expression in the brackets into a single matrix, which we will call \(F \). When we do that we end up with this beautiful equation:
$$ p_{im,right}^{T}Fp_{im,left}= 0 $$
This is a relationship between image points which we can just write this way:
$$ p^{T}F{p}’= 0 $$
Here \(p \) is in one image and \({p}’ \) is in the other. This fundamental matrix constraint is going to allow us to solve for the relationship between one view and another, provided we have enough corresponding points. And once we do that, we know what relates the two images.
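The regrouping above can be checked numerically. In this sketch every number is a made-up, hypothetical value: we build an essential matrix \(E = [t]_{\times}R \) from the earlier posts, wrap it in the inverse intrinsics to get \(F \), and verify that the constraint holds for a projected 3D point.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0,  -t[2],  t[1]],
                     [t[2],  0.0,  -t[0]],
                     [-t[1], t[0],  0.0]])

# Hypothetical geometry: right-frame coordinates are P_r = R @ P_l + t
R = np.eye(3)
t = np.array([-0.2, 0.0, 0.0])     # pure horizontal baseline
E = skew(t) @ R                    # essential matrix from the earlier posts

# Hypothetical (and here identical) intrinsics for the two cameras
K_left = np.array([[500.0, 0.0, 320.0],
                   [0.0, 500.0, 240.0],
                   [0.0,   0.0,   1.0]])
K_right = K_left.copy()

# The regrouped bracket: F = K_right^{-T} E K_left^{-1}
F = np.linalg.inv(K_right).T @ E @ np.linalg.inv(K_left)

# Check p_right^T F p_left = 0 on a projected 3D point
P_l = np.array([0.3, -0.1, 2.0])   # a point in the left camera frame
p_left = K_left @ P_l              # homogeneous left image point
p_right = K_right @ (R @ P_l + t)  # homogeneous right image point
assert abs(p_right @ F @ p_left) < 1e-8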
4. Properties of the Fundamental Matrix
Epipolar lines
Now, we will talk about the properties of the fundamental matrix. The relationship below is the algebraic relationship between corresponding points:
$$ p^{T}F{p}’= 0 $$
Both \(p \) and \({p}’ \) are images of the same point in the outside world. From epipolar geometry we know that if we have some point \(p\) in one image, it lies on the ray \(OP\), and therefore its correspondence must lie on the epipolar line \(l’\) in the other image. This is the geometric constraint.
We already said in the previous post, Stereo Geometry Intro, that the equation of a line can be written in a couple of ways. One way is \(p^{T}l= 0 \): remember, \(l \) was the normal to the plane, and all the points \(p \) on the line had to lie in that plane, perpendicular to \(l \). Notice that both \(p^{T}l= 0 \) and \(p^{T}F{p}’= 0 \) begin with \(p^{T} \). So, matching the two equations, the quantity \(F{p}’ \) must play the role of the line \(l \).
\(l= F{p}’ \) is the epipolar line in the \(\Pi\) image associated with \({p}’ \).
We can also just reverse this: \(F^{T}{p}\) is the epipolar line in the \(\Pi’\) image associated with the point \(p \).
$$ {l}’= F^{T}{p} $$
So the fundamental matrix gives us the epipolar line constraint between two views, and if we know what \(F \) is we can compute the epipolar lines. In fact, for a given point \(p \) we can just take \(F^{T}{p} \), and that is the epipolar line in the other image.
So, for some point that we see in our left image, the only thing we don’t know is its depth along the ray. Since we don’t know the depth along that ray, its correspondence has to be somewhere along the epipolar line. The fundamental matrix equation reduces that geometric constraint to an algebraic one.
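As a small illustration, here is the line computation for a hypothetical \(F \). For a pure horizontal baseline with identical cameras, \(F \) is proportional to the matrix below, and the epipolar lines are the image rows:

```python
import numpy as np

# A hypothetical rank-2 fundamental matrix (pure horizontal baseline)
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])

p_prime = np.array([150.0, 200.0, 1.0])   # a point in the prime image
l = F @ p_prime                           # epipolar line a*x + b*y + c = 0
a, b, c = l                               # here: 0*x - 1*y + 200 = 0, i.e. y = 200

# Any point on the same image row satisfies p^T l = 0:
p = np.array([640.0, 200.0, 1.0])
assert abs(p @ l) < 1e-9
```

So for this \(F \), the match of any prime-image point can only move along the row with the same \(y \) coordinate, which is exactly the classic rectified-stereo picture.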
Epipole
$$ p^{T}F{p}’= 0 $$
We know that \({p}’ \) is on the epipolar line corresponding to \(p \) in the other image. But suppose \({p}’ \) satisfied the constraint for every point \(p \) in the original image. How could that be? It would require \(F{p}’= 0 \), since then \(p^{T}F{p}’= 0 \) holds no matter what \(p \) is. And only one point lies on every epipolar line: the epipole. So, we can find the epipole by solving the equation above. Remember there are two epipoles: the epipole in the original frame and the epipole in the prime frame.
$$ F{p}’= 0 \\ F^{T}{p}= 0 $$
We just solve each of those equations and that identifies for us the epipoles in the image.
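Solving \(F{p}’= 0 \) and \(F^{T}{p}= 0 \) means finding the null space of \(F \) and of \(F^{T} \). One standard way to sketch this is with the SVD: the right-singular vector for the zero singular value is the null vector. The \(F \) below is the same hypothetical horizontal-baseline matrix, whose epipoles lie at infinity along \(x \):

```python
import numpy as np

def null_vector(M):
    """Unit vector v with M @ v ~ 0: the last right-singular vector of M."""
    _, _, Vt = np.linalg.svd(M)
    return Vt[-1]

# Hypothetical rank-2 fundamental matrix (pure horizontal baseline)
F = np.array([[0.0, 0.0,  0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0,  0.0]])

e_prime = null_vector(F)     # solves F p' = 0: the epipole in the prime image
e = null_vector(F.T)         # solves F^T p = 0: the epipole in the original image
assert np.allclose(F @ e_prime, 0.0)
assert np.allclose(F.T @ e, 0.0)
```

Here both epipoles come out as \((\pm 1, 0, 0)\), a homogeneous point at infinity in the \(x \) direction, which matches the intuition for a purely horizontal baseline.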
Singular Matrix
The fundamental matrix is \(3 \times 3\), because we use the homogeneous coordinates of image points (3-vectors). But \(F \) is actually singular. The reason is that if it weren’t singular, it would map points to points; in fact, it maps points to lines, that is, from a 2D point to a 1D line. So when we have a fundamental matrix and a point, all we can tell is the line in the other image: instead of pinning the correspondence down, the mapping leaves it one degree of freedom. That is because the rank of the fundamental matrix is only \(2 \), not \(3 \).
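This rank-2 property matters in practice: an \(F \) estimated from noisy correspondences usually comes out full rank, and the standard fix (a sketch, using the SVD) is to zero out the smallest singular value. The noisy matrix below is made up for illustration:

```python
import numpy as np

# A hypothetical F corrupted by noise: the 1e-3 entry makes it invertible
F_noisy = np.array([[1e-3, 0.0,  0.0],
                    [0.0,  0.0, -1.0],
                    [0.0,  1.0,  0.0]])
print(np.linalg.matrix_rank(F_noisy))   # 3: noise destroyed the rank-2 structure

# Enforce rank 2: zero the smallest singular value and rebuild F
U, S, Vt = np.linalg.svd(F_noisy)
S[2] = 0.0
F = U @ np.diag(S) @ Vt
assert np.linalg.matrix_rank(F) == 2    # F now maps points to lines, as it must
```

Without this step the estimated epipolar lines would not all pass through a common epipole, since a full-rank matrix has no null vector.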
Now we have the fundamental matrix. As we said, it relates the pixel coordinates in the two views. And it is much more general than the essential matrix, because we have removed the need to know the intrinsic parameters. If we have two arbitrary cameras that aren’t even necessarily the same, the fundamental matrix allows us to relate them without worrying about the intrinsics or the extrinsics.
Summary
Now that we have all of this in mind, we can finally get to the part where we show the code. In the next post we will cover everything necessary to find the fundamental matrix and epipolar lines in Python and C++.