#007 Advanced Computer Vision – Video Stabilization

Highlights: Hello and welcome. In this post [1], we will learn how to remove unwanted camera movement or shake from a video using a technique called video stabilization. It is typically used to make videos appear smoother and more professional. A variety of techniques can be used to stabilize a video, including image cropping, image scaling, and frame-by-frame motion estimation.

In this post, we will present an efficient and robust implementation of a digital video stabilization algorithm. This algorithm is based on a two-dimensional motion model and uses a Euclidean transformation. So let’s begin with our post.

1. Video stabilization – intro

What is video stabilization?

Video stabilization is a set of techniques utilized to reduce the effect of camera motion on the video. This camera motion can take the form of translational movement in the \(x \), \(y \), and \(z \)-directions or rotational movement in the yaw, pitch, and roll axes.

Video stabilization real-world applications

The requirement for video stabilization is present across a wide range of domains.

Videography and photography

It plays a crucial role in both consumer and professional videography, leading to the development of a variety of mechanical, optical, and algorithmic solutions to address this need. Stabilization techniques also have applications in still-image photography, allowing handheld images to be captured with longer exposure times.

Medical field

In the medical field, video stabilization is crucial in diagnostic procedures such as endoscopy and colonoscopy, as it enables accurate determination of the location and extent of pathologies.

Military field

Similarly, in military and robotic applications, videos captured by aerial vehicles or robots during reconnaissance missions must be stabilized for localization, navigation, target tracking, and other purposes.

2. Video stabilization – different approaches

Video stabilization techniques encompass a range of methods, including mechanical, optical, and digital approaches.

  • Mechanical video stabilization employs motion sensors such as gyroscopes and accelerometers to adjust the position of the image sensor in order to compensate for camera motion.
  • In optical video stabilization, instead of moving the entire camera, stabilization is achieved by moving parts of the lens: a moveable lens assembly adjusts the path of light through the camera’s lens system in order to compensate for camera motion.
  • Digital video stabilization methods do not require the use of special sensors to estimate camera motion. This approach typically involves three stages: motion estimation, motion smoothing, and image composition. The first stage involves determining the transformation parameters between consecutive frames. The second stage filters out unwanted motion, and the final stage reconstructs the stabilized video.

In this post, we will explore an algorithm for video stabilization that is based on a two-dimensional motion model and uses a Euclidean transformation, which combines translation and rotation to achieve stabilization.

What is Euclidean transformation?

Before proceeding, it is essential to review the concept of 2D transformations. In the following image, we can see an illustration of such transformations. To learn more about 2D and 3D transformations, check out this post.

The Euclidean motion model, as illustrated in the image above, describes the transformation of a square in an image into any other square with a different location, size, or rotation. This model is more restrictive than the affine and homography transforms. However, it is more relevant for motion stabilization because it resembles the camera movement between successive frames of a video more closely than other transformations do (camera movement in a video is typically limited).
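To make this concrete, a 2D Euclidean (rigid) transformation maps a point \((x, y) \) to \((x', y') \) using only a rotation angle \(\theta \) and a translation \((t_x, t_y) \):

$$ \begin{bmatrix} x' \\ y' \end{bmatrix}=\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}+\begin{bmatrix} t_x \\ t_y \end{bmatrix} $$

Allowing an additional uniform scale factor turns this into a similarity transform, and allowing arbitrary scale and shear makes it affine. Later in this post, we will work with exactly this \(2\times 3 \) matrix form, \([R \mid t] \).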

Now, let’s dive deeper into the technique that is commonly used for video stabilization. This method is called the point feature matching technique.

3. Video stabilization based on point feature matching technique

The proposed method employs the tracking of a select number of feature points between consecutive frames to estimate the motion between these frames and subsequently compensate for it. The flowchart illustrated below outlines the basic steps involved in this process.

Now, let’s dive deeper into the coding part and see how we can apply video stabilization in Python.

4. Video stabilization with Lucas Kanade optical flow in Python

First, let’s import the necessary libraries.

import numpy as np
import cv2
import matplotlib.pyplot as plt
from google.colab.patches import cv2_imshow

The next step is to create a synthetic video. Using a simple video will help us better understand the process of video stabilization.

First, we will generate a black image using the function np.zeros(), and then we will define several areas of interest with different rectangular shapes. We will do that using the following code.

# Here, we define an initial template: a black 256x256 image with several
# rectangles of different intensities that give the tracker corners to detect
img1 = np.zeros((256, 256), dtype = np.uint8)

w = 32; h = 32
rectangles = [(50, 50, 255), (70, 70, 200), (60, 80, 50), (90, 90, 150), (40, 75, 220)]
for x0, y0, intensity in rectangles:
    img1[y0:y0+h, x0:x0+w] = intensity

# Copy one region to the right and brighten it to add some extra texture
img1[100:150, 100:200] = img1[100:150, 50:150].copy()
img1[100:150, 100:200] = np.uint8(img1[100:150, 100:200] * 1.3)

plt.imshow(img1, cmap = 'gray')

The next step is to create a video from this image. Our goal is to shift the image by one pixel in the \(x \) direction and one pixel in the \(y \) direction in each frame.

stacked_images = []
stacked_images.append(img1)

# Shift the image by one more pixel in x and y for every new frame
tx = 0
ty = 0
for i in range(20):
    # Here we have a pure translation model
    M = np.zeros((2,3), dtype = np.float32)
    tx += 1
    ty += 1
    M[0,0] = 1
    M[1,1] = 1
    M[0,2] = tx
    M[1,2] = ty
    img2 = cv2.warpAffine(img1, M, (img1.shape[1], img1.shape[0]))
    stacked_images.append(img2)

# Write all frames into a video file
frame_width = img1.shape[1]; frame_height = img1.shape[0]
out = cv2.VideoWriter('output_vlada.mp4', cv2.VideoWriter_fourcc('M','J','P','G'), 10, (frame_width,frame_height))
for i in range(len(stacked_images)):
    img_to_write = cv2.cvtColor(stacked_images[i], cv2.COLOR_GRAY2BGR)
    out.write(img_to_write)
out.release()

Now, let’s take a look at our video.

Now, we are ready to proceed with our task step by step.

Step_1 – Load the video

First, let’s complete the setup for reading the created input video and writing the output video.

frames = []

# Read input video
cap = cv2.VideoCapture('output_vlada.mp4')

# Get frame count and frame rate
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)

# Get width and height of video stream
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

# Set up the writer for the output video
fourcc = cv2.VideoWriter_fourcc('M', 'P', '4', 'V')
out = cv2.VideoWriter('out.mp4', fourcc, fps, (w, h))

Step_2 – Read the first frame and convert it to grayscale

The next step is to calculate optical flow. Optical flow is the pattern of apparent motion of image objects between two consecutive frames caused by the movement of those objects or the observer. The process of determining the optical flow involves capturing two consecutive frames of a video, estimating the motion vectors between the frames, and subsequently utilizing these vectors to correct the motion.

In other words, we need to capture two frames of a video, estimate the motion between the frames, and finally correct the motion. So here, we will capture the first frame, convert it to grayscale, and create the variable mask. We will use this mask to draw the trajectory of the optical flow.

# Read first frame
_, prev = cap.read() 
 
# Convert frame to grayscale
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

mask = np.zeros_like(prev)

Step_3 – Estimate motion between two frames

This step can be divided into 3 parts:

  1. Detect the key points on the frame using the Good Features to Track algorithm
  2. Calculate Lucas-Kanade optical flow
  3. Estimate motion – map the previous frame to the current frame

Detect the key points on the frame

The process of determining the optical flow is a crucial aspect of the algorithm. Specifically, it involves iterating over all frames of the video and determining the motion between each frame and its preceding frame. The Euclidean motion model is commonly used in this process and, in principle, requires the motion vectors of only two points in the two frames, since the model has just three parameters (two translations and a rotation angle) and each point correspondence provides two constraints. In practice, however, it is more robust to estimate the motion model from a larger number of points, typically in the range of 50 to 100. This improves the accuracy and robustness of the motion estimation.

So, our first task in this step is to detect the points for tracking. As you already know, smooth regions are not good for tracking. On the other hand, textured regions with lots of corners are perfect. Fortunately, the OpenCV library has a fast feature detector that detects features that are ideal for tracking. It is called goodFeaturesToTrack(). To learn more about this method check out this post.

In the following code snippet, we will calculate our points. Also, we will define the array transforms where we will store our transformation values. Note that this array has 3 columns because we need to calculate the translational movement in the \(x \) and \(y \) direction (these two values will be stored in the first two columns) and rotation (this value will be stored in the third column).

# Pre-define transformation-store array
transforms = np.zeros((n_frames-1, 3), np.float32) 

prev_pts = cv2.goodFeaturesToTrack(prev_gray,
                                     maxCorners=200,
                                     qualityLevel=0.01,
                                     minDistance=30,
                                     blockSize=3)

Calculate Lucas-Kanade optical flow

Once we have detected good features in the previous frame, we can track them in the next frame using the Lucas-Kanade algorithm.

# Lists in which we will store the tracked points of every frame
OF_points_old = []
OF_points_new = []

for i in range(n_frames-2):

  # Read the next frame
  success, curr = cap.read()
  if not success:
    break

  # Convert to grayscale
  curr_gray = cv2.cvtColor(curr, cv2.COLOR_BGR2GRAY)

  # Track the feature points with Lucas-Kanade optical flow
  curr_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)

  # Sanity check
  assert prev_pts.shape == curr_pts.shape

  # Filter only valid points
  idx = np.where(status==1)[0]
  good_old = prev_pts[idx]
  good_new = curr_pts[idx]

  # Find transformation matrix (estimateAffine2D returns the matrix and an inlier mask)
  m, _ = cv2.estimateAffine2D(good_old, good_new)

  OF_points_old.append(good_old)
  OF_points_new.append(good_new)

  # Extract translation
  dx = m[0,2]
  dy = m[1,2]

  # Extract rotation angle
  da = np.arctan2(m[1,0], m[0,0])

  # Store transformation
  transforms[i] = [dx,dy,da]

  # Draw the tracked points and their motion vectors
  for (new, old) in zip(good_new, good_old):
    a, b = new.ravel()
    c, d = old.ravel()
    mask = cv2.line(mask, (int(a),int(b)), (int(c),int(d)), (0,0,255), 1)
    curr = cv2.circle(curr, (int(a),int(b)), 5, (0,0,255), -1)

  img = cv2.add(curr, mask)
  frames.append(img)

  # Now update the previous frame and previous points
  prev_pts = good_new.reshape(-1,1,2)
  prev_gray = curr_gray.copy()

  #print("Frame: " + str(i) +  "/" + str(n_frames) + " -  Tracked points : " + str(len(prev_pts)))
  out.write(img)

# Release video
cap.release()
out.release()
# Close windows
cv2.destroyAllWindows()

Estimate motion

Given the determined location of features in the current frame and their corresponding location in the previous frame, it is possible to estimate the rigid (Euclidean) transformation that maps the previous frame to the current frame, using a function such as estimateAffine2D(). This transformation is a mathematical representation of the motion between the frames and can be decomposed into its constituent components, namely the \(x \) and \(y \) translation and the rotation angle. These values are stored per frame in the transforms array, which later allows for smooth modification of these motion parameters.
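For reference, when the estimated motion is (approximately) rigid, the \(2\times 3 \) matrix returned by estimateAffine2D() can be read as

$$ m=\begin{bmatrix} \cos(da) & -\sin(da) & dx\\ \sin(da) & \cos(da) & dy \end{bmatrix}, \qquad da=\operatorname{arctan2}\left(m_{1,0},\, m_{0,0}\right) $$

which is exactly how dx, dy, and da were extracted in the tracking loop above, and how the matrix will be reconstructed again in Step_5.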

After calculating the transformation matrix m, we will just draw the optical flow points in our original video. Let’s see the result.

As you can see, the algorithm selected 8 keypoints. Then, we calculated the optical flow and displayed those 8 points on the video. Also, for a better understanding, we can plot the displacement of each of these points along the \(x \) and \(y \) axes. Let’s have a look.
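One way to produce such a plot is a short sketch like the following, using the OF_points_new list filled in the tracking loop. It assumes the first detected keypoint survives tracking in every frame, which holds for our simple synthetic video.

# Plot the x and y coordinates of the first tracked keypoint across frames
xs = [pts[0, 0, 0] for pts in OF_points_new]
ys = [pts[0, 0, 1] for pts in OF_points_new]
plt.plot(xs, label='x position')
plt.plot(ys, label='y position')
plt.xlabel('frame')
plt.ylabel('pixel coordinate')
plt.legend()
plt.show()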

Step_4 – Calculate smooth motion between frames

In the previous step, we estimated the relative displacement between consecutive frames and subsequently stored the estimated values in an array. In the current step, we aim to determine the overall trajectory of motion by cumulatively summing the differential displacements. The ultimate goal of this process is to smooth out the trajectory. This can be achieved by utilizing the cumulative sum function cumsum() from the NumPy library in Python. Now, let’s explain why we need to apply this process of smoothing in a little bit more detail. First, let’s calculate trajectory and plot the translation values along the \(x \) and \(y \) axis over time. Note that for these curves we used a different video because, in the current video, we have a movement of only one pixel along the \(x \), and also one pixel along the \(y \) axis.

# Compute trajectory using cumulative sum of transformations
trajectory = np.cumsum(transforms, axis=0)

# Plot the x and y translations of the trajectory over time
l = len(trajectory)
time = np.arange(0, l, 1, dtype=int)
x = trajectory[:,0]
y = trajectory[:,1]

plt.plot(time, x)
plt.plot(time, y)

In this step, we aim to smooth these curves in order to reduce noise and improve the overall interpretability of the data. A common method for achieving this is through the use of a moving average filter. This technique involves replacing the value of a function at a given point with the average of its neighboring values, as defined by a predefined window size. This has the effect of smoothing out the curve by reducing the impact of high-frequency fluctuations. Let’s take a look at an example of this process for further clarification.

Let’s say we have stored the original curve in an array \(c \). The smooth curve \(f \) can be obtained by filtering \(c \) with a moving average box filter of size 5.

$$ f[k]=\frac{c[k-2]+c[k-1]+c[k]+c[k+1]+c[k+2]}{5} $$

In the Python implementation, we can achieve this by defining a movingAverage() function that takes an arbitrary input curve and returns its smoothed version.

Moreover, we define a function smooth() that takes in the trajectory and performs smoothing on the three components (translation along \(x \), translation along \(y \), and rotational movement).

We also need to define another function called fixBorder(), because when stabilizing a video, it is common to observe black boundary artifacts due to the necessary shrinking of frames during the stabilization process. To mitigate this issue, one approach is to scale the video slightly about its center. This can be achieved by applying a small scaling factor.

This function utilizes the getRotationMatrix2D() function, which allows for both rotation and scaling of an image without altering its center. To achieve the desired scaling effect, this function is called with a rotation value of 0 and a scaling factor of 1.04.

# Radius of the moving-average window (in frames). The exact value is a
# tuning parameter (assumed here); increase it for stronger smoothing.
SMOOTHING_RADIUS = 5

def movingAverage(curve, radius):
  window_size = 2 * radius + 1
  # Define the filter
  f = np.ones(window_size)/window_size
  # Add padding to the boundaries
  curve_pad = np.lib.pad(curve, (radius, radius), 'edge')
  # Apply convolution
  curve_smoothed = np.convolve(curve_pad, f, mode='same')
  # Remove padding
  curve_smoothed = curve_smoothed[radius:-radius]
  # Return the smoothed curve
  return curve_smoothed

def smooth(trajectory):
  smoothed_trajectory = np.copy(trajectory)
  # Filter the x, y and angle curves
  for i in range(3):
    smoothed_trajectory[:,i] = movingAverage(trajectory[:,i], radius=SMOOTHING_RADIUS)

  return smoothed_trajectory

def fixBorder(frame):
  s = frame.shape
  # Scale the image 4% without moving the center
  T = cv2.getRotationMatrix2D((s[1]/2, s[0]/2), 0, 1.04)
  frame = cv2.warpAffine(frame, T, (s[1], s[0]))
  return frame
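As a quick sanity check of movingAverage(), we can run it on a small, made-up zig-zag signal (the values below are purely illustrative) and observe that the output is pulled toward the local mean of the input.

# Hypothetical test signal: an alternating zig-zag between 0 and 4
noisy = np.array([0., 4., 0., 4., 0., 4., 0., 4.])
print(movingAverage(noisy, radius=2))   # every output sample is a 5-sample local average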

The next step is to call the function smooth() to obtain smooth transforms. Then we will calculate the difference between the smooth trajectory and the original trajectory and add this difference back to the original transforms.

# Create variable to store smoothed trajectory
smoothed_trajectory = smooth(trajectory) 

# Calculate difference in smoothed_trajectory and trajectory
difference = smoothed_trajectory - trajectory
 
# Calculate newer transformation array
transforms_smooth = transforms + difference

Now, let’s plot our smoothed curves in the \(x \) and the \(y \) direction.
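A minimal plotting sketch for this comparison, reusing the time, x, and y arrays computed earlier:

# Compare the original and smoothed trajectories along x and y
smoothed_x = smoothed_trajectory[:, 0]
smoothed_y = smoothed_trajectory[:, 1]
plt.plot(time, x, label='x (original)')
plt.plot(time, smoothed_x, label='x (smoothed)')
plt.plot(time, y, label='y (original)')
plt.plot(time, smoothed_y, label='y (smoothed)')
plt.legend()
plt.show()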

As you can see, the smoothed curves look very similar to the original ones, but that is because we are using a very simple video.

Step_5 – Apply smoothed motion to frames

The last step is to loop over the frames once more and apply the smoothed transforms we just calculated. We can do that with the following code.

# Re-open the input video
cap = cv2.VideoCapture('output_vlada.mp4')

fourcc = cv2.VideoWriter_fourcc('M', 'P', '4', 'V')

# The output frame is twice as wide because we place the original and
# stabilized frames side by side
out1 = cv2.VideoWriter('out1.mp4', fourcc, fps,
                      (2*int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))))

# Write n_frames-2 transformed frames
for i in range(n_frames-2):
  # Read next frame
  success, frame = cap.read()
  if not success:
    break

  # Extract transformations from the new transformation array
  dx = transforms_smooth[i,0]
  dy = transforms_smooth[i,1]
  da = transforms_smooth[i,2]

  # Reconstruct transformation matrix according to the new values
  m = np.zeros((2,3), np.float32)
  m[0,0] = np.cos(da)
  m[0,1] = -np.sin(da)
  m[1,0] = np.sin(da)
  m[1,1] = np.cos(da)
  m[0,2] = dx
  m[1,2] = dy

  # Apply affine warping to the given frame
  frame_stabilized = cv2.warpAffine(frame, m, (w,h))

  # Fix border artifacts
  frame_stabilized = fixBorder(frame_stabilized)

  # Put the original frame (with drawn tracks) and the stabilized frame side by side
  frame_out = cv2.hconcat([frames[i], frame_stabilized])

  out1.write(frame_out)
  #cv2_imshow(frame_out)
  cv2.waitKey(10)

# Release video
cap.release()
out1.release()
# Close windows
cv2.destroyAllWindows()

Now, let’s see our output video. On the left side, we will see the original frames and on the right side, we will see the stabilized frames.

As you can see, we obtain a very good result. But this is such a simple video with a simple movement. Let’s try to apply the stabilization method to a much more complex video. We will load a video of a basketball game and try to stabilize the camera. This will be extremely helpful if we want to track basketball players on the court. Let’s see the final result.

Final result

As you can see, the result is not as good as the previous one, because the camera motion in this video is much larger and more complex than in our synthetic example. In addition, the current method is designed to process videos of a fixed duration and may not be suitable for real-time applications. To adapt it for real-time video output, significant modifications would be necessary to ensure efficient and timely processing of the data. These modifications may include parallel processing or the use of specialized hardware, such as graphics processing units (GPUs), to improve performance.

Summary

So, that is all. In this post, we explored a very interesting topic. We talked about a video processing technique used to eliminate shakiness from a video. After covering the basic theory behind this method, we provided a step-by-step explanation of how to apply it in Python. Finally, we concluded that this method works well against low-frequency motion, but it does not perform as well when there is heavy camera motion in the video.

References:

[1] Video Stabilization Using Point Feature Matching in OpenCV