
#001 OpenCV projects – Face Tracking with OpenCV using Haar cascade detectors


Highlights: In this introductory post we will explain the difference between object detection and object tracking, and we will learn the easiest way to track objects — in this case, faces. We will admit that we are cheating a bit, because we are not using a true tracking algorithm. To keep this introduction simple, we are going to use the Viola-Jones face detection algorithm. In one of our previous posts, we already explained in detail what Haar cascade classifiers are and how to use them to detect faces, eyes, and smiles.

Tutorial overview:

  1. What is object tracking?
  2. What are Haar Cascade classifiers?
  3. Tracking faces with Haar Cascade classifiers

1. What is object tracking?

First, let us clarify the concept of object tracking. Object tracking is the process of locating an object in consecutive video frames. When the object is detected, we also get a bounding box for it. This is done for every object in the image or video that we are trying to detect and track. Then, an object tracking algorithm assigns an ID to each object identified in the first frame of the video. Finally, the algorithm tries to carry this ID across consecutive frames and identify the new position of the same object.
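
The ID-carrying step described above can be sketched in a few lines of plain Python. This is a toy nearest-centroid matcher with a hypothetical `assign_ids` helper, not an OpenCV API — real trackers also model appearance and motion:

```python
import math

def assign_ids(tracked, detections, max_dist=50):
    """Carry IDs from tracked ({id: (x, y)}) over to new detection centroids.

    Each detection is matched to the nearest previously tracked centroid
    within max_dist pixels; unmatched detections receive fresh IDs.
    """
    next_id = max(tracked, default=-1) + 1
    updated = {}
    unclaimed = dict(tracked)
    for (cx, cy) in detections:
        # Find the closest object we were already tracking
        best = min(unclaimed,
                   key=lambda i: math.dist(unclaimed[i], (cx, cy)),
                   default=None)
        if best is not None and math.dist(unclaimed[best], (cx, cy)) <= max_dist:
            updated[best] = (cx, cy)     # same object, new position
            del unclaimed[best]
        else:
            updated[next_id] = (cx, cy)  # a new object entered the scene
            next_id += 1
    return updated

# Two objects detected in the first frame get IDs 0 and 1; in the next
# frame they have moved slightly, and the same IDs are carried over
objects = assign_ids({}, [(10, 10), (100, 100)])
objects = assign_ids(objects, [(12, 11), (102, 98)])
```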

Difference between object detection and object tracking

Now you’re probably wondering what the difference between object detection and object tracking is. Let’s explore some of the differences.

  • Object tracking algorithms are much faster than object detection algorithms

Object detection runs on every frame, so it is basically a series of repeated detections. Object tracking, on the other hand, only requires a detection in the first frame. After that detection, the algorithm has enough information about the object: it learns the object’s appearance, its location in the previous frame, and the direction and velocity of its motion. The algorithm can then use all this information to predict the object’s location in the next frame. That is why tracking is much faster than detection. [1] [2]

  • Object tracking algorithms are more robust to occlusion than detection algorithms.

When the detected object goes behind an obstacle, or disappears beyond the boundaries of the video frame, a detection algorithm will not be able to pick it up. A tracking algorithm, on the other hand, can handle these kinds of occlusions if they last only a short period of time.

  • Object detection algorithms are able to count the number of desired objects in a video.

Why do we use Face Tracking?

Face tracking can be used in several different real time situations. For example, it can be used by retailers to count the number of visitors and to track the movement of visitors through their stores. This data can then be used to optimize store layout, staffing, and restocking of shelves.

Face tracking can also be used in advertising to determine how many people are viewing certain products. Beyond that, we can also determine their gender, age group, and even attention time. Insights gathered through face tracking technology help businesses find the best locations for displays, tailor content in real time, and understand audience engagement levels.

Coca-cola smile [3]

2. What are Haar Cascade classifiers?

Two decades ago, face detection was a tricky job. It was an active research topic at the time, and you had to be a programming expert to be able to perform object detection on images. Then in 2001, Paul Viola and Michael Jones published the research article entitled “Rapid Object Detection using a Boosted Cascade of Simple Features”. This work revolutionized image processing, and it is still one of the most commonly used methods for face detection.

Thanks to Viola and Jones, the process of face detection became much faster and more accurate. They came up with the Haar cascade (the Viola-Jones algorithm) – a machine learning object detection algorithm that can identify objects in real time. It consists of many simple features, called Haar features, which are used to determine whether an object (a face, an eye) is present in the image/video or not. This method is still very popular and commonly used in practice. In this post, we are going to use the Viola-Jones algorithm for face detection.
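
As a rough illustration of what a Haar feature computes, here is a minimal NumPy sketch (hypothetical helper names, not OpenCV’s actual implementation) of a two-rectangle feature evaluated with an integral image — the trick that makes Viola-Jones fast:

```python
import numpy as np

def integral_image(img):
    # ii[y, x] holds the sum of all pixels above and to the left of
    # (y, x), inclusive -- computable with two cumulative sums
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    # Sum over any rectangle in 4 lookups; padding with a zero row and
    # column avoids special-casing rectangles touching the border
    p = np.pad(ii, ((1, 0), (1, 0)))
    return p[y + h, x + w] - p[y, x + w] - p[y + h, x] + p[y, x]

def haar_two_rect(img, x, y, w, h):
    """A two-rectangle Haar-like feature: left half minus right half."""
    ii = integral_image(img.astype(np.int64))
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# A patch that is bright on the left and dark on the right produces a
# large positive response, hinting at an edge-like structure
patch = np.zeros((4, 4))
patch[:, :2] = 1
response = haar_two_rect(patch, 0, 0, 4, 4)
```

The cascade evaluates thousands of such features, but each one costs only a handful of table lookups thanks to the integral image.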

3. Tracking faces with Haar Cascade classifiers

It is important to note that in this post we are going to use an object detection algorithm for the purpose of tracking. In our future posts, we are also going to cover object tracking algorithms, and then we will see why they are more suitable for this application.

First, let’s import the necessary libraries.

import cv2
from google.colab.patches import cv2_imshow

Next, we will load our video and create a video capture object with the function cv2.VideoCapture(). If you need more information on how to read a video in Python with OpenCV, check out the following post.

# Load a video
# If you are working in Google colab, you need to download the video into your notebook
vid_location = "vid.mp4"
cap = cv2.VideoCapture(vid_location)

Now, we need to load our Haar cascade files. The easiest way to do that is with the function cv2.CascadeClassifier(). Here we use cv2.data.haarcascades, which returns the location of the Haar cascade files bundled with OpenCV. Note that OpenCV already contains these files by default; all you need to do is choose which Haar cascade you need. Here, we choose the frontal face, left eye, right eye, and smile cascades.

# Load our classifiers
face_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml") 
l_eye_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_lefteye_2splits.xml")
r_eye_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_righteye_2splits.xml")
smile_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")

The next step is to create a cv2.VideoWriter() object. Using the method cap.get() we can take the width and height of the input video and then use them as parameters for the output video. That way, the output video will have the same width and height as the original one. We also need to define the fourcc codec and the frame rate, which in our case is 20 frames per second.

# Grab the width and height of the input video
frame_width = int(cap.get(3))
frame_height = int(cap.get(4))

# Create the video writer for MP4 format
out = cv2.VideoWriter('outpy.mp4', cv2.VideoWriter_fourcc(*'MP4V'), 20.0, (frame_width, frame_height))

The following code creates a while loop that reads frames from our video continuously. First, we need to convert the video frame to grayscale. To extract the coordinates of the rectangles that we are going to draw around the detected faces, we create a variable faces. In this object we store the detected faces (face rectangles). The function detectMultiScale() returns a list of rectangles, each consisting of four elements: \(x \) and \(y \) are the coordinates of the top left corner, and \(w \) and \(h \) are the width and height of the rectangle. This method takes several arguments:

  • image – The grayscale input image on which we detect faces.
  • scaleFactor – Parameter specifying how much the image size is reduced at each image scale.
  • minNeighbors – Parameter specifying how many neighbors each candidate rectangle should have to retain it. This parameter affects the quality of the detected faces.

# Iterate while the video lasts
while cap.isOpened():

  # Read frames
  ret, frame = cap.read()

  # Check if the video is still lasting
  if ret==True:

    # Convert frame to Grayscale
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Find faces in that frame
    faces = face_classifier.detectMultiScale(gray, 1.1, 6)

Now, we create a for loop in order to iterate over every detected face.

    for (x, y, w, h) in faces:

With the function cv2.rectangle() we will draw a rectangle around each detected face.

      # Draw a rectangle around the face
      cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 0, 255), 2)

To detect the eyes, we first need to create two regions of interest located inside the face rectangle. We need the first region in the gray frame, where we are going to detect the eyes (it is much easier and faster to detect in a grayscale image than in a color one), and the second region in the color frame, where we are going to draw the circles.

      # Select the region of the face, in both the grayscale and the color frame
      roi_gray = gray[y:y+h, x:x+w]
      roi_color = frame[y:y+h, x:x+w]
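
It helps to know that these NumPy slices are views rather than copies: anything drawn on roi_color is written straight into frame, which is why the circles end up in the saved video. A small self-contained sketch of that behavior:

```python
import numpy as np

# A dummy 200x300 "frame" and a detected box (x, y, w, h)
frame = np.zeros((200, 300, 3), dtype=np.uint8)
x, y, w, h = 50, 40, 80, 100

# Rows are indexed first (y), columns second (x)
roi = frame[y:y+h, x:x+w]

# Painting the view also changes the original frame
roi[:] = (255, 255, 255)
```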

In the following lines of code, we detect the left and right eye and draw circles around them. For a more detailed explanation of how to draw basic shapes and write text on images, click on this link.

      # Detect the left eye and draw a circle around it
      l_eye = l_eye_classifier.detectMultiScale(roi_gray, 1.1, 10)
      for (ex, ey, ew, eh) in l_eye:
        cv2.circle(roi_color, ((ex + ex+ew)//2,  (ey + ey+eh)//2), 10, (255, 255, 0), 1)
      
      # Detect the right eye and draw a circle around it as well
      r_eye = r_eye_classifier.detectMultiScale(roi_gray, 1.1, 10)
      for (ex, ey, ew, eh) in r_eye:
        cv2.circle(roi_color, ((ex + ex+ew)//2,  (ey + ey+eh)//2), 10, (255, 0, 255), 1)

Now, we can apply the identical method for smile detection. Then, we write each processed frame to the output video and break the loop when the video is finished.

      # Detect the smile and draw a rectangle around it
      smile = smile_classifier.detectMultiScale(roi_gray, 1.8, 20)
      for (sx, sy, sw, sh) in smile:
        cv2.rectangle(roi_color, (sx, sy), (sx+sw, sy+sh), (255, 0, 0), 2)
      
    # Write the processed frame to the output video
    out.write(frame)
  else:
    break

Finally, we need to release the video capture and writer objects so that the output file is properly saved.

# Close all opened files and save them
cap.release()
out.release()
cv2.destroyAllWindows()

Output:

Summary

In this post, we learned how to track faces using Haar cascade classifiers. In the next post, we will examine an interesting and very popular technique: we will learn how to apply filters and edge detectors to create a cartoon effect on images.

References:

[1] Object Tracking using OpenCV (C++/Python) by Satya Mallick

[2] OpenCV: Computer Vision Projects with Python by Joseph Howse, Prateek Joshi, Michael Beyeler

[3] The Not So Scary World of Face Detection in Digital OOH