#001 OpenCV projects – Face Tracking with OpenCV using Haar cascade detectors
Highlights: This is our introductory post, where we will explain the difference between object detection and object tracking. We will learn the easiest way to track objects; in this case, it will be faces. We must admit that we are cheating a bit, because we are not using true tracking algorithms. Since this post is meant as a simple introduction, we are going to use the Viola-Jones face detection algorithm. In one of our previous posts, we have already explained in detail what Haar cascade classifiers are and how to use them to detect faces, eyes, and smiles.
- What is object tracking?
- What are Haar Cascade classifiers?
- Tracking faces with Haar Cascade classifiers
1. What is object tracking?
First, let us clarify the concept of object tracking. Object tracking is the process of locating an object in consecutive video frames. When an object is detected, we also get a bounding box for it. This is done for every object in the image or video that we are trying to detect and track. Then, a tracking algorithm assigns an ID to each object identified in the first frame of the video. Finally, the algorithm tries to carry this ID across consecutive frames and identify the new position of the same object.
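To make the ID-assignment idea concrete, here is a hypothetical minimal sketch that carries IDs across frames by matching each new bounding box to the nearest previous centroid. This is only an illustration of the concept, not the method used in this post (here we rely on detection only), and the function names and the distance threshold are our own assumptions.

```python
# Minimal sketch of carrying IDs across frames by nearest-centroid matching.
# Illustrative only; real trackers use appearance and motion models as well.

def centroid(box):
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def assign_ids(prev_tracks, new_boxes, max_dist=50):
    """prev_tracks: dict {id: box}; new_boxes: list of (x, y, w, h) boxes."""
    tracks = {}
    next_id = max(prev_tracks, default=-1) + 1
    for box in new_boxes:
        cx, cy = centroid(box)
        # Find the closest previous track that is not yet matched
        best_id, best_d = None, max_dist
        for tid, pbox in prev_tracks.items():
            px, py = centroid(pbox)
            d = ((cx - px) ** 2 + (cy - py) ** 2) ** 0.5
            if d < best_d and tid not in tracks:
                best_id, best_d = tid, d
        if best_id is None:  # no close match: treat as a new object
            best_id, next_id = next_id, next_id + 1
        tracks[best_id] = box
    return tracks

# Frame 1: two detections get IDs 0 and 1; frame 2: both move slightly
frame1 = assign_ids({}, [(10, 10, 20, 20), (100, 100, 20, 20)])
frame2 = assign_ids(frame1, [(14, 12, 20, 20), (103, 99, 20, 20)])
print(frame2)  # IDs 0 and 1 are carried over to the new positions
```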
Difference between object detection and object tracking
Now you’re probably wondering what the difference between object detection and object tracking is. Let’s explore some of them.
- Object tracking algorithms are much faster than object detection algorithms.
When we use object detection, it detects objects in every frame, so it is basically a series of repeated detections. On the other hand, object tracking only requires object detection in the first frame. After that detection, the algorithm has enough information about the object: it learns the appearance of the object, its location in the previous frame, and the direction and velocity of its motion. The algorithm can then use all this information to predict the location of the object in the next frame. That is why tracking is much faster than repeated detection.
- Object tracking algorithms are more robust to occlusion than detection algorithms.
In a case where the detected object goes behind an obstacle, or disappears beyond the boundaries of the video frame, object detection algorithms will not be able to pick it up again as the same object. A tracking algorithm, on the other hand, can handle these kinds of occlusions if they only last for a short period of time.
- Object tracking algorithms are able to count the number of unique objects in a video, since each object keeps its ID across frames.
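The velocity-based prediction mentioned in the first point can be sketched with a hypothetical constant-velocity model. This is a deliberately simple illustration; real trackers use more robust estimators such as Kalman filters.

```python
# Sketch: predicting the next position with a constant-velocity model.
# Given the object's centre in the last two frames, extrapolate one frame ahead.

def predict_next(prev_pos, curr_pos):
    vx = curr_pos[0] - prev_pos[0]  # horizontal velocity (pixels/frame)
    vy = curr_pos[1] - prev_pos[1]  # vertical velocity (pixels/frame)
    return (curr_pos[0] + vx, curr_pos[1] + vy)

# An object moved from (100, 50) to (110, 55): predict (120, 60) next
print(predict_next((100, 50), (110, 55)))  # (120, 60)
```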
Why do we use Face Tracking?
Face tracking can be used in several different real-time situations. For example, it can be used by retailers to count the number of visitors and to track the movement of visitors through their stores. This data can then be used to optimize store layout, staffing, and restocking of shelves.
Face tracking can also be used in advertising to determine how many people are viewing certain products. After that, we can also determine their gender, age group, and even attention time. Insights gathered through face tracking technology help businesses find the best locations for displays, custom-tailor content in real time, and understand audience engagement levels.
2. What are Haar Cascade classifiers?
Two decades ago, face detection was a tricky job. It was an active research area at the time, and you had to be a real expert to be able to perform object detection on images. Then, in 2001, Paul Viola and Michael Jones published the research article entitled “Rapid Object Detection using a Boosted Cascade of Simple Features”. This work revolutionized image processing, and their method is still one of the most commonly used for face detection.
Thanks to Viola and Jones, face detection became much faster and more accurate. They came up with the Haar Cascade (Viola-Jones) algorithm – a machine learning object detection algorithm that can identify objects in real time. It relies on many simple features, called Haar features, which are used to determine whether an object (a face, an eye) is present in the image/video or not. This method is still very popular and commonly used in practice. In this post, we are going to use the Viola-Jones algorithm for face detection.
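To make the idea of Haar features concrete, here is a small illustrative sketch (using NumPy only, not OpenCV's internal implementation) of a two-rectangle "edge" feature: the sum of pixels in the left half of a window minus the sum in the right half, computed in constant time per window via an integral image. The function names are our own.

```python
import numpy as np

# A Haar-like "edge" feature: sum of pixels in the left half minus the right
# half of a window, computed in O(1) per window via an integral image.

def integral_image(img):
    # ii[i, j] = sum of img[:i, :j]; padded with a zero row and column
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of img[y:y+h, x:x+w] from just four integral-image lookups
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

def edge_feature(ii, x, y, w, h):
    # Two-rectangle feature: left half minus right half
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# Toy image: bright left half, dark right half -> strong edge response
img = np.zeros((8, 8), dtype=np.int64)
img[:, :4] = 255
ii = integral_image(img)
print(edge_feature(ii, 0, 0, 8, 8))  # 255 * 4 * 8 = 8160
```

The Viola-Jones cascade evaluates thousands of such features at many positions and scales, which is why the integral-image trick is essential for real-time performance.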
3. Tracking faces with Haar Cascade classifiers
It is important to note that in this post we are going to use an object detection algorithm for the purpose of tracking. In our future posts, we are also going to cover object tracking algorithms, and then we will see why they are more suitable for this application.
First let’s import the necessary libraries.
```python
import cv2
from google.colab.patches import cv2_imshow
```
Next, we will load our video and create a video capture object with the function `cv2.VideoCapture()`. If you need more information on how to read a video in Python with OpenCV, check out the following post.
```python
# Load a video
# If you are working in Google Colab, you need to download the video into your notebook
vid_location = "vid.mp4"
cap = cv2.VideoCapture(vid_location)
```
Now, we need to load our Haar cascade files. The easiest way to do that is with the function `cv2.CascadeClassifier()`. As its argument, we pass a file path built from `cv2.data.haarcascades`, which returns the location of the Haar cascade files. Note that OpenCV already ships with these files by default; all you need to do is choose which Haar cascade you need. Here, we choose the frontal face, left eye, right eye, and smile cascades.
```python
# Load our classifiers
face_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
l_eye_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_lefteye_2splits.xml")
r_eye_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_righteye_2splits.xml")
smile_classifier = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_smile.xml")
```
The next step is to create a `cv2.VideoWriter()` object. Using the method `cap.get()`, we can read the width and height of the input video and then pass them as parameters for the output video. That way, the output video will have the same width and height as the original one. We also need to define the fourcc codec and the frame rate, which in our case is 20 frames per second.
```python
# Grab the width and height of the input video
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))    # property index 3
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))  # property index 4
# Create the video writer for MP4 format
out = cv2.VideoWriter('outpy.mp4', cv2.VideoWriter_fourcc(*'MP4V'), 20.0, (frame_width, frame_height))
```
The following code creates a `while` loop that reads frames from our video continuously. First, we convert the video frame to grayscale. To extract the coordinates of the rectangles that we are going to draw around the detected faces, we create a variable `faces`. In this object, we store our detected faces (face rectangles). The function `detectMultiScale()` returns a list of rectangles, each with four elements: \(x \) and \(y \) are the coordinates of the top-left corner, and \(w \) and \(h \) are the width and height of the rectangle. This method takes several arguments:
- gray image – the input image on which we will detect faces.
- scaleFactor – parameter specifying how much the image size is reduced at each image scale.
- minNeighbors – parameter specifying how many neighbors each candidate rectangle should have to retain it. This parameter affects the quality of the detected faces.
```python
# Iterate while the video lasts
while cap.isOpened():
    # Read a frame
    ret, frame = cap.read()
    # Check whether the frame was read successfully
    if ret == True:
        # Convert the frame to grayscale
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Find faces in that frame
        faces = face_classifier.detectMultiScale(gray, 1.1, 6)
```
Now, we create a `for` loop in order to iterate through every detected face.
```python
        for (x, y, w, h) in faces:
```
With the function `cv2.rectangle()`, we draw a rectangle around each detected face.
```python
            # Draw a rectangle around the face
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 0, 255), 2)
```
To detect the eyes, we first need to create two regions of interest located inside the face rectangle. We need the first region in the gray frame, where we are going to detect the eyes (it is much easier and faster to detect objects in a grayscale image than in a color one), and the second region in the color frame, where we are going to draw the circles.
```python
            # Select the region of the face, both grayscale and color
            roi_gray = gray[y:y+h, x:x+w]
            roi_color = frame[y:y+h, x:x+w]
```
In the following lines of code, we detect the left and right eyes and draw circles around them. For a more detailed explanation of how to draw basic shapes and write text on images, click on this link.
```python
            # Detect the left eye and draw a circle around it
            l_eye = l_eye_classifier.detectMultiScale(roi_gray, 1.1, 10)
            for (ex, ey, ew, eh) in l_eye:
                cv2.circle(roi_color, ((ex + ex+ew)//2, (ey + ey+eh)//2), 10, (255, 255, 0), 1)
            # Detect the right eye and draw a circle around it as well
            r_eye = r_eye_classifier.detectMultiScale(roi_gray, 1.1, 10)
            for (ex, ey, ew, eh) in r_eye:
                cv2.circle(roi_color, ((ex + ex+ew)//2, (ey + ey+eh)//2), 10, (255, 0, 255), 1)
```
Now, we can apply the same method for smile detection. Then, we save each frame to our output video and break the loop when the video is finished.
```python
            # Detect the smile and draw a rectangle around it
            smile = smile_classifier.detectMultiScale(roi_gray, 1.8, 20)
            for (sx, sy, sw, sh) in smile:
                cv2.rectangle(roi_color, (sx, sy), (sx+sw, sy+sh), (255, 0, 0), 2)
        # Save the frame to the new video
        out.write(frame)
        print("Out!")
    else:
        break
```
Finally, we need to release all opened files so that the output video is properly saved.
```python
# Release all opened files and close any windows
cap.release()
out.release()
cv2.destroyAllWindows()
```
In this post, we learned how to track faces using Haar cascade classifiers. In the next post, we will examine an interesting and very popular technique: we will learn how to apply filters and edge detectors to create a cartoon effect on images.