#017 Face detection algorithms comparison

Highlights: Researchers and computer vision practitioners have developed many face detection algorithms. In this post, we will use 5 of the most popular ones, compare their detection accuracy, and measure their runtime.

Face detection is a technique by which computer algorithms locate people’s faces in images. Accordingly, its objective is to find the image regions that contain human faces, so that facial features can then be extracted from them. Due to the popularity of social networks and smart gadgets, the importance of face detection and recognition has become more evident.

Overview:

  1. OpenCV Haarcascade
  2. OpenCV DNN (Deep Neural Network)
  3. Detecting a face using Dlib
  4. MTCNN for face detection
  5. Detecting faces with Facenet
  6. Speed and accuracy comparison of face detection algorithms

1. OpenCV Haarcascade

It is a machine learning based approach in which a cascade function is trained on a large number of positive and negative images. The trained cascade can then be used to detect faces in any image. It is best known for detecting faces and facial parts, but it can be trained to detect many other kinds of objects.

When you install OpenCV on your local machine, you get the Haar cascade files as well. Normally, they are located in the site-packages/cv2/data folder of your Python installation (for example, “python3.8/site-packages/cv2/data/haarcascade_frontalface_default.xml”). Find this path on your machine, as we recommend using it explicitly in your Python script.
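If you would rather not hard-code the path, a small helper can look it up. This is a minimal sketch that assumes OpenCV was installed from the pip wheels (opencv-python), which expose the data folder as cv2.data.haarcascades. The helper name find_haar_cascade is our own, and it returns None when OpenCV or the file cannot be found.

```python
from pathlib import Path


def find_haar_cascade(name="haarcascade_frontalface_default.xml"):
    """Return the full path of a bundled Haar cascade file, or None if not found."""
    try:
        import cv2  # imported lazily so the helper fails softly without OpenCV
    except ImportError:
        return None

    # cv2.data.haarcascades is provided by the pip wheels; other builds may lack it
    data_module = getattr(cv2, "data", None)
    data_dir = getattr(data_module, "haarcascades", None)
    if data_dir is None:
        return None

    path = Path(data_dir) / name
    return str(path) if path.is_file() else None


if __name__ == "__main__":
    print(find_haar_cascade())
```

The returned string can be passed directly to cv2.CascadeClassifier in place of a hand-written path.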

So here is our basic Python implementation that puts the OpenCV Haar cascade to work:

import numpy as np
import cv2
 
#Load the haarcascade file
cascPath = "/home/cale/.local/lib/python3.8/site-packages/cv2/data/haarcascade_frontalface_default.xml"
faceCascade = cv2.CascadeClassifier(cascPath)
 
cap = cv2.VideoCapture("data.mp4")
 
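while True:
   ret, frame = cap.read()
   if not ret:
       # Stop when the video ends or a frame cannot be read
       break
   frame = cv2.resize(frame, (600, 400))

   # detectMultiScale2 returns the boxes and, for each box, the number of
   # neighboring detections that were merged into it
   boxes, counts = faceCascade.detectMultiScale2(frame, scaleFactor=1.1, minNeighbors=5, flags=cv2.CASCADE_SCALE_IMAGE)

   for (x, y, w, h), count in zip(boxes, np.ravel(counts)):
       conf = int(count)
       if conf > 5:
           # Rough score derived from the neighbor count, not a true probability
           text = f"{conf*10:.2f}%"
           cv2.putText(frame, text, (x, y-20),
                       cv2.FONT_HERSHEY_SIMPLEX, 1, (170, 170, 170), 1)

           cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 255, 255), 1)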
 
   cv2.imshow("Frame", frame)
   if cv2.waitKey(25) & 0xFF == ord('q'):
       break
 
cap.release()
cv2.destroyAllWindows()

2. OpenCV DNN (Deep Neural Network)

In addition to its Haar cascade based detection algorithm, OpenCV has released a dnn module, which stands for deep neural network. In previous posts we have explained deep learning in great depth, so feel free to check them out using the link below.

Convolutional neural networks

We have used this algorithm, based on deep learning, in our earlier post.
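For readers who have not seen that post, a minimal sketch of a DNN based detector might look like the following. It assumes the Caffe SSD face model that is commonly distributed with the OpenCV samples; the file names deploy.prototxt and res10_300x300_ssd_iter_140000.caffemodel, the mean values, and both helper names are assumptions, and not necessarily what was used for the results in this post.

```python
def parse_detections(detections, width, height, threshold=0.5):
    """Convert SSD output of shape (1, 1, N, 7) into pixel boxes with confidences."""
    results = []
    for det in detections[0][0]:
        conf = float(det[2])  # column 2 holds the confidence
        if conf < threshold:
            continue
        # Columns 3..6 hold the two box corners as fractions of the image size
        x1, y1 = int(det[3] * width), int(det[4] * height)
        x2, y2 = int(det[5] * width), int(det[6] * height)
        results.append((x1, y1, x2, y2, conf))
    return results


def detect_faces_dnn(frame, net, threshold=0.5):
    """Run an OpenCV DNN face detector on a BGR frame and return pixel boxes."""
    import cv2  # imported here so parse_detections can be used without OpenCV

    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    net.setInput(blob)
    return parse_detections(net.forward(), w, h, threshold)


# Assumed usage (the file names are placeholders for the model you downloaded):
# net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
#                                "res10_300x300_ssd_iter_140000.caffemodel")
# boxes = detect_faces_dnn(frame, net)
```

Each returned tuple holds the two corner points of a rectangle plus its confidence, so it can be drawn with cv2.rectangle in the same way as in the other examples.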

3. Detecting a face using Dlib

Dlib is a very useful and practical toolkit for making real world machine learning and data analysis applications. Among other tools, it includes a CNN based face detector that is generally capable of detecting faces from almost all angles.

You can download the weights to your local machine from http://dlib.net/files/mmod_human_face_detector.dat.bz2. The file is bz2-compressed, so extract it to get the .dat file that the code below loads. Furthermore, if you want to run this algorithm in Google Colab, you will need to make the weights available there, for example by uploading them to your Google Drive.
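One way to unpack the downloaded .bz2 archive with Python's standard library is sketched below. The helper name decompress_bz2 and the file names in the usage comment are our own choices.

```python
import bz2
import shutil


def decompress_bz2(src_path, dst_path):
    """Decompress a .bz2 file to dst_path, streaming so large files stay out of memory."""
    with bz2.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


# Example usage, assuming the archive was downloaded into the working directory:
# decompress_bz2("mmod_human_face_detector.dat.bz2", "mmod_human_face_detector.dat")
```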

Let’s see how we can use this algorithm in Python:

import dlib
import cv2
 
#We create the model here with the weights placed as parameters
face_detect = dlib.cnn_face_detection_model_v1("/content/mmod_human_face_detector.dat")

cap = cv2.VideoCapture("data.mp4")

while True:
   ret, frame = cap.read()
   if not ret:
       # Stop when the video ends or a frame cannot be read
       break
   frame = cv2.resize(frame, (600, 400))

   # dlib expects RGB images, while OpenCV reads frames as BGR
   rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
   # The second argument upsamples the image once to find smaller faces
   faces = face_detect(rgb, 1)
   for face in faces:
       # Each detection wraps a dlib rectangle; read its corner points
       x1 = face.rect.left()
       y1 = face.rect.top()
       x2 = face.rect.right()
       y2 = face.rect.bottom()
       cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 0, 0), 1)
 
   cv2.imshow("Frame", frame)
   if cv2.waitKey(25) & 0xFF == ord('q'):
       break
 
cap.release()
cv2.destroyAllWindows()

4. MTCNN for face detection

MTCNN, or Multi-Task Cascaded Convolutional Neural Network, is one of the most popular and most accurate face detection tools today. It is based on a deep learning architecture that consists of 3 neural networks (P-Net, R-Net, and O-Net) connected in a cascade.

So, let’s see how we can use this algorithm in Python to detect faces.

#You can install mtcnn using PIP by typing "pip install mtcnn"
from mtcnn import MTCNN
import cv2
 
detector = MTCNN()
#Load a video, if we were using google colab we would
#need to upload the video to Google Colab
cap = cv2.VideoCapture("data.mp4")
 
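while True:
   ret, frame = cap.read()
   if not ret:
       # Stop when the video ends or a frame cannot be read
       break
   frame = cv2.resize(frame, (600, 400))
   # MTCNN expects RGB images, while OpenCV reads frames as BGR
   boxes = detector.detect_faces(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

   # Draw every detected face, not only the first one
   for face in boxes:
       x, y, w, h = face['box']
       conf = face['confidence']

       if conf > 0.5:
           cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 255, 255), 1)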
 
   cv2.imshow("Frame", frame)
   if cv2.waitKey(25) & 0xFF == ord('q'):
       break
 
cap.release()
cv2.destroyAllWindows()

5. Detecting faces with Facenet

Facenet is a face recognition system that can be described as a unified embedding for face recognition and clustering. Given a picture of a face, it extracts high-quality features from the face into a 128-element vector. This vector, generally known as a face embedding, is then used to compare and recognize faces. For detecting the faces themselves, the facenet-pytorch package used below ships its own MTCNN detector.

This model is a deep convolutional neural network that is trained with a triplet loss function. The loss encourages vectors of the same identity to become more similar, whereas vectors of different identities are expected to become less similar.
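To make the idea concrete, here is a minimal sketch of a triplet loss on plain Python lists. It uses the squared Euclidean distance plus a margin, as in the FaceNet paper; the margin value of 0.2 and the toy 2-element vectors are illustrative assumptions (real embeddings have 128 elements), and the function names are our own.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Loss is zero once the negative is farther from the anchor than the
    positive by at least the margin; otherwise it grows with the violation."""
    d_pos = squared_distance(anchor, positive)
    d_neg = squared_distance(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


anchor = [0.0, 0.0]
same_person = [0.0, 0.5]    # close to the anchor
other_person = [1.0, 0.0]   # far from the anchor

# A well separated triplet gives no loss
print(triplet_loss(anchor, same_person, other_person))
# Swapping positive and negative violates the margin and is penalized
print(triplet_loss(anchor, other_person, same_person))
```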

The model is trained to produce embeddings directly, rather than extracting them from an intermediate layer of a model. This was an important and insightful innovation of this work.

Without further delay, let’s see how we can use this algorithm to detect faces in Python.

#You can install facenet using PIP by typing "pip install facenet-pytorch"
#Import the required modules
from facenet_pytorch import MTCNN
import torch
import cv2

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
 
#Create the model
mtcnn = MTCNN(keep_all=True, device=device)
 
#Load the video and go from frame to frame
cap = cv2.VideoCapture("data.mp4")
while True:
   ret, frame = cap.read()
   if not ret:
       # Stop when the video ends or a frame cannot be read
       break
   frame = cv2.resize(frame, (600, 400))

   #Here we use the MTCNN detector from facenet-pytorch, which expects
   #RGB images, while OpenCV reads frames as BGR
   boxes, conf = mtcnn.detect(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

   #If no face was found in the frame, boxes is None and nothing is drawn
   if boxes is not None:
       #Each box holds two corner points: (x1, y1) and (x2, y2)
       for (x1, y1, x2, y2), prob in zip(boxes, conf):
           text = f"{prob*100:.2f}%"
           x1, y1, x2, y2 = int(x1), int(y1), int(x2), int(y2)

           cv2.putText(frame, text, (x1, y1-20),
                       cv2.FONT_HERSHEY_SIMPLEX, 1, (170, 170, 170), 1)
           cv2.rectangle(frame, (x1, y1), (x2, y2), (255, 255, 255), 1)

   #Show the result
   #If we were using Google Colab we would use their function cv2_imshow()
   cv2.imshow("Frame", frame)
   if cv2.waitKey(25) & 0xFF == ord('q'):
       break
 
cap.release()
cv2.destroyAllWindows()

6. Speed and accuracy comparison of face detection algorithms

We tested all 5 algorithms on the same video. For each algorithm we collected all detections and compared them with those of the other algorithms. It’s important to note that this is not a very precise comparison, but rather a quick analysis to get a rule of thumb.

Each algorithm outputs two points per face, which we used to draw a rectangle around the face. We then examined these points and checked whether the rectangles from different algorithms overlap.
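A minimal sketch of such an overlap check is shown below. It assumes each rectangle is given by its two corner points as (x1, y1, x2, y2), and it uses intersection over union (IoU) with a threshold of 0.5 to decide whether two rectangles mark the same face. Both the helper names and the threshold are our assumptions, not the exact criterion used for the results in this post.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) rectangles."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the shared region (zero if the boxes are disjoint)
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def boxes_overlap(box_a, box_b, threshold=0.5):
    """Treat two rectangles as the same face if their IoU reaches the threshold."""
    return iou(box_a, box_b) >= threshold


print(boxes_overlap((0, 0, 10, 10), (1, 1, 10, 10)))    # nearly identical boxes
print(boxes_overlap((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes
```

A stricter threshold counts fewer pairs as overlapping, so the absolute numbers of such a comparison depend on this choice.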

The following table shows our results: how many detections of one algorithm overlap with the detections of each other algorithm. The main diagonal holds the total number of detections of each algorithm.

             Facenet  Mtcnn  Dlib  OpenCV_DNN  OpenCV_Haar
Facenet         1812   1228   742         739          834
Mtcnn           1228   1800   858         766         1059
Dlib             742    858  1792         611          537
OpenCV_DNN       739    766   611        1770          584
OpenCV_Haar      834   1059   537         584         1605

From this table, we can see that all algorithms made roughly the same number of detections (between 1605 and 1812). Facenet and Mtcnn have the most overlaps, 1228, whereas Dlib and OpenCV_Haar have the fewest, only 537.
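A table of this kind could be produced along the following lines. This is a sketch under stated assumptions: the detections of each algorithm are stored as one list of (x1, y1, x2, y2) rectangles per frame, two rectangles count as overlapping when they share any area, and every overlapping pair in a frame is counted once. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def rectangles_intersect(a, b):
    """True if two (x1, y1, x2, y2) rectangles share any area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def overlap_matrix(detections):
    """detections maps an algorithm name to a list of per-frame rectangle lists."""
    names = list(detections)
    counts = {a: {b: 0 for b in names} for a in names}

    # Main diagonal: total number of detections of each algorithm
    for name in names:
        counts[name][name] = sum(len(frame) for frame in detections[name])

    # Off-diagonal: overlapping pairs found in the same frame
    for a, b in combinations(names, 2):
        n = sum(
            1
            for frame_a, frame_b in zip(detections[a], detections[b])
            for box_a in frame_a
            for box_b in frame_b
            if rectangles_intersect(box_a, box_b)
        )
        counts[a][b] = counts[b][a] = n
    return counts


# Toy input with two frames; the second algorithm misses the face in frame 2
toy = {
    "Facenet": [[(10, 10, 50, 50)], [(20, 20, 60, 60)]],
    "Mtcnn": [[(12, 12, 52, 52)], []],
}
print(overlap_matrix(toy))
```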

Let’s look at an example of how two algorithms perform on the same video. We use the Facenet and Mtcnn algorithms and display their detections in a video. Most of the time the detections overlap, but at short intervals false detections sometimes happen.

Another characteristic that is important to us is the speed of an algorithm. Depending on the application, execution time can be crucial when choosing between face detection algorithms. In the following graph, we compare the total time that each algorithm needed to process the video, which was 1 minute and 20 seconds long.

The Dlib algorithm needed the shortest time to process the video. On the other hand, MTCNN took the longest.
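If you want to reproduce such a measurement, a simple timing harness is enough. The sketch below assumes each detector is wrapped in a function that takes a frame and returns a list of detections; time_detector and dummy_detect are hypothetical names. It times only the detection calls, so decoding and displaying the video are not included.

```python
import time


def time_detector(detect, frames):
    """Return (total_seconds, total_detections) for running detect on every frame."""
    start = time.perf_counter()
    total = 0
    for frame in frames:
        total += len(detect(frame))
    elapsed = time.perf_counter() - start
    return elapsed, total


def dummy_detect(frame):
    """Stand-in detector that always reports a single rectangle."""
    return [(0, 0, 10, 10)]


elapsed, n = time_detector(dummy_detect, range(100))
print(f"{n} detections in {elapsed:.4f} s")
```

For a fair comparison, every algorithm should be timed on the same pre-loaded frames and on the same hardware.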

Summary

In this post, we analyzed various face detection algorithms. We have seen how many detections each algorithm made, as well as their execution times. To conclude, if we want a fast face detection algorithm, we should use Dlib. On the other hand, if we want an algorithm that detects a large number of faces, our choice can be Facenet or Mtcnn.