#011 How to detect eye blinking in videos using dlib and OpenCV in Python
We already learned what facial landmarks are and how to detect them. Now we are ready to expand that knowledge and put it into practice to solve similar problems. In this post we are going to explain how to detect and count eye blinks in videos. To develop an eye blink detector, we need to detect the facial landmarks of the eyes, and then we need to calculate the aspect ratio between these landmarks. So, let us get started.
Tutorial Overview:
- How to calculate the Eye Aspect Ratio (EAR)?
- How to develop an eye blink detector using dlib and OpenCV?
1. How to calculate the Eye Aspect Ratio (EAR)?
In this post we will use facial landmark detection to detect 68 specific points on the face. This post is inspired by the following posts, which are further modified and improved: [1] and [2]. By knowing the indexes of these points, we can use the same method to select a specific area of the face (e.g. eyes, mouth, eyebrows, nose, ears). To create an eye blink detector, the eyes are the area of the face that we are interested in. We can divide the process of developing an eye blink detector into the following steps:
- Detecting the face in the image
- Detecting facial landmarks of interest (the eyes)
- Calculating eye width and height
- Calculating eye aspect ratio (EAR) – relation between the width and the height of the eye
- Displaying the eye blink counter in the output video
In this section we will learn what the eye aspect ratio is and how to calculate it with the help of basic geometry.
What is EAR?
Real-Time Eye Blink Detection using Facial Landmarks is a research paper published in 2016 by Tereza Soukupova and Jan Cech from the Faculty of Electrical Engineering, Czech Technical University in Prague. The authors developed a real-time algorithm to detect eye blinks in a video sequence. A key part of this algorithm is the eye aspect ratio (EAR), which can be used to determine whether a person blinks in a given video frame. For a better understanding of this concept, let us look at the following image.
In this image we can see an eye represented by a set of 6 labeled facial points with specific coordinates. The horizontal line is the distance between points \(p_{1} \) and \(p_{4} \) (the width of the eye), and the vertical line is the distance between the midpoint of points \(p_{2} \) and \(p_{3} \) and the midpoint of points \(p_{6} \) and \(p_{5} \) (the height of the eye). The length of the horizontal line stays approximately constant, while the length of the vertical line changes as the eye opens and closes. We can detect blinking by calculating the length of these two lines and then finding the ratio between them. This ratio will be approximately constant while the eye is open, and it will quickly fall toward zero when a blink occurs.
In the image on the left we can see that the aspect ratio will be larger and relatively constant over time. On the other hand, in the second image we can see that the aspect ratio will be almost equal to zero, which indicates that the person is blinking at that moment. We can calculate the aspect ratio with the following equation, written out from the width and height definitions above:
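\[ EAR = \frac{\left\lVert \frac{p_{2} + p_{3}}{2} - \frac{p_{6} + p_{5}}{2} \right\rVert}{\left\lVert p_{1} - p_{4} \right\rVert} \]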
Now, let us see how we can implement this in Python.
2. How to develop an eye blink detector using dlib and OpenCV?
Let us first import the necessary packages.
# Necessary imports
import cv2
import numpy as np
import matplotlib.pyplot as plt
import dlib
from google.colab.patches import cv2_imshow  # Colab replacement for cv2.imshow
To detect the faces and eyes in our video we need to call a frontal face detector dlib.get_frontal_face_detector() and a facial landmark predictor dlib.shape_predictor() from the dlib library. We already explained what a facial landmark predictor is in the previous post entitled: How to detect facial landmarks using DLIB and Open CV.
# Initializing the face detector and facial landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
With the help of the following commands, you can download and unzip the facial landmark predictor file directly from your Python script. Note that this file must be present before the call to dlib.shape_predictor() above, so run these commands first.
# Downloading and unzipping facial landmark predictor
!wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
!bunzip2 "shape_predictor_68_face_landmarks.dat.bz2"
Next, we will load the video, define the fourcc codec and create a VideoWriter object. Additionally, we will define a font that we will use later when we display the number of blinks in the video.
# Creating a VideoCapture and VideoWriter object
cap = cv2.VideoCapture("Blinking.mp4")
fourcc = cv2.VideoWriter_fourcc('M', 'P', '4', 'V')
out = cv2.VideoWriter('output.mp4', fourcc, 29, (1080, 1920))
font = cv2.FONT_HERSHEY_SIMPLEX
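The frame rate and frame size passed to VideoWriter here are hardcoded to match our input video. If you work with a different video, a safer option is to read these properties from the VideoCapture object; a small optional sketch:

# Optional: reading the frame rate and frame size from the input video
# instead of hardcoding them; the size passed to VideoWriter must match
# the size of the frames we write, otherwise the output may not play back
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter('output.mp4', fourcc, fps, (width, height))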
Now it is time to detect the face and facial landmarks. First we need a while loop to load the frames from the video. Inside the loop we will create the objects faces and landmarks, in which we will store the detected faces and facial landmarks. We also convert each color frame into a grayscale frame, because we use the grayscale frame to detect the facial landmarks.
# Creating a while loop
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Converting a color frame into a grayscale frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Creating an object in which we will store detected faces
    faces = detector(gray)
    for face in faces:
        x, y = face.left(), face.top()
        x1, y1 = face.right(), face.bottom()
        # Creating an object in which we will store the detected facial landmarks
        landmarks = predictor(gray, face)
So, we detected the face and all 68 facial landmarks on that face. But as our goal is to detect blinking, we just need the points around the eye regions.
In order to detect blinking we need to calculate the lengths of the horizontal and vertical lines. For the left eye, the horizontal line length is the distance between point 36 and point 39, and the vertical line length is the distance between the midpoint of points 37 and 38 and the midpoint of points 40 and 41. So let us calculate these lengths. To calculate the horizontal and vertical line lengths in both eyes, we are going to create helper functions for the middle point and the Euclidean distance.
# Defining the mid-point
def midpoint(p1, p2):
    return int((p1.x + p2.x) / 2), int((p1.y + p2.y) / 2)

# Defining the Euclidean distance
def euclidean_distance(leftx, lefty, rightx, righty):
    return np.sqrt((leftx - rightx) ** 2 + (lefty - righty) ** 2)
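These helpers are easy to verify in isolation. A minimal sanity check, using a namedtuple as a stand-in for dlib's landmark points (anything with .x and .y attributes works):

# Quick sanity check of the helper functions, using a simple namedtuple
# as a stand-in for dlib's landmark points (which expose .x and .y attributes)
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])

print(midpoint(Point(0, 0), Point(4, 6)))  # (2, 3)
print(euclidean_distance(0, 0, 3, 4))      # 5.0 - the classic 3-4-5 triangle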
Now that we can calculate the lengths of the horizontal and vertical lines, we are ready to calculate the eye aspect ratio (EAR). We can do this as given below:
# Defining the eye aspect ratio
def get_EAR(eye_points, facial_landmarks):
    # Defining the left point of the eye
    left_point = [facial_landmarks.part(eye_points[0]).x, facial_landmarks.part(eye_points[0]).y]
    # Defining the right point of the eye
    right_point = [facial_landmarks.part(eye_points[3]).x, facial_landmarks.part(eye_points[3]).y]
    # Defining the top mid-point of the eye
    center_top = midpoint(facial_landmarks.part(eye_points[1]), facial_landmarks.part(eye_points[2]))
    # Defining the bottom mid-point of the eye
    center_bottom = midpoint(facial_landmarks.part(eye_points[5]), facial_landmarks.part(eye_points[4]))
    # Drawing the horizontal and vertical line on the current video frame
    cv2.line(frame, (left_point[0], left_point[1]), (right_point[0], right_point[1]), (255, 0, 0), 3)
    cv2.line(frame, (center_top[0], center_top[1]), (center_bottom[0], center_bottom[1]), (255, 0, 0), 3)
    # Calculating the length of the horizontal and vertical line
    hor_line_length = euclidean_distance(left_point[0], left_point[1], right_point[0], right_point[1])
    ver_line_length = euclidean_distance(center_top[0], center_top[1], center_bottom[0], center_bottom[1])
    # Calculating the eye aspect ratio
    EAR = ver_line_length / hor_line_length
    return EAR
Note that the indexes that we use in this function refer to specific points around the eyes: for the left eye, eye_points[0] refers to point 36, eye_points[1] to point 37, and so on. After we define all four points, we calculate the horizontal and vertical line lengths. Finally, when we divide these lengths, the function returns the eye aspect ratio (EAR).
With the function get_EAR() defined, we can calculate the blinking ratio in the input video. First, we calculate the blinking ratio of each eye separately; as an argument we just need to pass the index values for the left and the right eye. Finally, we calculate the blinking ratio of both eyes by summing up the left and right ratio and dividing the sum by 2 (assuming that every person blinks with both eyes at the same time).
# Creating an object in which we will store the detected facial landmarks
landmarks = predictor(gray, face)
# Calculating left eye aspect ratio
left_eye_ratio = get_EAR([36, 37, 38, 39, 40, 41], landmarks)
# Calculating right eye aspect ratio
right_eye_ratio = get_EAR([42, 43, 44, 45, 46, 47], landmarks)
# Calculating aspect ratio for both eyes
blinking_ratio = (left_eye_ratio + right_eye_ratio) / 2
Next, we want to display the moment when a blink occurs in the output video. In the top left corner we will display a blink counter, and in the top right corner we will display the EAR. Let us see how it will work.
As we have already explained, the aspect ratio will be approximately constant while the eye is open, and it will quickly fall toward zero when a blink occurs. We need to determine a threshold for the blinking ratio that is close to zero. We will assume that every blinking ratio below that threshold is a blink, and that every blinking ratio above the threshold is not. To achieve this, we need the following condition:
if blinking_ratio < 0.20:
    if previous_ratio > 0.20:
        blink_counter = blink_counter + 1
We used a nested if statement to obtain more accurate results. When the EAR falls below the threshold of 0.20, we assume that a blink has occurred. But what if the eyes stay closed for a few consecutive frames? If the ratio in the following frame is also below 0.20, we would detect two blinks instead of one. For this reason we added the inner check on whether the previous ratio is greater than 0.20. So, we detect a blink only when, out of two consecutive ratios, the first is greater than 0.20 and the second is smaller than 0.20. We can visualize this if we plot our ratio signal.
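To see why the inner check matters, here is a small standalone sketch that runs the same counting logic on a made-up EAR sequence (the values are invented for illustration):

# Counting blinks on a synthetic EAR sequence: three consecutive frames
# below the threshold still count as a single blink, because the counter
# only increments on the open-to-closed transition
ratios = [0.32, 0.31, 0.08, 0.07, 0.09, 0.30, 0.33]

blink_counter = 0
previous_ratio = 100  # start high so a blink on the very first frame still counts
for blinking_ratio in ratios:
    if blinking_ratio < 0.20:
        if previous_ratio > 0.20:
            blink_counter = blink_counter + 1
    previous_ratio = blinking_ratio

print(blink_counter)  # 1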
eye_blink_signal = []                     # created once, before the while loop
eye_blink_signal.append(blinking_ratio)   # called inside the loop, once per frame
plt.plot(eye_blink_signal)                # called after the loop, to plot the signal
Now let us put all of the above code together and look at the results at the end of this code.
# Necessary imports
import cv2
import numpy as np
import matplotlib.pyplot as plt
import dlib
from google.colab.patches import cv2_imshow  # Colab replacement for cv2.imshow

# Downloading and unzipping the facial landmark predictor
!wget http://dlib.net/files/shape_predictor_68_face_landmarks.dat.bz2
!bunzip2 "shape_predictor_68_face_landmarks.dat.bz2"

# Initializing the face detector and facial landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Creating a VideoCapture and VideoWriter object
cap = cv2.VideoCapture("Blinking.mp4")
fourcc = cv2.VideoWriter_fourcc('M', 'P', '4', 'V')
out = cv2.VideoWriter('output.mp4', fourcc, 29, (1080, 1920))
font = cv2.FONT_HERSHEY_SIMPLEX

# Defining the mid-point
def midpoint(p1, p2):
    return int((p1.x + p2.x) / 2), int((p1.y + p2.y) / 2)

# Defining the Euclidean distance
def euclidean_distance(leftx, lefty, rightx, righty):
    return np.sqrt((leftx - rightx) ** 2 + (lefty - righty) ** 2)

# Defining the eye aspect ratio
def get_EAR(eye_points, facial_landmarks):
    # Defining the left point of the eye
    left_point = [facial_landmarks.part(eye_points[0]).x, facial_landmarks.part(eye_points[0]).y]
    # Defining the right point of the eye
    right_point = [facial_landmarks.part(eye_points[3]).x, facial_landmarks.part(eye_points[3]).y]
    # Defining the top mid-point of the eye
    center_top = midpoint(facial_landmarks.part(eye_points[1]), facial_landmarks.part(eye_points[2]))
    # Defining the bottom mid-point of the eye
    center_bottom = midpoint(facial_landmarks.part(eye_points[5]), facial_landmarks.part(eye_points[4]))
    # Drawing the horizontal and vertical line on the current video frame
    cv2.line(frame, (left_point[0], left_point[1]), (right_point[0], right_point[1]), (255, 0, 0), 3)
    cv2.line(frame, (center_top[0], center_top[1]), (center_bottom[0], center_bottom[1]), (255, 0, 0), 3)
    # Calculating the length of the horizontal and vertical line
    hor_line_length = euclidean_distance(left_point[0], left_point[1], right_point[0], right_point[1])
    ver_line_length = euclidean_distance(center_top[0], center_top[1], center_bottom[0], center_bottom[1])
    # Calculating the eye aspect ratio
    EAR = ver_line_length / hor_line_length
    return EAR

# Creating a list eye_blink_signal
eye_blink_signal = []
# Creating a blink counter
blink_counter = 0
# Initializing previous_ratio high so the first drop below the threshold counts as a blink
previous_ratio = 100

# Creating a while loop
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Converting a color frame into a grayscale frame
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Creating an object in which we will store detected faces
    faces = detector(gray)
    for face in faces:
        x, y = face.left(), face.top()
        x1, y1 = face.right(), face.bottom()
        # Creating an object in which we will store the detected facial landmarks
        landmarks = predictor(gray, face)
        # Calculating the left eye aspect ratio
        left_eye_ratio = get_EAR([36, 37, 38, 39, 40, 41], landmarks)
        # Calculating the right eye aspect ratio
        right_eye_ratio = get_EAR([42, 43, 44, 45, 46, 47], landmarks)
        # Calculating the aspect ratio for both eyes
        blinking_ratio = (left_eye_ratio + right_eye_ratio) / 2
        # Rounding blinking_ratio to two decimal places
        blinking_ratio_rounded = np.round(blinking_ratio, 2)
        # Appending the blinking ratio to the list eye_blink_signal
        eye_blink_signal.append(blinking_ratio)
        if blinking_ratio < 0.20:
            if previous_ratio > 0.20:
                blink_counter = blink_counter + 1
        previous_ratio = blinking_ratio
        # Displaying the blink counter and blinking ratio in our output video
        cv2.putText(frame, str(blink_counter), (30, 50), font, 2, (0, 0, 255), 5)
        cv2.putText(frame, str(blinking_ratio_rounded), (900, 50), font, 2, (0, 0, 255), 5)
    out.write(frame)
cap.release()
out.release()
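Since the loop also collects every blinking ratio in eye_blink_signal, we can plot the whole signal once processing is done and visually confirm the dips that were counted as blinks; an optional addition after the loop:

# Optional: plotting the collected EAR signal after processing;
# every blink shows up as a sharp dip below the 0.20 threshold
plt.plot(eye_blink_signal)
plt.axhline(y=0.20, color='r', linestyle='--')  # the blink threshold
plt.xlabel("Frame")
plt.ylabel("Blinking ratio (EAR)")
plt.show()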
Output:
Summary
In this post we learned how we can detect and count eye blinks in videos using the dlib and OpenCV libraries. First, we detected the facial landmarks of the eyes, and then we calculated the aspect ratio between these landmarks. When the eyes are open, the aspect ratio is larger and relatively constant over time. On the other hand, when the eyes are closed, the aspect ratio is almost equal to zero, which indicates that the person is blinking.
References:
[1] Eye Blinking detection – Gaze controlled keyboard with Python and Opencv p.2 by Sergio Canu
[2] Eye blink detection with OpenCV, Python, and dlib by Adrian Rosebrock