FR 001 Face Recognition with Celebrities
Highlights: The world today produces an enormous amount of visual data, and how we utilize and interpret that data matters. This project traces an evolution from traditional algorithms to deep learning techniques, asking: how accurately can we predict the correct name of the celebrity in a given image or video frame?
Tutorial Overview:
This post covers the following topics:
- What is a face recognition system?
- Applications of face recognition.
- Implementing a recognition system.
What is a face recognition system?
We can think of it as a computer program that analyzes raw pixel values from an image, transforms them into an array of features, and uses those features to recognize a desired person. This implies that the system first has to see a picture of the person, process it, and store the face's features for future comparisons.
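The "see, process, and store" step above can be sketched as a tiny enrollment routine. In this hedged sketch, `encode_face` is a hypothetical stand-in for a real feature extractor; here it simply flattens and L2-normalizes the pixel values, whereas a real system would use a learned model.

```python
import numpy as np

def encode_face(image):
    # Placeholder feature extractor: flatten the pixels and L2-normalize.
    # A real system would run the image through a trained network instead.
    vec = image.astype(float).ravel()
    return vec / np.linalg.norm(vec)

database = {}

def enroll(name, image):
    # Store the person's encoding for future comparisons.
    database[name] = encode_face(image)

enroll("example_person", np.ones((4, 4)))  # a dummy 4x4 "image"
print(sorted(database))
```

Once encodings are stored this way, recognizing a new face reduces to comparing its encoding against the stored ones.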
Applications of face recognition
For decades, scientists and researchers around the globe have been working to develop powerful face recognition algorithms that can accurately and efficiently recognize human faces. Many of these systems have huge commercial applications. Some examples are:
- Facial Biometrics – As you might have noticed, Apple has already launched this in the form of Face ID.
- Targeting – Stores can use this to identify their most valuable customers and give them highly personalized service, helping to ensure they remain valuable customers.
Traditional face recognition algorithms work by extracting features, or landmarks, from an image of the face. To extract facial features, an algorithm must analyze the shape, size, and relative positions of the eyes, nose, mouth, cheekbones, jaw, and so on. These extracted features are then used to search for other images with matching or similar features.
Over the years, these methods have proven inaccurate and inefficient because their features are hard-coded. They have not given good results and do not scale, since many people share similar facial features.
In recent years, industry has switched to deep learning. Convolutional neural networks (CNNs) have been employed to improve the accuracy of face recognition algorithms. The most fascinating thing is that the features are no longer hard-coded; rather, once the network is built, it is left to learn the features by itself.
These networks take an image as input and extract a large number of important features from it, including the width and height of the face, nose, lips, and eyes, along with skin tone and texture, which are matched against the ones stored in the database.
CNNs have proven to be far better than traditional algorithms. The biggest remaining challenge is scaling: these algorithms require heavy, expensive computational resources to produce tangible results.
Implementing a recognition system
In this post we are going to see how we can recognize a face in a given image. This seems a bit complex at first, but it is quite straightforward. Let's walk through the entire process.
The first thing we need is an image, and the first step is to import the necessary libraries, chiefly Keras, a high-level deep learning library built on top of Theano and TensorFlow, which are themselves popular deep learning frameworks.
After all the imports, we define our triplet loss. The triplet loss tries to bring the encoding of the anchor (the current image) close to that of the positive (an image that is, in theory, similar to the anchor), while pushing it as far as possible from that of the negative (an image that is different from the anchor).
In simple terms, we look at three images at a time, denoted the anchor, the positive, and the negative. We want the encodings of the positive and the anchor to be similar, because they show the same person (the two pictures below of Emilia Clarke), and the encodings of the anchor and the negative to be quite different (the encodings of Emilia Clarke and Maisie Williams), because they show two different people.
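The idea above can be written out as a minimal NumPy sketch of the standard triplet loss, max(0, ‖a − p‖² − ‖a − n‖² + α). The margin `alpha` and the toy 3-D encodings are illustrative choices, not values from the actual model.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """max(0, ||a - p||^2 - ||a - n||^2 + alpha)"""
    pos_dist = np.sum((anchor - positive) ** 2)  # anchor-positive squared distance
    neg_dist = np.sum((anchor - negative) ** 2)  # anchor-negative squared distance
    return max(0.0, pos_dist - neg_dist + alpha)

anchor   = np.array([0.0, 1.0, 0.0])   # e.g. an image of Emilia Clarke
positive = np.array([0.1, 0.9, 0.0])   # another image of the same person
negative = np.array([1.0, 0.0, 0.5])   # an image of a different person

print(triplet_loss(anchor, positive, negative))  # 0.0: this triplet is already well separated
```

The margin α prevents the trivial solution where every image is mapped to the same encoding: the negative must be at least α farther from the anchor than the positive is.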
In the cell below, our loss is relatively high, which is not good when building our application. Our main goal is to reduce the loss further using the Adam optimizer.
In the next couple of cells, we load a pretrained model and find a local minimum of the triplet loss using stochastic gradient descent. We do this by first instantiating the model, printing out the total number of parameters, and then training.
Pretrained weights from the FaceNet model will be used. We normally do this when we don't have enough faces to train our model on, or because of the computational cost.
To build an image classification model on faces, collecting and preprocessing the data is a crucial step. The images used for my model were all collected from the IMDB pages of different actors and actresses from the series "Game of Thrones". Since there are many people's images to choose from, I started small by classifying four of my favorite celebrities, using only images containing one unique person.
Each celebrity has their own unique encoding, and these encodings are a good representation of the images. Now we test the performance of our model by calculating the distance between two different images of the same person.
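This check can be sketched as follows. The 128-dimensional FaceNet output is mimicked here with random vectors, and the second "image of the same person" is simulated by slightly perturbing the first; all values are illustrative, not real encodings.

```python
import numpy as np

rng = np.random.default_rng(0)

emilia_1 = rng.normal(size=128)                          # encoding of one image
emilia_2 = emilia_1 + rng.normal(scale=0.05, size=128)   # same person, slight variation
maisie   = rng.normal(size=128)                          # encoding of a different person

def distance(a, b):
    # Euclidean distance between two encodings
    return np.linalg.norm(a - b)

same = distance(emilia_1, emilia_2)
diff = distance(emilia_1, maisie)
print(same < diff)  # True: same-person encodings are closer together
```

If the model is well trained, the same-person distance stays below a chosen threshold while different-person distances land well above it.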
We can then verify that the individual in the test input is either Sophie Turner, Emilia Clarke, Maisie Williams, or Peter Dinklage.
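The verification step can be sketched as a nearest-neighbor lookup over one stored encoding per celebrity. The 4-D vectors below are invented stand-ins for real FaceNet encodings, and `who_is_it` is a hypothetical helper name.

```python
import numpy as np

database = {
    "Sophie Turner":   np.array([0.9, 0.1, 0.0, 0.2]),
    "Emilia Clarke":   np.array([0.1, 0.9, 0.3, 0.0]),
    "Maisie Williams": np.array([0.0, 0.2, 0.9, 0.1]),
    "Peter Dinklage":  np.array([0.3, 0.0, 0.1, 0.9]),
}

def who_is_it(encoding, database):
    # Return the identity whose stored encoding is closest to the input.
    return min(database, key=lambda name: np.linalg.norm(encoding - database[name]))

test_encoding = np.array([0.12, 0.85, 0.28, 0.05])  # lies nearest the Emilia Clarke entry
print(who_is_it(test_encoding, database))
```

In practice one would also reject the match when even the nearest distance exceeds a threshold, so unknown faces are not forced onto one of the four names.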
Summary
To summarize, I believe that with continued research and development in deep neural networks (DNNs), this growth is likely to continue. You can find the code for this post here.