Face Recognition System Using Siamese Neural Network

Mahesh Chatpatil
8 min read · May 9, 2021


I am writing this blog so that it will be useful for all AI enthusiasts. I have tried to make it informative and concise, so that most concepts become clear in very little reading time, with simple language and easy explanations. Happy reading!


Table of Contents :-

1. Motivation
2. Introduction
3. Problem Statement
4. Steps Involved in Face Recognition
5. Architecture
6. Results
7. Conclusion

Motivation :-

In this article, I explain how we can solve computer vision problems, which we can also call Convolutional Neural Network problems, using a Siamese Neural Network (SNN), and how face recognition works better with an SNN. I have explained the problem statement below.

Deep Convolutional Neural Networks have become the state-of-the-art method for image classification tasks. However, one of their biggest limitations is that they require a lot of labelled data. In many applications, collecting this much data is not feasible. One-shot learning aims to solve this problem, and it is implemented using a Siamese neural network.

Introduction :-

We will learn what face recognition is and how it can be implemented step by step. We will go briefly over the theory of face recognition and then jump to the coding section. At the end of this article, we will have a face recognition program for recognizing faces.

Problem statement :-

Assume that we want to build a face recognition system for a small organization with only 10 employees (small numbers keep things simple). Using a traditional classification approach, we might come up with a system that looks as below:

Problems :

a) To train such a system, we first require a lot of different images of each of the 10 persons in the organization which might not be feasible. (Imagine if you are doing this for an organization with thousands of employees).

b) What if a new person joins or leaves the organization? You need to take the pain of collecting data again and re-training the entire model. This is practically not possible, especially for large organizations where recruitment and attrition happen almost every week.

Solution :

Instead of directly classifying an input (test) image as one of the 10 people in the organization, this network takes an extra reference image of the person as input and produces a similarity score denoting the chance that the two input images belong to the same person. Typically the similarity score is squashed between 0 and 1 using a sigmoid function, wherein 0 denotes no similarity and 1 denotes full similarity. Any number between 0 and 1 is interpreted accordingly.

Notice that this network is not learning to classify an image directly to any of the output classes. Rather, it is learning a similarity function, which takes two images as input and expresses how similar they are.
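To make the idea concrete, here is a minimal NumPy sketch of such a similarity head. The 128-dimensional embeddings and the random, untrained weights are purely illustrative; a real Siamese network would learn the weights during training:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def similarity_score(emb_a, emb_b, weights, bias):
    """Similarity of two face embeddings, squashed into (0, 1).

    The element-wise absolute difference |f(A) - f(B)| is weighted
    and passed through a sigmoid, as in a Siamese verification head.
    """
    diff = np.abs(emb_a - emb_b)
    return sigmoid(np.dot(weights, diff) + bias)

# Toy example with random (untrained) weights, for illustration only.
rng = np.random.default_rng(0)
w = rng.normal(size=128)
a = rng.normal(size=128)
score_same = similarity_score(a, a, w, 0.0)   # identical inputs -> diff is zero
score_diff = similarity_score(a, -a, w, 0.0)  # very different inputs
```

With identical inputs the difference vector is zero, so an untrained head with zero bias outputs exactly sigmoid(0) = 0.5; training is what pushes matching pairs towards 1 and non-matching pairs towards 0.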

a) In a short while we will see that to train this network, you do not require too many instances of a class; only a few are enough to build a good model.

b) But the biggest advantage is that, in the case of face recognition, when a new employee joins the organization, the network only requires a single image of the new employee's face, which is stored in the database. Using this as the reference image, the network calculates the similarity for any new instance presented to it. That is why we say the network predicts the score in one shot.

Three Major steps involved in Face Recognition :-

1. Face Detection :-

Face detection is usually the first step in many face-related technologies, such as face recognition or verification. To recognise a face, it is first important that we detect/locate the face in an image or video. There are various face detection algorithms that can detect a human face in an image. We extract the human face and then move on to the next step. The Viola-Jones algorithm is one of the most popular face detection algorithms.

2. Feature extraction using face embedding :-

The next step is to extract features from a face using a face embedding model. A face embedding is a vector that represents the features extracted from the face, and we can use these vectors to recognise faces. Note that face embeddings of the same face will be really close in the vector space, whereas the face embeddings of two different faces will be really far apart.
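A minimal sketch of this idea, using tiny made-up vectors in place of real model outputs:

```python
import numpy as np

def l2_distance(emb_a, emb_b):
    """Euclidean distance between two face embeddings."""
    return float(np.linalg.norm(emb_a - emb_b))

# Toy embeddings; real ones would come from a model like FaceNet.
anna_photo_1 = np.array([0.10, 0.80, 0.05, 0.30])
anna_photo_2 = np.array([0.12, 0.79, 0.06, 0.28])  # same face, similar vector
ben_photo    = np.array([0.90, 0.10, 0.70, 0.60])  # different face

d_same = l2_distance(anna_photo_1, anna_photo_2)
d_diff = l2_distance(anna_photo_1, ben_photo)
# For a well-trained embedding model we expect d_same << d_diff.
```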

3. Face Recognition :-

Face recognition is a method of identifying or verifying the identity of an individual using their face.

We have face embedding for each face in the system. Whenever we pass a new face to the system, it calculates its face embedding and compares it with the ones we already have. The face is recognised, if its face embedding closely matches any other face embedding in the database.
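This matching step can be sketched as follows. The database contents and the distance threshold here are purely illustrative; in practice the threshold is tuned on validation data:

```python
import numpy as np

def recognise(query_emb, database, threshold=0.7):
    """Return the name of the closest database entry, or 'unknown'.

    `database` maps names to stored face embeddings; the query is
    matched to the entry with the smallest Euclidean distance,
    provided that distance is under the threshold.
    """
    best_name, best_dist = "unknown", float("inf")
    for name, emb in database.items():
        dist = float(np.linalg.norm(query_emb - emb))
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist < threshold else "unknown"

db = {
    "mahesh":   np.array([0.1, 0.9, 0.2]),
    "abhishek": np.array([0.8, 0.1, 0.7]),
}
match = recognise(np.array([0.12, 0.88, 0.21]), db)  # very close to "mahesh"
stranger = recognise(np.array([5.0, 5.0, 5.0]), db)  # far from everyone
```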

Let's Code :-

The complete code for this facial recognition model using a siamese network can be found at this link:

1. Dataset :-

A custom dataset can be generated in two ways :-

a) By adding a folder of images of the person you want to recognise.

b) By running the dataset generator function in the project files. It will create 50 samples from the web camera.

2. Working :-

Firstly, we build our database of the faces we want to recognise. This will be a directory named images. To do this, different functions are defined based on the user's requirements. A pre-processing pipeline is applied before saving an image to the database. While recognising faces, a frame (which contains a face) is taken from the webcam and fed into our network. The network takes the camera frame and the database, and compares the similarities and differences between the frame and each database image. The output is a string: the name of the most similar image in the database. If the face is not found in the database, the output is "unknown".

3. File Details in the Repository :-

1. Main function :- the main file, which recognises faces.

2. add_to_database :- takes a frame from the webcam and saves it in the images directory (database).

3. fr_utils :- contains some important helper functions for the Inception network. Code is available at :-
https://github.com/iwantooxxoox/Keras-OpenFace/blob/master/utils.py

4. inception_network :- contains the Inception network blocks.

5. weights :- contains the weights of the pre-trained Inception network.

6. haarcascade_frontalface_default :- for detecting faces. Code is available at https://github.com/opencv/opencv/blob/master/data/haarcascades/haarcascade_frontalface_default.xml

7. shape_predictor_68_face_landmarks :- dlib's pre-trained model. It detects and predicts 68 landmark points on human faces. It is available at https://github.com/davisking/dlib-models

4. Architecture :-

FaceNet :- FaceNet is a combination of a Siamese network at the end of an Inception network.

Image(96×96×3) -> InceptionNetwork -> SiameseNetwork -> Output

FaceNet is a model that, when given a picture of a face, extracts high-quality features from it and predicts a 128-element vector representation of these features, called a face embedding. These face embeddings can then be used as the basis for training classifier systems on standard face recognition benchmark datasets.
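A small sketch of the embedding step: the FaceNet paper constrains embeddings to the unit hypersphere, so the raw 128-dimensional features are L2-normalised. The random input here just stands in for the network's output:

```python
import numpy as np

def to_face_embedding(raw_features):
    """L2-normalise a raw 128-d feature vector, as FaceNet does,
    so every embedding lies on the unit hypersphere."""
    v = np.asarray(raw_features, dtype=float)
    return v / np.linalg.norm(v)

# Stand-in for the network's raw output on some face image.
rng = np.random.default_rng(42)
emb = to_face_embedding(rng.normal(size=128))
length = float(np.linalg.norm(emb))  # 1.0 after normalisation
```

This normalisation is what makes Euclidean distances between embeddings directly comparable across images.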

Inception Network :-

To get the image encoding from the input image, a pre-trained Inception network is used. It is a state-of-the-art architecture for image processing. The computational cost of training GoogLeNet is very high, so I used a pre-trained model in my face recognition system.

An inception network is a deep neural network with an architectural design that consists of repeating components referred to as Inception modules.

GoogLeNet

For more information, I am attaching some articles here that will help you go deeper and gain a better understanding.

https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202

https://www.geeksforgeeks.org/understanding-googlenet-model-cnn-architecture/

Standard convolutional neural network :-

In the case of a CNN model, you have a series of convolutional and pooling layers followed by some dense layers and an output layer, usually with a softmax function. The convolutional layers are responsible for feature extraction from the image, whereas the softmax layer is responsible for producing a probability for every class. We then decide the class of the image from the neuron with the highest probability value.
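A minimal NumPy sketch of that final softmax step, with hypothetical raw scores for our 10 employees:

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    z = logits - np.max(logits)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical raw scores from the final dense layer, one per employee.
logits = np.array([1.2, 0.3, 3.1, 0.8, -0.5, 0.0, 1.0, 2.0, 0.1, -1.0])
probs = softmax(logits)
predicted_class = int(np.argmax(probs))  # employee index with highest probability
```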

Siamese Neural Network :-

1. A Siamese network takes two different inputs passed through two identical subnetworks with the same architecture, parameters, and weights.

2. The two subnetworks are mirror images of each other, just like Siamese twins. Hence, any change to one subnetwork's architecture, parameters, or weights is also applied to the other subnetwork.

3. The two subnetworks each output an encoding, which is used to calculate the difference between the two inputs.

4. The Siamese network's objective is to classify whether the two inputs are the same or different using the similarity score. The similarity score can be calculated using binary cross-entropy, a contrastive function, or triplet loss, which are techniques for the general distance metric learning approach.

5. A Siamese network is a one-shot classifier that uses discriminative features to generalise to unfamiliar categories from an unknown distribution.
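As a minimal sketch, the triplet loss mentioned above can be computed like this. The tiny 2-D embeddings and the margin value are purely illustrative (the FaceNet paper uses 0.2 with squared L2 distance):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: pull anchor and positive together, push anchor
    and negative apart by at least `margin` (squared L2 distances)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(d_pos - d_neg + margin, 0.0))

a = np.array([0.0, 1.0])
p = np.array([0.1, 0.9])   # same identity as the anchor
n = np.array([1.0, 0.0])   # different identity

loss_good = triplet_loss(a, p, n)  # embeddings already well separated -> zero
loss_bad  = triplet_loss(a, n, p)  # positive/negative swapped -> large loss
```

Minimising this loss over many (anchor, positive, negative) triplets is what teaches the embedding model to keep same-identity faces close and different-identity faces far apart.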

References :- https://arxiv.org/pdf/1503.03832.pdf

5. Results :-

I am attaching some of the results I got here. As I have already verified, the model shows very good results. From the attached image, we can see that it clearly identifies me and Abhishek. It also shows the distance between the current image's encoding and the encoding of the image from storage that is most similar to the current webcam image.

6. Conclusion :-

I hope this article helped you understand the complete approach. My main goal was to show how computer vision problems, also called CNN problems, can be solved using a Siamese network.

We have seen the implementation of a facial recognition model in Python, using a pre-trained FaceNet model and a similarity distance measure between images.

There are several possibilities for improving the model:

  1. Poor-quality cameras/images limit the model's effectiveness. We could combine our facial recognition model with image enhancement techniques to deblur images or recover pixel intensity.
  2. Small images make facial recognition more difficult. To help with this, we could upsample the input image to improve its resolution before passing it through the model.

Source Code: Please refer to my source code in the Jupyter Notebook on my GitHub repository here.

Let me know if you face any issues. As this is my first blog post ever, let me know whether it was helpful for your projects or whether I should change the way I explain things.
