In simple words, a facial recognition system is a technology that can identify or verify a person from a digital image or video source by comparing and analyzing patterns based on the person's facial contours.
Since the mid-1900s, scientists have been working on using computers to recognize human faces.
Face recognition has received substantial attention from researchers due to its wide range of applications in the real world.
PREREQUISITES
This post assumes familiarity with basic deep learning concepts such as activation units, padding, strides, forward propagation, back propagation, overfitting, dropout, flattening, Python syntax and data structures, and the Keras library.
WHY IS FACIAL RECOGNITION IMPORTANT?
Since the face is a unique way of identifying people, facial recognition has gained high attention and is growing rapidly across the world as a means of providing safe and reliable security.
It is gaining significant traction among corporations and government organisations because of its high level of security and reliability.
Facial recognition is now considered to have advantages over other biometric systems such as palm prints and fingerprints, since it doesn't require human interaction and can be performed without a person's knowledge. This makes it highly useful in security applications such as airport screening, criminal detection, face tracking and forensics.
HOW TO BUILD A FACIAL RECOGNITION MODEL?
Over the years many methods have been used to implement facial recognition models, but artificial intelligence has made our life much easier.
Using deep learning (a part of AI) and sufficient data, a facial recognition system can be built simply and with high accuracy.
We use a simple Convolutional Neural Network (CNN) model to build our facial recognition system.
HOW DOES A CNN WORK?
In deep learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery.
Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex.
Given an input image, a CNN model applies various filters to identify edges and parts of the image, and uses these to detect the object in the image.
Let's briefly walk through how a CNN works, step by step.
Data Augmentation – This is an effective technique to enlarge your data and helps us build a robust facial recognition system that can identify faces even in different orientations.
Augmentation converts a single image into multiple images by applying operations such as squeezing, stretching, flipping, zooming in and out, cropping and rotating.
This ensures the model can identify a face at different angles and orientations.
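A few of these operations can be sketched with plain NumPy on a toy image (the 4×4 array here is a hypothetical stand-in for a real face photo; a Keras-based pipeline is shown later):

```python
import numpy as np

# A toy 4x4 grayscale "image"; real augmentation would act on face photos.
image = np.arange(16).reshape(4, 4)

# A few simple augmentation operations, each yielding an extra training image.
augmented = [
    np.fliplr(image),    # horizontal flip (mirror)
    np.flipud(image),    # vertical flip
    np.rot90(image),     # 90-degree rotation
    image[1:3, 1:3],     # centre crop
]
print(len(augmented))  # one original image became 4 extra images
```

Each operation leaves the identity of the face unchanged while varying its orientation, which is exactly what makes the model robust.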
Images to Tensor – A color image is composed of three channels, namely Red, Green and Blue.
The picture given below gives a basic understanding of how an image is composed of these three channels.
Similarly, an image is split into its three channels, each of which can be viewed as a grayscale intensity map, and stacking them forms a tensor.
The pixels of the three channel images form the rows and columns of the tensor, and each pixel holds an intensity value ranging from 0 to 255, with 0 being no intensity (black) and 255 being full intensity.
This is how an image is converted into a tensor.
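A minimal NumPy sketch of this representation, using a hypothetical 2×2 image:

```python
import numpy as np

# A tiny 2x2 RGB "image": each pixel holds three intensities in [0, 255],
# 0 meaning no intensity for that channel and 255 meaning full intensity.
image = np.array([
    [[255, 0, 0], [0, 255, 0]],      # red pixel, green pixel
    [[0, 0, 255], [255, 255, 255]],  # blue pixel, white pixel
], dtype=np.uint8)

print(image.shape)            # (height, width, channels) -> (2, 2, 3)
red_channel = image[:, :, 0]  # one of the three single-channel planes
print(red_channel)
```

Slicing out one channel gives a 2-D intensity map; the three maps together are the tensor the CNN operates on.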
Once the image is converted to a tensor, the next step is to identify the edges.
Convolution: Edge Detection – Detecting edges can be considered the most important part of convolution.
Edges play a key role in differentiating between two objects.
To illustrate how convolution identifies edges, we can use the Sobel edge-detection technique.
In this technique, we use a mask/kernel of the same depth as the tensor and apply a convolution operation between the image tensor and the kernel.
This is how an image looks after applying the Sobel edge-detection technique.
We use multiple such kernels in each layer, along with padding, strides and ReLU activation units.
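The Sobel step can be sketched in NumPy; the `convolve2d` helper here is a hypothetical minimal implementation ('valid' mode, no padding, stride 1), not a production routine:

```python
import numpy as np

def convolve2d(image, kernel):
    """Minimal 'valid' 2-D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# Sobel kernels for horizontal and vertical gradients.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

# A toy grayscale image with a sharp vertical edge down the middle.
image = np.zeros((5, 6))
image[:, 3:] = 255.0

gx = convolve2d(image, sobel_x)
gy = convolve2d(image, sobel_y)
magnitude = np.sqrt(gx**2 + gy**2)
print(magnitude)  # large values along the edge, zeros elsewhere
```

The output is large exactly where the intensity jumps, which is how the edge becomes visible to the next layer. (In an actual CNN the kernel weights are learned rather than fixed like Sobel's.)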
Max-Pooling – Max-pooling is a layer that helps us detect the face or object in the given image.
It provides a degree of invariance to location, scale and rotation, which means it helps detect the face/object irrespective of where it appears in the image or how large it is.
Given the output from the previous layer, max-pooling is applied on top of it with a given kernel size and stride, picking the maximum value in each window.
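A minimal sketch of 2×2 max-pooling on a hypothetical feature map:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """2x2 max-pooling: keep the strongest activation in each window."""
    h, w = feature_map.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i*stride:i*stride+size,
                                 j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

fmap = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
], dtype=float)

pooled = max_pool(fmap)
print(pooled)  # [[4. 2.] [2. 7.]]
```

Only the strongest activation in each window survives, which halves the spatial size while keeping the most salient responses.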
We use a combination of convolution, activation and max-pooling layers, and as we propagate forward, each layer learns new features.
For example, layer 1 learns edges, layer 2 learns parts of the face and layer 3 learns whole faces.
Our final model will look something like the model shown in the image below (this is for reference, not the actual model).
APPLYING CNN TO BUILD AN ACTUAL FACE RECOGNITION MODEL
Now that we've understood CNNs in brief, let's apply them to real data to build a face recognition model.
I have a dataset consisting of 1,608 images divided into 11 sub-folders. Each sub-folder contains multiple images of a single person and is named after that person.
Initially, we import all the required libraries.
Now, we have to split the data in each folder into train and test sets in the ratio 80:20 and normalize the data.
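One way to sketch the 80:20 split (the folder layout and the `split_dataset` helper are assumptions for illustration, not the author's actual code; normalization is handled later via Keras rescaling):

```python
import os
import random
import shutil

def split_dataset(source_dir, train_dir, test_dir, train_ratio=0.8, seed=42):
    """Copy 80% of each person's images to train_dir/<person> and the
    remaining 20% to test_dir/<person>. Assumes the hypothetical layout
    source_dir/<person_name>/<image files>."""
    random.seed(seed)
    for person in os.listdir(source_dir):
        images = sorted(os.listdir(os.path.join(source_dir, person)))
        random.shuffle(images)
        cut = int(len(images) * train_ratio)
        for dest_root, subset in ((train_dir, images[:cut]),
                                  (test_dir, images[cut:])):
            dest = os.path.join(dest_root, person)
            os.makedirs(dest, exist_ok=True)
            for name in subset:
                shutil.copy(os.path.join(source_dir, person, name),
                            os.path.join(dest, name))
```

Keeping one sub-folder per person in both splits is what lets Keras infer the class labels from the directory names in the next step.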
In the next step, we’ve to perform Data Augmentation on train and test data using ImageDataGenerator available in Keras.
This generator will read pictures found in sub-folders of ‘data/train’, and indefinitely generate batches of augmented image data.
Also, we’ve to re-shape the images.
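A sketch of that Keras pipeline follows; the 64×64 target size, batch size, augmentation ranges and the tiny stand-in dataset it builds (so the sketch is runnable) are all assumptions, not the author's exact settings:

```python
import os
import numpy as np
from PIL import Image
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Build a tiny stand-in dataset (2 hypothetical people, 2 images each) so
# this sketch runs; in the real project 'data/train' already holds one
# sub-folder of photos per person.
for person in ("alice", "bob"):
    os.makedirs(f"data/train/{person}", exist_ok=True)
    for i in range(2):
        Image.fromarray(
            np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8)
        ).save(f"data/train/{person}/{i}.png")

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,     # normalize pixel values to [0, 1]
    rotation_range=20,     # augmentation: random rotations
    zoom_range=0.2,        # augmentation: random zoom in/out
    horizontal_flip=True,  # augmentation: random mirroring
)

train_generator = train_datagen.flow_from_directory(
    "data/train",
    target_size=(64, 64),      # re-shape every image to 64x64
    batch_size=4,
    class_mode="categorical",  # one label per person sub-folder
)

batch_x, batch_y = next(train_generator)
print(batch_x.shape, batch_y.shape)
```

A second `ImageDataGenerator` with only `rescale=1.0/255` would typically be used for the test data, so evaluation images are normalized but not distorted.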
After pre-processing the data, we can now fit a basic CNN model for facial recognition.
Here I've created a model with three layers and two drop-out layers with drop-out rates of 0.25 and 0.
Layer 1: 64 convolutional units with a kernel size of 3×3, ReLU activation and a max-pooling layer.
Layer 2: 32 convolutional units with a kernel size of 3×3, ReLU activation and a max-pooling layer.
Layer 3: 16 convolutional units with a kernel size of 3×3, ReLU activation and a max-pooling layer.
After the 3 layers, I have my drop-out, flatten and dense layers.
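The described architecture can be sketched in Keras as follows. The input size (64×64 RGB), the dense-layer width (128) and the second drop-out rate (0.5) are assumptions, as the text does not state them:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Input, Conv2D, MaxPooling2D,
                                     Dropout, Flatten, Dense)

num_classes = 11  # one class per person sub-folder

model = Sequential([
    Input(shape=(64, 64, 3)),          # assumed input size
    # Layer 1: 64 convolutional units, 3x3 kernel, ReLU, max-pooling
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    # Layer 2: 32 convolutional units
    Conv2D(32, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    # Layer 3: 16 convolutional units
    Conv2D(16, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Dropout(0.25),                     # first drop-out rate from the text
    Flatten(),
    Dense(128, activation="relu"),     # assumed dense-layer width
    Dropout(0.5),                      # assumed second drop-out rate
    Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The softmax output gives one probability per person, matching the 11 sub-folders, and categorical cross-entropy is the natural loss for that setup.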
I ran the model for 12 epochs, and by the end of the final epoch I got a validation accuracy of 98.30% with a minimal validation cross-entropy loss of 0.
Deep learning models will overfit easily.
To ensure the model isn’t overfitting we can plot a graph between train and test loss as shown below.
Since the train and test losses are almost identical by the end of 12 epochs, we can be confident that our model isn't overfitting.
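Such a train-vs-test loss plot can be produced with matplotlib; the history values here are made-up placeholders, since in practice they come from the `History` object returned by `model.fit`:

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, no display needed
import matplotlib.pyplot as plt

# Hypothetical 12-epoch record; in practice use model.fit(...).history
history = {
    "loss":     [2.1, 1.5, 1.1, 0.8, 0.6, 0.45,
                 0.35, 0.28, 0.22, 0.18, 0.15, 0.13],
    "val_loss": [2.0, 1.4, 1.0, 0.8, 0.65, 0.5,
                 0.4, 0.3, 0.25, 0.2, 0.17, 0.15],
}

epochs = range(1, 13)
plt.plot(epochs, history["loss"], label="train loss")
plt.plot(epochs, history["val_loss"], label="test loss")
plt.xlabel("epoch")
plt.ylabel("cross-entropy loss")
plt.legend()
plt.savefig("loss_curve.png")
```

If the two curves diverge, with the train loss falling while the test loss rises, that gap is the signature of overfitting.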
Since this is a basic model, we can improve its accuracy beyond 99.5% by playing around with hyper-parameters such as the number of layers, the drop-out rate, etc.
REAL-WORLD APPLICATIONS OF FACIAL RECOGNITION
In the present world, facial recognition is being used extensively in surveillance systems.
It is also being used in criminal detection and forensics.
The US Federal Bureau of Investigation uses face recognition to identify suspects from their driver's licenses.
AI equipped cameras have also been trialed in the UK to identify those smuggling contraband into prisons.
Facial recognition is also being used to make online payments more secure and reliable.
It is being used in mobile phones for unlocking.
This is a powerful way to protect personal data and ensure that, if a phone is stolen, sensitive data remains inaccessible to the perpetrator.