Real-time Multi-Facial attribute detection using computer vision and deep learning with FastAI and OpenCV
A quick 4 part walkthrough on doing real-time multi-facial attribute detection using deep learning (ResNet50 with FastAI & PyTorch) and face detection and localization using Haar cascades (OpenCV).
Aayush Agrawal, Feb 17
Figure: The final output of the multi facial attribute detection project.
In this post, we are trying to achieve the above result.
It walks through the end-to-end process of how I went about building this.
The entire codebase for replicating the project is in my GitHub repository.
Part 1: Data Acquisition and Understanding
For any deep learning model to give reasonable accuracy, we need to rely on a large amount of labeled data.
Most of the repos on facial feature detection I found focus only on multi-class classification, such as emotion detection or smile detection.
I was looking for a dataset with multiple labels attached to each facial image so that I could achieve something similar to what the Google Vision API produces, as below.
Figure: Example of a facial detection output from the Google Vision API
For this purpose, I found a dataset on Kaggle called the CelebFaces Attributes (CelebA) Dataset, which contains:
- 202,599 face images of various celebrities
- 10,177 unique identities, but names of celebrities are not given
- 40 binary attribute annotations per image
- 5 landmark locations
It's a pretty decent-sized dataset for various exciting problems in computer vision. For my purpose, I was only interested in the facial images and the 40 binary attribute annotations of those images.
The 40 binary attributes(Yes/No) are listed here — 5_o_Clock_Shadow, Arched_Eyebrows, Attractive, Bags_Under_Eyes, Bald, Bangs, Big_Lips, Big_Nose, Black_Hair, Blond_Hair, Blurry, Brown_Hair, Bushy_Eyebrows, Chubby, Double_Chin, Eyeglasses, Goatee, Gray_Hair, Heavy_Makeup, High_Cheekbones, Male, Mouth_Slightly_Open, Mustache, Narrow_Eyes, No_Beard, Oval_Face, Pale_Skin, Pointy_Nose, Receding_Hairline, Rosy_Cheeks, Sideburns, Smiling, Straight_Hair, Wavy_Hair, Wearing_Earrings, Wearing_Hat, Wearing_Lipstick, Wearing_Necklace, Wearing_Necktie, Young.
Here is an example:
Figure: Example from the CelebA dataset
The above image has these features labeled: Arched_Eyebrows, Attractive, Big_Lips, Heavy_Makeup, Narrow_Eyes, No_Beard, Pointy_Nose, Wearing_Lipstick, Young.
Because the Male flag is False, we can infer that the label is Female.
Part 2: Data Preprocessing
The entire code for data pre-processing is in this notebook.
1) On Images
My key consideration while processing the CelebA dataset was how I was going to use the resulting model on a real video, webcam stream, or image.
CelebA data is tightly cropped around the face but in a video/webcam/image the face can be anywhere, and it has to be detected first.
There are many prebuilt tools to localize a face in an image, for example Face Recognition, which uses a deep learning network to detect a face.
I wanted to keep this step simple, so I used Haar cascades, a traditional computer vision approach to detecting objects.
A Haar cascade returns the bounding box coordinates of each face it detects in an image. Here is an example output of using a Haar cascade:
Figure: An example of Haar cascade output.
To learn more about Haar cascades, refer to this blog.
There are pre-built Haar cascade filters in OpenCV.
I am using one of them for frontal face detection.
Once the face detection method was decided, the next step was to apply the same method to the CelebA dataset to detect faces and crop only the facial area of each image (with some added margin). This step helps ensure that:
- We remove any images where a frontal face is not detected by the Haar cascade, for example cases where the person is facing sideways
- Our training images are in line with the actual usage of the model
Figure: Example of Haar cascade processing on the CelebA dataset
Notice that in the above case the picture on the left is transformed into the picture on the right (which looks more zoomed-in) after Haar cascade cropping.
This also filtered the dataset down from 202,599 to 175,640 images, as the removed images don't contain front-facing faces.
An example of a filtered image is shown below:
Figure: Example of a filtered image.
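The detect-and-crop step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebook: the helper names `expand_box` and `crop_frontal_face`, the 25% margin, and the `detectMultiScale` settings are all assumptions chosen for the example. The margin arithmetic is kept in a separate pure-Python function so it can be checked without OpenCV installed.

```python
def expand_box(x, y, w, h, img_w, img_h, margin=0.25):
    """Grow a face box by `margin` on every side, clamped to the image bounds."""
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(img_w, x + w + dx), min(img_h, y + h + dy)
    return x0, y0, x1, y1


def crop_frontal_face(image_path, margin=0.25):
    """Return the cropped face region, or None when no frontal face is found."""
    import cv2  # imported here so expand_box works without OpenCV installed

    # Pre-built frontal face cascade that ships with the opencv-python package.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    image = cv2.imread(image_path)
    if image is None:
        return None  # unreadable file
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # scaleFactor and minNeighbors are illustrative values, not tuned ones.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # such images would be dropped from the training set
    # Keep the largest detection when several boxes are returned.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    x0, y0, x1, y1 = expand_box(x, y, w, h, image.shape[1], image.shape[0], margin)
    return image[y0:y1, x0:x1]
```

Returning `None` for images without a detected frontal face mirrors the filtering described above, where such images are removed from the dataset.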
2) On Label file
Apart from pre-processing the images, we need to create a label file that can be used by the FastAI dataset loader.
Figure: Original label file
In the original label file, each of the 40 attributes has a value of 1 or -1 for every image, where 1 signifies that the feature is present and -1 that it is absent.
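As a rough illustration of turning the 1/-1 columns into a single column of space-separated tags, here is a minimal sketch using only the standard library. The column name `image_id`, the function name `convert_labels`, and the sample rows are assumptions made for the example; they are not taken from the actual label file or from the conversion function in the notebook.

```python
import csv
import io


def convert_labels(csv_text, id_column="image_id"):
    """Map each image to a space-separated string of its present attributes."""
    reader = csv.DictReader(io.StringIO(csv_text))
    converted = []
    for row in reader:
        image_id = row.pop(id_column)
        # Keep only the attribute names whose flag is 1 (present).
        tags = " ".join(name for name, value in row.items() if int(value) == 1)
        converted.append((image_id, tags))
    return converted


# Tiny made-up sample with 4 of the 40 attributes.
sample = (
    "image_id,Bald,Male,Smiling,Young\n"
    "000001.jpg,-1,-1,1,1\n"
    "000002.jpg,-1,1,-1,1\n"
)
for image_id, tags in convert_labels(sample):
    print(image_id, "->", tags)
```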
I wrote a simple function to convert this file so that we have only one label column with space-separated labels (figure below).
Figure: Modified label file
Part 3: Model Training
Once we have pre-processed our data, the next step is to build a model which can detect these 40 attributes given a facial image.
For this, we are going to use the FastAI v1 library, built on top of PyTorch 1.
The model training notebook can be found on my Github here.
I divided the data into training and validation sets based on the recommended partitioning: images numbered 1 to 182637 for training and 182638 onwards for validation.
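One way to realize this partition is to place each image under a folder named for its split, so that a simple path check can later tell the two sets apart. The sketch below assumes zero-padded numeric file names such as `000001.jpg` and hypothetical folder names `train` and `validation`; the helper `split_folder` is illustrative and not from the notebook.

```python
def split_folder(image_name, last_train_id=182637):
    """Pick the split folder for an image from the number in its file name."""
    image_number = int(image_name.split(".")[0])
    return "train" if image_number <= last_train_id else "validation"


def is_validation(path):
    """Path-based check: an image belongs to validation if its path says so."""
    return "validation" in path


for name in ["000001.jpg", "182637.jpg", "182638.jpg"]:
    relative_path = split_folder(name) + "/" + name
    print(relative_path, is_validation(relative_path))
```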
It's incredibly easy to train world-class models with a few lines of code in the FastAI library, so let's go through the code.
Boilerplate library import commands:

    import pandas as pd
    import numpy as np
    from fastai.vision import *
    import matplotlib.pyplot as plt

Dataset loading:

    path = Path('./data/celeba/faces/')
    ## Function to filter validation samples
    def validation_func(x):
        return 'validation' in x

    tfms = get_transforms(do_flip=False, flip_vert=False,
                          max_rotate=30, max_lighting=0.3)
    src = (ImageItemList.from_csv(path, 'labels.csv')
           .split_by_valid_func(validation_func)
           .label_from_df(cols='tags', label_delim=' '))

    data = (src.transform(tfms, size=128)
           .databunch(bs=256).normalize(imagenet_stats))

Line 1: Defines the path to the dataset folder.
Lines 2-4: Define how we are going to find the training and validation images.
Lines 6-7: Define the transformations we want to apply to our data, such as rotating the image randomly by a maximum of 30 degrees and adjusting lighting by a maximum of 0.3.
Line 8: Defines the images as an item list from the labels CSV.
Line 9: Splits the data into training and validation sets using the validation function from lines 2-4.
Line 10: Gets our labels from the tags column of labels.csv and specifies that it is a multi-label column where a space separates labels.
Line 12: Passes the transformations from lines 6-7 and resizes images to 3*128*128.
Line 13: Defines our batch size of 256 images and normalizes our data using ImageNet statistics.
Notice that we are using a smaller image size for the initial training of our model; later on we are going to increase the image size to 3*256*256.
This trick helps us train our model faster by allowing a bigger batch size, and lets us experiment more quickly to find which model configuration works.
Model definition
We are going to do transfer learning by using a pre-trained ResNet 50 model for this modeling exercise.

    arch = models.resnet50
    acc_02 = partial(accuracy_thresh, thresh=0.2)
    acc_03 = partial(accuracy_thresh, thresh=0.3)
    acc_04 = partial(accuracy_thresh, thresh=0.4)
    acc_05 = partial(accuracy_thresh, thresh=0.5)
    f_score = partial(fbeta, thresh=0.2)
    learn = create_cnn(data, arch, metrics=[acc_02, acc_03, acc_04, acc_05, f_score])

Line 1: Selects the ResNet 50 architecture (its pre-trained weights are downloaded when the learner is created in line 7).
Lines 2-6: In FastAI we can track as many accuracy measures on the validation data as we like; these metrics are just for monitoring and are not used in training the model.
We use partial functions to define accuracy at different thresholds and also track the F-score at threshold 0.2.
Line 7: Creates a CNN architecture using the pre-trained convolutional part of the ResNet 50 model and adding two new fully connected layers on top.
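To make the effect of the threshold concrete, here is a small plain-Python sketch of thresholded multi-label accuracy and an F-beta score. It is an illustration of the idea only, not FastAI's implementation: it works directly on probabilities, the function names are hypothetical, `beta=2` is an assumption, and the probabilities are made up.

```python
def accuracy_at_threshold(probs, targets, thresh=0.5):
    """Fraction of (image, attribute) pairs predicted correctly at `thresh`."""
    hits = total = 0
    for prob_row, target_row in zip(probs, targets):
        for p, t in zip(prob_row, target_row):
            hits += int((p > thresh) == bool(t))
            total += 1
    return hits / total


def fbeta_at_threshold(probs, targets, thresh=0.2, beta=2.0, eps=1e-9):
    """Mean per-image F-beta; beta > 1 weights recall more than precision."""
    scores = []
    for prob_row, target_row in zip(probs, targets):
        predicted = [p > thresh for p in prob_row]
        true_pos = sum(1 for pr, t in zip(predicted, target_row) if pr and t)
        precision = true_pos / (sum(predicted) + eps)
        recall = true_pos / (sum(target_row) + eps)
        scores.append((1 + beta ** 2) * precision * recall
                      / (beta ** 2 * precision + recall + eps))
    return sum(scores) / len(scores)


# Two images, three attributes each (made-up numbers).
probs = [[0.9, 0.25, 0.1], [0.3, 0.6, 0.05]]
targets = [[1, 0, 0], [0, 1, 0]]
for thresh in (0.2, 0.5):
    print(thresh, accuracy_at_threshold(probs, targets, thresh))
```

A lower threshold marks more attributes as present, which tends to raise recall at the cost of precision; tracking several thresholds side by side shows where that trade-off settles.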
The good thing about FastAI is that it saves a lot of training time by helping find a good learning rate for the problem at hand.

    learn.lr_find()
    learn.recorder.plot()

Figure: Learning rate finder from FastAI
Line 1: Searches for a good learning rate by trying multiple learning rates on a sample of data.
Line 2: Plots the loss at the various learning rates.
We need to choose a learning rate where the loss curve above has its steepest downward slope.
In this case, it's 1e-2.
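The "steepest downward slope" rule can be expressed as a small calculation. The sketch below is only an illustration with made-up loss values; in practice the choice is usually made by eye from the plot, and the helper `steepest_descent_lr` is hypothetical rather than part of FastAI.

```python
import math


def steepest_descent_lr(lrs, losses):
    """Return the upper end of the segment where loss falls fastest per decade of lr."""
    best_lr, best_slope = None, 0.0
    for i in range(1, len(lrs)):
        # Slope of the loss with respect to log10(learning rate).
        slope = (losses[i] - losses[i - 1]) / (
            math.log10(lrs[i]) - math.log10(lrs[i - 1]))
        if slope < best_slope:
            best_lr, best_slope = lrs[i], slope
    return best_lr


# Made-up curve: flat at first, a sharp drop, then divergence at high rates.
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [0.70, 0.69, 0.60, 0.35, 0.30, 0.90]
print(steepest_descent_lr(lrs, losses))
```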

    lr = 1e-2
    learn.fit_one_cycle(4, slice(lr))

Figure: Training snapshot
Now we have trained our last fully connected layers a bit.
Let us unfreeze all the layers and train the full model.
We are going to use the learning rate finder to determine the ideal learning rate again.

    learn.unfreeze()
    learn.lr_find()
    learn.recorder.plot()

Figure: Snapshot of learning rate finder in FastAI
Line 1: Unfreezes all the layers.
Lines 2-3: Help us find the ideal learning rate.
Now we will use a different learning rate for each layer group in the model, decaying the learning rate exponentially as we go back through the layers.

    learn.fit_one_cycle(5, slice(1e-5, lr/5))
    learn.save('ff_stage-2-rn50')

Line 1: Runs one-cycle training with the variable learning rate.
Line 2: Saves our model with the specified name.
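To give a feel for what a range like `slice(1e-5, lr/5)` means, the sketch below spreads learning rates geometrically between the two ends, so the earliest layers get the smallest rate and the last layers the largest. The helper `spread_learning_rates` is hypothetical, and the count of three layer groups is an assumption for illustration; the exact grouping depends on the library and the architecture.

```python
import math


def spread_learning_rates(start, stop, n_groups):
    """Geometrically spaced rates from `start` (earliest layers) to `stop` (last layers)."""
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]


lr = 1e-2
rates = spread_learning_rates(1e-5, lr / 5, n_groups=3)
for group, rate in enumerate(rates):
    print(f"layer group {group}: {rate:.2e}")
```

With these inputs the middle group lands at the geometric mean of the two ends, which is what "exponentially decaying" amounts to when there are three groups.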
Figure: Training snapshot
Now we can increase the input image size to 3*256*256 and use transfer learning on the above-trained model to adjust to the new input image size.

    data = (src.transform(tfms, size=256)
           .databunch(bs=64).normalize(imagenet_stats))

    acc_05 = partial(accuracy_thresh, thresh=0.5)
    f_score = partial(fbeta, thresh=0.5)
    learn = create_cnn(data, models.resnet50, pretrained=False, metrics=[acc_05, f_score])

    learn.load("ff_stage-2-rn50")

Lines 1-2: Create a new data loader which resizes images to 3*256*256 and reduces the batch size to 64.
Lines 4-6: Define which metrics we need to track and create a ResNet 50 model similar to the previous model.
Line 8: Loads the weights from our previously trained model into the newly created model.
Now we can train the model a bit more following similar steps as mentioned above.
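The drop in batch size from 256 to 64 is consistent with a simple rule of thumb: doubling the image side quadruples the pixel count, so roughly a quarter as many images fit in the same memory. The sketch below only illustrates that arithmetic; it assumes activation memory dominates and scales with pixel count, which is an approximation, and the helper name is hypothetical.

```python
def scaled_batch_size(base_batch, base_size, new_size):
    """Scale a batch size down in proportion to the growth in pixels per image."""
    pixel_ratio = (new_size * new_size) / (base_size * base_size)
    return max(1, int(base_batch / pixel_ratio))


# Going from 128x128 to 256x256 images is a 4x increase in pixels.
print(scaled_batch_size(256, 128, 256))
```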
The training notebook also provides code to visualize the activations of an intermediate layer, to help understand which part of the image drives the final result of the model.
Figure: Heatmap of the model's intermediate layer activations over the actual image.
As we can see from the image above, the model is most activated where the face is in the image, which is what we want as it's a facial feature detection model.
Part 4: Combining everything
Now that we have our model trained, the last part is to put it all together by writing a script which does facial attribute detection.
The code for this part is on my Github here.
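As a rough illustration of such a script, the sketch below reads webcam frames, finds faces with OpenCV's pre-built frontal face Haar cascade, passes each crop to a model, and draws the results. It is not the script from the repository: `predict_fn` is a hypothetical callable standing in for the trained model (it takes a cropped BGR face and returns one probability per class), and the threshold, detector settings, and drawing details are assumptions.

```python
def tags_above_threshold(probs, classes, thresh=0.5):
    """Keep the attribute names whose predicted probability exceeds `thresh`."""
    return [name for name, p in zip(classes, probs) if p > thresh]


def run_webcam(predict_fn, classes, thresh=0.5, camera_index=0):
    """Detect faces in webcam frames and overlay the predicted attributes."""
    import cv2  # imported here so the helper above works without OpenCV

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    capture = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # camera unavailable or stream ended
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            for (x, y, w, h) in faces:
                face = frame[y:y + h, x:x + w]
                tags = tags_above_threshold(predict_fn(face), classes, thresh)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                # List the detected attributes to the right of the box.
                for i, tag in enumerate(tags):
                    cv2.putText(frame, tag, (x + w + 5, y + 15 * (i + 1)),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 255, 0), 1)
            cv2.imshow("Facial attributes", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break  # press q to quit
    finally:
        capture.release()
        cv2.destroyAllWindows()
```

Calling `run_webcam(my_predict, class_names)` with a real prediction function would start the loop; saving the video stream is left out of this sketch.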
Figure: Detection script process flow
The script does the following tasks:
- Uses OpenCV to access the webcam, takes the input video, and converts it into a series of image frames.
- For each frame, runs the Haar cascade model from OpenCV to locate faces and crop them out of the frame.
- Passes those cropped faces into our trained model to find the relevant facial features.
- Displays the bounding box and all the detected features back on the frame while the script is running.
- Optionally saves the video stream.
Conclusion
In this post, we saw how to solve an end-to-end facial attribute detection problem by combining techniques ranging from traditional machine vision to deep learning.
I hope you enjoyed reading, and feel free to use my code on Github to try it out for your purposes.
Also, if there is any feedback on code or just the blog post, feel free to reach out on LinkedIn or email me at aayushmnit@gmail.
You can also follow me on Medium and Github for future blog posts and exploratory project code I might write.