Well, consider a region D for which we would like to estimate the sum of the pixels.
We have defined 3 other regions : A, B and C.
The value of the integral image at point 1 is the sum of the pixels in rectangle A.
The value at point 2 is A + B.
The value at point 3 is A + C.
The value at point 4 is A + B + C + D.
Therefore, the sum of pixels in region D can simply be computed as: 4 + 1 − (2 + 3).
And over a single pass, we have computed the value inside a rectangle using only 4 array references.
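To make the 4 + 1 − (2 + 3) rule concrete, here is a minimal sketch in plain Python. The helper names `integral_image` and `rect_sum` are mine, not OpenCV functions, and the integral image is padded with a row and a column of zeros so that the four lookups never fall outside the array.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of all pixels above and to the left of (y, x)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]  # extra zero row and column
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of rows top..bottom-1 and columns left..right-1, i.e. 4 + 1 - (2 + 3)."""
    point_1 = ii[top][left]
    point_2 = ii[top][right]
    point_3 = ii[bottom][left]
    point_4 = ii[bottom][right]
    return point_4 + point_1 - (point_2 + point_3)


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)

# Region D is the bottom-right 2x2 block: 5 + 6 + 8 + 9
d = rect_sum(ii, 1, 1, 3, 3)
print(d)
```

Whatever the size of the rectangle, `rect_sum` always reads exactly 4 values, which is what makes rectangle features so cheap to evaluate.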
One should simply be aware that rectangles are quite simple features in practice, but sufficient for face detection.
Steerable filters tend to be more flexible when it comes to complex problems.
Learning the classification function with Adaboost

Given a set of labeled training images (positive or negative), Adaboost is used to:
- select a small set of features
- train the classifier

Since most features among the 160’000 are supposed to be quite irrelevant, the weak learning algorithm around which we build a boosting model is designed to select the single rectangle feature which best separates negative and positive examples.
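The selection step of the weak learner can be sketched as an exhaustive search for the best single-feature threshold (a decision stump). This is only an illustration under my own assumptions: `best_stump` is a hypothetical helper, the feature values are made up, and a real Viola-Jones trainer would run this over rectangle feature responses and reweight the examples after each boosting round.

```python
def best_stump(features, labels, weights):
    """features[i][j] is the value of feature j on example i, labels are +1 or -1.

    Returns (weighted_error, feature_index, threshold, polarity) of the best stump.
    """
    best = None
    for j in range(len(features[0])):
        for threshold in sorted({row[j] for row in features}):
            for polarity in (1, -1):
                error = 0.0
                for row, label, weight in zip(features, labels, weights):
                    predicted = 1 if polarity * row[j] >= polarity * threshold else -1
                    if predicted != label:
                        error += weight
                if best is None or error < best[0]:
                    best = (error, j, threshold, polarity)
    return best


# Toy data: 4 examples, 2 candidate features (values are invented)
features = [[0.2, 5.0], [0.9, 1.0], [0.4, 4.0], [0.8, 2.0]]
labels = [1, -1, 1, -1]
weights = [0.25] * 4

error, feature_index, threshold, polarity = best_stump(features, labels, weights)
print(error, feature_index, threshold, polarity)
```

Here the first feature alone separates the two classes, so it is the one that gets selected.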
Cascading Classifier

Although the process described above is quite efficient, a major issue remains. Most of an image is a non-face region. Giving equal importance to each region of the image makes no sense, since we should mainly focus on the regions that are most likely to contain a face.
Viola and Jones achieved an increased detection rate while reducing computation time using Cascading Classifiers.
The key idea is to reject sub-windows that do not contain faces while identifying regions that do.
Since the task is to properly identify the face, we want to minimize the false negative rate, i.e. the sub-windows that contain a face and have not been identified as such.
A series of classifiers are applied to every sub-window.
These classifiers are simple decision trees:
- if the first classifier is positive, we move on to the second
- if the second classifier is positive, we move on to the third
- …

Any negative result at some point leads to a rejection of the sub-window as potentially containing a face.
The initial classifier eliminates most negative examples at a low computational cost, and the following classifiers eliminate additional negative examples but require more computational effort.
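The early-rejection logic described above can be sketched in a few lines. The stages below are hypothetical stand-ins (a contrast check and a brightness check on a toy list of pixel values), not the boosted classifiers that OpenCV ships, but the control flow is the same: the cheapest test runs first, and a window is dropped at the first negative answer.

```python
def cascade_classify(window, stages):
    """stages is a list of (score_function, threshold), ordered from cheap to costly.

    Returns (accepted, number_of_stages_passed).
    """
    for stage_index, (score, threshold) in enumerate(stages):
        if score(window) < threshold:
            return False, stage_index  # rejected early, later stages never run
    return True, len(stages)


# Hypothetical stages, for illustration only
stages = [
    (lambda w: max(w) - min(w), 50),   # stage 1: is there enough contrast?
    (lambda w: sum(w) / len(w), 80),   # stage 2: is the window bright enough?
]

flat_window = [10, 12, 11, 13]
textured_window = [20, 200, 90, 150]

print(cascade_classify(flat_window, stages))
print(cascade_classify(textured_window, stages))
```

The flat window is discarded by the first stage alone, which is how most non-face regions are eliminated at a low computational cost.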
The classifiers are trained using Adaboost, adjusting the threshold to minimize the false negative rate. When training such a model, the variables are the following:
- the number of classifier stages
- the number of features in each stage
- the threshold of each stage

Luckily in OpenCV, this whole model is already pre-trained for face detection.
If you’d like to know more on Boosting techniques, I invite you to check my article on Adaboost.
Imports

The next step is simply to locate the pre-trained weights. We will be using default pre-trained models to detect the face, eyes and mouth. Depending on your version of Python, the files should be located somewhere over here:

    /usr/local/lib/python3.7/site-packages/cv2/data

Once identified, we’ll declare the cascade classifiers this way:

    cascPath = "/usr/local/lib/python3.[...]xml"
    eyePath = "/usr/local/lib/python3.[...]xml"
    smilePath = "/usr/local/lib/python3.[...]xml"

    faceCascade = cv2.CascadeClassifier(cascPath)
    eyeCascade = cv2.CascadeClassifier(eyePath)
    smileCascade = cv2.[...]

Detect face on an image

Before implementing the real time face detection algorithm, let’s try a simple version on an image. We can start by loading a test image:

    # Load the image
    gray = cv2.[...]
    [...]show()

[Figure: Test image]

Then, we detect the face and we add a rectangle around it:

    # Detect faces
    faces = faceCascade.[...]CASCADE_SCALE_IMAGE)

    # For each face
    for (x, y, w, h) in faces:
        # Draw rectangle around the face
        cv2.rectangle(gray, (x, y), (x+w, y+h), (255, 255, 255), 3)

Here is a list of the most common parameters of the detectMultiScale function:
- scaleFactor: Parameter specifying how much the image size is reduced at each image scale.
- minNeighbors: Parameter specifying how many neighbors each candidate rectangle should have to retain it.
- minSize: Minimum possible object size. Objects smaller than that are ignored.
- maxSize: Maximum possible object size. Objects larger than that are ignored.
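To get a feel for what scaleFactor costs, the sketch below lists the window sizes a multi-scale search would visit between minSize and the image size. This is not how OpenCV implements detectMultiScale (it rescales the image rather than the window); `window_sizes` is a hypothetical helper that only illustrates the trade-off: a scale factor closer to 1 means more scales, so a finer but slower search.

```python
def window_sizes(image_size, min_size, scale_factor):
    """List the (width, height) of the search windows, from smallest to largest."""
    img_w, img_h = image_size
    w, h = min_size
    sizes = []
    while w <= img_w and h <= img_h:
        sizes.append((round(w), round(h)))
        w *= scale_factor
        h *= scale_factor
    return sizes


coarse = window_sizes((640, 480), (30, 30), 1.5)
fine = window_sizes((640, 480), (30, 30), 1.1)

print(len(coarse), "scales with scaleFactor=1.5")
print(len(fine), "scales with scaleFactor=1.1")
```

On a 640×480 image, going from 1.5 to 1.1 multiplies the number of scales by roughly four.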
Finally, display the result:

    plt.[...]show()

Face detection works well on our test image. Let’s move on to real time now!

Real time face detection

Let’s move on to the Python implementation of the live facial detection.
The first step is to launch the camera, and capture the video.
Then, we’ll transform the image to a gray scale image.
This is used to reduce the dimension of the input image.
Indeed, instead of 3 values per pixel describing Red, Green and Blue, we apply a simple linear transformation that keeps a single intensity value per pixel. This is implemented by default in OpenCV.
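As an illustration of that linear transformation, here is a minimal sketch using the weights documented for OpenCV's RGB to gray conversion (0.299, 0.587 and 0.114). The helpers `bgr_to_gray` and `frame_to_gray` are hypothetical and work on nested lists; in practice OpenCV does this for you on the whole frame.

```python
def bgr_to_gray(pixel):
    """Weighted sum of the 3 channels; OpenCV stores pixels in Blue, Green, Red order."""
    blue, green, red = pixel
    return 0.299 * red + 0.587 * green + 0.114 * blue


def frame_to_gray(frame):
    """Turn a frame with 3 values per pixel into one with a single value per pixel."""
    return [[bgr_to_gray(pixel) for pixel in row] for row in frame]


# 2x2 toy frame: white, black, pure red, pure blue (in BGR order)
frame = [[(255, 255, 255), (0, 0, 0)],
         [(0, 0, 255), (255, 0, 0)]]
gray = frame_to_gray(frame)
print(gray)
```

Green carries the largest weight and blue the smallest, which roughly matches how bright each color looks to the human eye.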
    video_capture = cv2.VideoCapture(0)

    while True:
        # Capture frame-by-frame
        ret, frame = video_capture.read()
        gray = cv2.[...]COLOR_BGR2GRAY)

Now, we’ll use the faceCascade variable defined above, which contains a pre-trained algorithm, and apply it to the gray scale image.
        faces = faceCascade.detectMultiScale(
            gray,
            scaleFactor=1.1,
            minNeighbors=5,
            minSize=(30, 30),
            flags=cv2.CASCADE_SCALE_IMAGE
        )

For each face detected, we’ll draw a rectangle around the face:

        for (x, y, w, h) in faces:
            if w > 250:
                cv2.rectangle(frame, (x, y), (x+w, y+h), (255, 0, 0), 3)
                roi_gray = gray[y:y+h, x:x+w]
                roi_color = frame[y:y+h, x:x+w]

For each mouth detected, draw a rectangle around it:

                smile = smileCascade.detectMultiScale(
                    roi_gray,
                    scaleFactor=1.16,
                    minNeighbors=35,
                    minSize=(25, 25),
                    flags=cv2.CASCADE_SCALE_IMAGE
                )
                for (sx, sy, sw, sh) in smile:
                    cv2.rectangle(roi_color, (sx, sy), (sx+sw, sy+sh), (255, 0, 0), 2)
                    cv2.putText(frame, 'Smile', (x + sx, y + sy), 1, 1, (0, 255, 0), 1)

For each eye detected, draw a rectangle around it:

                eyes = eyeCascade.detectMultiScale(roi_gray)
                for (ex, ey, ew, eh) in eyes:
                    cv2.putText(frame, 'Eye', (x + ex, y + ey), 1, 1, (0, 255, 0), 1)

Then, count the total number of faces, and display the overall image:

        cv2.putText(frame, 'Number of Faces : ' + str(len(faces)), (40, 40), font, 1, (255, 0, 0), 2)
        # Display the resulting frame
        cv2.imshow('Video', frame)

And implement an exit option when we want to stop the camera by pressing q:

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

Finally, when everything is done, release the capture and destroy all windows.
Closing the windows can be troublesome on Mac, which might require killing Python from the Activity Monitor later on.
Wrapping it up

Results

I’ve made a quick YouTube illustration of the face detection algorithm.

Histogram of Oriented Gradients (HOG) in Dlib

The second most popular implementation for face detection is offered by Dlib and uses a concept called Histogram of Oriented Gradients (HOG).
This is an implementation of the original paper by Dalal and Triggs.
Theory

The idea behind HOG is to extract features into a vector, and feed it into a classification algorithm, such as a Support Vector Machine, that will assess whether a face (or any object you train it to recognize) is present in a region or not.
The features extracted are the distribution (histograms) of directions of gradients (oriented gradients) of the image.
Gradients are typically large around edges and corners and allow us to detect those regions.
In the original paper, the process was implemented for human body detection, and the detection chain followed the steps described below.

Preprocessing

First of all, the input images must be of the same size (crop and rescale images). The patches we’ll apply require an aspect ratio of 1:2, so the dimensions of the input images might be 64×128 or 100×200 for example.

Compute the gradient images

The first step is to compute the horizontal and vertical gradients of the image, by applying the following kernels:

[Figure: Kernels to compute the gradients]

The gradient of an image typically removes non-essential information.
The gradient of the image we were considering above can be found this way in Python:

    gray = cv2.[...]jpeg', 0)
    im = np.float32(gray) / 255.0

    # Calculate gradient
    gx = cv2.[...]CV_32F, 1, 0, ksize=1)
    gy = cv2.[...]CV_32F, 0, 1, ksize=1)
    mag, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)

And plot the picture:

    plt.[...]show()

We have not pre-processed the image before though.
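For readers who want to see what the gradient step computes without OpenCV, here is a dependency-free sketch. It assumes the kernels are [-1, 0, 1] and its transpose (the filter a kernel size of 1 corresponds to), leaves border pixels at 0 for simplicity, and folds angles into the 0 to 180° range used by the HOG. The `gradients` helper is mine, for illustration only.

```python
import math


def gradients(img):
    """Apply [-1, 0, 1] horizontally and vertically, return (magnitude, angle in degrees)."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180  # unsigned angle
    return mag, ang


# Toy image with a vertical edge: dark on the left, bright on the right
img = [[0, 0, 1, 1] for _ in range(4)]
mag, ang = gradients(img)
print(mag[1][1], ang[1][1])
```

The magnitude is large next to the edge and the direction is horizontal, which is the information the HOG accumulates in the next step.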
Compute the HOG

The image is then divided into 8×8 cells to offer a compact representation and make our HOG more robust to noise. Then, we compute a HOG for each of those cells. To estimate the direction of a gradient inside a region, we simply build a histogram from the 64 gradient directions (8×8) and their 64 magnitudes inside each region. The categories of the histogram correspond to angles of the gradient, from 0 to 180°. There are 9 categories overall: 0°, 20°, 40°… 160°.
The code above gave us 2 pieces of information:
- the direction of the gradient
- the magnitude of the gradient

When we build the HOG, there are 3 subcases:
- the angle is smaller than 160° and not halfway between 2 classes. In such a case, the contribution is added to the corresponding category of the HOG
- the angle is smaller than 160° and exactly between 2 classes. In such a case, we consider an equal contribution to the 2 nearest classes and split the magnitude in 2
- the angle is larger than 160°. In such a case, we consider that the pixel contributed proportionally to 160° and to 0°

The HOG looks like this for each 8×8 cell:

[Figure: HoG]
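The voting described above can be sketched as follows. One assumption to flag: this version splits every magnitude proportionally between the 2 nearest categories, as common HOG implementations do, which gives the equal split at the halfway point and wraps from 160° back to 0° for larger angles. The `cell_histogram` helper is hypothetical, and the toy cell has 3 pixels instead of 64 to keep the numbers readable.

```python
def cell_histogram(magnitudes, angles, n_bins=9, bin_width=20):
    """Build the 9-category histogram of one cell from gradient magnitudes and angles."""
    hist = [0.0] * n_bins
    for mag_row, ang_row in zip(magnitudes, angles):
        for mag, ang in zip(mag_row, ang_row):
            position = (ang % 180) / bin_width      # e.g. 170 degrees gives 8.5
            low = int(position) % n_bins
            high = (low + 1) % n_bins                # the category after 160 wraps to 0
            share_high = position - int(position)
            hist[low] += mag * (1 - share_high)
            hist[high] += mag * share_high
    return hist


# 3 pixels: 40 degrees (exactly on a category), 30 (halfway), 170 (above 160)
hist = cell_histogram([[2.0, 4.0, 1.0]], [[40, 30, 170]])
print(hist)
```

The pixel at 30° gives half of its magnitude to 20° and half to 40°, while the pixel at 170° is shared between 160° and 0°.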
Block normalization

Finally, a 16×16 block can be applied in order to normalize the image and make it invariant to lighting for example. This is simply achieved by dividing each value of the HOG of an 8×8 cell by the L2-norm of the HOG of the 16×16 block that contains it, which is in fact a simple vector of length 9*4 = 36.

[Figure: Block normalization]

Finally, all the 36×1 vectors are concatenated into a large vector. And we are done! We have our feature vector, on which we can train a soft SVM classifier (C=0.[...]).
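Here is a minimal sketch of the block normalization step, assuming a block is made of 4 cell histograms of 9 values each. The `normalize_block` helper and the small `eps` term (added to avoid dividing by zero on a completely flat block) are my own assumptions.

```python
import math


def normalize_block(cell_histograms, eps=1e-6):
    """Concatenate the 4 cell histograms of a block (4 x 9 = 36 values) and L2-normalize."""
    block = [value for hist in cell_histograms for value in hist]
    norm = math.sqrt(sum(value * value for value in block) + eps ** 2)
    return [value / norm for value in block]


# Toy block: only the first category of the first 2 cells is non-zero
cells = [[3.0] + [0.0] * 8, [4.0] + [0.0] * 8, [0.0] * 9, [0.0] * 9]
block = normalize_block(cells)

# Same block under twice as much light: every gradient magnitude doubles
doubled = normalize_block([[2 * value for value in hist] for hist in cells])

print(len(block), block[0], block[9])
```

Both versions of the block end up with practically the same 36 values, which is what makes the descriptor robust to lighting changes.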
Detect face on an image

The implementation is pretty straightforward:

    face_detect = dlib.get_frontal_face_detector()
    rects = face_detect(gray, 1)

    for (i, rect) in enumerate(rects):
        (x, y, w, h) = face_utils.[...]
        [...]rectangle(gray, (x, y), (x + w, y + h), (255, 255, 255), 3)

    plt.[...]

Real time face detection

As previously, the algorithm is pretty easy to implement. We are also implementing a lighter version by detecting only the face. Dlib makes it really easy to detect facial key-points too, but it’s another topic.
Convolutional Neural Network in Dlib

This last method is based on Convolutional Neural Networks (CNN).
It also implements a paper on Max-Margin Object Detection (MMOD) for enhanced results.
A bit of theory

Convolutional Neural Networks (CNN) are feed-forward neural networks that are mostly used for computer vision. They offer an automated image pre-treatment as well as a dense neural network part. CNNs are special types of neural networks for processing data with a grid-like topology.
The architecture of the CNN is inspired by the visual cortex of animals.
In previous approaches, a great part of the work was to select the filters that create the features, in order to extract as much information from the image as possible.
With the rise of deep learning and greater computation capacities, this work can now be automated.
The name of the CNNs comes from the fact that we convolve the initial image input with a set of filters.
The parameters left to choose are the number of filters to apply and the dimension of the filters. The dimension of a filter is called the kernel size, while the step by which the filter is shifted across the image is called the stride length.
Typical values for the stride lie between 2 and 5.
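To make the roles of the filter dimension and of the stride concrete, here is a small sketch of the sliding-window operation a convolutional layer performs, without padding. The `conv2d` helper and the 3×3 vertical-edge filter are hypothetical; in a real CNN the filter values are learned during training rather than chosen by hand.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over the image (no padding) and sum the products."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    out = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            y0, x0 = oy * stride, ox * stride
            row.append(sum(image[y0 + i][x0 + j] * kernel[i][j]
                           for i in range(k) for j in range(k)))
        out.append(row)
    return out


image = [[r * 5 + c for c in range(5)] for r in range(5)]   # 5x5 image, values 0..24
kernel = [[1, 0, -1] for _ in range(3)]                      # hand-made 3x3 filter

dense = conv2d(image, kernel, stride=1)    # 3x3 output
sparse = conv2d(image, kernel, stride=2)   # 2x2 output

print(len(dense), len(dense[0]))
print(len(sparse), len(sparse[0]))
```

A larger stride visits fewer positions, so the output shrinks from 3×3 to 2×2 on this 5×5 image.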
The output of the CNN in this specific case is a binary classification, that takes value 1 if there is a face, 0 otherwise.
Detect face on an image

Some elements change in the implementation. The first step is to download the pre-trained model here. Move the weights to your folder, and define dnnFaceDetector:

    dnnFaceDetector = dlib.[...]dat")

Then, quite similarly to what we have done so far:

    rects = dnnFaceDetector(gray, 1)

    for (i, rect) in enumerate(rects):
        # The CNN detector wraps each box, so the coordinates are read from rect.rect
        x1 = rect.rect.left()
        y1 = rect.rect.top()
        x2 = rect.rect.right()
        y2 = rect.rect.bottom()
        # Rectangle around the face
        cv2.rectangle(gray, (x1, y1), (x2, y2), (255, 255, 255), 3)

    plt.[...]

Real time face detection

Finally, we’ll implement the real time version of the CNN face detection.
Which one to choose?

Tough question, but we’ll just go through 2 metrics that are important:
- the computation time
- the accuracy

In terms of speed, HoG seems to be the fastest algorithm, followed by Haar Cascade classifier and CNNs. However, CNNs in Dlib tend to be the most accurate algorithm. HoG performs pretty well but has some issues identifying small faces. Haar Cascade classifiers perform about as well as HoG overall.

I have mainly used HoG in my personal projects due to its speed for live face detection.
The Github repository of this article can be found here.
Conclusion : I hope you enjoyed this quick tutorial on OpenCV and Dlib for face detection.
Don’t hesitate to drop a comment if you have any question/remark.
Sources:
- HOG
- DLIB
- Viola-Jones Paper
- Face Detection 1
- Face Detection 2
- Face Detection 3
- DetectMultiScale
- Viola-Jones