A Dog Detector and Breed Classifier
Henry Dashwood
Jan 29

In a field like physics, things keep getting harder, to the point that it's very difficult to understand what's going on at the cutting edge unless it's explained in highly simplified terms.
In computer science though, and artificial intelligence in particular, knowledge built up slowly over 70+ years by people all over the world is still very intuitive.
In fact, thanks to high-level programming languages, frameworks, and a community that values sharing knowledge in easily accessible formats, it's getting easier to enter the field!

In this blog post we will do something that was impossible a decade ago but which students like myself can now pull off in a few lines of Python: build a system that can identify whether a person or dog is in a photo and tell us what breed it is (or most resembles!).
You can find the code at my GitHub.
Detecting People and Dogs

There are several ways to solve image classification problems.
In this blog post we will be using convolutional neural networks (CNNs) to determine the dog breed.
Before that, though, we will use the Viola-Jones Haar cascade classifier to detect whether the photo contains a human face.
CNNs get the most attention nowadays, but you have almost certainly seen the Viola-Jones method before.
It's what draws boxes around faces in many camera apps.
A great video explainer by the University of Nottingham’s Mike Pound can be watched here.
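Briefly, the method first learns which simple rectangular filters are best at discriminating between faces and non-faces.
These filters are arranged into a cascade of stages, with the most discriminating ones first, and each region of the image is tested against the first stage.
If a region passes, it gets tested on the next stage; if it fails, it is discarded immediately.
The full cascade contains about 6000 filters, and if a region passes every stage we conclude that it contains a face.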
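The early-rejection idea can be sketched in a few lines of Python. This is a toy illustration, not OpenCV's implementation: each stage is a hypothetical list of weak classifiers (in Viola-Jones these are thresholded Haar-feature responses) plus a stage threshold, and a region only counts as a face if it reaches the threshold of every stage in turn.

```python
from typing import Callable, List, Tuple

# A weak classifier maps an image region to a score. In Viola-Jones this would be
# a thresholded Haar-feature response; here it is any hypothetical function.
WeakClassifier = Callable[[float], float]
Stage = Tuple[List[WeakClassifier], float]  # (classifiers, stage threshold)


def passes_cascade(region: float, stages: List[Stage]) -> bool:
    """Return True only if `region` passes every stage, in order."""
    for classifiers, threshold in stages:
        score = sum(clf(region) for clf in classifiers)
        if score < threshold:
            return False  # rejected early: later, more expensive stages never run
    return True


# Toy cascade on a single-number "region" (e.g. a contrast measurement).
stages: List[Stage] = [
    ([lambda r: r], 0.2),                     # stage 1: one cheap check
    ([lambda r: r, lambda r: 0.5 * r], 0.9),  # stage 2: more checks, stricter
]

for region in (0.1, 0.5, 0.7):
    print(region, passes_cascade(region, stages))
```

Because most regions of a photo are obviously not faces, they fail the first cheap stage and the expensive later stages only run on a small fraction of the image.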
There is much more in the paper.
For instance, the authors developed a simple yet clever trick, the integral image, for efficiently calculating the sum of the pixel values in any rectangular region of the image.
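The trick is the integral image (also called a summed-area table): after one pass over the image, the sum of any rectangle can be read off with four lookups, however large the rectangle is. A minimal NumPy sketch, using a made-up 4×4 image:

```python
import numpy as np


def integral_image(img: np.ndarray) -> np.ndarray:
    """Summed-area table padded with a leading row and column of zeros.

    ii[y, x] holds the sum of img[:y, :x], so ii has shape (H + 1, W + 1).
    """
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii


def region_sum(ii: np.ndarray, top: int, left: int, height: int, width: int) -> int:
    """Sum of the pixels in a rectangle using four lookups, whatever its size."""
    bottom, right = top + height, left + width
    return int(ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left])


img = np.arange(1, 17).reshape(4, 4)  # toy 4x4 "image" with pixel values 1..16
ii = integral_image(img)
print(region_sum(ii, 1, 1, 2, 2))  # 6 + 7 + 10 + 11 = 34
```

A Haar-like filter is then just a difference of a few such rectangle sums, which is why thousands of them can be evaluated quickly.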
All we need to do though is download these pretrained filters from a library called OpenCV and run our photo through them.
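```python
import cv2

# Pretrained frontal-face Haar cascade. The file used here is an assumption:
# the copy bundled with the opencv-python package.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_alt.xml')

def face_detector(img_path):
    img = cv2.imread(img_path)                    # OpenCV loads images as BGR
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # the cascade works on grayscale
    faces = face_cascade.detectMultiScale(gray)   # one (x, y, w, h) box per face
    return len(faces) > 0
```

And just like that it tells us whether there are any human faces in the image (`len(faces)` is how many it found)!

Unfortunately, the good people at OpenCV have not built us some nice Haar filters for dogs.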
Instead we will use a model trained on ImageNet, a dataset of about 14 million images labelled into more than 20,000 categories.
It's been one of the leading computer vision benchmarks over the last decade.
We will be using a smaller version of ImageNet with 1000 categories, of which categories 151–268 are dog breeds.
We can use a pretrained CNN called ResNet-50, which we can download through Keras (more on Keras in a bit).
```python
from keras.applications.resnet50 import ResNet50

# ResNet-50 with weights pretrained on ImageNet (downloaded on first use)
ResNet50_model_ = ResNet50(weights='imagenet')
```

We need to do a little bit of preprocessing so that the model can make a prediction about our images:

```python
import numpy as np
from keras.preprocessing import image
from tqdm import tqdm

def path_to_tensor(img_path):
    # load the image and resize it to the 224x224 input ResNet-50 expects
    img = image.load_img(img_path, target_size=(224, 224))
    # convert to a (224, 224, 3) array
    x = image.img_to_array(img)
    # add a batch dimension: (1, 224, 224, 3)
    return np.expand_dims(x, axis=0)

def paths_to_tensor(img_paths):
    # stack many images into one (N, 224, 224, 3) array, with a progress bar
    list_of_tensors = [path_to_tensor(img_path) for img_path in tqdm(img_paths)]
    return np.vstack(list_of_tensors)
```

And now we can see if the prediction made by the model matches one of the dog breed categories:

```python
from keras.applications.resnet50 import preprocess_input, decode_predictions

def ResNet50_predict_labels(img_path):
    # index of the most likely of the 1000 ImageNet categories
    img = preprocess_input(path_to_tensor(img_path))
    return np.argmax(ResNet50_model_.predict(img))

def dog_detector(img_path):
    # ImageNet categories 151-268 are all dog breeds
    prediction = ResNet50_predict_labels(img_path)
    return (prediction <= 268) & (prediction >= 151)
```

At the end of all that, when I tested the detectors on 100 sample images, neither the face detector nor the dog detector had any false negatives.
The dog detector didn't have any false positives either!

[Figure: Our human face detector results]
[Figure: Our dog detector results]

Determining the Dog Breed

So we can say that there is some sort of dog, or person, in the picture.
But I could have done that myself.
Could I tell the difference between 133 different breeds, though? Probably not.
Let’s see if we can tell the difference with machine learning.
It’s time to crack out the Keras library.
Frameworks like TensorFlow, PyTorch, Theano and CNTK perform the underlying machine learning operations for us.
Keras is a higher-level library that runs on top of some of these (TensorFlow, Theano or CNTK), so we can write our code more concisely and readably.
Here is how we define our CNN in Keras:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(224, 224, 3), use_bias=False))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (2, 2), use_bias=False))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(128, (2, 2), use_bias=False))
# ... the fully connected layers described below follow here (not shown)
model.summary()
```

What's going on here? We have 3 convolutional layers followed by 3 fully connected layers.
The final layer has a softmax activation function, which means our output will be a probability distribution with one value for each of the 133 dog breeds in our dataset.
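For intuition, here is a minimal NumPy sketch of what a softmax layer computes from the raw scores (logits) produced by the layer before it. The three scores are made up; Keras applies the same formula to each sample in a batch, over one score per class.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn a vector of raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()


scores = np.array([2.0, 1.0, 0.1])  # hypothetical raw outputs for three classes
probs = softmax(scores)
print(probs.round(3))
```

The largest score always gets the largest probability, and the predicted class is simply the index of the biggest entry (`np.argmax`).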
We will define our optimizer and loss function next.
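```python
from keras.optimizers import SGD

# stochastic gradient descent with gradient-norm clipping, learning-rate decay
# and momentum (the momentum value is an assumed one)
sgd = SGD(lr=0.01, clipnorm=1, decay=1e-6, momentum=0.9)
model.compile(optimizer=sgd, loss='categorical_crossentropy', metrics=['accuracy'])
```

And then we are good to train!

```python
from keras.callbacks import ModelCheckpoint

# save the weights whenever the validation loss reaches a new best
checkpointer = ModelCheckpoint(filepath='saved_models/weights.hdf5',
                               verbose=1, save_best_only=True)
# train_tensors / valid_tensors hold the preprocessed dog images and
# train_targets / valid_targets their one-hot breed labels (prepared elsewhere)
model.fit(train_tensors, train_targets,
          validation_data=(valid_tensors, valid_targets),
          epochs=5, batch_size=20, callbacks=[checkpointer], verbose=1)
```

If we run it for 6 epochs this is what happens:

[Figure: The training and validation losses as our model trained]

The training loss improves as we train, but the validation loss looks like it is about to flatten out.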
This suggests our model would start to overfit if we trained it for much longer.
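`ModelCheckpoint` with `save_best_only=True` already keeps the weights from the best validation epoch, and Keras's `EarlyStopping` callback (`monitor='val_loss'`, `patience=...`) can stop training once the validation loss stops improving. A minimal pure-Python sketch of that patience rule, on made-up loss values (it ignores refinements such as a minimum improvement threshold):

```python
def epochs_since_improvement(val_losses):
    """How many epochs have passed since the validation loss last hit a new minimum."""
    best_epoch = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return len(val_losses) - 1 - best_epoch


def should_stop(val_losses, patience=2):
    """Patience rule: stop once `patience` epochs pass without a new best."""
    return epochs_since_improvement(val_losses) >= patience


# Made-up validation losses, one per epoch, bottoming out at the fourth epoch.
history = [4.80, 4.60, 4.50, 4.45, 4.47, 4.49]
print(epochs_since_improvement(history), should_stop(history))
```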
On the test set the model correctly predicted the breeds of 3.7% of the dogs it saw.
With 133 breeds, random guessing would get about 1 in 133 (roughly 0.75%) right, so this is about 5 times better than chance, but it leaves quite a bit of room for improvement.
Transfer Learning

Enter transfer learning.
We can take a pretrained model, as we did with our dog detector, and add our own layers to the end of it to make predictions for us.
The model we will use is ResNet-50, an architecture developed by researchers at Microsoft and trained on bigger machines, and for far longer, than I have access to.
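We can download precomputed bottleneck features (the output of ResNet-50's convolutional layers for every image in our dog dataset), store them locally and then unpack them like so:

```python
import numpy as np

# precomputed ResNet-50 bottleneck features for the train/valid/test images
bottleneck_features = np.load('….npz')
train_Resnet50 = bottleneck_features['train']
valid_Resnet50 = bottleneck_features['valid']
test_Resnet50 = bottleneck_features['test']
```

It's very easy with Keras to take ResNet-50 and add our own layers to the end of it.
Once again, we want our output to be a probability distribution over the breeds:

```python
from keras.models import Sequential
from keras.layers import GlobalAveragePooling2D, Dense

Resnet50_model = Sequential()
# assumed: average the spatial bottleneck features into one vector per image
Resnet50_model.add(GlobalAveragePooling2D(input_shape=train_Resnet50.shape[1:]))
# one output per dog breed, as a probability distribution
Resnet50_model.add(Dense(133, activation='softmax'))
```

After that it's just the same as before:

```python
from keras.callbacks import ModelCheckpoint

# reuses the SGD optimizer defined earlier
Resnet50_model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])

checkpointer = ModelCheckpoint(filepath='saved_models/weights.hdf5',
                               verbose=1, save_best_only=True)
resnet50_hist = Resnet50_model.fit(train_Resnet50, train_targets,
                                   validation_data=(valid_Resnet50, valid_targets),
                                   epochs=20, batch_size=56,
                                   callbacks=[checkpointer], verbose=1)
```

How good is this model? Well, on the test set it correctly predicts the breeds of 84% of the dogs!
That's far more than even my dog-mad family and I would get.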
Trying it out

Combining our human and dog detectors with the breed predictor, let's try this out on some photos.
[Figure: Correct! Not even the Barbour jacket confuses it.]
[Figure: Correct! Ironically, it's my dog who is overfitting in this photo (those ducks are plastic!)]
[Figure: Wrong, this is actually a Border Collie. But he does look like an Australian Shepherd.]
[Figure: Silky Terrier, who knew?]
[Figure: It's got a bit confused by Pongo…]

However, if we peer into the model's predictions we can see that Dalmatian came second, so not too bad.
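```python
# extract_Resnet50 (computes ResNet-50 bottleneck features for an image tensor)
# and dog_names (the list of breed names) are defined elsewhere in the project
bottleneck_feature = extract_Resnet50(path_to_tensor('images/pongo.png'))
predicted_vector = Resnet50_model.predict(bottleneck_feature)
# indices of the 4 most likely breeds (argpartition does not sort them)
top_predictions = np.argpartition(predicted_vector.flatten(), -4)[-4:]
for i in top_predictions:
    print(dog_names[i])
```

Well, you can't win them all!

Potential Improvements

There is a lot more we could do here.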
The best models in the world get better test accuracies than mine, even on versions of ImageNet with many more possible categories.
We could get some of that improvement by renting multiple GPUs, acquiring more labelled images and so on, but that basically boils down to spending money and isn't conceptually very interesting.
One way we could improve this model would be to use data augmentation.
For instance, by flipping, shifting, rotating, darkening or lightening images, adding noise and applying many other operations, we can artificially multiply the number of images the model is exposed to several times over.
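As a rough illustration of those operations, here is a minimal NumPy sketch of a random augmentation function. The parameter ranges are assumptions, rotation is left out to avoid extra dependencies, and the shift wraps pixels around the edges for simplicity. In a real Keras pipeline, `ImageDataGenerator` (with options such as `horizontal_flip`, `width_shift_range`, `rotation_range` and `brightness_range`) would be the more usual choice.

```python
import numpy as np


def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a randomly transformed copy of an (H, W, 3) image with values in [0, 255]."""
    out = img.astype(np.float32)
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    dy, dx = rng.integers(-10, 11, size=2)
    # shift by up to 10 pixels in each direction (pixels wrap around the edges)
    out = np.roll(out, shift=(int(dy), int(dx)), axis=(0, 1))
    out = out * rng.uniform(0.8, 1.2)  # darken or lighten by up to 20%
    out = out + rng.normal(0.0, 5.0, size=out.shape)  # add Gaussian noise
    return np.clip(out, 0.0, 255.0)


rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(224, 224, 3))  # stand-in for a real photo
batch = np.stack([augment(img, rng) for _ in range(4)])
print(batch.shape)  # (4, 224, 224, 3)
```

Each call produces a slightly different version of the same photo, so one labelled image can yield many training examples.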
I experimented with data augmentation during this project with disappointing results.
It is possible that with some parameter tuning and longer training I would have seen some improvement.
We could alter the CNN itself.
It is possible that more layers with more features, different loss functions, learning rates and all sorts of other changes would have made it perform better.
A big part of deep learning is making changes to a model and only learning afterwards if it actually improved things.
If I had more GPU time, I might have tried using grid search to systematically experiment with different hyperparameters overnight.
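A grid search simply tries every combination of a set of candidate hyperparameter values and keeps the best. A minimal sketch, assuming a hypothetical `evaluate` function that would train the model with the given settings and return its validation accuracy; here a cheap stand-in replaces it so the code runs instantly:

```python
import itertools


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid` and return the best one.

    `evaluate` is a hypothetical function that trains a model with the given
    hyperparameters and returns a score where higher is better.
    """
    keys = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


grid = {"lr": [0.1, 0.01, 0.001], "batch_size": [20, 56]}


def fake_evaluate(lr, batch_size):
    # stand-in for real training: peaks at lr=0.01, batch_size=20
    return -abs(lr - 0.01) - abs(batch_size - 20) / 1000


best, score = grid_search(grid, fake_evaluate)
print(best)  # {'lr': 0.01, 'batch_size': 20}
```

The number of training runs is the product of the grid sizes (6 here), which is why a real grid search over a CNN is an overnight job.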