How to Get Started With Deep Learning for Computer Vision (7-Day Mini-Course)

We are awash in digital images from photos, videos, Instagram, YouTube, and increasingly live video streams.

Working with image data is hard as it requires drawing upon knowledge from diverse domains such as digital signal processing, machine learning, statistical methods, and these days, deep learning.

Deep learning methods are outperforming classical and statistical methods on some challenging computer vision problems, often with single, simpler models.

In this crash course, you will discover how you can get started and confidently develop deep learning models for computer vision problems using Python in seven days.

Note: This is a big and important post.

You might want to bookmark it.

Let’s get started.

Photo by oliver.dodd, some rights reserved.

Before we get started, let’s make sure you are in the right place.

The list below provides some general guidelines as to who this course was designed for.

Don’t panic if you don’t match these points exactly; you might just need to brush up in one area or another to keep up.

You need to know:

You do NOT need to be:

This crash course will take you from a developer who knows a little machine learning to a developer who can bring deep learning methods to your own computer vision project.

Note: This crash course assumes you have a working Python 2 or 3 SciPy environment with at least NumPy, Pandas, scikit-learn, and Keras 2 installed.

If you need help with your environment, you can follow the step-by-step tutorial here:

This crash course is broken down into seven lessons.

You could complete one lesson per day (recommended) or complete all of the lessons in one day (hardcore).

It really depends on the time you have available and your level of enthusiasm.

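Below are the seven lessons that will get you started and productive with deep learning for computer vision in Python:

Lesson 1: The promise of deep learning for computer vision

Lesson 2: Preparing image data for modeling

Lesson 3: Convolutional neural networks

Lesson 4: Classifying photographs with a pre-trained model

Lesson 5: Training and evaluating an image classification model

Lesson 6: Image augmentation

Lesson 7: Face detection

Each lesson could take you anywhere from 60 seconds up to 30 minutes.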

Take your time and complete the lessons at your own pace.

Ask questions and even post results in the comments below.

The lessons might expect you to go off and find out how to do things.

I will give you hints, but part of the point of each lesson is to force you to learn where to look for help on deep learning, computer vision, and the best-of-breed tools in Python (hint: I have all of the answers on this blog, just use the search box).

Post your results in the comments; I’ll cheer you on!

Hang in there; don’t give up.

Note: This is just a crash course.

For a lot more detail and fleshed-out tutorials, see my book on the topic titled “Deep Learning for Computer Vision.”

In this lesson, you will discover the promise of deep learning methods for computer vision.

Computer Vision, or CV for short, is broadly defined as helping computers to “see” or extract meaning from digital images such as photographs and videos.

Researchers have been working on the problem of helping computers see for more than 50 years, and some great successes have been achieved, such as the face detection available in modern cameras and smartphones.

The problem of understanding images is not solved, and may never be.

This is primarily because the world is complex and messy.

There are few rules.

And yet we can easily and effortlessly recognize objects, people, and context.

Deep Learning is a subfield of machine learning concerned with algorithms, called artificial neural networks, that are inspired by the structure and function of the brain.

A property of deep learning is that the performance of this type of model improves by training it with more examples and by increasing its depth or representational capacity.

In addition to scalability, another often-cited benefit of deep learning models is their ability to perform automatic feature extraction from raw data, also called feature learning.

Deep learning methods are popular for computer vision, primarily because they are delivering on their promise.

Some of the first large demonstrations of the power of deep learning were in computer vision, specifically image classification, and more recently in object detection and face recognition.

The three key promises of deep learning for computer vision are as follows:

Computer vision is not “solved,” but deep learning is required to get you to the state-of-the-art on many challenging problems in the field.

For this lesson, you must research and list five impressive applications of deep learning methods in the field of computer vision.

Bonus points if you can link to a research paper that demonstrates the example.

Post your answer in the comments below.

I would love to see what you discover.

In the next lesson, you will discover how to prepare image data for modeling.

In this lesson, you will discover how to prepare image data for modeling.

Images are composed of matrices of pixel values.

Pixel values are often unsigned integers in the range between 0 and 255.

Although these pixel values can be presented directly to neural network models in their raw format, this can result in challenges during modeling, such as slower than expected training of the model.

Instead, there can be great benefit in preparing the image pixel values prior to modeling, ranging from simply scaling pixel values to the range 0-1 to centering and even standardizing the values.

Scaling pixel values to the range 0-1 is called normalization and can be performed directly on a loaded image.
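To make the three options concrete, here is a minimal sketch that applies them to a tiny made-up array of pixel values with NumPy. The array contents and variable names are assumptions chosen for illustration; a real image would be loaded from a file.

```python
import numpy as np

# A tiny made-up 2x3 "image" of unsigned 8-bit pixel values (illustrative only).
pixels = np.array([[0, 64, 128],
                   [191, 255, 32]], dtype='uint8')

# Convert to floats first so the arithmetic below is not done on integers.
pixels = pixels.astype('float32')

# Normalization: rescale values from the range 0-255 to the range 0-1.
normalized = pixels / 255.0

# Centering: subtract the mean so the values are centered on zero.
centered = pixels - pixels.mean()

# Standardization: center, then divide by the standard deviation,
# giving values with zero mean and unit standard deviation.
standardized = (pixels - pixels.mean()) / pixels.std()

print('normalized:   min=%.3f max=%.3f' % (normalized.min(), normalized.max()))
print('centered:     mean=%.3f' % centered.mean())
print('standardized: mean=%.3f std=%.3f' % (standardized.mean(), standardized.std()))
```

Here the statistics are computed over the whole array; computing them per color channel is a common alternative.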

The example below uses the Pillow library (the maintained fork of PIL and the de facto standard image handling library in Python) to load an image and normalize its pixel values.

First, confirm that you have the Pillow library installed; it is installed with most SciPy environments, but you can learn more here:

Next, download a photograph of Bondi Beach in Sydney, Australia, taken by Isabell Schulz and released under a permissive license.

Save the image in your current working directory with the filename ‘bondi_beach.jpg’.

Next, we can use the Pillow library to load the photo, confirm the min and max pixel values, normalize the values, and confirm the normalization was performed.
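A minimal sketch of those steps is listed below. It assumes the photo has been saved as ‘bondi_beach.jpg’ in the current working directory; the helper names load_pixels() and normalize() are my own for this sketch and are not part of Pillow.

```python
import os

import numpy as np
from PIL import Image


def load_pixels(filename):
    """Load an image file and return its pixel values as a NumPy array."""
    with Image.open(filename) as image:
        return np.asarray(image)


def normalize(pixels):
    """Convert integer pixel values in 0-255 to floats in 0-1."""
    return pixels.astype('float32') / 255.0


filename = 'bondi_beach.jpg'  # assumed location of the downloaded photo
if os.path.exists(filename):
    pixels = load_pixels(filename)
    print('Data type: %s' % pixels.dtype)
    # Confirm the range of the raw pixel values.
    print('Before: min=%.3f, max=%.3f' % (pixels.min(), pixels.max()))
    scaled = normalize(pixels)
    # Confirm the normalization was performed.
    print('After:  min=%.3f, max=%.3f' % (scaled.min(), scaled.max()))
else:
    print('%s not found; download the photo first.' % filename)
```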

Your task in this lesson is to run the example code on the provided photograph and report the min and max pixel values before and after the normalization.

For bonus points, you can update the example to standardize the pixel values.

Post your findings in the comments below.

I would love to see what you discover.

In the next lesson, you will discover information about convolutional neural network models.

In this lesson, you will discover how to construct a convolutional neural network using a convolutional layer, pooling layer, and fully connected output layer.

A convolution is the simple application of a filter to an input that results in an activation.

Repeated application of the same filter to an input results in a map of activations called a feature map, indicating the locations and strength of a detected feature in an input, such as an image.
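To see how a feature map arises, here is a small NumPy sketch that slides a hand-written filter over a made-up image. The image, the filter values, and the function name apply_filter() are assumptions for illustration; in a real convolutional layer the filter values are learned.

```python
import numpy as np


def apply_filter(image, kernel):
    """Slide a 2D filter over a 2D image (no padding, stride of 1).

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for row in range(out_h):
        for col in range(out_w):
            patch = image[row:row + kh, col:col + kw]
            # One application of the filter: element-wise product, then sum.
            feature_map[row, col] = np.sum(patch * kernel)
    return feature_map


# Made-up 6x6 image with a bright vertical line in column 3.
image = np.zeros((6, 6))
image[:, 3] = 1

# Hand-written 3x3 filter that responds to vertical lines.
kernel = np.array([[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]])

feature_map = apply_filter(image, kernel)
print(feature_map.shape)
print(feature_map)
```

The feature map is strongest in the column where the filter lines up with the vertical line and zero elsewhere, which is the "location and strength" information described above.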

A convolutional layer can be created by specifying both the number of filters to learn and the fixed size of each filter, often called the kernel shape.

Pooling layers provide an approach to downsampling feature maps by summarizing the presence of features in patches of the feature map.

Maximum pooling, or max pooling, is a pooling operation that calculates the maximum, or largest, value in each patch of each feature map.
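As a rough illustration, the NumPy sketch below applies 2×2 max pooling to a made-up 4×4 feature map. The values and the function name max_pool() are assumptions for this example.

```python
import numpy as np


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling over size x size patches."""
    rows = feature_map.shape[0] // size
    cols = feature_map.shape[1] // size
    # Drop any edge rows or columns that do not fill a complete patch.
    trimmed = feature_map[:rows * size, :cols * size]
    # Split the map into patches, then take the largest value in each patch.
    patches = trimmed.reshape(rows, size, cols, size)
    return patches.max(axis=(1, 3))


# Made-up 4x4 feature map.
feature_map = np.array([[0, 0, 3, 0],
                        [0, 0, 3, 0],
                        [1, 0, 0, 2],
                        [0, 5, 0, 0]])

pooled = max_pool(feature_map)
print(pooled)
```

Each 2×2 patch is summarized by its largest value, so the 4×4 map is downsampled to 2×2.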

Once the features have been extracted, they can be interpreted and used to make a prediction, such as classifying the type of object in a photograph.

This can be achieved by first flattening the two-dimensional feature maps, and then adding a fully connected output layer.

For a binary classification problem, the output layer would have one node that would predict a value between 0 and 1 for the two classes.

The example below creates a convolutional neural network that expects grayscale images with the square size of 256×256 pixels, with one convolutional layer with 32 filters, each with the size of 3×3 pixels, a max pooling layer, and a binary classification output layer.
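One way to define such a model is sketched below. It assumes the Keras API bundled with TensorFlow 2 (the tensorflow.keras imports); the ReLU activation and the default 2×2 pooling size are my assumptions, as the description above does not specify them.

```python
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D
from tensorflow.keras.models import Sequential

model = Sequential([
    # Grayscale 256x256 images: height, width, and one channel.
    Input(shape=(256, 256, 1)),
    # One convolutional layer with 32 filters, each 3x3 pixels.
    Conv2D(32, (3, 3), activation='relu'),
    # Max pooling with the default 2x2 patch size.
    MaxPooling2D(),
    # Flatten the feature maps into one long vector.
    Flatten(),
    # One output node with a value between 0 and 1 for binary classification.
    Dense(1, activation='sigmoid'),
])

# Print each layer with its output shape and number of parameters.
model.summary()
```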

Your task in this lesson is to run the example and describe how the shape of an input image would be changed by the convolutional and pooling layers.

For extra points, you could try adding more convolutional or pooling layers and describe the effect it has on the image as it flows through the model.

Post your findings in the comments below.

I would love to see what you discover.

In the next lesson, you will learn how to use a deep convolutional neural network to classify photographs of objects.

In this lesson, you will discover how to use a pre-trained model to classify photographs of objects.

Deep convolutional neural network models may take days, or even weeks, to train on very large datasets.

A way to short-cut this process is to re-use the model weights from pre-trained models that were developed for standard computer vision benchmark datasets, such as the ImageNet image recognition tasks.

The example below uses the VGG-16 pre-trained model to classify photographs of objects into one of 1,000 known classes.

Download this photograph of a dog taken by Justin Morgan and released under a permissive license.

Save it in your current working directory with the filename ‘dog.jpg’.

The example below will load the photograph and output a prediction, classifying the object in the photograph.
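A minimal sketch is listed below, assuming the Keras API bundled with TensorFlow 2 and a photo saved as ‘dog.jpg’. The helper names prepare_image() and classify() are my own for this sketch; VGG16, preprocess_input, and decode_predictions come from Keras.

```python
import os

import numpy as np
from tensorflow.keras.applications.vgg16 import (VGG16, decode_predictions,
                                                 preprocess_input)
from tensorflow.keras.preprocessing.image import img_to_array, load_img


def prepare_image(pixels):
    """Turn one (224, 224, 3) RGB array into a batch ready for VGG-16."""
    batch = np.expand_dims(pixels.astype('float32'), axis=0)
    # Apply the same pixel preparation used when VGG-16 was trained.
    return preprocess_input(batch)


def classify(filename):
    """Return the most likely class name and its probability."""
    # VGG-16 expects 224x224 pixel color images.
    image = load_img(filename, target_size=(224, 224))
    batch = prepare_image(img_to_array(image))
    model = VGG16()  # downloads the ImageNet weights on first use
    predictions = model.predict(batch)
    # Each decoded prediction is a (class id, class name, probability) tuple.
    _, name, probability = decode_predictions(predictions, top=1)[0][0]
    return name, float(probability)


filename = 'dog.jpg'  # assumed location of the downloaded photo
if os.path.exists(filename):
    name, probability = classify(filename)
    print('%s (%.2f%%)' % (name, probability * 100))
else:
    print('%s not found; download the photo first.' % filename)
```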

Note: The first time you run the example, the pre-trained model will have to be downloaded, which is a few hundred megabytes and may take a few minutes depending on the speed of your internet connection.

Your task in this lesson is to run the example and report the result.

For bonus points, try running the example on another photograph of a common object.

Post your findings in the comments below.

I would love to see what you discover.

In the next lesson, you will discover how to fit and evaluate a model for image classification.

In this lesson, you will discover how to train and evaluate a convolutional neural network for image classification.

The Fashion-MNIST clothing classification problem is a new standard dataset used in computer vision and deep learning.

It is a dataset composed of 60,000 training images and 10,000 test images, each a small square 28×28 pixel grayscale image of an item from one of 10 types of clothing, such as shoes, t-shirts, dresses, and more.

The example below loads the dataset, scales the pixel values, then fits a convolutional neural network on the training dataset and evaluates the performance of the network on the test dataset.

The example will run in just a few minutes on a modern CPU; no GPU is required.
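A sketch of such an example is listed below, assuming the Keras API bundled with TensorFlow 2. The layer sizes, the Adam optimizer, the number of epochs, and the helper names prepare(), build_model(), and run() are all assumptions chosen for illustration; treat them as a starting point rather than a tuned configuration.

```python
from tensorflow.keras.datasets import fashion_mnist
from tensorflow.keras.layers import Conv2D, Dense, Flatten, Input, MaxPooling2D
from tensorflow.keras.models import Sequential


def prepare(images):
    """Add a channel axis and scale pixel values from 0-255 to 0-1."""
    images = images.reshape((images.shape[0], 28, 28, 1))
    return images.astype('float32') / 255.0


def build_model():
    """Define and compile a small CNN for the 10 clothing classes."""
    model = Sequential([
        Input(shape=(28, 28, 1)),
        Conv2D(32, (3, 3), activation='relu'),
        MaxPooling2D(),
        Flatten(),
        Dense(100, activation='relu'),
        # One output node per class; softmax gives class probabilities.
        Dense(10, activation='softmax'),
    ])
    # The labels are integers 0-9, so use the sparse categorical loss.
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model


def run(epochs=5, batch_size=128):
    """Load the data, fit on the training set, evaluate on the test set."""
    (train_x, train_y), (test_x, test_y) = fashion_mnist.load_data()
    model = build_model()
    model.fit(prepare(train_x), train_y,
              epochs=epochs, batch_size=batch_size, verbose=2)
    loss, accuracy = model.evaluate(prepare(test_x), test_y, verbose=0)
    return loss, accuracy
```

Calling loss, accuracy = run() downloads the dataset on first use, fits the model, and returns the loss and accuracy on the test dataset.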

Your task in this lesson is to run the example and report the performance of the model on the test dataset.

For bonus points, try varying the configuration of the model, or try saving the model and later loading it and using it to make a prediction on new grayscale photographs of clothing.

Post your findings in the comments below.

I would love to see what you discover.

In the next lesson, you will discover how to use image augmentation on training data.

In this lesson, you will discover how to use image augmentation.

Image data augmentation is a technique that can be used to artificially expand the size of a training dataset by creating modified versions of images in the dataset.

Training deep learning neural network models on more data can result in more skillful models, and the augmentation techniques can create variations of the images that can improve the ability of the fit models to generalize what they have learned to new images.
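To get a feel for what "modified versions" means, the sketch below creates flipped and rotated copies of a tiny made-up image using plain NumPy. This only illustrates the idea and is not how Keras implements augmentation; the function name augment() and the 2×2 image are assumptions for the example.

```python
import numpy as np


def augment(image):
    """Return simple modified versions of one image array."""
    return {
        'original': image,
        # Mirror left to right.
        'horizontal_flip': np.fliplr(image),
        # Mirror top to bottom.
        'vertical_flip': np.flipud(image),
        # Rotate by 90 degrees counterclockwise.
        'rotate_90': np.rot90(image),
    }


# Made-up 2x2 "image" so the effect of each change is easy to read.
image = np.array([[1, 2],
                  [3, 4]])

for name, version in augment(image).items():
    print(name, version.tolist())
```

Each version contains the same content as the original but arranged differently, which is what lets a model see more variety without new photographs.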

The Keras deep learning neural network library provides the capability to fit models using image data augmentation via the ImageDataGenerator class.

Download a photograph of a bird by AndYaDontStop, released under a permissive license.

Save it into your current working directory with the name ‘bird.jpg’.

The example below will load the photograph as a dataset and use image augmentation to create flipped and rotated versions of the image that can be used to train a convolutional neural network model.
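A minimal sketch is listed below. It assumes the Keras 2 API as bundled with TensorFlow 2, a working SciPy install (Keras uses it for the random rotations), and a photo saved as ‘bird.jpg’. The function name augmented_versions() and the output filenames are my own for this sketch, and it saves the augmented images to files rather than plotting them.

```python
import os

import numpy as np
from tensorflow.keras.preprocessing.image import (ImageDataGenerator,
                                                  array_to_img, img_to_array,
                                                  load_img)


def augmented_versions(pixels, count=9, seed=1):
    """Return `count` randomly flipped and rotated versions of one image."""
    # The generator works on batches, so add a leading sample axis.
    samples = np.expand_dims(pixels, axis=0)
    datagen = ImageDataGenerator(horizontal_flip=True,
                                 vertical_flip=True,
                                 rotation_range=90)
    iterator = datagen.flow(samples, batch_size=1, seed=seed)
    # Each call to next() yields a new batch holding one augmented image.
    return [next(iterator)[0] for _ in range(count)]


filename = 'bird.jpg'  # assumed location of the downloaded photo
if os.path.exists(filename):
    pixels = img_to_array(load_img(filename))
    for i, version in enumerate(augmented_versions(pixels)):
        # Save each version so it can be compared with the original.
        array_to_img(version).save('bird_augmented_%d.png' % i)
else:
    print('%s not found; download the photo first.' % filename)
```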

Your task in this lesson is to run the example and report the effect that the image augmentation has had on the original image.

For bonus points, try additional types of image augmentation, supported by the ImageDataGenerator class.

Post your findings in the comments below.

I would love to see what you find.

In the next lesson, you will discover how to use a deep convolutional network to detect faces in photographs.

In this lesson, you will discover how to use a convolutional neural network for face detection.

Face detection is a trivial problem for humans to solve and has been solved reasonably well by classical feature-based techniques, such as the cascade classifier.

More recently, deep learning methods have achieved state-of-the-art results on standard face detection datasets.

One example is the Multi-task Cascaded Convolutional Neural Network, or MTCNN for short.

The ipazc/MTCNN project provides an open source implementation of the MTCNN that can be installed easily as follows:

Download a photograph of a person on the street taken by Holland and released under a permissive license.

Save it into your current working directory with the name ‘street.jpg’.

The example below will load the photograph and use the MTCNN model to detect faces and will plot the photo and draw a box around the first detected face.
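A sketch of this is listed below. It assumes the ipazc/MTCNN project is installed and importable as the mtcnn package, and that the photo is saved as ‘street.jpg’. The helper names and the output filename ‘street_face.jpg’ are hypothetical, and instead of plotting, this version draws the box with Pillow and saves an annotated copy.

```python
import os

import numpy as np
from PIL import Image, ImageDraw


def box_to_corners(box):
    """Convert an [x, y, width, height] box to (left, top, right, bottom)."""
    x, y, width, height = box
    # Clamp the top-left corner in case the box extends past the image edge.
    left, top = max(0, x), max(0, y)
    return (left, top, x + width, y + height)


def draw_first_face(filename, out_filename):
    """Detect faces, draw a box around the first one, save an annotated copy."""
    # Imported here so the helper above can be used without the package.
    from mtcnn.mtcnn import MTCNN

    image = Image.open(filename).convert('RGB')
    # Each detection is a dict with 'box', 'confidence', and 'keypoints'.
    faces = MTCNN().detect_faces(np.asarray(image))
    if not faces:
        return None
    corners = box_to_corners(faces[0]['box'])
    ImageDraw.Draw(image).rectangle(corners, outline='red', width=3)
    image.save(out_filename)
    return corners


filename = 'street.jpg'  # assumed location of the downloaded photo
if os.path.exists(filename):
    print(draw_first_face(filename, 'street_face.jpg'))
else:
    print('%s not found; download the photo first.' % filename)
```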

Your task in this lesson is to run the example and describe the result.

For bonus points, try the model on another photograph with multiple faces and update the code example to draw a box around each detected face.

Post your findings in the comments below.

I would love to see what you discover.

You made it.

Well done!

Take a moment and look back at how far you have come.

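You discovered:

The promise of deep learning methods for computer vision.

How to prepare image data for modeling by scaling pixel values.

How to construct a convolutional neural network using a convolutional layer, pooling layer, and fully connected output layer.

How to use a pre-trained model to classify photographs of objects.

How to train and evaluate a convolutional neural network for image classification.

How to use image augmentation to create modified versions of training images.

How to use a convolutional neural network for face detection.

This is just the beginning of your journey with deep learning for computer vision.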

Keep practicing and developing your skills.

Take the next step and check out my book on deep learning for computer vision.

How Did You Do With the Mini-Course?

Did you enjoy this crash course?

Do you have any questions?

Were there any sticking points?

Let me know.

Leave a comment below.
