Microsoft Introduction to AI — Part 3
Christine Calo
Apr 20

Computer Vision

In order for a machine to actually view the world as we do, it relies on computer vision.
Computer vision is a form of artificial intelligence where computers can “see” the world, analyse visual data and then gain an understanding about the environment and situation.
You probably don’t realise it but you use computer vision every day.
We use it to unlock apps with facial and fingerprint recognition.
Our homes are fortified with security surveillance systems that use computer vision.
Autonomous cars and drones use it to avoid obstructions.
Computer Vision is a very interesting field of AI and its future is full of many amazing opportunities.
This is Part 3 of the ‘Microsoft Introduction to Artificial Intelligence’ course notes.
Let’s look into the wondrous world of computer vision.
Let’s learn how software can be used to process images and video to understand the world the way that we do.
Background
(Skip the Background info if you have seen Part 1 or 2.)
For those who haven’t seen Part 1 or 2 of this series, here is some background info.
I’ve always wanted to learn about Artificial Intelligence (AI) but felt a little intimidated by the maths involved and thought some of the concepts might be out of my depth.
Fortunately, my curiosity overcame my fear so I’ve started to do a few courses related to AI.
I recently completed the Microsoft Introduction to AI course and wrote course notes to help me retain the knowledge that I have learnt.
I have tried to write these notes in a basic way to make it easy to consume.
I’ve recently become an aunt and have bought a few children’s books related to technology and space and really love how the authors and illustrators have managed to simplify complicated topics.
So, I’ve been inspired to treat these topics in a similar way by simplifying them to make it a lot more accessible, especially to those who share my initial AI jitters.
*If you would like to know more about the course notes and other notes related to tech and product design, you can find out more here.*

Summary
(Skip the Summary if you have seen Part 1 or 2.)
The Microsoft Introduction to AI course provides an overview of AI and explores the machine learning principles that provide the foundation for AI.
From the course you can discover the fundamental techniques that you can use to integrate AI capabilities into your apps.
Learn how software can be used to process, analyse and extract meaning from natural language.
Find out how software processes images and video to understand the world the way humans do.
Learn about how to build intelligent bots that enable conversations between humans and AI systems.
Microsoft Introduction to Artificial Intelligence Course
The course takes approximately 1 month to complete, so each Medium article I write contains 1 week’s worth of content.
This means it would only take you approximately half an hour to read this article, which covers 1 week’s worth of content.
That is a fast way of learning.
The course is free without a certificate however, if you’d like a certificate as proof of completion there is a fee.
There are labs associated with this course which I won’t include in the notes as I believe the best way to learn is to actually do the labs.
However, these notes are useful if you’d like to know the fundamental theory behind AI and would like to learn it in a way that might be a lot simpler than other resources.
I’ve tried to write them in layman’s terms and have included visuals to help illustrate the ideas.
These notes are useful if you don’t have time to do the course, it’s a quick way to skim through the core concepts.
Alternatively if you have done the course like me you can use these notes to retain what you have learnt.
Instructor: Graeme Malcolm — Senior Content Developer at Microsoft Learning Experiences.
Syllabus
(Skip the Syllabus if you have seen Part 1 or 2.)
The course is broken into four parts:
1. Machine Learning: Learn about the fundamentals of AI and machine learning.
2. Language and Communication: Learn how software can be used to process, analyse and extract meaning from natural language.
3. Computer Vision (*this article focuses on just this section): Learn how software can be used to process images and video to understand the world the way that we do.
4. Conversation as a Platform: Find out how to build intelligent bots that enable conversational communication between humans and AI systems.
Computer Vision
The ‘Computer Vision’ part of the course will tackle the following topics:

Getting Started with Image Processing
· Image Processing Basics
· Equalization
· Filters
· Edge Detection
· Corner Detection

Working with Images and Video
· Image Classification
· Image Analysis
· Face Detection and Recognition
· Video Basics
· The Video Indexer
· The Video Indexer API

Getting Started with Image Processing

Image Processing Basics
Before we can explore how to process and analyse images, we need to understand how a computer sees an image.
For example, here our friendly robot has a family portrait.
As a digital image this is actually represented as an array of numbers that indicate pixel intensities between 0 and 255.
Since this is a colour image it may be represented as a three-dimensional array with each dimension corresponding to the red, green and blue hues in the image.
The specifics depend on the format of the image and how you want to work with it.
For example, we could convert this image to grayscale.
When it’s converted to grayscale it can now be represented as a two-dimensional array.
Now there are various libraries for working with images in Python.
So let’s take a look at some simple code to load, display and convert an image.
Don’t worry, you don’t need to know about how to code in Python for this particular course.
The example shown is just to get an idea how we can use code to load, display and convert an image.
So here we are adding the %matplotlib inline command because we want to be able to display images inline with the cells in the notebook.
Then we’ve imported a bunch of libraries that we want to work with.
The really important ones for us are the matplotlib.pyplot library and the PIL library, which allow us to work with images.
What we’re going to do is just go and use curl to grab an image from this GitHub repository and download that.
Then we’ll simply open that and show it.
Notice we are opening it as an np array, a NumPy array.
So let’s go ahead and run that.
So we download the image and here it is an image of a beach.
Let’s take a look at some of the numerical properties of that.
First of all, what type of thing is that image? Well, it’s a numpy.ndarray, a multidimensional array.
So although we’re viewing it as an image, it’s obviously just represented as an array of values.
Well, what type are the values in the array? They’re actually unsigned 8-bit integers.
So in other words values between 0 and 255.
The numbers that are in that array actually identify the intensity of the pixels at each point.
Let’s take a look at the shape of the image.
We’ve got a 433 by 650 rectangle, and we’ve actually got 3 of those.
433 by 650 is obviously the dimensions of the image.
There are 3 different rectangles because it’s a colour image:
one for the red, one for the green and one for the blue.
Now what we can do is convert this image to greyscale.
Once we’ve converted it to grey we’ll just show it as a greyscale image and we’ll look at the shape of that.
This time it’s still 433 by 650 however, now there’s only one rectangle.
This is because we’re now showing the image as intensities of grey.
We don’t need the RGB layers.
So we are able to make it into a single channel image by converting it to grey scale.
Quite often when you’re doing some sort of image analysis, this is a good way to simplify the process by getting rid of the multiple channels and focusing on the one grey scale channel.
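The load-and-inspect steps above can be sketched with PIL and NumPy. This is a minimal sketch: instead of downloading the beach photo with curl, it builds a small image in memory, but the dtype and shape behave exactly the same way:

```python
import numpy as np
from PIL import Image

# Tiny in-memory RGB image, a stand-in for the downloaded beach photo
rgb = Image.new("RGB", (650, 433), color=(90, 160, 210))
arr = np.array(rgb)

print(type(arr))   # <class 'numpy.ndarray'>
print(arr.dtype)   # uint8: intensities between 0 and 255
print(arr.shape)   # (433, 650, 3): height, width, and the R, G, B layers

# Converting to grayscale collapses the three colour layers into one
grey = np.array(rgb.convert("L"))
print(grey.shape)  # (433, 650): a two-dimensional array
```

The single grayscale channel is what the later edge and corner examples in this section operate on.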
Equalization
Now, because an image is actually an array of numeric values, the distribution of these values can tell you something about the range of colours or shades the image contains or the objects that are in it.
You can plot these as a histogram to see the distribution of values.
You can also use a cumulative distribution function (CDF) to see how the pixel values accumulate.
Each point on the y axis shows the frequency of all the values up to and including the value on the x axis.
Now for image processing it’s usually good to have a relatively even distribution of pixel values.
An uneven distribution can indicate there are extremes of contrast in the image that need to be equalised.
So you can normalise an image to equalise pixel distribution.
The CDF for a perfectly equalised image will show a straight diagonal line.
Let’s take a look at examples of normalising an image using Python.
The first thing we’re going to do in the Jupyter notebook is create a histogram with 256 bins, one for each possible pixel intensity value.
We’ll create a histogram so that we can see what the distribution of those values looks like.
When we run this code we’ll see the distribution of those values in the form of a histogram.
The histogram shows what the distribution of the different kind of shades of grey in that image look like.
Another thing we can do is we can use this cumulative distribution function (CDF).
This basically shows the distribution for all the values up to this point.
We just go ahead and run that.
We’ll get the following graph.
In a perfectly normalised image the CDF would be a perfect straight diagonal line.
As you can see it’s got some curves at the ends and we have a fairly uneven distribution of values.
What we might want to do in order to simplify the processing of that is equalise it.
The unevenness at the moment might indicate that there are some issues with contrast in the image so we would want to equalise out the contrast.
So to do that we’ve got this function below.
So let’s go ahead and run that.
As we can see the equalised image is a bit more even in tone.
We can also view the histogram and CDF plots.
Let’s go ahead and run that.
Histogram
Cumulative Distribution Function (CDF)
For the histogram we can still see some spikes, but it’s much more level in the middle, so there’s a more equal distribution of values across the range.
We have a more or less diagonal straight line for our CDF.
So we’ve certainly equalised out the values in the histogram and that may well make it easier for us to work with those values to try and extract some features.
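The equalisation step can be sketched in plain NumPy. This is a minimal sketch on a synthetic low-contrast image rather than the notebook’s photo; it uses the CDF itself as the remapping lookup table:

```python
import numpy as np

# Synthetic grayscale image with poor contrast: values bunched into 100..150
rng = np.random.RandomState(0)
img = rng.randint(100, 151, size=(50, 50)).astype(np.uint8)

# Histogram with 256 bins, one per possible pixel intensity
hist, _ = np.histogram(img.flatten(), bins=256, range=(0, 256))

# The CDF accumulates the histogram; normalised to 0..255 it becomes a
# lookup table that spreads the pixel values across the full range
cdf = hist.cumsum()
lookup = (cdf - cdf.min()) * 255 / (cdf.max() - cdf.min())
equalized = lookup[img].astype(np.uint8)

print(img.min(), img.max())              # narrow range before
print(equalized.min(), equalized.max())  # stretched towards 0..255 after
```

After this remapping, a histogram of `equalized` is far more even and its CDF is close to the straight diagonal line described above.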
Filters
Sometimes the images you need to work with may contain noise.
This is often referred to as salt-and-pepper noise or scatter.
This can make it difficult to detect features in the image because the noise obscures the details.
You can de-noise an image by applying filters.
For example a Gaussian filter works by defining a patch of the image and determining the intensity value for the center pixel based on the weighted average of the pixels that surround it.
The closer pixels have greater weight than the more distant pixels.
So the average value is assigned to the center pixel and then the patch is moved.
The process is repeated until the entire image has been processed.
Gaussian filters often produce a blurring effect.
This is because it averages out the pixel intensity, even in areas of the image where there are contrasting shades that define edges or corners.
So as an alternative, a median filter works in the same way as a Gaussian filter, except that it assigns the median value of the patch to the center pixel.
This approach can be better for removing small areas of noise in detailed images as it tends to keep the pixel values that are in the same area of the image alike.
This is regardless of how close they are to a contrasting area.
So let’s try this out with code.
We can add some noise to an image.
So this code will basically generate some random Gaussian noise over the image.
We might want to apply a filter to that in order to clarify the image and clear out some of the noise.
One approach we might take is applying a Gaussian filter.
This basically applies an average to a pixel as we move a mask across the image.
So in this case we’re going to create a mask that’s ten pixels in size and we’ll go and run that mask across.
We might want to take something that’s a little less dramatic than a Gaussian filter and apply something like a median filter where rather than taking the average, we’re taking the median value.
This may help clean up the image and de-noise all the speckles.
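The difference between the two filters shows up even on a tiny example. This sketch uses SciPy’s ndimage filters as stand-ins for the notebook’s own masking code:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter

# Flat grey patch with a single salt-and-pepper speckle in the middle
img = np.full((9, 9), 100.0)
img[4, 4] = 255.0

blurred = gaussian_filter(img, sigma=1)  # weighted average of surrounding pixels
cleaned = median_filter(img, size=3)     # median of each 3x3 patch

print(blurred[4, 4])  # reduced, but the speckle bleeds into its neighbours
print(cleaned[4, 4])  # 100.0: the outlier disappears completely
```

The Gaussian result leaves a faint blur around the noise pixel, while the median filter removes the outlier outright, which is why the median filter suits small areas of noise in detailed images.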
Edge Detection
After you’ve prepared an image you can extract features from it using a variety of techniques.
So for example you could use an edge detection algorithm to find contrasting edges of colour or intensity.
Edge detection algorithms are great for indicating the boundaries of shapes or objects in the image.
So for example the Sobel Edge Detection Algorithm works by passing a mask over the image, similar to the Gaussian and median filters that you saw previously.
However, this time the matrix that we apply is a two-stage masking function that is applied to the pixel intensity values in the image to calculate a gradient value based on changes in intensity.
This mask is applied to find the horizontal gradient for each pixel.
Let’s call it Gx.
Then this mask is applied to detect the vertical gradients which we’ll call Gy.
We then add the squares of the x and y gradient values that are calculated for each pixel and take the square root to determine the gradient magnitude, G = √(Gx² + Gy²).
We can also calculate the inverse tangent of those values, θ = tan⁻¹(Gy/Gx), to determine the angle of the edges that have been detected.
So here’s some Python code.
Here we’re creating a function called edge_sobel which passes through the image.
All we’re doing with this is, first we are converting an image to grayscale.
The Sobel edge detection only really works properly with single-channel greyscale images, so we’ll convert it to grayscale just in case.
Then we’re going to take our horizontal pass through and get the horizontal edges.
Then we will do the vertical pass through.
So remember, there are two passes to get the x and y gradients of the edges.
Then we apply this hypotenuse function here to basically add the squares of those and take the square root to get the actual magnitude of the different edges that are found.
Then we’re going to apply this function here to normalise that.
We then return that back and display those edges in our image.
When you run that code it will go through the image and detect reasonably sophisticated edges just by applying that particular Sobel algorithm.
So we’re able to work with the fact that the image is really just numbers.
From these numbers we’re able to go and grab these features and extract them so that we can work with them in the image.
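The two-pass Sobel operation can be sketched directly with NumPy and SciPy’s convolve. This is a minimal sketch on a synthetic image with one sharp vertical edge, not the course’s edge_sobel function:

```python
import numpy as np
from scipy.ndimage import convolve

# Sobel masks: kx responds to horizontal change, ky (its transpose) to vertical
kx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]], dtype=float)
ky = kx.T

# Grayscale image with a sharp vertical edge down the middle
img = np.zeros((8, 8))
img[:, 4:] = 255.0

gx = convolve(img, kx)        # horizontal gradient pass (Gx)
gy = convolve(img, ky)        # vertical gradient pass (Gy)
magnitude = np.hypot(gx, gy)  # sqrt(Gx**2 + Gy**2) per pixel
angle = np.arctan2(gy, gx)    # edge direction per pixel

print(magnitude[4, 4] > 0)    # True: strong response on the edge
print(magnitude[4, 1])        # 0.0: flat region, no edge
```

`np.hypot` is the same hypotenuse function the notebook applies to combine the two passes.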
Corner Detection
Edge detectors use gradients to find contrast in pixel intensity when a mask is moved horizontally or vertically.
However to detect corners in an image you need to detect contrast in intensity in any direction.
The Harris corner detector algorithm works by testing patches of the image.
The patch is moved in multiple directions and the pixel intensity is compared for each position.
Now in a featureless area of the image there’s no significant difference so no corner is detected as shown in the image below.
Now let’s try the patch in an area that contains an edge.
In this case there is a difference in intensity when the patch is moved in one direction but not when moved along the edge.
So again there’s no corner here.
Now let’s position the patch in an area containing a corner.
A movement in any direction results in a change of intensity.
So this looks like it might be a corner.
Now the formula that underlies the Harris algorithm is a little involved; it uses a little calculus and some matrix math.
However, there are implementations of this algorithm in many languages including Python.
Let’s try this out with some Python code.
What we’ve got is a function that we’ve defined called corner_harr and we’re going to specify an image and a minimum distance.
Really, the important things here come from the skimage library.
From here we’re importing the corner_harris which obviously applies the algorithm for finding corners and also something called corner_peaks.
We’re going to use that just to filter out all of the corners of lower magnitude so we’re left with the more prominent corners that are identified in our image.
So we’ll go ahead and work our way through the image, then filter the detected corners based on the minimum distance that we’ve specified.
Now what we’re going to do here is just simply pass in our equalised image.
Remember we learnt how to equalise an image earlier on.
So we’ll pass in our equalised image and then we’re going to plot the results.
We get back that Harris result.
We’re basically going to plot the image and we’re going plot the corners that were detected by the Harris algorithm.
So we’ll just plot those as little red markers so that we can see where the corners are.
When you run that, you will see that it comes back with the most prominent corners.
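The corner_harris/corner_peaks pair can be tried on a synthetic image too. This sketch follows the same pattern as the course’s corner_harr function, using a white square whose four corners are the only strong responses:

```python
import numpy as np
from skimage.feature import corner_harris, corner_peaks

# Synthetic image: a white square on a black background, so the only
# strong Harris responses are at the square's four corners
square = np.zeros((10, 10))
square[2:8, 2:8] = 1.0

response = corner_harris(square)                  # Harris response per pixel
corners = corner_peaks(response, min_distance=1)  # keep only the prominent peaks

print(len(corners))  # one detection per corner of the square
print(corners)       # (row, column) coordinates of each corner
```

On a real photograph, as in the course notebook, these coordinates are what get plotted as the little red markers.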
So again we’ve seen that because our image is actually just an array of numbers we were able to apply these mathematical formulae and algorithms to extract features and discern some sort of meaning from the images.
That’s really what AI technologies are going to do when they’re processing images.
Working with Images and Video

Image Classification
Well, we’ve looked at various ways to process images and extract features from them.
Hopefully you’ve seen that to a computer, images are just numeric arrays on which mathematical and statistical operations can be performed.
Now that means that we can apply similar machine learning techniques to images just as we do to any other data.
So for example we can train a classification model by tagging images with a label like these images of cars and these images of rockets shown in the image below.
Then we can use the classification model to predict the appropriate tags for new images.
The Microsoft Custom Vision cognitive service provides a framework for building custom image classifiers, so you can build AI solutions that identify images of objects that are important to you.
So to take a look at custom image classification, we are going to use the custom vision API.
Let’s imagine that we are working for a very specialised grocery company that sells only carrots and apples.
What we really want to be able to do is to automatically detect whether an object is a carrot or an apple.
So let’s take a look at how we might do that.
We can use a classification approach when we’re looking at these from an AI program’s perspective.
So the first thing we’re going to do is go to the Custom Vision service (customvision.ai).
This is a service we can sign up for and it’s available through the Microsoft cognitive services.
We can create projects for our own custom image classification tasks.
So we’re going to create a new project here and we’ll just call it fruit and veg.
There are various existing domains or subject areas if you like and this obviously falls in the category of food so we’ll go with food.
We’ll create a project.
Once I’ve created the project the first thing it’s obviously going to need is some images.
We need to train our classification model using some existing images for which we already know what the appropriate tags are.
So we’re going to go and add some images.
Here we’ve got some pictures of carrots so we’ll go ahead and grab these carrots and upload them.
We’ll add a tag ‘carrot’ and upload those files.
That’s going to add them to the project and assign them the tag carrot so that we know that these are our pictures of carrots.
Alright, well that’s one set of one class of images that we’ve got.
Now we’ll add some other images of apples.
We will just tag those as well.
What we want to do is to train our classification model using these images.
Now there aren’t very many images here.
Obviously, the more images you add the more discerning your model will be but this should be enough to get going.
So go ahead and kick off our training.
It’s going to train those and we’re going to withhold some images to do some testing with.
We end up with a pretty impressive performance there of 100%. That’s really because there’s a very small set of images with very little to tell between them; you would get more realistic precision and recall figures depending on the number of images that you upload.
Great, we’ve now got our model trained.
We will set this as being the default model for this particular project.
We’re going to get the information that we’ll need from a client application to actually call this classification model and use it.
There’s a couple of ways you can do this, you can upload an image or you can specify a URL to an image for it to test.
We’re just going to use a URL so we need to have the prediction key, that’s the key from our project.
That’s what authenticates us.
We need to have the URL from our project.
Shown below it gives us the entire URL.
Now let’s look at some Python code.
Basically we’re importing the PIL library and the matplotlib.pyplot library just so we can display the image and so on.
We’re importing some HTTP libraries here so we can make our HTTP request.
We’re going to be passing some JSON in that request so we’ve got the JSON library, so everything we need to call that API just by making the request.
Here’s the URL of the image that we wanted to classify so it’s basically called test.
We’re going to test that and find out whether that is an apple or a carrot.
So we’ve got our prediction key already in here, that’s the key from our account.
We don’t need to change that, what we do need to change is this project ID in here so we’ll change that so that’s the project we’ve just created.
Basically what we’re going to do here (conn.request) is set the headers of our HTTP request so that the content type is application/json.
We’re passing up a JSON request and here’s the prediction key so that’s our key for our prediction service.
There aren’t any parameters but there is a body and our body just consists of some JSON that says the URL and then whatever the image URL is so the URL to our test image.
Then what we’re going to do is going to create this HTTP connection to the cognitive services.
We’re going to post this up to the custom vision version 1 prediction URL here.
We’re going to pass in our project ID and there’s a URL on the end of that.
Then basically just the body and the parameters in the headers all gets passed off as one HTTP request.
We get the response from that.
That response is going to be JSON, so we’ll load it into a parsed JSON document.
Now what’s actually going to happen is it’s going to come back with all of the possible tags that could be applied to the image.
For each of the tags, it will indicate the probability that it thinks it’s the right tag.
So we’ll get a list of tags back because the images might have multiple tags associated with them.
What we want to do is find the most probable tag, the one with the highest probability of being correct, so we’re just going to sort those predictions in order of probability.
Then we’re going to go and display the image and the tag for that image above it.
So let’s go ahead and run it.
It predicts that the right tag for this is an ‘apple’ and sure enough here is our image and that looks pretty much like an apple.
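The request we just walked through can be sketched like this. The endpoint host and path follow the course-era Custom Vision v1.0 Prediction API and may differ for newer versions; the key, project ID and image URL are placeholders you would substitute with your own:

```python
import http.client
import json

prediction_key = "<your-prediction-key>"    # placeholder: key from your account
project_id = "<your-project-id>"            # placeholder: the fruit-and-veg project
image_url = "https://example.com/test.jpg"  # placeholder: the image to classify

headers = {
    "Content-Type": "application/json",   # we're passing up a JSON request
    "Prediction-Key": prediction_key,     # authenticates us
}
body = json.dumps({"Url": image_url})

# The call itself needs a live Custom Vision project, so it is left commented:
# conn = http.client.HTTPSConnection("southcentralus.api.cognitive.microsoft.com")
# conn.request("POST", "/customvision/v1.0/Prediction/%s/url" % project_id,
#              body, headers)
# result = json.loads(conn.getresponse().read())
# best = max(result["Predictions"], key=lambda p: p["Probability"])  # top tag

print(body)  # the JSON payload that would be posted
```

Sorting or taking the maximum over the returned probabilities is how the notebook picks the most probable tag.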
So we’ve built a custom classifier for images that works very specifically with the images that we need to work with.
We’re able to use that to automate the detection and identification of different images that our AI application sees.
Image Analysis
Sometimes you might need a more general solution that can not only differentiate between specific classes of object but begin to make sense of what’s in an image.
You might need to be able to identify everyday objects of various types and maybe describe more complex scenes or possibly even read text in images.
The Computer Vision API is a cognitive service that’s been trained using millions of images and has built-in optical character recognition capabilities.
Let’s take a look at it.
Now to use the vision API you’re going to have to add the service in your Azure subscription.
Once you’ve done that you should have some access keys that you can use to connect to it from a client application.
Now we can go and write some code.
We’ve set up the URL.
This is location-specific, so depending on where you provision the service that URL might be different; then there’s the key for your specific service.
So we’ll go ahead and run that.
We’ve got those variables populated ready to use them.
Now the first thing we’ll do is grab an image file and display it.
We run this code below in order to do this.
What we’re going to do is we’re going to use the computer vision API to go and grab some features about that image.
So, like the custom classifier that we built, where we were able to grab some tags, we’re going to grab some tags from this now.
We’ll do this with the code below.
We haven’t trained this one; it has been trained by Microsoft using millions and millions of images, so there’s a whole lot of information in here.
With the code above, what we’re going to do is set up our headers with an application/json content type so it will pass up some JSON.
We’ll pass up the key so that we are authenticated.
Then we’ve got some parameters here and basically what we’re saying is these are the things we want you to return.
We want you to return the categories, the description and some information about the colour.
There are a whole bunch of other things we could specify, but what we really want to do is categorise the image and get some descriptive information about it, such as the colour.
We will bring those back as details described in the English language.
Then the body is just simply the URL that points to the image that we want to analyse.
So we’re going to make an HTTP POST request to the vision v1.0 analyze method, passing in those bits of information, and we’ll get back the response.
That response that comes back is JSON so we’ll go ahead and simply read that JSON.
Once we’ve got that, we’re able to go and get the image features for our image URL.
That’s what this function is doing.
Then we’re just going to display a description that it’s recommended.
So let’s go ahead and run that, and what comes back is the following: ‘A close up of an umbrella’.
We can actually see the full response that came back and view further details with the code below.
This is what it returns. So Microsoft has trained this service using millions and millions of images, to give you some fairly accurate and useful bits of information back just by analysing the image against that API.
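The analyze call can be sketched with the standard library. The host shown is one possible region (the endpoint is location-specific), and the key and image URL are placeholders:

```python
import json
from urllib import parse, request

vision_base = "https://westus.api.cognitive.microsoft.com/vision/v1.0"  # region-specific
vision_key = "<your-computer-vision-key>"  # placeholder: key from your Azure service

# Ask for categories, a description and colour info, described in English
params = parse.urlencode({
    "visualFeatures": "Categories,Description,Color",
    "language": "en",
})
headers = {
    "Content-Type": "application/json",
    "Ocp-Apim-Subscription-Key": vision_key,
}
body = json.dumps({"url": "https://example.com/umbrella.jpg"}).encode()  # placeholder

# The POST itself needs a provisioned service, so it is left commented:
# req = request.Request(vision_base + "/analyze?" + params, data=body,
#                       headers=headers, method="POST")
# analysis = json.load(request.urlopen(req))
# print(analysis["description"]["captions"][0]["text"])  # e.g. the caption text

print(params)
```

The parsed JSON response carries the categories, description captions and colour details requested in `visualFeatures`.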
Face Detection and Recognition
As humans we are genetically predisposed to detect faces.
That is why so often we can imagine we see faces in random things such as clouds or the moon.
As we collaborate with AI software, we can use live cameras and images to enable computers to see the world around them.
Face detection, the ability to determine whether or not a human face is present in an image, is an important capability for AI.
Especially if this can be combined with face recognition to match the same person in multiple images or to identify specific individuals.
The Face API is a cognitive service that’s designed to work with faces in images.
Well the first thing we need to do is to provision the face API.
Once you have set that up you should have the keys that our client applications require to connect.
We’re now going to write some code.
The first thing we need to do is specify the URI and the key that’s required to connect to the service.
We’ll just get those variables initialised.
Now, for the Face API there is a cognitive face SDK, available as a Python package, which acts as a wrapper around the service.
We’re also going to install the Pillow package which again is used to work with images.
So we’ll run those.
So now what we want to do is take an image and see if we can detect a face in that image.
We can do that with the code below.
With the code above, what we are going to do first is a bunch of imports.
We’ve imported that cognitive face API here as CF.
There’s a bunch of functions we can use on that CF library to make life easier.
So we’re going to set the base URL to our URI from our specific service and set the key to our key.
That means that we’re ready to go in and make calls to our own instance of that service.
Now we’re going to go and grab an image of a face from a URL.
We’re going to grab that and then call the face detect method, passing in that URL, to see if we get back any information about faces.
Now if there are one or more faces detected in that image we’ll get back a collection of these face IDs.
This can of course detect multiple faces in an image and we could actually process all of the faces in there, but in this example we are only really concerned with the first one.
So we’re going to get the ID and it will assign an ID to that face.
We’ll then just go and take a look at what that ID looks like.
Then we’re going to grab the image itself and open it up and display it.
For each of the faces that’s found (in this case we are only concerned with the first one), we’re going to go and find the location of the face in the image.
Then we’re going to basically draw a blue rectangle around the face.
What we get back is the upper-left coordinate of the face and the width and height so we can work with that to draw a rectangle around it.
We’ll then just display that face.
So let’s go ahead and run that code.
Here’s the image that it comes back with and sure enough we’ve got a blue rectangle around the face that it has detected.
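Drawing that rectangle relies on simple arithmetic over the faceRectangle that detect returns, which gives the top-left coordinate plus width and height. A small helper (the sample values are made up) makes that explicit:

```python
def rect_corners(face_rect):
    """Convert a Face API faceRectangle into the two corner points of a box."""
    left, top = face_rect["left"], face_rect["top"]
    right = left + face_rect["width"]    # right edge = left + width
    bottom = top + face_rect["height"]   # bottom edge = top + height
    return (left, top), (right, bottom)

# Sample payload shaped like a detect() faceRectangle (values are made up)
sample = {"left": 120, "top": 80, "width": 60, "height": 60}
print(rect_corners(sample))  # ((120, 80), (180, 140))
```

Those two corner points are exactly what a drawing library needs to place the blue rectangle around the detected face.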
Alright, what if we’ve got that image and we want to compare it with another image?
Maybe we are implementing some sort of automatic face detection system, we could be using this for security for getting into a building.
We want the AI solution in our building to recognise this person and compare it to another image that has been recognised and see if it’s the same person.
The code below does just that.
With the code above, what we are going to do is grab another image and just open that.
We’re going to do a face detect on that image, doing exactly what we did before.
We’ll say here’s an image, tell me is there a face and if there is a face then tell me where it is.
More importantly we will be given an ID so what we get back is the ID for the second face.
We’ve already got the ID for the first face.
So we’ll get the ID for the second face.
We’ve got this function to verify the face.
We’ll pass in the two face IDs and use the verify method from the Face API to see if the face with the first ID is the same person as the face with the second.
It can check to see if it’s the same person and if it is we will draw a face with a rectangle around it.
We’ll use a red rectangle if it’s not the same person and a green rectangle if it is the same person.
There’s a confidence level that’s associated with this that tells us how confident it is in that match.
So we can go ahead and run that code.
We can see it has come back with a green rectangle around this image as an indication that it’s the same person as the other image.
It also has a 91% confidence level.
If the image was not a match to the previous image the confidence level would be a lot lower and the rectangle would be the colour red.
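The red-or-green decision can be sketched as a small helper over the verify result, which carries an isIdentical flag and a confidence score. The 0.5 threshold here is an illustrative assumption, not a value from the course:

```python
def box_colour(verify_result, threshold=0.5):
    """Pick the rectangle colour from a Face API verify result."""
    same = verify_result["isIdentical"] and verify_result["confidence"] >= threshold
    return "green" if same else "red"

# Shaped like the 91% match described above (the values are made up)
print(box_colour({"isIdentical": True, "confidence": 0.91}))   # green
print(box_colour({"isIdentical": False, "confidence": 0.18}))  # red
```

Combined with the rectangle arithmetic from the detect step, this is the whole same-person check the notebook performs.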
This is a really useful capability for AI.
When we deal with people, we quite often recognise them: we know who they are, and we can tell if we’ve met them before.
This is just a way of extending that to AI and giving AI the ability to first of all identify faces and secondly to match them to other images or other recollections of those faces.
Video Basics
So up till now we’ve considered how to work with static images and there’s a lot that we can do with that.
However the world is in motion and video is becoming a more prevalent media format on the web and in general.
Here’s a short video of a guy walking.
Now in reality this video is really just a stream of static images known as frames.
They’re encoded with a format-specific codec and encapsulated in a container, which also includes a header with metadata about the format, duration and so on.
Now, to take a look at working with videos, we’re going to run some very simple Python code that grabs a video, plays it, and takes a look at the frames in it.
First of all, let’s install a library called AV.
So we get that installed and with that package installed we can now start to do some stuff with videos.
!conda install av -c conda-forge -y
Now we’re going to grab a little video file and play it in the notebook.
Let’s take a look at the frames in the video.
We’re going to count the frames and show the 25th frame using the code below.
With the code above, we open up the video, go into the container and grab the video stream, which gives us the frames.
We’re just going to go through each frame and check to see if the index is number 25.
So in other words we find the 25th frame and convert it to an image to see what that frame looks like.
It just keeps counting through the frames, and once we get to the end of the list, the counter (adjusted for the fact that frame indexes start at zero) tells us how many frames were in the video.
This works for a small video, but decoding every frame is not the most efficient way to count frames in larger videos.
When I run that it’s able to run through it and it finds 111 frames in the video and this is what the 25th frame looks like.
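A sketch of that frame-counting loop with PyAV (the file name is a placeholder) might look like this; the counting itself is split into a plain helper so the logic is easy to follow:

```python
def count_and_grab(frames, target_index=25):
    """Count the frames in an iterable and keep hold of the one at target_index."""
    target = None
    count = 0
    for index, frame in enumerate(frames):
        if index == target_index:
            target = frame
        count = index + 1  # indexes start at zero, so the count is the last index + 1
    return count, target

def analyse_video(path):
    """Open a video with PyAV, count its frames and return the 25th as a PIL image."""
    import av  # imported here so count_and_grab works without PyAV installed
    container = av.open(path)
    count, frame = count_and_grab(container.decode(video=0))
    return count, frame.to_image() if frame is not None else None
```

For the sample clip used here, this reports 111 frames and hands back the 25th frame as an image.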
This process is going to be the basis of a lot of the work we’re going to do with videos.
The Video Indexer
The video indexer is a cognitive service that brings together many of the AI capabilities we’ve seen so far in this course.
It’s a great example of how AI can help automate time-consuming tasks like compiling and editing the metadata for video files.
So here is what the video indexer UI looks like.
We’ve logged into the service and have already uploaded a video.
This video is just an extract from one of the other Microsoft courses; it hasn’t been altered, just uploaded.
If we go ahead and play it from here, we can actually see some insights appearing.
Over on the right-hand side you can see that there are the people in here.
It’s identified two different faces.
So we’ll just go and tell it the correct names.
It’s also picked up some keywords.
It’s picked up that this is from a course about statistics and it’s picked up the keyword “statistical underlying methodologies”.
It’s picked up some sentiment, it’s positive about 53 percent of the time and neutral about 47 percent of the time.
So it’s a reasonably positive video.
We can also jump straight to the relevant bit of the video using these keywords, ‘statistical underlying methodologies’, or the bit about the ‘stats course’.
We can also see that it’s automatically generated a transcript for that video.
It’s generated the transcript in English, but we could go in and change that to French.
We can also search for a word for example ‘data’.
It will try and find the locations where data appears in the transcripts and jump to that bit of the video automatically.
We can go and edit the insights.
For example we can edit the transcript from “it’s not a fool on stats course” to “it’s not a full-on stats course”.
We can make changes and edit on the fly and publish the video, make it available with all these insights.
This will help people navigate through and find things in a video.
All of this is available both in this user interface and also as an API.
So we could write a piece of code that uploads a video, automatically gets these insights generated and then downloads the insights that get generated.
It’s a pretty impressive use of artificial intelligence: it analyses video and audio, does some text analytics, and does all sorts of interesting things, making life easier for somebody who needs to manage lots of video files.
The Video Indexer API
Videos are not particularly interesting unless they show some movement.
For example, imagine a video camera used to monitor an endangered species or a secure location.
AI can help you find the frames where movement occurs instead of having to watch the entire video.
Maybe you want to be more specific and only highlight points in the video where a human face is visible or even identify the features of a person in the video.
Well to analyse our video we’re going to use the video indexer API.
So in our code we reference the account and account ID.
viLocation = 'trial'
viAccountID = 'YOUR_ACCOUNT_ID'
So we’ll run that code to set those variables.
The next thing we do is create a Video Indexer API Subscription.
viKey = 'YOUR_PRIMARY_KEY'
OK, well let’s now take a look at the process of connecting and using that API.
First thing we need to do is get an access token.
Now there are various different levels of access token and the first thing we need is an account level access token so that we can connect to our video indexer account.
So we’ve got some code here to do that.
With the code above, we make an HTTP request to connect to the video indexer API.
We pass in our key, along with the location and account ID that we want an access token for.
We should get back that access token as a JSON document.
If we run that code, it comes back with the JSON.
Here’s the value that we extracted from it.
It’s just an access token, a big, long token that we use to make an authenticated connection to the API.
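The token request is a simple GET against the Video Indexer auth endpoint; a sketch (assuming the standard `api.videoindexer.ai` URL shape) looks like this:

```python
def account_token_url(location, account_id):
    """Build the URL for requesting an account-level access token."""
    return f'https://api.videoindexer.ai/Auth/{location}/Accounts/{account_id}/AccessToken'

def get_account_access_token(location, account_id, key):
    """Request an account-level access token; the response body is the token itself."""
    import requests  # imported here so the URL helper has no dependencies
    response = requests.get(account_token_url(location, account_id),
                            params={'allowEdit': 'true'},
                            headers={'Ocp-Apim-Subscription-Key': key})
    response.raise_for_status()
    return response.json()  # a big, long token string
```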
Well now that we’ve got that we can actually start to use the video indexer API.
What we’re going to do is upload a video for processing. When we run the code, we get a response with some metadata that’s been set for that video.
Now one of the things that’s in there is the video ID.
You could copy the video ID out of that response by hand, but that’s tedious, so instead you can run some code to get it.
This code goes and parses the JSON and grabs the ID.
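A sketch of the upload call and the ID extraction (the video name and URL are placeholders, and the response shape assumes the Video Indexer’s JSON metadata):

```python
def extract_video_id(metadata):
    """Parse the video ID out of the upload response's JSON metadata."""
    return metadata['id']

def upload_video(location, account_id, access_token, video_url, name='My video'):
    """Upload a video to the Video Indexer by URL and return its video ID."""
    import requests  # imported here so extract_video_id has no dependencies
    url = f'https://api.videoindexer.ai/{location}/Accounts/{account_id}/Videos'
    response = requests.post(url, params={'accessToken': access_token,
                                          'name': name,
                                          'videoUrl': video_url})
    response.raise_for_status()
    return extract_video_id(response.json())
```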
videoID: a6c7b509ad
Now, after uploading a video, the processing takes a bit of time.
So you might just want to check the status of what’s happening with that video as it’s been uploaded.
The code below allows you to check the status.
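A status check can be sketched as a GET against the video’s index endpoint (again assuming the standard URL shape):

```python
def index_url(location, account_id, video_id):
    """Build the URL for a video's index (breakdown) resource."""
    return f'https://api.videoindexer.ai/{location}/Accounts/{account_id}/Videos/{video_id}/Index'

def get_processing_state(location, account_id, video_id, access_token):
    """Return the video's processing state, e.g. 'Processing' or 'Processed'."""
    import requests  # imported here so index_url has no dependencies
    response = requests.get(index_url(location, account_id, video_id),
                            params={'accessToken': access_token})
    response.raise_for_status()
    return response.json().get('state')
```

You could call `get_processing_state` in a loop, sleeping between calls, until the state is no longer ‘Processing’.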
State: Processing
Once the video is finished processing, we’re now ready to connect and start getting some insights.
However, in order to do that we need a different level of access token.
We need an access token at the video level and we can get that with the code below.
So there’s the video access token, same idea as the last time, a big, long security token.
We use that token to view the video.
We can bring elements of the user interface from the video indexer website and go and embed that into our own application just using some code like this.
We’re going to get that URL, passing in the various details for our specific video and token.
Then we’re going to display that in an iframe in this notebook.
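Building that URL and dropping it into an iframe can be sketched like this (the exact embed URL format is an assumption based on the player the Video Indexer website uses):

```python
def player_embed_url(account_id, video_id, video_access_token):
    """Build the embeddable player URL for a video."""
    return (f'https://www.videoindexer.ai/embed/player/{account_id}/{video_id}'
            f'?accessToken={video_access_token}')

def show_player(account_id, video_id, video_access_token):
    """Display the player in the notebook as an iframe."""
    from IPython.display import IFrame, display  # notebook-only dependency
    display(IFrame(player_embed_url(account_id, video_id, video_access_token),
                   width=800, height=600))
```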
So what we want to do is use the video indexer to get a breakdown of insights of what’s actually happening in this video with the code below.
What it returns is some insights as shown below.
We get some metadata about the video, and it looks like there is at least one face.
It says there’s an appearance of a face that starts at 8 seconds and ends at 14 seconds.
We don’t know who that person is: we’ve got a known-person ID, but it’s all zeros.
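The breakdown is just JSON, so pulling the face entries out of it can be sketched with a small helper (the structure assumed here matches the shape of the index response described above):

```python
def list_faces(index_json):
    """Collect the face entries from every video in an index (breakdown) response."""
    faces = []
    for video in index_json.get('videos', []):
        faces.extend(video.get('insights', {}).get('faces', []))
    return faces
```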
Let’s go and get details of the face identified in the video with the Python code below.
Let’s then go ahead and view that face as a thumbnail using this Python code.
As you can see it has returned the following thumbnail.
We can then view the insights of that thumbnail using the code below.
As we can see, the person is unknown; however, we know that this person is Graeme Malcolm, the instructor for this course.
We can edit this and change the unknown to Graeme Malcolm.
Since we’ve now trained the AI to know that this is Graeme Malcolm, we can test it again to see if it can identify that it’s Graeme Malcolm.
To do that, we need to reload the breakdown and check the updated face details.
We can do that with the code below.
As we can see in the metadata, it now identifies that this is Graeme Malcolm.
Final WordThanks for reading this article, Part 3 of the Microsoft Introduction to Artificial Intelligence course.
If you found this helpful then stay tuned for the last part, Part 4 which will go into detail about Conversation as a Platform.
Find out how to build intelligent bots that enable conversational communication between humans and AI systems.
Check back on my Medium account for the final part.
If you had some trouble with some of the concepts in this article (don’t worry, it took me a while for the information to sink in) and you need a bit more info, then enrol for free in the Microsoft Introduction to AI course.
It’s helpful to watch the course videos alongside these notes.
*If you would like to know a little background info behind the course notes and other notes related to tech and product design you can find out more through here.
*A little background
Hi, I’m Christine 🙂 I’m a product designer who’s been in the digital field for quite some time and has worked at many different companies; from large companies (as large as 84,000 employees), to mid-size, and to very small startups still making a name for themselves.
Despite having a lot of experience, I’m a product designer who fears suffering from the Dunning-Kruger effect, so I’m continuously trying to educate myself and always searching for more light.
I believe to be a great designer you need to constantly hone your skills especially if you are working in the digital space which is constantly in motion.
Image Source: Please note that I created the images in this article; however, some of the images are from Vecteezy and Freepik.