Build a Multi Digit Detector with Keras and OpenCV

OCR can be a good starting point for automatically detecting numbers, but OCR doesn’t always work, and sometimes we need to train a neural network for our specific task.

Digit detection pipeline

The digit detection problem can be divided into 2 parts:

1. Digits localization
2. Digits identification

Digits Localization:

An image can contain digits in any position, and for the digits to be detected we need to first find the regions which contain those digits.

The digits can have different sizes and backgrounds.

There are multiple ways to detect the location of digits.

We can use simple morphological image operations such as binarization, erosion and dilation to extract digit regions from the images.

However, these can become too specific to a given set of images because of tuning parameters such as the threshold and the kernel size.

We can also use more complex approaches such as unsupervised feature detectors or deep models.

Digits Identification:

The localized digit regions serve as inputs for the digit identification process.

The MNIST dataset is the canonical dataset for handwritten digit identification.

Most data scientists have experimented with this dataset.

It contains 60,000 handwritten digits for training and 10,000 for testing.

Some examples look like this:

Figure: MNIST images

However, the digits in real-life scenarios are generally very different.

They come in different colours and are generally printed, as in the cases below.

Figure: Day-to-day digit images

A bit of research leads us to one more public dataset: SVHN, the Street View House Numbers dataset.

The dataset consists of annotated house-number images gathered from Google Street View.

Sample images from SVHN are shown below:

Figure: SVHN images

This dataset has a variety of digit combinations against many backgrounds and should work better for training a generalized model.
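
If you work with the cropped-digit version of SVHN, two details of the file layout are easy to trip over: the images are stored as a (32, 32, 3, N) array, and the digit 0 is stored with label 10. The sketch below handles both. The helper name `prepare_svhn` is ours, and the stand-in arrays only mimic the layout; whether the repo used here relies on this version of the dataset is not something we have checked.

```python
import numpy as np


def prepare_svhn(X, y):
    """Convert SVHN cropped-digit arrays to (N, 32, 32, 3) images and 0-9 labels."""
    images = np.transpose(X, (3, 0, 1, 2))   # (32, 32, 3, N) -> (N, 32, 32, 3)
    labels = y.reshape(-1) % 10              # label 10 means digit 0
    return images, labels


# Stand-in arrays with the same layout as the .mat file; real data would come
# from something like scipy.io.loadmat("train_32x32.mat").
X = np.zeros((32, 32, 3, 4), dtype=np.uint8)
y = np.array([[1], [10], [9], [3]], dtype=np.uint8)
images, labels = prepare_svhn(X, y)
```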

Modelling in Keras

We chose this repo for implementing a multi-digit detector.

It is well written and easy to follow.

Digit localization is done using the Maximally Stable Extremal Regions (MSER) method, which serves as a stable feature detector.

MSER is mainly used for blob detection within images.

The blobs are connected sets of pixels whose outer boundary pixel intensities are higher (by a given threshold) than the inner boundary pixel intensities.

Such regions are said to be maximally stable if they do not change much over a range of intensity thresholds.

MSER has a light run-time complexity of O(n log(log(n))), where n is the total number of pixels in the image.

The algorithm is also robust to blur and scale.

This makes it a good candidate for extraction of text / digits.

To learn more about MSER, please check out this link.

Digit recognition is done using a CNN with convolution, max-pooling and fully connected (FC) layers that classifies each detected region as one of 10 digits.
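
To give a feel for this kind of network, here is a small Keras model with the same building blocks. The layer counts, filter sizes and the 32x32 grayscale input are placeholders we picked for illustration; they are not the architecture from the repo.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_digit_classifier(input_shape=(32, 32, 1), num_classes=10):
    """Small conv -> maxpool -> FC network; the layer sizes are placeholders."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # one probability per digit
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


model = build_digit_classifier()
```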

The classifier gets to 95% accuracy on the test set.

We tested the repo on a variety of examples and found that it works quite well.

See examples shared above.

There were some gaps where either the localizer didn’t work perfectly (the location of a digit 1 was not detected) or the classifier failed ($ detected as 5).

Conclusion

We hope this blog proves to be a good starting point for understanding how a multi-digit detection pipeline works.

We have shared a good GitHub link that can be used to build a model on the SVHN dataset.

If this model doesn’t work well, you can collect your own data and fine-tune the trained model.
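
One common way to fine-tune in Keras is to freeze the early layers and retrain only the last few with a small learning rate. The sketch below shows that pattern on a stand-in model; the helper name `freeze_for_fine_tuning`, the number of trainable layers and the learning rate are assumptions for illustration, and in practice you would load the trained classifier instead of building one inline.

```python
from tensorflow import keras
from tensorflow.keras import layers


def freeze_for_fine_tuning(model, trainable_tail=2, learning_rate=1e-4):
    """Freeze all but the last `trainable_tail` layers and recompile."""
    for layer in model.layers[:-trainable_tail]:
        layer.trainable = False
    for layer in model.layers[-trainable_tail:]:
        layer.trainable = True
    # A small learning rate keeps the pretrained weights from changing abruptly.
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Stand-in for a trained classifier; in practice this would be loaded from disk.
pretrained = keras.Sequential([
    keras.Input(shape=(32, 32, 1)),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model = freeze_for_fine_tuning(pretrained)
# model.fit(own_images, own_labels, epochs=5) would then train on the new data.
```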

I have my own deep learning consultancy and love to work on interesting problems.

I have helped many startups deploy innovative AI based solutions.

Check us out at — http://deeplearninganalytics.


If you have a project that we can collaborate on, then please contact me through my website or at info@deeplearninganalytics.org

You can also see my other writings at: https://medium.


dwivedi

References:

SVHN Data set
Understanding MSER
Code for multi-digit detector
