Deep Learning & Handwritten Arabic Digits

Using the fast.ai library to classify the AHCD at 99% accuracy!

Matthew Arthur, Feb 8

Photo: Morocco, 2000

The ‘hello world’ of deep learning is often the MNIST handwritten number dataset, and I wanted to apply the same techniques to a more interesting application: the Arabic Handwritten Characters Dataset (AHCD), a dataset developed by the American University in Cairo. [1]

In this example I use the fast.ai library to train a convolutional neural net (CNN) to correctly classify the AHCD at 99+% accuracy.

Here’s how:

First, import the libraries we need and set our GPU to use cuda:

    %reload_ext autoreload
    %autoreload 2
    %matplotlib inline

    from fastai.vision import *
    from fastai.metrics import error_rate
    import csv
    import numpy as np
    import PIL
    import pandas as pd

    defaults.device = torch.device('cuda')

As with many data science workflows, the data pre-processing is the most substantial component.

Here are the steps to get the data ready for our convolutional neural net:

1. Ingest from csv

Like the MNIST Latin-alphabet version, the AHCD is presented as a 784-column csv where each row contains one 28×28 image flattened into a single row of numeric values.

The first task is to load this into memory. Because the dataset has 60k rows, I set an arbitrary 4k-row limit on the training set to speed up the process.
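To make the layout concrete, here is a minimal sketch on synthetic data (not the real AHCD file). It writes a few fake 784-value rows to an in-memory csv, reads back only the first rows with nrows, and reshapes one row to 28×28. The names fake_csv and limit are illustrative, and header=None assumes a file with no header row.

```python
import io

import numpy as np
import pandas as pd

# Synthetic stand-in for the real file: 10 images, 784 pixel values each.
rng = np.random.default_rng(0)
fake_pixels = rng.integers(0, 256, size=(10, 784))

# Write to an in-memory csv with no header and no index column.
fake_csv = io.StringIO()
pd.DataFrame(fake_pixels).to_csv(fake_csv, header=False, index=False)
fake_csv.seek(0)

# Read only the first `limit` rows, as the 4k limit does for the real data.
limit = 4
subset = pd.read_csv(fake_csv, nrows=limit, header=None)
print(subset.shape)

# One flat row of 784 values reshapes to a 28x28 image.
image = subset.iloc[0].to_numpy().reshape(28, 28)
print(image.shape)
```

Capping the row count this way keeps iteration fast while the pipeline is still being worked out; the cap can be raised once everything runs end to end.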

We imported Pandas as pd, so this uses the built-in Pandas read_csv function:

    trainrows = 4000
    train = pd.read_csv('….csv', nrows=trainrows)

2. Convert to a 3D data structure for image processing

We have the data in memory, but each to-be-image is still flat (1 tall by 784 wide). We want it to be square and multi-dimensional so we can convert it to an RGB image using matplotlib.

Why RGB? We’re going to use a pretrained resnet34 model that was developed on RGB images.
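The shape changes are easy to see on a synthetic row. The sketch below assumes a flat row of 784 values in the range 0 to 255 and pads the single grayscale channel with two all-zero channels. The names flat_row and to_rgb are illustrative and not part of the original code.

```python
import numpy as np

def to_rgb(flat_row):
    """Turn a flat 784-value row into a 28x28x3 float array in [0, 1]."""
    gray = np.asarray(flat_row, dtype="float64").reshape(28, 28) / 255
    empty = np.zeros((28, 28))
    # Channel 0 carries the pixel data; channels 1 and 2 stay zero.
    return np.stack((gray, empty, empty), axis=2)

flat_row = np.arange(784) % 256  # synthetic pixel values, 0..255
rgb = to_rgb(flat_row)
print(rgb.shape)
```

Because only the first channel holds data, an image built this way renders in shades of red rather than gray.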

This simple function takes our Pandas train dataframe and extracts a single row (passed as a variable), reshapes this row into a square structure, normalizes the pixel values into the range [0,1], adds two additional channels of all zeros, and uses the matplotlib.pyplot library to save the image as a png in our path/digits/ folder.

Note: Eventually I’ll add logic to pass the folder as a variable. For now, it’s hard-coded.

    def pdMakePlot(row):
        # pull one flat row of pixel values out of the train dataframe
        pixels = np.array(train.iloc[[row]], dtype='uint8')
        # reshape to a square image and transpose it
        pixels = pixels.reshape((28, 28)).T
        # scale to [0,1]
        pixels = np.true_divide(pixels, 255)
        # two all-zero channels to pad the image out to RGB
        dim2 = np.zeros((28,28))
        dim3 = np.zeros((28,28))
        pix = np.stack((pixels, dim2,dim3), axis=2)
        # files are numbered from 1, rows from 0
        row += 1
        filename = "digits/%s.png" % row
        plt.imsave(filename, pix)
        plt.close('all')
        return

3. Prepare our source-of-truth dataframe

We are using the fast.ai ImageDataBunch.from_df method [2] of ingesting image data for this convolutional neural net, so we need a Pandas dataframe containing our training filenames & the valid labels.

    #import training labels into numpy array
    csv = np.genfromtxt('csvtrainlabel.csv', delimiter=",")
    csv = csv[0:trainrows]
    csv = csv.astype('int32')
    csv = np.add(csv,1)
    csv[csv == 10] = 0

    #np array of filenames, from 1.png up to the trainrows-th file
    trainrange = trainrows + 1
    files = np.array(["%s.png" % j for j in range(1, trainrange)])

    #combine two arrays into dataframe and add header
    df = pd.DataFrame({'name':files, 'label':csv})
    df.head()

(Image: our dataframe)

Again, a bit of the ETL process I’ll revisit.
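The label arithmetic above is easy to check on a small array. This sketch uses made-up label values rather than the real label file, and assumes (as the code above implies) that the raw labels run from 0 to 9. Each label is shifted up by one, and the resulting 10 wraps back to 0.

```python
import numpy as np

# Hypothetical raw labels, standing in for the contents of the label csv.
raw = np.array([0, 1, 5, 8, 9], dtype="int32")

shifted = np.add(raw, 1)     # 0..9 becomes 1..10
shifted[shifted == 10] = 0   # wrap 10 back around to 0
print(shifted)               # [1 2 6 9 0]

# The same mapping written as modular arithmetic.
assert np.array_equal(shifted, (raw + 1) % 10)

# File names are 1-based: the first row is saved as 1.png, not 0.png.
files = np.array(["%s.png" % j for j in range(1, len(raw) + 1)])
print(files)
```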

4. Process & save our training images

With this in hand we can use the pdMakePlot() function we defined earlier to process the training images. The number of images processed is also set by the trainrange variable we set earlier.

    for i in range(trainrange - 1):
        pdMakePlot(i)

Now we’re ready for deep learning! It’s only a few lines of code:

    #define our transforms
    tfms = get_transforms(do_flip=False)

    #define our DataBunch
    data = ImageDataBunch.from_df(path=path, df = df, ds_tfms=tfms, size=24)

    #define our learner
    learn = create_cnn(data, models.resnet34, metrics=accuracy)

Before we train, we can look at a small selection from our DataBunch to confirm we’ve processed everything correctly:

    data.show_batch(rows=3, figsize=(7,6))

(Image: 9 handwritten characters & labels)

All good! We can also run learn.model to take a detailed look at the learner architecture. If you’re interested, it’s available.

Anyway, let’s train!

Initial Training

    learn.fit_one_cycle(4)

(Image: transfer learning from resnet34, 95% accuracy in 16 seconds)

I think we can do better.

Let’s find the best learning rate and train again.


    learn.lr_find()

LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.

    learn.recorder.plot()

(Image: learning rate against loss)

We see that the optimum learning rate is really high.
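For intuition about what a learning rate finder measures, here is a toy learning-rate range test in plain NumPy. It is a sketch of the general idea, not fastai's implementation. It fits a one-parameter linear model with SGD while growing the learning rate exponentially, records the loss at each step, and stops once the loss blows up. The data, the start and end rates, and the divide-by-ten rule of thumb at the end are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=256)
y = 3.0 * x + rng.normal(scale=0.1, size=256)  # synthetic data, true slope 3

def lr_range_test(x, y, lr_start=1e-4, lr_end=10.0, steps=100, batch=32):
    """Grow the learning rate exponentially and record the loss per step."""
    lrs = lr_start * (lr_end / lr_start) ** (np.arange(steps) / (steps - 1))
    w, losses, used = 0.0, [], []
    for lr in lrs:
        idx = rng.integers(0, len(x), size=batch)
        xb, yb = x[idx], y[idx]
        err = w * xb - yb
        loss = float(np.mean(err ** 2))
        # Stop once the loss is far above the best value seen so far.
        if losses and (not np.isfinite(loss) or loss > 4 * min(losses)):
            break
        losses.append(loss)
        used.append(lr)
        w -= lr * float(np.mean(2 * err * xb))  # one SGD step
    return np.array(used), np.array(losses)

lrs, losses = lr_range_test(x, y)
lr_at_min = lrs[np.argmin(losses)]
suggested = lr_at_min / 10  # rule of thumb: stay below the minimum
print(f"loss is lowest near lr={lr_at_min:.4f}; try about {suggested:.4f}")
```

The rate at the very bottom of the curve is already close to the point where training becomes unstable, which is why a value somewhat below it is the usual choice.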

Let’s try to get a learning rate that’s a little lower than what minimizes loss, say .05? Then we’ll unfreeze some of the CNN’s layers and retrain.

    learn.unfreeze()
    learn.fit_one_cycle(3, max_lr=slice(.006, .004))

(Image: much better)

Fifteen seconds later we have a model that is 99.6% accurate against the sub-set of the training data we set aside for validation.
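A note on max_lr=slice(...): in fastai v1, a slice asks the library to spread learning rates across the model's layer groups instead of using one rate everywhere, with the first value going to the earliest layers and the second to the last. The sketch below shows that idea with NumPy, assuming three layer groups and even spacing on a log scale. Both are assumptions made for illustration, and the helper name spread_lrs is hypothetical.

```python
import numpy as np

def spread_lrs(start, stop, n_groups=3):
    """Hypothetical helper: n_groups rates from start to stop, log-spaced."""
    return np.geomspace(start, stop, num=n_groups)

# Values from the fit_one_cycle call above; the group count of 3 is an assumption.
lrs = spread_lrs(0.006, 0.004)
for group, lr in enumerate(lrs):
    print(f"layer group {group}: lr = {lr:.5f}")
```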

Using the Model

Now that we have a model, let’s use it! After using the function above to read some test data from the test csv:

    img = open_image('/path/3.png')
    pred_class,pred_idx,outputs = learn.predict(img)
    pred_class

    Category 4    <- that is correct

Next

Now that we have a working model, and an accurate one, I would like to update the pipeline code to make it more elegant.

I would also like to run the model against the full set of test data to see how it compares against state-of-the-art.

More to come!

Also

Additional code is on my github: www.…

My LinkedIn is https://www.…

Say hi!

Notes

[1] https://www.…com/mloey1/ahcd1 & http://datacenter.…edu/shazeem/
[2] https://docs.…





