Transform Grayscale Images to RGB Using Python’s Matplotlib

Learn about image data structures while adding two dimensions for computer vision & deep learning pipelines

Matthew Arthur
Feb 10

[Image: R, G, & B — Arabic numeral ‘3’]

Data pre-processing is critical for computer vision applications, and properly converting grayscale images to the RGB format expected by current deep learning frameworks is an essential technique.
What does that mean?

Understanding Color Image Structure

Most color photos are composed of three interlocking arrays, one each for the Red, Green, and Blue values (hence RGB), with the integer values in each array representing a single pixel value. Meanwhile, black-and-white or grayscale photos have only a single channel, read from one array.
Using the matplotlib library, let’s look at a color (RGB) image:

img = plt.imread('whale.jpg')  # filename assumed; any RGB image works
print(img.shape)
(525, 1050, 3)

The output of the shape call tells us that the image has a height of 525 pixels, a width of 1050 pixels, and that there are three arrays (channels) of this size.
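If you don’t have a photo on disk handy, here is a minimal sketch (pure NumPy, with a synthetic array standing in for a loaded image) of that same three-channel structure:

```python
import numpy as np

# Build a tiny 2x3 "image" with three channels: (height, width, RGB).
# Values are 8-bit integers, 0-255, matching a typical photo file.
img = np.zeros((2, 3, 3), dtype=np.uint8)
img[..., 0] = 255      # fill only the Red channel

print(img.shape)       # (2, 3, 3)
print(img[1][2])       # last pixel: Red is 255, Green and Blue are 0
```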
[Image: A whale image, from a recent kaggle competition]

The img object is <class ‘numpy.ndarray’>, so let’s look at the shape and values of each layer:

#values
print(img[524][1049][2])
198
print(img[524][1049])
[155 177 198]
print(img[524])
[[ 68 107 140]
 [ 76 115 148]
 [ 76 115 148]
 [ 75 114 147]
 ...
 [171 196 216]
 [171 193 214]
 [171 193 214]
 [155 177 198]]

All right, what are the print commands above telling us about this image, which is composed of 525 rows (height), each with 1050 columns (width)? First, we look at the value of the very last pixel, at the last row of the last column and the last channel: 198.
This tells us that the file most likely uses integer values from 0 to 255.
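Rather than inferring the range from one pixel, numpy can report the data type and value range directly. A quick sketch, with a one-pixel array standing in for the loaded image:

```python
import numpy as np

# Stand-in for a loaded photo: uint8 arrays hold values 0-255 by construction.
img = np.array([[[155, 177, 198]]], dtype=np.uint8)

print(img.dtype)             # uint8
print(img.min(), img.max())  # smallest and largest pixel values present
```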
Next, we look at the values of this pixel across all three channels: [155, 177, 198].
These are the Red, Green & Blue values at that pixel.
And for fun, we can look at the values of the last row across all columns and all channels.³
Understanding Grayscale Image Structure

Grayscale images have only one channel! That’s it!

The problem

Quoting the PyTorch documentation:¹ All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W).
So trying to ingest your grayscale image into many computer vision / deep learning pipelines that rely on transfer learning from a standard commodity model such as ResNet-18 or -34 will result in a variety of errors.
The Solution

Add two additional channels to the grayscale image! There are a variety of ways to do this; my way is below: copy the single layer into new layers of a new 3D array, thus generating a color-format image (of a black-and-white photo, so it’ll still look B&W).
I’ll work with a square image from the Arabic Handwritten Digit Dataset as an example.
The shape is (28, 28), which confirms it is a single-channel image.
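A single-channel image really is just a 2-D matrix, which you can verify with the array’s shape and ndim. A sketch with a random 28x28 array standing in for the digit image:

```python
import numpy as np

# Synthetic stand-in for one 28x28 grayscale digit.
gray = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)

print(gray.shape)  # (28, 28)
print(gray.ndim)   # 2, i.e. no channel axis at all
```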
[Image: matplotlib defaults grayscales as above]

Since I want to feed this into a model based on ResNet-34, I need three channels.
The obvious (and less-than-correct) way is to add two arrays of zeros of the same size:

dim = np.zeros((28,28))
R = np.stack((O, dim, dim), axis=2)

O is our original array.
We can add two zero arrays of the same shape easily enough, but we will get a red-dominated image:

[Image: three channels, all values in channel 1]

Whoops! We want to populate the same values across all channels.
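To see why the zero-padded version comes out red, check the channels of the stacked array. The same recipe as above, with a synthetic layer standing in for O:

```python
import numpy as np

O = np.full((28, 28), 200, dtype=np.uint8)  # synthetic grayscale layer
dim = np.zeros((28, 28), dtype=np.uint8)

# Stack along a new third axis: O becomes channel 0 (Red),
# the zero arrays become Green and Blue.
R = np.stack((O, dim, dim), axis=2)

print(R.shape)  # (28, 28, 3)
print(R[0][0])  # only the first (Red) value is nonzero
```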
Before we do, though, let’s see what happens if we roll our original array through each of the non-fully-RGB possibilities:

[Image: R, G, B]

[Image: RG, RB, GB]

All right, here’s RGB:

[Image: Finally!]

And that’s it.
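In code, the fix is simply to stack the original layer three times instead of padding with zeros. A sketch with the same synthetic layer as before (np.repeat over a new axis works just as well):

```python
import numpy as np

O = np.full((28, 28), 200, dtype=np.uint8)  # synthetic grayscale layer

# Copy the single layer into all three channels: same values in R, G, and B,
# so the result displays as grayscale but has full RGB shape.
rgb = np.stack((O, O, O), axis=2)

print(rgb.shape)                                 # (28, 28, 3)
print(np.array_equal(rgb[..., 0], rgb[..., 2]))  # True
```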
Also

Additional code is on my github: www.
My LinkedIn is https://www.
Say hi!

Notes

¹ https://pytorch. html
³ Indexing in numpy is detailed here: https://docs. html