A PyTorch implementation of a neural network looks very similar to a NumPy implementation.

The goal of this section is to showcase the equivalent nature of PyTorch and NumPy.

For this purpose, let’s create a simple three-layered network having 5 nodes in the input layer, 3 in the hidden layer, and 1 in the output layer.

We will use a single training example: one row with five features and one target.

    import torch

    n_input, n_hidden, n_output = 5, 3, 1

The first step is parameter initialization.

Here, the weight and bias parameters for each layer are initialized as tensor variables.

Tensors are the base data structures of PyTorch which are used for building different types of neural networks.

They can be considered as the generalization of arrays and matrices; in other words, tensors are N-dimensional matrices.

    ## initialize tensors for inputs and outputs
    x = torch.randn((1, n_input))
    y = torch.randn((1, n_output))

    ## initialize tensor variables for weights
    w1 = torch.randn(n_input, n_hidden)   # weight for hidden layer
    w2 = torch.randn(n_hidden, n_output)  # weight for output layer

    ## initialize tensor variables for bias terms
    b1 = torch.randn((1, n_hidden))  # bias for hidden layer
    b2 = torch.randn((1, n_output))  # bias for output layer

After the parameter initialization step, a neural network can be defined and trained in four key steps:

1. Forward Propagation
2. Loss computation
3. Backpropagation
4. Updating the parameters

Let’s see each of these steps in a bit more detail.

Forward Propagation: In this step, activations are calculated at every layer using the two steps shown below.

These activations flow in the forward direction from the input layer to the output layer in order to generate the final output.

    z = weight * input + bias
    a = activation_function(z)

The following code blocks show how we can write these steps in PyTorch.

Notice that most of the functions, such as exponential and matrix multiplication, are similar to the ones in NumPy.

    ## sigmoid activation function using pytorch
    def sigmoid_activation(z):
        return 1 / (1 + torch.exp(-z))

    ## activation of hidden layer
    z1 = torch.mm(x, w1) + b1
    a1 = sigmoid_activation(z1)

    ## activation (output) of final layer
    z2 = torch.mm(a1, w2) + b2
    output = sigmoid_activation(z2)

Loss Computation: In this step, the error (also called loss) is calculated in the output layer.

A simple loss function measures the difference between the actual value and the predicted value.

Later, we will look at different loss functions available in PyTorch.

    loss = y - output

Backpropagation: The aim of this step is to minimize the error in the output layer by making marginal changes in the bias and the weights.

These marginal changes are computed using the derivatives of the error term.

Following the chain rule of calculus, the delta changes are passed back to the hidden layers, where corresponding changes in their weights and bias are made.

This leads to an adjustment in the weights and bias until the error is minimized.
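The derivative used in this step has a convenient form for the sigmoid: if a = sigmoid(z), then the slope at z is a * (1 - a). Here is a small plain-Python sketch (no PyTorch, names are my own) that checks this identity against a finite-difference estimate. The test point z = 0.3 is an arbitrary choice.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative_from_activation(a):
    # derivative of the sigmoid, written in terms of its output a = sigmoid(z)
    return a * (1.0 - a)

def numerical_derivative(f, z, eps=1e-6):
    # central finite difference as an independent check
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = 0.3  # arbitrary test point
analytic = sigmoid_derivative_from_activation(sigmoid(z))
numeric = numerical_derivative(sigmoid, z)
print(analytic, numeric)
```

The two printed values should agree to many decimal places, which is why backpropagation can reuse the activations from the forward pass instead of recomputing anything.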

    ## function to calculate the derivative of activation
    def sigmoid_delta(x):
        return x * (1 - x)

    ## compute derivative of error terms
    delta_output = sigmoid_delta(output)
    delta_hidden = sigmoid_delta(a1)

    ## backpass the changes to previous layers
    d_outp = loss * delta_output
    loss_h = torch.mm(d_outp, w2.t())
    d_hidn = loss_h * delta_hidden

Updating the Parameters: Finally, the weights and bias are updated using the delta changes received from the above backpropagation step.

    learning_rate = 0.1

    w2 += torch.mm(a1.t(), d_outp) * learning_rate
    w1 += torch.mm(x.t(), d_hidn) * learning_rate

    ## sum over the batch dimension only, so each bias term gets its own update
    b2 += d_outp.sum(dim=0, keepdim=True) * learning_rate
    b1 += d_hidn.sum(dim=0, keepdim=True) * learning_rate

Finally, when these steps are executed for a number of epochs with a large number of training examples, the loss is reduced to a minimum value.

The final weight and bias values can then be used to make predictions on unseen data.
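To see the four steps working together, here is a dependency-free sketch of the same 5-3-1 network trained on one example in plain Python. It is only an illustration of the arithmetic, not the PyTorch code from this section: the input values, the target of 0.8 (chosen to lie inside the sigmoid's output range), and the fixed starting weights are all assumptions of mine.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical fixed values so the run is reproducible
x = [0.5, -0.2, 0.1, 0.7, -0.4]   # one example with five features
y = 0.8                           # target inside the sigmoid's (0, 1) range
w1 = [[0.1 * (i - j) for j in range(3)] for i in range(5)]  # 5 x 3 weights
b1 = [0.0, 0.0, 0.0]
w2 = [0.2, -0.1, 0.3]             # 3 x 1 weights, stored flat
b2 = 0.0
learning_rate = 0.1

errors = []
for epoch in range(200):
    # 1. forward propagation
    a1 = [sigmoid(sum(x[i] * w1[i][j] for i in range(5)) + b1[j]) for j in range(3)]
    output = sigmoid(sum(a1[j] * w2[j] for j in range(3)) + b2)

    # 2. loss computation
    loss = y - output
    errors.append(abs(loss))

    # 3. backpropagation (deltas are computed before any weight changes)
    d_outp = loss * output * (1 - output)
    d_hidn = [d_outp * w2[j] * a1[j] * (1 - a1[j]) for j in range(3)]

    # 4. updating the parameters
    for j in range(3):
        w2[j] += a1[j] * d_outp * learning_rate
        b1[j] += d_hidn[j] * learning_rate
        for i in range(5):
            w1[i][j] += x[i] * d_hidn[j] * learning_rate
    b2 += d_outp * learning_rate

print("first error:", errors[0], "last error:", errors[-1])
```

With a single example the network simply learns to reproduce that one target, so the error shrinks steadily. Real training repeats the same loop over many examples.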

Use Case 1: Handwritten Digit Classification

In the previous section, we saw a simple use case of PyTorch for writing a neural network from scratch.

In this section, we will use different utility packages provided within PyTorch (nn, autograd, optim, torchvision, torchtext, etc.) to build and train neural networks.

Neural networks can be defined and managed easily using these packages.

In our use case, we will create a Multi-Layered Perceptron (MLP) network for building a handwritten digit classifier.

We will make use of the MNIST dataset included in the torchvision package.

The first step, as with any project you’ll work on, is data preprocessing.

We need to transform the raw dataset into tensors and normalize them in a fixed range.

The torchvision package provides a utility called transforms which can be used to combine different transformations together.

    from torchvision import transforms

    _tasks = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,))
    ])

The first transformation converts the raw data into tensor variables and the second transformation performs normalization using the operation below:

    x_normalized = (x - mean) / std

The values 0.5 and 0.5 represent the mean and standard deviation applied to each channel. MNIST images are grayscale, so a single value per statistic is enough.
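As a quick worked example of this normalization: ToTensor scales pixel values into the range [0, 1], so subtracting a mean of 0.5 and dividing by a standard deviation of 0.5 maps them onto [-1, 1]. The helper below is a plain-Python sketch of the arithmetic only (the function name is my own, not a torchvision API).

```python
def normalize(pixel, mean=0.5, std=0.5):
    # subtract the mean first, then divide by the standard deviation
    return (pixel - mean) / std

darkest, middle, brightest = (normalize(p) for p in (0.0, 0.5, 1.0))
print(darkest, middle, brightest)
```

Note the parentheses around the subtraction. Without them the division would bind first and the brightest pixel would map to 0 instead of 1.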

    from torchvision.datasets import MNIST

    ## Load MNIST Dataset and apply transformations
    mnist = MNIST("data", download=True, train=True, transform=_tasks)

Another excellent utility of PyTorch is the DataLoader iterator, which provides the ability to batch, shuffle, and load the data in parallel using multiprocessing workers.

For the purpose of evaluating our model, we will partition our data into training and validation sets.
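For a sense of the sizes involved, here is the split arithmetic in plain Python. It assumes the 60,000-image MNIST training set, an 80/20 split, and a batch size of 256, which are the values used by the loaders in this section.

```python
import math

n_examples = 60000   # size of the MNIST training set
batch_size = 256

split = int(0.8 * n_examples)
index_list = list(range(n_examples))
train_idx, valid_idx = index_list[:split], index_list[split:]

# the last batch of each loader may be smaller than batch_size
train_batches = math.ceil(len(train_idx) / batch_size)
valid_batches = math.ceil(len(valid_idx) / batch_size)
print(len(train_idx), len(valid_idx), train_batches, valid_batches)
```

So each epoch walks through 188 training batches and 47 validation batches.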

    from torch.utils.data import DataLoader
    from torch.utils.data.sampler import SubsetRandomSampler

    ## create training and validation split
    split = int(0.8 * len(mnist))
    index_list = list(range(len(mnist)))
    train_idx, valid_idx = index_list[:split], index_list[split:]

    ## create sampler objects using SubsetRandomSampler
    tr_sampler = SubsetRandomSampler(train_idx)
    val_sampler = SubsetRandomSampler(valid_idx)

    ## create iterator objects for train and valid datasets
    trainloader = DataLoader(mnist, batch_size=256, sampler=tr_sampler)
    validloader = DataLoader(mnist, batch_size=256, sampler=val_sampler)

The neural network architectures in PyTorch can be defined in a class which inherits from the base class Module of the nn package.

This inheritance from the nn.Module class allows us to implement, access, and call a number of methods easily.

We can define all the layers inside the constructor of the class, and the forward propagation steps inside the forward function.

We will define a network with the following layer configuration: [784, 128, 10].

This configuration represents the 784 nodes (28*28 pixels) in the input layer, 128 in the hidden layer, and 10 in the output layer.

Inside the forward function, we will use the sigmoid activation function in the hidden layer (which can be accessed from the nn module).
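To get a feel for the size of the [784, 128, 10] configuration, we can count its trainable parameters by hand. This is a small sketch with a helper name of my own choosing. It assumes fully connected layers with one bias per output node, which is how linear layers are usually set up.

```python
layer_sizes = [784, 128, 10]

def count_parameters(sizes):
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # one weight per input-output pair, plus one bias per output node
        total += n_in * n_out + n_out
    return total

n_params = count_parameters(layer_sizes)
print(n_params)
```

Almost all of the 101,770 parameters sit in the first layer (784 * 128 + 128 = 100,480), since that is where the 784 pixels are connected to the hidden nodes.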

    import torch
    from torch import nn
    import torch.nn.functional as F

    class Model(nn.Module):
        def __init__(self):
            super().__init__()
            self.hidden = nn.Linear(784, 128)
            self.output = nn.Linear(128, 10)

        def forward(self, x):
            x = x.view(x.shape[0], -1)  ## flatten each 28*28 image into 784 values
            x = self.hidden(x)
            x = torch.sigmoid(x)
            x = self.output(x)
            return x

    model = Model()

Define the loss function and the optimizer using the nn and optim packages:

    from torch import optim

    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-6, momentum=0.9, nesterov=True)

We are now ready to train the model.

The core steps will remain the same as we saw earlier: Forward Propagation, Loss Computation, Backpropagation, and updating the parameters.
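As a reference point for reading the loss values during training, here is a plain-Python sketch of what a cross-entropy loss computes for one example: the negative log of the softmax probability assigned to the correct class, given raw (unnormalized) scores. The function is my own illustration, not the PyTorch implementation.

```python
import math

def cross_entropy(logits, target):
    # log-sum-exp with the maximum subtracted for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

# ten equal scores: the model has no preference among the ten digits
uniform = cross_entropy([0.0] * 10, target=3)

# a high score on the correct class gives a much smaller loss
confident = cross_entropy([0.0] * 3 + [5.0] + [0.0] * 6, target=3)
print(uniform, confident)
```

A model that guesses uniformly over ten classes has a loss of ln(10), roughly 2.30, so any epoch loss well below that indicates the network has learned something.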

    import numpy as np

    for epoch in range(1, 11):  ## run the model for 10 epochs
        train_loss, valid_loss = [], []

        ## training part
        model.train()
        for data, target in trainloader:
            optimizer.zero_grad()

            ## 1. forward propagation
            output = model(data)

            ## 2. loss calculation
            loss = loss_function(output, target)

            ## 3. backward propagation
            loss.backward()

            ## 4. weight optimization
            optimizer.step()

            train_loss.append(loss.item())

        ## evaluation part
        model.eval()
        with torch.no_grad():  ## gradients are not needed for validation
            for data, target in validloader:
                output = model(data)
                loss = loss_function(output, target)
                valid_loss.append(loss.item())

        print("Epoch:", epoch, "Training Loss: ", np.mean(train_loss), "Valid Loss: ", np.mean(valid_loss))

    >> Epoch: 1 Training Loss: 0.645777 Valid Loss: 0.344971
    >> Epoch: 2 Training Loss: 0.320241 Valid Loss: 0.299313
    >> Epoch: 3 Training Loss: 0.278429 Valid Loss: 0.269018
    >> Epoch: 4 Training Loss: 0.246289 Valid Loss: 0.237785
    >> Epoch: 5 Training Loss: 0.217010 Valid Loss: 0.217133
    >> Epoch: 6 Training Loss: 0.193017 Valid Loss: 0.206074
    >> Epoch: 7 Training Loss: 0.174385 Valid Loss: 0.180163
    >> Epoch: 8 Training Loss: 0.157574 Valid Loss: 0.170064
    >> Epoch: 9 Training Loss: 0.144316 Valid Loss: 0.162660
    >> Epoch: 10 Training Loss: 0.133053 Valid Loss: 0.152957

Once the model is trained, make the predictions on the validation data.

    ## dataloader for validation dataset
    dataiter = iter(validloader)
    data, labels = next(dataiter)
    output = model(data)

    _, preds_tensor = torch.max(output, 1)
    preds = np.squeeze(preds_tensor.numpy())

    print("Actual:", labels[:10].numpy())
    print("Predicted:", preds[:10])

    >>> Actual: [0 1 1 1 2 2 8 8 2 8]
    >>> Predicted: [0 1 1 1 2 2 8 8 2 8]

Use Case 2: Object Image Classification

Let’s take things up a notch.

In this use case, we will create convolutional neural network (CNN) architectures in PyTorch.

We will perform object image classification using the popular CIFAR-10 dataset.

This dataset is also included in the torchvision package.

The entire procedure to define and train the model will remain the same as in the previous use case, except for the introduction of additional layers in the network.

Let’s load and transform the dataset:

    ## load the dataset
    from torchvision.datasets import CIFAR10

    cifar = CIFAR10("data", train=True, download=True, transform=_tasks)

    ## create training and validation split
    split = int(0.8 * len(cifar))
    index_list = list(range(len(cifar)))
    train_idx, valid_idx = index_list[:split], index_list[split:]

    ## create training and validation sampler objects
    tr_sampler = SubsetRandomSampler(train_idx)
    val_sampler = SubsetRandomSampler(valid_idx)

    ## create iterator objects for train and valid datasets
    trainloader = DataLoader(cifar, batch_size=256, sampler=tr_sampler)
    validloader = DataLoader(cifar, batch_size=256, sampler=val_sampler)

We will create an architecture with three convolutional layers for low-level feature extraction, three pooling layers for maximum information extraction, and two linear layers for linear classification.
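The first linear layer in this architecture takes 1024 input features, and that number follows from simple shape arithmetic. The sketch below traces a 32 x 32 CIFAR-10 image through three rounds of convolution and pooling. It assumes the settings used in the model definition in this section (3 x 3 kernels with padding 1, 2 x 2 max pooling with stride 2, and 64 channels after the last convolution). The helper names are my own.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    # standard output-size formula for a convolution along one dimension
    return (size + 2 * padding - kernel) // stride + 1

def pool_output_size(size, kernel, stride):
    return (size - kernel) // stride + 1

size = 32   # CIFAR-10 images are 32 x 32 pixels
for _ in range(3):
    size = conv_output_size(size, kernel=3, padding=1)   # padding keeps the size
    size = pool_output_size(size, kernel=2, stride=2)    # pooling halves it

flattened = 64 * size * size   # 64 channels after the third convolution
print(size, flattened)
```

The spatial size goes 32, 16, 8, 4, so the flattened feature vector has 64 * 4 * 4 = 1024 values. If you change the image size or the number of pooling steps, this is the number to recompute.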

    from torch import nn
    import torch.nn.functional as F

    class Model(nn.Module):
        def __init__(self):
            super(Model, self).__init__()

            ## define the layers
            self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
            self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
            self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
            self.pool = nn.MaxPool2d(2, 2)
            self.linear1 = nn.Linear(1024, 512)
            self.linear2 = nn.Linear(512, 10)

        def forward(self, x):
            x = self.pool(F.relu(self.conv1(x)))
            x = self.pool(F.relu(self.conv2(x)))
            x = self.pool(F.relu(self.conv3(x)))
            x = x.view(-1, 1024)  ## reshaping
            x = F.relu(self.linear1(x))
            x = self.linear2(x)
            return x

    model = Model()

Define the loss function and the optimizer:

    import torch.optim as optim

    loss_function = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-6, momentum=0.9, nesterov=True)

    ## run for 30 Epochs
    for epoch in range(1, 31):
        train_loss, valid_loss = [], []

        ## training part
        model.train()
        for data, target in trainloader:
            optimizer.zero_grad()
            output = model(data)
            loss = loss_function(output, target)
            loss.backward()
            optimizer.step()
            train_loss.append(loss.item())

        ## evaluation part
        model.eval()
        with torch.no_grad():  ## gradients are not needed for validation
            for data, target in validloader:
                output = model(data)
                loss = loss_function(output, target)
                valid_loss.append(loss.item())

Once the model is trained, we can generate predictions on the validation set.

    ## dataloader for validation dataset
    dataiter = iter(validloader)
    data, labels = next(dataiter)
    output = model(data)

    _, preds_tensor = torch.max(output, 1)
    preds = np.squeeze(preds_tensor.numpy())

    print("Actual:", labels[:10])
    print("Predicted:", preds[:10])

    Actual: [truck, truck, truck, horse, bird, truck, ship, bird, deer, bird]
    Pred: [truck, automobile, automobile, horse, bird, airplane, ship, bird, deer, bird]

Use Case 3: Sentiment Text Classification

We’ll pivot from computer vision use cases to natural language processing.

The idea is to showcase the utility of PyTorch in a variety of domains.

In this section, we’ll leverage PyTorch for text classification tasks using RNN (Recurrent Neural Network) and LSTM (Long Short-Term Memory) layers.

First, we will load a dataset containing two fields: text and target.

The target contains two classes, class1 and class2, and our task is to classify each text into one of these classes.

You can download the dataset here.

    import numpy as np
    import pandas as pd

    train = pd.read_csv("train.csv")
    x_train = train["text"].values
    y_train = train["target"].values

I highly recommend setting seeds before getting into the heavy coding.

This ensures that the results you will see are the same as mine – a very useful (and helpful) feature when learning new concepts.

    np.random.seed(123)
    torch.manual_seed(123)
    torch.cuda.manual_seed(123)
    torch.backends.cudnn.deterministic = True

In the preprocessing step, convert the text data into a padded sequence of tokens so that it can be passed into embedding layers.

I will use the utilities provided in the Keras package, but the same can be done using the torchtext package as well.
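To make "padded sequence of tokens" concrete, here is a tiny plain-Python sketch of the idea. It is not the Keras utility itself. It mirrors what I understand to be the Keras defaults, namely padding with zeros at the front and dropping tokens from the front when a sequence is too long. The function name and sample token ids are made up for illustration.

```python
def pad_sequence(tokens, maxlen, value=0):
    # keep only the last maxlen tokens ("pre" truncation) ...
    tokens = tokens[-maxlen:]
    # ... and fill the front with the padding value ("pre" padding)
    return [value] * (maxlen - len(tokens)) + tokens

short = pad_sequence([4, 9, 2], maxlen=5)
long = pad_sequence([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short)
print(long)
```

After this step every text has exactly the same length, which is what lets us stack them into one tensor per batch.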

    from keras.preprocessing import text, sequence

    ## create tokens
    tokenizer = text.Tokenizer(num_words=1000)
    tokenizer.fit_on_texts(x_train)
    word_index = tokenizer.word_index

    ## convert texts to padded sequences
    x_train = tokenizer.texts_to_sequences(x_train)
    x_train = sequence.pad_sequences(x_train, maxlen=70)

Next, we need to convert the tokens into vectors.

I will use pretrained GloVe word embeddings for this purpose.

We will load these word embeddings and create an embedding matrix containing the word vector for every word in the vocabulary.

    EMBEDDING_FILE = "glove.840B.300d.txt"

    embeddings_index = {}
    with open(EMBEDDING_FILE, encoding="utf8") as f:
        for line in f:
            val = line.rstrip().split(" ")
            ## take the last 300 entries as the vector, since a token may itself contain spaces
            embeddings_index[" ".join(val[:-300])] = np.asarray(val[-300:], dtype="float32")

    embedding_matrix = np.zeros((len(word_index) + 1, 300))
    for word, i in word_index.items():
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector

Define the model architecture with embedding layers and LSTM layers:

    ## vocabulary size and embedding length, taken from the embedding matrix
    max_features, embed_size = embedding_matrix.shape

    class Model(nn.Module):
        def __init__(self):
            super(Model, self).__init__()

            ## Embedding Layer, Add parameter
            self.embedding = nn.Embedding(max_features, embed_size)
            et = torch.tensor(embedding_matrix, dtype=torch.float32)
            self.embedding.weight = nn.Parameter(et)
            self.embedding.weight.requires_grad = False
            self.embedding_dropout = nn.Dropout2d(0.1)
            self.lstm = nn.LSTM(300, 40, batch_first=True)  ## inputs are (batch, sequence, features)
            self.linear = nn.Linear(40, 16)
            self.out = nn.Linear(16, 1)
            self.relu = nn.ReLU()

        def forward(self, x):
            h_embedding = self.embedding(x)
            h_lstm, _ = self.lstm(h_embedding)
            max_pool, _ = torch.max(h_lstm, 1)
            linear = self.relu(self.linear(max_pool))
            out = self.out(linear)
            return out

    model = Model()

Create training and validation sets:

    from torch.utils.data import TensorDataset

    ## create training and validation split
    split = int(0.8 * len(x_train))
    index_list = list(range(len(x_train)))
    train_idx, valid_idx = index_list[:split], index_list[split:]

    ## create iterator objects for train and valid datasets
    x_tr = torch.tensor(x_train[train_idx], dtype=torch.long)
    y_tr = torch.tensor(y_train[train_idx], dtype=torch.float32)
    train = TensorDataset(x_tr, y_tr)
    trainloader = DataLoader(train, batch_size=128)

    x_val = torch.tensor(x_train[valid_idx], dtype=torch.long)
    y_val = torch.tensor(y_train[valid_idx], dtype=torch.float32)
    valid = TensorDataset(x_val, y_val)
    validloader = DataLoader(valid, batch_size=128)

Define loss and optimizers:

    loss_function = nn.BCEWithLogitsLoss(reduction="mean")
    optimizer = optim.Adam(model.parameters())

Training the model:

    ## run for 10 Epochs
    for epoch in range(1, 11):
        train_loss, valid_loss = [], []

        ## training part
        model.train()
        for data, target in trainloader:
            optimizer.zero_grad()
            output = model(data)
            loss = loss_function(output, target.view(-1, 1))
            loss.backward()
            optimizer.step()
            train_loss.append(loss.item())

        ## evaluation part
        model.eval()
        with torch.no_grad():  ## gradients are not needed for validation
            for data, target in validloader:
                output = model(data)
                loss = loss_function(output, target.view(-1, 1))
                valid_loss.append(loss.item())

Finally, we can obtain the predictions:

    dataiter = iter(validloader)
    data, labels = next(dataiter)
    output = model(data)

    ## the model emits one logit per example, so threshold the sigmoid at 0.5
    preds = (torch.sigmoid(output) > 0.5).long().squeeze(1).numpy()

    Actual: [0 1 1 1 1 0 0 0 0]
    Predicted: [0 1 1 1 1 1 1 1 0 0]

Use Case 4: Image Style Transfer

Let’s look at one final use case where we will perform artistic style transfer.

This is one of the most creative projects I have worked on and hopefully you’ll have fun with this as well.

The basic idea behind the style transfer concept is:

1. Take the objects/context from one image
2. Take the style/texture from a second image
3. Generate a final image which is a mixture of the two

This concept was introduced in the paper: “Image Style Transfer using Convolutional Networks”.

An example of style transfer is shown below:

Awesome, right? Let’s look at its implementation in PyTorch.

The process involves six steps: Low-level feature extraction from both the input images.

This can be done using pretrained deep learning models such as VGG19.

    from torchvision import models

    # get the features portion from VGG19
    vgg = models.vgg19(pretrained=True).features

    # freeze all VGG parameters
    for param in vgg.parameters():
        param.requires_grad_(False)

    # check if GPU is available
    device = torch.device("cpu")
    if torch.cuda.is_available():
        device = torch.device("cuda")
    vgg.to(device)

Load the two images on the device and obtain the features from VGG.

Also, apply the transformations: resizing, conversion to tensor, and normalization of values.

    from PIL import Image
    from torchvision import transforms as tf

    def transformation(img):
        tasks = tf.Compose([tf.Resize(400),
                            tf.ToTensor(),
                            tf.Normalize((0.44, 0.44, 0.44), (0.22, 0.22, 0.22))])
        img = tasks(img)[:3, :, :].unsqueeze(0)
        return img

    img1 = Image.open("image1.jpg").convert("RGB")
    img2 = Image.open("image2.jpg").convert("RGB")

    img1 = transformation(img1).to(device)
    img2 = transformation(img2).to(device)

Now, we need to obtain the relevant features of the two images.

From the first image, we need to extract features related to the context or the objects present.

From the second image, we need to extract features related to styles and textures.

Object Related Features: In the original paper, the authors have suggested that more valuable information about objects and context can be extracted from the initial layers of the network.

This is because in the higher layers, the information space becomes more complex and detailed pixel information is lost.

Style Related Features: In order to obtain the texture information from the second image, the authors used correlations between different features in different layers.

This is explained in detail in point 4 below.

But before we get there, let’s look at the structure of a typical VGG19 model:

For object information extraction, conv4_2 was the layer of interest.

It’s present in the 4th convolutional block with a depth of 512.

For style representation, the layers of interest were the first convolutional layer of every convolutional block in the network, i.e., conv1_1, conv2_1, conv3_1, conv4_1, and conv5_1.

These layers were selected purely based on the authors’ experiments and I am only replicating their results in this article.

    def get_features(image, model):
        ## keys are strings because that is how the model stores its layer names
        layers = {"0": "conv1_1", "5": "conv2_1", "10": "conv3_1",
                  "19": "conv4_1", "21": "conv4_2", "28": "conv5_1"}
        x = image
        features = {}
        for name, layer in model._modules.items():
            x = layer(x)
            if name in layers:
                features[layers[name]] = x
        return features

    img1_features = get_features(img1, vgg)
    img2_features = get_features(img2, vgg)

As mentioned in the previous point, the authors used correlations in different layers to obtain the style related features.

These feature correlations are given by the Gram matrix G, where every cell (i, j) in G is the inner product between the vectorised feature maps i and j in a layer.
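To make the Gram matrix concrete, here is a tiny plain-Python sketch with three made-up feature maps, each already flattened to four values. The numbers are arbitrary and the function is my own illustration of the definition above, not the PyTorch code used for the actual style transfer.

```python
def gram_matrix(feature_maps):
    # feature_maps: list of d vectorised feature maps, each of length h * w
    d = len(feature_maps)
    return [[sum(a * b for a, b in zip(feature_maps[i], feature_maps[j]))
             for j in range(d)]
            for i in range(d)]

features = [[1.0, 2.0, 0.0, 1.0],
            [0.0, 1.0, 1.0, 3.0],
            [2.0, 0.0, 1.0, 0.0]]

G = gram_matrix(features)
for row in G:
    print(row)
```

The result is a d x d matrix (3 x 3 here) regardless of the spatial size of the feature maps, and it is symmetric because the inner product of maps i and j equals that of j and i. Spatial positions are summed away, which is why the Gram matrix captures texture rather than layout.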

    def correlation_matrix(tensor):
        _, d, h, w = tensor.size()
        tensor = tensor.view(d, h * w)
        correlation = torch.mm(tensor, tensor.t())
        return correlation

    correlations = {l: correlation_matrix(img2_features[l]) for l in img2_features}

We can finally perform style transfer using these features and correlations.

Now in order to transfer the style from one image to the other, we need to set the weight of every layer used to obtain style features.

As mentioned above, the initial layers provide more information so we’ll set more weight for these layers.

Also, define the optimizer and the target image, which will be a copy of image1.

    weights = {"conv1_1": 1.0, "conv2_1": 0.8, "conv3_1": 0.25,
               "conv4_1": 0.21, "conv5_1": 0.18}

    target = img1.clone().requires_grad_(True).to(device)
    optimizer = optim.Adam([target], lr=0.003)

Start the loss minimization process in which we run the loop for a large number of steps and calculate the loss related to object feature extraction and style feature extraction.

Because the VGG parameters are frozen, the gradients of this loss update only the pixels of the target image.

After some iterations, the updated image will be generated.
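The unusual part of this setup is that we optimize the image itself rather than any network weights. Here is a deliberately tiny plain-Python sketch of that idea, where a single number stands in for the target image and two squared-error terms stand in for the content and style losses. All values, the style weight, and the use of plain gradient descent (the article uses Adam) are my own simplifying assumptions.

```python
content, style = 0.2, 0.9   # hypothetical one-pixel stand-ins for the two images
style_weight = 3.0          # how strongly the style term pulls
lr = 0.05

target = content            # start from a copy of the content "image"
for step in range(500):
    # total_loss = (target - content)**2 + style_weight * (target - style)**2
    grad = 2 * (target - content) + 2 * style_weight * (target - style)
    target -= lr * grad     # the "pixel" is the only thing being updated

# the minimum of the total loss is a weighted average of the two pulls
expected = (content + style_weight * style) / (1 + style_weight)
print(target, expected)
```

The target settles between the content value and the style value, closer to whichever term carries more weight. The real loop does the same thing across every pixel, with VGG features in place of raw values.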

    for ii in range(1, 2001):

        ## calculate the content loss (from image 1 and target)
        target_features = get_features(target, vgg)
        loss = target_features["conv4_2"] - img1_features["conv4_2"]
        content_loss = torch.mean((loss)**2)

        ## calculate the style loss (from image 2 and target)
        style_loss = 0
        for layer in weights:
            target_feature = target_features[layer]
            target_corr = correlation_matrix(target_feature)
            style_corr = correlations[layer]

            layer_loss = torch.mean((target_corr - style_corr)**2)
            layer_loss *= weights[layer]

            _, d, h, w = target_feature.shape
            style_loss += layer_loss / (d * h * w)

        total_loss = 1e6 * style_loss + content_loss

        optimizer.zero_grad()
        total_loss.backward()
        optimizer.step()

In the end, we can view the predicted results.

I ran it for only a small number of iterations, but one can run up to 3000 iterations (if computation resources are no bar!).

    import matplotlib.pyplot as plt

    def tensor_to_image(tensor):
        image = tensor.to("cpu").clone().detach()
        image = image.numpy().squeeze()
        image = image.transpose(1, 2, 0)
        ## undo the normalization: multiply by the std first, then add the mean
        image = image * np.array((0.22, 0.22, 0.22)) + np.array((0.44, 0.44, 0.44))
        image = image.clip(0, 1)
        return image

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 10))
    ax1.imshow(tensor_to_image(img1))
    ax2.imshow(tensor_to_image(target))

End Notes

There are plenty of other use cases where PyTorch can be, and has been, used.

It has quickly become the darling of researchers around the globe.

The majority of the open source libraries and developments you’ll see happening nowadays have a PyTorch implementation available on GitHub.

In this article, I have illustrated what PyTorch is and how you can get started with implementing it in different use cases.

One should treat this guide as the starting point.

The performance in every use case can be improved with more data, more fine-tuning of network parameters, and most importantly, applying creative skills while building network architectures.

Thanks for reading and do leave your feedback in the comments section below.

References

Official PyTorch Guide: https://pytorch.org/tutorials/
Deep Learning with PyTorch: https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html
Faizan’s Article on AnalyticsVidhya: https://www.analyticsvidhya.com/blog/2018/02/pytorch-tutorial/
Udacity Deeplearning using Pytorch: https://github.com/udacity/deep-learning-v2-pytorch
Image Style Transfer Original Paper: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gatys_Image_Style_Transfer_CVPR_2016_paper.pdf
