Reading between the layers (LSTM Network)

Using PyTorch framework for Deep Learning

Samarth Agrawal, Feb 20

Photo by Paul Skorupskas on Unsplash

One of the most crucial parts of building a deep neural network is having a clear view of your data as it flows through the layers. Along the way it changes dimensions and shape, gets flattened, and is then re-shaped. We will refer to the LSTM architecture that we have seen earlier in our Sentiment Analysis Tutorial.

Link to the article here.

Sentiment Analysis using LSTM Step-by-Step: Using PyTorch framework for Deep Learning (towardsdatascience.com)

Figure: LSTM Network Architecture for Sentiment Analysis

The layers are as follows:

0. Tokenize: This is not a layer of the LSTM network but a mandatory step that converts our words into tokens (integers).
1. Embedding Layer: converts our word tokens (integers) into embeddings of a specific size.
2. LSTM Layer: defined by the hidden state dimensions and the number of layers.
3. Fully Connected Layer: maps the output of the LSTM layer to the desired output size.
4. Sigmoid Activation Layer: turns every output value into a value between 0 and 1.
5. Output: the sigmoid output from the last timestep is taken as the final output of this network.

Before you define the model class, it is worth taking a closer look at each of the layers.

This will help you get more clarity on how to prepare the inputs for the Embedding, LSTM and Linear layers of your model architecture.

Context:
We are using the IMDB movie reviews dataset, and the data processing and preparation steps are already done.

Click here if you need to revisit those steps.
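Since those preprocessing steps are only linked here, the sketch below shows roughly what step 0 (tokenize) and padding every review to a fixed sequence length amount to. The toy reviews, the helper names `build_vocab` and `pad_features`, and the choice to left-pad with 0 and keep the first `seq_length` tokens are assumptions for illustration, not necessarily the exact code from the earlier tutorial.

```python
from collections import Counter

def build_vocab(reviews):
    """Map each word to an integer id (hypothetical helper); 0 is reserved for padding."""
    counts = Counter(word for review in reviews for word in review.split())
    # Most frequent words get the smallest ids, starting from 1.
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}

def pad_features(token_lists, seq_length):
    """Left-pad with 0 (or truncate) every token list to exactly seq_length (hypothetical helper)."""
    features = []
    for tokens in token_lists:
        tokens = tokens[:seq_length]                      # truncate long reviews
        features.append([0] * (seq_length - len(tokens)) + tokens)  # pad short ones on the left
    return features

reviews = ["this movie was great", "terrible plot and terrible acting"]  # toy data
vocab_to_int = build_vocab(reviews)
tokenized = [[vocab_to_int[w] for w in r.split()] for r in reviews]
padded = pad_features(tokenized, seq_length=8)
print(padded)
```

With the real data, `seq_length=200` is what produces the 200 columns we see in each batch below.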

We start off with the dataloaders, where we have defined batch_size=50 and sequence length=200.
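The actual `train_loader` is built in the earlier tutorial and is not shown here. To follow along without it, the sketch below builds a stand-in loader from random token ids with the same batch size (50) and sequence length (200). The vocabulary size of 1000 and the 500 reviews are made-up numbers; only the shapes matter.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size, seq_length = 50, 200
vocab_size = 1000   # hypothetical vocabulary size
num_reviews = 500   # hypothetical number of reviews

# Random token ids in [1, vocab_size) stand in for padded, tokenized reviews.
features = torch.randint(1, vocab_size, (num_reviews, seq_length))
labels = torch.randint(0, 2, (num_reviews,))  # 1 = positive, 0 = negative

train_data = TensorDataset(features, labels)
train_loader = DataLoader(train_data, shuffle=True, batch_size=batch_size)

x, y = next(iter(train_loader))
print(x.shape, y.shape)  # torch.Size([50, 200]) torch.Size([50])
```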

As per my article on building a Sentiment Analysis model using LSTM, we are looking at Step 14 under a microscope.

Let's start by looking at the inputs and targets from the dataloaders:

    import torch

    dataiter = iter(train_loader)
    x, y = next(dataiter)   # one batch of inputs (token ids) and targets (labels)
    x = x.long()            # cast to torch.LongTensor: nn.Embedding expects integer indices
    print('X is', x)
    print('Shape of X and y are :', x.shape, y.shape)

Figure: Reviews converted into tokens (integers)

From the shape of X we can see that X is a tensor of 50 rows (= batch size) and 200 columns (= sequence length).

This confirms that our tokenization step is working as expected.

This X will go as input into the Embedding layer.

Embedding Layer:
The module that allows you to use embeddings is torch.nn.Embedding. It takes two parameters: the vocabulary size and the dimensionality of the embedding.

    from torch import nn

    vocab_size = len(words)   # `words` is defined in the earlier preprocessing steps; must exceed the largest token id
    embedding_dim = 30
    embeds = nn.Embedding(vocab_size, embedding_dim)
    print('Embedding layer is ', embeds)
    print('Embedding layer weights ', embeds.weight.shape)

Figure: Embedding Layer 'Weight Matrix' or 'Look-up Table'

    embeds_out = embeds(x)
    print('Embedding layer output shape', embeds_out.shape)
    print('Embedding layer output ', embeds_out)

Figure: Input tokens converted into embedding vectors

From the output of the embedding layer we can see that it has created a 3-dimensional tensor from the embedding weights. It now has 50 rows, 200 columns and an embedding dimension of 30, i.e. each tokenized word in our review is now represented by a 30-dimensional embedding vector.
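Why call the weight matrix a look-up table? The sketch below uses a toy batch rather than the real data (the vocabulary size of 1000 is a made-up number). It checks that calling the embedding layer gives exactly the same result as indexing rows of `embeds.weight` with the token ids.

```python
import torch
from torch import nn

torch.manual_seed(0)
vocab_size, embedding_dim = 1000, 30   # vocab_size is hypothetical
embeds = nn.Embedding(vocab_size, embedding_dim)

x = torch.randint(0, vocab_size, (50, 200))  # (batch, seq_len) token ids
embeds_out = embeds(x)

# Each token id simply selects one row of the (vocab_size, embedding_dim) weight matrix.
same = torch.equal(embeds_out, embeds.weight[x])
print(embeds.weight.shape, embeds_out.shape, same)
```

So the third dimension of size 30 is just the row of the weight matrix that each token id points to.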

This data will now go to the LSTM layer.

LSTM Layer:
While defining the LSTM layer we have kept batch_first=True and the number of hidden units = 512.

    # initializing the hidden state to 0: passing None makes PyTorch use zero-filled h_0 and c_0
    hidden = None
    lstm = nn.LSTM(input_size=embedding_dim, hidden_size=512, num_layers=1, batch_first=True)
    lstm_out, h = lstm(embeds_out, hidden)
    print('LSTM layer output shape', lstm_out.shape)
    print('LSTM layer output ', lstm_out)

Figure: Output of LSTM layer

By looking at the output of the LSTM layer we see that our tensor now has 50 rows, 200 columns and 512 LSTM nodes (hidden units), i.e. a shape of (50, 200, 512).
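A detail that matters for the final step: with a single-layer, unidirectional LSTM, the output at the last timestep equals the final hidden state `h_n` that the layer also returns. Note that `h_n` is laid out as (num_layers, batch, hidden) even with `batch_first=True`. The sketch below checks this on random values standing in for `embeds_out`.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, seq_length, embedding_dim, hidden_dim = 50, 200, 30, 512

embeds_out = torch.randn(batch_size, seq_length, embedding_dim)  # stand-in for the embedding output
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_dim, num_layers=1, batch_first=True)

# Passing no hidden state makes PyTorch start from zero-filled h_0 and c_0.
lstm_out, (h_n, c_n) = lstm(embeds_out)

print(lstm_out.shape)  # torch.Size([50, 200, 512])
print(h_n.shape)       # torch.Size([1, 50, 512]): (num_layers, batch, hidden)
print(torch.allclose(lstm_out[:, -1, :], h_n[-1]))
```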

Next, this data is fed into the Fully Connected layer.

Fully Connected Layer:
For the fully connected layer, the number of input features = the number of hidden units in the LSTM. Output size = 1 because we only have a binary outcome (1/0; Positive/Negative). Note that before putting the LSTM output into the fc layer it has to be flattened out, from (50, 200, 512) to (50 × 200, 512) = (10000, 512).

    fc = nn.Linear(in_features=512, out_features=1)
    # contiguous() makes sure the memory layout allows view(); -1 infers 50 * 200 = 10000 rows
    fc_out = fc(lstm_out.contiguous().view(-1, 512))
    print('FC layer output shape', fc_out.shape)
    print('FC layer output ', fc_out)

Figure: Output from Fully Connected Layer

Sigmoid Activation Layer:
This is needed just to turn every output value from the fully connected layer into a value between 0 and 1.

    sigm = nn.Sigmoid()
    sigm_out = sigm(fc_out)
    print('Sigmoid layer output shape', sigm_out.shape)
    print('Sigmoid layer output ', sigm_out)

Figure: Output from Sigmoid Activation Layer

Final Output:
This includes two steps. First, reshape the output such that rows = batch size:

    batch_size = x.shape[0]
    out = sigm_out.view(batch_size, -1)   # (10000, 1) -> (50, 200)
    print('Output layer output shape', out.shape)
    print('Output layer output ', out)

Figure: Sigmoid Activation Layer output reshaped

Second, as we see in the network architecture, we only want the output after the last element of the sequence (after the last timestep):

    print('Final sentiment prediction, ', out[:, -1])

Figure: Final output from the model

These outputs are from an untrained network, and hence the values do not indicate anything yet.
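The flatten-then-reshape round trip can be checked on its own. In the sketch below (random values stand in for `lstm_out`), the steps above are applied in order: flatten, linear layer, sigmoid, reshape so that rows = batch size, keep the last column. The result matches applying the linear and sigmoid layers to the last timestep alone. This suggests that keeping only `out[:, -1]` selects exactly the prediction made from the last LSTM output.

```python
import torch
from torch import nn

torch.manual_seed(0)
batch_size, seq_length, hidden_dim = 50, 200, 512

lstm_out = torch.randn(batch_size, seq_length, hidden_dim)  # stand-in for the LSTM output
fc = nn.Linear(in_features=hidden_dim, out_features=1)
sigm = nn.Sigmoid()

# Route from the article: flatten, fc, sigmoid, reshape to rows = batch size, take last column.
flat = lstm_out.contiguous().view(-1, hidden_dim)   # (10000, 512)
out = sigm(fc(flat)).view(batch_size, -1)           # (50, 200)
final = out[:, -1]                                  # (50,)

# Shortcut: apply fc and sigmoid only to the last timestep.
shortcut = sigm(fc(lstm_out[:, -1, :])).squeeze(1)  # (50,)

print(flat.shape, out.shape, final.shape)
print(torch.allclose(final, shortcut, atol=1e-6))
```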

This was just for the sake of illustration and we will use this knowledge to define the model correctly.
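Putting the traced layers together, a model class might look like the sketch below. The class name `SentimentLSTM`, the default sizes, and returning the hidden state alongside the prediction are assumptions for illustration. This is not necessarily the class from the original tutorial.

```python
import torch
from torch import nn

class SentimentLSTM(nn.Module):
    """Hypothetical model assembling the layers traced above (binary output, output size 1)."""

    def __init__(self, vocab_size, embedding_dim=30, hidden_dim=512, n_layers=1):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)
        self.sig = nn.Sigmoid()

    def forward(self, x, hidden=None):
        batch_size = x.size(0)
        embeds_out = self.embedding(x.long())                    # (batch, seq_len, embedding_dim)
        lstm_out, hidden = self.lstm(embeds_out, hidden)         # (batch, seq_len, hidden_dim)
        flat = lstm_out.contiguous().view(-1, self.hidden_dim)   # (batch * seq_len, hidden_dim)
        sig_out = self.sig(self.fc(flat))                        # (batch * seq_len, 1)
        sig_out = sig_out.view(batch_size, -1)                   # (batch, seq_len)
        return sig_out[:, -1], hidden                            # prediction after the last timestep

# Quick shape check on random token ids (vocab_size=1000 is a made-up number).
model = SentimentLSTM(vocab_size=1000)
x = torch.randint(0, 1000, (50, 200))
preds, (h_n, c_n) = model(x)
print(preds.shape)  # torch.Size([50])
```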

Closing Remarks:
I hope you had as much fun reading this piece as I had writing it.
Try to replicate this process for any other deep learning model you are trying to implement.
Feel free to write your thoughts / suggestions / feedback.
