Let’s define the transition probabilities between two sentences as equal to the cosine similarity between the two sentences.
We’ll then find the stationary probability distribution for our Markov Chain.
The sentences with the highest stationary probability are the nodes that are most well connected within our graph.
In the example below, Node A would likely have the highest stationary probability.
These highly connected nodes should represent summaries of key topics, because they are the sentences most related to many other sentences.
But, before we get too far ahead of ourselves, we need to define cosine similarity.
Suppose we have the two sentences — “Jack and Jill went up the hill” and “Jill and Jack run down the hill”.
Cosine similarity considers these sentences as vectors of words and measures their overlap using the formula below.
Cosine similarity calculates the dot product of two word vectors and divides this by the product of each vector’s magnitude.
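As a quick sanity check, that formula can be sketched in a few lines. This is a minimal bag-of-words version assuming whitespace tokenization and lowercasing, which may differ from the tokenization used in the repo:

```python
from collections import Counter
import math

def cosine_similarity(sent_a, sent_b):
    """Cosine similarity between two sentences as bags of words."""
    a = Counter(sent_a.lower().split())
    b = Counter(sent_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)                    # dot product of word-count vectors
    mag_a = math.sqrt(sum(c * c for c in a.values()))    # magnitude of each vector
    mag_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (mag_a * mag_b)

print(cosine_similarity("Jack and Jill went up the hill",
                        "Jill and Jack run down the hill"))  # ≈ 0.714, five of seven words shared
```

The two example sentences share five of their seven words, so their similarity is 5/7.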
And now we are all set.
We’re going to represent our graph as a matrix.
The value at index (X, Y) will be the cosine similarity between Sentence X and Sentence Y.
This value is the transition probability between Sentence X and Sentence Y.
We’ll use these transition probabilities to find the stationary probability of each node.
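One detail worth making explicit: for the matrix to serve as Markov chain transition probabilities, each row must sum to 1, so the sketch below normalizes every row after filling in pairwise similarities. It assumes whitespace tokenization and that every sentence shares at least one word with some other sentence (otherwise a row sum would be zero); the actual code in the repo may handle these details differently.

```python
from collections import Counter
import math
import numpy as np

def cosine_similarity(sent_a, sent_b):
    a = Counter(sent_a.lower().split())
    b = Counter(sent_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    mag_a = math.sqrt(sum(c * c for c in a.values()))
    mag_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (mag_a * mag_b)

def transition_matrix(sentences):
    """Pairwise-similarity matrix with rows normalized to probabilities."""
    n = len(sentences)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j:  # no self-loops; a sentence is trivially similar to itself
                M[i, j] = cosine_similarity(sentences[i], sentences[j])
    # Normalize each row so it is a valid probability distribution
    return M / M.sum(axis=1, keepdims=True)
```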
Finding stationary probabilities is relatively straightforward for a Markov Chain.
We can repeatedly multiply the transition probability matrix by itself until we reach a steady state, the point at which every row of the matrix has converged to the same distribution: the stationary probabilities.
A more efficient solution is to use left eigenvectors.
For those interested in this method, see here.
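The repeated-multiplication approach can be sketched as a simple power iteration, assuming the row-normalized transition matrix from before. Starting from a uniform distribution, we take steps of the chain until the distribution stops changing:

```python
import numpy as np

def stationary_distribution(P, tol=1e-10, max_iter=10_000):
    """Power-iterate a row-stochastic matrix P to its stationary distribution."""
    n = P.shape[0]
    pi = np.full(n, 1.0 / n)       # start from the uniform distribution
    for _ in range(max_iter):
        nxt = pi @ P               # one step of the chain (left multiplication)
        if np.abs(nxt - pi).max() < tol:
            return nxt
        pi = nxt
    return pi
```

On a small chain with known behavior, say P = [[0.9, 0.1], [0.5, 0.5]], this converges to (5/6, 1/6), which you can verify by solving pi = pi P directly.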
Now that we have a steady state, we can look for the highest probabilities.
The sentences with the highest steady-state probability are below.
"I'm Gatsby," he said suddenly.

"You two start on home, Daisy," said Tom.

"I told you I went there," said Gatsby.

"I want you and Daisy to come over to my house," he said, "I'd like to show her around."

She had told him that she loved him, and Tom Buchanan saw. He was astounded. His mouth opened a little and he looked at Gatsby and then back at Daisy as if he had just recognized her as some one he knew a long time ago.
Now, here comes the most fun part of data science — making conclusions that our data does not support.
Let’s evaluate our summary below.
In our last sentence, Daisy tells Gatsby she loves him and Tom Buchanan, her husband, sees.
This sentence captures the complex relationship between Gatsby, Daisy, and Tom.
In our fourth sentence, we see Gatsby wanting to show Daisy around his house.
He’s convinced Daisy will want to be with him if she sees that he is now rich and successful.
This captures Gatsby’s struggle to obscure his past with his current success, a central theme in the novel.
And our first sentence captures the iconic moment of Gatsby introducing himself.
Our model has done it! We have summarized The Great Gatsby!

Credit: The Great Gatsby, Warner Brothers, 2013

There’s an easy way to get from our analysis to the above paragraph.
It only requires one hop, one skip, and one jump.
Our data in no way implies the above.
Our methods were strong and our analysis was thoughtful.
But I brought in a lot of outside knowledge to get to the conclusion above.
We highlight this point not to shortchange this approach but to recognize the limitations of our method.
We can reasonably conclude that Gatsby, Daisy, and Tom are relevant characters and that there is some relationship between Gatsby and Daisy.
We have certainly found some key ideas but we are far from a well-formed and thorough summary.
Looking Forward

There are certainly some things we can do to improve our method, primarily in how we determine sentence similarity.
We can use TF*IDF to see which words are most relevant in a sentence and weight them accordingly.
When measuring cosine similarity, we do not need to require exact word matches.
We can also count words that are similar in meaning but not in spelling (e.g., happy and elated).
If we want to get more intense, we can use advanced topic models like Latent Dirichlet Allocation (LDA).
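The first of those improvements can be sketched without much machinery. Below is a minimal TF-IDF weighting, treating each sentence as a document, with a cosine similarity over the weighted vectors; the IDF smoothing here is one common choice among several, and a real implementation would likely use a library vectorizer instead:

```python
from collections import Counter
import math

def tfidf_vectors(sentences):
    """TF-IDF vectors: term frequency times smoothed inverse document frequency."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))        # document frequency per word
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}     # smoothed IDF
    return [{w: c * idf[w] for w, c in Counter(d).items()} for d in docs]

def weighted_cosine(u, v):
    """Cosine similarity between two sparse TF-IDF vectors (dicts)."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    mag_u = math.sqrt(sum(x * x for x in u.values()))
    mag_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (mag_u * mag_v)
```

With this in place, common words like "the" contribute less to the similarity score than rarer, more topical words.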
Automated summarization is broken down into two main areas — extractive methods and abstractive methods.
Everything we have talked about here is an extractive method.
We are trying to pull relevant information from the text itself.
But no human being writes an extractive summary like that.
Humans take concepts, generalize them, consider patterns, and produce a result.
This is an abstractive method.
And for that, we’ll need the three buzziest words in computer science: a deep neural network.
We’re going to use that approach soon so keep an eye out for that article.
Code

For those interested, all the code required to run this can be found in the GitHub repo here.
There is Python code and a Jupyter Notebook.
Cleaning the data and calculating the adjacency matrix does take a little bit of time.
With Jupyter Notebook, you only have to run these methods once.
The method definition and code structure parallel this article so it’ll be easy to follow along.
Thanks for reading! Questions? Comments? Email me at andrew. I’d love to hear from you!