Therefore, Huang et al. proposed an improvement named Supervised Word Mover’s Distance (S-WMD).

Introduction to Supervised Word Mover’s Distance (S-WMD)

Before word embeddings were introduced, bag-of-words (BoW), Latent Semantic Indexing (LSI) and Latent Semantic Analysis (LSA) were the most promising techniques for NLP tasks.
Word Mover’s Distance (WMD) was introduced in 2015.
It leverages word embeddings (word2vec was introduced in 2013).
It takes a different approach, using the Earth Mover’s Distance to measure the difference between the word vectors of two documents.
One year later, Huang et al. proposed an improvement called Supervised Word Mover’s Distance (S-WMD).

t-SNE plots comparing WMD and S-WMD (Huang et al., 2016)

In short, the WMD algorithm measures the minimum cost of transporting the word vectors of one document to the word vectors of another document.
If two documents share lots of words, only a small amount of movement is needed to transport one document to the other.
In other words, these two documents may be classified as similar documents.
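
The transport idea above can be sketched as a small linear program. This is a minimal illustration, not the authors' implementation: the four-word vocabulary, the 2-D `embeddings`, and the helper names `nbow` and `wmd` are all made up for the example, and it assumes NumPy and SciPy are installed. Each document becomes a normalized word-frequency vector, and the solver finds the cheapest way to move the mass of one vector onto the other, where moving mass between two words costs their Euclidean distance.

```python
import numpy as np
from scipy.optimize import linprog


def nbow(tokens, vocab):
    """Normalized bag-of-words: word frequencies over vocab that sum to 1."""
    counts = np.array([tokens.count(word) for word in vocab], dtype=float)
    return counts / counts.sum()


def wmd(d1, d2, embeddings):
    """Minimum total cost of moving the word mass d1 onto d2."""
    n = len(d1)
    # cost[i, j] = Euclidean distance between the vectors of word i and word j
    cost = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=2)
    # The flow matrix T is flattened row-major: T[i, j] lives at index i * n + j
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0  # mass leaving word i must equal d1[i]
        A_eq[n + i, i::n] = 1.0           # mass arriving at word i must equal d2[i]
    b_eq = np.concatenate([d1, d2])
    result = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq,
                     bounds=(0, None), method="highs")
    return result.fun


# Hypothetical toy vocabulary and hand-made 2-D "embeddings" (not real word2vec vectors)
vocab = ["carrot", "spinach", "beef", "pork"]
embeddings = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]])

veg_doc = nbow(["carrot", "spinach"], vocab)
meat_doc = nbow(["beef", "pork"], vocab)
mixed_doc = nbow(["carrot", "beef"], vocab)

same = wmd(veg_doc, veg_doc, embeddings)      # identical documents: nothing to move
partial = wmd(veg_doc, mixed_doc, embeddings) # one shared word: only half the mass moves
far = wmd(veg_doc, meat_doc, embeddings)      # no shared words: all mass moves
```

In this toy setup the document that shares a word with `veg_doc` ends up closer than the one that shares none, which matches the intuition in the paragraph above.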

Weighting Mechanism

How does a weighting mechanism help with NLP tasks?
Introducing weights helps solve document classification problems.
Intuitively, pre-trained word vectors should be very good because they are trained on lots of data.
However, it is a known issue that pre-trained vectors may not suit some problems well.
For example, pre-trained vectors may place all edible food close together, mixing vegetables and meat.
What if the classification problem is to decide whether or not something is vegan?
On the other hand, two documents sharing lots of words does not imply that both of them describe the same topic.
For example, consider “I go to school to teach student” and “I go to school to learn English”.
Both can be about life in school, but they can also be about the different tasks that different parties perform in school.
In other words, it really depends on the NLP task.
The two documents may or may not be related.
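
To make the effect of weights concrete, here is a rough sketch using the two example sentences. It assumes the re-weighting form that, to my understanding, Huang et al. (2016) use: each word frequency is multiplied by a per-word importance weight and the result is renormalized to sum to 1. The `weights` below are hand-picked for illustration only (in S-WMD they are learned from labeled data), and `shared_mass` is a made-up helper for this example, not part of the method.

```python
def nbow(tokens, vocab):
    """Normalized bag-of-words: word frequencies over vocab that sum to 1."""
    counts = [tokens.count(word) for word in vocab]
    total = sum(counts)
    return [c / total for c in counts]


def reweight(d, w):
    """Scale each word frequency by its weight, then renormalize to sum to 1."""
    scaled = [wi * di for wi, di in zip(w, d)]
    total = sum(scaled)
    return [s / total for s in scaled]


def shared_mass(d1, d2):
    """Illustrative helper: how much frequency mass the two documents have in common."""
    return sum(min(a, b) for a, b in zip(d1, d2))


vocab = ["i", "go", "to", "school", "teach", "student", "learn", "english"]
teach_doc = "i go to school to teach student".split()
learn_doc = "i go to school to learn english".split()

# Hypothetical weights: common words matter little, task-specific words matter a lot
weights = [0.1, 0.1, 0.1, 0.5, 2.0, 2.0, 2.0, 2.0]

d_teach = nbow(teach_doc, vocab)
d_learn = nbow(learn_doc, vocab)

before = shared_mass(d_teach, d_learn)
after = shared_mass(reweight(d_teach, weights), reweight(d_learn, weights))
```

With these weights the shared words carry much less of each document's mass, so the words that separate teaching from learning dominate any distance computed afterwards. Different weights would suit a task where the two sentences should count as the same topic.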

kNN test error (Huang et al., 2016)

About Me

I am a Data Scientist in the Bay Area, focusing on the state of the art in Data Science and Artificial Intelligence, especially NLP and platform-related topics.
You can reach me on my Medium blog, LinkedIn or GitHub.

Reference

Huang Gao, Guo Chuan, Kusner Matt J., Sun Yu, Weinberger Kilian Q., Sha Fei. pdf
S-WMD in Matlab (Original)
S-WMD in Python