A bitesize explanation

NLP keeps being bandied about as a magic panacea or a digital Tower of Babel, with many conflicting ideas about what it actually is.
I’ve even seen data scientists arguing over whether only specific machine learning algorithms count as NLP or if it’s really the approach that makes it NLP.
So here I present a quick explanation of what it can mean.
So what is NLP?

Natural Language Processing (NLP) has a very interesting history and is itself a very old discipline, with some of the earliest examples being hand-written rules, long before more complicated methods were implemented.
A common question I get asked is “what is NLP?” and I think really the answer depends on what you want it to do.
NLP is generally broken down into different needs, and I found a nice list here of seven different uses:

- Text Classification — e.g. spam filtering
- Language Modelling — e.g. spell checking
- Speech Recognition — breaking down our verbal speech into a textual format that can then be ingested by other algorithms
- Caption Generation — describing the contents of pictures
- Machine Translation — converting one language into another
- Document Summarisation — abstract creation for a document
- Question Answering — understanding a question and then formulating a human-readable answer

Surprisingly, you can use different models for different parts of these.
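To make the first of these concrete, text classification can be attempted with nothing more than the hand-written rules mentioned earlier. This is only a toy sketch — the keyword list and the two-match threshold are entirely made up for illustration:

```python
# A toy rule-based spam filter, in the spirit of the earliest
# hand-written-rule NLP systems. Keywords are illustrative only.
SPAM_WORDS = {"winner", "free", "prize", "urgent"}

def is_spam(message: str) -> bool:
    # Lowercase and split on whitespace, then flag the message
    # if it contains at least two known spam keywords.
    tokens = set(message.lower().split())
    return len(tokens & SPAM_WORDS) >= 2

print(is_spam("urgent you are a winner claim your free prize"))  # True
print(is_spam("lunch at noon tomorrow?"))  # False
```

Real spam filters long ago moved past rules like this, but the sketch shows why text classification is the gentlest entry point into NLP.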
For example, classification problems have traditionally used SVMs or Decision Trees working with Bag of Words representations of the text (a bag of words, at its simplest, is a frequency count of the words appearing in a piece of text, so it loses ordering and grammar). These are slowly being supplanted by more sophisticated algorithms (e.g. deep learning), but it really depends on the task at hand: for simple problems, a bag of words and an SVM can do surprisingly well in a much shorter space of time than word embeddings (a vector for each word that contains information on its context and its associations with other words) fed into a specialised neural network.
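The bag-of-words idea described above can be sketched with only the standard library. Note how two sentences with opposite meanings produce identical bags — exactly the loss of ordering and grammar mentioned:

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Lowercase and split on whitespace: the simplest possible tokenizer.
    # Each word maps to its frequency count in the text.
    return Counter(text.lower().split())

# Same words, different order: the bags come out identical,
# so a bag-of-words model cannot tell these sentences apart.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
print(a == b)  # True
```

A real pipeline would feed these counts (usually as sparse vectors over a fixed vocabulary) into a classifier such as an SVM, but the representation itself is just this.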
NLP is not simply a matter of breaking down and understanding the complexities of human languages; it is about understanding enough of how we talk to accomplish tasks that can, at times, give the impression the machine understands what you say like a human does.
I say this because machines never actually understand our language in the way we do. A good example is when Facebook set chatbots to communicate with each other and they rapidly altered the English language to a point where people could no longer understand it (though the bots seemed to understand each other).
NLP algorithms can also struggle with issues such as polysemy (multiple meanings for the same word — an estimated 40% of English words are like this), homonyms (words that are spelt and pronounced the same but mean different things), and homographs (words that are spelt the same but pronounced differently, with different meanings), leading to models being trained for their specific target areas (though work is being done to address this).
Examples of such words or sentences (many taken from here):

- Semantic analysis: in engineer reports, the word “ignition” means different things in the lumber industry (bad) and the boiler industry (good)
- Polysemy: the verb “get” can mean to become or to understand, depending on context
- Homonym: “down” in “climbed down a ladder” versus “bought a down blanket” — spelt and pronounced the same, but different things
- Homograph: “axes” — the plural of the axe that cuts wood, or the plural of the axis of a graph

So, in summary, NLP is hard and covers a hugely diverse set of problems that revolve around using human language.
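The homonym example above can be made concrete: to a frequency-based representation, the two senses of “down” are indistinguishable. A small illustrative sketch:

```python
from collections import Counter

s1 = "climbed down a ladder"
s2 = "bought a down blanket"
c1 = Counter(s1.lower().split())
c2 = Counter(s2.lower().split())

# Both sentences contribute one count to the same "down" token, even
# though the word means something different in each. A pure frequency
# model has no way to separate the two senses without looking at the
# surrounding context.
print(c1["down"], c2["down"])  # 1 1
```

This is exactly why sense disambiguation needs context — whether from hand-built rules, per-domain training, or context-aware embeddings.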
So next time Alexa or Siri gets it wrong, just think about what it’s trying to do.
On a lighter note, a fun fact: NLP is required for any artificial intelligence wanting to pass the Turing Test, as the test requires it to produce conversational text that is indistinguishable from human-written text.