If you haven’t, here’s a great chance to discover how hard the task is.
I am sure that if you started your machine learning journey with a sentiment analysis problem, you most likely downloaded a dataset with a lot of pre-labelled comments about hotels/movies/songs.
My question is: did you ever stop to read some of them? If you did, you probably found that some of the labels are not exactly the ones you’d have given in the first place.
You may disagree that some comments are really positive or negative.
And this happens because negative/positive labelling is very subjective.
If you are unable to tell what’s positive or negative in there, your computer will surely perform as badly as you do.
That’s why I will insist: labelling data is an art and should be done by someone with a very deep knowledge of the problem that you are trying to solve from a human standpoint.
But you can train yourself to get better at it.

Define clear rules

A good approach to labelling text is to define clear rules about what should receive which label.
Once you have a list of rules, be consistent.
If you label profanity as negative in one half of the dataset, don’t label comments containing profanity as positive in the other half.
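One way to stay consistent is to write the rules down as code, so every example goes through the same checks in the same order. The sketch below is only an illustration: the word lists, the rule names and the 0 = negative / 1 = positive encoding are my assumptions, not something taken from a real labelling guideline.

```python
import re

# Hypothetical word lists; a real project would derive these from its own guidelines.
PROFANITY = {"damn", "crap"}
PRAISE = {"great", "excellent", "loved"}

# Rules are checked in order, so an earlier rule wins when several match.
# Assumed encoding: 0 = negative, 1 = positive.
RULES = [
    ("profanity is negative", lambda words: bool(words & PROFANITY), 0),
    ("praise is positive", lambda words: bool(words & PRAISE), 1),
]

def label_with_rules(text):
    """Return (label, rule_name), or (None, None) when no rule applies."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    for name, matches, label in RULES:
        if matches(words):
            return label, name
    return None, None

comments = [
    "The room was great and the staff was excellent",
    "Damn, the shower was broken again",
    "Great location, but the food was crap",
    "We arrived on Tuesday",
]
labels = [label_with_rules(c) for c in comments]
for comment, (label, rule) in zip(comments, labels):
    print(label, rule, "|", comment)
```

The third comment matches both rules and is labelled negative because the profanity rule comes first. The last one matches nothing and is left unlabelled for a human to decide.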
But this won’t always work.
Depending on the problem, even irony can be a complication, and it may itself be a sign of negativity.
So, the second rule of thumb for labelling text is to label the easiest examples first.
The obvious positive/negative examples should be labelled as soon as possible, and the hardest ones should be left to the end, when you have a better comprehension of the problem.
Another possibility is pre-labelling the easiest examples and building a first model with them only.
Then you can submit the remaining examples to this model and check the model’s ‘opinion’ about the hardest examples.
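Here is a minimal sketch of that idea, assuming a deliberately naive word-count "model" written in plain Python. The example comments, the function names and the 0/1 encoding are all hypothetical; in practice you would plug in whatever classifier you are actually training.

```python
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Count how often each word appears under each label (0 = negative, 1 = positive)."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(tokenize(text))
    return counts

def positive_score(counts, text):
    """Positive word counts divided by all word counts for the words in `text`.

    Returns 0.5 (no opinion) when none of the words were seen during training.
    """
    words = tokenize(text)
    pos = sum(counts[1][w] for w in words)
    neg = sum(counts[0][w] for w in words)
    total = pos + neg
    return 0.5 if total == 0 else pos / total

# Easy examples, labelled by hand first (hypothetical data).
easy = [
    ("great hotel lovely staff", 1),
    ("lovely room great view", 1),
    ("terrible hotel rude staff", 0),
    ("dirty room terrible smell", 0),
]
# Hard examples, left for later.
hard = [
    "great view but terrible smell",
    "the staff was lovely",
    "checked in at noon",
]

model = train(easy)
scores = [positive_score(model, text) for text in hard]
for text, score in zip(hard, scores):
    print(f"{score:.2f}  {text}")
```

Scores close to 0.5 are the examples the first model has no clear opinion about, so those are the ones that deserve the most careful human look.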

Test randomness

If you did all of the above and you are still not sure about the quality of your classification or of your model, you can try to test randomness.
Do the following: get the examples that you are using to create your model and assign random labels to them.
You can do it using Aruana.
When you randomly label your examples, you can check how important the labels are for the predictions.
In other words, you check whether the text has good labels.
If you are unsure about the correctness of the labels (let’s say you think the examples received bad labels in the first place), you can assign random labels and see how the model performs. If it scores about the same as it did with the original labels, those labels were probably not carrying much information.
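If you want to see what random, balanced labelling looks like without any library, the sketch below does it with the standard `random` module. It is a stand-in I wrote for illustration, not Aruana’s implementation; `random_balanced_labels`, `agreement` and the example data are hypothetical names and values.

```python
import random

def random_balanced_labels(n, classes=(0, 1), seed=None):
    """Return n labels with (near-)equal counts per class, in random order."""
    rng = random.Random(seed)
    labels = [classes[i % len(classes)] for i in range(n)]
    rng.shuffle(labels)
    return labels

def agreement(labels_a, labels_b):
    """Fraction of positions where the two label lists match."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

# Hypothetical original labels for 1,000 examples (balanced).
original = [0, 1] * 500
shuffled = random_balanced_labels(len(original), seed=42)

print(shuffled.count(0), shuffled.count(1))      # 500 500
print(round(agreement(original, shuffled), 2))   # close to 0.5 on balanced binary data
```

On balanced binary data the random labels agree with the originals roughly half of the time, so a model trained on them has nothing real to learn from.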
Another possibility is that the model itself is broken.
To test whether the model always gives the same predictions regardless of the examples it receives, we can feed it random text.
In this case, instead of changing only the labels, you can also create blobs of text, with no meaning, and see how the model performs.
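A meaningless blob can be as simple as random letters. The sketch below is one possible way to build them, assuming we keep the number and length of the words so only the meaning is destroyed. `random_blob` is a hypothetical helper, and I am not claiming that any library generates its blobs this way.

```python
import random
import string

def random_blob(text, rng):
    """Replace every whitespace-separated token with random lowercase letters of equal length."""
    return " ".join(
        "".join(rng.choice(string.ascii_lowercase) for _ in token)
        for token in text.split()
    )

rng = random.Random(0)  # fixed seed so the experiment can be repeated
texts = ["The room was great", "Terrible service, never again"]
blobs = [random_blob(t, rng) for t in texts]
print(blobs)
```

If the model scores as well on these blobs as on real text, it is not reading the text at all.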

Experimenting

I used the two ideas above to test a model that I was working with.
I was not sure whether my model was broken or whether the examples I was working with were badly labelled.
Here’s how I conducted the experiment: using the same examples, I trained a model three times with the same configuration (although a little randomness will always exist).
On the first run I tested the model with random labels.
On the second run, I used text blobs and on the third run, I used the correct examples.
It’s important to say that I worked on a balanced dataset.
I loaded the data into a pandas DataFrame with two columns: ‘text’ and ‘sentiment’.
The sentiment column holds the text classification.
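The balance matters because it fixes what "random guess" means. A quick sketch, with made-up label lists and hypothetical helper names, of how the accuracy of a do-nothing model depends on the class proportions:

```python
from collections import Counter

def class_shares(labels):
    """Return the fraction of examples in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    return max(class_shares(labels).values())

balanced = [0, 1] * 50          # 50 negative, 50 positive
skewed = [0] * 90 + [1] * 10    # 90 negative, 10 positive

print(class_shares(balanced), majority_baseline(balanced))
print(class_shares(skewed), majority_baseline(skewed))
```

With two balanced classes the baseline is 0.5, so an accuracy near 50% really does mean "no better than guessing". On the skewed list, a model that always answers 0 already reaches 0.9.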

First run

    from aruana import Aruana

    aruana = Aruana('pt-br')
    # Replace the real labels with random, balanced 0/1 labels
    sentiment = aruana.random_classification(data['text'], classes=[0,1], balanced=True)
    data['sentiment'] = sentiment

The results:

As you can see, overall accuracy was 50%, which means that the model is no better than a random guess.

Second run

    from aruana import Aruana

    aruana = Aruana('pt-br')
    # Replace every text with a meaningless blob, keeping the labels
    texts = aruana.replace_with_blob(data['text'])
    data['text'] = texts

The results:

Changing the text to random blobs also decreases the performance of the model.

Third run

For the third run, I fed the model the good texts and labels, and the results were the following:

See how it improved! So, the text and the labels are not the villains here.

Conclusion

If you are working on sentiment analysis problems, be careful about text labelling.
If you have never labelled text in your life, this is a good exercise to do.
If you rely only on clean, pre-processed text to learn, you may run into a situation where the problem is not your model but the data you are using to train it.