My First Adventures in NLP

It’s not like they’re going to get this message soon enough to do anything about it.”

Often we’d end up waiting to report back to the doctor until the follow-up appointment.

But it is hard to remember how difficult each day, or week, of trying a new medication was when you’re on the spot.

Especially when it was 8 weeks into using a new drug.

This creates a slow, inconsistent, and labor-intensive medication-experience feedback cycle.

These breakdowns and bottlenecks in doctor-patient communication slow down the treatment process.

If doctors can find the correct medications immediately, then patient care outcomes improve.

A correct drug fit means fewer follow-up visits, which helps to reduce the cost of patient care.

A problem with real costs & lives attached.

So what if I could create a system to allow health care providers to identify effective / ineffective drugs more quickly?

An idea: medication experience feedback analysis

Quantified reviews create clean, simple data.

Data that can be visualized, analyzed, monitored, and compared.

By gathering daily patient drug feedback and performing sentiment analysis, one could:

Examine the patient’s experience with a medication over the trial period to track how long the drug takes to begin working (or whether it works at all).

Identify when patients are struggling and notify the physician to provide support.

Evaluate drugs over time and across various patient types to monitor non-clinical performance.

Provide doctors with a chart level view of the patient’s ongoing experience during follow-up visits.
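
To make the idea above concrete, here is a minimal sketch of how daily sentiment scores might be tracked once a model produces them. Everything in it is an assumption for illustration: the score range of -1 to 1, the window sizes, the thresholds, and the function names are hypothetical and do not come from the notebooks.

```python
from statistics import mean


def rolling_mean(scores, window=7):
    """Trailing average of the last `window` daily scores (fewer at the start)."""
    return [mean(scores[max(0, i - window + 1): i + 1]) for i in range(len(scores))]


def first_sustained_improvement(scores, threshold=0.0, window=7):
    """Index of the first day whose full-window trailing average exceeds `threshold`.

    Returns None if the average never rises above the threshold.
    """
    for day, avg in enumerate(rolling_mean(scores, window)):
        if day + 1 >= window and avg > threshold:
            return day
    return None


def needs_support(scores, threshold=-0.5, window=3):
    """True if the most recent `window` days average below `threshold`."""
    return len(scores) >= window and mean(scores[-window:]) < threshold


# Hypothetical daily sentiment scores in [-1, 1] for one patient on one drug.
daily = [-0.8, -0.6, -0.7, -0.4, -0.2, 0.0, 0.1, 0.3, 0.4, 0.5, 0.6, 0.6]

print(first_sustained_improvement(daily))  # day index where the trend turns positive
print(needs_support(daily[:3]))            # early days: is the patient struggling?
```

The trailing average smooths out single bad days, so a physician alert or a "started working" marker reflects a trend rather than one review.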

But this system’s success would hinge on one key component.

A model that can consistently and accurately predict the sentiment of a patient’s drug review.

So that is what I set out to see if I could build.

Into the Unknown

A bit like wandering down an unfamiliar road.

Walking into this project I knew very little about NLP and sentiment analysis other than what it can do conceptually.

But I needed some quality data to start with.

The internet delivered some terrific data.

The “Drug Review (Drugs.com) Dataset” from the UCI Machine Learning Repository, contributed by Felix Gräßer & Surya Kallumadi.

“The dataset provides patient reviews on specific drugs along with related conditions and a 10 star patient rating reflecting overall patient satisfaction.

The data was obtained by crawling online pharmaceutical review sites.”

An excellent dataset that contains the text reviews, a rating, as well as category fields for condition and medication.

I started by importing & exploring the dataset.

Once satisfied that I grasped the structure of the data I began looking for guidance on sentiment analysis.

Attempt #1: A shot in the dark

I googled, read many tutorials, and selected a simple starting tutorial to launch my NLP journey.

My first attempt at NLP began with taking a bit of code from the “Sentiment Analysis with Python” tutorial and applying it to my dataset.

(See my notebook here.)

I quickly learned that adapting even the best-written code to a new dataset can be quite the challenge.

The RegEx provided for cleaning the data was insufficient, so I wrote and tested my own.

After normalizing and vectorizing the review text I binned the review ratings from 11 categories into three: “Positive”, “Negative”, and “Neutral”.
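
The binning step can be sketched in a few lines. The post does not say where the cut-offs between the three bins were placed, so the thresholds below (4 and below is negative, 7 and above is positive) are assumptions chosen only for illustration, and `bin_rating` is a hypothetical name.

```python
def bin_rating(rating, negative_max=4, positive_min=7):
    """Map a numeric star rating to one of three sentiment labels.

    The cut-offs are assumed values, not the ones used in the notebook.
    """
    if rating <= negative_max:
        return "Negative"
    if rating >= positive_min:
        return "Positive"
    return "Neutral"


# Hypothetical ratings as they might appear in the dataset's rating column.
ratings = [1, 4, 5, 6, 7, 10]
labels = [bin_rating(r) for r in ratings]
print(list(zip(ratings, labels)))
```

Where the cut-offs sit changes the class balance, which in turn affects how much an accuracy number can be trusted.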

Then, using scikit-learn’s Logistic Regression, I trained a model and checked its accuracy.

After a small amount of tuning, this model produced a final accuracy of 81.


At this point I was uncertain how to measure the performance of this model and unsure whether accuracy alone was a valid metric.

But as I inspected the weights for each of the top and bottom 10 features (words) I realized that there were some discrepancies that I was unable to explain.


These negatively weighted words seem pretty positive…

I’m not the type to give up.

So I needed to learn more, and I sought out a more in-depth tutorial.

Attempt #2: Deconstructing the Black Box

After I talked it over with one of my friends, he recommended a post from Insight AI’s Emmanuel Ameisen.

Using that post as a guide and Leonardo Apolonio’s “How-to-solve-NLP” code as a framework I set off on my next journey in NLP with my freshly inspected dataset.

(My notebook is here.)

I first reviewed the original paper that was published using the dataset.

I’ve included notes about the methodology for data collection, preprocessing, tokenizing, and modeling used in that study.

As I worked through the tutorial I followed a simple process of “stop and learn” anytime there was ML vocabulary I didn’t understand.

I learned about embedding, Word2Vec, Bag-of-Words, corpus, tokenizing, normalizing, non-ASCII characters, stemming, lemmatizing, and vectorizing in data preparation.

This process is time consuming but effective.
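
Of the terms listed above, tokenizing and Bag-of-Words vectorizing are the easiest to show in a few lines. This is a bare-bones, standard-library sketch meant only to illustrate the concepts; the function names are hypothetical, and the tutorial code relied on library vectorizers rather than anything like this.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(corpus):
    """Assign every distinct token in the corpus a column index."""
    tokens = sorted({tok for doc in corpus for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Bag-of-Words: count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:  # words never seen in the corpus are ignored
            vector[vocabulary[tok]] = n
    return vector


# Two made-up reviews standing in for the real corpus.
corpus = ["This drug helped me sleep", "This drug did not help, no sleep at all"]
vocabulary = build_vocabulary(corpus)
print(vocabulary)
print(vectorize(corpus[0], vocabulary))
```

Each review becomes a fixed-length list of counts, which is the kind of numeric input a Logistic Regression model expects. Word order is thrown away, which is exactly the limitation that embeddings like Word2Vec try to address.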

My mindset: Make the black box work, then tear apart the black box

My goal with each of these tutorials is simple: make it work, then gain understanding.

This process works well for people like me who could be described as “mechanically inclined”.

It allows us to build an understanding of the entire system’s structure first, then take a deeper dive into knowledge of each part.

I find that this method helps me to retain what I learn.

My brain locks in the explanation of the details because they are perceived as relevant and essential parts of the whole.

(Here’s a short presentation I did on this method.)

“Take chances, make mistakes, GET MESSY!” - Ms. Frizzle

The danger of this method arises when the working internals are never explored.

If the mechanisms of the model are treated like a black box forever, the user misses the opportunity to learn.

But did it actually work?

It took a lot of work to get it working cleanly with my dataset, but work it did.

The tutorial clearly illustrated new-to-me concepts within a framework of evaluating model performance, including:

Evaluating the effectiveness of various embeddings through projected visualization.

Utilizing precision, recall, f1, and accuracy for evaluating Logistic Regression performance.

Use of a confusion matrix for model performance visualization.

Visualizing feature inspection through the use of an Important Words plot.

Using LIME, a black-box explainer, with Word2Vec to retrieve and inspect features.
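
The metrics in the list above are simple enough to compute by hand, which helps show what each one measures. The sketch below uses only the standard library and made-up labels; the tutorial itself used library functions, and the names here are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, labels):
    """Precision, recall, and F1 for each label, derived from the confusion matrix."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    metrics = {}
    for i, label in enumerate(labels):
        true_positives = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column total
        actual = sum(matrix[i])                    # row total
        precision = true_positives / predicted if predicted else 0.0
        recall = true_positives / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Made-up labels for six reviews, purely for illustration.
labels = ["Negative", "Neutral", "Positive"]
y_true = ["Positive", "Positive", "Positive", "Negative", "Negative", "Neutral"]
y_pred = ["Positive", "Positive", "Neutral", "Negative", "Positive", "Neutral"]

print(confusion_matrix(y_true, y_pred, labels))
print(accuracy(y_true, y_pred))
print(per_class_metrics(y_true, y_pred, labels))
```

Accuracy alone hides which class the model gets wrong. Per-class precision and recall show, for example, whether "Neutral" reviews are being absorbed into the other two bins.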

The results of all this work?

The highest performing embedding + logreg model was my Bag-of-Words model at 78% accuracy.

The CNN ended up coming in at ~75% accurate leaving plenty of room for improvement.

But the resulting feature weighting & importance improved dramatically.

With these small successes under my belt I was still hungry to find and build an even better fitting model.

And in the third attempt I was able to do just that.

Attempt #3: A journey with the end in mind

This attempt was different. I had been reading and learning about LSTMs while working through my previous attempts, and I had resolved to give one a go on this same dataset with some fresh LSTM code.

(See my notebook here.)

Preparing data with my new bag of tricks

Preprocessing data was fun this time after all the struggles in my first two attempts.

This time I inspected my NaN data using MissingNo.

I think this visualization package is super clever.

Look for NaNs with MissingNo

Then I binned my values as I had done before and inspected the distribution in a plot.

Bin Test & Training Data by Rating

I removed outlier-length sentences from the dataset.

Outlier length sentence removal

Lastly I applied my RegEx, stemming, and stop word removal (including my custom-made drug name stop word corpus) to further clean and reduce the feature size while maintaining the relevant words.

The end result is a substantially smaller feature set that is easier to effectively process.
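
The preprocessing steps described above can be sketched roughly as follows. This is not the notebook's code: the regular expressions, the tiny stop word list, the two example drug names, and the interquartile-range rule for length outliers are all assumptions made for illustration, and stemming is left out because it needs a third-party library.

```python
import re
from statistics import quantiles

# Assumed, deliberately tiny lists; real ones would be far longer.
STOP_WORDS = {"the", "a", "and", "i", "it", "to", "of", "my", "was", "for", "this", "is"}
DRUG_NAMES = {"zoloft", "prozac"}  # stand-in for a custom drug-name stop word corpus


def clean(text):
    """Lowercase, strip HTML entities and non-letters, and split into tokens."""
    text = text.lower()
    text = re.sub(r"&#?\w+;", " ", text)   # entities such as &#039; left by scraping
    text = re.sub(r"[^a-z\s]", " ", text)  # digits and punctuation
    return text.split()


def remove_stop_words(tokens, extra=DRUG_NAMES):
    """Drop common words plus the extra (drug name) stop words."""
    blocked = STOP_WORDS | set(extra)
    return [tok for tok in tokens if tok not in blocked]


def drop_length_outliers(reviews, k=1.5):
    """Keep reviews whose word count lies within k * IQR of the quartiles (assumed rule)."""
    lengths = [len(review.split()) for review in reviews]
    q1, _, q3 = quantiles(lengths, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [review for review, n in zip(reviews, lengths) if low <= n <= high]


# A made-up review in the style of scraped text.
tokens = clean("I&#039;ve taken Zoloft for 8 weeks!")
print(tokens)
print(remove_stop_words(tokens))
```

Removing drug names keeps the model from learning that a particular brand is "positive" or "negative" in itself, so the remaining features describe the experience rather than the product.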

Word counts across all the reviews in the dataset pre/post stemming and stop word removal.

Hello LSTM

I’ll spare you the details of how I worked through all the different components of the LSTM and evaluated various parameters through time-consuming trial and error.

But the end result of all that hard work was a model that performed better than any of my previous models.

My simple LSTM model ended up achieving 88.4% accuracy.

Even more impressive was how well it performed at predicting positive sentiment with a 90% accuracy rate.

Most surprising of all… this attempt was fun.

Much of the drudgery of learning the preprocessing had already taken place and the model performed so much better than I expected.

The Aftermath

All things considered, this journey into understanding how sentiment analysis works has deepened my understanding of NLP.

Most of all, it has helped me to see how much more I have to learn and I look forward to returning to this dataset in the future.

Is it possible to predict the sentiment of a review effectively? Yes.

Is my model sufficient to do so in a meaningful way? It would certainly be better than nothing, but it is far from perfect.

Done > Perfect.

And this blog post is done.

Note for readers: As of the time of publishing this post, my wife has found some relief from her depression thanks to gene-drug testing.

If you or a loved one are suffering with depression and are frustrated with the effectiveness of your medication I would highly recommend it.

It should be part of every depression diagnosis.

