Essentially, articles written by large news organizations often reflect their authors’ inherent point of view, especially because news organizations tend to prefer writers from a certain area of the political spectrum.
The current American President, Donald J. Trump, consistently draws split coverage: conservative sources are more likely to report Trump’s actions in a favorable manner, while liberal media outlets tend to portray Trump’s actions in a negative light.
So it’s obvious that media outlets are very divisive and biased on key points, but what if we could directly identify the biases in any news article?
Figure: Simplified representation of the American political spectrum. Liberal sources tend to report more negatively about Donald Trump, while conservative sources tend to report more positively.
Data.
There’s a ton of data that represents the bias in media — I found that the ProQuest Newspaper Database had thousands of articles from many news sources about Donald Trump.
I also realized that I could reasonably determine the general political leaning of a large-enough news organization using Media Bias/Fact Check, the largest website that identifies political biases of major news sources.
With such a large amount of data, there’s one great way to see if I could analyze the articles for those biases — machine learning.
By using neural networks, which act as a kind of artificial brain, machines are able to find patterns in a big dataset with minimal human involvement (which is fantastic when there are millions of data points!).
Machine learning has recently seen a huge increase in use because of a rise in both available data and computational power.
Researchers have also been working to make even more complex neural networks with more and more layers (deep learning), which allows them to solve even harder problems.
Machine learning itself has a bunch of applications in almost every field imaginable; recent advances in machine learning include self-driving cars, language translation, and facial recognition.
I previously used machine learning to predict cryptocurrency prices and generate shoe designs.
This time, I’ll be using it to identify the news origin of articles and predict the bias in an article.
Now to hone in on a more specific neural network architecture…
Figure: A simple feedforward neural network.
LSTM Cells.
One particular neural network that is a really revolutionary way to find patterns is the Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN) from this paper, which is composed of multiple individual LSTM cells.
But how does it work? It basically works by using special gates that let each LSTM cell combine information carried over from previous time steps with the current input.
The data runs through multiple gates (e.g. forget gate, input gate, etc.) and various activation functions (e.g. the tanh function) and is passed along through the LSTM cells.
The main advantage of this is that it allows each LSTM cell to remember patterns for a certain amount of time; they essentially can “remember” important information and “forget” irrelevant information.
This is super useful when analyzing text (e.g. the word “and” is not as relevant to political bias as a loaded word like “freedom”).
Now that I have a plan for both the data and the model, I can finally get started.
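To make the gating idea a bit more concrete, here is a minimal NumPy sketch of a single LSTM step in the standard textbook formulation. It is only an illustration: the weight shapes, the hidden size of 8, and the random inputs are my own assumptions, not values from the project.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x: current input (D,), h_prev/c_prev: previous hidden/cell state (H,),
    W: (4H, D), U: (4H, H), b: (4H,) hold the weights of all four blocks.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])              # input gate: how much new info to write
    f = sigmoid(z[H:2 * H])          # forget gate: how much old state to keep
    o = sigmoid(z[2 * H:3 * H])      # output gate: how much state to expose
    candidate = np.tanh(z[3 * H:])   # candidate values to add to the cell state
    c = f * c_prev + i * candidate   # "forget" old info, "remember" new info
    h = o * np.tanh(c)               # hidden state passed to the next step
    return h, c

# Assumed toy setup: 5 "words", each a 300-dimensional vector, hidden size 8.
rng = np.random.default_rng(0)
D, H = 300, 8
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)

h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):
    h, c = lstm_step(x, h, c, W, U, b)
```

The cell state `c` is what carries information across words, and the forget and input gates decide at every step how much of it survives and how much gets overwritten.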
Figure: An LSTM cell and its internal components: cell state c, input gate i, forget gate f, output gate o, and external input gate g.
Data Collection.
With my plan ready to go, I got started on the real work.
I went back to the ProQuest Newspaper Database and downloaded 5,000 articles per news source for four news sources, totaling 20,000 articles.
These news sources are New York Times, CNN, Wall Street Journal, and New York Post (listed from most liberal to most conservative, as determined by Media Bias/Fact Check).
Lucky for me, the articles download into one file per source, so I simply labeled the four collections according to their news source (0 through 3).
It can’t be that easy, right? That is absolutely right! The problem here is that my neural network takes in numbers, not words…
Word Embeddings.
In order to feed in the words to my model, I have to convert each word in each article into a vector representation using a word embedding method.
For this project, I used the GloVe method from this paper, which converts each word into a 300-dimensional vector such that similar words are close to each other in that 300-dimensional space.
But how can you possibly represent a word with numbers? Well, GloVe represents a word with 300 numeric characteristics.
Think of them as features of the word (e.g. when using GloVe, the vector for (man minus woman) roughly equals the vector for (king minus queen), because in both cases almost all features are the same for the two words except gender).
Now that the words are represented numerically, I can proceed to some final tweaks in my data.
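Here is a tiny sketch of that "features" idea. The three-number vectors below are made up by hand for illustration (real GloVe vectors have 300 learned dimensions and the analogy only holds approximately), but they show how subtracting word vectors can isolate one feature such as gender.

```python
import math

# Hand-made toy vectors, NOT real GloVe values: [royalty, gender, person].
vectors = {
    "man":   [0.0,  1.0, 1.0],
    "woman": [0.0, -1.0, 1.0],
    "king":  [1.0,  1.0, 1.0],
    "queen": [1.0, -1.0, 1.0],
}

def subtract(a, b):
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# (man minus woman) and (king minus queen) differ only in the gender feature.
gender_1 = subtract(vectors["man"], vectors["woman"])
gender_2 = subtract(vectors["king"], vectors["queen"])

# king - man + woman should land closest to queen.
target = add(subtract(vectors["king"], vectors["man"]), vectors["woman"])
nearest = max(vectors, key=lambda word: cosine(vectors[word], target))
```

In this toy vocabulary `nearest` comes out as "queen"; with real embeddings you would usually also exclude the query words before taking the nearest neighbor.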
Figure: Words are embedded with “features” where similar words are close to each other.
Data Preprocessing.
The articles are split into sections based on paragraph breaks in the original text.
Why did you do that tho? It’s hard for a model to analyze an entire article at once (RAM issues ouch), so a better way is to look at each section separately and average the results across sections to get a prediction for the article.
I took 1,000 articles from each source to use as my test set (20% split), and the remaining data were shuffled into the training set, where 5% of that training set was used as a validation set.
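A rough sketch of this preprocessing, under some assumptions of mine: paragraph breaks are blank lines, and the per-section probabilities are hand-written placeholders rather than real model outputs.

```python
def split_into_sections(article):
    """Split an article into sections at blank lines (assumed paragraph breaks)."""
    return [part.strip() for part in article.split("\n\n") if part.strip()]

def average_predictions(section_probs):
    """Average per-section class probabilities into one article-level prediction."""
    n = len(section_probs)
    return [sum(column) / n for column in zip(*section_probs)]

article = "First paragraph.\n\nSecond paragraph.\n\n\nThird paragraph."
sections = split_into_sections(article)

# Placeholder outputs for the three sections (one probability per news source).
section_probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.4, 0.4, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
]
article_probs = average_predictions(section_probs)
predicted_source = article_probs.index(max(article_probs))

# Split sizes implied by the numbers in the text.
total_articles = 4 * 5000
test_size = 4 * 1000                                     # 20% of all articles
validation_size = (total_articles - test_size) * 5 // 100
train_size = total_articles - test_size - validation_size
```

Averaging means one oddly worded paragraph can't decide the label on its own; the article gets whichever source wins across all of its sections.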
Finally, I can get ready to start coding my deep learning model!
Deep Learning Model.
I focused on using a Long Short-Term Memory Recurrent Neural Network to allow the neural network to identify important information in the data and predict the origin of an article based on the attributes it finds.
I also decided to add in some dropout layers from this paper to make sure my model wasn’t fitting too closely to the training data (even though that sounds like mission accomplished, overfitting actually makes the model less accurate on articles it hasn’t seen).
I used Keras with a TensorFlow backend in Python 3.6 to create my model.
The layers are input layer, LSTM layer, dropout layer, LSTM layer, dropout layer, dense layer, and softmax output.
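In Keras, a stack like that could look roughly like the sketch below. The layer order follows the list above, but the sequence length, layer widths, and dropout rate are placeholders I picked for illustration, since the exact values aren't given here.

```python
from tensorflow.keras import layers, models

MAX_WORDS = 50     # assumed number of words per section
EMBED_DIM = 300    # GloVe vector size
NUM_CLASSES = 4    # four news sources

model = models.Sequential([
    layers.Input(shape=(MAX_WORDS, EMBED_DIM)),       # sequence of word vectors
    layers.LSTM(64, return_sequences=True),           # assumed width of 64
    layers.Dropout(0.5),                              # assumed dropout rate
    layers.LSTM(64),                                  # returns only the last state
    layers.Dropout(0.5),
    layers.Dense(32, activation="relu"),              # assumed width of 32
    layers.Dense(NUM_CLASSES, activation="softmax"),  # one probability per source
])

model.compile(
    optimizer="adam",
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```

The first LSTM layer needs `return_sequences=True` so that the second LSTM layer still receives one vector per word instead of a single summary vector.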
Figure: Deep recurrent neural network architecture for textual origin classification.
Training.
I trained my model for 10 epochs.
Why 10 epochs? This was determined by early stopping, where training automatically stops once the validation accuracy stops increasing; it just happened to stop around that number.
I used some pretty standard hyperparameters: a batch size of 1024, the categorical cross-entropy loss function, the Adam optimizer, and ReLU/softmax activation functions.
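The early stopping rule itself is simple enough to sketch in plain Python. The `patience` of 2 and the validation accuracies below are invented for the example; they are not the numbers from my training run.

```python
def early_stopping_epoch(val_accuracies, patience=2):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once validation accuracy has failed to improve
    for `patience` epochs in a row.
    """
    best_acc = float("-inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best_acc:
            best_acc, best_epoch = acc, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_accuracies), best_epoch

# Invented validation accuracies, one per epoch.
history = [0.40, 0.55, 0.63, 0.66, 0.66, 0.65, 0.64]
stop_epoch, best_epoch = early_stopping_epoch(history)
```

Keras ships this behavior as the `EarlyStopping` callback, so in practice you pass that to `fit` rather than writing the loop yourself.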
Figure: Progress of the model’s accuracy throughout training.
Results.
Now for the juicy part! There are two things I wanted to test my model’s ability to do: predicting the source of an article and predicting the political bias of an article.
I measured the capability of news source prediction through the F1 score and the binary accuracy; I analyzed the capability of political bias detection through n-grams and a t-SNE visualization.
The following paragraphs detail each analysis of the model.
F1 Score (Accuracy).
The model got an F1 score (i.e. accuracy for this type of problem) of 77.
How do I know the model didn’t just get lucky? I did a statistical significance test of this number and got a p-value of 0.
Essentially, there is virtually no chance that my model just happened to predict that accurately by luck.
So I can accept that the model is good.
I also measured how well the model separates each class from the other three using the area under the Receiver Operating Characteristic (ROC) curve; this came out above 94% for all classes.
What does that mean? It basically means the model’s pretty good at distinguishing between the origins of articles.
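For anyone curious how these two metrics are computed, here is a small pure-Python sketch on invented labels and scores (not my actual results). It also shows why F1 and accuracy can be the same number: when every article gets exactly one label, the micro-averaged F1 score equals plain accuracy.

```python
def micro_f1(y_true, y_pred, classes):
    """Micro-averaged F1: pool true/false positives and false negatives over classes."""
    tp = fp = fn = 0
    for cls in classes:
        for t, p in zip(y_true, y_pred):
            tp += (t == cls and p == cls)
            fp += (t != cls and p == cls)
            fn += (t == cls and p != cls)
    return 2 * tp / (2 * tp + fp + fn)

def one_vs_rest_auc(y_true, scores, cls):
    """Area under the ROC curve for one class against all the others.

    Equals the probability that a random article from `cls` gets a higher
    score than a random article from any other class (ties count as half).
    """
    pos = [s for t, s in zip(y_true, scores) if t == cls]
    neg = [s for t, s in zip(y_true, scores) if t != cls]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 8 articles, 4 sources labeled 0 through 3.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 2, 3, 0]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = micro_f1(y_true, y_pred, classes=[0, 1, 2, 3])

# Invented confidence scores for class 0 on the same 8 articles.
class0_scores = [0.9, 0.4, 0.2, 0.1, 0.1, 0.2, 0.3, 0.5]
auc_class0 = one_vs_rest_auc(y_true, class0_scores, cls=0)
```

An area of 0.5 would mean the scores are no better than a coin flip at picking out that source, while 1.0 would mean perfect separation.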
Figure: Area under the ROC curve for each class.
N-Grams.
First, I conducted an n-gram test — essentially, I took all possible sequences of 7 words from the test set such that one of the words must be “Trump” (this gives me a sentence that is likely biased) and fed those phrases back into the model.
I selected the highest-confidence phrases, which can be thought of as a way to let me see which phrases the model associated with each news source.
The model was able to associate liberal sources with negatively worded phrases about Donald Trump (which is generally how liberal sources would be expected to report about Trump) and conservative sources with positively worded phrases about Donald Trump (which is generally how conservative sources would be expected to report about Trump).
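The phrase extraction step can be sketched in a few lines. This is my simplified take: it splits on whitespace and only matches the exact token "Trump" after stripping punctuation, so forms like "Trump’s" are skipped, and the example sentence is made up.

```python
import string

def keyword_ngrams(text, n=7, keyword="Trump"):
    """Return every run of n consecutive words that contains the keyword."""
    strip_chars = string.punctuation + "“”‘’"
    words = text.split()
    grams = []
    for start in range(len(words) - n + 1):
        window = words[start:start + n]
        if any(word.strip(strip_chars) == keyword for word in window):
            grams.append(" ".join(window))
    return grams

# Made-up sentence, just to show the sliding window.
text = ("On Tuesday the president said that Trump would sign "
        "the new bill into law soon")
phrases = keyword_ngrams(text)
```

Each phrase can then be fed back through the model, and sorting the phrases by the model's confidence for a given class surfaces the wording it most strongly ties to that source.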
Figure: Phrases of length 7 (one of the words must be “Trump”) that the model associates with each class.
t-SNE Visualization.
This thing sounds complicated, what is it? Let’s back up a bit: my model’s outputs are 4-dimensional since there are four classes.
I can’t plot that, so I gather the confidences from the 4-dimensional output layer and reduce them to two dimensions using the t-distributed stochastic neighbor embedding (t-SNE) visualization method.
What this means is that the 4-D predictions are converted into 2-D points.
I plot these points and color them based on their true news-source origin (blue for liberal sources and red for conservative sources).
As a general statement, this method lets me plot the model’s predictions and see how they are distributed.
I found that the model was able to group liberal sources together and conservative sources together.
This grouping is significant enough that you can actually draw a pretty clear dividing line between liberal and conservative sources.
Wow! It’s kind of like the model has its own political spectrum for articles!
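If you want to try the 4-D to 2-D step yourself, here is a minimal sketch using scikit-learn's `TSNE` (my assumption for the tooling). The "model outputs" are random stand-ins that I nudged toward the true class, and the `perplexity` value is just something that works for 40 points.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for model outputs: 40 articles, 10 per source, 4 confidences each.
true_source = np.repeat(np.arange(4), 10)
logits = rng.normal(size=(40, 4))
logits[np.arange(40), true_source] += 3.0   # nudge toward the true class
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# Reduce the 4-D confidences to 2-D points for plotting.
points = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(probs)

# Sources 0 and 1 are the liberal ones, 2 and 3 the conservative ones.
colors = np.where(true_source < 2, "blue", "red")
```

From here, a scatter plot of `points` colored by `colors` gives the kind of picture described above.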