Fit to Print: Finding the medium in the message

This tool can take a new article, find its best stylistic match, and highlight changes that might bring the article more in line with the writing style of a desired publication.

I took 5000 articles from these publications and compared writing style features that I expected to reveal differences between them, such as the number of commas and the number of words per sentence.

Like other tree-based models, XGBoost is prone to overfitting, so I used a relatively low learning rate and, as I increased the depth of the trees, checked that accuracy on the training set stayed comparable to accuracy on the test set. Ultimately, the model trained on the 31 writing style features correctly predicted the publication of origin at least 30% of the time, and as much as ~70% of the time for the New York Times, as this confusion matrix shows. At worst, the model is still three times more accurate than random guessing, which is an encouraging result when trying to detect something as vague as an institution’s writing style.

Confusion matrix showing how often the model predicts the right publication.

Final Product: Fit to Print

Fit to Print is a web app where the user enters an article and a desired publication as input. As output, the user gets recommendations about the article’s stylistic best fit, as well as information about how the writing style features in the article stack up against the mean values of those features in other publications.

Scrolling down, the user sees a comparison between the writing style features in their article and those in the desired publication, with each feature scaled so that the publication’s mean value is 1. Vox uses these words less frequently, suggesting that its articles incorporate fewer interviews and focus on analyzing the news rather than reporting it.

The user then sees a comparison between their article and all the publications across the six most informative features in the model. Qualitatively, the two most similar publications based on these features are the New York Times and The Atlantic, which is consistent with the predictions of the model. Despite knowing nothing about the content of the article, Fit to Print was able to highlight some anomalous but justifiable choices that the writer made, as well as meaningful writing style differences between two major publications.

Comparison of the most informative features for writing style across all the publications in the dataset.

Narrowing down the dataset to articles similar to the user’s article, using topic modelling or word2vec approaches, might also give the user a more relevant set of articles against which to compare their writing style.

Takeaway

Fit to Print is essentially a first-pass editor that uses features of writing style to reveal the differences between news publications.
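The comparison against a desired publication, where each feature is scaled so the publication’s mean is 1, can be sketched in a few lines of Python. This is a minimal sketch, not the code behind Fit to Print: the function names are hypothetical, only the two features named above (commas and words per sentence) are computed rather than all 31, dividing the comma count by the number of sentences is an assumption, and sentences are split with a naive regular expression.

```python
import re
from statistics import mean


def style_features(text):
    """Compute two illustrative writing-style features for one article.

    Hypothetical feature set: only commas and words per sentence, as named
    in the text; the real tool uses 31 features.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = re.findall(r"[A-Za-z']+", text)
    n_sent = max(len(sentences), 1)  # avoid dividing by zero on empty text
    return {
        # Assumption: commas are normalized per sentence so long articles
        # are not penalized just for being long.
        "commas_per_sentence": text.count(",") / n_sent,
        "words_per_sentence": len(words) / n_sent,
    }


def publication_means(articles):
    """Mean value of each feature across one publication's articles."""
    feats = [style_features(a) for a in articles]
    return {k: mean(f[k] for f in feats) for k in feats[0]}


def relative_to_publication(article, pub_means):
    """Article features divided by the publication means (publication = 1.0)."""
    feats = style_features(article)
    return {
        k: feats[k] / pub_means[k] if pub_means[k] else float("nan")
        for k in feats
    }


# Usage with made-up example text (not data from the article):
means = publication_means(["One, two. Three.", "Four five six."])
print(relative_to_publication("Hello, world. This is a test, really, ok.", means))
```

A value above 1 means the article uses more of that feature than the publication typically does; a value below 1 means less.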

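In Fit to Print, the stylistic best-fit recommendation comes from the trained XGBoost model. As a rough, dependency-free illustration of the same idea, the sketch below swaps in a simpler technique: a nearest-profile ranking that divides each article feature by a publication’s mean (so the publication sits at 1.0) and sorts publications by Euclidean distance from that all-ones profile. The function name and the publication names and numbers in the usage line are hypothetical placeholders, not results from the article.

```python
import math


def rank_publications(article_feats, pub_means):
    """Rank publications by closeness of the article to each publication's means.

    Nearest-profile stand-in, not the XGBoost classifier: each feature is
    divided by the publication mean, and the distance from 1.0 is summed.
    """
    scores = {}
    for pub, means in pub_means.items():
        sq = 0.0
        for name, value in article_feats.items():
            m = means.get(name)
            if not m:  # skip features that are missing or zero for this publication
                continue
            sq += (value / m - 1.0) ** 2
        scores[pub] = math.sqrt(sq)
    # Smallest distance first: index 0 is the best stylistic fit.
    return sorted(scores.items(), key=lambda kv: kv[1])


# Usage with made-up numbers for two placeholder publications:
article = {"commas_per_sentence": 1.0, "words_per_sentence": 20.0}
pubs = {
    "pub_a": {"commas_per_sentence": 1.0, "words_per_sentence": 20.0},
    "pub_b": {"commas_per_sentence": 2.0, "words_per_sentence": 25.0},
}
print(rank_publications(article, pubs))
```

Dividing by the publication mean keeps features on very different scales (a few commas versus twenty-odd words per sentence) comparable. Unlike a trained classifier, however, this distance weights every feature equally, whereas a model such as XGBoost can learn which features are most informative.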