Let’s first take a look at my workflow. As you might know, there is no available dataset containing all Medium articles tagged ‘Data Science’.
Thus, I had to create that dataset myself by scraping Medium’s article archive using Selenium and Python.
To determine what makes an article successful, I decided to collect the following information from each article.
First, I obviously needed the number of claps to measure the article’s success.
Second, I scraped the title, the date of publication, and the estimated reading time, since these, along with the title picture, are the first things people see when scrolling through articles.
Finally, I collected the article text itself and the tags for each article.
Ultimately, I collected 736 articles from the 19th of December 2018 until the 3rd of January 2019.
Collecting them proved to be a bit of a challenge, since the ‘Latest Articles’ page on Medium only loads new articles once you scroll all the way down, instead of offering something like a ‘Next Page’ button.
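Because the page keeps loading articles as you scroll, a scraper has to keep scrolling until nothing new appears. Here’s a minimal sketch of such a loop, assuming a Selenium WebDriver that has already opened the ‘Latest Articles’ page. It only relies on WebDriver’s `execute_script` method. The function name `scroll_to_bottom`, the `pause` and `max_rounds` parameters, and the approach of comparing `document.body.scrollHeight` before and after each scroll are my own assumptions. They are not necessarily how the original scraper worked.

```python
import time


def scroll_to_bottom(driver, pause=2.0, max_rounds=100):
    """Keep scrolling an infinite-scroll page until it stops growing.

    `driver` is assumed to be a Selenium WebDriver; only its
    `execute_script` method is used. Returns how many scrolls loaded
    new content.
    """
    get_height = "return document.body.scrollHeight"
    last_height = driver.execute_script(get_height)
    rounds = 0
    while rounds < max_rounds:
        # Jump to the current bottom to trigger loading of the next batch.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to fetch and render the new articles.
        time.sleep(pause)
        new_height = driver.execute_script(get_height)
        if new_height == last_height:
            # Page did not grow: assume we reached the end of the feed.
            break
        last_height = new_height
        rounds += 1
    return rounds
```

With Selenium installed, you could call `scroll_to_bottom(driver)` on a `webdriver.Chrome()` instance after `driver.get(...)` and then parse the fully loaded page. The `max_rounds` cap simply keeps the loop from running forever on a feed that never ends.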
Having collected the data I needed in a CSV, I imported it into Python and converted it into a DataFrame using Pandas.
As with most unprocessed datasets, it required quite a bit of data cleaning.
This included extracting more information from my data by, for instance, storing the number of characters of the title as a separate column.
Another interesting aspect of data cleaning was that the publication date is stored in ISO 8601 format (with precision down to microseconds), even though the website only displays the day, month, and year.
Since Medium also lists comments on the ‘Latest Articles’ page for articles tagged ‘Data Science’, I removed those.
I also converted all text to lowercase to make sure I wouldn’t count the same word twice (e.g., ‘Python’ and ‘python’) during the analysis.
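The cleaning steps above might look roughly like this in Pandas. This is a sketch under assumptions. The column names (`title`, `published`, `text`, `claps`) and the `is_response` flag marking comments are hypothetical, and the tiny inline CSV stands in for the real scraped file.

```python
import io

import pandas as pd

# Hypothetical CSV layout standing in for the scraped file.
RAW_CSV = """title,published,text,claps,is_response
My First Model,2018-12-19T15:03:22.123456Z,Data Science is fun,12,False
Great post,2018-12-20T08:11:05.000000Z,Great post!,0,True
Pandas Tips,2019-01-03T14:45:00.500000Z,Use Python and Pandas,80,False
"""

df = pd.read_csv(io.StringIO(RAW_CSV))

# Drop comments that show up alongside real articles (hypothetical flag).
df = df.loc[~df["is_response"]].copy()

# Store the title length (characters, including spaces) as its own column.
df["title_length"] = df["title"].str.len()

# Parse the ISO 8601 timestamps (with microseconds) as timezone-aware UTC datetimes.
df["published"] = pd.to_datetime(df["published"], utc=True)

# Lowercase text columns so "Python" and "python" count as the same word.
for col in ["title", "text"]:
    df[col] = df[col].str.lower()

print(df[["title", "title_length", "published"]])
```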
Exploratory Data Analysis (EDA)
Enough with the technicalities.
Let’s take a look at the data.
Since we’re measuring success in terms of claps, it’d be interesting to know what the distribution of claps looks like.
Unfortunately, it seems like the vast majority of data science articles don’t receive any claps at all.
The distribution of claps is heavily right-skewed with a mean of 58.
With 49 claps, your article would already have received more claps than 75% of all data science articles.
The fact that the mean (58) lies above even the 75th percentile (49) suggests that a few outliers receive far more claps than all other articles.
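To make the skewness argument concrete, here’s a sketch using made-up clap counts rather than the scraped data. `Series.describe()` reports the mean alongside the quartiles, so you can check directly whether the mean sits above the median and the 75th percentile. The helper `share_with_fewer_claps` is a hypothetical name used for illustration.

```python
import pandas as pd

# Made-up clap counts for illustration only; not the scraped dataset.
claps = pd.Series([0, 0, 0, 0, 1, 3, 5, 10, 40, 1200])

summary = claps.describe()  # count, mean, std, min, 25%, 50%, 75%, max
mean, median, p75 = summary["mean"], summary["50%"], summary["75%"]

# A single outlier (1200) drags the mean far above the median and 75th percentile.
mean_above_p75 = mean > p75
skewness = claps.skew()  # a positive value indicates a right skew


def share_with_fewer_claps(claps, n):
    """Fraction of articles that received strictly fewer than n claps."""
    return (claps < n).mean()


print(f"mean={mean:.1f}, median={median}, 75th percentile={p75}")
print(f"skewness={skewness:.2f}")
print(f"49 claps beats {share_with_fewer_claps(claps, 49):.0%} of these articles")
```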
What about the length of the articles? After all, the estimated reading time is one of the first things potential readers glance at.
This distribution also looks slightly right-skewed, but it seems to be centered at around 5 minutes.
As one would expect, most articles are rather short, with the exception of a few very long ones.
The longest data science article in this dataset takes around 26 minutes to read.
Intuitively, one would assume that articles this long deter potential readers from clicking on them.
Another aspect that could be of importance concerns the number and type of tags.
For each article on Medium, you can select up to five tags.
Naturally, one would assume that all authors make full use of that and select five tags to maximize the visibility of their articles, but that’s not the case.
For whatever reason, the authors of 51% of the articles in this dataset used fewer than five tags.
This certainly seems like an aspect many authors could easily work on to increase the visibility of their articles.
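Counting tags per article is a one-liner once each article’s tags are stored as a list. The sketch below assumes a hypothetical `tags` column holding a Python list per article and uses made-up example rows.

```python
import pandas as pd

# Hypothetical layout: one list of tags per article (Medium allows up to five).
df = pd.DataFrame({
    "tags": [
        ["Data Science", "Python", "Machine Learning", "AI", "Statistics"],
        ["Data Science", "Python"],
        ["Data Science"],
        ["Data Science", "Deep Learning", "AI", "Python", "NLP"],
    ]
})

# Number of tags each article uses.
df["n_tags"] = df["tags"].apply(len)

# Fraction of articles that leave at least one of the five tag slots unused.
share_under_five = (df["n_tags"] < 5).mean()
print(f"{share_under_five:.0%} of articles use fewer than five tags")
```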
The title of an article is extremely important.
Therefore, before diving into the secret sauce of how to write a successful data science article, it’s worth taking a look at the length of the title and the most commonly used words in the titles.
The distribution of title lengths is centered at a mean of approximately 47 characters (including spaces).
Commonly used words, ignoring fillers such as ‘and’, include ‘data science’, ‘machine learning’, and ‘python’.
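You can find the most common title words after dropping fillers using only the standard library. This sketch counts both single words and adjacent word pairs, so phrases like ‘data science’ show up as one term. The `STOP_WORDS` set and the example titles are hypothetical, and a real analysis would likely use a fuller stop-word list. ‘with’ is deliberately not treated as a filler here.

```python
import re
from collections import Counter

# Hypothetical filler-word list; "with" is intentionally not included.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "for", "on", "how", "your", "you"}


def top_terms(titles, n=5):
    """Return the n most common words and two-word phrases across titles."""
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z0-9']+", title.lower())
        # Single words, skipping fillers.
        counts.update(w for w in words if w not in STOP_WORDS)
        # Adjacent pairs such as "data science", skipping pairs that contain a filler.
        counts.update(
            f"{a} {b}"
            for a, b in zip(words, words[1:])
            if a not in STOP_WORDS and b not in STOP_WORDS
        )
    return counts.most_common(n)


titles = [
    "Data Science with Python",
    "Machine Learning for Data Science",
    "Intro to Machine Learning",
    "Data Science and Python",
]
print(top_terms(titles))
```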
How to Write a Successful Data Science Article
Now comes the part you’ve been waiting for.
How do you write a successful data science article? What characteristics do you need to pay attention to? Don’t worry, your questions will be answered now.
To define success, I added a percentile column to my DataFrame and created a separate DataFrame containing only the articles that received more claps than 80% of all articles.
I then retrieved the characteristics of these successful articles.
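You could build a percentile column and the top-20% split with `Series.rank(pct=True)`, which gives each value its rank divided by the number of values. This is a sketch with made-up clap counts. With many ties (e.g., lots of articles with zero claps), the default `method="average"` gives tied articles their average rank, which slightly affects where the 80% cut falls.

```python
import pandas as pd

# Made-up clap counts for illustration.
df = pd.DataFrame({"claps": [0, 0, 2, 5, 8, 13, 30, 49, 120, 900]})

# Percentile rank in (0, 1]; tied values share their average rank.
df["percentile"] = df["claps"].rank(pct=True)

# "Successful" articles: those ranked above the 80th percentile.
successful = df[df["percentile"] > 0.8]
print(successful)
```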
Let’s go through each aspect step by step.
What Title Should I Use?
Your title should be 47–48 characters long, including spaces (the mean title length of successful articles).
What words should you use? Take a look at the 5 most commonly used words in the titles of successful articles.
Using ‘Data Science’ and ‘Machine Learning’ seems to be a good idea.
I decided not to filter out ‘with’, since it might suggest that specifying which programming language you’re using (e.g., ‘with Python’) also affects success.
How Many and What Tags Should I Use?
Unsurprisingly, the answer is 5! Whereas 51% of all articles use fewer than five tags, only 19% of successful articles do.
As with the title, let’s examine which tags successful articles use most often.
Obviously, you shouldn’t tag your article with topics it doesn’t actually cover.
Still, this suggests that articles dealing with these five topics are more likely to receive a lot of claps.
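You could sketch the tag comparison with `Series.explode`, which turns each list of tags into one row per tag so that `value_counts` can tally them. The data and the boolean `successful` column, which stands in for the 80th-percentile split, are hypothetical.

```python
import pandas as pd

# Hypothetical data; "successful" stands in for the 80th-percentile split.
df = pd.DataFrame({
    "successful": [False, False, True, True, False],
    "tags": [
        ["Data Science"],
        ["Python", "Data Science"],
        ["Data Science", "Machine Learning", "Python", "AI", "Deep Learning"],
        ["Data Science", "Machine Learning", "Python", "AI", "Statistics"],
        ["Data Science", "Python", "AI"],
    ],
})
successful = df[df["successful"]]

# One row per (article, tag) pair, then count how often each tag occurs.
tag_counts = successful["tags"].explode().value_counts()

# Share of articles using fewer than five tags: all articles vs. successful ones.
share_all = (df["tags"].apply(len) < 5).mean()
share_successful = (successful["tags"].apply(len) < 5).mean()

print(tag_counts.head(5))
print(f"fewer than five tags: {share_all:.0%} of all, {share_successful:.0%} of successful")
```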
How Long Should My Article Be?
The mean reading time of successful articles is slightly longer than that of all articles, at ~6.5 minutes, which suggests that aiming for 6–7 minutes is a good idea.
On average, successful articles have a total text length (again, including spaces) of around 6750 characters.
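Comparing the successful articles with the whole dataset comes down to taking column means for both groups. Here’s a sketch with made-up numbers. It assumes hypothetical `reading_time` (minutes) and `text_length` (characters, including spaces) columns and a boolean `successful` column.

```python
import pandas as pd

# Made-up values; "successful" stands in for the 80th-percentile split.
df = pd.DataFrame({
    "reading_time": [3, 4, 5, 6, 7, 8],                    # minutes
    "text_length": [2500, 3400, 4100, 5600, 6900, 7400],   # characters incl. spaces
    "successful": [False, False, False, False, True, True],
})

features = ["reading_time", "text_length"]
overall = df[features].mean()
top = df.loc[df["successful"], features].mean()

# Side-by-side comparison of the averages.
comparison = pd.DataFrame({"all articles": overall, "successful": top})
print(comparison)
```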
When Should I Post?
Don’t forget to factor the publication time into your planning! As all the timestamps I collected are in UTC (Coordinated Universal Time), you might have to convert them to your local time zone.
As evident from this seaborn distplot, most successful articles were published around 3 pm UTC.
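Converting the UTC timestamps to your own time zone takes a single `tz_convert` call on a timezone-aware datetime column. The sketch below uses made-up timestamps and US Eastern time purely as an example, so substitute your own IANA time-zone name. Instead of plotting, it finds the most common publication hour.

```python
import pandas as pd

# Made-up publication timestamps in UTC.
published = pd.Series(pd.to_datetime([
    "2018-12-19T15:03:22Z",
    "2018-12-20T15:40:10Z",
    "2018-12-21T09:12:00Z",
    "2019-01-02T15:05:59Z",
], utc=True))

# Most common publication hour in UTC.
peak_hour_utc = published.dt.hour.value_counts().idxmax()

# Same timestamps in a local time zone (US Eastern, as an example).
local = published.dt.tz_convert("America/New_York")
peak_hour_local = local.dt.hour.value_counts().idxmax()

print(f"peak hour: {peak_hour_utc}:00 UTC = {peak_hour_local}:00 New York time")
```

If you want to plot the distribution yourself, note that seaborn’s `distplot` has been deprecated since seaborn 0.11 in favor of `histplot` and `displot`.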
As a final note, having an attractive title picture also plays a large role in getting readers to click on your post.
Finding one on a free stock image website or creating one yourself shouldn’t be too hard.
Conclusion
If you follow all of these instructions, then your article, too, will have the characteristics of a successful data science article on Medium and will look something like the article you just read.
As always, if you have any feedback or found mistakes, please don’t hesitate to reach out to me.