Suicide in the 21st Century (Part 1)

This can again be accomplished in Python/Pandas in three fairly simple steps.

1. Create continent lists and assign countries to them, according to the United Nations Statistics Division.
2. Move these to a dictionary.
3. Use the map function in Pandas to map continents to the countries.

Note that Step 1 can be skipped and the countries put straight into a dictionary, but moving them to a list first makes it easier in the future, for example if a country were to be added to the dataset.

    # create lists of countries per continent
    europe = ['Albania', 'Austria', 'Azerbaijan', 'Belarus', 'Belgium',
              'Bosnia and Herzegovina', 'Bulgaria', 'Croatia', 'Cyprus',
              'Czech Republic', 'Denmark', 'Estonia', 'Finland', 'France',
              'Georgia', 'Germany', 'Greece', 'Hungary', 'Iceland', 'Ireland',
              'Italy', 'Latvia', 'Lithuania', 'Luxembourg', 'Malta',
              'Montenegro', 'Netherlands', 'Norway', 'Poland', 'Portugal',
              'Romania', 'Russian Federation', 'San Marino', 'Serbia',
              'Slovakia', 'Slovenia', 'Spain', 'Sweden', 'Switzerland',
              'Ukraine', 'United Kingdom']
    asia = ['Armenia', 'Bahrain', 'Israel', 'Japan', 'Kazakhstan', 'Kuwait',
            'Kyrgyzstan', 'Macau', 'Maldives', 'Mongolia', 'Oman',
            'Philippines', 'Qatar', 'Republic of Korea', 'Singapore',
            'Sri Lanka', 'Thailand', 'Turkey', 'Turkmenistan',
            'United Arab Emirates', 'Uzbekistan']
    northamerica = ['Antigua and Barbuda', 'Bahamas', 'Barbados', 'Belize',
                    'Canada', 'Costa Rica', 'Cuba', 'Dominica', 'El Salvador',
                    'Grenada', 'Guatemala', 'Jamaica', 'Mexico', 'Nicaragua',
                    'Panama', 'Puerto Rico', 'Saint Kitts and Nevis',
                    'Saint Lucia', 'Saint Vincent and Grenadines',
                    'United States']
    southamerica = ['Argentina', 'Aruba', 'Brazil', 'Chile', 'Colombia',
                    'Ecuador', 'Guyana', 'Paraguay', 'Suriname',
                    'Trinidad and Tobago', 'Uruguay']
    africa = ['Cabo Verde', 'Mauritius', 'Seychelles', 'South Africa']
    australiaoceania = ['Australia', 'Fiji', 'Kiribati', 'New Zealand']

    # move these to a dictionary of continents
    continents = {country: 'Asia' for country in asia}
    continents.update({country: 'Europe' for country in europe})
    continents.update({country: 'Africa' for country in africa})
    continents.update({country: 'North_America' for country in northamerica})
    continents.update({country: 'South_America' for country in southamerica})
    continents.update({country: 'Australia_Oceania' for country in australiaoceania})

Then we can simply map the continents to our countries:

    df['Continent'] = df['Country'].map(continents)

Now that the data has been preprocessed, the data frame has gone from 27820 rows and 12 columns to 2668 rows and 10 columns, and is now ready to be analysed.
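A quick sanity check after the mapping step is to look for countries that did not receive a continent. The sketch below uses a tiny made-up frame and shortened lists (the names `unmapped` and 'Atlantis' are purely illustrative) and relies on the fact that `Series.map` leaves NaN for any key missing from the dictionary.

```python
import pandas as pd

# Toy stand-ins for the continent lists; the real lists are much longer.
europe = ['France', 'Germany']
asia = ['Japan']

# Build one country -> continent lookup from the per-continent lists.
continents = {country: 'Europe' for country in europe}
continents.update({country: 'Asia' for country in asia})

# Hypothetical miniature of the data frame; 'Atlantis' is deliberately unmapped.
df = pd.DataFrame({'Country': ['France', 'Japan', 'Atlantis', 'Germany']})
df['Continent'] = df['Country'].map(continents)

# Series.map leaves NaN where a key is missing, so unmapped countries are easy to list.
unmapped = sorted(df.loc[df['Continent'].isna(), 'Country'].unique())
print(unmapped)
```

If the list is not empty, the missing countries can be added to the relevant continent list and the dictionary rebuilt, which is where keeping the lists separate pays off.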

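Exploratory Data Analysis (EDA)

First of all, let’s define a nice colour palette for our plots.

    flatui = ["#6cdae7", "#fd3a4a", "#ffaa1d", "#ff23e5", "#34495e", "#2ecc71"]
    # the source is truncated between "sns." and "color_palette())";
    # set_palette and palplot are reconstructed here
    sns.set_palette(flatui)
    sns.palplot(sns.color_palette())

[Figure: Seaborn Colour Palette]

We begin with some basic plots that show interesting aspects of the data in graphical form.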

By grouping the data by Year and doing a .sum(), we are able to create a temporary data frame with the total number of suicides by year globally.

Taking this frame and applying Matplotlib code with Seaborn aesthetics allows us to show the rate of global suicide, whilst also plotting an average line across.
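The grouping step and the average line can be sketched as follows. This is a minimal illustration on made-up numbers, not the article's data; the column name 'SuicidesNo' and the variable names are assumptions.

```python
import pandas as pd

# Hypothetical toy data; 'SuicidesNo' is an assumed column name.
df = pd.DataFrame({
    'Year': [2000, 2000, 2001, 2001, 2002],
    'SuicidesNo': [10, 20, 15, 5, 25],
})

# One entry per year holding the total across all rows for that year.
yearly = df.groupby('Year')['SuicidesNo'].sum()

# The average of the yearly totals is the height of the horizontal line.
average_line = yearly.mean()

print(yearly)
print(average_line)
```

Passing `average_line` to `plt.axhline(y=...)` avoids hard-coding the value, so the line stays correct if the underlying data changes.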



    # (the lines that group the data and draw the plot are missing from the source)
    plt.title('Total No. of Suicides per Year: 2000 To 2015', fontsize = 22)
    plt.axhline(y=52720, color='black', linestyle='--')
    plt.ylabel('No. Suicides', fontsize = 20)
    plt.xlabel('Year', fontsize = 20)

[Figure: Global suicides (2000–2015)]

Here we can see that there is a downward trend and the global rate of suicide is falling over the years.

It could be speculated that this is because of increasing awareness, funding, etc., but this is something that can be explored deeper later.

Next, we can show the mean number of suicides per 100k population per year, by continent, by using a bar chart in Matplotlib.

A new data frame is created grouping by continent, using .mean() this time.

This data frame is then represented below:

    # numeric_only=True skips text columns such as Country
    data_per_continent = df.groupby('Continent').mean(numeric_only=True)
    data_per_continent

    ax = data_per_continent['Suicides/100kPop'].plot(kind='bar', figsize=(15, 10), fontsize=14)
    plt.title('Mean Suicides/Year by Continent', fontsize = 22)
    ax.set_xlabel("Continent", fontsize=20)
    ax.set_ylabel("Suicides/100k Population", fontsize=20)
    plt.show()

[Figure: Suicides by Continent]

Interestingly, we can see that South America is the continent with the highest rate of suicide in young men, followed by Europe.

Although useful, it does not show the change over time in the rate of suicide of these continents.

After grouping the data using ‘Continent’ and ‘Year’, and executing the following code, we are able to plot the rate of change of suicides/100k population by continent:

    # the grouping call is truncated in the source; grouping on Continent and
    # Year (kept as columns) is reconstructed from the sentence above
    dfAgg = dftesting.groupby(['Continent', 'Year'], as_index=False).mean(numeric_only=True)
    by_cont = dfAgg.groupby('Continent')

    for name, group in by_cont:
        plt.plot(group['Year'], group['Suicides/100kPop'], label=name, linewidth=6.0)

    plt.title('Mean Suicide/100k, Year by Year, per Continent', fontsize = 22)
    plt.ylabel('Suicides/100k', fontsize = 20)
    plt.xlabel('Year', fontsize = 20)
    leg = plt.legend(fontsize = 12)
    for line in leg.get_lines():
        line.set_linewidth(6.0)  # method and width assumed; lost in the source
    plt.show()

[Figure: Suicide rate by Continent over the years]

As can be seen, this graph shows the overall downwards trend but also the sharp spikes in continents such as South America and Africa (the latter likely due to the inconsistencies of the reported data).
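One way to probe the suspected reporting inconsistencies is to count how many distinct countries contribute data for each continent and year. The sketch below is an assumption about how such a check might look, run on made-up rows; the name `coverage` is illustrative.

```python
import pandas as pd

# Hypothetical toy rows: one row per country-year that reported data.
df = pd.DataFrame({
    'Continent': ['Africa', 'Africa', 'Africa', 'Europe', 'Europe', 'Europe', 'Europe'],
    'Country': ['Mauritius', 'Seychelles', 'Mauritius', 'France', 'Spain', 'France', 'Spain'],
    'Year': [2000, 2000, 2001, 2000, 2000, 2001, 2001],
})

# Count distinct reporting countries per continent and year,
# then pivot the years into columns for easier reading.
coverage = (
    df.groupby(['Continent', 'Year'])['Country']
      .nunique()
      .unstack('Year')
)
print(coverage)
```

A continent whose count changes from year to year has a mean built from a shifting set of countries, which on its own can produce spikes in the line plot.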

Next we wish to find out which countries have the highest suicide rates.

We could also find out the lowest; however, this would be skewed by countries with a low incidence of reporting (mainly African countries).

In Python, we are able to create a visual plot by creating a data frame grouping the data by country and taking the mean Suicides/100kPop of each, sorting the values in descending order and plotting the .head() of the data frame as a bar plot.

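    # the grouping and bar-plot calls are truncated in the source; they are
    # reconstructed here from the description above
    data_suicide_mean = df['Suicides/100kPop'].groupby(df['Country']).mean().sort_values(ascending=False)
    f, ax = plt.subplots(1, 1, figsize=(15, 4))
    ax = sns.barplot(x=data_suicide_mean.head().index, y=data_suicide_mean.head())
    plt.ylabel('Suicides/100k', fontsize = 20)
    plt.xlabel('Country', fontsize = 20)

[Figure: Countries with the highest suicide rates]

Lithuania shows the highest suicide rate over the years, followed closely by Russia and Kazakhstan, with all three countries having a mean suicide rate of over 50 per 100k population.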

It is interesting to note that Lithuania and Kazakhstan both border Russia.

As the 2016 data was removed due to incompleteness, the most recent year we can run analysis on is 2015.

Matplotlib allows the use of scatterplots, giving the ability to plot suicide rates vs. GDP, with one point per country.

Again, preparing the data frame is important, such as excluding any non-2015 data and also irrelevant columns.

Grouping by Continent and Country, whilst including suicide rate and GDP, and applying .sum() gives the correct shape of the data frame that is needed.

Plotting suicide rate vs. GDP for this data frame gives one point per country, showing GDP vs. suicide rate for every country in the frame.

Furthermore, adding hue='Continent' to the scatterplot parameters shows the data coloured according to the continent that the country resides in.
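The preparation described above might look like the following sketch. The rows are made up, and the column names are taken from the plotting code in this article; the frame name `dfcont` matches that code, but how the author actually built it is not shown, so treat this as one plausible reading.

```python
import pandas as pd

# Hypothetical toy rows; values are illustrative only.
df = pd.DataFrame({
    'Year': [2014, 2015, 2015, 2015],
    'Continent': ['Europe', 'Europe', 'Asia', 'South_America'],
    'Country': ['France', 'France', 'Japan', 'Brazil'],
    'Sex': ['male', 'male', 'male', 'male'],  # an "irrelevant" column to drop
    'Suicides/100kPop': [9.0, 8.0, 15.0, 6.0],
    'GdpPerCapital($)': [40000, 41000, 36000, 9000],
})

# Keep only 2015 and the columns needed for the plot.
cols = ['Continent', 'Country', 'Suicides/100kPop', 'GdpPerCapital($)']
df2015 = df.loc[df['Year'] == 2015, cols]

# One row per (Continent, Country); as_index=False keeps both as columns
# so that hue='Continent' can be used directly.
dfcont = df2015.groupby(['Continent', 'Country'], as_index=False).sum()
print(dfcont)
```

Note that .sum() adds up every row for a country, so if a country has several rows for 2015 a per-capita GDP figure would be counted more than once; in that case aggregating GDP separately may be safer.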

    # plot suicide rate vs gdp
    # (a plt call before the scatterplot is truncated in the source)
    sns.scatterplot(x='GdpPerCapital($)', s=300, y='Suicides/100kPop', data=dfcont, hue='Continent')
    plt.title('Suicide Rates: 2015', fontsize= 30)
    plt.ylabel('Suicide Rate /100k Population', fontsize = 22)
    plt.xlabel('GDP ($)', fontsize = 22)
    # (one or more plt calls are missing here in the source)
    plt.legend(loc=1, prop={'size': 30})
    plt.show()

[Figure: Suicide rates vs GDP, coloured by Continent]

Interestingly, there look to be many countries with very low GDP and also very low suicide rates, which is slightly unexpected.

However, this could be due to poorer countries having a low rate of reported suicide when in fact the number could be much higher.

Still, GDP seems to have an interesting effect on the rate of suicide.

It would also be interesting to see if the general happiness of a country affects its suicide rates amongst young men.

Taking the 2015 World Happiness Report [10], a list can be created of all the happiness scores for the countries in the data frame; this can then simply be read into a new column ‘HappinessScore’ with the values converted to float.

For this plot, countries with a HappinessScore of less than or equal to 5.5 are removed; this is because many of these countries with low scores have low suicide rates, probably due to incomplete data, non-reporting of suicide, or different classifications of suicide.
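As a rough sketch of these two steps, the scores can be attached as a column, converted to float, and then filtered. The countries and scores below are placeholders rather than values from the report.

```python
import pandas as pd

# Placeholder frame; country names and numbers are made up.
dfcont = pd.DataFrame({
    'Country': ['Country A', 'Country B', 'Country C', 'Country D'],
    'Suicides/100kPop': [14.0, 15.0, 20.0, 4.0],
})

# Scores typed in by hand often arrive as strings, hence the conversion.
happiness = ['7.5', '6.0', '4.7', '5.5']
dfcont['HappinessScore'] = pd.Series(happiness, index=dfcont.index).astype(float)

# Drop countries with a score of 5.5 or below (the boundary value is removed too).
dfcont = dfcont[dfcont['HappinessScore'] > 5.5]
print(dfcont)
```

The list must be in the same order as the rows of the frame; mapping from a country-to-score dictionary would be a more robust alternative if the order is uncertain.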

This data can then be plotted using a scatterplot in Matplotlib/Seaborn to give the following visualization, again using hue='Continent':

    # plot suicide rates vs happiness score
    # (a plt call before the scatterplot is truncated in the source)
    sns.scatterplot(x='HappinessScore', s=300, y='Suicides/100kPop', data=dfcont, hue='Continent')
    plt.title('Suicide Rates: 2015', fontsize= 30)
    plt.ylabel('Suicide Rate /100k Population', fontsize = 22)
    plt.xlabel('HappinessScore', fontsize = 22)
    # (one or more plt calls are missing here in the source)
    plt.legend(loc=1, prop={'size': 30})
    plt.show()

[Figure: Suicide rates vs HappinessScore, coloured by Continent]

Again, it is difficult to tell if there is any real relationship between the suicide rates of a country and its happiness score; therefore, the relationship will be explored further.

We can do this by applying bivariate analysis, plotting a correlation matrix in Pandas, which computes the pairwise correlation of columns.


    dfcont.corr(method = 'pearson')  # frame name assumed; it is truncated in the source

[Figure: Pearson correlation matrix]

It can be observed that in this data frame, there is a correlation of -0.175131 between GdpPerCapita($) and Suicides/100kPop using the Pearson method, meaning there is a relation between the two but not a strong one. The negative sign indicates an inverse relationship, i.e. as one increases, the other tends to decrease.
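To make the coefficient less of a black box, the sketch below computes Pearson's r by hand on a small made-up frame and compares it with what pandas returns. The column names 'gdp' and 'rate' and all values are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values chosen so that 'rate' tends to fall as 'gdp' rises.
frame = pd.DataFrame({
    'gdp': [1.0, 2.0, 3.0, 4.0, 5.0],
    'rate': [10.0, 9.0, 7.0, 8.0, 4.0],
})

# Pearson r: sum of products of deviations from the mean, divided by the
# square root of the product of the summed squared deviations.
x = frame['gdp'] - frame['gdp'].mean()
y = frame['rate'] - frame['rate'].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

# pandas computes the same quantity for every pair of columns.
r_pandas = frame.corr(method='pearson').loc['gdp', 'rate']

print(r_manual, r_pandas)
```

The value always lies between -1 and 1; values near 0, such as the one reported above, indicate a weak linear relationship, and the coefficient says nothing about causation.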

This can also be visualized as a heatmap using Seaborn, giving a more pleasing view of the correlation matrix.



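    # sns.heatmap and the frame name are reconstructed; the start of the call is truncated in the source
    sns.heatmap(dfcont.corr(method = 'pearson'), cmap='YlGnBu', annot=True)

[Figure: Seaborn heatmap matrix]

Thanks for reading!

Stay tuned for part 2 which will be out within the next week.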

We’ll stick with this dataset and jump into some machine learning.

