At what age will you win the Nobel Prize? Let’s Visualize using R.

Let us explore what’s hidden for you in all the numbers around you.

We’ll visualize the Nobel laureates dataset and answer these questions using R.

Satyam Singh Chauhan, May 5

Nobel Prize, any of the prizes (five in number until 1969, when a sixth was added) that are awarded annually from a fund bequeathed for that purpose by the Swedish inventor and industrialist Alfred Nobel.

The Nobel Prizes are widely regarded as the most prestigious awards given for intellectual achievement in the world.

— britannica.com

Let’s Discover

The dataset used can be downloaded from Kaggle.

We’ll try to visualize how the age of laureates at the time they received the Nobel Prize changed over the years.

What we can observe is that the average age stayed almost constant until 1950, after which the age of laureates at the time of the award rose steadily, by almost 8 years.
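To put a number on an observation like this, we could compare the mean age before and after 1950. The sketch below is only an illustration: the six rows are made-up values, not the real laureates data, and the names `toy` and `period` are hypothetical.

```r
library(dplyr)

# Made-up stand-in rows (NOT the real laureates data)
toy <- data.frame(
  year = c(1905, 1925, 1945, 1965, 1985, 2005),
  age  = c(58, 61, 59, 62, 65, 68)
)

# Label each row by period, then average the age within each period
period_means <- toy %>%
  mutate(period = ifelse(year <= 1950, "up to 1950", "after 1950")) %>%
  group_by(period) %>%
  summarise(mean_age = mean(age))

print(period_means)
```

With the real data, the same grouping could be applied to the data frame that holds the computed age.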

We use the ‘tidyverse’ library for all the data manipulation.

tidyverse is a very useful library that installs a collection of packages such as ‘dplyr’, ‘ggplot2’, ‘dbplyr’, ‘lubridate’, ‘readxl’, ‘readr’, ‘tidyr’ and several others. Loading it attaches the core ones (for example ‘ggplot2’, ‘dplyr’, ‘tidyr’ and ‘readr’), while the rest, such as ‘dbplyr’ and ‘readxl’, still need their own ‘library()’ call.

tidyverse: The ‘tidyverse’ is a set of packages that work in harmony because they share common data representations and ‘API’ design.

This package is designed to make it easy to install and load multiple ‘tidyverse’ packages in a single step.
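To see which packages are actually attached on your machine, a quick check like the following may help. It is a small sketch that assumes a fresh R session with ‘tidyverse’ installed; the variable names are our own, and the exact lists depend on the installed tidyverse version.

```r
library(tidyverse)

# Every package that belongs to the tidyverse (this list includes "tidyverse" itself)
all_pkgs <- tidyverse_packages()

# Packages currently attached to the session, with the "package:" prefix stripped
attached_pkgs <- sub("^package:", "", search())

# Attached by library(tidyverse) versus installed but not attached
core_pkgs  <- intersect(all_pkgs, attached_pkgs)
extra_pkgs <- setdiff(all_pkgs, attached_pkgs)

print(core_pkgs)
print(extra_pkgs)
```

Anything that shows up in `extra_pkgs` needs its own `library()` call before its functions can be used without the `package::` prefix.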

‘read.csv’ reads the data from a CSV file and stores it as a data frame in a variable, here named ‘nobel’.

The function ‘mutate’ adds new variables and preserves the existing variables already in the data frame.

Here, mutate adds a new column named ‘age’ to the data frame.

age is calculated using another function, ‘as.Date’, which converts between character representations and objects of class “Date” representing calendar dates. The ‘year’ function from ‘lubridate’ then extracts the birth year, which is subtracted from the award year.
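The age calculation can be tried on a tiny data frame first. This is a minimal sketch with three stand-in rows; the column names ‘year’ and ‘birth_date’ follow the ones used in this article, and `toy` is a hypothetical name. Note that subtracting years ignores whether the birthday had already passed when the prize was awarded, so the result can be off by one.

```r
library(dplyr)
library(lubridate)

# Three stand-in rows: award year and birth date stored as text
toy <- data.frame(
  year = c(1903, 1921, 1964),
  birth_date = c("1867-11-07", "1879-03-14", "1929-01-15"),
  stringsAsFactors = FALSE
)

# as.Date() parses the text, year() extracts the birth year,
# and mutate() stores the difference in a new column called age
toy_age <- toy %>%
  mutate(age = year - year(as.Date(birth_date)))

print(toy_age)
```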

Finally, ‘ggplot’ is used to draw a scatter plot, with the variable ‘year’ on the x-axis and ‘age’ on the y-axis.

‘geom_point’ plots the points (dots), and we can specify the size of each point as well.

Here, the size is set to ‘3’ and alpha to ‘0.5’. Alpha defines the opacity (transparency) of the points, which makes overlapping points easier to identify.

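    library(tidyverse)  # attaches dplyr, ggplot2 and the other core packages
    library(lubridate)  # provides year(); older tidyverse versions do not attach it

    nobel <- read.csv("nobel.csv")

    # Age at the time of the award: award year minus birth year
    nobel_age <- nobel %>% mutate(age = year - year(as.Date(birth_date)))

    ggplot(nobel_age, aes(x = year, y = age)) +
      geom_point(size = 3, alpha = 0.5, color = "black") +
      geom_smooth(fill = "red") +  # a constant fill belongs outside aes()
      xlab("Year of Nobel Prize Distribution") +
      ylab("Age while Receiving the Prize")

Let’s Explore

One problem with the above plot is that it conveys very little information.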

In data visualization, making the data pop out with colors or shapes plays a role similar to that of sugar in candy.

From the graphs below, we observe that three categories, ‘Chemistry’, ‘Medicine’ and ‘Physics’, follow a similar trend: the average age in each of them rose steadily over the years.

The biggest deviation can be seen in the category ‘Physics’.

The only category in which the average age decreased over the years is ‘Peace’.

The categories ‘Economics’ and ‘Literature’ stayed almost the same over the years.

One more important observation from the graphs is that in the category ‘Peace’ the points are more spread out, which shows that many laureates received the prize at an age far from the average.

So what might your hypothetical age be if you received the prestigious Nobel Prize in each of the categories?

Chemistry, Age = 70 | Most probable: (Age > 70)
Economics, Age = 69 | Most probable: (67 < Age < 71)
Literature, Age = 70 | Most probable: (67 < Age < 75)
Medicine, Age = 68 | Most probable: (Age > 52)
Peace, Age = 55 | Most probable: (Age < 65)
Physics, Age = 69 | Most probable: (Age > 66)

Let’s understand the code that helped us to derive the above result.

We load the same library that we loaded earlier in the article. Since ‘ggplotly’ comes from the ‘plotly’ package, that package has to be loaded as well.

facet_wrap: wraps a 1d sequence of panels into 2d.

This is generally a better use of screen space than ‘facet_grid()’ because most displays are roughly rectangular.

ggplotly: converts ‘ggplot2’ to ‘plotly’. This function converts a ‘ggplot2::ggplot()’ object to a ‘plotly’ object.

So what does this code actually do?

A plot is generated using the ggplot function, with the mutated data frame ‘nobel_age’ as the data and ‘year’ and ‘age’ as the x-axis and y-axis respectively.

geom_point specifies that a scatter plot is drawn, while geom_smooth aids the eye in seeing patterns in the presence of overplotting. Its color and fill are mapped to the variable ‘category’, so each category is drawn in a different color.

For example, if we plot a graph for a chocolate store with three categories, ‘milk’, ‘white’ and ‘dark’, this method draws the points of each category in a different color, which makes them easier to tell apart.
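The chocolate store example can be sketched in a few lines. Everything here is made up for illustration: the data frame `chocolate`, its columns ‘day’ and ‘sales’, and the numbers themselves. Only the idea of mapping color to ‘category’ comes from the explanation above.

```r
library(ggplot2)

# Made-up sales figures for a hypothetical chocolate store
chocolate <- data.frame(
  day = rep(1:5, times = 3),
  sales = c(12, 15, 14, 18, 20, 7, 9, 8, 11, 10, 5, 6, 9, 8, 12),
  category = rep(c("milk", "white", "dark"), each = 5)
)

# Mapping color to category inside aes() gives each category its own color
# and adds a legend automatically
p <- ggplot(chocolate, aes(x = day, y = sales, color = category)) +
  geom_point(size = 3)

print(p)
```

Because the color is mapped inside `aes()`, ggplot2 picks one color per category; a color given outside `aes()` would apply to every point instead.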

‘facet_wrap’ then places each category in its own subplot, giving six subplots, one per category.
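Putting the pieces together, the faceted plot might look roughly like the sketch below. This is a reconstruction from the description above, not the original listing. To keep it runnable on its own, ‘nobel_age’ is filled with randomly generated stand-in ages rather than the real data; the axis labels are borrowed from the first plot, and guarding ‘plotly’ with `requireNamespace()` is our own choice.

```r
library(ggplot2)

set.seed(42)  # reproducible stand-in data

categories <- c("Chemistry", "Economics", "Literature",
                "Medicine", "Peace", "Physics")
years <- seq(1901, 2016, by = 5)

# Randomly generated stand-in for the real nobel_age data frame
nobel_age <- data.frame(
  year = rep(years, times = length(categories)),
  category = rep(categories, each = length(years))
)
nobel_age$age <- round(rnorm(nrow(nobel_age), mean = 60, sd = 10))

# Scatter plot with a per-category smooth, split into one panel per category
p <- ggplot(nobel_age, aes(x = year, y = age)) +
  geom_point(size = 3, alpha = 0.5) +
  geom_smooth(aes(color = category, fill = category)) +
  facet_wrap(~ category) +
  xlab("Year of Nobel Prize Distribution") +
  ylab("Age while Receiving the Prize")

# Convert to an interactive plotly object only if plotly is installed
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_plot <- plotly::ggplotly(p)
}
```

Printing `p` draws the static version, while printing `interactive_plot` opens the interactive one, where hovering over a point shows its values.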