Here is a snapshot of the CPR data we are dealing with:
We can see a few things that would be of interest.
First, the ‘Observation’ column helps us identify what type of plastic was found, allowing us to add a ‘type’ to the data set (similar to how the scientists in the Nature article conducted their analysis), which may help for filtering.
There is also a ‘Year of tow’ column, helping us understand the timeline.
There are co-ordinates for the start and end of the tow, helping us visualize where the plastic was found, and finally there are names for different maritime regions, which may also be helpful for filtering.
Therefore, it might be useful to include the following in our app:
- The ability to filter by plastic type and maritime region for all analyses
- The ability to filter by year period for some analyses
- The ability to see some basic statistics on incidences of plastics in CPRs (potentially guided by the analysis done in the Nature article)
- The ability to visualize where and when these instances occurred on maps
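To make the filtering ideas above concrete, here is a minimal sketch using dplyr on a small made-up data frame. The column names `type`, `region` and `year`, the selection variables, and all the values are assumptions for illustration only; they are not taken from the CPR dataset.

```r
library(dplyr)

# Made-up incidents for illustration; not taken from the CPR dataset
incidents <- data.frame(
  type   = c("Netting", "Bag", "Rope", "Netting"),
  region = c("Region A", "Region B", "Region A", "Region A"),
  year   = c(1975, 1992, 2004, 2011),
  stringsAsFactors = FALSE
)

# Hypothetical user selections, standing in for inputs chosen in the app
selected_types   <- c("Netting", "Rope")
selected_regions <- "Region A"
year_range       <- c(1990, 2015)

# Filters that apply to all analyses: plastic type and maritime region
filtered <- incidents %>%
  dplyr::filter(type %in% selected_types, region %in% selected_regions)

# Extra filter used only by some analyses: year period
filtered_period <- filtered %>%
  dplyr::filter(year >= year_range[1], year <= year_range[2])

print(filtered_period)
```

Keeping the type and region filters separate from the year filter mirrors the plan: the first two apply everywhere, while the year period applies only where an analysis needs it.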
Therefore, we are going to design our dashboard as follows:
- The GLOBAL SIDEBAR will always be visible. It will give some general use guidelines and context, and it will allow filtering by plastic type and by maritime region, so that any results displayed in the app will reflect these filters.
- Then there will be three independent pages, each one focusing on a different aspect of the data:
  - STATS will present some descriptive statistics on the incidents, with some further filtering options
  - LOCATIONS will present the locations of incidents on a map of some form
  - TIME will show in some form how incidents occurred over time

Preparing the data for use in the app

With our design in mind, we now need to take the dataset given to us by the researchers and adapt it to be used in our app.
There are two steps to this.
First, we need to add any new columns necessary for a particular analysis we have in mind in our design.
As I mentioned above, this dataset is already in pretty good shape, but we do need to parse the Observation column to determine a ‘type’ for each incident, and we could do with cleaning up the column names to make them simpler to work with, as they will look pretty messy when loaded directly into R.
So let’s start up a project in RStudio and call it cpr_data, and let’s create a subfolder in that project called data, where we can place the original xlsx file the researchers provided us.
We can write a simple R script to add the new ‘type’ column and to tidy up the column names — let’s call that prep_data.
We can load the dataset either by opening it in Excel, deleting the first row, re-saving it as a CSV file and reading that into R, or, as I do below, by reading it into R directly using the openxlsx package.
```r
# prep data for use in CPR app

# load libraries
library(dplyr)
library(openxlsx)

# load original data file and make colnames easier to code with
# (replace the elided path with the xlsx file saved in the data subfolder)
data <- openxlsx::read.xlsx("….xlsx", sheet = 1, startRow = 2)
colnames(data) <- gsub("[.]", "", colnames(data)) %>% tolower()
colnames(data)[grepl("region", colnames(data))] <- "region"

# create columns to classify by key term
data <- data %>%
  dplyr::mutate(
    type = dplyr::case_when(
      grepl("net", observation, ignore.case = TRUE) ~ "Netting",
      grepl("line|twine|fishing", observation, ignore.case = TRUE) ~ "Line",
      grepl("rope", observation, ignore.case = TRUE) ~ "Rope",
      grepl("bag|plastic", observation, ignore.case = TRUE) ~ "Bag",
      grepl("monofilament", observation, ignore.case = TRUE) ~ "Monofilament",
      grepl("string|cord|tape|binding|fibre", observation, ignore.case = TRUE) ~ "String",
      TRUE ~ "Unclassified"  # catch-all when no term matches
    )
  )

# save as an RDS file
saveRDS(data, "data/data.RDS")
```

In this script we use grepl() to identify terms in the text strings in the Observation column, and then we use dplyr::case_when() to assign these terms to a type.
In the event that no terms match, the final catch-all condition assigns the type "Unclassified".
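A small sketch of how this classification behaves, assuming a shortened version of the rules: case_when() returns the value for the first condition that matches, and grepl() matches substrings, so the order of the conditions matters. The observation strings below are invented for illustration and are not taken from the dataset.

```r
library(dplyr)

# Invented observation strings, for illustration only
toy <- data.frame(
  observation = c("Fishing net", "Plastic bag", "Blue rope", "Unknown item"),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  dplyr::mutate(
    type = dplyr::case_when(
      # "Fishing net" matches here first, so it never reaches the "fishing" rule
      grepl("net", observation, ignore.case = TRUE) ~ "Netting",
      grepl("line|twine|fishing", observation, ignore.case = TRUE) ~ "Line",
      grepl("rope", observation, ignore.case = TRUE) ~ "Rope",
      grepl("bag|plastic", observation, ignore.case = TRUE) ~ "Bag",
      # catch-all condition: always true, so it applies when nothing above matched
      TRUE ~ "Unclassified"
    )
  )

print(toy)
```

Because matching is on substrings, a term like "net" would also match inside longer words, so it is worth checking the resulting types against the raw observations before relying on them.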
We also change column names into simple lower case strings that are easy to code.
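As an illustration of the column name clean-up, assuming that spaces in the spreadsheet headers arrive in R as dots, the same three steps can be applied to a few example names. 'Observation' and 'Year of tow' are the columns mentioned earlier; the region header used here is a hypothetical stand-in.

```r
# Example header names as they might look after loading;
# "Maritime.region.name" is a hypothetical stand-in
raw_names <- c("Observation", "Year.of.tow", "Maritime.region.name")

# Steps 1 and 2: drop the dots, then lower-case everything
clean_names <- tolower(gsub("[.]", "", raw_names))

# Step 3: any name containing "region" becomes simply "region"
clean_names[grepl("region", clean_names)] <- "region"

print(clean_names)
```

Note that the third step assumes only one column name contains "region"; if several did, they would all receive the same name.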
Second, we need to re-save this transformed data.
Later, when we have written our app and we deploy it so that others can access it, this data file will be bundled with it, and the app will read it into its environment.
If this dataset were very large, we would need to think about which file format R could read fastest.
But in this case the dataset is small, so we can choose any file format to save it as.
In this case we will keep things simple and save the dataset as an R object into an RDS file.
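A minimal sketch of the RDS round trip, using a temporary file and a made-up data frame rather than the real dataset: saveRDS() writes a single R object to disk, and readRDS() restores it with its column names and types intact.

```r
# Made-up data frame for illustration; not the CPR dataset
df <- data.frame(
  type = c("Netting", "Rope"),
  year = c(1990L, 2005L),
  stringsAsFactors = FALSE
)

# Write the object to a temporary RDS file
path <- tempfile(fileext = ".RDS")
saveRDS(df, path)

# Read it back, as an app would when it starts up
restored <- readRDS(path)

# Check that the restored object matches the original
print(identical(df, restored))
```

Unlike a CSV file, an RDS file needs no parsing options when it is read back, which keeps the loading code in the app to a single line.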
Next time…

So we have got our basic design planned, and we have the data set up in the right way.
In the next part of this series, I’ll go through how to get the simple outline of the dashboard up and running.
I’ll also discuss how to handle inputs and reactive variables and how to build some basic descriptive plots in ggplot2 that respond to user input.
Exercises

Here are some follow-up exercises which you can use to test how well you have absorbed the content of this article:
1. What is R Shiny and why might it be useful for this dataset?
2. How might you go about designing this dashboard if the user base was going to be large and varied?
3. How does a local dataset get used in an R Shiny app?
4. What are the key things to think about when you create a dataset to be used in an R Shiny app?
5. Read the Nature article that gave rise to this dataset. What other ways might you design this dashboard having read this article?

Originally I was a Pure Mathematician, then I became a Psychometrician and a Data Scientist.
I am passionate about applying the rigor of all those disciplines to complex people questions.
I’m also a coding geek and a massive fan of Japanese RPGs.
Find me on LinkedIn or on Twitter.