Making a map with EU data in R: Erasmus exchanges by country

We have a small data set, with figures for exits and entries of students on Erasmus exchanges by country.

[Image: our clean data set]

Making a map

Let’s load the “maps” package.

It provides us with coordinates which we can then plot as polygons to create a map.

If we want a fancy color scale which corresponds to our metric, we’re going to want to merge part of the “maps” data set with our own data set.

This is why we need country names to correspond exactly.
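A quick way to check that the names line up is to compare the two sets of names before merging. This is only a sketch on toy vectors: erasmus_names and map_names are made-up stand-ins for the region columns of the Erasmus data and the “world” data, and the mismatch shown is illustrative.

```r
# Toy stand-ins (assumptions): in practice these would be the region
# columns of the Erasmus data set and of the "world" map data.
erasmus_names <- c("Norway", "United Kingdom", "Czech Republic")
map_names <- c("Norway", "UK", "Czech Republic", "Aruba")

# Names in the Erasmus data with no exact match in the map data
unmatched <- setdiff(erasmus_names, map_names)
print(unmatched)

# Rename the offenders so they match the map's spelling
fixed_names <- erasmus_names
fixed_names[fixed_names == "United Kingdom"] <- "UK"

# After the fix, nothing is left unmatched
print(setdiff(fixed_names, map_names))
```

Any name that shows up in the first print() would silently drop out of the merge, so it’s worth fixing before going further.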

As you’ll see if you play around with “ggplot2” and “maps”, if you want to fill other polygons (countries which aren’t part of the Erasmus exchange but will be on the map), you need to assign “NA” values to your fill-metric (this will make sense in a bit).

So we need a list of all regions from the “world” data from “maps”, and we need to create a larger data set with NA values for all non-Erasmus countries.

Let’s create a data set with our “world” map data.

    library(maps)
    library(ggplot2)  # map_data() is a ggplot2 function; it reads the "maps" data
    world <- map_data("world")

Next we want to create a data frame with “NA” values to start with, which we can later replace with our values for the metric.

We want it to have the same number of rows as the number of regions in the “world” data set.

Remember our goal: we want to keep our values for Erasmus countries but we also want “NA” values for other regions in the “world” data set — otherwise these countries will have the same color as the background in our map.

Let’s also create a new column with all region names.

This makes merging on the “region” variable possible.

    tojoin <- as.data.frame(matrix(
      NA,
      nrow = length(table(world$region)),
      ncol = 4,
      dimnames = list(names(table(world$region)), colnames(big))
    ))
    tojoin$region <- rownames(tojoin)

Let’s now join our Erasmus data with these region names and sort the data set.

The full_join() function from “dplyr” is good to start with: it will keep all rows from both data sets but merge on the identical columns (and we made sure the column names were identical in the data sets).

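    library(dplyr)
    all <- full_join(big, tojoin)
    all <- all[order(all$region), ]

There’s an obvious problem here: R created duplicates of all our Erasmus country rows (as specified by our full_join() command). The placeholder rows hold NA where the Erasmus rows hold figures, so the two versions of each country never match and both are kept.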
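To see where the duplicates come from, here is a toy version of the same join. The data frames big_toy and tojoin_toy and their figures are made up for illustration; I’m assuming the real “big” has a region column plus numeric metric columns.

```r
library(dplyr)

# Toy Erasmus data (made-up figures, assumed column layout)
big_toy <- data.frame(
  region = c("Norway", "Spain"),
  home = c(10, 40),
  host = c(20, 30)
)

# Placeholder rows: every map region, with all metrics set to NA
tojoin_toy <- data.frame(
  region = c("Aruba", "Norway", "Spain"),
  home = NA_real_,
  host = NA_real_
)

# With no `by` argument, full_join() matches on ALL shared columns
# (region, home and host), so a row with figures never matches its NA twin
dup <- full_join(big_toy, tojoin_toy)
dup <- dup[order(dup$region), ]
print(dup)
```

Norway and Spain each appear twice: once with figures, once with NA.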

[Image: two Norways? One should be enough!]

We want to make sure we only have the first row for those countries (i.e. the ones from our Erasmus data set).

I’m sure there’s a neat command to avoid this during our join, but a loop can also do the trick, given that we know that the row with our values appears before the duplicate:

    # 251 is hard-coded for this data set; adjust it if your row count differs
    for (i in (1:251)) {
      if (all$region[i] == all$region[i + 1]) {
        all <- all[-c(i + 1), ]
      }
    }

So… what do we have now?

We have all the regions from the “world” map data, and we have figures for Erasmus entries and exits for Erasmus countries.
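The de-duplication loop can probably also be written without a hard-coded row count, using base R’s duplicated(). This is a sketch on toy data (all_toy and its figures are made up), and it relies on the same assumption as the loop: the row with real values comes before its NA duplicate.

```r
# Toy version of the sorted, joined table (made-up figures)
all_toy <- data.frame(
  region = c("Aruba", "Norway", "Norway", "Spain", "Spain"),
  home = c(NA, 10, NA, 40, NA),
  host = c(NA, 20, NA, 30, NA)
)

# duplicated() flags every repeat after the first occurrence of a value,
# so negating it keeps only the first row for each region
all_unique <- all_toy[!duplicated(all_toy$region), ]
print(all_unique)
```

On the real data this would be all <- all[!duplicated(all$region), ], run right after the sort.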

[Image: Norway is unique again, hooray!]

The last step before we can merge with our cartographic data is re-creating our ratio.

    all$rat <- all$host / all$home

Great! Finally we can get to the graphics by joining our data with the geographical data, i.e. the coordinates which we stored in the “world” data set.

For this we want to use the inner_join() function, which keeps only the rows whose “region” value appears in both data sets and combines their columns.

Given how our data is set up, it’s as simple as this:

    mapbig <- inner_join(world, all, by = "region")

You may be wondering what the data set looks like now… Essentially, it’s like our geography data set (i.e. a long list of coordinates, latitude and longitude, with associated region, group and order), but with our extra metrics (home, host and rat) on every Erasmus-country row and missing values for every other region.
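For what it’s worth, a table of this shape can probably be built more directly with dplyr’s left_join(), which keeps every coordinate row and fills in NA wherever there is no Erasmus figure. This is an alternative sketch, not the route taken in this post; world_toy and big_toy are made-up stand-ins for “world” and “big”.

```r
library(dplyr)

# Toy stand-in for the "world" coordinates (made-up points)
world_toy <- data.frame(
  long = c(-70.0, -69.9, 5.0, 5.1),
  lat = c(12.5, 12.4, 60.0, 60.1),
  group = c(1, 1, 2, 2),
  region = c("Aruba", "Aruba", "Norway", "Norway")
)

# Toy stand-in for the Erasmus figures (made-up numbers)
big_toy <- data.frame(region = "Norway", home = 10, host = 20)

# Every coordinate row survives; regions without figures get NA
map_toy <- left_join(world_toy, big_toy, by = "region")
map_toy$rat <- map_toy$host / map_toy$home
print(map_toy)
```

With this approach there is no placeholder data frame and nothing to de-duplicate, because each region appears only once in the Erasmus data.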

[Image: Aruba is an OCT but it isn’t part of our Erasmus regions/countries]

Time to actually map it out!

Using the “ggplot2” package and its ggplot() function, we can start by creating a map of the world (our base, if you want).

The parameters under theme() make the background light cyan (the sea), and make sure the map is bare (no axes, grids or titles).

    library(ggplot2)
    worldmap <- ggplot() + theme(
      panel.background = element_rect(fill = "lightcyan1", color = NA),
      panel.grid = element_blank(),
      axis.text.x = element_blank(),
      axis.text.y = element_blank(),
      axis.ticks = element_blank(),
      axis.title.x = element_blank(),
      axis.title.y = element_blank()
    )

If you display this you’ll find out that the map is entirely cyan.

Why is this? Because we haven’t specified countries or borders, so the only color is the panel background.

Now we can “zoom in” to Europe by specifying new coordinates for our world map using the coord_fixed() function.

The ratio is the y/x ratio: with ratio = 1.5, one degree of latitude is drawn 1.5 times as long as one degree of longitude.
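If you’d rather not guess the ratio, a common rule of thumb is to use 1 / cos(latitude) at the middle of the map, since a degree of longitude shrinks by roughly cos(latitude) as you move away from the equator. This is only an approximation, not a real projection, and the limits below are the latitude limits used for the Europe view in this post.

```r
# Latitude limits of the Europe view
ylim <- c(36, 70.1)
mid_lat <- mean(ylim)

# One degree of longitude covers about cos(latitude) times the distance
# of one degree of latitude, so the y/x ratio is the inverse of that
ratio_mid <- 1 / cos(mid_lat * pi / 180)
print(round(ratio_mid, 2))
```

The result comes out somewhat above 1.5, so the value used here is in the right range for the middle of the map, though it still leaves the far north looking stretched.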

    europe <- worldmap + coord_fixed(
      xlim = c(-9, 42.5),
      ylim = c(36, 70.1),
      ratio = 1.5
    )

As I alluded to earlier, we’ll be plotting polygons.

Our data comes from the “mapbig” data set, and the aesthetic parameters are “rat” for fill (our Erasmus metric), “long” for x, “lat” for y and “group” for group.

The latter is very important: it specifies which coordinates should be joined together to form a polygon! Color specifies what color we want for our borders.

I chose to use the “viridis” package for the fill, with a colorbar guide on the right-hand side (play around with these settings!).

    library(viridis)
    europe2 <- europe +
      geom_polygon(
        data = mapbig,
        aes(fill = rat, x = long, y = lat, group = group),
        color = "grey70"
      ) +
      labs(
        title = "Net Erasmus mobility by country, 2012-2013",
        subtitle = "How many foreign students are hosted for every student sent abroad?",
        caption = "Data: EU Open Data Portal"
      ) +
      theme(
        text = element_text(size = 30),
        plot.title = element_text(face = "bold")
      ) +
      scale_fill_viridis(
        option = "plasma",
        direction = -1,
        name = "",
        na.value = "grey80",
        guide = guide_colorbar(
          barheight = unit(140, units = "mm"),
          barwidth = unit(6, units = "mm")
        )
      )
    europe2

NOTE: I deliberately over-scaled the text and guide elements in order to prepare the graph for high-quality PNG export.

That’s it! We’ve got a beautiful map, which shows us very clearly how Northern and North-Western European countries (Scandinavia, Finland, Estonia and the British Isles) had a significantly positive balance in terms of Erasmus exchanges in 2012–2013.

More surprisingly, Cyprus and Portugal also show a strong positive balance.

Overall student mobility is disproportionately northbound and westbound.

The only thing that needs some fine-tuning is the map projection (surely Finland isn’t that big!)… something for another post?

Disclaimer: The code presented above can probably be improved in many ways.

I just like to play around with data.

Feedback is welcome.

Thanks for reading!

Useful resources:

Case Studies in Reproducible Research: a spring seminar at UCSC (eriqande.github.io)

Beautiful thematic maps with ggplot2 (only) (timogrossenbacher.ch)
