Create a Python map from categorical data

Create a Python map from categorical dataZhongTr0nBlockedUnblockFollowFollowingMay 22There are a lot of great posts on building geographical maps using Python like this one.

However, all of these posts use choropleths for continuous data, meaning a gradient with a colorbar is used to fill the map.

As I couldn’t find a lot of information on how to build a map with categorical information, I decided to try and build something myself that turned out like this:Map from categorical data, built with Python.

RequirementsIn order to build this map you will need the following libraries:Pandas and NumPy (to work with dataframes)Geopandas (to work with shapefiles)Matplotlib (to plot charts)Prepare the dataGet the dataWhen creating geopandas maps you always need two files.

The first file, a shapefile, contains polygons to construct an empty map with boundaries.

You can find countless of shapefiles online for different regions and boundaries.

For this project I used a New York City shapefile with boroughs as boundaries.

fp = ‘geo_export_58b25928–032c-45c8-a41f-8345f112f4d4.

shp’map_df = gpd.

read_file(fp)The shapefile data.

Once the data is imported, you can already preview what the map looks like without without filling it with data.


plot()A plot of the NYC shapefile.

Now let’s use our data to color code the different boroughs.

The second file will be the actual data, meaning containing the values you will use to fill the map.

In this case I used this file containing 364.

558 311-service-requests in New York City.

df = pd.


csv’)In this file, each row represents a request (or complaint).

However, I want to see the most popular type of complaint by borough.

By using a groupby I can create this overview:df_top_compl = df.

groupby(‘Borough’)[‘Complaint Type’].

apply(lambda x: x.



reset_index()Most frequent complaint by boroughMerge the dataNow we have both files, we can merge them together so we have a combination of geodata and fill data (complaints).

As both dataframes contain boroughs, we can merge based on this column.

However in the datafile the boroughs are written in uppercase, so we will need to apply this to the shapefile too.

map_df[‘Borough’] = map_df[‘boro_name’].

apply(lambda x: x.

upper())Now both files now have a column with similar values, we can merge them on said column.

(And just in case there are boroughs in the shapefile without a matching filler data, I also added a fillna() value.

)data_df = pd.

merge(map_df, df_top_compl, how=’left’, on=’Borough’)data_df[‘level_1’].

fillna(value=’No data’, inplace=True)Now all the data is ready, we can create the map!Creating the mapPreparing the colorsBefore we create the actual map, we will do some preparations for the legend.

Given the code should be dynamic in case the data changes, we can not use a fixed number of legend colors.

So what we will do is create a dictionary of filler values with a corresponding color from a palette (in this case matplotlib tab20b), with the exception of the ‘No data’ value, which by convention should always be gray.

keys = list(data_df[‘level_1’].

unique())color_range = list(np.

linspace(0, 1, len(keys), endpoint=False))colors = [cm.

tab20b(x) for x in color_range]color_dict = dict(zip(keys, colors))color_dict[‘No data’] = ‘lightgray’Preparing the gridSince we need to plot both a legend and a map, we will split the graph in two columns; left for legend, right for map.

The left column for the legend, will have to be split up again in rows: one for each value and corresponding color.

But again, since the project should be dynamic we can not use a fixed number of rows, it should adapt to the number of filler values.

Using the code below, we can create a list of axes based on the number of unique filler values.

row_count = data_df[‘level_1’].

nunique()ax_list = []for i in range(row_count+1): ax_list.

append(‘ax’ + str(i+1))ax_string = ‘, ‘.

join(ax_list)Based of this string and list of axes names, we can create the plotgrid with 4 columns and N number of rows.

fig, (ax_string) = plt.

subplots(row_count, 4)From these 4 columns, we will use the first for the legend, and the other 3 for the actual plot (using colspan).

For example, in case we would have 5 rows (unique values to plot), the grid would look like this:Subplotgrid mockup.

Plotting the mapThis is where the difference with a ‘traditional’ choropleth with continious data comes in.

Instead of plotting one map, we will use a loop to stack different maps on top of eachother to create the final result.

Each layer consists of one borough with a color corresponding to its value, using the color dictionary we built earlier.

ax1 = plt.

subplot2grid((row_count,4),(0,1), rowspan=row_count, colspan=3)for index, row in data_df.

iterrows(): plot = data_df[data_df[‘boro_code’] == row[‘boro_code’]].

plot(color=color_dict[row[‘level_1’]], ax=ax1) ax1.

axis(‘off’)Map without legend.

The map looks great, but is obviously useless without a corresponding legend.

So let’s add that part too.

Plotting the legendNow let’s add a legend in the left column.

Since we built the map using layers, we will also need to build the legend in a less conventional way.

The grid is already finished, containing one row for each legend value we will plot.

Now in each row will put a coloured circle (using a small pie chart) and a string with the corresponding value.

row_counter = 0for i in data_df[‘level_1’].

unique():  plt.

subplot2grid((row_count,4),(row_counter,0)) plt.


4, colors=[color_dict[i]]) plt.

axis(‘off’) row_counter += 1Final result.

There it is!.A nice map with categorical data and a corresponding legend.

In case the data change would, you can simply rerun the script and both the colors and the number of values in the legend will neatly adapt.

Ofcourse, you could apply this to any shapefile and data you can find, as long as you can merge both files.

You can view and download my Jupyter Notebook for this tutorial on my GitHub.

Feel free to ask questions if something is unclear.

Finally, I would like to thank Benjamin Cooley whose post I used as an inspiration to create this addition.

As this is my first post on Medium, any comments or feedback would be greatly appreciated.

About me: My name is Bruno and I work as a data scientist with Dashmote, an AI technology scale-up headquartered in The Netherlands.

Our goal is bridging the gap between images and data thanks to AI-based solutions.

.. More details

Leave a Reply