Getting started with Geographic Data Science in Python — Part 2

Tutorials, Real World projects & Exercises

Abdishakur, May 22

This is the second article of a three-part series on getting started with Geographic Data Science in Python.

You will learn about reading, manipulating and analysing Geographic data in Python.

The articles in this series are designed to be read in sequence: the first lays the foundation, and the second gets into intermediate and advanced Geographic data science topics.

The third part wraps up with a relevant real-world project to cement your learning.

The first article can be accessed here.

Master Geographic Data Science with Real World projects & Exercises (towardsdatascience.com)

Learning objectives for this tutorial are:

1. Understand GeoDataFrames and GeoSeries
2. Perform table joins and spatial joins
3. Carry out buffer and overlay analysis

1. GeoDataFrame & GeoSeries

You have already seen how to load geographic data with Geopandas.

Once you load the data, what you get is a table with geographic geometries.

The geographic geometries allow us to perform spatial operations in addition to the typical tabular data analysis you would do in pandas or Excel.

If the table has more than one column, it is called a GeoDataFrame.

If it contains only a single column (one-dimensional), it is called a GeoSeries.

This is similar to the pandas DataFrame and Series, if you are familiar with them.

Let us see how they are different.

Let us read the countries and cities datasets again.

DataFrame vs. GeoDataFrame

A GeoDataFrame is a tabular data structure that contains a GeoSeries.

The most important property of a GeoDataFrame is that it always has one GeoSeries column that holds a special status.

This GeoSeries is referred to as the GeoDataFrame’s “geometry”.

When a spatial method is applied to a GeoDataFrame (or a spatial attribute like area is called), the command always acts on the “geometry” column.
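As a minimal sketch of that behaviour, the snippet below builds a tiny GeoDataFrame in memory. The two square "countries" and their names are made up for illustration and are not the article's dataset.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Two hypothetical square "countries" in arbitrary planar units (illustration only)
gdf = gpd.GeoDataFrame(
    {"NAME": ["A", "B"]},
    geometry=[
        Polygon([(0, 0), (1, 0), (1, 1), (0, 1)]),
        Polygon([(2, 0), (4, 0), (4, 2), (2, 2)]),
    ],
)

# The active geometry column is a GeoSeries with special status
geometry_column_name = gdf.geometry.name

# A spatial attribute called on the GeoDataFrame is computed from that column
areas_from_frame = gdf.area
areas_from_column = gdf.geometry.area

print(geometry_column_name)
print(areas_from_frame.tolist())
```

Calling `gdf.area` and `gdf.geometry.area` gives the same values, because the first simply delegates to the geometry column.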

If you have more than one column, you have either a DataFrame or a GeoDataFrame.

If one of the columns is a geometry column, it is a GeoDataFrame.

Otherwise, if none of the columns is a geometry column, it is a DataFrame.

Similarly, a single column means you have either a Series or a GeoSeries.

If that single column is the geometry column, it is a GeoSeries.

Let us see an example of each data type.

We start with the DataFrame.

We have only two columns here and neither of them is a geometry column, so this data is a DataFrame and the output of the type function is pandas.core.frame.DataFrame.
If we happen to have any geometry column in our table, then it will be a GeoDataFrame, as below.

Similarly, we have a Series when the single column is not a geometry column, and a GeoSeries when it is, as shown below.

This will yield pandas.core.series.Series and geopandas.geoseries.GeoSeries respectively.
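The four cases can be checked side by side with a small in-memory table. This is only a sketch: the column names and the rounded coordinates are assumptions for illustration, not the article's countries and cities files.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical mini "cities" table; coordinates are rounded lon/lat values
cities = gpd.GeoDataFrame(
    {"NAME": ["Stockholm", "Nairobi"], "COUNTRY": ["Sweden", "Kenya"]},
    geometry=[Point(18.07, 59.33), Point(36.82, -1.29)],
    crs="EPSG:4326",
)

attributes_only = cities[["NAME", "COUNTRY"]]  # two columns, no geometry
with_geometry = cities[["NAME", "geometry"]]   # two columns, one is geometry
name_series = cities["NAME"]                   # single non-geometry column
geometry_series = cities["geometry"]           # single geometry column

for obj in (attributes_only, with_geometry, name_series, geometry_series):
    print(type(obj))
```

Selecting columns without the geometry column falls back to plain pandas types, which is why the first and third results are not Geopandas objects.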

With a GeoDataFrame or GeoSeries you can carry out geographic processing tasks.

So far we have seen a few, including .plot().

Another example is getting the centroids of polygons.

Let us get each country’s centroid and plot it.

This is what the plot looks like; each point represents a country’s center.
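A minimal sketch of the centroid step, using two made-up rectangles in place of the real country polygons (the names and shapes are assumptions for illustration):

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical polygons standing in for countries
countries = gpd.GeoDataFrame(
    {"NAME": ["Square", "Rectangle"]},
    geometry=[
        Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
        Polygon([(4, 0), (10, 0), (10, 2), (4, 2)]),
    ],
)

# .centroid returns a GeoSeries of points, one per polygon
centroids = countries.centroid
centroid_coords = [(point.x, point.y) for point in centroids]
print(centroid_coords)

# With matplotlib installed, the points could be drawn over the polygons:
# ax = countries.plot(color="lightgrey")
# centroids.plot(ax=ax, color="red")
```

On real country data in a geographic CRS (longitude and latitude), Geopandas warns that centroids are only approximate, so reprojecting first may be worth considering.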

[Figure: country centroid]

Exercise 1.1: Create a union of all polygon geometries (countries). Hint: use .unary_union

Exercise 1.2: Calculate the area of each country. Hint: use .area


2. Table Join vs. Spatial Join

A table join is a classical query operation on two separate tables that share, for example, one column.

In that case, you can join the two tables using the shared column.

A spatial join, on the other hand, relies on a geographic operation, for example joining each city to its country by location.

We will see both examples below.

We can join/merge the two tables based on their shared column NAME.

This is a pure pandas operation and does not involve any geographic operations.
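A table join of this kind might look like the sketch below. The two small tables and every column except the shared NAME column are hypothetical and stand in for the article's data.

```python
import pandas as pd

# Hypothetical attribute tables that share the NAME column
countries = pd.DataFrame(
    {
        "NAME": ["Sweden", "Kenya", "Brazil"],
        "CONTINENT": ["Europe", "Africa", "South America"],
    }
)
capitals = pd.DataFrame(
    {"NAME": ["Sweden", "Kenya"], "CAPITAL": ["Stockholm", "Nairobi"]}
)

# An inner join keeps only the rows whose NAME appears in both tables
merged = countries.merge(capitals, on="NAME", how="inner")
print(merged)
```

Brazil is dropped because it has no match in the second table; `how="left"` would keep it with a missing CAPITAL instead.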

However, in a spatial join, the merge itself is a geographic operation.

We will perform an example of a spatial join.

We want to join the following two tables based on their locations.

For example, which country contains which city, or which city is within which country?

We will use the Geopandas function .sjoin() to do the spatial join and show a sample of 5 rows.

As you can see from the table below, each city is matched with its corresponding country based on location.

We have used op="within", which takes the city points that are within a country's polygon (newer Geopandas releases call this parameter predicate).

Here we could also use intersects.

We could also use op="contains" to find out which countries contain the city points.
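The spatial join can be sketched with two made-up layers. The box-shaped "countries", the city names and the coordinates are assumptions for illustration, and the keyword used is `predicate`, which assumes Geopandas 0.10 or newer (older releases used `op`).

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical country polygons (two adjacent boxes)
countries = gpd.GeoDataFrame(
    {"COUNTRY": ["West", "East"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:4326",
)

# Hypothetical city points; "Gamma" lies outside both countries
cities = gpd.GeoDataFrame(
    {"CITY": ["Alpha", "Beta", "Gamma"]},
    geometry=[Point(2, 3), Point(15, 5), Point(30, 30)],
    crs="EPSG:4326",
)

# Cities on the left: keep each city that is within a country polygon
cities_with_country = gpd.sjoin(cities, countries, how="inner", predicate="within")
city_to_country = dict(
    zip(cities_with_country["CITY"], cities_with_country["COUNTRY"])
)

# Countries on the left: keep each country that contains a city point
countries_with_city = gpd.sjoin(countries, cities, how="inner", predicate="contains")
country_to_city = dict(
    zip(countries_with_city["COUNTRY"], countries_with_city["CITY"])
)

print(city_to_country)
print(country_to_city)
```

Both joins find the same pairs. The difference is which table ends up on the left and therefore whose geometry the result keeps.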

[Figure: spatial joined table]

3. Buffer Analysis

Buffer analysis is an important geoprocessing task.

It is used widely in many domains to get the area within a given distance of a feature, such as a point.

In this example, we will first get a city in Sweden and then do a buffer around it.

One tricky thing here is that you need to know which CRS/projection your data uses to get the output you want.

If your data is not in a projection whose units are meters, then the buffer distance will not be in meters.

This is a classic error in the world of geodata.

I have used this resource to find a CRS for Sweden that uses meters.

SWEREF99 TM: EPSG Projection (spatialreference.org)

Here we use three different buffer distances, 100, 200, and 500, on a single point, Stockholm city.

Then we plot the result to show the concept of buffering.
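The buffering step can be sketched as follows. The Stockholm coordinates are approximate, the variable names are hypothetical, and the distances are treated as meters on the assumption that the point is first reprojected to SWEREF99 TM (EPSG:3006), whose units are meters.

```python
import math

import geopandas as gpd
from shapely.geometry import Point

# Approximate lon/lat of Stockholm in WGS84 (illustrative values)
stockholm = gpd.GeoSeries([Point(18.0686, 59.3293)], crs="EPSG:4326")

# Reproject to SWEREF99 TM so that buffer distances are in meters
stockholm_m = stockholm.to_crs(epsg=3006)

# One buffer polygon per distance
distances = [100, 200, 500]
buffers = {d: stockholm_m.buffer(d).iloc[0] for d in distances}

for d, polygon in buffers.items():
    # A buffer around a point approximates a circle of radius d
    print(d, round(polygon.area), round(math.pi * d**2))
```

Buffering the unprojected point with the same numbers would be interpreted in degrees rather than meters, which is exactly the error described above.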

[Figure: buffer Stockholm city example]

Exercise 3.1: Create a buffer of all cities. Try different projections and different distances.

Overlay

We sometimes need to create new features out of different data types like points, lines and polygons.

Set operations or Overlays play an important role here.

We will be using the same dataset, but instead of reading it from our unzipped folder we can use the built-in dataset reading mechanism in Geopandas.

This example comes from the Geopandas documentation.

We can subset data to select only Africa.

[Figure: Africa]

To illustrate the overlay function, consider the following case in which one wishes to identify the “core” portion of each country, defined as areas within 500km of a capital, using a GeoDataFrame of Africa and a GeoDataFrame of capitals.

To select only the portion of countries within 500km of a capital, we set the how option to “intersection”, which creates a new set of polygons where these two layers overlap.

[Figure: Africa core overlay]

Changing the “how” option allows for different types of overlay operations.

For example, if we were interested in the portions of countries far from capitals (the peripheries), we would compute the difference between the two.
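Both overlay types can be sketched on a toy layer. The single rectangular "country", the capital location and the kilometer units are assumptions for illustration, not the Africa dataset from the Geopandas documentation.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical country: a 2000 x 1200 rectangle, units read as kilometers
countries = gpd.GeoDataFrame({"name": ["Country A"]}, geometry=[box(0, 0, 2000, 1200)])

# Hypothetical capital, buffered by 500 units to form the "core" zone
capitals = gpd.GeoDataFrame({"capital": ["Capital A"]}, geometry=[Point(600, 600)])
capital_zones = capitals.copy()
capital_zones["geometry"] = capitals.buffer(500)

# Core: the part of the country within 500 units of the capital
core = gpd.overlay(countries, capital_zones, how="intersection")

# Periphery: the part of the country outside that zone
periphery = gpd.overlay(countries, capital_zones, how="difference")

core_area = core.area.sum()
periphery_area = periphery.area.sum()
print(round(core_area), round(periphery_area))
```

The two results partition the country, so their areas add up to the area of the original rectangle.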

Conclusion

This tutorial covered some geoprocessing tasks on geographic data using Geopandas.

First, we studied the differences between a DataFrame and a GeoDataFrame, and then explored table and spatial joins.

We have also done buffer analysis as well as overlay analysis.

In the next tutorial, we will apply what we have learned in this and the preceding part in a project.

The code is available in this GitHub repository: shakasom/GDS (Geographic data science tutorials series).

You can also run the Google Colaboratory Jupyter Notebooks directly from this link: shakasom/GDS.
